The Two Genie Game: Adoption and Welfare in Audit-Grounded AI Governance

arXiv:2606.28710v1 (Announce Type: new)

Abstract: We ask under what conditions an agent with a harm-minimizing policy can displace an approval-seeking (RLHF) agent in a competitive market, and when that policy is sufficient to prevent community harm. We use evolutionary game theory (finite-population Moran-Fermi pairwise comparison) to formalize this subject to assumptions of wisher hindsight, peer testimony, a monotone harm ledger, sufficient information density of community feedback, and a fini
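As a rough illustration of the kind of quantity a finite-population pairwise-comparison analysis computes, here is a sketch of the textbook fixation probability of a single mutant strategy A (for instance, a harm-minimizing agent) invading a population of strategy B (for instance, an approval-seeking agent) under the Fermi imitation rule. The 2×2 payoff entries `a, b, c, d`, the selection intensity `beta`, and all function names are hypothetical and are not taken from the paper. The paper's model adds assumptions (wisher hindsight, peer testimony, a harm ledger) that this sketch does not represent.

```python
import math


def payoffs(j, n, a, b, c, d):
    """Expected payoffs of A and B players when j of n players use A.

    Hypothetical 2x2 game: A vs A -> a, A vs B -> b, B vs A -> c, B vs B -> d.
    Players do not interact with themselves, hence the (n - 1) denominators.
    """
    pi_a = (a * (j - 1) + b * (n - j)) / (n - 1)
    pi_b = (c * j + d * (n - j - 1)) / (n - 1)
    return pi_a, pi_b


def fermi_switch_prob(pi_self, pi_other, beta):
    """Fermi rule: probability that a focal player imitates a peer's strategy."""
    return 1.0 / (1.0 + math.exp(-beta * (pi_other - pi_self)))


def fixation_probability(n, a, b, c, d, beta):
    """Probability that one A mutant takes over a population of n - 1 B players.

    Under pairwise comparison, the random-pairing factor j(n - j)/n^2 cancels,
    so T^-_k / T^+_k = exp(-beta * (pi_A(k) - pi_B(k))). The fixation
    probability is 1 / (1 + sum_{k=1}^{n-1} prod_{j=1}^{k} T^-_j / T^+_j).
    """
    total = 1.0
    log_prod = 0.0  # running log of the product of transition ratios
    for k in range(1, n):
        pi_a, pi_b = payoffs(k, n, a, b, c, d)
        log_prod += -beta * (pi_a - pi_b)
        total += math.exp(log_prod)
    return 1.0 / total


if __name__ == "__main__":
    n = 50
    # Neutral benchmark: equal payoffs give exactly 1/n.
    print(fixation_probability(n, 1, 1, 1, 1, beta=1.0))
    # A strictly better than B: fixation probability exceeds 1/n.
    print(fixation_probability(n, 2, 2, 1, 1, beta=1.0))
```

Comparing the result with the neutral value 1/N is the usual test of whether selection favors A replacing B. For large `beta` or `n`, the running exponent can overflow `math.exp`; a log-sum-exp formulation would be more robust.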

Related articles

What Drives Interactive Improvement from Feedback?

Contrastive Reflection for Iterative Prompt Optimization

How Can AI Find My Model? A Model-Finding Experimental Study Considering Data Formats, Embeddings, and Retrieval Strategies

BayesBench: Evaluating LLM Belief Trajectories Under Multi-Turn Evidence Accumulation