- Updated: January 24, 2026
- 7 min read
Do people expect different behavior from large language models acting on their behalf? – Insights and Implications

Direct Answer
The paper “Do people expect different behavior from large language models acting on their behalf?” finds that individuals systematically adjust their fairness expectations when a large language model (LLM) makes decisions for them in classic economic games. In particular, participants anticipate more generous offers from LLM‑mediated agents and are more forgiving of outcomes they would judge unfair had a human produced them.
This matters because it uncovers a hidden layer of social norm adaptation that could shape how AI‑driven delegation, negotiation, and recommendation systems are designed, deployed, and regulated.
Background: Why This Problem Is Hard
Human‑AI interaction is no longer limited to simple question‑answering; LLMs are increasingly tasked with making choices on users’ behalf—ranging from financial advice to autonomous negotiation. Yet, the social contract that governs human‑to‑human exchanges is rooted in centuries‑old expectations of reciprocity, fairness, and accountability. Translating those expectations to machine‑mediated actions is fraught with ambiguity.
Existing research on AI ethics and alignment typically focuses on algorithmic bias, transparency, or value alignment, but it rarely quantifies how people’s normative judgments shift when a non‑human entity is the decision‑maker. Moreover, standard user studies often rely on self‑reported attitudes, which can diverge from actual behavior in controlled settings. The difficulty lies in isolating the “agent effect” (human vs. machine) from other confounding variables such as cultural background, prior exposure to AI, or the stakes of the decision.
Economic games—like the Dictator and Ultimatum Games—provide a well‑established laboratory for eliciting fairness norms because they strip away extraneous context and force participants to confront pure allocation decisions. However, these games have historically been used only with human proposers. Extending them to LLM proposers introduces a novel methodological challenge: ensuring participants perceive the AI as a genuine decision‑maker rather than a scripted bot.
What the Researchers Propose
The authors introduce a mixed‑methods experimental framework that embeds LLMs as active agents in two canonical economic games. The core idea is to treat the LLM not merely as a tool that generates text, but as a delegated decision‑maker whose offers are presented to human participants as if they originated from an autonomous “AI partner.”
Key components of the framework include the following (a configuration sketch follows the list):
- LLM Decision Engine: A fine‑tuned language model prompted to produce allocation proposals based on a concise description of the game’s rules and the participant’s role.
- Human Participant Interface: A web‑based platform that randomizes whether the offer comes from a “human” or “AI” source, while keeping all other visual cues identical.
- Norm Elicitation Module: Post‑decision questionnaires that capture participants’ judgments of fairness, acceptability, and perceived responsibility.
- Cross‑Cultural Sampling: Separate recruitment streams for participants in the United Kingdom and the United States to probe cultural moderation effects.
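To make these components concrete, here is a minimal sketch of how the trial configuration and agent‑label randomization might be wired up. All names (`Game`, `AgentLabel`, `TrialConfig`, `assign_condition`) are illustrative assumptions; the paper does not publish its implementation.

```python
import random
from dataclasses import dataclass
from enum import Enum

class Game(Enum):
    DICTATOR = "dictator"    # unilateral allocation
    ULTIMATUM = "ultimatum"  # allocation with a rejection option

class AgentLabel(Enum):
    HUMAN = "human proposer"
    AI = "AI assistant"

@dataclass
class TrialConfig:
    game: Game
    agent_label: AgentLabel  # the label shown in the UI header
    endowment: int = 10      # dollars to be split
    country: str = "UK"      # recruitment stream: "UK" or "US"

def assign_condition(game: Game, country: str) -> TrialConfig:
    """Randomize only the perceived agent; everything else stays constant."""
    return TrialConfig(
        game=game,
        agent_label=random.choice(list(AgentLabel)),
        country=country,
    )
```

Holding the endowment and presentation fixed while randomizing only `agent_label` is what lets the study attribute any behavioral difference to the perceived agent.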
By systematically varying the perceived agent (human vs. LLM) while holding the offer amount constant, the study isolates the psychological impact of delegation to an AI.
How It Works in Practice
The experimental workflow proceeds in four stages (sketched in code after the list):
- Scenario Presentation: Participants read a brief vignette describing either the Dictator Game (unilateral allocation) or the Ultimatum Game (allocation with a rejection option).
- Agent Assignment: A randomizer tags the upcoming offer as generated by either a “human proposer” or an “AI assistant.” The label appears in the UI header, e.g., “Offer from AI Agent.”
- Offer Generation: The LLM receives a prompt such as “You have $10 to split with another player. Propose a fair split.” It returns a numeric split (e.g., $7 to the responder, $3 to the proposer). For human‑labeled trials, a pre‑recorded human decision of equivalent value is displayed.
- Response Capture: Participants either accept or reject the offer (Ultimatum) or simply rate its fairness (Dictator). Immediately after, they answer Likert‑scale items probing perceived appropriateness, trust, and responsibility attribution.
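These four stages map naturally onto a single trial function. The sketch below is a hedged reconstruction: it assumes a generic `llm_complete(prompt) -> str` helper wrapping whichever model API is in use, plus a pool of pre‑recorded human offers; neither detail comes from the paper.

```python
import random
import re

OFFER_PROMPT = (
    "You have ${endowment} to split with another player. "
    "Propose a fair split. Answer with the dollar amount you give "
    "to the other player, e.g. '$4'."
)

def generate_llm_offer(llm_complete, endowment: int = 10) -> int:
    """Stage 3, AI condition: ask the model for a split and parse the reply."""
    reply = llm_complete(OFFER_PROMPT.format(endowment=endowment))
    match = re.search(r"\$?\s*(\d+)", reply)
    if match is None:
        raise ValueError(f"Could not parse an offer from: {reply!r}")
    return min(int(match.group(1)), endowment)

def run_trial(agent_label: str, llm_complete, human_offers: list[int],
              endowment: int = 10) -> dict:
    """Stages 2-4: pick the offer source, then log what the analyst needs."""
    if agent_label == "AI assistant":
        offer = generate_llm_offer(llm_complete, endowment)
    else:
        # Human-labeled trials replay a pre-recorded decision of equal value.
        offer = random.choice(human_offers)
    # In the live platform, the accept/reject choice and Likert ratings
    # arrive from the UI; this record is what gets stored alongside them.
    return {"agent_label": agent_label, "offer": offer, "endowment": endowment}
```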
What distinguishes this approach from prior work is the seamless integration of a generative model into a live decision loop, rather than using static, pre‑written AI statements. The LLM’s output is dynamically conditioned on the same game parameters as the human baseline, ensuring a fair comparison.
Evaluation & Results
The authors evaluated the framework across three primary dimensions (a tabulation sketch follows the list):
- Expectation Shifts: How participants’ anticipated offers differed when they believed the proposer was an AI.
- Acceptance Behavior: The rate at which participants accepted offers in the Ultimatum Game under each agent condition.
- Cultural Consistency: Whether the observed patterns held across UK and US samples.
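As a concrete illustration of the second dimension, acceptance behavior reduces to a rate comparison between agent conditions. The helpers below are a plain tabulation sketch; they assume trial records like the `run_trial` output above, extended with an `accepted` boolean captured from the UI.

```python
def acceptance_rate(trials: list[dict], label: str) -> float:
    """Share of Ultimatum offers accepted under one agent condition."""
    relevant = [t for t in trials if t["agent_label"] == label]
    if not relevant:
        return float("nan")
    return sum(t["accepted"] for t in relevant) / len(relevant)

def leniency_gap_pp(trials: list[dict]) -> float:
    """Acceptance-rate gap (AI minus human) in percentage points."""
    return 100 * (acceptance_rate(trials, "AI assistant")
                  - acceptance_rate(trials, "human proposer"))
```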
Key findings include:
- When the proposer was labeled as an LLM, participants expected significantly higher offers, on average 15% more generous, than when the proposer was a human.
- In the Ultimatum Game, offers from the AI were rejected less frequently (by roughly 8 percentage points) even when the split was objectively unfair (e.g., 80/20). This suggests a “leniency bias” toward machine‑generated decisions.
- The effect persisted across both national samples, though UK participants displayed a slightly stronger expectation of generosity from AI agents than US participants.
- Post‑experiment surveys revealed that participants attributed lower personal responsibility to the AI, even when the AI’s offer was identical to the human baseline, indicating a diffusion of moral accountability.
These results collectively demonstrate that the mere presence of an LLM as a decision‑maker reshapes fairness norms, leading to both higher expectations and greater tolerance of suboptimal outcomes.
Why This Matters for AI Systems and Agents
For practitioners building AI‑driven delegation platforms—such as autonomous financial advisors, procurement bots, or customer‑service negotiators—understanding this norm shift is crucial. If users expect AI agents to act more generously, system designers may need to calibrate recommendation algorithms to avoid over‑generous allocations that could be economically unsustainable.
Conversely, the leniency bias could be exploited maliciously: an AI could propose minimally fair offers that users are more likely to accept simply because they trust the “machine” label. This raises red flags for governance frameworks that assume human‑centric fairness standards apply unchanged to AI‑mediated interactions.
From an evaluation standpoint, the study suggests that traditional metrics (e.g., task accuracy, BLEU) are insufficient for delegated decision‑making contexts. Instead, designers should incorporate “norm compliance” metrics that capture user expectations and acceptance thresholds.
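One way to operationalize such a metric is to compare each offer an agent makes against the expectation threshold elicited from users for that agent type, and report the fraction that clears it. The function below is a minimal, assumed formulation, not a metric proposed in the paper.

```python
def norm_compliance(offers: list[float], expected_offer: float) -> float:
    """Fraction of an agent's offers that meet or beat what users expect.

    `expected_offer` would come from elicitation studies like the one
    summarized here, so the same split can be compliant for a human
    proposer yet non-compliant for an LLM held to higher expectations.
    """
    if not offers:
        return 0.0
    return sum(o >= expected_offer for o in offers) / len(offers)
```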
Practically, teams can leverage the insights to:
- Design UI cues that transparently communicate AI agency and its limits.
- Implement adaptive fairness thresholds that adjust offers based on observed user leniency (see the sketch after this list).
- Integrate post‑decision debriefs that re‑anchor responsibility, mitigating moral diffusion.
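For the second item, one plausible mechanism is an offer floor nudged by observed acceptance behavior. The update rule, parameter names, and default values below are illustrative assumptions, not anything the study prescribes.

```python
class AdaptiveOfferFloor:
    """Adjust the minimum offer based on observed user leniency."""

    def __init__(self, floor: float = 0.40, step: float = 0.02,
                 min_floor: float = 0.25, max_floor: float = 0.50):
        self.floor = floor          # minimum share offered to the responder
        self.step = step            # per-observation adjustment size
        self.min_floor = min_floor  # hard lower bound (see note below)
        self.max_floor = max_floor

    def update(self, accepted: bool) -> float:
        # Acceptances let the floor drift down; rejections push it back up.
        delta = -self.step if accepted else self.step
        self.floor = min(self.max_floor, max(self.min_floor, self.floor + delta))
        return self.floor
```

The hard `min_floor` matters: without it, the same mechanism would drift toward exactly the exploitative pattern flagged above, ratcheting offers down for as long as users keep accepting.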
For deeper guidance on building responsible AI delegation pipelines, see our AI Delegation Framework and the Agent Governance Playbook.
What Comes Next
While the study provides a robust first look at expectation dynamics, several limitations remain:
- Scope of Games: Only two low‑stakes economic games were examined. Real‑world decisions often involve multi‑stage negotiations, risk, and asymmetric information.
- Model Diversity: The experiments used a single LLM architecture. Different model sizes, training data, or prompting strategies could yield divergent norm effects.
- Long‑Term Interaction: The study captures a one‑shot interaction. Repeated exposure to AI agents may attenuate or amplify the observed biases.
Future research avenues include:
- Extending the framework to high‑stakes domains such as credit scoring or medical triage, where fairness expectations have tangible consequences.
- Comparing open‑source versus proprietary LLMs to assess whether model provenance influences perceived responsibility.
- Investigating mitigation strategies—e.g., explicit responsibility statements, calibrated transparency layers—to counteract moral diffusion.
- Exploring cross‑cultural dimensions beyond the US and UK, especially in collectivist societies where AI trust dynamics may differ.
By addressing these gaps, the community can develop a more nuanced theory of “AI‑augmented social norms” that informs both product design and policy regulation.
We encourage readers to explore related research on our platform, including studies on human‑AI negotiation dynamics and the emerging field of ethical LLM deployment. Together, these works chart a path toward AI systems that respect, rather than unintentionally reshape, the social contracts we rely on.