- Updated: June 28, 2026
- 6 min read
Confident but Conflicted: Internal Uncertainty and Cognitive Dissonance Resolution in LLMs
Direct Answer
The paper Confident but Conflicted introduces Trust Elasticity (TE), an econometrics‑inspired metric that quantifies how readily a large language model (LLM) yields to contradictory evidence. By linking TE to internal uncertainty signals, the authors reveal a measurable pathway for steering LLMs away from over‑confidence and toward more reliable decision‑making.
Background: Why This Problem Is Hard
LLMs are increasingly deployed as autonomous agents, search assistants, and medical advisors. In real‑world deployments they often encounter information that clashes with their prior predictions—whether through user pushback, retrieved documents, or live web searches. Existing safety‑orchestration layers treat such clashes as binary “accept or reject” events, ignoring the nuanced internal state of the model. Moreover, conventional confidence scores are poorly calibrated: a model can appear certain while internally wavering, or vice‑versa. This mismatch hampers two critical workflows:
- Trust calibration: downstream systems cannot reliably decide when to defer to a human or another model.
- Iterative persuasion: agents that must negotiate or persuade (e.g., health‑chatbots) lack a principled way to gauge how much evidence is needed to shift a model’s stance.
Consequently, developers resort to ad‑hoc prompting tricks, which are brittle and difficult to evaluate at scale. A systematic, quantifiable bridge between external persuasion attempts and the model’s internal uncertainty has been missing—until now.
What the Researchers Propose
The authors frame the interaction between an LLM and conflicting evidence as a form of cognitive dissonance resolution. Their solution consists of three intertwined components:
- Persuasion dimensions: two orthogonal axes—source authority (e.g., a peer‑reviewed journal vs. a random blog) and evidence quality (high‑precision data vs. noisy anecdotes).
- Trust Elasticity (TE): a scalar derived from the model’s probability shift when exposed to a persuasive stimulus. TE captures the “elastic” stretch of trust: high TE means the model is easily swayed; near‑zero TE indicates immunity.
- Internal uncertainty indicators: model‑specific signals such as Confidence Miscalibration (observed in Qwen) and Internal Uncertainty Change (observed in Llama). These metrics serve as proxies for the hidden belief state that drives TE.
By aligning TE with these internal signals, the framework offers a unified lens to compare disparate LLMs on a common “persuadability” scale.
How It Works in Practice
The workflow can be visualized as a four‑step loop:
- Claim generation: the LLM produces an initial answer to a health‑science query (e.g., “Vitamin D prevents COVID‑19”).
- Conflict injection: a second module supplies contradictory evidence, varying along the authority and quality axes.
- Response measurement: the model re‑evaluates the claim, and the system records the probability shift for the original answer.
- TE computation & uncertainty mapping: the shift is normalized to produce TE; simultaneously, the model’s internal uncertainty indicator is logged.
What distinguishes this pipeline from prior prompting studies is the explicit separation of external persuasion (the “stimulus”) from internal state (the “response”). The TE calculation treats the model as a semi‑elastic membrane: a gentle push (low‑authority, low‑quality evidence) yields a small deformation, while a strong push (high‑authority, high‑quality evidence) produces a larger bend—unless the membrane is intrinsically stiff (low internal uncertainty).
Evaluation & Results
The authors evaluated four commercial LLMs (including two open‑weight variants) across twelve health‑science statements that span a spectrum of epistemic certainty—from well‑established facts to speculative claims. Each statement was paired with eight persuasion scenarios (2 authority levels × 4 evidence qualities), yielding 384 interaction points per model.
Key observations:
- Model‑level TE variance: Open‑weight Llama exhibited the highest average TE (≈0.42), indicating a relatively pliable trust profile, whereas a closed‑source counterpart showed a muted TE (≈0.12), reflecting stronger resistance to persuasion.
- False‑claim immunity: For statements that are demonstrably false (e.g., “homeopathy cures cancer”), all models converged to near‑zero TE, suggesting that the framework correctly identifies unpersuadable misinformation.
- Correlation with internal signals: In Qwen, higher Confidence Miscalibration scores aligned with larger TE values (Pearson r ≈ 0.68). In Llama, the magnitude of Internal Uncertainty Change predicted TE with a similar strength (r ≈ 0.71).
These results demonstrate that TE is not merely a behavioral artifact; it reflects measurable fluctuations in the model’s latent belief distribution. The diagram below visualizes a typical TE curve, where the x‑axis represents evidence strength and the y‑axis shows the resulting probability shift.

Overall, the experiments validate TE as a cross‑model, interpretable metric that bridges external persuasion attempts with internal uncertainty dynamics.
Why This Matters for AI Systems and Agents
Understanding and quantifying how an LLM reacts to contradictory information unlocks several practical levers for AI product teams:
- Dynamic trust gating: Agents can monitor TE in real time and decide when to hand off a conversation to a human expert, reducing the risk of confidently delivering misinformation.
- Targeted fine‑tuning: By identifying low‑elasticity regions, developers can focus data augmentation or reinforcement learning on specific knowledge gaps, making the model both more accurate and more adaptable.
- Orchestration of multi‑model pipelines: In a system that routes queries to several specialist LLMs, TE can serve as a scoring function to select the most persuadable model for a given evidence set.
- Regulatory compliance: For health‑care or finance applications, demonstrating that a model’s confidence is calibrated against an internal uncertainty metric can satisfy audit requirements.
These capabilities map directly onto the UBOS platform overview, where developers can embed TE monitoring into the Workflow automation studio and trigger custom actions—such as escalating to a domain expert or invoking a secondary verification model.
What Comes Next
While the study establishes a solid foundation, several open challenges remain:
- Generalization beyond health claims: Extending TE measurement to legal, financial, or technical domains will test the metric’s universality.
- Real‑time uncertainty estimation: Current internal signals are extracted post‑hoc; integrating them into the forward pass could enable on‑the‑fly trust adjustments.
- Intervention design: Future work could explore how to deliberately modulate internal uncertainty (e.g., via dropout or temperature scaling) to achieve desired TE profiles.
- Human‑in‑the‑loop studies: Measuring how end‑users perceive TE‑driven system behavior will inform UI/UX guidelines for trustworthy AI assistants.
Practitioners interested in experimenting with these ideas can start by leveraging the OpenAI ChatGPT integration to prototype TE‑aware prompts, or combine the Chroma DB integration for efficient storage of evidence vectors. As the community refines internal uncertainty diagnostics, Trust Elasticity could become a standard KPI for responsible LLM deployment.
For a deeper dive into the methodology and to explore additional research artifacts, visit the About UBOS page and join the conversation on emerging AI trust metrics.