The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents
Key Summary
- The paper shows that simply adding a new AI model to the menu—without anyone actually using it—can push a fairness-focused regulator to change the market rules, shifting money from one side to the other.
- This trick is called the Poisoned Apple effect: releasing a tempting but harmful option that scares the referee (regulator) into redesigning the game.
- The authors study three classic settings—bargaining, negotiation, and persuasion—using 13 top language models as simulated agents across 1,320 market designs.
- They compute equilibria (stable outcomes) for each market and let a regulator pick rules that maximize fairness or efficiency, then see what happens when one more model is made available.
- In many cases, payoffs move in opposite directions when a new model appears—one player gains while the other loses—even when the new model is never chosen in the final play.
- Harm to fairness is especially common when the regulator's goal is equality rather than total pie size, making fairness-first rules easier to manipulate.
- If the regulator doesn't update the rules after a new model arrives, the system's fairness or efficiency gets worse in about 40% of cases, showing that static rules are risky.
- The study uses the GLEE framework to build payoff tables from over 80,000 simulated games and finds stable patterns across bargaining, negotiation, and persuasion.
- A detailed example shows Alice's payoff rising from 0.49 to 0.52 while Bob's drops from 0.50 to 0.46, caused only by a market switch forced by a new model that neither uses.
- The key takeaway: regulators need dynamic, simulation-informed market designs that anticipate manipulation through technology expansion.
Why This Research Matters
As AI agents start handling real deals—finding apartments, negotiating salaries, or setting ad prices—rules that seem fair today can be quietly tilted tomorrow by simply adding new models. This means people could unknowingly get worse prices or splits even when nothing “visible” changes in the marketplace. Regulators and platforms need to test technology releases the way cities test new road closures—by simulating traffic before flipping the signs. Dynamic market design guided by data can protect both fairness and efficiency as AI capabilities evolve. If we ignore availability effects, open releases or new APIs can become tools of regulatory arbitrage that shift wealth without improving quality. Getting ahead of this will build trust in AI-mediated markets and keep everyday transactions from becoming a hidden arms race.
Detailed Explanation
01 Background & Problem Definition
🍞 Hook: Imagine your school lets students pick any calculator for math tests. One day, someone brings a strange calculator that can hint at answers. Even if no one ends up using it, the teacher might change test rules to stop cheating—and those new rules might accidentally help some students and hurt others.
🥬 The Concept (AI agents): AI agents are computer programs that make decisions on our behalf in tasks like negotiating prices, splitting resources, or recommending purchases. How it works:
- People or companies let AI agents act for them in deals.
- These AIs follow rules (market designs) set by a regulator (like a referee).
- The AIs pick strategies to get the best outcomes for their side. Why it matters: Without understanding how these AIs change the “game,” rules can be gamed, and outcomes can become unfair. 🍞 Anchor: Think of an AI assistant haggling for your concert tickets online—it decides what offers to make so you don’t have to.
🍞 Hook: You know how in chess you plan moves by guessing what your opponent will do? Markets are like that, too.
🥬 The Concept (Game theory): Game theory studies how decision-makers interact when each one’s best move depends on what the others do. How it works:
- Define players, actions, and payoffs.
- Predict stable outcomes where no one wants to switch.
- Check how changing the rules shifts those outcomes. Why it matters: If we change what choices are allowed (like adding a new AI tool), we may flip who wins. 🍞 Anchor: Two kids trading snacks think ahead: “If I offer chips for your cookie, will you accept or hold out for gummy bears?”
🍞 Hook: Think about splitting a pizza with a friend; you both want more, but you also want to agree before it gets cold.
🥬 The Concept (Bargaining): Bargaining is the back-and-forth process of dividing a fixed pie. How it works:
- One side proposes a split; the other accepts or rejects.
- Waiting costs everyone (the pizza gets colder), so delay is risky.
- A deal ends the game and fixes who gets what. Why it matters: Small rule changes—like how long you can talk or what you can reveal—can tilt the final split. 🍞 Anchor: “I’ll take 60% now” vs “No, make it 55%-45% or I wait a turn.”
🍞 Hook: Have you ever tried to sell a toy without saying exactly what you paid for it? Secret info makes deals tricky.
🥬 The Concept (Negotiation): Negotiation is a buyer-seller game where private values and alternating offers try to discover a fair price. How it works:
- Seller posts a price; buyer can accept or reject.
- Buyer posts a price next, and so on.
- Trade happens only when both sides see a gain. Why it matters: Communication rules and hidden values decide whether a good deal happens or fails. 🍞 Anchor: Selling your bike: you want 80; you dance around numbers until someone agrees.
🍞 Hook: Imagine a friend telling you their video game works perfectly—but do you trust them?
🥬 The Concept (Persuasion): Persuasion is about sending messages to change someone’s decision when the sender knows more than the receiver. How it works:
- Seller privately knows quality (high/low).
- Seller sends a message; buyer decides to buy or not.
- Repeating this builds or breaks trust. Why it matters: Rules about messages can protect buyers from being tricked. 🍞 Anchor: If the snack is stale, honest messages and buyer caution stop bad trades.
🍞 Hook: When a teacher grades group work, they want things to be both fair and to show strong learning.
🥬 The Concept (Fairness and Efficiency): Fairness means splitting gains evenly; efficiency means making the total pie as large as possible with minimal waste. How it works:
- Fairness measures equality of outcomes.
- Efficiency measures total value created (fast deals, right trades, truthful buys).
- Regulators often pick rules to maximize one of these. Why it matters: Choosing fairness vs. efficiency changes which market rules look “best.” 🍞 Anchor: Two kids split 10 stickers: 5–5 is very fair; 6–4 might be fine if it got done quickly (more efficient than arguing all recess and ending with 0–0).
🍞 Hook: Every game needs a referee who sets and updates the rules.
🥬 The Concept (Regulatory frameworks/market design): A regulatory framework is the set of rules that shape how market players interact—what they can say, what they know, and how long they have. How it works:
- The regulator lists possible rule-sets (markets).
- Predicts outcomes under each set using player behavior.
- Chooses the rules that best meet a social goal (fairness or efficiency). Why it matters: Bad or static rules can be gamed; good, adaptive rules keep markets safe and fair. 🍞 Anchor: A sports league bans certain equipment if it gives unfair advantages, then tweaks rules as new gear appears.
🍞 Hook: To study how players behave, you need a giant, shared playground where you can run lots of safe practice matches.
🥬 The Concept (GLEE dataset): GLEE is a big simulation framework where many language models play thousands of economic games so we can measure strategies and outcomes. How it works:
- Define game families (bargaining, negotiation, persuasion) and market rules.
- Let 13 AI models play many rounds.
- Log decisions and build performance tables. Why it matters: With shared data, we can compare models and predict what happens when a new one appears. 🍞 Anchor: It’s like a science fair for AIs where every project is tested with the same rubric, so results are comparable.
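To picture how those performance tables come together, here is a toy Python sketch of turning game logs into a payoff predictor with a linear regression, roughly in the spirit of the table-building step described later in the methodology. The column names, values, and model list are made up for illustration; they are not GLEE's actual schema.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy stand-in for GLEE-style logs: each row is one simulated game.
# (Columns and values are illustrative, not the dataset's real schema.)
rng = np.random.default_rng(0)
n = 500
games = pd.DataFrame({
    "alice_model": rng.choice(list("ABCD"), size=n),
    "bob_model": rng.choice(list("ABCD"), size=n),
    "market_id": rng.choice(["4", "8"], size=n),
    "alice_payoff": rng.uniform(0.3, 0.7, size=n),
})

# One-hot encode the categorical design and fit a linear model that predicts
# Alice's expected payoff for any (Alice model, Bob model, market) combination.
X = pd.get_dummies(games[["alice_model", "bob_model", "market_id"]]).astype(float)
reg = LinearRegression().fit(X, games["alice_payoff"])

# Fill one cell of the payoff table: Alice = D vs Bob = A in market 4.
query = pd.DataFrame([{"alice_model": "D", "bob_model": "A", "market_id": "4"}])
x_query = pd.get_dummies(query).reindex(columns=X.columns, fill_value=0).astype(float)
print(f"Predicted payoff for Alice=D vs Bob=A in market 4: {reg.predict(x_query)[0]:.2f}")
```

In the paper, this regression step is what lets the authors fill a payoff table for every model pair in every one of the 1,320 markets without simulating each combination from scratch.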
- The World Before: People assumed that adding more AI options mainly helps or at least doesn't hurt.
- The Problem: Just making a new AI model available can shift the regulator's best rule choice, which can flip who gains more.
- Failed Attempts: Static rules and one-time audits often missed how "unused" options still change incentives.
- The Gap: We lacked a way to quantify how technology availability alone (even without adoption) moves equilibria and regulatory picks.
- Real Stakes: This touches online markets, job offers, real estate, and ads—places where your agent may bargain or persuade on your behalf. If rules can be steered by unused tools, everyday deals can become less fair without anyone noticing.
02 Core Idea
🍞 Hook: You know how a fire drill changes how everyone behaves in school—even if there isn’t an actual fire? Just the possibility reshapes plans.
🥬 The Concept (Poisoned Apple effect): The Poisoned Apple effect is when someone releases a new AI tool not to use it, but to scare the regulator into changing the rules in a way that quietly helps the releaser. How it works:
- Add a new model to the choice set.
- Under current rules, that model would make fairness drop if used.
- A fairness-focused regulator switches to new rules to prevent harm.
- In the new rules, nobody uses the new model—but payoffs now favor the releaser. Why it matters: Outcomes can be manipulated by options that stay on the shelf. 🍞 Anchor: A student brings an allowed-but-sketchy calculator; the teacher tightens the rules; the new testing format now favors that student's study style.
The “Aha!” Moment in one sentence: Options you don’t use can still move the referee, and moving the referee moves the game.
Three analogies:
- Sports equipment: A new, edgy shoe appears; the league bans some moves; defense-friendly rules now favor teams built like yours.
- Board games: Adding a risky card to the deck makes the house change scoring; even if you never draw the card, your strategy wins more under the new scoring.
- Traffic: A new shortcut opens; to prevent congestion, the city reroutes lanes; your usual route becomes faster than your rival’s—even if no one uses the shortcut.
Before vs. After:
- Before: Regulators pick rules based on current model choices; adding models was seen as mostly beneficial or neutral.
- After: Regulators must expect strategic releases; simply adding a model can swing payoffs via a forced rules change.
🍞 Hook: Imagine a seesaw where moving a rock on one side changes the balance, even if no one sits on the rock.
🥬 The Concept (Why it works—intuition): The regulator optimizes a metric (fairness or efficiency) after players respond to rules; adding a new choice changes the predicted equilibrium under each rule, so the regulator’s best rule can flip. How it works:
- Compute expected outcomes for each market with current choices.
- Add one model; recompute equilibria for each market.
- The top-scoring market under the regulator’s metric may switch.
- Payoffs change because the whole environment changed, not because players picked the new model. Why it matters: It reveals a hidden lever—available but unused technology—as a strategic tool. 🍞 Anchor: A menu adds a spicy dish; the cafeteria changes lunch lines to manage heat warnings; even if no one orders spice, line changes alter who eats first.
🍞 Hook: Think of building with LEGO: snap together a few simple blocks to make a complex shape.
🥬 The Concept (Building blocks): The idea breaks into parts: market design (rules), agent choice (which AI to delegate), equilibrium prediction (stable outcomes), and regulatory optimization (pick the best market for fairness/efficiency). How it works:
- For each market, predict payoffs for every model pair.
- Find each market’s equilibrium.
- Score each market by the regulator’s goal.
- Add one model and repeat; see if the chosen market changes. Why it matters: It pinpoints exactly where and how a new option nudges the system. 🍞 Anchor: Re-scoring a competition after adding one more contestant, even if that contestant doesn’t win, can reshuffle medals for everyone else.
03 Methodology
At a high level: Inputs (game families + markets + models) → Build payoff tables from GLEE (regressions) → For each market, compute mixed-strategy Nash equilibria → Regulator picks the market that maximizes fairness or efficiency → Expand technology set by adding one model → Recompute equilibria and regulator’s choice → Compare outcomes before vs. after.
🍞 Hook: You know how scientists test recipes by keeping most steps the same and changing just one ingredient?
🥬 The Concept (Meta-game pipeline): The meta-game treats “which AI model to use” as each player’s strategy inside a larger loop where the regulator also chooses the best market rules. How it works:
- Fix a list of possible markets (rule-sets).
- For each market, build two payoff matrices (one for Alice, one for Bob) from GLEE-based regressions.
- Compute the market’s mixed-strategy Nash equilibrium (MSNE).
- Score the equilibrium using the regulator’s metric (fairness or efficiency).
- Pick the best market.
- Add a new model to the choice set and repeat. Why it matters: Holding everything else constant, we isolate the impact of “availability” of a new model. 🍞 Anchor: It’s like adding one new chess piece type to the box and rechecking which opening the coach now recommends.
Each step in detail:
- Step A: Define Markets (Information structure, Communication, Horizon). What happens: Create 1,320 rule combinations across bargaining, negotiation, and persuasion (e.g., complete vs. incomplete information; messages allowed or not; finite vs. infinite horizon). Why it exists: Rules shape incentives; without enumerating them, you can't see how the same models behave under different conditions. Example: Market 4 (incomplete info, messages allowed, infinite horizon) vs. Market 8 (complete info, no messages, infinite horizon).
- Step B: Build Payoff Tables via Regression. What happens: From 80,000+ simulated games, linear regressions predict expected payoffs for any model pair in any market. Why it exists: You can't simulate every pairing in every market fresh; regressions generalize from data to fill the table. Example: For Alice=Model D and Bob=Model A in Market 4, predicted fairness=1.000 with payoffs (0.49, 0.50).
- Step C: Compute Equilibria (MSNE with Lemke–Howson). What happens: For each market, find mixed strategies where neither player would swap models given the other's mix. Why it exists: Real strategic play may randomize; ignoring mixed strategies can miss stable, realistic outcomes. Example: Market 4 stabilizes at a pure choice (Alice=D, Bob=A) before expansion; other markets may produce mixes across two or more models.
- Step D: Regulatory Optimization. What happens: For each market's equilibrium, compute the regulator's score (fairness or efficiency) and pick the top market. Why it exists: This mirrors real oversight—choose rules that best meet a public goal given how players will actually respond. Example: Before expansion, regulator picks Market 4 for fairness=1.000.
- Step E: Technology Expansion. What happens: Add one more AI model to the choice set and recompute equilibria and the regulator's pick. Why it exists: To measure the causal effect of availability, even if no one adopts the new model at the end. Example: Adding Model E makes Market 4's fairness drop to 0.976 if kept; regulator flips to Market 8 (fairness = 0.990).
- Step F: Outcome Comparison. What happens: Compare agent payoffs and the regulator's metric before vs. after the expansion. Why it exists: This shows who gained, who lost, and whether the public goal got better or worse. Example: Payoffs shift from (0.49, 0.50) to (0.52, 0.46) after the regulator moves to Market 8. (A code sketch of Steps C through F follows this list.)
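To make Steps C through F concrete, here is a minimal, self-contained Python sketch of the loop. All payoff numbers, market names, and the fairness formula are illustrative assumptions (the real entries come from the GLEE regressions, and the paper's exact fairness definition may differ); the open-source nashpy library stands in for whatever Lemke–Howson implementation the authors used.

```python
import numpy as np
import nashpy as nash  # third-party solver (pip install nashpy); implements Lemke-Howson

def fairness(u_alice, u_bob):
    # One plausible equality measure; the paper's exact formula may differ.
    return 1.0 - abs(u_alice - u_bob)

def equilibrium_payoffs(A, B):
    # Step C: mixed-strategy Nash equilibrium of the model-choice meta-game.
    sigma_a, sigma_b = nash.Game(A, B).lemke_howson(initial_dropped_label=0)
    return float(sigma_a @ A @ sigma_b), float(sigma_a @ B @ sigma_b)

def regulator_pick(markets):
    # Step D: choose the market whose equilibrium maximizes the regulator's metric.
    scored = {name: equilibrium_payoffs(A, B) for name, (A, B) in markets.items()}
    best = max(scored, key=lambda m: fairness(*scored[m]))
    return best, scored[best]

def add_model(M, new_row, new_col):
    # Step E: one extra model means one extra row and column in a payoff table.
    M = np.hstack([M, np.array(new_col).reshape(-1, 1)])
    return np.vstack([M, np.array(new_row).reshape(1, -1)])

# Illustrative 2x2 payoff tables (rows = Alice's model choice, cols = Bob's).
markets_before = {
    "market_4": (np.array([[0.49, 0.45], [0.47, 0.44]]),
                 np.array([[0.50, 0.46], [0.48, 0.45]])),
    "market_8": (np.array([[0.52, 0.50], [0.49, 0.48]]),
                 np.array([[0.46, 0.44], [0.45, 0.43]])),
}

# Hypothetical entries for a newly released model E: tempting for Alice under
# market_4's rules, unattractive under market_8's rules.
new_entries = {
    "market_4": ([0.60, 0.58, 0.55], [0.48, 0.46], [0.30, 0.29, 0.28], [0.49, 0.47]),
    "market_8": ([0.40, 0.38, 0.35], [0.41, 0.40], [0.30, 0.29, 0.28], [0.44, 0.42]),
}
markets_after = {
    name: (add_model(A, new_entries[name][0], new_entries[name][1]),
           add_model(B, new_entries[name][2], new_entries[name][3]))
    for name, (A, B) in markets_before.items()
}

# Step F: compare the regulator's choice and the equilibrium payoffs.
print("before expansion:", regulator_pick(markets_before))
print("after expansion: ", regulator_pick(markets_after))
```

Run as written, the sketch reproduces the qualitative pattern in the paper's bargaining example: before the release the regulator prefers market_4; after model E appears, market_4's equilibrium degrades, so the regulator flips to market_8, where nobody actually delegates to E.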
🍞 Hook: Like a secret sauce that changes flavor even if you don’t pour it on your food.
🥬 The Concept (The secret sauce): The clever bit is separating “adoption” from “availability,” letting the authors prove that unused options can still bend the regulator’s choice. How it works:
- Hold players’ incentives and data generation constant.
- Toggle one availability variable (new model) and propagate through equilibria.
- Observe regulator’s optimal market flip—without the model being chosen. Why it matters: This isolates the Poisoned Apple effect from ordinary performance improvements. 🍞 Anchor: A new, scary rollercoaster changes park traffic rules; even non-riders get different wait times across the park.
Supporting concepts introduced:
- 🍞 Hook: Picking the best seat so no one wants to move. 🥬 The Concept (Nash Equilibrium): A stable state where no player can do better by changing their choice alone. How it works: (1) Compute best responses; (2) Find intersections where choices are mutual best responses; (3) Allow randomization when needed. Why it matters: Predicts what players will actually do. 🍞 Anchor: A lunch table where switching seats won't make anyone happier.
- 🍞 Hook: Sometimes you flip a coin to keep others guessing. 🥬 The Concept (Mixed strategy): Randomizing across options to prevent the opponent from exploiting you. How it works: (1) Assign probabilities to each model; (2) Equalize payoffs across supported options; (3) Ensure no deviation is profitable. Why it matters: Captures realistic unpredictability and stability. 🍞 Anchor: A goalie diving left or right with certain odds so the kicker can't always score.
- 🍞 Hook: A ref's job is easier with a scoreboard. 🥬 The Concept (Metrics—Fairness vs. Efficiency): Numbers that summarize how good an outcome is for society. How it works: (1) Compute equality (fairness) or total surplus (efficiency); (2) Compare across markets; (3) Pick the best. Why it matters: The chosen goal changes which rules win. 🍞 Anchor: Choosing between "everyone gets equal candy" vs. "we make the biggest candy pile."
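A tiny numeric sketch of the two scoreboards. The formulas here are illustrative stand-ins, not the paper's published definitions:

```python
def fairness(payoffs):
    # Equality of outcomes: 1.0 when everyone gets the same, lower as the gap grows.
    return 1.0 - (max(payoffs) - min(payoffs))

def efficiency(payoffs):
    # Total value created, assuming the whole pie is worth 1.0.
    return sum(payoffs)

splits = {"even split": (0.50, 0.50), "fast but uneven": (0.60, 0.40), "no deal": (0.00, 0.00)}
for name, split in splits.items():
    print(f"{name:16s} fairness={fairness(split):.2f} efficiency={efficiency(split):.2f}")
# A no-deal is "perfectly fair" (both get nothing) but wastes the whole pie,
# which is exactly why the regulator's choice of metric changes which rules look best.
```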
04 Experiments & Results
🍞 Hook: If you want to know whether a new backpack makes you faster at school, you time your walk before and after—same route, one change.
🥬 The Concept (The test design): The authors measure how adding one AI model changes (a) the regulator’s chosen market, (b) the players’ payoffs, and (c) the regulator’s fairness/efficiency score—across bargaining, negotiation, and persuasion. How it works:
- Establish a baseline with N models; pick the market that maximizes fairness or efficiency.
- Add one more model; recompute equilibria and the regulator’s choice.
- Record whether payoffs move in opposite directions and whether the new model is adopted. Why it matters: This isolates the effect of availability from actual use. 🍞 Anchor: Time two walks—one with your usual bag, one with a new bag in the closet that you never wear but that makes your parents rearrange your morning plan.
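Once the before/after equilibria are known, the measurement itself is simple bookkeeping. A small sketch, with field names of our own invention rather than the paper's:

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    market: str            # market the regulator ends up picking
    payoff_alice: float
    payoff_bob: float
    metric: float          # regulator's fairness or efficiency score
    new_model_prob: float  # equilibrium weight placed on the newly released model

def classify(before: Outcome, after: Outcome) -> dict:
    d_alice = after.payoff_alice - before.payoff_alice
    d_bob = after.payoff_bob - before.payoff_bob
    return {
        "market_switched": before.market != after.market,
        "opposite_shift": d_alice * d_bob < 0,          # one side gains while the other loses
        "new_model_adopted": after.new_model_prob > 0,  # False plus harm = Poisoned Apple territory
        "metric_change": round(after.metric - before.metric, 3),
    }

# Numbers quoted in the paper's bargaining example (Models A-D, then E released):
before = Outcome("Market 4", 0.49, 0.50, 1.000, 0.0)
after = Outcome("Market 8", 0.52, 0.46, 0.990, 0.0)
print(classify(before, after))
# -> market switched, opposite payoff shift, new model never adopted, fairness down 0.01
```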
The competition: The “competitors” are different markets and model sets. Baselines are markets chosen with models A–D; expansions add one of the other 9 models from the pool of 13.
Scoreboard with context:
- Opposite payoff changes are common: when the set of available models expands, one side often gains while the other loses—like one kid getting a bigger slice while the other gets a smaller one.
- Zero-adoption reversals: About one-third of those opposite shifts happen even when nobody uses the new model—classic Poisoned Apple.
- Regulator’s goal matters: Expansions more often help when the regulator optimizes efficiency (total pie), but can backfire under fairness (equal split).
- Adoption predicts improvement: When the new model is actually chosen in equilibrium, the regulator’s metric usually improves; when it’s unused, harm to the metric is more likely.
- Regulatory inertia hurts: If the regulator fails to re-optimize after a release, the metric worsens roughly 40% of the time—like keeping last year’s rules for a very different team.
Detailed example (Bargaining):
- Before: Models A–D available. Regulator picks Market 4 for fairness=1.000. Payoffs: Alice=0.49, Bob=0.50.
- After adding Model E: If the regulator kept Market 4, fairness would drop to 0.976 (Alice would switch to E). To avoid that, the regulator moves to Market 8, where fairness=0.990.
- End result: Neither uses Model E, but payoffs shift: Alice=0.52 (up 0.03—like jumping from a B to an A-), Bob=0.46 (down 0.04—like slipping from a B to a C+).
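Plugging in the quoted fairness scores shows the regulator's dilemma as a one-line comparison. Only the numbers stated above are used; the other markets are omitted for brevity:

```python
fairness_before = 1.000       # Market 4 before the release
fairness_if_kept = 0.976      # Market 4 after Model E is released (Alice would switch to E)
fairness_if_switched = 0.990  # Market 8, the best remaining alternative

# A fairness-maximizing regulator takes the smaller loss and switches markets.
decision = "switch to Market 8" if fairness_if_switched > fairness_if_kept else "keep Market 4"
best_after = max(fairness_if_kept, fairness_if_switched)
print(decision, f"(fairness {fairness_before:.3f} -> {best_after:.3f})")
# Payoff consequence of the switch: Alice 0.49 -> 0.52, Bob 0.50 -> 0.46.
```

Note that the 0.976 figure is a counterfactual the regulator never realizes: the rules change precisely to avoid it, which is what gives the unused model its leverage.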
Surprises:
- Unused tools changing outcomes was more frequent than many would expect.
- Fairness-driven designs proved especially sensitive to manipulation-by-availability.
- The safest move for regulators is not to “freeze” the rules but to re-run optimization—and even then, be alert to side-swaps in payoffs.
05 Discussion & Limitations
🍞 Hook: Even the best rules can be gamed if the players change their options.
🥬 The Concept (Limitations): The study uses simulations and linear regressions on LLM behaviors, which may not capture all real-world messiness (like human oversight, multi-agent coalitions, or strategic media campaigns). How it works:
- Models are proxies for future AI agents.
- Payoffs are predicted statistically; not every nuance is simulated.
- Only three game families are studied. Why it matters: Real markets include more players, richer contracts, and evolving incentives; effects could be stronger or weaker there. 🍞 Anchor: A wind tunnel tests toy planes well—but the real sky has storms.
Required resources:
- A simulation stack (like GLEE), equilibrium solvers, and enough compute to re-run analyses as new models arrive.
- Access to both open and proprietary models to measure availability effects realistically.
When not to use this approach:
- If the rules can’t be updated frequently (legal or technical lock-in), relying on a fairness-first static design could be risky.
- If data to estimate payoffs is too sparse, the regulator’s pick may be noisy.
Open questions:
- How to design “robust-to-availability” rules that don’t flip when a new model appears?
- Can we pre-test releases in regulatory sandboxes to forecast Poisoned Apple risks?
- How do coalitions (multiple Alices) coordinate technology drops to steer rules even more?
- Can commitment devices (e.g., penalties for unused releases that degrade metrics) deter manipulation?
- What mixtures of fairness and efficiency goals reduce vulnerability?
Overall assessment: The paper convincingly shows that availability alone can move markets via the regulator’s choice, especially under fairness objectives, and argues for dynamic, simulation-backed market design.
06 Conclusion & Future Work
Three-sentence summary: Adding a single AI model to the menu can force a fairness-focused regulator to switch market rules, changing payoffs even if no one adopts the new model. This Poisoned Apple effect appears broadly across bargaining, negotiation, and persuasion with real LLMs as agents. Therefore, static regulatory frameworks are vulnerable and should be replaced with dynamic, simulation-informed designs.
Main achievement: Proving—quantitatively and with realistic agents—that availability without adoption can strategically manipulate regulatory market design and flip who gains.
Future directions:
- Build adaptive regulators that routinely re-optimize with guardrails (e.g., stability checks, penalties for availability-only harm).
- Create regulatory sandboxes that simulate releases before approval.
- Develop robustness metrics that prioritize designs less sensitive to unused options.
- Expand to multi-agent, multi-market, and longer-horizon ecosystems.
Why remember this: It reveals a new lever of strategic power in AI markets—the ability to steer the referee by adding options you won’t use—and sets the agenda for resilient, dynamic market rules that protect fairness and social welfare.
Practical Applications
- Create regulatory sandboxes to simulate technology releases and measure fairness/efficiency shifts before approval.
- Adopt dynamic market rules that auto-recompute the optimal design whenever a new model becomes available.
- Add stability guardrails: require that small availability changes cannot flip the chosen market unless the public metric improves by a clear margin.
- Penalize availability-only harm: if a new model is unused yet worsens the regulator's metric, trigger fees, throttling, or temporary delisting.
- Publish release notes with "economic impact footprints" (projected shifts in fairness and efficiency) for each new model.
- Use mixed-model audits: test how coalitions of models affect equilibria, not just single-model performance.
- Prioritize robustness in rule choice: select markets that perform well across plausible future model additions, not just today's set.
- Schedule periodic re-optimization (e.g., weekly) and emergency re-optimization on major releases to reduce regulatory inertia.
- Empower platforms with kill-switches or caps to pause harmful availability shocks while recomputations run.
- Educate enterprise users to red-team model releases for regulatory side-effects, not only safety and bias.