Web World Models | How I Study AI

Web World Models

Intermediate
Jichen Feng, Yifan Zhang, Chenggong Zhang et al. · 12/29/2025
arXiv · PDF

Key Summary

  • This paper introduces Web World Models (WWMs), a way to build huge, explorable worlds by putting strict rules in code and letting AI write the fun details.
  • Instead of storing everything in a big database or having the AI make up the whole world, WWMs split the job: code is the 'physics,' AI is the 'imagination.'
  • Typed interfaces (clear JSON schemas) act like a checklist the AI must follow, which keeps things consistent and easy to debug.
  • Deterministic hashing turns any location or ID into a stable seed, so when you revisit a place, it stays the same without saving it in a database.
  • A 'fidelity slider' lets the system gracefully degrade: live AI when fast, cached text when slow, and templates if AI is offline.
  • They built several demos: an Infinite Travel Atlas of Earth, a Galaxy Travel Atlas, a customizable card game (AI Spire), a creative physics sandbox (AI Alchemy), a space explorer (Cosmic Voyager), a live web encyclopedia (WWMPedia), and a story generator (Bookshelf).
  • Across all demos, the design principles are the same: separate physics from imagination, require typed outputs, use deterministic generation for infinite worlds, and add fallbacks.
  • Compared to traditional web apps, WWMs aren’t stuck with a fixed database; compared to fully generative worlds, WWMs are controllable and testable.
  • This approach makes large, persistent worlds practical for language agents while keeping developers in control.
  • The idea could power future education tools, games, simulations, and knowledge explorers that are both reliable and endlessly new.

Why This Research Matters

Many apps need both reliability and freshness: rules that never break and content that never runs out. WWMs deliver this by keeping the logic in code and letting AI supply rich, typed descriptions on demand. That means travel planners that know real maps but give new ideas, science sandboxes that safely expand themselves, and encyclopedias that build clean, cited pages instantly. Developers can use familiar web tools to build large, persistent worlds without massive databases. Users get consistent experiences even when AI is slow, thanks to caching and templates. As language agents become more common, WWMs provide the practical ā€˜home’ they need to act, learn, and remember.

Detailed Explanation


01Background & Problem Definition

You know how a playground has rules (like ā€œdon’t run into peopleā€) but kids still make up new games every day? Computers making worlds have a similar challenge: we want firm rules so things don’t break, but we also want endless creativity so it never gets boring.

Before this paper, there were two main ways to build worlds for language agents. One way was like a super tidy playground with everything decided ahead of time: traditional web apps with databases and fixed pages. These are safe and well-behaved, but they can only do what the developers planned in advance. The other way was like a wild, make-believe forest where anything can happen: fully generative world models where an AI makes up the whole world in its own imagination. These are exciting and unbounded, but they can be hard to control, test, or debug—sometimes the rules shift under your feet.

The problem researchers faced was: how do we give language agents a place to act, remember, and learn for a long time—like a real world—without either boxing them into a tiny playground or letting them wander into chaos? People tried a few things that didn’t fully work. They tried stuffing more and more content into databases, but that only scales as fast as humans can type. They tried letting the AI generate everything, but that often led to contradictions or expensive runs that didn’t scale well. They tried using fuzzy hidden vectors (embeddings) as the world’s memory, but those are hard to inspect or fix when something goes wrong. They tried caching, but without rules, caches could save mistakes.

The missing piece was a middle path: a way to keep strict, testable rules (so the world stays consistent), while also allowing AI to flex its creativity (so the world can expand on demand). That’s where Web World Models (WWMs) come in. In WWMs, code is the physics: it defines what exists, what actions are legal, and how the logical state changes.
AI is the imagination: it writes descriptions, dialogue, missions, and guides, but only inside shapes that the code allows. This is like giving the AI a coloring book with clear outlines; it can choose amazing colors, but it can’t redraw the page.

Why should you care? Because lots of real-life experiences need both reliability and freshness. Imagine a travel guide that always follows real maps but gives you a new, themed itinerary every time you click a spot. Or a science sandbox where you can invent new materials, and the system creates sensible reactions without breaking physics. Or a live encyclopedia that assembles a clean, well-cited page on any topic you ask, right now. With WWMs, developers can use the ordinary web stack they already know (TypeScript, HTTP, JSON) to build worlds that are controllable, open-ended, and ready for language agents to live in for the long haul. This matters for education, games, research, and any app where you want stable rules plus unlimited exploration.
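The physics/imagination split can be sketched with ordinary TypeScript types. This is a minimal, hypothetical example — the names (`PlanetState`, `PlanetBrief`, `applyTravel`) are illustrative, not taken from the paper's codebase:

```typescript
// Physics: plain typed state that only code may mutate.
interface PlanetState {
  id: string;
  fuel: number;      // the code enforces fuel >= 0
  visited: boolean;
}

// Imagination: the shape the LLM must fill in -- descriptions only,
// with no fields that could alter the rules of the world.
interface PlanetBrief {
  planetName: string;
  biome: string;
  hazards: string[];
  missionHook: string;
}

// Physics layer: a deterministic state transition, no AI involved.
function applyTravel(state: PlanetState, fuelCost: number): PlanetState {
  if (state.fuel < fuelCost) {
    return state; // illegal action: the rules simply refuse it
  }
  return { ...state, fuel: state.fuel - fuelCost, visited: true };
}
```

The key design point is that `PlanetBrief` contains only text: even a badly behaved model output can change what a planet *says*, never what the player *can do*.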

02Core Idea

The aha! moment in one sentence: Keep the rules of the world in code (physics) and let the AI add rich descriptions and decisions (imagination), tied together by strict, typed interfaces so the world can be both endless and dependable.

Three analogies:

  1. Theme park: Engineers build sturdy rides (physics), while storytellers design the themes and shows (imagination). Visitors get both safety and magic.
  2. LEGO city: The instruction booklet defines what bricks snap where (physics); you decorate neighborhoods with your own creative scenes (imagination).
  3. Cooking show: The recipe fixes ingredients and steps (physics); the chef’s flair adds plating, stories, and variations (imagination).

Before vs. after:
  • Before: Either you had fixed, database-backed websites (safe but limited), or fully AI-made worlds (endless but wobbly and hard to control).
  • After: WWMs let you program the solid skeleton in ordinary web code, then ask an LLM to fill in color and story—always returned as valid JSON matching code-defined shapes—so you get both control and creativity.

Why it works (intuition, not equations):
  • The code layer is like the guardrails. It enforces what’s allowed, computes the next state, and never contradicts itself.
  • The AI layer only paints inside the lines. Because its outputs must match a typed schema, it can’t break the rules of the world.
  • Deterministic seeds (from hashing) mean the same place or ID always yields the same content. That gives ā€˜object permanence’ without saving everything.
  • A fidelity slider lets you degrade gracefully: live AI when possible, cached text when slow, templates when offline. Logic never stops.

Building blocks (in learning order), each with the Sandwich pattern:

šŸž Hook: You know how a super-smart friend can write stories and explain things in many ways?
🄬 Language Models: A language model is a program that reads and writes text very well.
  • How it works: 1) It reads your prompt; 2) It uses patterns learned from lots of text; 3) It predicts the next words; 4) It keeps going to form answers or stories.
  • Why it matters: Without LLMs, worlds would feel dry and repetitive; with them, places come alive with context, dialogue, and missions.
šŸž Anchor: In the Infinite Travel Atlas, the LLM writes a themed itinerary for any spot you click on the globe.

šŸž Hook: Imagine a form where every box has a label so you can’t forget anything important.
🄬 Typed Interfaces: A typed interface is a strict template (like a JSON schema) that says exactly what fields must be filled and what types they are.
  • How it works: 1) Code defines types (like Planet { biome: string; hazard: string; }); 2) AI must return valid JSON that fits; 3) Code checks it; 4) Only valid data is accepted.
  • Why it matters: Without types, AI might return messy or missing fields and break the app; types keep everything debuggable and consistent.
šŸž Anchor: In Galaxy Travel Atlas, the AI’s planet brief must match a schema (terrain/sky/hazards), or it’s rejected and retried.

šŸž Hook: If you use the same recipe and the same ingredients, you get the same cake every time.
🄬 Deterministic Generation: Deterministic generation means the same input always produces the same output.
  • How it works: 1) Take a stable key (like coordinates); 2) Hash it to make a seed; 3) Fix the AI’s randomness with that seed; 4) Generate—revisits match.
  • Why it matters: Without determinism, a place could change each visit, confusing users and agents.
šŸž Anchor: Click the same beacon in the Travel Atlas today or next week—you’ll get the same themed guide.

šŸž Hook: When Wi‑Fi is slow, your video app reduces quality but keeps playing.
🄬 Graceful Degradation: Graceful degradation means the system keeps working with simpler outputs if resources are tight.
  • How it works: 1) Try live AI; 2) If slow, use cache; 3) If offline, use templates; 4) Physics (logic) always runs.
  • Why it matters: Without it, your app might freeze; with it, users always get something usable.
šŸž Anchor: If the AI is down, Cosmic Voyager shows bundled descriptions so your tour continues.

šŸž Hook: Chefs can make many meals from a few rules—mix, bake, season.
🄬 Procedural Generation: Procedural generation creates lots of content from algorithms instead of hand-writing everything.
  • How it works: 1) Use rules and seeds; 2) Generate structures (maps, stars, items); 3) Let AI add flavor text; 4) Repeat infinitely.
  • Why it matters: Without it, you must store or author every piece; with it, the world can grow on demand.
šŸž Anchor: Galaxy Travel Atlas builds galaxies and planet layouts in code, then AI writes the mission logs.

šŸž Hook: In sports, the referee enforces rules, while commentators tell the story.
🄬 Separation of Concerns (Physics vs. Imagination): Keep strict logic (physics) separate from creative text (imagination).
  • How it works: 1) Code computes next state; 2) AI describes it; 3) Types connect them; 4) Never let AI change the rules.
  • Why it matters: Without separation, AI might break the game or contradict itself.
šŸž Anchor: In AI Spire, the AI designs a card’s text within a schema, but only the code decides exactly how it affects HP, damage, or energy.

šŸž Hook: Think of a coloring book: the outlines are fixed, but colors are endless.
🄬 Web World Model (WWM): A WWM is a world where code defines the rules and state, while an LLM adds descriptions and high-level choices—both joined by typed interfaces.
  • How it works: 1) Users act; 2) Code updates the world; 3) AI fills in narratives as valid JSON; 4) Render; 5) Repeat.
  • Why it matters: Without WWMs, we pick between boring-but-safe or wild-but-wobbly; WWMs give both safety and wonder.
šŸž Anchor: WWMPedia uses code to search and render sections, while the AI writes the article body with citations inside a fixed page layout.
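Deterministic generation is easy to sketch in a few lines of TypeScript: hash a stable key into a seed, then drive a seeded PRNG with it. This is a hypothetical illustration — the paper does not say which hash or PRNG the demos use; here we assume FNV-1a and mulberry32, two common lightweight choices:

```typescript
// FNV-1a: a simple, widely used non-cryptographic string hash.
function hashSeed(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a tiny seeded PRNG, so "randomness" replays exactly from a seed.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const themes = ["old-world arts", "desert-bloom", "stormglass", "harbor lights"];

// Same coordinates -> same seed -> same theme, today or next week,
// with nothing written to a database.
function themeFor(lat: number, lon: number): string {
  const rng = seededRandom(hashSeed(`${lat.toFixed(4)},${lon.toFixed(4)}`));
  return themes[Math.floor(rng() * themes.length)];
}
```

Rounding the coordinates before hashing is the part that gives nearby clicks a stable identity; the seed then fixes everything downstream, including what the LLM is asked to elaborate.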

03Methodology

At a high level: User action → Physics (code) updates state → Deterministic seed picked → AI (LLM) fills in content as typed JSON → Cache or fall back if needed → Render and loop.

Step-by-step with what, why, and an example:

  1. Receive an input (user action)
  • What happens: The user clicks a spot on a globe, selects a planet, wins a card battle, mixes two elements, or asks a question.
  • Why it exists: The world should respond to the user—this event is the spark for a new state.
  • Example: In the Infinite Travel Atlas, you click near 1.2921° S, 36.8219° E (Nairobi area), which becomes the key for the next steps.
  2. Physics layer computes the next logical state (code only)
  • What happens: Deterministic code checks rules, updates inventories, sets flags, or expands structure (like generating star clusters or shop stock) without AI.
  • Why it exists: This guarantees consistency—doors won’t open without keys; energy won’t go below zero; coordinates remain stable.
  • Example: In AI Spire, after winning a fight, code transitions to the reward phase and updates relic counters—no AI needed here.
  3. Pick or compute a deterministic seed
  • What happens: The system hashes a stable identifier (like coordinates or a planet ID) to create a seed that fixes randomness.
  • Why it exists: It ensures object permanence without storing every location in a database—revisits recreate the same content.
  • Example: For a beacon at lat/lon, hash → 834219. Visiting again later yields the same seed and the same place ā€˜personality.’
  4. Call the AI (Imagination layer) with a typed contract
  • What happens: The LLM receives structured context (the typed state, the seed, and constraints) and must return valid JSON conforming to a schema.
  • Why it exists: It keeps creative outputs compatible with the code engine—no missing fields, no broken types.
  • Example: In Galaxy Travel Atlas, the AI returns: { "planetName": "Velis Minor", "biome": "stormglass", "hazards": ["shard squalls"], "missionHook": "Recover a probe lost in the crystals." } The code validates it before rendering.
  5. Validate, cache, and possibly retry
  • What happens: A validator checks the JSON against the schema. If it fails, the system retries with a stricter prompt or falls back to templates. Valid results are cached by seed.
  • Why it exists: Validation catches structure errors; caching saves cost and latency; fallbacks keep the app responsive.
  • Example: If the WWMPedia article body is missing a ā€˜References’ array, the system retries once; if still failing, it uses a minimal, pre-authored section layout.
  6. Render the world state
  • What happens: The client shows UI panels, cards, subtitles, or map overlays that combine the physics state with AI text.
  • Why it exists: Clear rendering turns structured data into a usable experience.
  • Example: Cosmic Voyager shows an orbit card with stats, plus an AI ā€˜Cosmic Guide’ subtitle that updates every 30 seconds.
  7. Loop for the next action
  • What happens: The user makes another move, and the cycle repeats with persistent logic and stable seeds.
  • Why it exists: This is how the world stays alive and coherent over long sessions.
  • Example: In AI Alchemy, after a new reaction (Life + Fire → Ash) is generated and cached, future collisions use the same rule instantly.

Concrete data walkthroughs:
  • Travel Atlas: Input: lat/lon = (48.2082 N, 16.3738 E) → Physics resolves geographic metadata (country/city tags) → Seed = hash(lat, lon) → AI returns Guide JSON (theme: ā€˜old-world arts’, itinerary: day-by-day cards) → Cache by seed → Render a scrollable cockpit with thematic colors.
  • AI Spire: Input: ā€˜Wish’ prompt: ā€œa card that heals me and restores energyā€ → Physics confirms reward phase → Seed for this reward → AI returns CardSpec { name: ā€œSecond Windā€, type: ā€œSkillā€, cost: 1, effects: [ { heal: 6 }, { gainEnergy: 1 } ] } → Validator enforces numeric ranges and allowed types → Code executes effects in the next combat.
  • WWMPedia: Input: Query = ā€œSuperconductorā€ → Physics does web search and sanitizes snippets → AI composes sections as JSON: { title, toc, sections[], references[] } → Code renders a Wikipedia-like page with citations → User clicks ā€˜explain more’ and the loop continues with expanded sections.

The secret sauce:
  • Separation of Concerns: Never let the AI change rules; it only paints descriptions within the shapes code defines.
  • Typed Interfaces: Every AI output is a well-formed, debuggable object; if not, it doesn’t enter the world.
  • Deterministic Hashing: Infinite worlds without infinite storage—stable seeds recreate places exactly.
  • Fidelity Slider: The experience never stalls: live AI → cached → templates, while physics always runs.
  • Web-native Stack: TypeScript for shared types, HTTP streaming for responsive text, and serverless patterns for scaling procedural worlds.

Putting it all together across demos:
  • Infinite Travel Atlas: Physics = real geography + beacon logic; Imagination = themed guides and visuals.
  • Galaxy Travel Atlas: Physics = procedural galaxies with stable IDs; Imagination = structured mission briefs.
  • AI Spire: Physics = deterministic combat and effect execution; Imagination = schema-bound card and relic designs.
  • AI Alchemy: Physics = cellular automata; Imagination = new reactions synthesized on demand, then cached.
  • Cosmic Voyager: Physics = WebGL scene rules and camera modes; Imagination = view-aware narration.
  • WWMPedia: Physics = retrieval and renderer; Imagination = structured article prose with citations.
  • Bookshelf: Physics = pagination and style constraints; Imagination = story continuations within genre/tone tags.
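The validate → retry → fall back step of the loop can be sketched as a small TypeScript guard. This is a hedged sketch, not the paper's implementation: `Guide`, `callModel`, and the hand-rolled validator are hypothetical stand-ins (a real system would likely use a JSON Schema library and a genuine LLM client):

```typescript
interface Guide {
  theme: string;
  itinerary: string[];
}

// Minimal hand-rolled schema check: only well-shaped JSON enters the world.
function isGuide(x: unknown): x is Guide {
  const g = x as Guide;
  return (
    typeof g === "object" && g !== null &&
    typeof g.theme === "string" &&
    Array.isArray(g.itinerary) &&
    g.itinerary.every((d) => typeof d === "string")
  );
}

const TEMPLATE_GUIDE: Guide = {
  theme: "classic tour",
  itinerary: ["Day 1: explore the center"],
};

// Try live AI, retry once with a stricter prompt, then fall back to a
// pre-authored template. Physics never depends on this call succeeding.
function guideWithFallback(callModel: (strict: boolean) => unknown): Guide {
  for (const strict of [false, true]) {
    const raw = callModel(strict);
    if (isGuide(raw)) return raw; // valid output is accepted (and cacheable)
  }
  return TEMPLATE_GUIDE; // graceful degradation: the tour always continues
}
```

The `strict` flag models the "retry with a stricter prompt" behavior described in step 5; in practice the second call would carry the validation error back into the prompt.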

04Experiments & Results

The tests: Instead of classic ML benchmarks, the authors evaluate whether WWMs deliver what they promise across real, running systems.

  • Persistence: Do places and objects stay the same when revisited (object permanence) without saving everything in a database?
  • Unlimited scope: Can users explore arbitrarily many locations without pre-authoring content?
  • Controllability: Do typed interfaces and code-enforced physics prevent contradictions and broken states?
  • Latency resilience: Does the fidelity slider keep the experience usable when AI is slow or offline?
  • Generality: Does the same architecture work for maps, galaxies, games, sandboxes, encyclopedias, and stories?

The competition (what they compare against conceptually):
  • Traditional web frameworks: Reliable but boxed-in by database schemas; they struggle with infinite, on-demand content.
  • Fully generative world models: Open-ended but hard to control, test, and scale; worlds can drift or break logic.

Scoreboard with context:
  • Persistence: WWMs score an ā€˜A’—deterministic hashing recreates the same place every time, like getting an A+ for consistency when others get a B- due to drifting generations.
  • Unlimited scope: WWMs again score high—procedural structures plus on-demand AI text allow exploration far beyond fixed databases, like owning a library that writes new books responsibly, instead of being stuck with a single shelf.
  • Controllability: Typed JSON schemas and a strict code layer keep outputs valid and avoid logic violations, which feels like moving from a scribble pad (D grade for messiness) to a lined notebook with margins (A for neatness).
  • Latency resilience: With caching and templates, the world keeps running even if the AI stalls—like a streaming app that never fully freezes—an A- compared to generative-only systems that can drop the ball.
  • Generality: The same pattern works across seven diverse demos, indicating the abstraction travels well—like a toolbelt that fits many jobs.

Surprising findings:
  • Typed interfaces aren’t just guardrails; they make debugging and iteration much faster because errors are caught at the boundary.
  • Deterministic seeds feel almost like ā€˜free storage’: places persist without writing to a database, which is counterintuitive but very effective.
  • Treating the LLM as a microservice behind a contract reduces risk: when it’s down, templates keep the world coherent.
  • Long-horizon narrative (Bookshelf) is mostly a state-management challenge: keeping carried state small and typed helps the story stay consistent.

Examples in action:
  • Infinite Travel Atlas: Clicking beacons worldwide returns consistent, themed guides tied to geography. Nairobi evokes a ā€˜desert-bloom’ vibe; Honolulu and Rio each get distinct but stable themes.
  • Galaxy Travel Atlas: Planets have fixed IDs and attributes from procedural code; AI fills mission logs within a schema. Revisits match exactly.
  • AI Spire: Reward cards are generated on the fly yet always legal to play because effect codes are from a controlled vocabulary.
  • AI Alchemy: New element reactions are invented by AI once, cached, then behave deterministically in the simulator thereafter.
  • Cosmic Voyager: View-aware narration enriches a WebGL solar system; if AI is unavailable, bundled blurbs keep the tour going.
  • WWMPedia: On-demand, sectioned articles with citations make the open web feel like a navigable knowledge world.
  • Bookshelf: Style tags and pagination rules keep long-form stories coherent while letting the prose keep flowing.
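The AI Alchemy behavior — invent a reaction once, then replay it deterministically — is essentially caching under a canonical key. A hypothetical sketch (the function names and the `imagine` stub standing in for an LLM call are illustrative, not from the paper):

```typescript
const reactionCache = new Map<string, string>();
let modelCalls = 0;

// Canonical key: order-independent, so Fire+Life and Life+Fire are one rule.
function keyFor(a: string, b: string): string {
  return [a, b].sort().join("+");
}

function react(a: string, b: string, imagine: (key: string) => string): string {
  const key = keyFor(a, b);
  const cached = reactionCache.get(key);
  if (cached !== undefined) return cached; // physics replays the stored rule
  modelCalls++;                            // only a cache miss pays for AI
  const result = imagine(key);
  reactionCache.set(key, result);
  return result;
}
```

After the first collision, the simulator never asks the model again for that pair, which is why cached reactions behave deterministically thereafter.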

05Discussion & Limitations

Limitations:

  • AI quality matters: If the model writes weak or off-tone content, deterministic seeds can ā€˜lock in’ that content until prompts or schemas are improved.
  • Schema design overhead: Getting the right typed interfaces takes careful thought; too loose invites errors, too tight restricts creativity.
  • Not for physics-accurate simulation: WWMs favor clarity and control over perfect real-world physics (e.g., Cosmic Voyager uses scripted motion).
  • Security and safety: Even with schemas, generated text can still require moderation; careful prompt design and post-filters help but don’t solve everything.
  • Cost and latency: Live AI calls cost money and time; caching and fallbacks reduce but don’t eliminate this.

Required resources:
  • A web stack (TypeScript/React), tooling for JSON schema validation, and basic serverless or caching infrastructure.
  • An LLM API or local model, plus prompt engineering and logging/monitoring to catch failures.
  • For 3D demos, WebGL or similar graphics skills.

When not to use WWMs:
  • Ultra low-latency, high-FPS games where any AI delay is unacceptable.
  • Offline-only deployments without a plan for fallbacks or local models.
  • Settings demanding perfect factuality with no risk of creative embellishment (e.g., compliance-critical documents without strict human review).

Open questions:
  • Learning better rules over time: Can the physics layer safely evolve as the system learns, with migrations that preserve old seeds?
  • Multi-user consistency: How to coordinate shared state when many agents act at once across large worlds?
  • Safety evaluation at scale: What automated checks catch subtle content risks beyond schema validation?
  • Agent skill growth: How should agents store and share reusable skills (like Voyager in Minecraft) inside WWMs?
  • Tool-augmented imagination: What’s the best mix of retrieval, templates, and generation for both quality and speed?

06Conclusion & Future Work

In three sentences: Web World Models split a world into code-defined physics (rules and state) and AI-powered imagination (descriptions and high-level choices), connected by strict typed interfaces. Deterministic hashing yields object permanence without heavy databases, while a fidelity slider ensures the app stays usable even when AI is slow. This middle path combines the safety of traditional web apps with the open-endedness of generative worlds.

Main achievement: Showing that the ordinary web stack itself—TypeScript, JSON, HTTP—can be a scalable substrate for controllable, infinite worlds where language agents can act, remember, and learn.

Future directions: Richer multi-user coordination, evolving schemas with versioning, tighter safety tooling, local or hybrid models for lower latency, and deeper integrations where agents acquire and reuse skills across many WWMs.

Why remember this: WWMs turn ā€˜code as physics, AI as imagination’ from a slogan into a practical recipe you can ship today, letting you build worlds that are both reliable and endlessly new.

Practical Applications

  • Interactive geography platforms that generate themed, multi-day guides for any coordinate.
  • Educational space explorers with view-aware narration that works even when AI is offline.
  • Customizable card or roguelike games where AI designs new items but code enforces balanced rules.
  • Creative physics sandboxes that invent plausible reactions and materials on demand.
  • On-the-fly encyclopedias that render clean, cited articles from live web evidence.
  • Story platforms that keep style, pacing, and pagination stable while streaming fresh prose.
  • Corporate training simulators where procedures (physics) are strict but scenarios (imagination) vary widely.
  • Museum or classroom exhibits that adapt explanations to age and interest, within safe schemas.
  • Urban planning or logistics mockups where deterministic maps meet AI-generated what-if narratives.
  • Multi-agent social simulations with stable rules and AI-driven dialogue within typed constraints.
#Web World Model#typed interfaces#deterministic hashing#object permanence#procedural generation#graceful degradation#LLM as imagination#code as physics#JSON schema validation#language agents#world modeling#neuro-symbolic systems#serverless web stack#interactive simulation#persistent environments