Visionary: The World-Model Carrier Built on a WebGPU-Powered Gaussian Splatting Platform
Key Summary
- Visionary is a web-based platform that lets you view and interact with advanced 3D scenes, right in your browser, with just a click.
- It uses WebGPU for super-fast graphics and ONNX to run AI models every frame, so scenes can be dynamic and alive, not just static.
- A key idea is the Gaussian Generator contract, a simple, standard rule for how AI models output 3D Gaussian data for rendering.
- Compared to popular WebGL viewers like SparkJS and SuperSplat, Visionary moves heavy work to the GPU and performs one true global sort per frame, fixing quality glitches during fast camera moves.
- In tests, Visionary rendered a 6M-Gaussian scene in about 2.09 ms per frame versus 176.90 ms for SparkJS, and was up to roughly 135× faster under identical assets.
- It supports many 3D Gaussian variants, including MLP-based 3DGS, 4DGS for dynamic scenes, and animatable neural avatars, all inside the browser.
- The platform also plugs in generative post-processing (like stylization and enhancement) via ONNX, creating a complete compute + render + enhance pipeline on the client.
- A three.js plugin and a simple TypeScript API make it easy to add Visionary to existing web apps.
- This work lowers the barrier to share, compare, and deploy world-model components by unifying rendering and per-frame inference in one portable, web-native runtime.
- Limitations include evolving WebGPU/ONNX browser support and browser memory policies, which can restrict the largest models.
Why This Research Matters
Visionary turns the browser into a powerful lab for dynamic 3D and world-model research, so anyone can try advanced demos with a link instead of a heavy install. It makes classrooms, creators, and companies more agile by enabling real-time, in-browser visualization, editing, and comparison of many 3DGS variants. For world models, it provides an explicit 3D memory that stays consistent as the camera moves, unlike purely 2D video approaches. Developers can integrate it easily via a three.js plugin and a clean TypeScript API, accelerating product prototypes. Because inference and rendering both happen locally, users keep privacy and enjoy low-latency interaction. As WebGPU and ONNX keep improving, this approach will reach more devices and unlock even richer interactive 3D experiences.
Detailed Explanation
01 Background & Problem Definition
Hook: You know how sharing a photo is easy? You just send a link and anyone can see it. But sharing a fancy 3D demo often needs big installs, drivers, and the right computer.
The Concept (WebGPU): WebGPU is a new browser superpower that lets websites use your graphics card directly for fast 3D and computation. How it works:
- The browser talks to your GPU safely, like a secure high-speed lane.
- It runs compute shaders (math programs) and graphics shaders (drawing programs).
- It keeps data on the GPU so things don't bounce back and forth to the CPU. Why it matters: Without WebGPU, viewers often fall back to older WebGL paths or CPU work, making big, dynamic 3D scenes slow or glitchy. Anchor: When you move the camera in a big 3D scene and it stays smooth, that's WebGPU doing lots of math right on the graphics card.
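To make the WebGPU idea concrete, here is a minimal TypeScript sketch (not Visionary's code) that requests a device, keeps data in a GPU buffer, and runs a tiny WGSL compute shader over it. It assumes a WebGPU-capable browser and the @webgpu/types declarations for TypeScript.

```typescript
// Minimal WebGPU compute sketch: request a device, upload a buffer, and run a
// small WGSL compute shader that scales the values in place on the GPU.
async function runComputeDemo(): Promise<void> {
  const adapter = await navigator.gpu?.requestAdapter();
  if (!adapter) throw new Error("WebGPU is not available in this browser");
  const device = await adapter.requestDevice();

  // Data stays in a GPU buffer; no CPU round trip between passes.
  const input = new Float32Array([1, 2, 3, 4]);
  const buffer = device.createBuffer({
    size: input.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
    mappedAtCreation: true,
  });
  new Float32Array(buffer.getMappedRange()).set(input);
  buffer.unmap();

  const shaderModule = device.createShaderModule({
    code: /* wgsl */ `
      @group(0) @binding(0) var<storage, read_write> data: array<f32>;
      @compute @workgroup_size(64)
      fn main(@builtin(global_invocation_id) id: vec3<u32>) {
        if (id.x < arrayLength(&data)) { data[id.x] = data[id.x] * 2.0; }
      }`,
  });
  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: { module: shaderModule, entryPoint: "main" },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer } }],
  });

  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(input.length / 64));
  pass.end();
  device.queue.submit([encoder.finish()]);
}
```

The same request-device, create-pipeline, dispatch pattern scales from this toy kernel to per-frame preprocessing over millions of Gaussians.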
Hook: Imagine painting a 3D world with tiny, soft, colorful dots that blend perfectly, like airbrush mist building a picture.
The Concept (3D Gaussian Splatting, 3DGS): 3DGS draws scenes using millions of fuzzy 3D dots (Gaussians) that blend into realistic images when projected onto your screen. How it works:
- Each dot has a 3D position, size, orientation, color, and transparency.
- The camera projects each dot to an ellipse on the screen.
- Dots are sorted from far to near and blended to make the final color per pixel. Why it matters: Without 3DGS, rendering neural scenes can be slow and heavy; Gaussians render fast and look great, making real-time viewing possible. Anchor: In a 3D bicycle demo made of millions of soft dots, you can fly the camera around and it still looks like a real bike, because the dots blend smoothly.
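As a CPU-side reference for the project-sort-blend idea (illustrative only; the field names are hypothetical and the real work happens on the GPU), the sketch below composites the splats that cover one pixel in back-to-front order.

```typescript
// CPU reference of per-splat attributes and the blending order; the real
// pipeline does projection, sorting, and blending entirely on the GPU.
interface Splat {
  ndc: [number, number];            // projected ellipse center in NDC
  depth: number;                    // NDC z, used as the sort key
  color: [number, number, number];
  alpha: number;                    // opacity after Gaussian falloff at this pixel
}

// Back-to-front "over" compositing of the splats covering one pixel.
function compositePixel(splats: Splat[]): [number, number, number] {
  const ordered = [...splats].sort((a, b) => b.depth - a.depth); // far to near
  let out: [number, number, number] = [0, 0, 0];
  for (const s of ordered) {
    out = [
      s.color[0] * s.alpha + out[0] * (1 - s.alpha),
      s.color[1] * s.alpha + out[1] * (1 - s.alpha),
      s.color[2] * s.alpha + out[2] * (1 - s.alpha),
    ];
  }
  return out;
}
```

Swapping two overlapping splats in this loop changes the result, which is exactly why a correct depth order matters.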
Hook: Picture a travel adapter that lets your charger work in any country. You don't need a new charger, just a standard plug.
The Concept (ONNX): ONNX is a standard file format that lets AI models trained in different tools run in many places. How it works:
- You export your trained model (e.g., from PyTorch) into ONNX.
- An ONNX runtime in the browser loads and runs the graph.
- The same model can run on different hardware backends. Why it matters: Without ONNX, every new AI algorithm needs custom code paths, making sharing and running models in a browser painful. Anchor: A human-avatar model exported to ONNX can be loaded by Visionary in Chrome, Safari, or Edge without rewriting the model.
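A minimal sketch of that flow with ONNX Runtime Web appears below; the model URL and the input name `pose` are placeholders rather than a real Visionary asset, and the WebGPU execution provider falls back to WASM where it is unavailable.

```typescript
import * as ort from "onnxruntime-web";

// Load an exported model and run it in the browser (sketch; the model URL and
// input/output names are placeholders, not Visionary's schema).
async function loadAndRun(): Promise<void> {
  const session = await ort.InferenceSession.create("/models/avatar.onnx", {
    executionProviders: ["webgpu", "wasm"], // fall back to WASM if needed
  });

  const pose = new ort.Tensor("float32", new Float32Array(63), [1, 63]);
  const results = await session.run({ pose });
  console.log(Object.keys(results)); // names of the model's output tensors
}
```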
The world before: Researchers built amazing neural renderers like NeRF and then faster ones like 3D Gaussian Splatting (3DGS). They ran great on desktops with CUDA and custom C++/GPU code. But sharing those demos was hard: you needed the right GPU drivers, libraries, engine versions, and often a whole Python stack. Web viewers existed, but most used older WebGL pipelines, pushing some critical steps, like sorting those millions of little Gaussians, onto the CPU. That meant slow frame rates, lags, and trouble supporting truly dynamic scenes.
The problem: How do we make 3DGS and its growing family (MLP-based 3DGS, 4DGS for moving scenes, animatable avatars) easy to share and run, without installs, and with the ability to change every frame? Most demos either precompute everything ahead of time or do the smart AI parts on a server. That kills interactivity, adds cost, and hurts reproducibility.
Failed attempts: Desktop viewers and engine plugins (Unity/Unreal/Blender) are powerful but heavy and finicky to install. WebGL viewers display static splat scenes, but struggle with per-frame changes and often rely on CPU-side sorting, which becomes a bottleneck in big scenes or fast camera moves. Some tools sidestep this by doing inference server-side, but then you lose the magic of click-to-run and user privacy.
The gap: We needed a browser-native system that can both compute and render inside the same frame, with no servers and no installs, and that welcomes lots of algorithm styles without rewriting the renderer. That means: (1) GPU-first rendering with compute shaders (WebGPU), (2) a standardized way for any model to output Gaussians per frame (ONNX contract), and (3) a fast GPU global sort to keep images stable even during quick camera motion.
Real stakes: This isn't just for flashy demos. A web-native, dynamic, high-quality 3D platform helps classrooms explore digital twins in science, lets game studios preview worlds instantly, boosts e-commerce with interactive 3D try-ons, and supports researchers who want to compare new methods fairly: just send a link. It's also a building block for world models that need a reliable 3D memory of the world, not just 2D video frames.
Visionary steps into this space by uniting WebGPU rendering and per-frame ONNX inference in the browser, making dynamic 3D scenes click-to-run while remaining fast, extensible, and consistent across platforms.
02 Core Idea
Hook: Imagine a theater where the actors (AI models) can change the set on stage every second, while the spotlight crew (renderer) keeps the show smooth and bright: no delays, no backstage chaos.
The Concept (Visionary's key insight): Run AI model inference and 3D Gaussian rendering together, in the browser, every frame, using one simple contract that any model can follow. How it works:
- A standard ONNX "Gaussian Generator" contract says exactly how models output positions, sizes, colors, and opacities of Gaussians each frame.
- The browser runs the ONNX model on WebGPU (or supported backend) to compute updates on the fly.
- The WebGPU renderer preprocesses, globally sorts on GPU, and draws the updated Gaussians with meshes.
- Optional ONNX post-processing (like stylization) beautifies the final image. Why it matters: Without this, every algorithm needs special-case glue code or a server; with it, any 3DGS-family method can plug in and just work: fast, dynamic, and portable. Anchor: A human avatar changes pose parameters, the ONNX model outputs deformed Gaussians, Visionary sorts and renders them that same frame, and you see the avatar move smoothly in your browser.
Three analogies:
- Universal power strip: The Gaussian Generator contract is the strip; every AI model is a plug. No rewiring, just plug in and power on.
- Kitchen assembly line: Prep (ONNX pre-decoding) chops ingredients (Gaussians), the stove (WebGPU) cooks them fast, and plating (post-processing) adds garnish.
- Orchestra: Models play notes (Gaussians) in real time; the conductor (global GPU sort) keeps everyone in order so the music (image) is clean.
Before vs after:
- Before: Web viewers were static or offloaded smart parts to servers; sorting often lived on the CPU, causing lag and artifacts when moving fast.
- After: Visionary pulls compute to the client GPU, runs per-frame ONNX inference, and performs a true global sort each frame. Result: real-time, dynamic, clean compositingāeven in big scenesāwithout installs.
Why it works (intuition):
- Keep data where the speed is (GPU). WebGPU compute avoids slow CPU-GPU ping-pong and lets us do per-frame heavy lifting, like projection and culling, in parallel.
- One contract to rule them all (ONNX I/O). If every model outputs the same kind of Gaussian buffers, the renderer never changes.
- Global order beats local guesses. Correct back-to-front blending needs a true global sort; GPU radix sort makes it practical every frame.
- Small but mighty optimizations. FP16 packing cuts bandwidth; graph capture and graph rewrites reduce JavaScript and WebGPU overhead, smoothing frame times.
Building blocks (with concept cards):
Hook: Think of the browser as a race car track that now opens its fastest lane to everyone.
The Concept (WebGPU, recap): WebGPU is the modern GPU lane in your browser for fast compute and graphics. How it works:
- Upload data to GPU buffers.
- Run compute shaders to preprocess.
- Run graphics shaders to draw. Why it matters: Without WebGPU, per-frame dynamic updates and global sorting would be too slow. Anchor: Visionary uses WebGPU to preprocess millions of Gaussians and sort them globally in milliseconds.
Hook: Imagine building a 3D sculpture from millions of soft, colored droplets.
The Concept (3DGS, recap): 3DGS represents scenes with soft 3D blobs that blend into images quickly. How it works:
- Each blob projects to a 2D ellipse.
- Sort all blobs by depth.
- Blend back-to-front for correct transparency. Why it matters: It's fast and looks great, enabling real-time rendering of learned scenes. Anchor: A 6M-Gaussian bicycle renders interactively when sorted and blended properly.
Hook: Like a universal translator that lets different teams work together.
The Concept (ONNX, recap): ONNX is a standard way to package AI models so they run anywhere. How it works:
- Export trained models to ONNX.
- Load with ONNX Runtime Web.
- Bind inputs/outputs to WebGPU buffers. Why it matters: It makes per-frame plug-ins possible without custom code. Anchor: An MLP-based 3DGS decoder trained in PyTorch runs in the browser via ONNX.
Hook: Think of a form everyone fills out the same way so the machine can read it fast.
The Concept (Gaussian Generator contract): A fixed ONNX input/output schema that says how models should output Gaussian attributes each frame. How it works:
- Inputs: frame index, camera, or control signals.
- Model outputs packed Gaussian buffers (positions, covariances, color, opacity) plus metadata.
- Renderer consumes outputs directly, no per-model shader changes. Why it matters: Without a standard contract, each new algorithm would force renderer edits and bugs. Anchor: Swap in 4DGS today and a neural avatar tomorrow, with no renderer code changes.
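The paper fixes the exact schema; purely as an illustration, a contract of this kind could be described with TypeScript interfaces like the ones below (the field names, layouts, and optional inputs are assumptions, not Visionary's published spec).

```typescript
// Illustrative stand-in for the Gaussian Generator contract; the real spec
// defines the exact tensor names and layouts.
interface GeneratorInputs {
  time?: number;            // timestamp for 4DGS
  camera?: Float32Array;    // view/projection parameters for view-adaptive decoders
  pose?: Float32Array;      // e.g. SMPL-X parameters for avatars
}

interface GeneratorOutputs {
  count: number;            // number of Gaussians produced this frame
  positions: Float32Array;  // count * 3
  covariances: Float32Array; // count * 6, upper-triangular 3x3
  colors: Float32Array;     // count * 3 (or SH coefficients)
  opacities: Float32Array;  // count * 1
}

// Any model that can be wrapped to satisfy this shape plugs into the renderer.
interface GaussianGenerator {
  generate(inputs: GeneratorInputs): Promise<GeneratorOutputs>;
}
```

The renderer only ever sees GeneratorOutputs, which is why swapping models never requires shader changes.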
Hook: Like a chef who cooks fresh per plate instead of reheating old food.
The Concept (MLP-based 3DGS): An MLP decodes Gaussians from anchor features for the current view, saving storage and improving quality. How it works:
- Store anchors with features.
- Feed them to an MLP per frame with current view.
- Output fresh Gaussian parameters to render. Why it matters: Without per-frame decoding, these methods wouldn't look as good or be view-adaptive. Anchor: As you orbit a scene, the MLP decodes view-aware Gaussians that keep fine details crisp.
Hook: A flipbook where each page is a moment in time.
The Concept (4DGS): 4DGS adds time by deforming canonical Gaussians with a fast spatiotemporal field. How it works:
- Keep a canonical scene.
- Use a learned field (e.g., HexPlanes + small MLP) to predict per-time deformations.
- Render the deformed Gaussians for the current timestamp. Why it matters: Without 4DGS, moving scenes would require storing huge per-frame data. Anchor: A dancing character updates Gaussians smoothly as time advances, right in your browser.
Hook: Think of a digital action figure that bends at joints.
The Concept (Neural avatars): Avatars keep Gaussians in a neutral pose and use body parameters to deform them per frame. How it works:
- Store canonical Gaussians and skinning weights.
- Input pose/shape (e.g., SMPL-X) to compute joint transforms.
- Blend transforms to move each Gaussian. Why it matters: Without this, realistic, controllable humans in the browser would be impractical. Anchor: Change a slider for arm pose; the avatar's Gaussians deform and render instantly.
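For intuition, here is a linear blend skinning sketch over canonical Gaussian centers using three.js math types. In Visionary the deformation runs inside the ONNX model rather than in JavaScript, and the joint/weight layout below is assumed for illustration.

```typescript
import { Matrix4, Vector3 } from "three";

// Linear blend skinning sketch for canonical Gaussian centers. jointIds and
// weights give each Gaussian's influencing joints and blend weights
// (assumed to sum to 1); this layout is illustrative only.
function skinCenters(
  canonical: Vector3[],
  jointMatrices: Matrix4[],   // per-joint transforms derived from the pose
  jointIds: number[][],
  weights: number[][],
): Vector3[] {
  return canonical.map((p, i) => {
    // Blend the joint matrices element-wise: M = sum_k w_k * M_{j_k}.
    const blended = new Array(16).fill(0);
    jointIds[i].forEach((j, k) => {
      const m = jointMatrices[j].elements;   // column-major 16 floats
      for (let n = 0; n < 16; n++) blended[n] += weights[i][k] * m[n];
    });
    // Apply the blended transform to the canonical center.
    return p.clone().applyMatrix4(new Matrix4().fromArray(blended));
  });
}
```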
Hook: Like adding Instagram filters after taking a picture.
The Concept (Generative post-processing): A feedforward model enhances or stylizes the rendered image, per frame, in the browser. How it works:
- Render a base image.
- Run an ONNX U-Net to denoise/enhance/stylize.
- Display the polished result. Why it matters: Without this, you miss creative styles or quality boosts without leaving the browser. Anchor: Toggle a style button and your 3D scene becomes watercolor or sharper and cleaner.
03 Methodology
High-level recipe: Inputs → ONNX pre-decoding (Gaussian Generator) → GPU preprocessing (transform, cull, ellipse) → GPU global sort → GPU rasterization + mesh depth composition → ONNX post-processing (optional) → Output frame.
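Read as code, the recipe is one frame function. In the sketch below every stage is a placeholder (passed in as a function) for the steps described next; none of these names are Visionary's actual API.

```typescript
// One frame of the recipe as a sketch; all stage functions are placeholders.
type GaussianBatch = { count: number; packed: GPUBuffer };   // output of ONNX pre-decoding
type SplatBuffers = { splats: GPUBuffer; keys: GPUBuffer };  // compact records + depth keys

interface Stages {
  decode(t: number, camera: Float32Array): Promise<GaussianBatch>;  // Step 1: ONNX pre-decoding
  preprocess(g: GaussianBatch, camera: Float32Array): SplatBuffers; // Step 2: transform, cull, ellipse
  sortByDepth(s: SplatBuffers): void;                               // Step 3: global GPU radix sort
  rasterize(s: SplatBuffers, meshDepth?: GPUTexture): GPUTexture;   // Step 4: quads + depth composition
  postProcess?(frame: GPUTexture): Promise<GPUTexture>;             // Step 5: optional ONNX U-Net
  present(frame: GPUTexture): void;
}

async function renderFrame(t: number, camera: Float32Array, stages: Stages): Promise<void> {
  const gaussians = await stages.decode(t, camera);
  const splats = stages.preprocess(gaussians, camera);
  stages.sortByDepth(splats);                       // one true global order per frame
  let frame = stages.rasterize(splats);
  if (stages.postProcess) frame = await stages.postProcess(frame);
  stages.present(frame);
}
```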
Step 1: ONNX pre-decoding (Gaussian Generator contract)
- What happens: Each frame, Visionary runs the loaded ONNX model to generate or update Gaussian attributes for that moment (e.g., per-view MLP decoding, per-time 4D deformation, per-pose avatar skinning).
- Why this step exists: Different 3DGS variants compute Gaussians differently; without a standard ONNX contract, the renderer would need custom branches per algorithm.
- Example: You load a 4DGS scene. The only input is the timestamp t. The model outputs updated positions, rotations/scales (as covariances), colors, and opacities for the current time.
- Secret sauce: Graph capture keeps the model session and I/O bindings stable so WebGPU can reuse recorded work, reducing overhead; big Concat/Split ops are rewritten into chunks to respect WebGPU limits while keeping the same packed output.
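A hedged sketch of that setup with ONNX Runtime Web follows. It assumes a recent onnxruntime-web build whose session options include `enableGraphCapture` and `preferredOutputLocation` and whose `Tensor.fromGpuBuffer` can wrap WebGPU buffers; option names and availability vary by version, and the model path and tensor shapes are placeholders.

```typescript
import * as ort from "onnxruntime-web";

// Keep ONNX I/O on the GPU so the renderer can consume outputs directly.
// Assumes a recent onnxruntime-web with WebGPU graph capture support.
async function createGeneratorSession(): Promise<ort.InferenceSession> {
  return ort.InferenceSession.create("/models/4dgs.onnx", {
    executionProviders: ["webgpu"],
    enableGraphCapture: true,               // reuse the recorded GPU work each frame
    preferredOutputLocation: "gpu-buffer",  // keep outputs on the GPU, no readback
  });
}

// Bind a preallocated GPU buffer as an input tensor (hypothetical dims).
function bindTime(buffer: GPUBuffer): ort.Tensor {
  return ort.Tensor.fromGpuBuffer(buffer, { dataType: "float32", dims: [1] });
}
```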
Step 2: GPU preprocessing (compute shader)
- What happens: For each Gaussian, Visionary applies the model's transform (if any), projects to camera space and clip space, prunes off-screen or nearly invisible splats (frustum/opacity culling), computes the 2D ellipse axes and NDC center, and writes a compact Splat record. Depth (NDC z) becomes the sort key. All valid splats go into unified global buffers via an atomic counter.
- Why this step exists: Doing this math on the GPU in parallel keeps the CPU free and avoids slow round trips. Without it, preprocessing and culling would bottleneck the frame.
- Example: In the 6M-Gaussian bicycle scene, millions of splats are transformed and culled in a few tenths of a millisecond on a high-end GPU.
- Secret sauce: FP16 packing halves bandwidth; pre-storing upper-triangular covariance keeps memory tight and math fast.
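The WGSL below sketches this pass: one thread per Gaussian, frustum and opacity culling, and an atomic append into a global splat buffer whose NDC depth becomes the sort key. The struct layouts and the omitted ellipse/covariance math are simplified placeholders, not Visionary's shader.

```typescript
// Simplified preprocessing compute shader (capacity checks and ellipse math omitted).
export const preprocessWGSL = /* wgsl */ `
struct Gaussian { pos: vec3<f32>, opacity: f32, color: vec3<f32>, _pad: f32 }
struct Splat    { ndc: vec2<f32>, depth: f32, opacity: f32, color: vec3<f32>, _pad: f32 }

@group(0) @binding(0) var<storage, read>       gaussians: array<Gaussian>;
@group(0) @binding(1) var<storage, read_write> splats: array<Splat>;
@group(0) @binding(2) var<storage, read_write> splatCount: atomic<u32>;
@group(0) @binding(3) var<uniform>             viewProj: mat4x4<f32>;

@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  if (id.x >= arrayLength(&gaussians)) { return; }
  let g = gaussians[id.x];

  let clip = viewProj * vec4<f32>(g.pos, 1.0);
  // Frustum and opacity culling: skip splats that cannot contribute.
  if (clip.w <= 0.0 || abs(clip.x) > clip.w || abs(clip.y) > clip.w || g.opacity < 0.004) {
    return;
  }

  // Append a compact record; NDC z becomes the global sort key.
  let slot = atomicAdd(&splatCount, 1u);
  splats[slot] = Splat(clip.xy / clip.w, clip.z / clip.w, g.opacity, g.color, 0.0);
}`;
```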
Step 3: GPU global sorting (radix sort)
- What happens: All visible splats across all models are globally sorted by depth so blending is back-to-front and correct everywhere.
- Why this step exists: If you only sort locally within chunks, overlaps between chunks blend incorrectly, causing transparency errors. Without global order, fast camera motion can also create popping artifacts.
- Example: SuperSplat's local sorting can mis-composite overlaps; Visionary's global sort fixes this by enforcing one consistent order for the entire frame.
- Secret sauce: A GPU radix sort is fast and stable, making a true global order practical each frame.
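One practical detail worth showing: radix sort works on integer keys, so each float depth is first mapped to an unsigned integer whose order matches the float order. The helper below demonstrates that standard mapping on the CPU, purely as a reference for what the GPU pass computes.

```typescript
// Map a float depth to a u32 whose unsigned order matches the float order
// (standard IEEE-754 trick). CPU reference only; the sort itself runs in a
// WebGPU compute pass.
function depthToSortKey(depth: number): number {
  const bits = new Uint32Array(new Float32Array([depth]).buffer)[0];
  // Non-negative floats: set the sign bit. Negative floats: invert all bits.
  return (bits & 0x80000000 ? ~bits : bits | 0x80000000) >>> 0;
}

// The far-to-near order the GPU radix sort must reproduce for blending:
const depths = [0.9, 0.1, 0.5];
const farToNear = depths
  .map((d, i) => ({ i, key: depthToSortKey(d) }))
  .sort((a, b) => b.key - a.key)
  .map((x) => x.i);   // -> [0, 2, 1]
```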
Step 4: Rasterization of Gaussians + mesh depth composition
- What happens: The vertex stage expands each Splat into a screen-space quad using its ellipse axes; the fragment stage evaluates the Gaussian weight and returns premultiplied color. If meshes are present, a depth prepass renders the mesh to a depth buffer; Gaussian fragments behind the mesh depth are rejected.
- Why this step exists: Expanding to a quad and evaluating the Gaussian in the fragment shader is the standard, efficient way to draw ellipses; mesh depth composition enables hybrid scenes (splat + traditional mesh) with correct occlusion.
- Example: A splatted tree in front of a mesh house looks right because splats behind the house are dropped by the depth test.
- Secret sauce: Back-to-front alpha compositing stays correct because of the prior global sort.
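The WGSL sketch below shows the shape of such a draw pass: quad expansion in the vertex stage, Gaussian falloff plus mesh-depth rejection in the fragment stage, and premultiplied output. The Splat layout and the circular (rather than elliptical) footprint are simplifications for illustration.

```typescript
// Simplified splat draw pass (quad expansion + Gaussian falloff + mesh depth test).
export const splatDrawWGSL = /* wgsl */ `
struct Splat { ndc: vec2<f32>, depth: f32, opacity: f32, color: vec3<f32>, radius: f32 }

@group(0) @binding(0) var<storage, read> splats: array<Splat>;
@group(0) @binding(1) var meshDepth: texture_depth_2d;

struct VSOut {
  @builtin(position) pos: vec4<f32>,
  @location(0) local: vec2<f32>,   // position inside the quad, in sigma units
  @location(1) color: vec3<f32>,
  @location(2) opacity: f32
}

@vertex
fn vs(@builtin(vertex_index) vi: u32, @builtin(instance_index) ii: u32) -> VSOut {
  // Two-triangle quad from the vertex index (0..5).
  var corners = array<vec2<f32>, 6>(
    vec2(-1.0, -1.0), vec2(1.0, -1.0), vec2(-1.0, 1.0),
    vec2(-1.0, 1.0),  vec2(1.0, -1.0), vec2(1.0, 1.0));
  let s = splats[ii];
  let c = corners[vi] * 3.0;   // cover +/- 3 sigma; the real shader uses the ellipse axes
  var out: VSOut;
  out.pos = vec4(s.ndc + c * s.radius, s.depth, 1.0);
  out.local = c;
  out.color = s.color;
  out.opacity = s.opacity;
  return out;
}

@fragment
fn fs(v: VSOut) -> @location(0) vec4<f32> {
  // Reject splat fragments occluded by the mesh depth prepass
  // (assuming a standard 0 = near, 1 = far depth range).
  let md = textureLoad(meshDepth, vec2<i32>(v.pos.xy), 0);
  if (v.pos.z > md) { discard; }

  let a = v.opacity * exp(-0.5 * dot(v.local, v.local));  // Gaussian falloff
  return vec4(v.color * a, a);                            // premultiplied alpha
}`;
```

With premultiplied color, the render pipeline would pair this with one / one-minus-src-alpha blending so the earlier global sort yields correct back-to-front compositing.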
Step 5: ONNX post-processing (optional)
- What happens: The rendered image is optionally fed into an ONNX U-Net to enhance details, denoise, or stylize, then displayed.
- Why this step exists: Some applications want a specific look or quality boost without leaving the browser pipeline.
- Example: Applying EXGS-style enhancement to sharpen fine edges of a building façade in a city scene.
- Secret sauce: Keeping this as ONNX means you can swap styles or enhancers without touching the renderer.
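A minimal sketch of this stage with ONNX Runtime Web is shown below. The model and its tensor names (`image`, `enhanced`) are placeholders, and repacking the RGBA frame into a float NCHW tensor is just one plausible way to feed a U-Net-style enhancer.

```typescript
import * as ort from "onnxruntime-web";

// Feed the rendered RGBA frame to an enhancement/stylization model exported to
// ONNX (sketch; tensor names are placeholders).
async function enhanceFrame(
  session: ort.InferenceSession,
  rgba: Uint8ClampedArray,   // e.g. from ctx.getImageData(...).data
  width: number,
  height: number,
): Promise<ort.Tensor> {
  // Repack RGBA8 pixels into a float32 NCHW tensor in [0, 1].
  const chw = new Float32Array(3 * width * height);
  for (let i = 0; i < width * height; i++) {
    chw[i] = rgba[4 * i] / 255;                          // R plane
    chw[width * height + i] = rgba[4 * i + 1] / 255;     // G plane
    chw[2 * width * height + i] = rgba[4 * i + 2] / 255; // B plane
  }
  const image = new ort.Tensor("float32", chw, [1, 3, height, width]);
  const out = await session.run({ image });
  return out.enhanced as ort.Tensor;   // stylized / enhanced image tensor
}
```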
Putting variants into the pipeline (concepts introduced above, now applied):
- MLP-based 3DGS: Anchors + features go into the ONNX MLP per frame with camera/view info; outputs are fresh Gaussians. Why: Saves storage and stays view-adaptive.
- 4DGS: A canonical set of Gaussians plus a fast deformation field (e.g., HexPlanes + small MLP) transforms them for the current time t. Why: Efficient dynamic scenes without per-frame storage bloat.
- Neural avatars: Canonical Gaussians in a neutral pose plus skinning weights; input pose/shape parameters deform them via LBS inside ONNX. Why: Real-time, controllable human rendering.
- Generative post-processing: Feed the final frame into an ONNX U-Net for denoise/enhance/style. Why: Creative control and quality, in-browser.
Data flow and formats:
- Inputs: 3DGS assets (or embedded in ONNX), camera parameters, frame index/timestamp, avatar pose controls.
- Intermediate GPU buffers: FP16-packed positions, covariances (upper-triangular), colors/SH, per-splat ellipse axes, NDC center, depth keys, indices.
- Outputs: Final rendered RGBA frame (optionally enhanced).
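To illustrate the FP16 packing, here is a plain float32-to-float16 bit conversion (truncating, without denormal or NaN handling) applied to the upper-triangular covariance entries. The real pipeline packs more attributes and may round differently.

```typescript
// Float32 -> float16 bit conversion used to halve bandwidth (sketch: truncates
// the mantissa and flushes subnormals to zero).
function toHalf(f: number): number {
  const bits = new Uint32Array(new Float32Array([f]).buffer)[0];
  const sign = (bits >>> 16) & 0x8000;
  const exp = ((bits >>> 23) & 0xff) - 127 + 15;
  if (exp <= 0) return sign;                    // underflow -> signed zero
  if (exp >= 31) return sign | 0x7c00;          // overflow  -> infinity
  return sign | (exp << 10) | ((bits >>> 13) & 0x03ff);
}

// Pack the upper-triangular 3x3 covariance (6 floats per Gaussian) into FP16.
function packCovariances(cov: Float32Array): Uint16Array {
  const out = new Uint16Array(cov.length);      // 6 entries per Gaussian
  for (let i = 0; i < cov.length; i++) out[i] = toHalf(cov[i]);
  return out;
}
```

On the shader side, such packed halves can be read back with WGSL's unpack2x16float builtin or, with the f16 extension enabled, as native f16 values.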
Why this method is clever:
- Contract-first design: New algorithms plug in via ONNX without changing the renderer.
- GPU-first execution: Heavy lifting (preprocess, sort, draw) happens on the GPU every frame.
- Correctness under motion: True global sorting avoids the visual glitches seen in lazy/local strategies.
- Practical ONNX tweaks: Graph capture and post-export rewriting reduce overhead and improve reliability in browsers.
Concrete mini-examples:
- Static scene: Load a 6M-Gaussian model; no per-frame ONNX needed. GPU preprocess + global sort + draw = ~2 ms on high-end hardware.
- Dynamic scene (4DGS): Input t=0.37 s; ONNX returns deformed Gaussians; renderer draws them that frame.
- Avatar: Input SMPL-X pose; ONNX returns skinned Gaussians; renderer draws a waving character instantly.
- Style: After drawing, ONNX U-Net adds watercolor style; users can toggle it in real time.
04 Experiments & Results
The test: The authors evaluated two things, speed at scale and visual robustness, because a viewer must be both fast and stable. They used identical 3DGS assets and camera paths across tools so comparisons were fair. They also measured image quality on a public dataset to ensure speed didn't hurt fidelity.
Competition (baselines): SparkJS and SuperSplat are well-known WebGL-based viewers. SparkJS does sorting on the CPU and uses a lazy update strategy to reduce cost; SuperSplat avoids a full global order by sorting locally in partitions.
Scoreboard with context:
- Runtime on a big scene (the classic "bicycle" scene with ~6M Gaussians):
  • SparkJS: ~172.87 ms sorting + ~4.03 ms prep/draw ≈ 176.90 ms total per frame. That's like trying to run a 100 m race with ankle weights.
  • Visionary: ~0.58 ms sorting + ~1.52 ms prep/draw ≈ 2.09 ms total per frame. That's like sprinting the same race almost weight-free.
  • Speedup: up to ~135× end-to-end faster under identical assets, thanks to GPU-first preprocessing and a true global GPU radix sort.
- Scaling down the same asset (1/2, 1/4, 1/8): Visionary's total times were ~1.09 ms, ~0.60 ms, and ~0.40 ms, staying tiny and stable; SparkJS stayed dominated by CPU sorting (e.g., ~145.75 ms at half scale).
Quality metrics (MipNeRF360):
- SparkJS: PSNR 27.315, SSIM 0.825, LPIPS 0.253.
- Visionary: PSNR 27.867, SSIM 0.828, LPIPS 0.249. Interpretation: Visionary slightly improves fidelity while being much faster. That's like getting a sharper photo even though you took it far more quickly.
Robustness under rapid viewpoint changes:
- SparkJS's lazy sorting can fall behind when the camera rotates quickly. The depth order becomes stale, which breaks alpha blending and causes popping/streaking artifacts. In demos, these artifacts are clearly visible during fast motion.
- Visionary re-sorts globally every frame on the GPU, so alpha compositing stays correct and stable, even under quick camera moves.
Composition correctness vs. SuperSplat:
- SuperSplat's local partition sorting is efficient but not equivalent to a single global order. When splats from different partitions overlap, blending can be wrong, leading to depth-inconsistent transparency.
- Visionary maintains one global buffer and one global sort for all valid splats across all loaded models. Result: correct compositing for multi-asset scenes (splat+splat or splat+mesh).
ONNX pre-decoding overhead (feasibility of dynamic content):
- MLP-based 3DGS (Scaffold-GS): For scenes producing ~2.49M–4.56M Gaussians, ONNX inference took ~9.29–16.10 ms per frame.
- 4DGS: For outputs around 0.03M–0.06M Gaussians, ONNX inference took ~4.76–7.93 ms per frame.
- Avatars (e.g., GauHuman, R3-Avatar): Per-frame ONNX runtimes were roughly 7–8 ms for typical settings, with fewer Gaussians than large static scenes. Interpretation: These per-frame inference times are compatible with real-time rendering on modern hardware, showing that a single browser pipeline can handle both fast drawing and per-frame neural updates.
Surprises and takeaways:
- Moving sorting from CPU to GPU didnāt just speed things up; it also fixed visual hiccups during fast motion by ensuring a correct back-to-front order every single frame.
- Quality didn't drop despite aggressive speed; in fact, Visionary's design choices (avoiding overly aggressive quantization, compute-based preprocessing) nudged PSNR/SSIM upward and LPIPS downward.
- No other web viewer in the study supported the same breadth of dynamic 3DGS variants and in-browser generative post-processing with comparable performance, highlighting Visionary's unique web-native scope.
05 Discussion & Limitations
Limitations:
- Browser variability: WebGPU and ONNX runtimes are advancing quickly but still maturing. Different browsers/OS/hardware can show subtle differences in performance or stability.
- Memory ceilings: Browser security and memory policies limit how big models/assets can be. Extremely large neural post-processors or huge multi-scene loads may exceed practical in-browser limits today.
- Hardware expectations: While Visionary runs in the browser, smooth performance for giant scenes or per-frame AI decoding still benefits from a modern GPU.
Required resources:
- A WebGPU-capable browser (recent Chrome/Edge/Safari/Firefox builds that support WebGPU).
- ONNX Runtime Web backend for model inference.
- GPU with enough VRAM for the target scene (especially for millions of splats) and any post-processing model.
When not to use:
- Ultra-constrained devices where WebGPU or ONNX acceleration is unavailable or unstable.
- Workflows requiring offline film-quality path tracing, advanced GI, or heavy physics beyond what current WebGPU/ONNX in-browser stacks can handle.
- Massive, city-scale multi-scene bundles that exceed browser memory policies without careful asset streaming/compression.
Open questions:
- Streaming and out-of-core: How best to stream giant scenes (and dynamic models) into the browser without stutters or memory spikes?
- Physics coupling: What's the right browser-friendly interface to integrate robust physics (e.g., MPM) with splats for interactive, physically grounded worlds?
- Cross-browser determinism: How to ensure perfectly consistent results across GPUs/drivers/browsers as WebGPU evolves?
- Mobile-first performance: Which optimizations (quantization, tiling, LOD, adaptive decoders) unlock truly smooth dynamic scenes on mobile GPUs?
- Tooling ecosystem: What authoring tools and validators help creators package ONNX models that perfectly match the Gaussian Generator contract with minimal friction?
06 Conclusion & Future Work
Three-sentence summary: Visionary is a web-native platform that unites per-frame ONNX inference with a high-throughput WebGPU Gaussian Splatting renderer, enabling dynamic, real-time neural scenes directly in the browser. Its Gaussian Generator contract standardizes how models output Gaussians, so many 3DGS variants (MLP-based, 4D, and avatars) plug in without renderer changes. Compared to WebGL viewers, Visionary delivers massive speedups and better compositing correctness, while also supporting optional in-browser generative post-processing.
Main achievement: Turning the browser into a universal, click-to-run world-model carrier, where compute (inference), graphics (rendering), and creativity (post-processing) live together, standardized by a simple ONNX contract and powered by GPU-first design.
Future directions: Integrate physics and collision with mesh pipelines; explore physics-aware splat dynamics; connect to vectorized simulators for embodied AI; add relighting and domain adaptation; and refine asset streaming for huge, city-scale worlds. As WebGPU/ONNX mature, expect broader device coverage, stronger determinism, and even richer dynamic content.
Why remember this: Visionary shows that advanced, dynamic 3D neural rendering doesn't have to be locked in heavyweight desktop stacks; it can be fast, flexible, and shareable with a link. The combination of a standard model contract, GPU-global sorting, and optional generative finishing forms a practical blueprint for future interactive world models on the open web.
Practical Applications
- Embed interactive 3D product viewers in e-commerce pages that run instantly with no installs.
- Create education labs where students explore digital twins (science exhibits, historical sites) in real time.
- Prototype game scenes mixing meshes and splats, with dynamic AI-driven effects, directly in the browser.
- Test and compare 3DGS research variants (MLP-based, 4DGS, avatars) side-by-side using a standard contract.
- Build avatar try-on experiences that update poses and styles per frame for virtual fitting rooms.
- Preview robotics or autonomous driving scenes as explicit 3D states for debugging and planning.
- Apply real-time artistic styles or denoising to 3D renders without leaving the web app.
- Share reproducible research demos: send a URL and let peers run the exact model and assets locally.
- Develop world-model frontends where AI agents interact with stable, physically plausible 3D memories.
- Integrate with simulation backends to visualize trajectories and collisions with correct splat/mesh occlusion.