
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate
Zhihang Liu, Xiaoyi Bao, Pandeng Li et al. Ā· 12/15/2025
arXiv Ā· PDF

Key Summary

  • ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.
  • It combines a Multi-Modal Large Language Model (MLLM) to plan and critique images with a diffusion model to draw and edit them.
  • The pipeline has four steps: Rewriting (plan the picture), Generation (make the first image), Reflection (spot mistakes), and Refinement (fix them).
  • A special training set teaches the planner (rewriting) using examples, and the editor (refinement) using rewards for better fixes.
  • The team built TableVisBench, an 800-example test that checks five things: data accuracy, text rendering, relative proportions, extra info accuracy, and overall look.
  • Across many popular image models, ShowTable boosts scores a lot, especially on getting the numbers and proportions right.
  • A key finding is that the editor (refinement model) can be the bottleneck; training it with a reward model and RL makes the self-correction loop actually improve results round by round.
  • This matters for real jobs like slides, reports, posters, and news graphics where pretty pictures also must be faithful to the data.
  • The approach is modular, so stronger planners, critics, or editors can plug in to get even better results.
  • Limitations include reliance on base model quality, multiple components to run, and challenges with very tiny text or strict vector outputs.

Why This Research Matters

Charts guide decisions in school, business, health, and news, so they must be both clear and correct. ShowTable’s loop produces visuals that match the data precisely, reducing human errors like wrong labels or mismatched bar heights. It also saves time by automating planning and polishing, which is especially helpful for people who aren’t design experts. The modular design means it can improve as better planners, critics, and editors appear. By defining a tough benchmark, the paper helps the community measure real progress, not just pretty pictures. In short, it pushes AI from ā€œartistic but looseā€ toward ā€œcreative and trustworthy,ā€ which is what real-world reporting needs.

Detailed Explanation


01 Background & Problem Definition

You know how a great science fair poster doesn’t just list numbers, it tells a clear story with neat charts and readable labels? Before this paper, AI could make stunning photos of cats or landscapes, and even do cool posters, but it struggled when a picture had to match exact numbers from a table—like making the 30% pie slice really be 30%, not just ā€œkinda big.ā€ In everyday tools, people rely on charts to be both attractive and honest. If the bar for 50 is taller than the bar for 60, the whole message breaks.

šŸž Top Bread (Hook) Imagine you’re baking cupcakes for friends. Each friend gets a different number, and you must arrange them in a tray with labels. If the labels don’t match the counts, people get upset. Charts are like that—pretty trays that must match the exact recipe of numbers.

🄬 The Concept: Creative Table Visualization

  • What it is: A task where an AI turns a data table into a single infographic that is both beautiful and precisely faithful to the numbers.
  • How it works (recipe):
    1. Read a dense table and decide on a good chart style and layout.
    2. Draw the visual so sizes and labels perfectly match the numbers.
    3. Check and correct any mistakes in labels, proportions, or layout.
  • Why it matters: Without this, AI may make nice-looking but misleading charts, which can confuse decisions in school projects, news, or business.

šŸž Bottom Bread (Anchor) Example: A table lists smartphone usage by app. A correct infographic must make the Instagram bar exactly as tall as the Instagram number says, not just ā€œtall-ish.ā€

The problem researchers faced is that normal text-to-image systems don’t naturally understand strict rules like ā€œa bar exactly twice as tall must mean exactly twice the value.ā€ They treat charts like any other picture—good-looking, but not necessarily accurate. People tried a few paths:

  • Code-based charting: Have an LLM write plotting code (like Matplotlib). This gives accuracy, but the results are less flexible and artistic, and the approach depends on strict templates.
  • Template retrieval: Find a similar chart and edit it. This can be neat, but breaks when the new data doesn’t fit the old template well.
  • Unified models: Big multimodal models that both understand and generate images. They’re improving, but still miss tiny details like exact bar heights or perfect text.

So there was a gap: Could we get the creativity of image generators plus the exactness of data tools—together? This paper proposes ShowTable, a pipeline where a smart planner (an MLLM) and a careful artist (a diffusion model) work as a team, and then check and fix their own work in a loop.

Real stakes are high. Charts are used for science classes, business pitches, public health posters, and breaking news. If the numbers are wrong or the labels misspelled, people can make bad choices. A system that can both design and be data-faithful saves time and reduces human error, helping anyone who needs to turn tables into trustworthy visuals.

02 Core Idea

You know how when you write an essay, a friend can first outline your ideas, you write a draft, they mark mistakes, and you fix them? The key insight here is exactly that: split the job into plan → draw → check → fix, and loop until the picture tells the truth beautifully.

šŸž Top Bread (Hook) You know how a coach and a player work together? The coach plans and critiques; the player executes and practices until it’s right.

🄬 The Concept: A Coach–Player Team for Charts

  • What it is: A pipeline that uses an MLLM as the coach (planner and critic) and a diffusion model as the player (artist and editor), repeating a self-correcting loop.
  • How it works (recipe):
    1. Rewriting: The coach reads the table and writes a clear, visual plan.
    2. Generation: The player draws the first version from that plan.
    3. Reflection: The coach checks the drawing against the table and lists precise fixes.
    4. Refinement: The player edits the image to apply the fixes.
  • Why it matters: Without planning, generation is fuzzy; without reflection, mistakes stay; without refinement, feedback can’t be applied. All four lock together like gears.

šŸž Bottom Bread (Anchor) Example: If Pinterest and Instagram have the same value, reflection will notice their bars aren’t equal and tell the editor to make them match exactly.

Multiple analogies for the same idea:

  1. Writing a story: outline → first draft → peer review → revision.
  2. Building LEGO: plan the model → assemble → compare to the box picture → adjust bricks to match.
  3. Cooking: choose the recipe → cook → taste and spot issues → adjust salt or cooking time.

Before vs. After:

  • Before: One-shot prompts often produced pretty but unreliable charts with typos or wrong sizes.
  • After: With plan–draw–check–fix, the image moves closer to the table data every round.

Why it works (intuition): Planning translates dense, structured tables into artist-friendly instructions. Reflection acts like a math teacher, comparing expected values to the picture’s geometry. Refinement lets the artist change only what’s wrong, keeping good parts intact. Iteration shrinks errors instead of letting them pile up.

Now the building blocks, explained simply and in order.

šŸž Top Bread (Hook) Imagine a teammate who can read text and look at pictures, like a scout who both listens and watches.

🄬 The Concept: Multi-Modal Large Language Model (MLLM)

  • What it is: A model that understands and reasons over both text and images, and writes detailed, structured instructions.
  • How it works (recipe):
    1. Read the table (text) and sometimes view the image.
    2. Plan a layout and wording.
    3. Check for mistakes by comparing image to table.
    4. Write exact edit commands.
  • Why it matters: Without a good coach, the artist gets vague goals and can’t improve.

šŸž Bottom Bread (Anchor) Example: The MLLM notices ā€œLinkecinā€ is misspelled and says, ā€œChange the label to ā€˜LinkedIn’ with the same font.ā€

šŸž Top Bread (Hook) Imagine an artist who starts with a noisy canvas and keeps refining until a scene appears.

🄬 The Concept: Diffusion Model

  • What it is: An image generator that turns random noise into a picture step by step, and can also edit a picture using instructions.
  • How it works (recipe):
    1. Start with static (noise).
    2. Follow the prompt to reveal shapes and text.
    3. For edits, nudge parts to match new instructions.
  • Why it matters: Without the artist, the plan stays words instead of becoming a chart.

šŸž Bottom Bread (Anchor) Example: After getting ā€œmake the Pinterest bar equal to Instagram,ā€ the diffusion editor tweaks just that bar’s height.

šŸž Top Bread (Hook) You know how you check your math homework, find mistakes, and fix them before turning it in?

🄬 The Concept: Progressive Self-Correcting Process

  • What it is: A loop of making something, inspecting it, and fixing it until it meets strict rules.
  • How it works (recipe):
    1. Generate a first draft.
    2. Compare the image to the table.
    3. List exact mismatches (labels, sizes, missing items).
    4. Apply targeted edits and repeat if needed.
  • Why it matters: Without this loop, small errors survive and mislead.

šŸž Bottom Bread (Anchor) Example: Round 1 fixes wrong logos; Round 2 equalizes two bars; Round 3 corrects a misspelled axis title.

Together, these pieces make accurate, attractive charts from plain tables, like turning ingredients and a recipe into a delicious, well-plated meal you can trust.

03 Methodology

At a high level: Table → Rewriting (plan) → Generation (draw) → Reflection (check) → Refinement (fix) → Final infographic.
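
To make this flow concrete, here is a minimal sketch of the loop in Python. The four callables stand in for the MLLM planner/critic and the diffusion generator/editor described in the paper; their names and signatures are placeholders, not the authors' actual interfaces.

```python
# A minimal sketch of the ShowTable loop. The four callables are placeholders for
# the MLLM planner/critic and the diffusion generator/editor, not the paper's APIs.

def showtable_loop(table, rewrite_plan, generate_image, reflect, refine, max_rounds=3):
    """Run plan -> draw -> check -> fix until the critic finds no more issues."""
    plan = rewrite_plan(table)           # Rewriting: table -> detailed visual prompt
    image = generate_image(plan)         # Generation: first draft of the infographic
    for _ in range(max_rounds):
        issues = reflect(image, table)   # Reflection: list of concrete edit instructions
        if not issues:                   # Adaptive stop: draft already matches the table
            break
        image = refine(image, issues)    # Refinement: apply targeted edits only
    return image
```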

Step A: Rewriting (planning the visual)

  • What happens: The MLLM reads a dense markdown table, reasons about chart type, layout, colors, icons, and produces a rich, unambiguous prompt for the image model.
  • Why this step exists: Directly feeding a table to an image model often yields a rendered table or a messy picture, not a true visualization. Planning turns structure into a story the artist can follow.
  • Example: For a social media time-per-day table, the rewrite might say: ā€œA vertical bar chart titled ā€˜Average Time Spent...’ with eight colored bars, correct logos at each bar’s base, Instagram and Pinterest equal height at 21 minutes, labels in bold sans-serif.ā€ (A minimal sketch of this planning step follows below.)
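
Assuming a generic `call_mllm` text interface (hypothetical, not the paper's code), the Rewriting step might look like this: pack the markdown table and the layout requirements into a single planning prompt and let the MLLM produce the detailed visual description.

```python
# Hypothetical sketch of the Rewriting step: build one planning prompt for an MLLM.
# `call_mllm` is a placeholder for whatever chat/completion interface you use.

REWRITE_TEMPLATE = """You are a chart planner.
Given the markdown table below, write an unambiguous visual plan:
- chart type, layout, colors, icons
- every category and value, with exact proportions
- all text labels, spelled correctly

Table:
{table_markdown}
"""

def rewrite_plan(table_markdown: str, call_mllm) -> str:
    """Turn a dense table into an artist-friendly prompt for the image model."""
    prompt = REWRITE_TEMPLATE.format(table_markdown=table_markdown)
    return call_mllm(prompt)  # returns the detailed visual description
```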

šŸž Top Bread (Hook) Imagine a teacher giving you sample answers so you learn the right style faster.

🄬 The Concept: Supervised Fine-Tuning (SFT)

  • What it is: Training the planner (MLLM) on many examples of tables paired with excellent visual plans.
  • How it works (recipe):
    1. Collect pairs: table → rationale → great description.
    2. Show the model the inputs and the desired outputs.
    3. Adjust the model until it predicts plans like the examples.
  • Why it matters: Without SFT, the planner might forget data points or pick awkward layouts.

šŸž Bottom Bread (Anchor) Example: After SFT, the planner stops missing subcategories in multi-level tables and includes all values.

Step B: Generation (making the first image)

  • What happens: The diffusion model draws an initial infographic from the rewritten prompt.
  • Why this step exists: You need a concrete draft before you can measure and fix errors.
  • Example: It draws bars roughly right, but two equal values might not look equal, or a label might be slightly off.

šŸž Top Bread (Hook) Imagine cleaning a foggy window by wiping away noise until you see the view.

🄬 The Concept: Diffusion Model (as generator/editor)

  • What it is: The image engine that can both create and later edit the picture with fine control.
  • How it works (recipe):
    1. Start from noise, follow the prompt to reveal a chart.
    2. Later, accept edit prompts to tweak specific parts.
    3. Keep style and composition while fixing details.
  • Why it matters: Without a precise artist/editor, feedback can’t become a better image.

šŸž Bottom Bread (Anchor) Example: ā€œReplace the Tumblr logo with the correct oneā€ changes only that logo, not the whole chart.

Step C: Reflection (auditing the draft)

  • What happens: The MLLM compares the image with the table, checking five things: data accuracy, text rendering, relative sizes, extra info (axes, ticks), and aesthetics (via a separate scorer). It writes exact edit instructions.
  • Why this step exists: Humans need checklists; the model does too. It stops ā€œpretty but wrongā€ charts.
  • Example: ā€œMake the Pinterest bar exactly the same height as Instagram (21). Change ā€˜Linkecin’ to ā€˜LinkedIn’.ā€ (A deterministic version of this kind of check is sketched below.)
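
The geometric part of this audit can be made deterministic. Below is a hedged sketch in plain Python (not the paper's checker) that compares measured bar heights and a donut-slice angle against the table and emits edit instructions like the ones quoted above; all tolerances and pixel values are illustrative.

```python
# Sketch of a deterministic reflection check: compare measured geometry against
# the table and emit concrete edit instructions. Tolerances and pixel values are
# illustrative, not the paper's settings.

def check_bars(table: dict, measured_px: dict, tol: float = 0.03) -> list[str]:
    """Bar heights should be proportional to table values (within a tolerance)."""
    issues = []
    ref_key = max(table, key=table.get)               # use the tallest bar as reference
    for key, value in table.items():
        expected_ratio = value / table[ref_key]
        actual_ratio = measured_px[key] / measured_px[ref_key]
        if abs(expected_ratio - actual_ratio) > tol:
            issues.append(f"Resize the {key} bar so its height reflects {value} "
                          f"relative to {ref_key} ({table[ref_key]}).")
    return issues

def check_donut(percent: float, measured_degrees: float, tol: float = 2.0) -> list[str]:
    """A slice for p% should span p% of 360 degrees (81% -> 291.6 degrees)."""
    expected = percent / 100 * 360
    if abs(expected - measured_degrees) > tol:
        return [f"Adjust the slice to {expected:.1f} degrees for {percent}%."]
    return []

table = {"Facebook": 40, "Instagram": 21, "Pinterest": 21, "Snapchat": 17}
measured = {"Facebook": 400, "Instagram": 210, "Pinterest": 190, "Snapchat": 170}
print(check_bars(table, measured))   # flags Pinterest (should match Instagram)
print(check_donut(81, 275.0))        # asks for 291.6 degrees
```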

Step D: Refinement (editing to fix)

  • What happens: The diffusion model applies edits: resizing bars, fixing spellings, adjusting legends, aligning to axes.
  • Why this step exists: Finding errors isn’t enough; you must fix them precisely without breaking good parts.
  • Example: The model shortens the Snapchat bar to 17, updates a misspelled label, and repositions a data tag.

šŸž Top Bread (Hook) Think of training a dog with treats: do the trick right, get a reward; do it wrong, try again.

🄬 The Concept: Reinforcement Learning (RL)

  • What it is: A way to train the editor so that successful fixes get higher ā€œscores,ā€ nudging it toward better future edits.
  • How it works (recipe):
    1. Generate multiple edit attempts.
    2. Score them with a reward model and an aesthetic scorer.
    3. Update the editor to prefer higher-scoring edits.
  • Why it matters: Without RL, repeated edit rounds can drift or degrade the image instead of improving it.

šŸž Bottom Bread (Anchor) Example: After RL, each refinement round more reliably moves the donut slice toward exactly 81%.

šŸž Top Bread (Hook) Imagine a fair judge who compares two drawings and says which one better matches the instructions.

🄬 The Concept: Reward Model (RM)

  • What it is: A trained judge that assigns higher scores to images that match the table and instructions better.
  • How it works (recipe):
    1. Build many pairs (better vs. worse) for the same prompt.
    2. Train the judge to pick the better one reliably.
    3. Use its score to guide RL for the editor.
  • Why it matters: Without a reliable judge, the editor learns from noisy signals and doesn’t improve.

šŸž Bottom Bread (Anchor) Example: The RM prefers the version where the ā€œYesā€ slice is exactly 81% and labels are typo-free.

Secret sauce

  • Specialized planning: SFT makes the rewrite step thorough and faithful to data.
  • Focused fixing: Reflection writes actionable, localized edits instead of vague critiques.
  • Reliable improvement: RL plus an RM turns iterative editing into steady progress, not ping-ponging mistakes.

A mini walk-through with data

  • Input table: Instagram 21, Pinterest 21, Facebook 40, Twitter 17...
  • Rewriting: ā€œTitle X, eight bars, equal heights for Instagram and Pinterest, correct logos...ā€
  • Generation: Draft has Instagram and Pinterest near-equal, but the Snapchat bar at 13 is too tall, and ā€˜LinkedIn’ is misspelled as ā€˜Linkecin’.
  • Reflection: ā€œLower Snapchat bar to 13, fix ā€˜Linkecin’ to ā€˜LinkedIn’, align axis ticks by 10s.ā€
  • Refinement: Edits applied; a second reflection finds all correct; stop.

04 Experiments & Results

You know how a science fair has judges and scoring rubrics so projects are graded fairly? The authors built a test just like that.

šŸž Top Bread (Hook) Imagine a report card that checks not only if your answers are correct, but also if your handwriting is neat and your graphs are drawn to scale.

🄬 The Concept: TableVisBench Benchmark

  • What it is: A set of 800 challenging table-to-visualization tasks with a careful scoring system.
  • How it works (recipe):
    1. Give a table and ask for a creative but accurate infographic.
    2. Check five dimensions: data accuracy, text rendering, relative proportions, extra info accuracy, and aesthetics.
    3. Use a deterministic way to count mistakes and an aesthetic model for beauty.
  • Why it matters: Without a fair, detailed test, improvements might be luck or style-only.

šŸž Bottom Bread (Anchor) Example: A bar chart is checked for missing categories, misspellings, bar height mismatches, axis tick logic, and overall look.

The test

  • Metrics:
    1. Data Accuracy (DA): Did every number and label from the table appear correctly?
    2. Text Rendering (TR): Are spellings and numbers readable and correct?
    3. Relative Relationship (RR): Do sizes match values (e.g., 60% slice bigger than 30%)?
    4. Additional Information Accuracy (AA): Are axes, ticks, and extra marks appropriate and logical?
    5. Aesthetic Quality (AQ): Does it look well-designed?
  • Models compared: Popular generators like Flux, Bagel, Blip3o-Next, UniWorld-V1, OmniGen2, and Qwen-Image.
  • Conditions:
    1. Base only.
    2. With Rewriting (RW + Base).
    3. Full pipeline: RW + Base + Reflection/Refinement (REF).

The scoreboard with context

  • Base models alone often flunk DA: some scored close to zero on faithfully mapping table data to visuals. That’s like turning in a pretty poster that gets the math wrong.
  • Adding Rewriting gives big logical gains: for Qwen-Image, RR jumped from about 26 to about 50—like boosting from a C- to a solid B in understanding proportions.
  • Full pipeline (RW+REF) wins overall: For Blip3o-Next, DA rose from around 0.5 to over 21, and TR from around 14 to about 64—like going from ā€œdidn’t show the workā€ to a strong pass with neat penmanship and correct answers. With Qwen-Image, the overall score improved from roughly 44 to almost 55.

Surprising findings

  • The editor can be the bottleneck: With a weaker editing model, more correction rounds actually reduced quality—like over-erasing a drawing until the paper tears.
  • Training the editor with RL and a reward model flips this: now each round improves or holds steady. That’s like practicing scales with a metronome and coach—steady, reliable progress.
  • Stronger executors raise the ceiling: Using a powerful editor (e.g., Wan-series) with ShowTable pushed results even higher, especially on DA and RR. The pipeline itself works; how high you climb depends on your climbing shoes.

Qualitative examples

  • Common fixes: equalizing bars that should match, correcting labels like ā€˜Linkecin’ to ā€˜LinkedIn’, aligning bar tops with proper axis ticks, and adjusting donut angles to exact percentages (e.g., 81% = 291.6°).
  • Adaptive behavior: When the first draft is already correct, reflection says ā€œdoneā€ and stops early—no wasted compute.

Takeaway

  • Planning is necessary, reflection is essential, and a capable refinement model converts smart feedback into faithful pictures. Together they turn tables into infographics that are both gorgeous and honest.

05 Discussion & Limitations

Limitations

  • Dependency on base models: The final quality caps out at how good the editor and generator are. Weak editors can’t reliably follow precise fix instructions.
  • Multi-part system: Running a planner, generator, critic, and editor increases engineering complexity and runtime compared to a single model.
  • Tiny text and micro-details: Very small fonts or dense annotations remain tough, especially under image-only editing (not vector).
  • Exact vector output: This method produces pixels, not plotting code or vector charts, which some workflows require for print or accessibility.
  • Domain drift: Extremely unusual table structures or unfamiliar iconography can still confuse planning or reflection.

Required resources

  • A capable MLLM for planning and reflection, a strong diffusion model for generation and editing, and GPU resources for iterative rounds.
  • Training data for SFT (rewriting), preference pairs for the reward model, and RL runs for editor improvement.

When not to use

  • If you need guaranteed vector outputs or executable plotting code (e.g., regulatory filings), traditional code-based charting may be safer.
  • For highly interactive dashboards where users hover and filter, this static infographic approach isn’t the right fit.
  • Extremely tight latency or compute budgets may prefer a simpler, single-pass template method.

Open questions

  • Unified end-to-end model: Can a single architecture plan, draw, check, and fix without multiple components?
  • Vector-aware editing: Can we blend pixel-level creativity with vector-level exactness for print-perfect outputs?
  • Better judges: Can reward models capture even subtler chart rules (e.g., log scales, dual axes) and style guidelines?
  • Robustness: How to guarantee monotonic improvement across more rounds and more challenging edge cases?
  • Human-in-the-loop: What’s the best way for a person to nudge the loop—approve, skip, or specify constraints—without breaking flow?

06 Conclusion & Future Work

In three sentences: ShowTable turns plain tables into accurate, attractive infographics by pairing a planning-and-critique MLLM with a drawing-and-editing diffusion model in a self-correcting loop. Specialized training for the planner (via SFT) and the editor (via RL with a reward model) makes each iteration more faithful to the data. A new benchmark, TableVisBench, proves that this teamwork beats strong baselines on data accuracy, proportions, text quality, and overall look.

The main achievement is showing that splitting the job into rewrite → generate → reflect → refine, then training the planner and editor for those roles, unlocks reliable, round-by-round improvement in data-faithful visualizations.

What’s next: unify the whole loop into one model, add vector-level precision, and design smarter judges for tricky chart logic. Why remember this: it’s a recipe for trustworthy visuals—creative like an artist, careful like an accountant—that can save time, reduce errors, and help people make better decisions from their data.

Practical Applications

  • Auto-generate slide-ready charts from quarterly tables with a one-click accuracy check-and-fix loop.
  • Create consistent, on-brand report infographics that keep exact proportions and correct labels.
  • Turn classroom data tables into student-friendly posters that are visually engaging and numerically faithful.
  • Produce newsroom explainer graphics quickly while avoiding misleading scales or typos.
  • Batch-convert spreadsheets into themed dashboards for internal briefs with minimal designer time.
  • Assist researchers in making figures that follow strict data mapping for papers and posters.
  • Localize charts (translate labels) while preserving exact geometry and layout.
  • Clean up legacy charts by detecting and editing misspellings, wrong ticks, or misaligned bars.
  • Generate social-media infographics that look great and keep numeric integrity for public trust.
  • Prototype design variations (bar vs. donut vs. stacked) and auto-select the one that fits the data best.
#creative table visualization Ā· #multimodal large language model Ā· #diffusion model Ā· #self-correcting loop Ā· #chart accuracy Ā· #text rendering Ā· #reinforcement learning Ā· #reward modeling Ā· #supervised fine-tuning Ā· #infographic generation Ā· #data-to-visual mapping Ā· #benchmarking Ā· #TableVisBench Ā· #image editing Ā· #prompt rewriting