Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
Key Summary
- Academic rebuttals are not just about being polite; they are about smart, strategic persuasion under hidden information.
- This paper builds RebuttalAgent, which uses Theory of Mind (imagining what the reviewer thinks) to plan and write better rebuttals.
- The process is split into three steps called TSR: first guess the reviewer’s perspective (ToM), then pick a strategy, then write the response.
- A new dataset, RebuttalBench (70K+ examples), teaches the model how to analyze reviewers, choose strategies, and respond.
- Training happens in two phases: supervised fine-tuning to learn the steps, then reinforcement learning with a self-reward signal to improve.
- A custom judge, Rebuttal-RM (trained on 100K+ scored samples), evaluates rebuttals and aligns with human preferences better than GPT-4.1.
- Across tests, RebuttalAgent improves over its base model by an average of 18.3% and matches or beats strong proprietary models in both automated and human evaluations.
- The biggest gains come in persuasiveness and constructiveness, meaning the responses are clearer, more convincing, and more helpful.
- The system avoids making up experiments by focusing on language-based persuasion and grounding every claim in the manuscript.
- This work shows that adding perspective-taking to AI can make scientific conversations more respectful, focused, and productive.
Why This Research Matters
Peer review shapes which ideas advance and which careers grow. By helping authors understand reviewers’ perspectives and respond with the right evidence and tone, RebuttalAgent can reduce misunderstandings and improve fairness in decisions. Early-career researchers get concrete guidance for turning tough comments into clear, respectful, and effective replies. Conferences and journals benefit from higher-quality dialogue, saving time for both authors and reviewers. The approach also shows how perspective-taking (ToM) can upgrade AI from polite text generator to strategic collaborator. In the long run, this can foster a more constructive, transparent, and efficient scientific ecosystem.
Detailed Explanation
01 Background & Problem Definition
The World Before: Imagine two people trying to solve a puzzle together while standing on opposite sides of a wall. They can’t see each other’s hands, only hear each other’s voices. That’s what academic rebuttals are like: authors and reviewers want a better paper, but they don’t fully see each other’s assumptions, knowledge, or concerns. Before this work, AI was already helping scientists with lots of tasks like summarizing papers, brainstorming ideas, and even helping design experiments. But rebuttals remained tricky. Many AI systems could write polite sentences, yet they often missed the deeper goal—changing minds through careful strategy. They tended to copy the surface style of past rebuttals without understanding what the reviewer truly cared about.
The Problem: Rebuttals are not just mini-essays; they are strategic conversations under uncertainty. Authors must decide when to concede, when to clarify, how to use evidence, and how to address misunderstandings. The hard part is information asymmetry: you don’t know the reviewer’s background, biases, or decision criteria. If an AI only imitates polite phrasing, it may sound nice but fail to persuade. That means lost chances to fix misunderstandings, missed opportunities to highlight real strengths, and wasted space on less important points.
Failed Attempts: Earlier methods mainly used supervised fine-tuning on collections of reviews and rebuttals. This helped models mimic politeness and structure (for example, “Thank you for your comment…”), but the responses often felt formulaic. They didn’t organize arguments by what the reviewer most cared about. They rarely allocated effort wisely—spending too much time on minor typos and too little on the main critique. Attempts to just “make the model think harder” without guidance often produced longer texts, not smarter strategy.
The Gap: What was missing was perspective-taking—the ability to model the reviewer’s likely beliefs, intentions, and priorities. In cognitive science, that’s called Theory of Mind (ToM). If an AI could infer the reviewer’s stance (e.g., Probably Reject vs. Constructive), core concerns (e.g., Experimental Rigor vs. Presentation), and expertise level, then it could choose better strategies. For example, a skeptical, methods-focused expert might need precise evidence and clear proofs, while a constructive generalist might need big-picture framing and clarifying summaries.
Real Stakes: Why should anyone care? Because academic decisions shape careers, funding, and the direction of science. A well-crafted rebuttal can turn a borderline decision into an acceptance by clearing up confusion or offering precise fixes. For early-career researchers, guidance on tone and strategy can be life-changing. For the scientific community, better rebuttals reduce friction, make discussions more evidence-based, and help good ideas rise. In daily life terms: it’s like learning to explain your homework to a teacher who sees it differently than you do. If you can understand what they find confusing and respond clearly, you’re far more likely to succeed.
This Paper’s Focus: This work reframes rebuttal as a strategic game with hidden information and proposes a model that first builds a mental picture of the reviewer, then plans a tailored strategy, and finally writes an evidence-grounded response. It builds a large dataset that teaches this three-step process, trains the model to follow it, and then uses reinforcement learning with a self-reward system to keep improving. Finally, it creates a specialized judge, Rebuttal-RM, to evaluate rebuttals consistently and fairly. The result: clear, persuasive, context-aware replies that outperform strong baselines in both automated and human tests.
02 Core Idea
The “Aha!” Moment in One Sentence: If AI learns to imagine what the reviewer is thinking (Theory of Mind), it can choose smarter strategies and write rebuttals that truly persuade, not just sound polite.
Multiple Analogies:
- Theater analogy: The reviewer is the audience; ToM is reading the room. TSR means: understand the audience → plan the act → deliver the lines with evidence.
- Sports analogy: Before taking a shot, read the defender (ToM), pick the play (Strategy), then execute (Response) with support from teammates (retrieved evidence).
- Detective analogy: First profile the suspect (ToM), then plan the interrogation (Strategy), and finally present the case with proof (Response).
Before vs After:
- Before: Models wrote nice-sounding text but often missed the reviewer’s real worries; effort was spread thin; important misunderstandings stayed unresolved.
- After: The model builds a reviewer profile, ranks what matters, picks a strategy that fits the person and the point, and backs every claim with the paper’s evidence.
Why It Works (Intuition): People change their minds when they feel understood and see credible, tailored evidence. ToM gives the “feel understood” part by modeling stance, attitude, and core concerns. Strategy converts that understanding into a plan that prioritizes the big issues and chooses the right tone (e.g., clarify vs. concede vs. reframe). The response step weaves in proof from the manuscript so claims are grounded, not guessed. Reinforcement learning then nudges the system toward outputs that are not only correct but convincingly structured and diverse in style (not templated), which humans prefer.
Building Blocks (each introduced with the Sandwich pattern):
🍞 You know how you guess what a friend is thinking when deciding how to explain something to them? 🥬 Theory of Mind (ToM): It is the skill of imagining another person’s beliefs and concerns to predict how they’ll react. How it works: (1) Read the review. (2) Infer stance, attitude, dominant concern, and expertise (macro). (3) For each comment, classify the issue type and severity (micro). (4) Use this profile to guide tone and priorities. Why it matters: Without ToM, the reply may focus on the wrong things and fail to persuade. 🍞 Example: If the reviewer is a domain expert worried about missing baselines, you lead with comparisons and evidence, not just wording fixes.
🍞 Imagine planning a road trip: you first pick the route, then drive. 🥬 TSR Pipeline (ToM–Strategy–Response): It is a three-step recipe to turn understanding into action. How it works: (1) Build the reviewer profile (ToM). (2) Choose a step-by-step strategy aligned with that profile. (3) Write the response using retrieved manuscript evidence. Why it matters: Without the plan step, the model jumps into writing and drifts off-topic. 🍞 Example: For a “major: baselines weak” comment, the strategy might be “acknowledge → cite exact table → explain fairness → propose revision to add missing baseline.”
🍞 Think of sorting mail into big bins first, then into exact slots. 🥬 Hierarchical Reviewer Profile (Macro + Micro): It is a two-level map of who the reviewer likely is (macro) and what each comment really asks (micro). How it works: (1) Macro: infer stance, attitude, concern, expertise. (2) Micro: for each comment, assign category (e.g., Experimental Rigor) and severity (major/minor). Why it matters: Without this structure, the model can’t target the main pain points. 🍞 Example: Macro says “Skeptical, Rigor-focused Expert”; Micro flags “Baselines Missing, Major,” so the response leads with evidence.
🍞 You know how a coach gives you a game plan before you step onto the field? 🥬 ToM-Driven Strategy: It is a compact, high-level plan linking the profile to concrete steps. How it works: (1) Read profile + comment. (2) Pick actions (acknowledge, clarify, cite, propose fix). (3) Set tone and order. Why it matters: Without strategy, the response can be long but aimless. 🍞 Example: “1) Validate concern; 2) Point to Table 2; 3) Explain dataset splits; 4) Commit to adding baseline X.”
🍞 Imagine writing a school report with notes from your textbook highlighted. 🥬 Strategy-Guided, Evidence-Grounded Response: It is the final text that follows the plan and cites the manuscript. How it works: (1) Pull relevant chunks from the paper. (2) Weave them with the strategy steps. (3) Keep tone aligned with the profile. Why it matters: Without evidence, claims feel hollow; without tone control, even good points can sound defensive. 🍞 Example: “As shown in Section 4.2, we evaluated on three seeds; we will add baseline Y in the revision.”
🍞 Think of practicing piano pieces a teacher picked for you. 🥬 RebuttalBench: It is a 70K+ example dataset that teaches the full chain (Analysis, Strategy, Response). How it works: (1) Extract comments and context. (2) Use strong teacher models to create high-quality TSR examples. (3) Mix sources to avoid one writing style. Why it matters: Without rich examples, the model can’t learn nuanced planning. 🍞 Example: The dataset includes tags like <Analysis>, <Strategy>, <Response> so the model learns the structure.
🍞 Picture a coach who gives you points when you play well and also explains why. 🥬 Self-Reward Reinforcement Learning: It is a way for the model to grade and improve its own outputs across multiple qualities. How it works: (1) Check format (did it include Analysis/Strategy/Response?). (2) Score thinking quality. (3) Score response persuasiveness and clarity. (4) Reward stylistic diversity (avoid clichés). Why it matters: Without these rewards, models can “game” style or forget structure. 🍞 Example: A templated, listy answer gets a lower diversity score; a nuanced, narrative reply gets a higher one.
🍞 Imagine a fair judge who knows what makes a good explanation. 🥬 Rebuttal-RM: It is a specialized evaluator trained on 100K+ examples to score Attitude, Clarity, Persuasiveness, and Constructiveness. How it works: (1) Read comment + context + response. (2) Output multi-dimensional scores and a rationale. (3) Align closely with human preferences. Why it matters: Without a good judge, it’s hard to measure progress reliably. 🍞 Example: Rebuttal-RM’s scores correlate with human ratings better than GPT-4.1.
🍞 Think of a teacher who first shows you the format and then lets you practice. 🥬 Supervised Fine-Tuning (SFT): It is the phase where the model learns by example how to do the TSR steps. How it works: (1) Feed many TSR-labeled samples. (2) Learn to output analysis → strategy → response. Why it matters: Without SFT, the model lacks the structured “muscle memory.” 🍞 Example: It learns to start by identifying macro/micro concerns before drafting text.
🍞 Imagine ranking several possible answers from a group and nudging the best ideas forward. 🥬 GRPO (Group Relative Policy Optimization): It is the RL method that compares multiple candidate responses and updates the policy toward higher-reward ones. How it works: (1) Generate a group of candidates. (2) Score each with the self-reward signals. (3) Update the model to prefer the better ones while staying near a reference. Why it matters: Without group comparison, the model can’t reliably tell good from slightly-better. 🍞 Example: Of five drafts, the one with clear evidence and natural style wins more reward and guides learning.
03 Methodology
At a high level: Manuscript + Review + Target Comment → (A) Extract the exact comment and retrieve relevant paper chunks → (B) Build a hierarchical reviewer profile (macro + micro) → (C) Generate a ToM-driven strategy → (D) Write a strategy-guided, evidence-grounded response → (E) Train via SFT → (F) Improve via RL with self-reward → (G) Evaluate with Rebuttal-RM.
Step A: Comment Extraction and Context Retrieval
- What happens: The system turns a messy review into clear, distinct comments and finds the most relevant parts of the manuscript for each comment. It uses an LLM as an extractor to split the review into actionable units, then encodes the comment and all manuscript chunks with embeddings and picks the top-k most similar chunks by cosine similarity.
- Why it exists: If you feed the whole paper and the entire review, the model drowns in details and misses the point. Extraction makes the task focused; retrieval provides the exact evidence needed.
- Example: If the comment is “Baselines missing for task X,” the retrieval grabs the results table and the experimental setup paragraphs.
🍞 Imagine highlighting the important lines in a textbook before writing your answer. 🥬 Context Retrieval: It is selecting only the most relevant paper pieces for a given comment. How it works: (1) Split the paper into chunks. (2) Embed comment and chunks. (3) Use cosine similarity to pick top-k chunks. Why it matters: Without this, the model may cite the wrong section or ramble. 🍞 Example: For a question on training data size, it retrieves the dataset and ablation paragraphs.
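To make the retrieval step concrete, here is a minimal sketch of comment-conditioned retrieval, assuming an off-the-shelf sentence-embedding model. The encoder name, chunking, and top-k value are illustrative choices, not the paper’s exact configuration.

```python
# Minimal sketch of Step A's context retrieval: embed the comment and the
# manuscript chunks, then keep the k most similar chunks by cosine similarity.
# The encoder ("all-MiniLM-L6-v2") and k are assumptions for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

def retrieve_context(comment: str, manuscript_chunks: list[str], k: int = 5) -> list[str]:
    """Return the top-k manuscript chunks most relevant to one reviewer comment."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    chunk_vecs = model.encode(manuscript_chunks, normalize_embeddings=True)
    query_vec = model.encode([comment], normalize_embeddings=True)[0]
    scores = chunk_vecs @ query_vec              # cosine similarity (vectors are unit-normalized)
    top_idx = np.argsort(scores)[::-1][:k]       # indices of the k highest-scoring chunks
    return [manuscript_chunks[i] for i in top_idx]
```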
Step B: Hierarchical Reviewer Profile (Macro + Micro)
- What happens: Macro infers overall stance, attitude (e.g., constructive), dominant concern (e.g., Experimental Rigor), and expertise. Micro classifies each comment into categories like Significance, Methodology, Experimental Rigor, or Presentation with severity.
- Why it exists: This creates a map of what truly matters so the model can prioritize big issues and match tone.
- Example: Macro says “Constructive, Rigor-focused Expert”; Micro marks “Baselines Missing/Weak, Major.”
🍞 Think of first reading the teacher’s mood, then the exact question on the quiz. 🥬 Hierarchical Reviewer Profile: It is a two-layer understanding of the reviewer and each comment. How it works: (1) Macro labels stance/attitude/concern/expertise. (2) Micro labels category/severity per comment. Why it matters: Without it, the response can sound polite but off-target. 🍞 Example: Major rigor issues get top billing; typos get brief fixes.
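One plausible way to represent the macro + micro profile in code is sketched below; the field names and label values are assumptions drawn from the categories mentioned in this section, not the paper’s exact schema.

```python
# Hypothetical structure for the hierarchical reviewer profile (Step B).
# Labels such as "Borderline" or "Experimental Rigor" follow the examples
# in the text; the exact schema is an assumption.
from dataclasses import dataclass

@dataclass
class MicroLabel:
    comment: str           # the extracted reviewer comment
    category: str          # e.g., "Experimental Rigor", "Methodology", "Presentation"
    severity: str          # "major" or "minor"

@dataclass
class ReviewerProfile:
    stance: str            # e.g., "Probably Reject", "Borderline", "Constructive"
    attitude: str          # e.g., "skeptical", "constructive"
    dominant_concern: str  # e.g., "Experimental Rigor"
    expertise: str         # e.g., "domain expert", "generalist"
    comments: list[MicroLabel]

profile = ReviewerProfile(
    stance="Borderline",
    attitude="constructive",
    dominant_concern="Experimental Rigor",
    expertise="domain expert",
    comments=[MicroLabel("Baselines missing for task X", "Experimental Rigor", "major")],
)
```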
Step C: ToM-Driven Strategy Generation
- What happens: Given the profile and the specific comment, the model writes a short plan: which actions to take, in what order, and in what tone (e.g., acknowledge → cite table → clarify method → propose revision).
- Why it exists: Planning first prevents drifting, repetition, and over/under-reacting.
- Example: For “Flawed evaluation,” the plan could be: validate concern → explain evaluation metric choice → show robustness check → commit to adding missing metric.
🍞 Like sketching an outline before writing the essay. 🥬 ToM-Driven Strategy: It is a concise roadmap aligned with the reviewer’s likely priorities. How it works: (1) Read profile + comment. (2) Pick actions and order. (3) Set tone. Why it matters: Without it, the response may be long but unconvincing. 🍞 Example: “Step 1: Acknowledge; Step 2: Point to Sec. 3.2; Step 3: Provide numbers; Step 4: Offer fix.”
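As an illustration of how the profile and comment might feed the planning step, the sketch below assembles a strategy-generation prompt, reusing the ReviewerProfile and MicroLabel structures from the Step B sketch. The prompt wording and the call_llm() helper are hypothetical, not the paper’s actual prompts.

```python
# Illustrative prompt assembly for ToM-driven strategy generation (Step C).
# Both the wording and the call_llm() helper are assumptions.
def build_strategy_prompt(profile, micro, evidence: list[str]) -> str:
    bullets = "\n".join(f"- {chunk}" for chunk in evidence)
    return (
        "You are planning an author rebuttal.\n"
        f"Reviewer profile: stance={profile.stance}, attitude={profile.attitude}, "
        f"dominant concern={profile.dominant_concern}, expertise={profile.expertise}.\n"
        f"Comment ({micro.category}, {micro.severity}): {micro.comment}\n"
        f"Relevant manuscript excerpts:\n{bullets}\n"
        "Write a short, ordered plan: which actions to take, what evidence to cite, "
        "and what tone to use. Do not draft the full response yet."
    )

# strategy = call_llm(build_strategy_prompt(profile, profile.comments[0], top_chunks))
```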
Step D: Strategy-Guided Response with Evidence
- What happens: The model drafts the final rebuttal by following the plan and weaving in the retrieved chunks. During data synthesis (not deployment), it can also look at the author’s original response as a reference to refine phrasing.
- Why it exists: This guarantees that claims are traceable to the manuscript and that the tone matches the inferred profile.
- Example: “As shown in Table 2 (rows 3–5), we compare against baselines A and B; we will add baseline C in the revision for completeness.”
🍞 Like answering a question with your notes open to the correct page. 🥬 Strategy-Guided, Evidence-Grounded Response: It is the final message that follows the plan and cites the paper. How it works: (1) Insert relevant facts, tables, and sections. (2) Keep to the plan’s order and tone. Why it matters: Without grounding, the text could be confident but wrong. 🍞 Example: Citing the exact figure that resolves the reviewer’s confusion.
Step E: Supervised Fine-Tuning (SFT) on RebuttalBench
- What happens: The model learns from 70K+ examples that include explicit <Analysis>, <Strategy>, and <Response> tags created via a critique-and-refine process using strong teacher models.
- Why it exists: Structure has to be learned by imitation before it can be refined; SFT gives the model the “TSR habit.”
- Example: After SFT, the model reliably outputs an analysis first, a plan second, and a response third.
🍞 Think of learning a dance by copying the instructor’s steps. 🥬 Supervised Fine-Tuning: It is training on labeled TSR chains so the model learns the routine. How it works: (1) Feed tagged examples. (2) Learn to reproduce the structure. Why it matters: Without SFT, the model forgets to analyze before writing. 🍞 Example: It stops jumping straight to “thank you” paragraphs.
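For intuition, a TSR-tagged training sample might look like the sketch below; the <Analysis>/<Strategy>/<Response> tags follow the description above, while the surrounding layout and wording are assumptions for illustration.

```python
# Sketch of one SFT training sample with the TSR structure (Step E).
# The tag names come from the text above; the prompt/target layout is assumed.
sft_sample = {
    "prompt": (
        "Reviewer comment: 'Baselines missing for task X.'\n"
        "Manuscript context: <retrieved chunks here>\n"
        "Produce an Analysis, a Strategy, and a Response."
    ),
    "target": (
        "<Analysis>Skeptical, rigor-focused expert; major concern: missing baselines.</Analysis>\n"
        "<Strategy>1) Acknowledge; 2) cite Table 2; 3) explain baseline choice; "
        "4) commit to adding baseline C.</Strategy>\n"
        "<Response>Thank you for raising this. As shown in Table 2, we compare against "
        "baselines A and B; we will add baseline C in the revision.</Response>"
    ),
}
```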
Step F: Reinforcement Learning (RL) with Self-Reward using GRPO
- What happens: The model generates several candidate outputs, then scores them using four signals: (1) Format adherence (did it include all parts?), (2) Thinking quality (analysis/strategy depth), (3) Response quality (persuasiveness, clarity, evidence), and (4) Diversity (avoid formulaic, cliché writing). The Group Relative Policy Optimization (GRPO) algorithm then nudges the model toward the higher-scoring candidates while keeping it close to a reference policy.
- Why it exists: SFT teaches the routine; RL polishes style, persuasiveness, and robustness, discouraging template-like replies and reward hacking.
- Example: A stiff, list-heavy reply gets penalized on diversity; a smooth, narrative reply with evidence gets rewarded.
🍞 Picture a coach who grades multiple practice runs and tells you which one to imitate. 🥬 Self-Reward RL + GRPO: It is group-based comparison learning guided by multi-part rewards. How it works: (1) Sample several drafts. (2) Score each on structure, thinking, response, diversity. (3) Update policy toward better drafts. Why it matters: Without multi-signal feedback, the model may overfit to one trick (e.g., just being lengthy). 🍞 Example: The best of five drafts—clear, grounded, and human-like—pulls the model forward.
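The sketch below shows the core GRPO idea of group-relative advantages combined with a multi-part reward. The toy string-based scorers and weights are placeholders standing in for the model-based self-reward signals, not the paper’s actual reward functions.

```python
# Toy sketch of self-reward + GRPO (Step F): score a group of drafts with
# several signals, then standardize rewards within the group. Scorers and
# weights are crude placeholders, not the paper's reward design.
import numpy as np

def total_reward(draft: str, scorers: dict, weights: dict) -> float:
    """Weighted sum of the self-reward signals (format, thinking, response, diversity)."""
    return sum(weights[name] * fn(draft) for name, fn in scorers.items())

def group_relative_advantages(drafts: list[str], scorers: dict, weights: dict) -> np.ndarray:
    """GRPO-style advantages: each draft's reward standardized within its sampled group."""
    rewards = np.array([total_reward(d, scorers, weights) for d in drafts], dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

scorers = {
    "format":    lambda d: float("<Response>" in d),                # structure check
    "thinking":  lambda d: min(len(d) / 1000.0, 1.0),               # crude depth proxy
    "response":  lambda d: float("Table" in d or "Section" in d),   # crude evidence proxy
    "diversity": lambda d: 1.0 - float(d.startswith("Thank you")),  # crude anti-cliché proxy
}
weights = {"format": 1.0, "thinking": 1.0, "response": 2.0, "diversity": 0.5}  # assumed weights

drafts = ["<Response>See Table 2 for the comparison ...</Response>",
          "Thank you for the comment ..."]
advantages = group_relative_advantages(drafts, scorers, weights)
# Drafts with positive advantage are reinforced; a KL penalty to a reference
# policy (not shown here) keeps the update close to the SFT model.
```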
Step G: Rebuttal-RM Evaluator
- What happens: A separate fine-tuned model (on 100K+ instances) assigns Attitude, Clarity, Persuasiveness, and Constructiveness scores and produces an explanation. It aligns with human ratings better than strong general judges.
- Why it exists: Reliable, consistent evaluation lets us compare methods fairly and learn faster.
- Example: On held-out data, Rebuttal-RM correlates best with human labels and outperforms GPT-4.1 as a judge.
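To picture what a Rebuttal-RM judgment could look like, the sketch below shows an assumed prompt template and output schema using the four dimensions named above; the JSON layout and the simple averaging are illustrative, not the paper’s exact format.

```python
# Illustrative shape of a Rebuttal-RM judgment (Step G). The four dimensions
# come from the text; the prompt wording and JSON schema are assumptions.
import json

judge_prompt_template = (
    "Score the rebuttal on Attitude, Clarity, Persuasiveness, and Constructiveness "
    "(0-10 each) and give a short rationale. Return JSON.\n"
    "Comment: {comment}\nManuscript context: {context}\nRebuttal: {response}"
)

example_judgment = json.loads("""
{
  "attitude": 9,
  "clarity": 8,
  "persuasiveness": 9,
  "constructiveness": 8,
  "rationale": "Acknowledges the concern, cites Table 2, and commits to a concrete revision."
}
""")
overall = sum(example_judgment[k] for k in
              ("attitude", "clarity", "persuasiveness", "constructiveness")) / 4
```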
Secret Sauce: Explicit decomposition (ToM → Strategy → Response) forces the model to decide how it will respond before it writes the response itself. Combined with targeted retrieval and a multi-signal self-reward (including a diversity reward that penalizes clichéd templates), the system becomes both strategic and natural, not just polite.
04 Experiments & Results
The Test: The authors evaluate two things: (1) the judge, Rebuttal-RM—does it agree with humans? and (2) the rebuttal agent—does it write better responses than strong baselines? They test in-domain (R2-test drawn from a large multi-conference corpus) and out-of-domain (Rebuttal-test from recent ICLR/NeurIPS reviews) to check generalization. Each response is rated on a 0–10 holistic scale and broken down into four dimensions—Attitude (tone), Clarity (organization), Persuasiveness (argument + evidence), and Constructiveness (actionable improvements).
The Competition: RebuttalAgent is compared against strong foundation models (o3, GPT-4.1, DeepSeek-R1/V3, Gemini-2.5, GLM-4-9B, Llama-3.1-8B, Qwen3-8B) and agent-style baselines (Self-Refined and Strategy-Prompt). There is also RebuttalFT, a baseline obtained by supervised fine-tuning on human rebuttals without the ToM pipeline.
The Scoreboard with Context:
- Judge Alignment: Rebuttal-RM shows the highest agreement with human raters across multiple statistics (e.g., Pearson/Spearman/Kendall correlations and accuracy by score ranges), surpassing GPT-4.1. This means its scores are a trustworthy proxy for human preferences.
- Automated Evaluation: On R2-test, RebuttalAgent achieves the top average score (~9.42), beating strong baselines including GPT-4.1 and o3, with especially big gains in Persuasiveness and Constructiveness (up to +34.6% vs. base Qwen3-8B). Relative to its base model, RebuttalAgent improves by an average of 18.3% across metrics.
- Human Evaluation: On a 100-instance subset rated by three experienced annotators (κ = 0.79 agreement), RebuttalAgent obtains the highest average score (~9.57), outpacing GPT-4.1 and o3. The largest relative gain is in Persuasiveness, confirming that the model’s strategy and evidence use resonate with human judges.
Ablations and What They Mean:
- Remove ToM, Strategy, or the explicit “Thinking” step? Scores drop. This shows each part of the TSR pipeline is necessary.
- Training-only changes: SFT-only or RL-only underperform the full pipeline; both stages are complementary.
- Reward components: The response-quality reward is most impactful, and the diversity reward helps prevent formulaic writing (mitigating reward hacking).
- Backbone Swap: Applying the TSR + self-reward framework to other models (e.g., Llama-3.1-8B, Qwen3-4B) still yields large gains, indicating the approach is model-agnostic.
Surprising/Notable Findings:
- ToM-as-Context Helps Others: Feeding the ToM analysis and strategy as extra context boosts external base models’ performance (e.g., +21% in Presentation for Qwen3-8B), suggesting the reasoning artifacts are broadly useful.
- Better Than a General Judge: A specialized evaluator (Rebuttal-RM) trained on rebuttal data aligns more with humans than a powerful general-purpose judge, highlighting the importance of task-specific judging.
- Style Matters: Penalizing clichéd templates via a diversity reward meaningfully improves human preference without sacrificing evidence use or structure.
Takeaway: The numbers show that perspective-taking plus planning and evidence grounding produces responses that humans find clearer, kinder, and more convincing than those from strong, general-purpose models.
05 Discussion & Limitations
Limitations:
- Scope Boundaries: The system intentionally avoids comments that demand new experiments or unavailable data to prevent hallucination. This means it focuses on language-level persuasion, not fresh empirical results.
- ToM Fallibility: Inferring stance and concerns from text can be wrong, especially with short or ambiguous reviews. Mis-profiling can misguide strategy and tone.
- Domain Shift: While out-of-domain tests look good, unusual venues, formats, or highly specialized subfields may degrade performance until more data are added.
- Style Bias: Despite diversity rewards, training data styles can still influence tone. Some communities may prefer different rhetorical norms.
Required Resources:
- Compute: SFT and RL with GRPO benefit from multi-GPU setups (e.g., A100/H800). Inference on a single 8B model is moderate, but retrieval and profiling add overhead.
- Data: Access to manuscript text and reviews; the retrieval system needs embeddings and chunked documents.
- Guardrails: Optional human-in-the-loop review to catch misprofiling or tone mismatches.
When NOT to Use:
- If the reviewer requests new experiments or unavailable measurements that you cannot provide—this system won’t fabricate results and is not intended to do so.
- When institutional or venue policies prohibit AI-assisted writing for rebuttals.
- For non-scholarly contexts where the tone/format radically differ (e.g., legal filings).
Open Questions:
- Can ToM profiling be made verifiable (e.g., uncertainty estimates) to alert authors when the inferred stance is shaky?
- How to incorporate safe, provenance-tracked external evidence (e.g., arXiv citations) without risking hallucination?
- Could multi-agent debate among multiple reviewer profiles further improve strategy selection?
- How can we adapt style to different communities (e.g., theory vs. systems) automatically while preserving substance?
- What are the best ways to teach models to propose precise, feasible revision commitments that fit page limits and timelines?
Overall, RebuttalAgent is a strong step toward empathetic, evidence-based scientific dialogue, with clear room for safer profiling, broader generalization, and richer, community-aware styles.
06 Conclusion & Future Work
Three-Sentence Summary: This paper reframes academic rebuttal as strategic persuasion under hidden information and builds RebuttalAgent to first understand the reviewer (Theory of Mind), then plan a strategy, and finally write an evidence-grounded response. It trains on a large, structured dataset (RebuttalBench), improves with self-reward reinforcement learning (via GRPO), and evaluates with a specialized judge (Rebuttal-RM) that aligns with humans better than GPT-4.1. The result is large, consistent gains in clarity, persuasiveness, and constructiveness across both automated and human evaluations.
Main Achievement: Making perspective-taking operational—by turning reviewer modeling into a practical, stepwise TSR pipeline—so AI rebuttals become not just polite, but strategically persuasive and well-evidenced.
Future Directions: Add uncertainty-aware ToM profiling, extend retrieval to vetted external sources, adapt tone across communities automatically, and explore multi-agent reasoning among multiple reviewer archetypes. Investigate richer commitments (e.g., precise revision plans) and guardrails that highlight when the model’s inferences might be wrong. Continue improving the evaluator with more diverse human labels and fairness checks.
Why Remember This: It shows that empathy-like reasoning (ToM) is not fluff—it measurably improves outcomes in high-stakes, real-world scientific communication. The simple recipe—Understand → Plan → Respond—paired with diverse, self-aware training and a task-specific judge, turns AI from a generic writer into a strategic collaborator who helps move science forward.
Practical Applications
- Draft rebuttal plans that prioritize the reviewer’s main concerns before writing full responses.
- Generate respectful, evidence-grounded replies that cite exact sections, tables, or figures.
- Adapt tone based on inferred stance (e.g., skeptical vs. constructive) to avoid sounding defensive.
- Create concise commitments (e.g., specific revisions) for major issues while efficiently addressing minor points.
- Provide ToM analyses and strategies as context to boost other base models’ rebuttal quality.
- Use the evaluator (Rebuttal-RM) to pre-score and compare multiple rebuttal drafts before submission.
- Flag potential misalignments (e.g., major rigor concerns left unaddressed) using macro/micro profiles.
- Train institution-specific rebuttal assistants by fine-tuning the TSR pipeline on local review styles.
- Support mentoring by turning critiques into step-by-step action plans for junior researchers.
- Build writing checklists (acknowledge → evidence → clarification → commitment) aligned with reviewer profiles.