Does It Tie Out? Towards Autonomous Legal Agents in Venture Capital
Key Summary
- Capitalization tie-out checks whether a company’s ownership table truly matches what its legal documents say.
- The paper shows why typical LLM agents with RAG struggle here: the task requires searching across many documents, proving that something is absent, and returning the same answer every time.
- Equall’s key idea is to first build a structured world model (an Event Graph) from all documents and then run deterministic checks on it.
- This eager world-model approach turns fuzzy retrieval into precise, repeatable graph queries with traceable evidence.
- On four real deal datarooms (Seed to Series B), Equall reaches about 85% F1 vs. 42% and 29% for two agentic baselines.
- Equall’s checks run about 22× faster than an agentic RAG system once the model is built.
- As companies mature, document counts explode and issues shift from simple missing items to complex inconsistencies; the world model handles this better.
- The method keeps every fact linked to its original document span, so lawyers can trust and verify results.
- This world-modeling recipe can power many other legal tasks beyond this one tie-out workflow.
Why This Research Matters
When investors fund companies, ownership must be crystal clear; mistakes can delay deals or cause costly disputes. This approach gives lawyers fast, consistent, evidence-backed answers by building a structured model of the company’s legal history. It scales to thousands of pages and years of amendments, while keeping every claim traceable to exact document text. Faster and more accurate tie-outs reduce legal friction, speed up financings, and protect all stakeholders’ rights. Beyond tie-out, the same world model can power other critical tasks like rights audits, compliance checks, and exit waterfall calculations. In short, better structure turns messy legal paperwork into dependable decisions.
Detailed Explanation
01 Background & Problem Definition
🍞 Hook: Imagine cleaning your room before guests arrive. You don’t just make the bed; you check under it, look inside drawers, and make sure every toy is where it belongs. Everything must match the checklist your parents gave you.
🥬 The Concept (Capitalization Tie-Out): It’s the legal version of that tidy-up, where lawyers verify that a company’s ownership table truly matches all the signed documents.
- What it is: A careful check that every share, option, warrant, and term on the cap table is backed by real, signed papers.
- How it works: 1) Gather all company documents. 2) Read what each document claims. 3) Compare each claim to the cap table. 4) Mark differences and show proof. 5) Fix or escalate issues.
- Why it matters: If this goes wrong, money and control can be mis-assigned—causing delays, disputes, or broken deals.
🍞 Anchor: If the cap table says Alex owns 10,000 shares but the signed stock purchase agreement says 8,000, tie-out flags the mismatch and points to the exact pages that prove it.
🍞 Hook: You know how a school binder has many tabs—math, science, art—and you sometimes need to jump between them to finish one project?
🥬 The Concept (Dataroom): A dataroom is a big digital binder holding all the company’s legal and financial history.
- What it is: A repository of contracts, board approvals, charters, option grants, SAFEs, and more—often thousands of pages.
- How it works: 1) Upload docs. 2) Organize by type (e.g., board consent, stock plan). 3) Lawyers search across everything to support each cap table line.
- Why it matters: Without a complete and organized dataroom, you can’t be sure the cap table is true.
🍞 Anchor: To confirm an employee’s options, you might need the stock plan, the person’s grant, a board approval, and an exercise or transfer document—all inside the dataroom.
🍞 Hook: Think of the cap table like a classroom seating chart that shows who sits where and how many seats each row has.
🥬 The Concept (Cap Table): The cap table is a spreadsheet listing who owns what—common, preferred, options, SAFEs—and key terms.
- What it is: The company’s stated picture of current ownership.
- How it works: 1) Each row is a security for a holder. 2) Columns include number of shares, price, vesting, dates, etc. 3) The whole sheet should match the legal record.
- Why it matters: Investors use it to decide money and control. If it’s wrong, payouts and votes can be wrong.
🍞 Anchor: A row might say: “CS-10, John Jackson, 5,000 shares, $0.00001, 4-year vesting,” which must be fully supported by the grant and approvals.
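For readers who think in data structures, a cap table row can be modeled as a small record. The sketch below is a minimal illustration in Python; the field names (such as `security_id` and `vesting_months`) are hypothetical choices for this example, not the paper’s schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CapTableRow:
    """One stated position on the cap table: the claim that tie-out must verify."""
    security_id: str                 # e.g. "CS-10"
    holder: str                      # e.g. "John Jackson"
    security_type: str               # "common", "preferred", "option", "SAFE", ...
    quantity: int                    # number of shares or options
    price_per_share: float           # issue or strike price
    vesting_months: Optional[int] = None   # None if fully vested or not applicable

# The example row from the anchor above:
row = CapTableRow("CS-10", "John Jackson", "common", 5_000, 0.00001, vesting_months=48)
```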
🍞 Hook: When you solve a mystery, you don’t just read one note; you compare clues from many pages to see the full story.
🥬 The Concept (Multi-Document Reasoning): This means combining facts from many different documents to answer one question.
- What it is: Pulling related points across multiple files to make a single, correct conclusion.
- How it works: 1) Find all related docs. 2) Link who, what, when across them. 3) Resolve conflicts and timelines.
- Why it matters: Tie-out answers often hide in chains of documents, not in one place.
🍞 Anchor: To confirm a grant’s vesting start date, you might need the grant agreement, an amendment, and a later board consent that changed the date.
🍞 Hook: Imagine your teacher says, “Show your work!” They don’t just want your answer; they want to see where it came from.
🥬 The Concept (Evidence Traceability): Every claim must point to exact document sources.
- What it is: A clear link from each output back to the specific pages and clauses that prove it.
- How it works: 1) Store the text span for each fact. 2) Keep document IDs and page numbers. 3) Bundle them with every result.
- Why it matters: Without traceability, you can’t trust or audit the system’s answers.
🍞 Anchor: If the system flags “vesting schedule mismatch,” it shows the cap table cell and the clause on page 12 of the grant that disagrees.
🍞 Hook: If you weigh the same apple twice, the scale should give the same number both times.
🥬 The Concept (Deterministic Outputs): Running the same documents should always give the same results.
- What it is: Repeatable outputs that don’t change randomly.
- How it works: 1) Use fixed rules for combining facts. 2) Avoid randomness during verification. 3) Log decisions.
- Why it matters: Deals need reliability; people must trust that results won’t wobble.
🍞 Anchor: Process the same option grant tomorrow, get the same vesting answer and the same evidence links.
🍞 Hook: In some classes, a few topics get tons of pages, while many others are short—like a few big hills and a long row of small bumps.
🥬 The Concept (Long-Tail Document Mix): In real datarooms, some document types (like board consents) appear very often, but there’s a long tail of rare yet important docs.
- What it is: A skewed spread where common types dominate counts, but rare ones still matter.
- How it works: 1) Identify frequent categories. 2) Don’t ignore rare types—they can be decisive.
- Why it matters: Systems must handle both the common and the rare to avoid missing key evidence.
🍞 Anchor: A seldom-seen warrant amendment might change ownership more than dozens of routine consents.
02 Core Idea
🍞 Hook: You know how packing your backpack is easier if you organize your books and folders first, instead of hunting for each page every time you need it?
🥬 The Concept (World Model Architecture): Build a structured map of the company’s legal history first, then run checks on that map.
- What it is: An organized, layered model that turns messy documents into a clean, connected timeline of events.
- How it works: 1) Extract atomic facts (names, dates, numbers) with sources. 2) Combine facts into event nodes (issuance, transfer, amendment). 3) Use clear rules to compute current ownership. 4) Compare that to the cap table and flag gaps with evidence.
- Why it matters: Without a world model, checks devolve into guesswork and become inconsistent, especially when many documents interact.
🍞 Anchor: Instead of re-reading 20 PDFs to re-verify a grant, you query the event graph that already links the grant, its amendment, and a stock split.
🍞 Hook: Imagine two study styles. Style A: open your textbook and Google every time you get a question. Style B: make a neat summary sheet first, then answer from it fast.
🥬 The Concept (Agentic Paradigm vs. Eager Modeling): Agentic-with-RAG is like Style A (answer now, search now). Eager world modeling is like Style B (organize first, answer fast later).
- What it is: Two strategies—ad-hoc retrieval vs. up-front structure building.
- How it works: Agentic: form query, retrieve chunks, reason each time. Eager: build the event graph once, then do deterministic queries.
- Why it matters: Proving what’s missing or tracing long chains is hard with ad-hoc search but straightforward with a prebuilt graph.
🍞 Anchor: Checking 500 items is slow if you must re-search each one; it’s quick if you can run graph queries on a model you already built.
🍞 Hook: Think of LEGO instructions combined with your imagination. The pictures help you find parts; the steps tell you exactly how to build the model.
🥬 The Concept (Neuro-Symbolic Approach): Mix LLMs for reading and understanding text (neuro) with rule-based logic for final calculations (symbolic).
- What it is: A combo where AI extracts facts and events, then rules compute the official state.
- How it works: 1) LLMs parse and link facts into events. 2) Logic applies precise math and timelines.
- Why it matters: LLMs are great at reading messy text, but rules are better at exact, repeatable math.
🍞 Anchor: The LLM finds two amendments; the rules then apply a 10:1 stock split correctly every time.
🍞 Hook: Picture a family tree that also shows who moved, who changed names, and when important milestones happened.
🥬 The Concept (Event Graph): A timeline graph of legal events—issuances, transfers, amendments, conversions, and corporate actions—connected to their sources.
- What it is: A structured history of who got what, when, and how it later changed.
- How it works: 1) Create event nodes. 2) Link them to documents and to prior events they modify. 3) Traverse to compute current truth.
- Why it matters: Without the event graph, you lose the chain of title and miss how one change affects many others.
🍞 Anchor: One option grant node connects to a reprice node, then to a split node, then to a partial exercise node—exactly matching reality.
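To make the graph concrete, here is a minimal sketch of how event nodes and their citations could be represented. The node fields, event-type strings, and example values are illustrative assumptions for this sketch, not the paper’s actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    doc_id: str
    page: int
    span: str   # exact quoted text supporting the event

@dataclass
class EventNode:
    event_id: str
    event_type: str        # e.g. "Issuance", "Transfer", "Amendment", "Exercise", "CorporateAction"
    effective_date: str    # ISO date used to order the timeline
    payload: dict          # e.g. {"holder": "...", "quantity": 1000} or {"split_ratio": 10}
    citations: list[Citation] = field(default_factory=list)
    modifies: list[str] = field(default_factory=list)   # ids of prior events this one changes

# A fragment of the anchor's chain: Grant -> Reprice -> 10:1 Split
grant   = EventNode("e1", "Issuance",        "2021-06-01", {"holder": "Ava", "quantity": 1000, "strike": 1.00})
reprice = EventNode("e2", "Amendment",       "2022-02-15", {"strike": 0.50}, modifies=["e1"])
split   = EventNode("e3", "CorporateAction", "2023-01-10", {"split_ratio": 10})
```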
🍞 Hook: You know how teachers sometimes give you the answer key and want you to explain where it came from?
🥬 The Concept (Verification Transforms): These are specific checks that zoom into one piece of the truth (like “vesting start date for Grant CS-102”).
- What it is: Targeted questions that compare the event-graph truth to the cap table claim.
- How it works: 1) Query the graph for that field. 2) Compare with the cap table. 3) Output match or flag, plus sources.
- Why it matters: Without focused checks, you’d either miss details or waste time on unrelated text.
🍞 Anchor: A transform pulls all events affecting CS-102’s vesting start, computes the final date, and compares it to the spreadsheet cell.
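A verification transform can be pictured as a small function that asks the graph for one field and compares it with the spreadsheet value. The sketch below is a simplified illustration; the helper `vesting_start_from_graph`, its inputs, and the “latest event wins” rule are assumptions made for this example.

```python
from datetime import date

def vesting_start_from_graph(events: list[dict], security_id: str) -> date:
    """Compute the effective vesting start for one grant from its event chain.

    Simplifying assumption for this sketch: the most recent event that sets the
    field wins (i.e. later amendments override the original grant).
    """
    relevant = [e for e in events
                if e["security_id"] == security_id and "vesting_start" in e]
    latest = max(relevant, key=lambda e: e["effective_date"])
    return latest["vesting_start"]

def check_vesting_start(events: list[dict], security_id: str, stated: date) -> dict:
    """One verification transform: graph truth vs. the cap table cell."""
    computed = vesting_start_from_graph(events, security_id)
    if computed == stated:
        return {"security": security_id, "field": "vesting_start", "status": "match"}
    return {"security": security_id, "field": "vesting_start", "status": "flag",
            "computed": computed, "stated": stated}
```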
🍞 Hook: Think of the three main ways a homework answer can be wrong: it’s missing, it has no work shown, or the numbers don’t match the steps.
🥬 The Concept (Anomalies in Tie-Out): Problems fall into three buckets.
- What it is: 1) Missing from cap table (docs show something real, but it’s not listed), 2) Missing documentation (cap table shows it, but docs don’t back it), 3) Inconsistent terms (both show it, but details don’t match).
- How it works: Each flagged anomaly names the item, its type, and the exact evidence.
- Why it matters: Clear categories guide fixes and accountability.
🍞 Anchor: A found SAFE not listed (missing from cap table), a grant on the table with no board approval (missing documentation), or a vesting schedule mismatch (inconsistent terms).
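These three buckets map naturally onto a small classifier over “what the graph proves” versus “what the cap table states.” The sketch below is a toy illustration where `None` stands for “not found”; the enum names are labels chosen for this sketch, not the paper’s terminology.

```python
from enum import Enum

class Anomaly(Enum):
    MISSING_FROM_CAP_TABLE = "docs prove it, but the cap table omits it"
    MISSING_DOCUMENTATION  = "cap table states it, but no docs back it"
    INCONSISTENT_TERMS     = "both sides have it, but the details disagree"

def classify(graph_value, stated_value):
    """Return the anomaly bucket for one item, or None if everything ties out."""
    if graph_value is not None and stated_value is None:
        return Anomaly.MISSING_FROM_CAP_TABLE
    if graph_value is None and stated_value is not None:
        return Anomaly.MISSING_DOCUMENTATION
    if graph_value != stated_value:
        return Anomaly.INCONSISTENT_TERMS
    return None

# The anchor's third example: both sides list the grant, but vesting differs.
assert classify("4-year monthly vesting", "4-year annual vesting") is Anomaly.INCONSISTENT_TERMS
```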
03 Methodology
At a high level: Input (Dataroom + Cap Table) → Stage 1 Foundational Extraction → Stage 2 Inductive Event Modeling → Stage 3 Neuro-Symbolic Verification → Output (Verified positions + Flags + Evidence).
🍞 Hook: You know how you first sort your Lego bricks by color and size before building a castle?
🥬 The Concept (Stage 1: Foundational Extraction): Sort all raw documents into clean, traceable facts.
- What it is: LLM-based parsing that classifies docs and extracts atomic facts (names, dates, share counts, prices, clauses) with exact source spans.
- How it works: 1) Classify files into known legal categories (e.g., board consents, grants). 2) Extract fields (e.g., “vesting start date = 2022-03-01,” page 4, paragraph 2). 3) Normalize entities (e.g., “Alexander T. Li” ≈ “Alex Li”). 4) Store provenance (doc ID, page, text span).
- Why it matters: Without accurate, source-linked atoms, later reasoning becomes guesswork.
🍞 Anchor: From a grant PDF, the system pulls holder name, quantity, price per share, and vesting schedule, each tied to the exact clause.
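One way to picture the output of this stage is as a list of source-linked atomic facts. The record layout below is a minimal sketch; the field names and example values are illustrative, not the system’s real schema.

```python
from dataclasses import dataclass

@dataclass
class AtomicFact:
    """A single extracted field, always carried together with its provenance."""
    field: str    # e.g. "vesting_start_date"
    value: str    # normalized value, e.g. "2022-03-01"
    doc_id: str   # which document it came from
    page: int     # page number within that document
    span: str     # the exact clause that supports the value

fact = AtomicFact(
    field="vesting_start_date",
    value="2022-03-01",
    doc_id="option_grant_li_2022.pdf",   # hypothetical document
    page=4,
    span="The Vesting Commencement Date shall be March 1, 2022.",
)
```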
🍞 Hook: Imagine arranging photos on a timeline with labels like “birthday,” “graduation,” and “moved house,” so you can tell a full life story.
🥬 The Concept (Stage 2: Inductive Event Modeling): Turn facts into a timeline of business events.
- What it is: Build an Event Graph with nodes like Issuance, Transfer, Amendment, Conversion, Exercise, and Corporate Action (e.g., stock split).
- How it works: 1) Group related facts into a candidate event. 2) Link it to its documents (citations). 3) Connect events to prior events they modify (e.g., an amendment points to the issuance it changes). 4) Resolve time order and entity identities.
- Why it matters: Ownership today depends on a chain of events; the graph captures that chain precisely.
🍞 Anchor: A single option grant spawns a chain: Grant → Reprice → 10:1 Split → Partial Exercise → Expiration → Transfers to three holders, each with citations.
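Linking an amendment back to the event it modifies is the key structural move of this stage. The sketch below uses a deliberately naive heuristic (an amendment modifies the most recent prior event on the same security); the real system is described as resolving such links with richer cues, so treat this purely as an illustration.

```python
def link_events(events: list[dict]) -> list[dict]:
    """Attach 'modifies' edges from amendments back to the events they change.

    Toy heuristic for this sketch only: an amendment modifies the most recent
    prior event on the same security. A real system would use explicit
    references, parties, and dates to resolve the link.
    """
    last_seen: dict[str, dict] = {}
    for ev in sorted(events, key=lambda e: e["date"]):
        prior = last_seen.get(ev["security_id"])
        if ev["type"] == "Amendment" and prior is not None:
            ev["modifies"] = prior["id"]
        last_seen[ev["security_id"]] = ev
    return events

events = link_events([
    {"id": "e1", "type": "Issuance",  "security_id": "OPT-7", "date": "2021-06-01"},
    {"id": "e2", "type": "Amendment", "security_id": "OPT-7", "date": "2022-02-15"},
])
# After linking, the amendment (e2) carries modifies="e1", preserving the chain of title.
```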
🍞 Hook: Think of applying math rules step-by-step to get from the start of a recipe to the finished cake.
🥬 The Concept (Stage 3: Neuro-Symbolic Verification): Compute the virtual cap table and compare to the reference.
- What it is: Deterministic queries over the Event Graph that aggregate events into current positions, then compare against the cap table line-by-line.
- How it works: 1) Traverse all Issuances to a holder. 2) Apply Transfers, Exercises, Conversions. 3) Apply Corporate Actions like splits. 4) Calculate final quantities and terms. 5) Compare to cap table and flag anomalies with evidence.
- Why it matters: This turns fuzzy retrieval into exact, reproducible math with full traceability.
🍞 Anchor: To verify “John Jackson, 5,000 common,” the system sums all his issuances, subtracts transfers-out, adds transfers-in, applies the 10:1 split, and shows the precise document snippets used.
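The arithmetic in the anchor above can be written as a straightforward replay of the event history. This is a simplified sketch (a real traversal would also handle ordering ties, cancellations, and per-security terms), and the event shapes and example quantities are assumptions for the example.

```python
def current_quantity(events: list[dict], holder: str) -> int:
    """Replay the event history to compute one holder's current share count."""
    qty = 0
    for ev in sorted(events, key=lambda e: e["date"]):
        if ev["type"] == "Issuance" and ev["holder"] == holder:
            qty += ev["quantity"]
        elif ev["type"] == "TransferIn" and ev["holder"] == holder:
            qty += ev["quantity"]
        elif ev["type"] == "TransferOut" and ev["holder"] == holder:
            qty -= ev["quantity"]
        elif ev["type"] == "Split":   # corporate actions apply to every position
            qty *= ev["ratio"]
    return qty

history = [
    {"type": "Issuance",    "holder": "John Jackson", "quantity": 600, "date": "2019-01-15"},
    {"type": "TransferOut", "holder": "John Jackson", "quantity": 100, "date": "2020-05-01"},
    {"type": "Split",       "ratio": 10,              "date": "2021-03-01"},
]
assert current_quantity(history, "John Jackson") == 5_000   # matches the stated cap table row
```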
Detailed Steps with Examples:
- Input Preparation: Load the dataroom (hundreds to thousands of documents) and the reference cap table (multiple ledgers). Example: The common stock ledger shows 18 entries; dataroom includes charters, grants, board consents, SAFEs, and amendments.
- Document Classification: Label each document type (e.g., Board Consent, Stock Purchase Agreement). Why: Helps route extraction to the right parser and reduce noise. Example: A file titled “Consent_2019_final.pdf” is recognized as a board consent despite a vague filename.
- Atomic Fact Extraction: Pull minimal, exact fields from each doc with citations. Why: Later logic needs numbers and dates it can trust. Example: “Vesting start date = 2019-01-15 (Doc #42, p.3, §2).”
- Entity Resolution: Map variant names to a single stakeholder. Why: Prevent double-counting or missed links. Example: “ACME Fund I, L.P.” and “ACME I LP” resolved to one entity.
- Event Synthesis: Combine facts into event nodes with edges to supporting docs and modified prior events. Why: Captures legal lineage and temporal order. Example: An Amendment node references the prior Issuance node and specifies the changed vesting schedule.
- Virtual State Computation: Aggregate through graph traversal to produce the virtual cap table. Why: Ownership is the sum of all history, not a single doc. Example: Total options after a split and partial exercise.
- Verification Transforms: For each cap table line/field, query the virtual view and compare. Why: Pinpoints mismatches precisely. Example: Check “Price Per Share” for CS-10 and produce “match” or a flagged discrepancy.
- Evidence Packaging: Bundle the anomaly type, affected item, computed vs. stated values, and minimal citation set. Why: Makes review fast and defensible. Example: “Missing Documentation: CS-10 board approval not found; searched all consents; closest doc amends CS-09.”
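Putting these steps together, the whole pipeline can be sketched as a short driver: build the world model once, then check every cap table row against it. The stage functions passed in (`extract_facts`, `build_event_graph`, `compute_virtual_position`) are hypothetical placeholders for the components described above, not real API names.

```python
def tie_out(dataroom_docs, cap_table_rows,
            extract_facts, build_event_graph, compute_virtual_position):
    """Run tie-out end to end: build the world model once, then check every row.

    The three callables are placeholders for Stage 1, Stage 2, and Stage 3 logic.
    """
    facts = [f for doc in dataroom_docs for f in extract_facts(doc)]   # Stage 1
    graph = build_event_graph(facts)                                   # Stage 2
    flags = []
    for row in cap_table_rows:                                         # Stage 3
        computed = compute_virtual_position(graph, row["security_id"])
        for field, stated in row["fields"].items():
            if computed.get(field) != stated:
                flags.append({
                    "security": row["security_id"],
                    "field": field,
                    "computed": computed.get(field),
                    "stated": stated,
                    "evidence": computed.get("citations", []),
                })
    return flags
```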
🍞 Hook: Think of the clever trick as pre-building your study guide so every question later is easy and fast.
🥬 The Concept (Secret Sauce): Front-load the hard reading, then do cheap, exact checks.
- What it is: Eager construction of a source-linked Event Graph plus deterministic queries.
- How it works: 1) Spend compute once to parse and build the model. 2) Reuse it for thousands of fast, reproducible checks.
- Why it matters: This flips the speed-quality trade: better accuracy and much faster per-check time as workload grows.
🍞 Anchor: After a 15-minute build on a 300-document dataroom, each new check takes ~2 seconds, not 45 seconds.
04 Experiments & Results
🍞 Hook: Imagine a science fair where three robots solve the same puzzle. You judge them on correctness, speed, and how clearly they show their work.
🥬 The Concept (The Test): Evaluate automated tie-out as anomaly detection with evidence.
- What it is: Given the dataroom and the cap table, find all places where the world-model truth disagrees or can’t be proven, and cite sources.
- How it works: 1) Run verification transforms on every relevant field. 2) Classify each issue (missing from cap table, missing documentation, inconsistent terms). 3) Match each flag to ground-truth annotations by legal experts.
- Why it matters: Real usefulness demands both right answers and rock-solid citations.
🍞 Anchor: If the system says “vesting schedule mismatch,” it must show the exact paragraph proving the correct schedule.
🍞 Hook: Picture a race between a sprinter who starts instantly but slows each lap, and a marathoner who warms up, then runs steadily and fast.
🥬 The Concept (The Competition): Compare three systems—Agentic Baseline (LLM+RAG), Agentic + Structured Repr. (pre-extracted facts but ad-hoc reasoning), and Equall (full world model + deterministic queries).
- What it is: A head-to-head on the same four real datarooms (Seed to Series B).
- How it works: All systems receive the same inputs and must output flags with evidence.
- Why it matters: Shows if structure-first beats search-first in real conditions.
🍞 Anchor: The agentic baseline queries raw text each time; Equall builds the Event Graph once, then answers via exact graph traversals.
🍞 Hook: Don’t just say “85%”—tell me what that means on the report card compared to the rest of the class.
🥬 The Concept (The Scoreboard): Measure precision, recall, F1, and speed with context.
- What it is:
- Accuracy: Equall reaches ~85% F1 on average; Agentic+Structured gets ~42%; pure Agentic ~29%.
- Speed: On a 300-document dataroom, after a ~15-minute build, Equall takes ~2 seconds per check vs. ~45 seconds for agentic—~22× faster per check.
- Scaling: As datarooms grow from Seed to Series B, the agentic F1 collapses (about 55%→28%), while Equall stays strong and widens the gap.
- How it works: The event graph turns long, fragile retrieval chains into stable queries, so both accuracy and speed improve as checks multiply.
- Why it matters: In real deals, there are thousands of checks; small per-check savings and higher reliability add up to hours saved and fewer misses.
🍞 Anchor: If you need 5,000 checks, doing each in 2 seconds instead of 45 seconds, with higher accuracy, changes a grueling week into a manageable afternoon.
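The back-of-the-envelope arithmetic behind that claim is easy to verify with the figures quoted above (a ~15-minute build, ~2 seconds vs. ~45 seconds per check, 5,000 checks):

```python
build_time_s  = 15 * 60   # one-time world-model build (~15 minutes)
per_check_wm  = 2         # seconds per check with the prebuilt world model
per_check_rag = 45        # seconds per check with the agentic RAG baseline
checks        = 5_000

world_model_hours = (build_time_s + checks * per_check_wm) / 3600
agentic_hours     = (checks * per_check_rag) / 3600

print(f"world model: {world_model_hours:.1f} h")              # ~3.0 h, a manageable afternoon
print(f"agentic RAG: {agentic_hours:.1f} h")                  # ~62.5 h, more than a work week
print(f"per-check speedup: ~{per_check_rag / per_check_wm}x") # 22.5, i.e. the ~22x figure
```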
🍞 Hook: Sometimes the surprise in a magic trick is that the magician set it up earlier, not at the moment you saw it.
🥬 The Concept (Surprising Findings): The biggest win comes from building the Event Graph, not just using a better LLM.
- What it is: Even when the agentic system uses structured facts, its ad-hoc reasoning still fails on global tasks (like proving something is missing) and long chains. The inductive event structure makes the difference.
- How it works: Pre-linking amendments to prior issuances and mapping timelines lets the system prove negatives and resolve history precisely.
- Why it matters: Architecture (how you think) beats brute-force retrieval (how much you read) for these tasks.
🍞 Anchor: Equall finds a warrant amendment buried in a long tail of rare docs and correctly updates ownership—something the RAG agent often misses or mis-links.
05 Discussion & Limitations
🍞 Hook: Even the best map can be fuzzy if some roads are unmarked or the satellite image is blurry.
🥬 The Concept (Limitations): Where the system can struggle.
- What it is: 1) Very messy scans or missing pages can block extraction. 2) Extremely unusual instruments may not fit known event types. 3) Edge-case legal interpretations may still need a lawyer’s judgment.
- How it works: The model flags uncertainty and routes tough cases to humans, preserving evidence for quick review.
- Why it matters: Honest boundaries keep trust high and prevent silent errors.
🍞 Anchor: A low-quality scan of a 2013 consent with a faint signature may require a lawyer to confirm.
🍞 Hook: Powerful tools still need the right batteries and a clear user manual.
🥬 The Concept (Required Resources): What you need to run this well.
- What it is: 1) A well-organized dataroom, 2) Compute to build the world model, 3) LLMs for extraction, 4) A graph engine for verification queries, 5) Human-in-the-loop review.
- How it works: Build once, then verify many times; lawyers handle exceptions.
- Why it matters: The upfront investment unlocks speed and reliability later.
🍞 Anchor: A 15-minute build step on a 300-document set enables thousands of 2-second checks.
🍞 Hook: A hammer isn’t the tool for a delicate watch; choose the right job for the right tool.
🥬 The Concept (When NOT to Use): Situations where this can fail or isn’t ideal.
- What it is: 1) Tiny deals with a handful of docs (manual tie-out may be faster), 2) Datarooms missing most core documents, 3) Novel instruments with no analogues in the event schema.
- How it works: Triage early—if inputs are too thin or too odd, do targeted manual checks first.
- Why it matters: Avoid over-automation that wastes time or adds confusion.
🍞 Anchor: If a startup has ten documents and three shareholders, a lawyer might finish in under an hour without a model build.
🍞 Hook: Big questions lead to bigger quests—what should we explore next?
🥬 The Concept (Open Questions): What we still don’t know.
- What it is: 1) Best practices for handling contradictory redlines vs. executed versions, 2) Automated detection of non-dataroom references (e.g., missing exhibits), 3) Learning richer event types from feedback, 4) Synthetic training curricula that mirror real anomaly patterns, 5) Auditing standards for machine-generated tie-out.
- How it works: Combine human feedback, synthetic data generation, and event-schema evolution.
- Why it matters: Each improvement tightens accuracy and reduces review time.
🍞 Anchor: Future versions might auto-detect that a board consent references an absent exhibit and prompt the company to upload it.
06 Conclusion & Future Work
🍞 Hook: Imagine if you tidied your whole room into labeled bins once, so every future cleanup took minutes—not hours.
🥬 The Concept (Three-Sentence Summary): This paper shows that capitalization tie-out needs more than ad-hoc search; it needs a structured world model. Equall builds an Event Graph from all documents first, then runs deterministic, evidence-traceable checks to find anomalies. The result is higher accuracy, faster verification, and better scaling as deals get more complex.
🍞 Anchor: After the initial build, each new check is like pulling from a labeled bin—quick and reliable.
Main Achievement: Turning messy, multi-document legal history into a clean, queryable Event Graph that delivers 85% F1 and ~22× faster checks than agentic RAG in real datarooms.
Future Directions: Expand event types, improve OCR and entity resolution, train agents on synthetic tie-out curricula, and extend the world model to adjacent tasks (e.g., liquidation waterfalls, rights audits, and compliance checks).
Why Remember This: In high-stakes legal work, architecture matters. By organizing first and verifying second, we move from fragile, one-off searches to dependable, scalable legal intelligence that lawyers can trust.
Practical Applications
- Automate cap table verification before closing a financing round.
- Detect missing approvals or documents early and request uploads proactively.
- Continuously monitor for inconsistencies when new documents are added to the dataroom.
- Generate evidence-backed anomaly reports for partner legal review.
- Run rapid re-checks after corrections without re-reading all documents.
- Support liquidation waterfall simulations using the same event graph.
- Audit investor rights (ROFR, pro rata, voting) across amendments and transfers.
- Validate option pool sizes and vesting schedules against stock plans and board consents.
- Reconstruct ownership after corporate actions like stock splits or recapitalizations.
- Provide a defensible audit trail for regulators or post-deal integration.