FARE: Fast-Slow Agentic Robotic Exploration
Key Summary
- Robots used to explore by following simple rules or short-term rewards, which often made them waste time and backtrack a lot.
- This paper proposes FARE, a team-of-two brain approach: a slow, careful thinker (an LLM) plans big-picture moves, while a fast, reactive thinker (an RL policy) handles local driving.
- The slow thinker reads a short environment description and a simplified map-graph, then picks smart global waypoints using graph community pruning to keep things simple.
- The fast thinker follows local sensor data but is gently rewarded to stay close to the global waypoints, reducing detours without becoming rigid.
- FARE separates “meaning and strategy” (semantics) from “geometry and motion,” so each part thinks at the right speed and scale.
- In simulations (indoor, forest, warehouse), FARE matched or beat strong baselines, and it won big in forests and warehouses with shorter paths and faster finishes.
- It also ran on a real robot with an onboard LLM, successfully exploring a large 200 m × 130 m building without human help.
- The key idea is fast–slow thinking: keep global reasoning slow and smart, keep local control fast and safe, and make them talk through waypoints.
- A pruning trick keeps the graph small so the LLM can reason better and faster, and a waypoint-following reward keeps the RL policy aligned without micromanagement.
Why This Research Matters
Efficient exploration saves battery and time, which directly lowers costs in warehouses, hospitals, farms, and disaster zones. By finishing corners and perimeters early, robots avoid long backtracks, freeing them to complete more tasks per day. The clear split between global planning and local control means robots can adapt to different places—wide fields, tight aisles, or twisty corridors—without hand-tuning many parameters. Onboard LLM reasoning shows that smart global guidance can run in real time on a robot, not just in the cloud. This approach is a template for many autonomous systems that need both big-picture judgment and quick reflexes. As we add richer perception and multi-robot teamwork, the benefits could multiply across logistics, safety, and public services.
Detailed Explanation
01 Background & Problem Definition
You know how when you clean your room, you can either wander around picking up whatever you see or make a plan to finish corners and shelves in a good order? Robots exploring new places had a similar choice, but most of them were stuck doing the quick, nearby thing rather than following a smart plan.
🍞 Top Bread (Hook): Imagine drawing a grid map of your school where each square is either empty, has a desk, or is unknown because you haven’t looked there yet. 🥬 Filling (The Actual Concept): Occupancy Grid Mapping is a way for robots to keep a map where each little square says “free,” “blocked,” or “unknown.” How it works: 1) The robot scans with sensors (like LiDAR). 2) It updates squares it can see to free or blocked. 3) Everything else stays unknown until it looks later. Why it matters: Without this map, the robot wouldn’t know where it can drive safely or what still needs exploring. 🍞 Bottom Bread (Anchor): A robot in a hallway marks the path ahead as free, the walls as blocked, and the rooms it hasn’t visited as unknown.
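To make the “free / blocked / unknown” bookkeeping concrete, here is a minimal Python sketch that updates a grid along one LiDAR beam. The cell codes, grid size, and the simple line-tracing helper are made up for illustration; real mappers use probabilistic updates and proper ray casting.

```python
# Minimal occupancy-grid sketch (illustrative, not the paper's implementation).
# Cell states: -1 = unknown, 0 = free, 1 = occupied.
import numpy as np

UNKNOWN, FREE, OCCUPIED = -1, 0, 1

def update_grid(grid, robot_rc, hit_rc):
    """Mark cells along a straight ray from the robot to a LiDAR hit.

    Cells the beam passes through become FREE; the endpoint becomes OCCUPIED.
    Uses simple linear interpolation instead of a proper Bresenham trace.
    """
    r0, c0 = robot_rc
    r1, c1 = hit_rc
    steps = max(abs(r1 - r0), abs(c1 - c0), 1)
    for t in np.linspace(0.0, 1.0, steps + 1):
        r = int(round(r0 + t * (r1 - r0)))
        c = int(round(c0 + t * (c1 - c0)))
        grid[r, c] = FREE
    grid[r1, c1] = OCCUPIED
    return grid

grid = np.full((20, 20), UNKNOWN, dtype=np.int8)
grid = update_grid(grid, robot_rc=(10, 2), hit_rc=(10, 15))
print((grid == FREE).sum(), "free cells,", (grid == OCCUPIED).sum(), "occupied")
```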
The World Before: Classic exploration planners mostly followed frontiers—the edges between known and unknown areas—or they sampled many possible viewpoints and chased the ones likely to reveal more. This worked okay, but these planners often used fixed knobs (hyperparameters) that didn’t change when the environment did. So in a big open area they might be too cautious; in a cluttered maze they might get stuck polishing tiny spots. Learning-based planners used reinforcement learning (RL) to learn from rewards, but they mainly got strong signals only for short-term information gains (like “I saw something new!”), not for the true goal—finishing the whole map fast. That made them short-sighted.
🍞 Top Bread (Hook): You know how in a soccer game, you need both a coach’s long-term strategy and players’ quick reactions on the field? 🥬 Filling (The Actual Concept): Reinforcement Learning (RL) teaches a robot to make decisions by trying actions and getting rewards or penalties. How it works: 1) See the current situation, 2) pick an action, 3) get a reward, 4) learn which choices lead to better long-term results. Why it matters: Without RL’s practice-and-reward loop, robots won’t improve their split-second choices in messy, changing situations. 🍞 Bottom Bread (Anchor): An RL agent learns to choose the next hallway corner because past tries taught it that corner visits often reveal new space.
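If you want to see the practice-and-reward loop in code, here is a toy tabular Q-learning agent on a one-dimensional corridor. FARE's actual policy is a neural attention network trained on much richer observations, so treat this only as an illustration of the see-act-reward-learn cycle.

```python
# Toy tabular Q-learning on a 1-D corridor, only to illustrate the
# observe -> act -> reward -> update cycle; FARE itself uses a neural,
# attention-based policy rather than a table.
import random

n_cells, goal = 6, 5                       # corridor cells 0..5, new space at cell 5
Q = [[0.0, 0.0] for _ in range(n_cells)]   # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.2

for episode in range(200):
    s = 0
    while s != goal:
        a = random.choice([0, 1]) if random.random() < eps else int(Q[s][1] > Q[s][0])
        s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
        r = 1.0 if s_next == goal else -0.01        # reward for revealing new space
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print("Learned policy:", ["right" if q[1] > q[0] else "left" for q in Q])
```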
The Problem: Robots needed to use long-term, big-picture hints hidden in the growing map—like noticing the building has many parallel aisles—and then turn those hints into better overall routes. But they also had to stay nimble when something popped up nearby (like a chair in the way). Most methods handled either the fast local reactions or the long-term planning, not both together well.
🍞 Top Bread (Hook): Think about how you sometimes count friend groups at recess: kids who always play together look like a cluster. 🥬 Filling (The Actual Concept): Community Detection finds groups in a graph—places that are tightly connected to each other. How it works: 1) Build a graph of viewpoints/places. 2) Measure how strongly nodes connect inside a group versus outside. 3) Split into communities with strong inside links. Why it matters: Without grouping, the global map stays messy, and planning becomes slow and confusing. 🍞 Bottom Bread (Anchor): A big warehouse graph splits into “aisle groups,” each aisle becoming its own community.
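Here is a small sketch of community detection on a toy viewpoint graph, using the networkx library's greedy modularity routine. The paper's exact algorithm may differ; the point is simply what “finding tightly connected groups” looks like in code.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy map graph: two clusters of viewpoints joined by a single corridor edge.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (0, 2),   # cluster A (e.g., one aisle)
                  (3, 4), (4, 5), (3, 5),   # cluster B (another aisle)
                  (2, 3)])                   # corridor linking them

communities = greedy_modularity_communities(G)
for i, com in enumerate(communities):
    print(f"community {i}: {sorted(com)}")
```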
Failed Attempts: Hierarchical planners tried coarse global maps plus fine local maps, but the global part was still hand-crafted and not adaptive. RL planners tried denser rewards to make training easier, but that nudged them to chase short-term novelty instead of best total coverage.
🍞 Top Bread (Hook): When you have a giant to-do list, you cross off the few items that matter most first. 🥬 Filling (The Actual Concept): Modularity-based Pruning keeps only the most informative communities in a graph. How it works: 1) Score each community by how much it makes the graph neatly grouped. 2) Keep the top few. 3) Link them by their connections. Why it matters: Without pruning, the slow thinker (LLM) wastes time on tiny, noisy parts and may miss the big structure. 🍞 Bottom Bread (Anchor): From 30 small hallway clusters, keep the 6 that outline the main corridors so the planner sees the building’s skeleton.
The Gap: We needed a way to separate “slow, smart, global thinking” from “fast, reactive, local control,” and then make them cooperate so the robot could keep a smart long-term plan without ignoring nearby surprises.
🍞 Top Bread (Hook): You know how you sometimes think fast (“duck!”) and sometimes slow (“What’s the best route to the library?”)? 🥬 Filling (The Actual Concept): The Fast–Slow Thinking Paradigm uses a careful, slower process for big plans and a speedy process for quick actions. How it works: 1) Slow thinker builds a plan from a simplified world view. 2) Fast thinker reacts to sensors to follow the plan safely. 3) They share waypoints so both stay aligned. Why it matters: Without this split, either you’re too slow to act or too short-sighted to plan well. 🍞 Bottom Bread (Anchor): The slow thinker says “Finish the left wing first,” while the fast thinker dodges a rolling chair on the way.
Real Stakes: In warehouses, hospitals, search-and-rescue, or farms, every extra minute or meter costs money, power, or safety. A robot that plans globally and acts locally can finish sooner, avoid backtracking, and better handle new obstacles—directly helping people in daily life and emergencies.
02 Core Idea
🍞 Top Bread (Hook): Picture a field trip. The teacher decides which museum rooms to visit and in what order, but each student still steps around people and benches while walking. 🥬 Filling (The Actual Concept): The key insight: Split the robot’s brain into a slow planner that understands the big picture and a fast driver that reacts to nearby details, and make them talk using global waypoints. How it works: 1) Build a hierarchical graph of the explored world (local dense, global sparse). 2) Prune the global graph to just the most meaningful communities. 3) An LLM reads a short environment description and the pruned graph to pick an exploration strategy and global waypoints. 4) An RL policy chooses safe, informative local moves while getting a gentle reward to stick near those waypoints. Why it matters: Without this split-and-talk approach, robots either get lost in local details or miss the big plan, wasting time and distance. 🍞 Bottom Bread (Anchor): In a warehouse, the LLM says “clear aisles 1–3 first,” while the RL steers around a pallet to reach the next marked aisle.
Multiple Analogies:
- City explorer: The mayor (slow thinker) sets the tour of districts; the taxi driver (fast thinker) handles traffic lights and detours.
- Video game: The minimap sets the quest order; your character dodges enemies in real time.
- Cooking: The recipe plans courses; your hands adjust pan angles and heat when food sizzles unexpectedly.
Before vs After: Before, planners either relied on fixed global rules or short-term rewards and struggled with long-horizon choices like when to finish corners or return to a far wing. After FARE, the robot uses environment-aware strategies (from the LLM) and stays aligned via waypoints while still reacting to local surprises, reducing backtracking and finishing faster.
🍞 Top Bread (Hook): Think of a really good reading buddy who not only reads text but also understands diagrams when they’re neat and simple. 🥬 Filling (The Actual Concept): A Large Language Model (LLM) is a program that understands and generates human-like text, and it can reason over structured inputs if we present them clearly. How it works: 1) We summarize the environment in short text (e.g., “lots of narrow aisles”). 2) We feed a pruned graph that captures the big structure. 3) The LLM turns these into global strategies and waypoints. Why it matters: Without an LLM, we’d be stuck with rigid global heuristics that don’t flex to different places. 🍞 Bottom Bread (Anchor): Tell the LLM “dense warehouse, narrow aisles,” and it prioritizes aisle-by-aisle coverage with early corner cleanup.
🍞 Top Bread (Hook): When solving a maze, you use the maze map to reason which branches to try first. 🥬 Filling (The Actual Concept): Graph Reasoning means thinking with a map of nodes (places) and edges (paths). How it works: 1) Keep only big, meaningful communities of the map. 2) Ask the LLM to choose a path through these. 3) Update the plan as the map grows. Why it matters: Without graph reasoning, the global plan can’t leverage the structure that saves time (like finishing side branches smartly). 🍞 Bottom Bread (Anchor): The LLM picks a route that finishes far-end aisles now instead of making you come back later.
🍞 Top Bread (Hook): If a coach says, “Head to the left goal,” you still decide whether to dribble left or right around defenders. 🥬 Filling (The Actual Concept): Exploration Strategy is the plan that balances coverage order, safety, and speed. How it works: 1) Describe the space (open vs. narrow, many dead ends?). 2) Set strategy axes (spatial, efficiency, safety, task). 3) Produce waypoint orders that match those axes. Why it matters: Without a strategy, the robot chases whatever is nearby, often missing a better overall route. 🍞 Bottom Bread (Anchor): In a forest with irregular obstacles, the strategy prefers conservative turns and keeping an escape path while still moving outward.
Why It Works (intuition):
- Decoupling semantics from geometry: The LLM excels at interpreting language and high-level structure; the RL policy excels at quick, sensor-based control. Each operates at its best timescale.
- Waypoint alignment: A soft reward encourages staying near the global plan, improving long-term efficiency without handcrafting dense, short-sighted rewards.
- Pruning: By keeping only high-modularity communities, the LLM sees the map’s skeleton, making reasoning both faster and clearer.
- Closed-loop updates: As the map fills in, the global plan and local choices co-adapt, reducing late-stage backtracking.
Building Blocks:
- Hierarchical belief graph (dense local, sparse global).
- Modularity-based pruning to pick top communities.
- LLM environment characterization and strategy axes (spatial, efficiency, safety, task).
- LLM graph reasoning to produce global waypoints.
- RL policy with inputs: local utility, edges, and a guidepost flag for nodes on the global path.
- Instruction-following reward that smoothly penalizes waypoint deviation, keeping the robot aligned without being rigid.
03 Methodology
At a high level: Input (LiDAR scans + short environment description) → Build hierarchical belief graph (local dense, global pruned) → Slow-thinking LLM: characterize environment, set strategy axes, reason on pruned graph, output global waypoints → Fast-thinking RL: read local graph + utilities + guidepost flags, pick next waypoint → Robot executes safely → Repeat with updates.
Step 1: Build the hierarchical belief graph
- What happens: From the occupancy grid, sample candidate viewpoints in free space and connect nearby ones if the line between them is safe. This makes a collision-free graph around the robot. A small square window around the robot forms the local dense graph (for fast reactions). The rest gets summarized into a global graph.
- Why this step exists: Without a graph, the planner sees only raw pixels; it needs roads and intersections (edges and nodes) to plan.
- Example: In a hallway, you might have 20 local nodes near the robot and hundreds more far away; only the nearby 20 go into the local graph right now (see the sketch below).
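A rough Python sketch of this step: sample viewpoints on free cells, connect pairs whose straight line stays in free space, and split nodes into a local window versus the rest. The spacing, window size, and line check are illustrative choices, not the paper's actual parameters.

```python
# Sketch of viewpoint sampling and collision-checked edges over an occupancy
# grid (assumed 0 = free, 1 = occupied, -1 = unknown); spacing, window size
# and the line check are simplifications, not the paper's exact code.
import numpy as np
from itertools import combinations

def line_is_free(grid, p, q, samples=20):
    """True if every sample on the segment p->q lies in known free space."""
    for t in np.linspace(0.0, 1.0, samples):
        r = int(round(p[0] + t * (q[0] - p[0])))
        c = int(round(p[1] + t * (q[1] - p[1])))
        if grid[r, c] != 0:
            return False
    return True

def build_graph(grid, spacing=3, max_edge=6.0):
    nodes = [(r, c) for r in range(0, grid.shape[0], spacing)
                    for c in range(0, grid.shape[1], spacing) if grid[r, c] == 0]
    edges = [(p, q) for p, q in combinations(nodes, 2)
             if np.hypot(p[0] - q[0], p[1] - q[1]) <= max_edge
             and line_is_free(grid, p, q)]
    return nodes, edges

def split_local_global(nodes, robot, half_window=6):
    local = [n for n in nodes
             if abs(n[0] - robot[0]) <= half_window and abs(n[1] - robot[1]) <= half_window]
    global_ = [n for n in nodes if n not in local]
    return local, global_

grid = np.zeros((30, 30), dtype=np.int8)   # toy all-free map
nodes, edges = build_graph(grid)
local, global_ = split_local_global(nodes, robot=(15, 15))
print(len(nodes), "nodes,", len(edges), "edges,", len(local), "local")
```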
Step 2: Detect communities and prune for the global graph
- What happens: Run community detection to find tightly connected groups (like an aisle or a corridor cluster). Score communities by how much they make the graph neatly modular. Keep the top-k groups and connect them by their interlinks to form a compact global graph.
- Why this step exists: The LLM needs a simple, meaningful skeleton of the world to reason efficiently; otherwise, it drowns in details.
- Example: A warehouse’s 30 aisle groups shrink to the 6 most informative ones that outline the main structure (see the sketch below).
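The sketch below pairs networkx community detection with a simple top-k pruning rule: each community is scored by its term in the standard modularity sum, the best k are kept, and kept communities become supernodes linked wherever an original edge crosses between them. The scoring and graph construction are stand-ins for the paper's modularity-based pruning, not its exact code.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def community_contribution(G, com):
    """Per-community term of Newman modularity: l_c/m - (d_c / 2m)^2."""
    m = G.number_of_edges()
    l_c = G.subgraph(com).number_of_edges()
    d_c = sum(d for _, d in G.degree(com))
    return l_c / m - (d_c / (2.0 * m)) ** 2

def pruned_global_graph(G, k=3):
    coms = list(greedy_modularity_communities(G))
    coms.sort(key=lambda c: community_contribution(G, c), reverse=True)
    kept = coms[:k]
    # Collapse each kept community into one supernode; link supernodes that
    # share at least one edge in the original graph.
    H = nx.Graph()
    H.add_nodes_from(range(len(kept)))
    owner = {n: i for i, c in enumerate(kept) for n in c}
    for u, v in G.edges():
        if u in owner and v in owner and owner[u] != owner[v]:
            H.add_edge(owner[u], owner[v])
    return H, kept

# Toy example: two cliques chained by a short path, keep the top 2 communities.
G = nx.barbell_graph(5, 2)
H, kept = pruned_global_graph(G, k=2)
print(H.number_of_nodes(), "supernodes,", H.number_of_edges(), "links")
```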
Step 3: Environment-conditioned strategy generation (slow-thinking)
- What happens: Feed a short text description (e.g., “indoor office building with long corridors and rooms”) to the LLM. The LLM fills an interpretable schema: spatial traits (openness, connectivity), obstacle traits (density, predictability), and exploration challenges (dead-end likelihood, backtracking risk). Then it maps these to strategy axes: spatial (coverage order), efficiency (time/energy trade-offs), safety (clearances, unknown handling), and task (completion criteria, info priority).
- Why this step exists: Without explicit characterization, the plan can’t adapt to the place; warehouses and forests need different tactics.
- Example: “Narrow corridors” leads to conservative turns, early corner cleanup, and always-keep-an-escape-route behavior (a prompt-and-schema sketch follows).
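Below is a sketch of what the characterization prompt and the interpretable strategy schema could look like in Python. The field names, prompt wording, and JSON format are assumptions for illustration; the actual LLM call is left abstract.

```python
# Sketch of the environment-characterization prompt and strategy schema;
# field names and prompt text are illustrative, and the model call itself
# is not shown (any chat-completion client could fill that role).
import json
from dataclasses import dataclass

@dataclass
class Strategy:
    spatial: str      # coverage order, e.g. "perimeter-first, finish corners early"
    efficiency: str   # time/energy trade-offs
    safety: str       # clearances, handling of unknown space
    task: str         # completion criteria, information priority

PROMPT = """You are the global planner of an exploration robot.
Environment description: {description}
1. Characterize the space: openness, connectivity, obstacle density,
   obstacle predictability, dead-end likelihood, backtracking risk.
2. Return a JSON object with keys "spatial", "efficiency", "safety", "task"
   describing the exploration strategy along each axis."""

def build_prompt(description: str) -> str:
    return PROMPT.format(description=description)

def parse_strategy(llm_reply: str) -> Strategy:
    return Strategy(**json.loads(llm_reply))

prompt = build_prompt("dense warehouse with narrow aisles and tall shelves")
# Hand-written reply standing in for the model output.
reply = json.dumps({
    "spatial": "aisle-by-aisle, clear far corners first",
    "efficiency": "minimize revisits over raw speed",
    "safety": "keep wide clearance in narrow aisles",
    "task": "stop when no frontiers remain",
})
print(parse_strategy(reply))
```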
Step 4: LLM graph reasoning to produce global waypoints
- What happens: Give the LLM the pruned global graph plus the strategy prompt and a short memory of past choices. It returns an ordered list of global nodes to visit (waypoints). These are long-horizon goals; they can be updated as the map grows.
- Why this step exists: The robot needs a travel itinerary that respects the map’s big structure and the chosen strategy.
- Example: In the warehouse, the LLM says: finish aisles A→B→C, then the back corners, to avoid later backtracking (see the sketch below).
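A sketch of the graph-to-text serialization and waypoint parsing is shown below. The node IDs, prompt wording, and JSON reply format are illustrative assumptions; the real prompt and parser in the paper may differ.

```python
# Sketch of serializing the pruned global graph for the LLM and parsing the
# ordered waypoint list it returns; the reply here is a hand-written stand-in.
import json

def graph_to_text(nodes, edges):
    lines = [f"Node {n}: frontier utility {u}" for n, u in nodes.items()]
    lines += [f"Edge {a} -- {b}" for a, b in edges]
    return "\n".join(lines)

def build_waypoint_prompt(graph_text, strategy, memory):
    return (
        "Global graph:\n" + graph_text +
        f"\nStrategy: {strategy}\nPrevious waypoints: {memory}\n"
        "Return a JSON list of node IDs to visit in order."
    )

def parse_waypoints(reply: str):
    return [int(n) for n in json.loads(reply)]

nodes = {0: 4, 1: 7, 2: 2}              # node id -> frontier utility
edges = [(0, 1), (1, 2)]
prompt = build_waypoint_prompt(graph_to_text(nodes, edges),
                               "aisle-by-aisle, corners first", [])
print(parse_waypoints('[1, 0, 2]'))     # stand-in for the model's reply
```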
Step 5: Prepare the local observation for the fast-thinking policy
- What happens: Build an informative local graph that shares edges and positions with the local map, but each node also carries: utility (how many frontiers it can see) and a guidepost bit (true if this node lies along the current global path).
- Why this step exists: The RL policy must see both immediate value (utility) and global intent (guidepost) to make aligned choices.
- Example: A node near a corner has high utility; if it’s also on the global route, the guidepost bit is 1 (see the sketch below).
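Here is a minimal sketch of assembling those per-node features: position, frontier utility, and the guidepost bit. The exact feature layout is an assumption; the idea is just that global intent enters the local observation as one extra flag.

```python
# Per-node feature vector for the fast policy: position, frontier utility,
# and a guidepost bit marking nodes on the LLM's global path (layout assumed).
import numpy as np

def node_features(local_nodes, utilities, global_path):
    on_path = set(global_path)
    feats = []
    for n in local_nodes:
        x, y = n
        guidepost = 1.0 if n in on_path else 0.0
        feats.append([x, y, utilities[n], guidepost])
    return np.array(feats, dtype=np.float32)

local_nodes = [(0, 0), (1, 0), (1, 1)]
utilities = {(0, 0): 2, (1, 0): 5, (1, 1): 0}
global_path = [(1, 0), (3, 0)]
print(node_features(local_nodes, utilities, global_path))
```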
Step 6: Fast-thinking policy selects the next waypoint
- What happens: An attention-based network reads the local graph and focuses on promising neighbors of the current node. It picks the next waypoint to execute.
- Why this step exists: Without attention over the local graph, the policy can miss better nearby options or choose unsafe ones.
- Example: The policy chooses the neighbor that both reveals many frontiers and nudges the robot toward the next global waypoint (see the sketch below).
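The sketch below shows the shape of a single attention scoring step over the current node's neighbors, using numpy with random (untrained) weights. The real policy is a trained attention network with more structure, so this only illustrates the query-key scoring and softmax selection.

```python
# Minimal single-head attention scoring over the current node's neighbours;
# the random weights are only to show the shape of the computation.
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # feature size: x, y, utility, guidepost
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def pick_next(current_feat, neighbor_feats):
    q = current_feat @ Wq              # query from the current node
    K = neighbor_feats @ Wk            # keys from candidate neighbours
    scores = K @ q / np.sqrt(d)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs

current = np.array([0.0, 0.0, 1.0, 0.0])
neighbors = np.array([[1.0, 0.0, 5.0, 1.0],   # high utility, on global path
                      [0.0, 1.0, 1.0, 0.0]])
idx, probs = pick_next(current, neighbors)
print("chosen neighbour:", idx, "probs:", np.round(probs, 3))
```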
Step 7: Instruction-following reward shaping
- What happens: During training, add a smooth penalty that grows as the chosen waypoint drifts from the LLM’s advised next waypoint. Small drift is fine (stay flexible); big drift gets penalized more.
- Why this step exists: We want long-term alignment without micromanaging every move or overfitting to short-term info-gain rewards.
- Example: If the policy deviates by a small distance to dodge a chair, the penalty is mild; a large detour gets a stronger penalty (see the sketch below).
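A possible form of this penalty is sketched below: a bounded, saturating function of the distance between the chosen waypoint and the LLM's advised one, added to the usual information-gain reward. The tanh shape and the coefficients are assumptions, not the paper's exact formula.

```python
# Sketch of a smooth instruction-following penalty: small deviations from the
# LLM's advised waypoint cost little, large detours cost more; the tanh shape
# and weights are assumptions for illustration.
import math

def instruction_following_reward(chosen_xy, advised_xy, weight=0.5, scale=5.0):
    dist = math.dist(chosen_xy, advised_xy)
    return -weight * math.tanh(dist / scale)   # bounded, saturating penalty

def total_reward(info_gain, chosen_xy, advised_xy):
    return info_gain + instruction_following_reward(chosen_xy, advised_xy)

print(total_reward(1.0, (2.0, 0.0), (0.0, 0.0)))   # mild drift -> small penalty
print(total_reward(1.0, (20.0, 0.0), (0.0, 0.0)))  # big detour -> near-max penalty
```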
Step 8: Closed-loop execution and updating
- What happens: The robot executes the chosen waypoint, updates its occupancy grid with new LiDAR data, refreshes the local/global graphs, possibly re-prunes communities, and lets the LLM re-issue or adjust waypoints as needed.
- Why this step exists: Environments change as you discover them; the plan must adapt.
- Example: Discovering a new side corridor may cause the global plan to slot it earlier if it prevents later backtracking (see the loop sketch below).
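Finally, here is a stub-level sketch of the whole closed loop. Every helper is a placeholder standing in for the components above, so only the control flow (map update, graph refresh, occasional slow replanning, fast waypoint choice) mirrors FARE; none of the internals do.

```python
# High-level closed-loop sketch. Every helper is a trivial stub so the file
# runs; only the control flow mirrors FARE's loop, not the internals.
def update_map(state):                 # new LiDAR scan enlarges the known map
    state["explored"] += 10
    return state

def build_graphs(state):               # local dense + global pruned graphs
    return {"local": ["n0", "n1"], "global": ["c0", "c1"]}

def llm_plan(global_graph, state):     # slow thinking: ordered global waypoints
    return [f"wp{state['explored']}"]

def rl_policy(local_graph, waypoints): # fast thinking: next local waypoint
    return waypoints[0] if waypoints else local_graph[0]

def explore():
    state, waypoints, visited = {"explored": 0}, [], []
    step = 0
    while state["explored"] < 100:      # stop when the toy map is "covered"
        state = update_map(state)
        graphs = build_graphs(state)
        if step % 5 == 0:               # replan only occasionally (slow loop)
            waypoints = llm_plan(graphs["global"], state)
        visited.append(rl_policy(graphs["local"], waypoints))
        step += 1
    return visited

print("visited:", explore())
```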
The Secret Sauce
- Decoupling: Keep “meaning and order” with the LLM and “motion and safety” with RL.
- Pruning: Feed the LLM only the map’s backbone so it reasons quickly and robustly.
- Soft alignment: The waypoint-following reward keeps the policy coherent over long horizons while preserving local agility.
- Interpretable strategy axes: Humans can read and tweak the high-level behavior if needed.
Concrete Mini-Run Example
- Start: Description = “Modern office: long corridors, rooms.”
- LLM: Spatial = low connectivity, narrow corridors; Strategy = conservative, perimeter-first, early corner cleanup.
- Global graph: Top communities = North corridor, South corridor, East wing.
- Waypoints: North corridor → East wing → South corridor.
- Local RL: Near an intersection, two neighbors exist; one reveals more rooms and is on the global route—pick that.
- Outcome: The robot clears corners early and avoids a late-game return trip, finishing sooner.
04 Experiments & Results
The Test: The authors measured two things that truly matter in exploration—how far the robot had to drive (distance) and how long it took (time). Shorter paths and faster finishes mean less battery use and more efficient coverage.
The Competition: FARE was compared with strong planners—TARE, DSVP, ARIADNE, and HEADER—in three very different simulated worlds: indoor (corridors/rooms), forest (natural obstacles), and warehouse (narrow aisles). Same robot, same sensors, and consistent graph settings were used. FARE only adjusted one knob per environment (node resolution), while other planners typically needed more tuning.
The Scoreboard (with context):
- Indoor: FARE 1048 m, 590 s. This is basically tied with the top methods—like getting an A when others also get A or A-. Why? Indoors was compact with less distinctive global structure, so global reasoning helped less.
- Forest: FARE 1090 m, 680 s, beating others by a clear margin—like going from a B to an A. The global planner avoided long, messy detours among trees.
- Warehouse: FARE 441 m, 252 s, the best—like an A+ while others got B or B+. The aisle structure is exactly where community pruning + LLM planning shines, finishing aisles and corners early.
Put another way: FARE often cut tens to hundreds of meters and many seconds off the run compared to strong baselines, especially when the environment had a clear large-scale structure (forest clusters and warehouse aisles). The trajectory plots show FARE tends to complete perimeters and corners sooner, while baselines often leave them for later and pay the price by coming back.
Surprising Findings:
- Minimal retuning: FARE needed only the node-resolution parameter adjusted per environment, while baselines typically needed more careful tuning.
- Onboard LLM: In real hardware tests, a 14B-parameter LLM ran onboard and still delivered stable guidance—impressive given compute limits.
- Indoor parity: In compact indoor settings with less pronounced global structure, FARE did not massively outperform—suggesting its main gains come when structure matters most.
Real-World Validation: On a wheeled robot with a 3D LiDAR and onboard SLAM (FastLIO2), FARE explored a large 200 m × 130 m teaching building. It completed exploration without human help, maintained steady runtime, and smoothly combined global guidance with local reactivity. This shows the approach isn’t just a simulation trick; it transfers to reality.
05 Discussion & Limitations
Limitations (be specific):
- Mostly 2D with LiDAR-based mapping; complex 3D multilevel sites (stairs, shelves, overhangs) aren’t fully addressed yet.
- The LLM depends on a concise but accurate environment description; misleading or stale descriptions could bias global plans.
- Top-k community pruning trades completeness for simplicity; rare but important small structures might get pruned out.
- Highly dynamic environments (fast-moving people/objects) can degrade the usefulness of slow global reasoning unless updates are very frequent.
- Onboard LLM inference (14B) needs decent compute and power; smaller models may reduce quality, bigger models may be impractical.
Required Resources:
- Reliable SLAM/odometry and a 3D LiDAR (or equivalent sensing).
- Enough compute for onboard or near-edge LLM inference (or a fast link to an edge server).
- GPU/accelerator for the attention-based RL policy.
- Memory and bandwidth for updating graphs and prompts in real time.
When NOT to Use:
- Tiny or very simple spaces where a straightforward frontier method already performs near-optimally.
- Extremely dynamic scenes (e.g., dense crowds) where long-horizon global plans change too fast to matter.
- Ultra-low-power robots that can’t run the LLM or graph updates at useful rates.
- Missions requiring guaranteed optimal paths under strict proofs—FARE is a pragmatic, learning-driven approach.
Open Questions:
- How to scale to multi-robot teams with shared global reasoning and deconflicted local actions?
- How to fuse richer semantics (vision labels, object goals) so strategies adapt mid-mission as scene types change?
- Can we auto-generate the environment description from sensors, removing the need for manual text prompts?
- How to extend to 3D volumetric exploration with full aerial/ground coordination?
- What is the best balance between pruning too much and too little for different map topologies?
06 Conclusion & Future Work
Three-Sentence Summary: FARE splits exploration into a slow, strategic LLM that reads a simplified global graph and a fast, reactive RL policy that follows local sensors—connected by global waypoints and a gentle waypoint-following reward. This decoupling lets the robot use environment-aware global plans without losing agility nearby, reducing backtracking and finishing sooner. It works in simulation and on real hardware, especially in structured places like warehouses and forests.
Main Achievement: Showing that a fast–slow, agentic design—LLM-driven global reasoning plus RL-driven local control—can be made practical and robust via graph pruning and soft instruction following, delivering consistent efficiency gains over strong baselines.
Future Directions: Extend to multi-robot teams with shared strategies; add vision-based semantics to detect environment shifts on the fly; push into full 3D spaces and richer action sets; and automate the generation of environment descriptions from online perception.
Why Remember This: FARE demonstrates that letting different “brains” think at their own best speed—and talk through clear waypoints—turns messy, long-horizon exploration into a coordinated, efficient journey. The pruning trick makes LLM reasoning tractable. The soft alignment reward keeps the policy coherent over time without making it brittle. Together, they offer a blueprint for marrying global understanding with local skill across many autonomous tasks.
Practical Applications
- Warehouse inventory mapping with aisle-by-aisle coverage that minimizes backtracking.
- Hospital corridor scanning to rapidly verify accessibility and detect blocked paths.
- Search-and-rescue floor sweeps that prioritize corners and exits while keeping escape routes.
- Construction site progress mapping that adapts as temporary obstacles appear and disappear.
- Agricultural field scouting where global rows are planned and local obstacles are avoided on the fly.
- Office building inspection, finishing perimeters and meeting-room clusters early to save time.
- Campus security patrols that adjust global routes during events with dynamic crowd patterns.
- Large retail store mapping to maintain up-to-date layouts while minimizing after-hours robot time.
- Underground or cave exploration with structured community-based waypointing to avoid dead ends.
- Autonomous cleaning routes that prioritize high-traffic areas and reduce redundant passes.