Papers (1262)

CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

Jinpeng Chen, Cheng Gong et al. · Mar 2 · arXiv

CoVe is a way to create training conversations for AI agents that use tools, while guaranteeing the conversations are both challenging and correct.

#constraint-guided verification #multi-turn tool use #user simulator

Efficient RLVR Training via Weighted Mutual Information Data Selection

Intermediate

Xinyu Zhou, Boyu Zhu et al. · Mar 2 · arXiv

Reinforcement learning (RL) trains language models by letting them try answers and learn from rewards, but training is slow if we pick the wrong practice questions.

#Reinforcement Learning #RLVR #Data Selection

Agentic Code Reasoning

Intermediate

Shubham Ugare, Satish Chandra · Mar 2 · arXiv

The paper teaches AI agents to understand big codebases without running the code by following a strict, step-by-step thinking template called semi-formal reasoning.

#agentic code reasoning #semi-formal reasoning #patch equivalence

FireRed-OCR Technical Report

Intermediate

Hao Wu, Haoran Lou et al. · Mar 2 · arXiv

FireRed-OCR turns a general vision-language model into a careful document reader that follows strict rules, so its outputs are usable in the real world.

#FireRed-OCR #structural hallucination #document parsing

OpenAutoNLU: Open Source AutoML Library for NLU

Beginner

Grigory Arshinov, Aleksandr Boriskin et al. · Mar 2 · arXiv

OpenAutoNLU is a simple, open-source tool that automatically builds text understanding models for you.

#AutoML #Natural Language Understanding #Text Classification

Legal RAG Bench: an end-to-end benchmark for legal RAG

Beginner

Abdur-Rahman Butler, Umar Butler · Mar 2 · arXiv

Legal RAG Bench is a new, end-to-end test that checks how well legal AI systems find information and use it to answer tough, real-world legal questions.

#legal RAG #retrieval-augmented generation #embedding models

Surgical Post-Training: Cutting Errors, Keeping Knowledge

Intermediate

Wenye Lin, Kai Han · Mar 2 · arXiv

The paper introduces SPOT, a training recipe that fixes an AI model’s mistakes with tiny edits while keeping what it already knows well.

#Surgical Post-Training #SPOT #DPO

Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

Intermediate

Qiyuan Zhang, Yufei Wang et al. · Mar 2 · arXiv

For generative reward models, longer explanations are not always better; what matters is the shape of the reasoning, meaning how it combines breadth and depth.

#Generative Reward Models #Chain-of-Thought #Breadth-CoT

RubricBench: Aligning Model-Generated Rubrics with Human Standards

Intermediate

Qiyuan Zhang, Junyi Zhou et al. · Mar 2 · arXiv

RubricBench is a new benchmark that checks whether AI judges can use clear, checklist-style rules (rubrics) the way humans do.

#RubricBench #rubric-guided evaluation #reward models

PhotoBench: Beyond Visual Matching Towards Personalized Intent-Driven Photo Retrieval

Beginner

Tianyi Xu, Rong Shan et al. · Mar 2 · arXiv

PhotoBench is a new test built from real people’s photo albums to see if AI can find photos based on what you truly mean, not just what you see.

#PhotoBench #personalized photo retrieval #multi-source reasoning

LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval

Intermediate

Jiajie Jin, Yanzhao Zhang et al. · Mar 2 · arXiv

LaSER teaches a fast search model to “think” quietly inside its hidden space, so it gets the benefits of step-by-step reasoning without writing those steps out as text.

#dense retrieval #chain-of-thought #latent reasoning

SciDER: Scientific Data-centric End-to-end Researcher

Beginner

Ke Lin, Yilin Lu et al. · Mar 2 · arXiv

SciDER is a team of smart AI helpers that can run almost the whole research process: think of ideas, read raw data, write and run code, and improve itself with feedback.

#data-centric AI #AI research agent #self-evolving memory
