Papers (1262)

DeepSight: An All-in-One LM Safety Toolkit

DeepSight is a free, all-in-one safety toolkit that both tests how models behave (DeepSafe) and peeks inside how they think (DeepScan).

#LLM safety evaluation #multimodal safety #frontier AI risks


LawThinker: A Deep Research Legal Agent in Dynamic Environments

Intermediate

Xinyu Yang, Chenlong Deng et al. · Feb 12 · arXiv

LawThinker is a legal AI agent that double-checks every research step before using it, so small mistakes don’t snowball into big ones.

#Legal AI agent #Explore-Verify-Memorize #DeepVerifier


Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Intermediate

Xin Xu, Clive Bai et al. · Feb 12 · arXiv

This paper shows a simple way to turn many 'too-easy' questions into harder, still-checkable ones so that AI keeps learning instead of stalling.

#Reinforcement Learning with Verifiable Rewards #Compositional prompts #Sequential Prompt Composition


Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments

Intermediate

Romain Froger, Pierre Andrews et al. · Feb 12 · arXiv

Gaia2 is a new test that measures how well AI agents handle real-life messiness like changing events, deadlines, and team coordination.

#Gaia2 #ARE platform #asynchronous environments


MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

Intermediate

MiniCPM Team, Wenhao An et al. · Feb 12 · arXiv

MiniCPM-SALA is a 9B-parameter language model that mixes two kinds of attention—sparse and linear—to read very long texts quickly and accurately.

#long-context modeling #sparse attention #linear attention


Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

Intermediate

Futing Wang, Jianhao Yan et al. · Feb 12 · arXiv

The paper teaches language models to explore more ideas while thinking, so they can solve harder problems.

#In-Context Exploration #Test-Time Scaling #Chain-of-Thought


Adapting Vision-Language Models for E-commerce Understanding at Scale

Beginner

Matteo Nulli, Vladimir Orshulevich et al. · Feb 12 · arXiv

This paper shows a simple, repeatable way to teach general Vision-Language Models (VLMs) to understand e-commerce items much better without forgetting their general skills.

#Vision-Language Models #E-commerce adaptation #Attribute extraction


Thinking with Drafting: Optical Decompression via Logical Reconstruction

Beginner

Jingxuan Wei, Honghao He et al. · Feb 12 · arXiv

The paper fixes a common problem in AI: models can read pictures and text well, but they often mess up the logic behind them.

#Thinking with Drafting #optical decompression #visual algebra


ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces

Beginner

Xin Xu, Tong Yu et al. · Feb 12 · arXiv

ThinkRouter teaches a model to switch how it “thinks” based on how sure it feels, so it stays accurate without talking forever.

#latent reasoning #discrete token space #confidence-aware routing


Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

Beginner

Jinrui Zhang, Chaodong Xiao et al. · Feb 12 · arXiv

Training big language models usually needs super-expensive, tightly connected GPU clusters, which most people do not have.

#decentralized LLM pretraining #mixture-of-experts (MoE) #sparse expert synchronization


Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use

Intermediate

Hanbing Liu, Chunhao Tian et al. · Feb 12 · arXiv

This paper tackles a simple but serious question: can AI agents use paid tools to finish multi-step tasks without blowing the budget?

#budget-constrained tool use #agentic LLMs #inference-time planning


Multimodal Fact-Level Attribution for Verifiable Reasoning

Beginner

David Wan, Han Wang et al. · Feb 12 · arXiv

This paper builds a new test, called MURGAT, to check whether AI models can back up each small fact they say with the right part of a video, audio, or figure.

#multimodal grounding #fact-level attribution #atomic fact decomposition

