Large language models often sound confident even when they are wrong, and existing ways to catch mistakes are slow or not very accurate.
The paper tackles a big blind spot in vision-language models: understanding how objects move and relate in 3D over time (dynamic spatial reasoning, or DSR).
Search is not the same as research; real research needs planning, checking many sources, fixing mistakes, and writing a clear report.
Big vision-language models are very capable but too large to run on phones and other small devices.
SlideTailor is an AI system that turns a scientific paper into personalized presentation slides that match what a specific user likes.
Large language models can say things that sound right but aren’t supported by the given document; this is called a faithfulness hallucination.
This paper builds DiRL, a fast and careful way to post-train diffusion language models so they reason better.
This paper adds a tiny but powerful step called Early Knowledge Alignment (EKA) to multi-step retrieval systems so the model takes a quick, smart look at relevant information before it starts planning.
Memory-T1 teaches conversational AI agents to keep track of when things happened across many conversations.
This paper turns messy chains of thought from language models into clear, named steps so we can see how they really think through math problems.
This paper asks a simple question: do video AI models trained only on 2D videos secretly learn about 3D worlds?
The paper proposes the Prism Hypothesis: meanings (semantics) mainly live in low frequencies, while fine picture details live in high frequencies.
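To make "low frequencies" and "high frequencies" concrete, here is a minimal toy sketch. It is not the paper's method: it uses a 1D signal instead of an image, and the names `dft`, `idft`, `split_bands`, and the `cutoff` value are hypothetical choices made for illustration. It only shows that a signal can be split into a slowly varying part (the coarse shape) and a rapidly varying part (the fine detail), which is the kind of split the hypothesis talks about.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)); fine for a toy example."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def split_bands(signal, cutoff):
    """Split a signal into a low band (frequency <= cutoff) and a high band."""
    n = len(signal)
    low_spec, high_spec = [], []
    for k, coeff in enumerate(dft(signal)):
        freq = min(k, n - k)  # bins k and n - k carry the same frequency
        if freq <= cutoff:
            low_spec.append(coeff)
            high_spec.append(0j)
        else:
            low_spec.append(0j)
            high_spec.append(coeff)
    return idft(low_spec), idft(high_spec)


# Assumed toy data: a slow wave (coarse shape) plus a small fast ripple (detail).
n = 64
coarse = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
detail = [0.2 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]
signal = [c + d for c, d in zip(coarse, detail)]

low, high = split_bands(signal, cutoff=8)
```

In this toy case `low` recovers the slow wave and `high` recovers the ripple, and adding the two bands back together gives the original signal. Whether semantics really sit in the low band of real images is the paper's claim, and this sketch does not test it.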