This paper says long chain-of-thought (Long CoT) works best when it follows a 'molecular' pattern with three kinds of thinking bonds: Deep-Reasoning, Self-Reflection, and Self-Exploration.
VideoAR is a new AI system that makes videos by writing each frame like a story, one step at a time, while painting in details from coarse to fine.
Machine learning agents usually improve by writing code, running it for hours, and then using the results to tweak the next try, so each improvement cycle is very slow.
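That propose-run-observe loop can be sketched in a few lines. This is a toy illustration only: the `run_experiment` objective and the divide-by-three tweak rule are made up to stand in for hours of real training, not any paper's actual setup.

```python
def run_experiment(lr):
    # Stand-in for an hours-long training run: the score peaks near a
    # hidden "best" learning rate of 0.003. (Hypothetical toy objective.)
    return -(lr - 0.003) ** 2

def agent_improve(n_rounds=5):
    """Generic agent loop: propose a config, run it, keep the best.

    In a real agent, run_experiment is the expensive step that makes
    the whole cycle slow; everything else is cheap bookkeeping.
    """
    best_lr, best_score = None, float("-inf")
    lr = 0.1  # initial guess
    for _ in range(n_rounds):
        score = run_experiment(lr)   # the slow, expensive step
        if score > best_score:
            best_lr, best_score = lr, score
        lr = lr / 3                  # naive tweak for the next try
    return best_lr
```

Because each round must wait for a full run before the next tweak, the wall-clock cost grows linearly with the number of tries.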
LLMs can look confident but still change their answers when the surrounding text nudges them, showing that confidence alone isn’t real truthfulness.
Preference tuning teaches language models to act the way people like, but those habits can fall apart when the topic or style changes (domain shift).
Video models can now be told what physical result you want (like “make this ball move left with a strong push”) using Goal Force, instead of just vague text or a final picture.
This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement.
EnvScaler is an automatic factory that builds many safe, rule-following practice worlds where AI agents can talk to users and call tools, just like real apps.
PaCoRe is a way for AI to think in many parallel paths and then coordinate them, so it can use a lot more brainpower at test time without running out of context window space.
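A generic version of the parallel-then-coordinate idea looks like the sketch below. The `reason_path` function and the majority-vote coordination are illustrative stand-ins, not PaCoRe's actual mechanism, which the one-line summary does not spell out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def reason_path(question, seed):
    # Stand-in for one independent reasoning sample; a real system would
    # call an LLM here. Some seeds go wrong, mimicking sampling noise.
    return 42 if seed % 5 != 0 else 41

def parallel_reason(question, n_paths=8):
    """Run many short reasoning paths at once, then coordinate them.

    Each path stays short, so no single context window has to hold all
    of the thinking; the coordination step (here a simple majority vote)
    merges the paths' conclusions.
    """
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        answers = list(pool.map(lambda s: reason_path(question, s),
                                range(n_paths)))
    return Counter(answers).most_common(1)[0][0]
```

The key property is that total thinking scales with the number of paths, while per-path context stays bounded.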
This paper teaches an AI model to understand both which way an object is facing (orientation) and how it turns between views (rotation), all in one system.
FinVault is a new test that checks if AI helpers for finance stay safe while actually doing real jobs, not just chatting.
The paper shows that language models with a search tool often look up too much information, which wastes compute and can make answers worse on unanswerable questions.
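One simple fix the finding suggests is gating retrieval on the model's own confidence. The interface and threshold below are illustrative assumptions, not the paper's actual method.

```python
def gated_answer(question, direct_answer, confidence, search_fn,
                 threshold=0.8):
    """Call the search tool only when the model's confidence is low.

    Skipping retrieval when the model already knows the answer saves
    compute; for unanswerable questions, an empty search result lets us
    abstain instead of stitching together irrelevant snippets.
    (Hypothetical sketch: threshold and search_fn are assumptions.)
    """
    if confidence >= threshold:
        return direct_answer          # no search needed
    results = search_fn(question)     # fall back to retrieval
    if not results:
        return "I don't know"         # abstain rather than guess
    return results[0]
```

In this sketch, over-search is avoided by the first branch, and the unanswerable case is handled by abstaining when retrieval comes back empty.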