Papers (1262)

LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation

Ahmadreza Jeddi, Marco Ciccone et al. · Feb 11 · arXiv

LoopFormer is a Transformer that thinks in loops and can flex its thinking time up or down based on the compute you give it.

#Looped Transformers #Elastic Depth #Shortcut Consistency

Causal-JEPA: Learning World Models through Object-Level Latent Interventions

Beginner

Heejeong Nam, Quentin Le Lidec et al. · Feb 11 · arXiv

This paper introduces Causal-JEPA (C-JEPA), a world model that learns by hiding entire objects in its memory and forcing itself to predict them from other objects.

#C-JEPA #object-centric world model #object-level masking

Voxtral Realtime

Beginner

Alexander H. Liu, Andy Ehrenberg et al. · Feb 11 · arXiv

Voxtral Realtime is a speech-to-text model that types what you say almost instantly, while keeping accuracy close to the best offline systems.

#streaming ASR #real-time transcription #causal audio encoder

Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning

Intermediate

Dawid J. Kopiczko, Sagar Vaze et al. · Feb 11 · arXiv

The paper shows that, when teaching a reasoning AI with step-by-step examples, repeating a small set many times can beat using a huge set only once.

#Supervised Fine-Tuning #Chain-of-Thought #Data Repetition

GENIUS: Generative Fluid Intelligence Evaluation Suite

Intermediate

Ruichuan An, Sihan Yang et al. · Feb 11 · arXiv

The paper introduces GENIUS, a new test that checks whether image-generating AIs can think on the fly, not just recall facts.

#Generative Fluid Intelligence #Unified Multimodal Models #Interleaved Multimodal Context

PhyCritic: Multimodal Critic Models for Physical AI

Intermediate

Tianyi Xiong, Shihao Wang et al. · Feb 11 · arXiv

PhyCritic is a judge model that checks other AI models’ answers about the physical world, like cooking steps, robot actions, or driving choices.

#Physical AI #Multimodal critic #Self-referential training

GameDevBench: Evaluating Agentic Capabilities Through Game Development

Intermediate

Wayne Chi, Yixiong Fang et al. · Feb 11 · arXiv

GameDevBench is a new test that checks if AI agents can actually make parts of video games, not just write code in one file.

#GameDevBench #Godot #multimodal agents

DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

Intermediate

Yicheng Chen, Zerun Ma et al. · Feb 11 · arXiv

DataChef teaches a large language model to be a smart data chef: it plans and codes full data pipelines that turn messy datasets into great training meals for other models.

#data recipe #data processing pipeline #reinforcement learning

RISE: Self-Improving Robot Policy with Compositional World Model

Intermediate

Jiazhi Yang, Kunyang Lin et al. · Feb 11 · arXiv

RISE lets a robot learn safely and cheaply by practicing in its imagination instead of always in the real world.

#Reinforcement Learning #World Models #Compositional World Model

ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression

Intermediate

Ammar Ali, Baher Mohammad et al. · Feb 11 · arXiv

ROCKET is a fast, training-free way to shrink big AI models while keeping most of their smarts.

#model compression #training-free compression #sparse factorization

CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion

Intermediate

Yusong Lin, Haiyang Wang et al. · Feb 11 · arXiv

CLI-Gym is a new way to create lots of realistic computer-fixing tasks for AI by safely breaking and then repairing software environments inside containers.

#agentic coding #command line interface #Dockerfile

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

Intermediate

Qixing Zhou, Jiacheng Zhang et al. · Feb 11 · arXiv

FeatureBench is a new benchmark that tests AI coding agents on building real software features, not just fixing small bugs.

#FeatureBench #agentic coding #execution-based evaluation
