Papers (1055)

TranslateGemma Technical Report

Mara Finkelstein, Isaac Caswell et al. · Jan 13 · arXiv

TranslateGemma is a family of open machine translation models fine-tuned from Gemma 3 to translate many languages more accurately.

#machine translation#TranslateGemma#Gemma 3

Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

Intermediate

Youwei Liu, Jian Wang et al. · Jan 13 · arXiv

Agents often act like tourists without a map: they react to what they see now and miss long-term consequences.

#Imagine-then-Plan#world models#adaptive lookahead

3AM: 3egment Anything with Geometric Consistency in Videos

Intermediate

Yang-Che Sun, Cheng Sun et al. · Jan 13 · arXiv

3AM is a new way to track and segment the same object across a whole video, even when the camera view changes a lot.

#video object segmentation#SAM2#geometry-aware tracking

Motion Attribution for Video Generation

Intermediate

Xindi Wu, Despoina Paschalidou et al. · Jan 13 · arXiv

Motive is a new way to figure out which training videos teach an AI how to move things realistically, not just how they look.

#motion attribution#video diffusion#optical flow

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Intermediate

Yao Tang, Li Dong et al. · Jan 13 · arXiv

The paper introduces Multiplex Thinking, a new way for AI to think by sampling several likely next words at once and blending them into a single super-token.

#Multiplex Thinking#chain-of-thought#continuous token

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs

Intermediate

Zhiyuan Hu, Yucheng Wang et al. · Jan 13 · arXiv

The paper fixes a common problem in training AI reasoners: models get stuck using the same favorite solution style and stop exploring new ways to solve problems.

#Uniqueness-Aware Reinforcement Learning#LLM reasoning#strategy clustering

Parallel Context-of-Experts Decoding for Retrieval Augmented Generation

Intermediate

Giulio Corallo, Paolo Papotti · Jan 13 · arXiv

This paper introduces PCED, a way to use many documents as separate 'experts' in parallel so an AI can stitch answers together without stuffing everything into one giant prompt.

#Retrieval-Augmented Generation#PCED#contrastive decoding

VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory

Intermediate

Shaoan Wang, Yuanfei Luo et al. · Jan 13 · arXiv

VLingNav is a robot navigation system that sees, reads instructions, and acts, while deciding when to think hard and when to just move.

#Vision-Language-Action#embodied navigation#adaptive chain-of-thought

ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios

Intermediate

António Loison, Quentin Macé et al. · Jan 13 · arXiv

ViDoRe V3 is a big, carefully built test that checks how well AI systems find and use information from both text and pictures (like tables and charts) in real documents.

#Retrieval-Augmented Generation#Multimodal RAG#Visual Document Understanding

ExpSeek: Self-Triggered Experience Seeking for Web Agents

Intermediate

Wenyuan Zhang, Xinghua Zhang et al. · Jan 13 · arXiv

ExpSeek helps web-browsing AI agents ask for help exactly when they feel unsure, instead of stuffing them with tips at the very beginning.

#web agents#experience base#experience triplets

MoCha: End-to-End Video Character Replacement without Structural Guidance

Intermediate

Zhengbo Xu, Jie Ma et al. · Jan 13 · arXiv

MoCha is a new AI that swaps a person in a video with a new character using only one mask on one frame and a few reference photos.

#video diffusion#character replacement#in-context learning

Your Group-Relative Advantage Is Biased

Intermediate

Fengkai Yang, Zherui Chen et al. · Jan 13 · arXiv

Group-based reinforcement learning for reasoning (like GRPO) uses the group's average reward as a baseline, but that makes its 'advantage' estimates biased.

#Reinforcement Learning from Verifier Rewards#GRPO#GSPO
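
For readers unfamiliar with the group-mean baseline this summary refers to, here is a minimal sketch of how a GRPO-style advantage is commonly computed for one prompt's group of sampled responses. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are illustrative assumptions, not details taken from the paper, and the sketch does not reproduce the paper's bias analysis or its proposed fix.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return one advantage per response in a single prompt's group.

    Hypothetical helper: the baseline is the group's own mean reward,
    and the result is scaled by the group's standard deviation
    (population std here, which is an assumption of this sketch).
    """
    baseline = mean(rewards)       # group-average reward used as the baseline
    scale = pstdev(rewards) + eps  # eps avoids division by zero for uniform groups
    return [(r - baseline) / scale for r in rewards]


# Four sampled answers to one prompt, scored 1.0 (correct) or 0.0 (incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline is estimated from the same small group whose responses it scores, each advantage depends on the other samples drawn for that prompt, which is the setting the paper examines.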
