Papers (1262)

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

OmniSIFT is a new way to shrink (compress) audio and video tokens so omni-modal language models can think faster without forgetting important details.

#Omni-LLM #token compression #modality-asymmetric

From Data to Behavior: Predicting Unintended Model Behaviors Before Training

Intermediate

Mengru Wang, Zhenqian Xu et al. | Feb 4 | arXiv

Large language models can quietly pick up hidden preferences from training data that looks harmless.

#Data2Behavior #Manipulating Data Features #activation injection

ERNIE 5.0 Technical Report

Intermediate

Haifeng Wang, Hua Wu et al. | Feb 4 | arXiv

ERNIE 5.0 is a single giant model that can read and create text, images, video, and audio by predicting the next pieces step by step, like writing a story one line at a time.

#ERNIE 5.0 #unified autoregressive model #mixture-of-experts

WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

Intermediate

Zelai Xu, Zhexuan Xu et al. | Feb 4 | arXiv

WideSeek-R1 teaches a small 4B-parameter language model to act like a well-run team: one leader plans, many helpers work in parallel, and everyone learns together with reinforcement learning.

#width scaling #multi-agent reinforcement learning #orchestration

ASA: Training-Free Representation Engineering for Tool-Calling Agents

Intermediate

Youjin Wang, Run Zhou et al. | Feb 4 | arXiv

The paper finds a strange gap: the model’s hidden thoughts almost perfectly show when it should use a tool, but its actual words often don’t trigger the tool under strict rules.

#activation steering #representation engineering #tool calling

Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration

Beginner

Jiaheng Liu, Yuanxing Zhang et al. | Feb 4 | arXiv

This paper says today's content AIs are great at pretty pictures and videos but often miss what people actually want, creating a big Intent-Execution Gap.

#Vibe AIGC #Agentic Orchestration #Meta Planner

LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding

Intermediate

Gang Lin, Dongfang Li et al. | Feb 4 | arXiv

Long texts make language models slow because they must keep and re-check a huge memory called the KV cache for every new word they write.

#long-context LLM #sparse attention #head specialization

EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models

Intermediate

Yu Bai, MingMing Yu et al. | Feb 4 | arXiv

EgoActor is a vision-language model that turns everyday instructions like 'Go to the door and say hi' into step-by-step, egocentric actions a humanoid robot can actually do.

#EgoActing #vision-language model #humanoid robot

Beyond Unimodal Shortcuts: MLLMs as Cross-Modal Reasoners for Grounded Named Entity Recognition

Intermediate

Jinlong Ma, Yu Zhang et al. | Feb 4 | arXiv

The paper teaches multimodal large language models (MLLMs) to stop guessing from just text or just images and instead check both together before answering.

#GMNER #Multimodal Large Language Models #Modality Bias

No One-Size-Fits-All: Building Systems For Translation to Bashkir, Kazakh, Kyrgyz, Tatar and Chuvash Using Synthetic And Original Data

Intermediate

Dmitry Karpov | Feb 4 | arXiv

The paper tries several different ways to translate five low-resource Turkic languages, instead of forcing one method to fit all.

#low-resource machine translation #Turkic languages #NLLB-200

Proxy Compression for Language Modeling

Intermediate

Lin Zheng, Xinyu Li et al. | Feb 4 | arXiv

Most language models are trained on compressed tokens, which makes training fast but ties the model to a specific tokenizer.

#proxy compression #byte-level language modeling #tokenizer-free inference

Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning

Intermediate

Yansong Ning, Jun Fang et al. | Feb 4 | arXiv

Agent-Omit teaches AI agents to skip unneeded thinking and old observations, cutting tokens while keeping accuracy high.

#LLM agents #reinforcement learning #agentic RL
