Papers (1262)

KlingAvatar 2.0 Technical Report

Kling Team, Jialu Chen et al. · Dec 15 · arXiv

KlingAvatar 2.0 is a system that makes long, sharp, lifelike talking-person videos that follow audio, images, and text instructions all at once.

#audio-driven avatar #video diffusion #diffusion transformer

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate

Zhihang Liu, Xiaoyi Bao et al. · Dec 15 · arXiv

ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.

#creative table visualization #multimodal large language model #diffusion model

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Intermediate

Jiaqi Wang, Weijia Wu et al. · Dec 15 · arXiv

This paper builds a test called Video Reality Test to check whether AI-generated ASMR videos can fool both people and vision-language models (VLMs).

#ASMR #audio-visual coupling #AI-generated video detection

Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection

Intermediate

Juil Koo, Daehyeon Choi et al. · Dec 15 · arXiv

This paper teaches robots to move their camera to a better spot before answering a question about what they see.

#Active Perception #Embodied AI #Vision-Language Models

WAY: Estimation of Vessel Destination in Worldwide AIS Trajectory

Intermediate

Jin Sob Kim, Hyun Joon Park et al. · Dec 15 · arXiv

Ships constantly broadcast AIS messages, but these messages are messy, unevenly spaced in time, and sometimes wrong.

#AIS trajectory #vessel destination prediction #nested sequence

Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows

Intermediate

Haoyu Dong, Pengkun Zhang et al. · Dec 15 · arXiv

FINCH is a new test that checks whether AI can handle real finance and accounting work using messy, real spreadsheets, emails, PDFs, charts, and more.

#FINCH benchmark #finance and accounting AI #spreadsheet agents

Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos

Intermediate

Yicheng Feng, Wanpeng Zhang et al. · Dec 15 · arXiv

Robots often see the world as flat pictures but must move in a 3D world, which makes accurate actions hard.

#Vision-Language-Action #3D spatial grounding #visual-physical alignment

GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training

Intermediate

Tong Wei, Yijun Yang et al. · Dec 15 · arXiv

GTR-Turbo teaches a vision-language agent using a 'free teacher' made by merging its own past checkpoints, so no costly external model is needed.

#GTR-Turbo #checkpoint merging #TIES-merging

Few-Step Distillation for Text-to-Image Generation: A Practical Guide

Intermediate

Yifan Pu, Yizeng Han et al. · Dec 15 · arXiv

Big text-to-image models make amazing pictures but are slow because they take hundreds of tiny steps to turn noise into an image.

#text-to-image #diffusion models #few-step generation

Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views

Beginner

Tingyang Chen, Cong Fu et al. · Dec 15 · arXiv

The paper shows that judging vector search only by distance-based recall and speed can be very misleading for real tasks.

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Intermediate

Weizhou Shen, Ziyi Yang et al. · Dec 15 · arXiv

QwenLong-L1.5 is a post-training recipe that helps AI read and reason over very long documents by improving its training data, its training method, and how it manages memory of important information.

#long-context reasoning #reinforcement learning #GRPO

Improving Recursive Transformers with Mixture of LoRAs

Intermediate

Mohammadmahdi Nouriborji, Morteza Rohanian et al. · Dec 14 · arXiv

Recursive transformers save memory by reusing the same layer over and over, but that makes them less expressive and hurts accuracy.

#Mixture of LoRAs #recursive transformers #parameter sharing
