
From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

Xiangyan Qu, Zhenlong Yuan et al. · Feb 24 · arXiv

This paper makes AI image editing faster and better by giving hard edits more test-time computation and easy edits less, much like a smart coach who spends extra time where it is needed.

#adaptive test-time scaling · #image chain-of-thought · #image editing
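The listing gives no algorithm, so the following is only a minimal sketch of the general idea behind adaptive test-time scaling, not the paper's method: a fixed budget of sampled candidates is split across edits so harder edits get more attempts. The difficulty scores and the `allocate_budget` helper are hypothetical.

```python
import math


def allocate_budget(difficulties, total_samples, min_samples=1):
    """Hypothetical helper: split a fixed sampling budget across edits.

    difficulties: assumed per-edit scores (higher = harder), e.g. from a
    cheap first pass. Every edit gets at least `min_samples`; the spare
    budget is shared in proportion to difficulty.
    """
    n = len(difficulties)
    if n == 0:
        return []
    if total_samples < n * min_samples:
        raise ValueError("budget too small for the per-edit minimum")
    spare = total_samples - n * min_samples
    weight_sum = sum(difficulties)
    if weight_sum == 0:
        shares = [spare / n] * n  # no difficulty signal: spread evenly
    else:
        shares = [spare * d / weight_sum for d in difficulties]
    budget = [min_samples + math.floor(s) for s in shares]
    # Give samples lost to rounding to the hardest edits first.
    order = sorted(range(n), key=lambda i: difficulties[i], reverse=True)
    leftover = total_samples - sum(budget)
    for k in range(leftover):
        budget[order[k % n]] += 1
    return budget


print(allocate_budget([0.1, 0.9, 0.5], total_samples=12))  # → [1, 7, 4]
```

The easy edit keeps only its minimum single sample, while the hardest one receives most of the spare budget.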

QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs

Intermediate

Santiago Gonzalez, Alireza Amiri Bavandpour et al. · Feb 24 · arXiv

This paper shows that when AI models grade university-level math proofs, they often disagree with human experts in systematic ways.

#LLM-as-a-Judge · #mathematical proof evaluation · #alignment gap
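One simple way to see a "systematic" disagreement is to compare an AI grader's scores with expert scores on the same proofs. The sketch below is an illustration under stated assumptions, not QEDBench's metric. It reports the mean signed gap (a consistent lean toward leniency or harshness) and the exact-agreement rate. The `alignment_gap` helper and the 0-7 grading scale are hypothetical.

```python
from statistics import mean


def alignment_gap(judge_scores, human_scores):
    """Hypothetical helper comparing an AI grader with expert grades.

    Returns (bias, agreement): bias is the mean of judge minus human score,
    so a positive value means the judge is systematically more lenient;
    agreement is the fraction of proofs where both gave the same score.
    """
    if not judge_scores or len(judge_scores) != len(human_scores):
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [j - h for j, h in zip(judge_scores, human_scores)]
    bias = mean(diffs)
    agreement = sum(d == 0 for d in diffs) / len(diffs)
    return bias, agreement


# Hypothetical grades on an assumed 0-7 scale for four proofs.
print(alignment_gap([7, 7, 5, 6], [7, 4, 5, 3]))  # → (1.5, 0.5)
```

Here the grader agrees with the experts on half the proofs but, on average, awards 1.5 points more. A plain accuracy number would hide that bias.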

QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models

Intermediate

Jingxuan Zhang, Yunta Hsieh et al. · Feb 23 · arXiv

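Vision-Language-Action (VLA) models let robots act on what they see and are told, but they are too big and slow for many real-world devices. QuantVLA shrinks them with scale-calibrated post-training quantization.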

#Vision-Language-Action · #Post-Training Quantization · #Diffusion Transformer
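For readers new to post-training quantization, here is a minimal sketch of the textbook version: symmetric int8 quantization with one scale calibrated from the largest magnitude seen. It is generic background, not QuantVLA's scale-calibration scheme. The function names and example weights are hypothetical.

```python
def calibrate_scale(calibration_values, num_bits=8):
    """Symmetric scale chosen from the largest magnitude seen in calibration."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Map floats to integers in [-qmax, qmax] by rounding value / scale."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in q_values]


# Hypothetical weights; no retraining, just calibrate, round, and store ints.
weights = [0.5, -1.27, 0.02, 1.0]
scale = calibrate_scale(weights)
q = quantize(weights, scale)
approx = dequantize(q, scale)
print(q)  # → [50, -127, 2, 100]
```

Each weight is now stored in 8 bits instead of 32, and the rounding error per value is at most half a scale step. Choosing that scale well is the part "scale-calibrated" methods focus on.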

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

Intermediate

Abdelrahman Shaker, Ahmed Heakl et al. · Feb 23 · arXiv

Mobile-O is a small but smart AI that can both understand pictures and make new images, and it runs right on your phone.

#Mobile-O · #unified multimodal model · #on-device AI

A Very Big Video Reasoning Suite

Intermediate

Maijunxian Wang, Ruisi Wang et al. · Feb 23 · arXiv

This paper builds a gigantic library of video puzzles (VBVR) so AI can practice not just making pretty videos, but actually thinking through what happens over time.

#video reasoning · #rule-based evaluation · #in-domain generalization

ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation

Intermediate

Kun Yang, Yuxuan Zhu et al. · Feb 23 · arXiv

ManCAR helps recommendation systems think step by step but keeps their thoughts on realistic paths using a map of how items connect.

#sequential recommendation · #latent reasoning · #interaction graph

DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning

Intermediate

Zhongwei Wan, Yun Shen et al. · Feb 23 · arXiv

LLMs trained with simple verifiable rewards (RLVR) often latch onto just a few ways of solving problems and stop exploring, which hurts their ability to find other correct answers. DSDR counters this by rewarding diversity at two scales.

#DSDR · #dual-scale diversity · #RLVR

Classroom Final Exam: An Instructor-Tested Reasoning Benchmark

Intermediate

Chongyang Gao, Diji Yang et al. · Feb 23 · arXiv

CFE-BENCH is a new, teacher-verified "Classroom Final Exam" for AI that uses real college STEM problems to test deep, step-by-step reasoning.

#CFE-BENCH · #variable-based verification · #reasoning flow

Hepato-LLaVA: An Expert MLLM with Sparse Topo-Pack Attention for Hepatocellular Pathology Analysis on Whole Slide Images

Intermediate

Yuxuan Yang, Zhonghao Yan et al. · Feb 23 · arXiv

Hepato-LLaVA is a specialized AI that reads giant whole-slide microscope images of liver tissue and answers medical questions about liver cancer.

#Hepato-LLaVA · #Hepatocellular Carcinoma · #Whole Slide Images

Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations

Intermediate

Dongming Jiang, Yi Li et al. · Feb 22 · arXiv

This paper explains how AI agents remember things across long conversations and why many current tests don’t truly measure that memory.

#agentic memory · #memory-augmented generation · #long-context LLMs

TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics

Intermediate

Shirui Chen, Cole Harrison et al. · Feb 22 · arXiv

Robots learn better when they get small hints at every step instead of only a final thumbs-up or thumbs-down. TOPReward reads such hints from a model's token probabilities, with no task-specific training.

#TOPReward · #token probabilities · #logits
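The idea of turning token probabilities into per-step rewards can be sketched generically as follows. This is an illustration under stated assumptions, not TOPReward's actual formulation. A softmax over the model's logits gives the probability of a hypothetical "task completed" token after each step. Each step is then rewarded by how much it raised that probability. The token id, the logits, and both helper names are hypothetical.

```python
import math


def token_probability(logits, token_id):
    """Softmax probability of one token, computed stably from raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[token_id] / sum(exps)


def dense_rewards(per_step_logits, success_token_id):
    """Hypothetical per-step rewards from a model's token probabilities.

    per_step_logits: logits the model emits after observing each step.
    success_token_id: assumed id of a token meaning "task completed".
    Each reward is the change in that token's probability, so steps that
    move the task forward earn positive reward and the rewards sum to the
    final probability.
    """
    probs = [token_probability(logits, success_token_id)
             for logits in per_step_logits]
    return [probs[0]] + [b - a for a, b in zip(probs, probs[1:])]


# Hypothetical two-token vocabulary; token 1 stands for "task completed".
steps = [[0.0, 0.0], [0.0, math.log(3)], [0.0, math.log(9)]]
print(dense_rewards(steps, success_token_id=1))
```

The three steps push the success probability from 0.5 to 0.75 to 0.9. Instead of a single thumbs-up at the end, each step gets its own reward (about 0.5, 0.25, and 0.15). Using differences is one design choice among several, picked here because the rewards still add up to the final success probability.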

JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation

Intermediate

Kai Liu, Yanhao Zheng et al. · Feb 22 · arXiv

JavisDiT++ is a new AI that makes short videos and matching sounds from a text prompt, keeping sight and sound in sync.

#joint audio-video generation · #multimodal diffusion transformer · #modality-specific mixture-of-experts
