How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1,055)


From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks

Intermediate
Changpeng Yang, Jinyang Wu et al. · Dec 2 · arXiv

This paper teaches AI models to reason better by first copying only good examples and later learning from mistakes too.

#Curriculum Advantage Policy Optimization · #advantage-based RL · #imitation learning


See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

Intermediate
Le Thien Phuc Nguyen, Zhuoran Yu et al. · Dec 1 · arXiv

This paper introduces AV-SpeakerBench, a new test that checks if AI can truly see, hear, and understand who is speaking, what they say, and when they say it in real videos.

#audiovisual reasoning · #speaker attribution · #temporal grounding


Reinventing Clinical Dialogue: Agentic Paradigms for LLM Enabled Healthcare Communication

Intermediate
Xiaoquan Zhi, Hongke Zhao et al. · Dec 1 · arXiv

Clinical conversations are special because they mix caring feelings with precise medical facts, and old AI systems struggled to do both at once.

#clinical dialogue · #agentic AI · #large language models


RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards

Intermediate
Junyan Ye, Leiqi Zhu et al. · Nov 29 · arXiv

RealGen is a new way to make computer-generated pictures look so real that they can fool expert AI detectors and even careful human judges.

#photorealistic text-to-image · #detector-guided rewards · #reinforcement learning


Visual Generation Tuning

Intermediate
Jiahao Guo, Sinan Du et al. · Nov 28 · arXiv

Before this work, big vision-language models (VLMs) were great at understanding pictures and words together but not at making new pictures.

#Visual Generation Tuning · #VGT-AE · #Vision-Language Models


VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction

Intermediate
Sinan Du, Jiahao Guo et al. · Nov 28 · arXiv

VQRAE is a new kind of image tokenizer that lets one model both understand images (continuous features) and generate/reconstruct them (discrete tokens).

#VQRAE · #Vector Quantization · #Representation Autoencoder

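The continuous-vs-discrete split in that summary rests on vector quantization: snapping each continuous latent onto its nearest entry in a codebook. A minimal NumPy sketch of that lookup, assuming a random codebook standing in for a learned one (`quantize` is an illustrative helper, not VQRAE's API):

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous vector to its nearest codebook entry (discrete token)."""
    # Squared Euclidean distances between every latent and every codebook row.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(-1)              # discrete token ids, usable for generation
    return idx, codebook[idx]       # quantized vectors, usable for reconstruction

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 2))  # 8-entry codebook of 2-D codes
z = rng.normal(size=(5, 2))         # 5 continuous latents
ids, zq = quantize(z, codebook)
print(ids.shape, zq.shape)          # (5,) (5, 2)
```

A real tokenizer learns the codebook jointly with the encoder; the nearest-neighbour lookup itself stays exactly this simple.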

ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models

Intermediate
Long Lian, Sida Wang et al. · Nov 24 · arXiv

ThreadWeaver teaches a language model to split big problems into smaller parts it can solve at the same time, like teammates working in parallel.

#adaptive parallel reasoning · #fork–join · #threaded inference

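The fork–join pattern that summary describes can be sketched with Python's standard thread pool; `solve` here is a toy stand-in for a model call on one independent subproblem, not ThreadWeaver's actual inference machinery:

```python
from concurrent.futures import ThreadPoolExecutor

def solve(subproblem):
    """Stand-in for a model call that answers one independent subproblem."""
    return sum(subproblem)

def fork_join(subproblems):
    """Fork: launch subproblems concurrently. Join: merge the partial answers."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(solve, subproblems))  # fork + gather
    return sum(partials)                               # join/merge step

print(fork_join([[1, 2], [3, 4], [5, 6]]))  # 21
```

The hard part the paper tackles is deciding *when* a problem is worth splitting; the plumbing above only shows the execution pattern.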

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Intermediate
Lakshya A Agrawal, Shangyin Tan et al. · Jul 25 · arXiv

GEPA is a new way to improve AI prompts by letting the AI read its own work, reflect in plain language on what went wrong, and then rewrite its instructions.

#GEPA · #reflective prompt evolution · #Pareto frontier


Visualizing the Loss Landscape of Neural Nets

Intermediate
Hao Li, Zheng Xu et al. · Dec 28 · arXiv

Training a neural network is like finding the lowest spot in a giant, bumpy landscape called the loss landscape.

#loss landscape visualization · #filter normalization · #sharpness flatness

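A common way to picture such a landscape, in the spirit of this paper, is to evaluate the loss along a line through parameter space. The sketch below uses a simple whole-vector scale match where the paper uses per-filter normalization, and a toy quadratic loss in place of a trained network:

```python
import numpy as np

def loss_slice(loss_fn, theta, direction, alphas):
    """Evaluate the loss along the 1-D line theta + alpha * d in parameter space."""
    # Rescale the random direction to the magnitude of theta (a crude stand-in
    # for the paper's per-filter normalization).
    d = direction * (np.linalg.norm(theta) / np.linalg.norm(direction))
    return [loss_fn(theta + a * d) for a in alphas]

loss = lambda w: float(w @ w)        # toy quadratic loss, minimum at the origin
theta = np.array([1.0, -2.0])        # pretend these are trained weights
rng = np.random.default_rng(0)
values = loss_slice(loss, theta, rng.normal(size=2), np.linspace(-1, 1, 5))
print(values)
```

Plotting `values` against the interpolation coefficients gives the 1-D "slice" plots from the paper; two random directions give the 2-D contour versions.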

Attention Is All You Need

Intermediate
Ashish Vaswani, Noam Shazeer et al. · Jun 12 · arXiv

The paper introduces the Transformer, a model that understands and generates sequences (like sentences) using only attention, without RNNs or CNNs.

#Transformer · #Self-Attention · #Multi-Head Attention

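The attention operation at the heart of the Transformer is compact enough to sketch in NumPy. This shows scaled dot-product attention only, omitting the multi-head projections, masking, and the rest of the architecture:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, the paper's core operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                            # weighted average of values

# Toy example: 3 tokens with 4-dimensional embeddings attend to each other.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)       # self-attention: Q = K = V = x
print(out.shape)  # (3, 4)
```

Each output row is a convex combination of the value rows, which is why the model needs no recurrence to mix information across positions.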

Enriching Word Vectors with Subword Information

Intermediate
Piotr Bojanowski, Edouard Grave et al. · Jul 15 · arXiv

This paper teaches computers to understand words by also looking at the smaller pieces inside words, like 'un-', 'play', and '-ing'.

#subword embeddings · #character n-grams · #skip-gram

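The n-gram idea is easy to sketch: wrap the word in boundary markers and slide windows of each length across it. The example below reproduces the paper's illustration for the word "where" with n = 3; the word's vector is then the sum of its n-gram vectors:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams with boundary markers, fastText-style."""
    w = f"<{word}>"  # '<' and '>' mark the start and end of the word
    grams = []
    for n in range(n_min, n_max + 1):
        grams += [w[i:i + n] for i in range(len(w) - n + 1)]
    return grams

print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

Because unseen words still share n-grams with seen ones ("unplayable" shares "play" with "playing"), the model can build vectors for words it never saw during training.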
