Papers (807)

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Intermediate

Jiacheng Guo, Ling Yang et al. · Dec 22 · arXiv

GenEnv is a training system where a student AI and a teacher simulator grow together by exchanging tasks and feedback.

#GenEnv #co-evolutionary learning #difficulty-aligned curriculum

VA-π: Variational Policy Alignment for Pixel-Aware Autoregressive Generation

Intermediate

Xinyao Liao, Qiyuan He et al. · Dec 22 · arXiv

Autoregressive (AR) image models make pictures by choosing tokens one-by-one, but they were judged only on picking likely tokens, not on how good the final picture looks in pixels.

#autoregressive image generation #tokenizer–generator alignment #pixel-space reconstruction

Over++: Generative Video Compositing for Layer Interaction Effects

Intermediate

Luchao Qi, Jiaye Wu et al. · Dec 22 · arXiv

Over++ is a video AI that adds realistic effects like shadows, splashes, dust, and smoke between a foreground and a background without changing the original footage.

#augmented compositing #video diffusion #video inpainting

StoryMem: Multi-shot Long Video Storytelling with Memory

Intermediate

Kaiwen Zhang, Liming Jiang et al. · Dec 22 · arXiv

StoryMem is a new way to make minute‑long, multi‑shot videos that keep the same characters, places, and style across many clips.

#StoryMem #Memory-to-Video #multi-shot video generation

CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion

Intermediate

Moritz Böhle, Amélie Royer et al. · Dec 22 · arXiv

CASA is a new way to mix images and text inside a language model that keeps speed and memory low while keeping accuracy high.

#CASA #cross-attention #self-attention

QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models

Intermediate

Li Puyin, Tiange Xiang et al. · Dec 22 · arXiv

QuantiPhy is a new test that checks if AI models can measure real-world physics from videos using numbers, not guesses.

#QuantiPhy #Vision-Language Models #Physical reasoning

QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation

Intermediate

Dehai Min, Kailin Zhang et al. · Dec 22 · arXiv

QuCo-RAG is a new way to decide when an AI should look things up while it writes, using facts from its training data instead of its own shaky confidence.

#Dynamic RAG #Retrieval-Augmented Generation #Uncertainty Quantification

DramaBench: A Six-Dimensional Evaluation Framework for Drama Script Continuation

Intermediate

Shijian Ma, Yunqi Huang et al. · Dec 22 · arXiv

DramaBench is a new test that checks how well AI continues drama scripts across six separate skills instead of one big score.

#DramaBench #script continuation #screenplay evaluation

Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

Intermediate

Ming Li, Han Chen et al. · Dec 21 · arXiv

This paper asks a simple question with big impact: Can AI tell which test questions are hard for humans?

#Item Difficulty Prediction #Item Response Theory #Rasch Model

From Word to World: Can Large Language Models be Implicit Text-based World Models?

Intermediate

Yixia Li, Hongru Wang et al. · Dec 21 · arXiv

This paper asks if large language models (LLMs) can act like "world models" that predict what happens next in text-based environments, not just the next word in a sentence.

#world models #next-state prediction #text-based environments

InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search

Intermediate

Kaican Li, Lewei Yao et al. · Dec 21 · arXiv

This paper builds a tough new test called O3-BENCH to check if AI can truly think with images, not just spot objects.

#multimodal reasoning #generalized visual search #reinforcement learning

Does It Tie Out? Towards Autonomous Legal Agents in Venture Capital

Intermediate

Pierre Colombo, Malik Boudiaf et al. · Dec 21 · arXiv

Capitalization tie-out checks if a company’s ownership table truly matches what its legal documents say.

#capitalization tie-out #dataroom #cap table verification
