JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early.
This paper introduces YaPO, a way to gently nudge a language model’s hidden thoughts so it behaves better without retraining it.
RubricHub is a large collection of about 110,000 detailed grading guides (rubrics) covering many kinds of questions, such as health, science, writing, and chat.
UM-Text is a single AI that understands both your words and your picture to add or change text in images so it looks like it truly belongs there.
This paper shows how to make powerful image-generating Transformers run fast on phones without needing the cloud.
The paper builds a new way to create realistic, long conversations between people and tool-using AI assistants, such as ones that query databases.
MemoBrain is like a helpful co-pilot for AI that keeps important thoughts neat and ready so the main thinker (the agent) doesn’t get overwhelmed.
Transformers are powerful but slow because regular self-attention compares every token with every other token, which grows too fast for long sequences.
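The quadratic cost described above can be made concrete with a minimal NumPy sketch (this is a generic illustration of plain self-attention, not code from any of the papers; query/key/value projections are omitted for brevity):

```python
import numpy as np

def naive_self_attention(x):
    """Plain self-attention sketch: every token attends to every
    other token, so the score matrix is n x n -- quadratic in
    sequence length n."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)  # (n, n): n*n pairwise comparisons
    # row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x  # (n, d): each output mixes all n tokens

x = np.random.randn(512, 64)
out = naive_self_attention(x)
# The score matrix alone holds 512 * 512 entries;
# doubling the sequence length quadruples that memory and compute.
```

Doubling `n` quadruples the size of the score matrix, which is why long sequences become prohibitively expensive for standard Transformers.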
Large Vision-Language Models (LVLMs) look great on single images but often stumble when they must reason across multiple images.
Computer-using agents forget important visual details over long tasks and cannot reliably find up-to-date, step-by-step help for unfamiliar apps.
This paper teaches AI to build and improve its own small computer helpers (tools) while solving science problems, instead of relying only on a fixed toolbox made beforehand.
TAG-MoE is a new way to steer Mixture-of-Experts (MoE) models using clear task hints, so the right “mini-experts” handle the right parts of an image task.