Papers (1055)

TimeBill: Time-Budgeted Inference for Large Language Models

Intermediate

Qi Fan, An Zou et al. · Dec 26 · arXiv

TimeBill is a way to help big AI models finish their answers on time without ruining answer quality.

#time-budgeted inference #response length prediction #execution time estimation

Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

Intermediate

Mengqi He, Xinyu Tian et al. · Dec 26 · arXiv

The paper shows that when vision-language models write captions, only a small set of uncertain words (about 20%) act like forks that steer the whole sentence.

#vision-language models #autoregressive generation #entropy

Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation

Intermediate

Steven Xiao, Xindi Zhang et al. · Dec 25 · arXiv

This paper introduces Knot Forcing, a way to make talking-head videos that look great while being generated live, frame by frame.

#Knot Forcing #autoregressive video diffusion #temporal knot

An Information Theoretic Perspective on Agentic System Design

Intermediate

Shizhe He, Avanika Narayan et al. · Dec 25 · arXiv

The paper shows that many AI systems work best when a small 'compressor' model first shrinks long text into a short, info-packed summary and a bigger 'predictor' model then reasons over that summary.

#agentic systems #compressor-predictor #mutual information

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Intermediate

Shuo Cao, Jiayang Li et al. · Dec 25 · arXiv

This paper teaches AI to notice not just what is in a picture, but how the picture looks and feels to people.

#perceptual image understanding #image aesthetics assessment (IAA) #image quality assessment (IQA)

HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

Intermediate

Haonan Qiu, Shikun Liu et al. · Dec 24 · arXiv

HiStream makes 1080p video generation much faster by removing redundant computation across space, time, and diffusion steps.

#high-resolution video generation #diffusion transformer (DiT) #dual-resolution caching

Streaming Video Instruction Tuning

Intermediate

Jiaer Xia, Peixian Chen et al. · Dec 24 · arXiv

Streamo is a real-time video assistant that knows when to stay quiet, when to wait, and when to speak, all while the video is still playing.

#streaming video LLM #real-time video understanding #instruction tuning

DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation

Intermediate

Jiawei Liu, Junqiao Li et al. · Dec 24 · arXiv

DreaMontage is a new AI method that makes long, single-shot videos that feel smooth and connected, even when you give it scattered images or short clips in the middle.

#arbitrary frame conditioning #one-shot video generation #Diffusion Transformer

Latent Implicit Visual Reasoning

Intermediate

Kelvin Li, Chuyi Shang et al. · Dec 24 · arXiv

Large Multimodal Models (LMMs) are great at reading text and looking at pictures, but they usually do most of their thinking in words, which limits deep visual reasoning.

#Latent Implicit Visual Reasoning #latent tokens #bottleneck attention masking

UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

Intermediate

Tanghui Jia, Dongyu Yan et al. · Dec 24 · arXiv

UltraShape 1.0 is a two-step 3D generator that first makes a simple overall shape and then zooms in to add tiny details.

#3D diffusion #coarse-to-fine generation #voxel-based refinement

Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting

Intermediate

Yoonwoo Jeong, Cheng Sun et al. · Dec 24 · arXiv

This paper speeds up how 3D scenes handle big, 512-dimensional features without throwing away important information.

#3D Gaussian Splatting #Quantile Rendering #Open-vocabulary segmentation

NVIDIA Nemotron 3: Efficient and Open Intelligence

Intermediate

NVIDIA et al. · Dec 24 · arXiv

Nemotron 3 is a new family of open AI models (Nano, Super, Ultra) built to think better while running faster and cheaper.

#Nemotron 3 #Mixture-of-Experts #LatentMoE
