Papers (1262)

MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

Zhongxi Wang, Yueqian Lin et al. · Mar 3 · arXiv

MUSE is a new open-source platform that tests how safely AI models behave across text, audio, image, and video inputs, rather than text alone.

#MUSE #multimodal safety evaluation #red-teaming

PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference

Intermediate

Rituraj Sharma, Weiyuan Chen et al. · Mar 3 · arXiv

PRISM is a new way to help AI think through hard problems by checking each step, not just the final answer.

#DEEPTHINK #Process Reward Model #step-level verification

HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

Intermediate

Yichen Liu, Donghao Zhou et al. · Mar 2 · arXiv

HiFi-Inpaint is a new AI method that fills a missing area in a photo of a person by inserting a specific product, while keeping tiny details like logos, textures, and small text crisp.

#reference-based inpainting #high-frequency map #Shared Enhancement Attention

Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training

Beginner

Valentin Lacombe, Valentin Quesnel et al. · Mar 2 · arXiv

Reasoning Core is a tool that automatically creates a huge variety of logic and math puzzles, checks every answer with real solvers, and lets you smoothly dial the difficulty up or down.

#procedural data generation #symbolic reasoning #PDDL planning

Tool Verification for Test-Time Reinforcement Learning

Intermediate

Ruotong Liao, Nikolai Röhrich et al. · Mar 2 · arXiv

The paper addresses a major flaw in test-time reinforcement learning (TTRL): when many wrong answers agree, the model rewards the mistake and gets stuck.

#test-time reinforcement learning #verification-weighted voting #tool verification

Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Intermediate

Yiqi Lin, Guoqiang Liang et al. · Mar 2 · arXiv

Kiwi-Edit is a new video editor that follows text instructions and can also copy the appearance of a reference image you provide.

#reference-guided video editing #instruction-based editing #multimodal large language model

SageBwd: A Trainable Low-bit Attention

Beginner

Jintao Zhang, Marco Chen et al. · Mar 2 · arXiv

SageBwd is a way to make the Transformer's attention both fast and trainable by doing most of its large matrix multiplications in 8-bit instead of full precision.

#SageBwd #low-bit attention #INT8 training

Recursive Think-Answer Process for LLMs and VLMs

Intermediate

Byung-Kwan Lee, Youngchae Chee et al. · Mar 2 · arXiv

This paper teaches AI models to judge how sure they are about an answer and to think again if they are not sure.

#Recursive Think–Answer #Confidence-guided reasoning #Reinforcement learning for LLMs

$π$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs

Intermediate

Siting Wang, Xiaofeng Wang et al. · Mar 2 · arXiv

Vision-language-action models (VLAs), which control robots from images and instructions, tend to get stuck following a narrow, fragile path after standard training.

#vision-language-action #flow matching #stochastic differential equations

WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories

Intermediate

Yisu Zhang, Chenjie Cao et al. · Mar 2 · arXiv

WorldStereo is a method that turns a single photo (or a panorama) into a short set of camera-guided videos and then reconstructs a consistent 3D scene from them.

#video diffusion models #camera control #3D reconstruction

MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning

Beginner

Jiachun Li, Shaoping Huang et al. · Mar 2 · arXiv

MMR-Life is a new benchmark that tests how well AI models understand everyday situations using several real photos at once.

#multimodal reasoning #multi-image understanding #real-life benchmark

CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

Intermediate

Yixin Nie, Lin Guan et al. · Mar 2 · arXiv

CharacterFlywheel is an iterative loop that steadily improves conversational AI characters by learning from real conversations on Instagram, WhatsApp, and Messenger.

#CharacterFlywheel #large language models #conversational AI
