OCR is like reading a page exactly as it is: every output token is determined by the image rather than by the text generated so far, and that strictness makes it a natural fit for fast, parallel generation.
This paper presents a simple, repeatable recipe for teaching general vision-language models (VLMs) to understand e-commerce products far better without losing their general skills.
The paper tackles a common problem in AI: models can read pictures and text well, but they often get the reasoning behind them wrong.
Kimi K2.5 is a new open-source AI model that can understand both text and visuals (images and video) and coordinate work like a team of helpers to finish big tasks faster.