Papers (2)

#reasoning distillation

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

This paper introduces SecCoderX, a way to teach code-writing AIs to write secure code without breaking what the code is supposed to do.

#secure code generation #reinforcement learning #vulnerability reward model

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Intermediate

Honglin Lin, Zheng Liu et al. · Jan 29 · arXiv

MMFineReason is a huge, open dataset (1.8 million examples, 5.1 billion solution tokens) that teaches AIs to think step by step about pictures and text together.

#multimodal reasoning #vision-language models #chain-of-thought