Papers (2)

#distractors

NanoKnow: How to Know What Your Language Model Knows

Lingwei Gu, Nour Jedidi et al. · Feb 23 · arXiv

NanoKnow is a new benchmark that checks whether a language model’s answers come from what it saw during training or from extra text we give it at question time.

#NanoKnow #FineWeb-Edu #nanochat


KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning

Intermediate

Egor Cherepanov, Daniil Zelezetsky et al. · Jan 20 · arXiv

KAGE-Bench is a fast, carefully controlled benchmark that tests how well reinforcement learning (RL) agents trained on pixels handle specific visual changes, like new backgrounds or lighting, without changing the actual game rules.

#reinforcement learning #visual generalization #KAGE-Env
