Blogs
2025
Optimizing LLM test-time compute involves solving a meta RL problem
A. Setlur, Y. Qu, M. Yang, L. Zhang, V. Smith, A. Kumar
[CMU MLD Blog]
Sharpening or Discovery, RL or Meta RL?: How RL Improves LLM Reasoning
A. Setlur, A. Kumar
[Notion Blog]