News

19.10.2025

Weak-for-Strong: Training a 7B Meta-Agent to Orchestrate Powerful LLMs

‘W4S uses offline RL to train a 7B meta-agent that writes Python workflows calling stronger LLM executors, then iteratively generates, executes, and refines those workflows based on execution feedback. The approach yields consistent gains across 11 benchmarks and achieves a Pass@1 of 95.4 on HumanEval with GPT-4o-mini as the executor.’
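
To make the generate, execute, refine loop concrete, here is a minimal Python sketch. It is not the W4S implementation: `toy_meta_agent`, `toy_executor`, `evaluate`, and `search` are hypothetical stand-ins invented for illustration, the "LLMs" are hard-coded functions, and the offline RL training of the meta-agent is not shown at all. It only illustrates how a weak planner might propose a workflow, run it against a stronger executor, and revise it using the resulting score.

```python
from typing import Callable, List, Tuple

# Hypothetical types: an executor maps a prompt to an answer; a workflow
# maps (executor, question) to an answer.
Executor = Callable[[str], str]
Workflow = Callable[[Executor, str], str]


def toy_executor(prompt: str) -> str:
    """Stand-in for a strong LLM: answers correctly only when asked to reason."""
    if "step by step" in prompt:
        return "4"
    return "unsure"


def toy_meta_agent(history: List[Tuple[str, float]]) -> Tuple[str, Workflow]:
    """Stand-in for the meta-agent: proposes a workflow given past feedback."""
    if not history:
        # First attempt: pass the question straight to the executor.
        return "direct", lambda ex, q: ex(q)
    # Later attempts: refine the prompt after seeing earlier scores.
    return "step_by_step", lambda ex, q: ex(q + " Think step by step.")


def evaluate(workflow: Workflow, executor: Executor,
             dataset: List[Tuple[str, str]]) -> float:
    """Execute the workflow on each item and return its accuracy."""
    correct = sum(workflow(executor, q) == a for q, a in dataset)
    return correct / len(dataset)


def search(meta_agent, executor: Executor,
           dataset: List[Tuple[str, str]], iterations: int = 3):
    """Generate a workflow, execute it, record feedback, and repeat."""
    history: List[Tuple[str, float]] = []
    best_name, best_score = "", -1.0
    for _ in range(iterations):
        name, workflow = meta_agent(history)            # generate
        score = evaluate(workflow, executor, dataset)   # execute
        history.append((name, score))                   # feedback for refinement
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score, history


# Toy validation set (made up for this sketch).
validation = [("What is 2 + 2?", "4")]
best_name, best_score, history = search(toy_meta_agent, toy_executor, validation)
print(best_name, best_score)
```

In this toy run the first proposal scores poorly, and the feedback in `history` is what lets the second proposal improve. In a real system the meta-agent would emit workflow source code, which would need to be run in a sandbox rather than called directly.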