GPUBeat Archive

Tag: reinforcement-learning

Frontier Models 11h

PopuLoRA Introduces Dynamic Self-Play Framework for LLMs

PopuLoRA is a self-play framework that aims to improve reasoning in large language models by co-evolving populations of teachers and students, so that task generation and evaluation adapt as training progresses.
