SDPO
Reinforcement Learning via Self-Distillation (SDPO)
GitHub Stars: 271
Launch Date: Jan 24, 2026
First Tracked: 6h ago
About
AI Summary
SDPO applies reinforcement learning through self-distillation: a model learns from its own predictions and uses them to refine its decision-making, with the aim of improving its performance.
Tags
distillation
llm
reasoning
rl
Python