SDPO

Reinforcement Learning via Self-Distillation (SDPO)

GitHub Stars: 271

Launch Date: Jan 24, 2026

First Tracked: 6h ago

About

AI Summary

SDPO applies reinforcement learning through self-distillation: a model improves its performance by learning from its own predictions and refining its decision-making strategies.
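To make the phrase "learning from its own predictions" concrete, here is a minimal toy sketch of a single self-distillation step in plain Python. It is not SDPO's actual algorithm, which this listing does not describe, and it leaves out the reinforcement learning part entirely. Everything in it is an assumption made for illustration: the function names, the choice of a temperature-sharpened copy of the model's own output as the target, and the learning rate.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distill_step(logits, teacher_temperature=0.5, lr=0.5):
    """One gradient step pulling the model toward a target built from itself.

    Hypothetical setup: the "teacher" is the model's own prediction,
    sharpened with a temperature below 1 and then held fixed.
    """
    teacher = softmax(logits, teacher_temperature)  # fixed target, no gradient
    student = softmax(logits)
    # For a fixed teacher, d KL(teacher || student) / d logit_i
    # equals student_i - teacher_i.
    new_logits = [x - lr * (s - t) for x, s, t in zip(logits, student, teacher)]
    return new_logits, teacher


logits = [2.0, 1.0, 0.1]
new_logits, teacher = self_distill_step(logits)
kl_before = kl_divergence(teacher, softmax(logits))
kl_after = kl_divergence(teacher, softmax(new_logits))
print(f"KL to self-generated target: {kl_before:.4f} -> {kl_after:.4f}")
```

The step lowers the divergence between the model's distribution and the target derived from its own output, which is the general idea behind distilling a model from itself. How SDPO builds its targets and combines them with a reward signal is not stated here.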

Tags

distillation
llm
reasoning
rl
Python