knowledge-distillation Guide
Compress large language models using knowledge distillation from teacher to student models. Use when deploying smaller models with retained performance, transferring GPT-4 capabilities to open-source models, or reducing inference costs. Covers temperature scaling, soft targets, reverse KLD, logit distillation, and MiniLLM training strategies.
When to use knowledge-distillation
Compress large language models using knowledge distillation from teacher to student models. Use when deploying smaller models with retained performance, transferring GPT-4 capabilities to open-source models, or reducing inference costs. Covers temperature scaling, soft targets, reverse KLD, logit distillation, and MiniLLM training strategies.
How to use knowledge-distillation
knowledge-distillation is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source repository below, then it activates as a user-invocable skill when your task matches its description.