fine-tuning-with-trl Guide

Name: fine-tuning-with-trl
Author: davila7

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use it when you need RLHF, want to align a model with preferences, or need to train from human feedback. Works with Hugging Face Transformers.

27,615 stars by davila7

When to use fine-tuning-with-trl

How to use fine-tuning-with-trl

fine-tuning-with-trl is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source repository below; it then activates as a user-invocable skill when your task matches its description.

Skill source

https://raw.githubusercontent.com/davila7/claude-code-templates/main/cli-tool/components/skills/ai-research/post-training-trl-fine-tuning/SKILL.md
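
One way to add the skill is to save the file at the source URL above into a skills directory. The following is a minimal sketch, not an official installer: the helper names `fetch_text` and `install_skill` are hypothetical, and the `<skills_root>/<name>/SKILL.md` layout (for example under `~/.claude/skills`) is an assumption about where your Claude environment looks for skills, so check your environment's documentation before relying on it.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen

# URL copied from the "Skill source" entry above.
SKILL_URL = (
    "https://raw.githubusercontent.com/davila7/claude-code-templates/main/"
    "cli-tool/components/skills/ai-research/post-training-trl-fine-tuning/SKILL.md"
)


def fetch_text(url: str) -> str:
    """Download a URL and decode the response body as UTF-8."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def install_skill(
    skills_root: Path,
    name: str = "fine-tuning-with-trl",
    url: str = SKILL_URL,
    fetch: Callable[[str], str] = fetch_text,
) -> Path:
    """Save the skill as <skills_root>/<name>/SKILL.md and return that path.

    The directory layout is an assumption, not something the listing states.
    """
    target = skills_root / name / "SKILL.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(fetch(url), encoding="utf-8")
    return target


# Example call (assumed location for personal skills):
#   install_skill(Path.home() / ".claude" / "skills")
```

The `fetch` parameter is injectable so the download step can be replaced, for instance to read from a local checkout of the repository instead of the network.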

Details

Platform: Claude
Category: AI & ML
Invocation: user-invocable
Model: any
Maintainer: davila7
License: MIT
