speculative-decoding Guide

Name: speculative-decoding
Author: zechenzhangagi

Accelerate LLM inference using speculative decoding, Medusa multi-head decoding, and lookahead decoding. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.
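To make the draft-model idea concrete, here is a minimal sketch of a greedy draft-and-verify loop. It is not taken from the skill itself: the names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical, the "models" are toy functions over integer tokens, and only the greedy case is shown. Real implementations typically verify all drafted positions in one batched forward pass and use rejection sampling for non-greedy decoding.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # hypothetical greedy next-token function


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy draft-and-verify; the result equals greedy_decode(target, ...)."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. Draft: the cheap model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. Verify: compare each drafted token with the target's own choice.
        #    (Assumption: a real system scores these positions in one pass.)
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's token
            if tok != expected:
                break  # first mismatch: discard the remaining drafted tokens
        out.extend(accepted)
    return out


def toy_target(ctx: List[Token]) -> Token:
    # Toy "large model": counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Toy "small model": agrees with the target except after tokens 2, 5, 8.
    return 0 if ctx[-1] % 3 == 2 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [0], max_new=12))
```

Because every kept token is the target model's own choice, the output is identical to plain greedy decoding; any speedup comes only from accepting several drafted tokens per verification step.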

8,991 stars by zechenzhangagi

When to use speculative-decoding

How to use speculative-decoding

speculative-decoding is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source repository below; it then activates as a user-invocable skill when your task matches its description.

Skill source

https://raw.githubusercontent.com/zechenzhangagi/ai-research-skills/main/19-emerging-techniques/speculative-decoding/SKILL.md

Details

Platform: Claude
Category: AI & ML
Invocation: user-invocable
Model: any
Maintainer: zechenzhangagi
License: MIT
