tensorrt-llm Guide

Name: tensorrt-llm
Author: davila7

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

27,615 stars, by davila7

How to use tensorrt-llm

tensorrt-llm is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source repository below; it then activates as a user-invocable skill when your task matches its description.

Skill source

https://raw.githubusercontent.com/davila7/claude-code-templates/main/cli-tool/components/skills/ai-research/inference-serving-tensorrt-llm/SKILL.md

Details

Platform: Claude
Category: AI & ML
Invocation: user-invocable
Model: any
Maintainer: davila7
License: MIT