llama-cpp Guide

Name: llama-cpp
Author: zechenzhangagi

Runs LLM inference on CPUs, Apple Silicon, and consumer GPUs without requiring NVIDIA hardware. Use it for edge deployment, on M1/M2/M3 Macs, with AMD or Intel GPUs, or wherever CUDA is unavailable. Supports GGUF quantization (1.5 to 8 bits per weight) to reduce memory use, with a 4-10× speedup over PyTorch on CPU.
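A back-of-the-envelope estimate shows why low-bit quantization matters for memory. The sketch below assumes weight storage is simply parameter count × bits per weight ÷ 8 bytes. It ignores per-block quantization scales (which push the effective bits per weight of real GGUF types somewhat higher), the KV cache, and runtime overhead. The 7-billion-parameter model and the function name `weight_memory_gib` are hypothetical illustrations, not part of llama-cpp.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB.

    Assumption: storage = n_params * bits_per_weight / 8 bytes, with no
    block scales, KV cache, activations, or runtime overhead included.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30  # bytes -> GiB


# Hypothetical 7-billion-parameter model at several precisions
for bits in (16, 8, 4, 1.5):
    print(f"{bits:>4} bits/weight: {weight_memory_gib(7e9, bits):.2f} GiB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the estimate by a factor of four (roughly 13 GiB down to about 3.3 GiB for 7B parameters), which is what lets such models fit in the RAM of a laptop or edge device.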


When to use llama-cpp

How to use llama-cpp

llama-cpp is a Claude skill written in the SKILL.md format. Once you add it to your Claude environment from the source repository below, it is available as a user-invocable skill and activates when your task matches its description.

Skill source

https://raw.githubusercontent.com/zechenzhangagi/ai-research-skills/main/12-inference-serving/llama-cpp/SKILL.md
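One way to add the skill is to download the SKILL.md from the source URL above into a local skills directory. The sketch below assumes a Claude Code-style layout of `~/.claude/skills/<name>/SKILL.md`. Other Claude environments may install skills differently, so treat that path, and the helper names `skill_install_path` and `install_skill`, as hypothetical. It also fetches only SKILL.md. If the skill's directory in the repository contains other files, those would need to be copied as well.

```python
import urllib.request
from pathlib import Path
from typing import Optional

# Source URL taken from the "Skill source" section above
SKILL_URL = (
    "https://raw.githubusercontent.com/zechenzhangagi/ai-research-skills/"
    "main/12-inference-serving/llama-cpp/SKILL.md"
)


def skill_install_path(name: str, root: Optional[Path] = None) -> Path:
    """Return the target SKILL.md path for a skill.

    Assumption: skills live under ~/.claude/skills/<name>/SKILL.md unless
    another root directory is given.
    """
    if root is None:
        root = Path.home() / ".claude" / "skills"
    return root / name / "SKILL.md"


def install_skill(url: str, dest: Path) -> Path:
    """Download a SKILL.md file to dest, creating parent directories."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)  # network access required
    return dest
```

Usage, under the same assumptions: `install_skill(SKILL_URL, skill_install_path("llama-cpp"))`.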

Details

Platform: Claude
Category: AI & ML
Invocation: user-invocable
Model: any
Maintainer: zechenzhangagi
License: MIT
