llm-obs-eval-pipeline Guide

Name: llm-obs-eval-pipeline
Author: datadog-labs

End-to-end pipeline from unlabeled ml_app traces to a bootstrapped evaluator suite. Runs trace classification → root cause analysis → eval bootstrap in sequence, with user checkpoints. Use when the user says "run the eval pipeline", "go from traces to evals", "bootstrap evals end to end", "classify then RCA then bootstrap", or "build an eval set from scratch", or wants a guided walkthrough from production data to evaluator code.
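
The three-stage sequence with checkpoints can be pictured with a minimal Python sketch. This is not the skill's implementation: the `Stage` class, `run_pipeline`, the placeholder stage functions, and the trace fields (`id`, `error`) are all hypothetical, and the sketch assumes a checkpoint falls after every stage.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    """One pipeline step: a display name plus a function from input to output."""
    name: str
    run: Callable[[Any], Any]


def run_pipeline(stages, initial, confirm):
    """Run stages in order, calling confirm(name, output) after each one.

    Stops early if confirm returns False. Returns the last output and the
    names of the stages that ran.
    """
    data = initial
    completed = []
    for stage in stages:
        data = stage.run(data)
        completed.append(stage.name)
        if not confirm(stage.name, data):  # user checkpoint (assumed placement)
            break
    return data, completed


# Placeholder stage logic, purely illustrative.
def classify(traces):
    # Label a trace "fail" when it carries a non-empty (hypothetical) error field.
    return [{**t, "label": "fail" if t.get("error") else "pass"} for t in traces]


def root_cause(labeled):
    # Group the ids of failing traces by their error string.
    causes = {}
    for t in labeled:
        if t["label"] == "fail":
            causes.setdefault(t["error"], []).append(t["id"])
    return causes


def bootstrap(causes):
    # Propose one evaluator name per distinct root cause.
    return [f"eval_{cause}" for cause in sorted(causes)]


STAGES = [
    Stage("trace classification", classify),
    Stage("root cause analysis", root_cause),
    Stage("eval bootstrap", bootstrap),
]
```

Calling `run_pipeline(STAGES, traces, confirm)` with a `confirm` callback that prompts the user would give the stop-or-continue behavior; passing `lambda name, out: True` runs all three stages without pausing.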

121 stars by datadog-labs

When to use llm-obs-eval-pipeline

How to use llm-obs-eval-pipeline

llm-obs-eval-pipeline is a Claude skill in the SKILL.md format. Add it to your Claude environment from the skill source listed below. Once added, it is available as a user-invocable skill and activates when your task matches its description.

Skill source

https://raw.githubusercontent.com/datadog-labs/agent-skills/main/dd-llmo/llm-obs-eval-pipeline/SKILL.md
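
One way to fetch the file is sketched below in Python using only the standard library. The helper names `skill_path` and `install_skill` are hypothetical, and the layout `<skills_root>/<skill-name>/SKILL.md` is an assumption about how your Claude environment discovers skills; check its documentation for the actual location.

```python
from pathlib import Path
from urllib.request import urlopen

SKILL_URL = (
    "https://raw.githubusercontent.com/datadog-labs/agent-skills/"
    "main/dd-llmo/llm-obs-eval-pipeline/SKILL.md"
)


def skill_path(skills_root: Path, url: str = SKILL_URL) -> Path:
    """Return the assumed destination: <skills_root>/<skill-name>/SKILL.md."""
    # The skill name is the directory segment just before SKILL.md in the URL.
    name = url.rstrip("/").split("/")[-2]
    return skills_root / name / "SKILL.md"


def install_skill(skills_root: Path, url: str = SKILL_URL) -> Path:
    """Download SKILL.md from the URL into the assumed skills directory."""
    target = skill_path(skills_root, url)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as resp:  # network call; raises on HTTP or connection errors
        target.write_bytes(resp.read())
    return target
```

For example, `install_skill(Path.home() / ".claude" / "skills")` would write the file under a per-user skills folder, assuming that is where your setup looks for skills.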

Details

Platform: Claude
Category: AI & ML
Invocation: user-invocable
Model: any
Maintainer: datadog-labs
License: MIT
