llm-obs-eval-bootstrap Guide

Name: llm-obs-eval-bootstrap
Author: datadog-labs

Bootstrap evaluators from production traces: emit SDK code, a framework-agnostic JSON spec, or publish online LLM-judge evaluators directly to Datadog. Use when the user says "bootstrap evaluators", "generate evaluators", "create evals from traces", "eval bootstrap", "write evaluators", "build eval suite", or "publish evaluators", or wants to generate BaseEvaluator/LLMJudge code or online judge configs from production LLM trace data. Works with an ml_app and an optional RCA report or failure hypothesis.

121 stars, by datadog-labs

When to use llm-obs-eval-bootstrap

How to use llm-obs-eval-bootstrap

llm-obs-eval-bootstrap is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source repository below. Once installed, it is a user-invocable skill that activates when your task matches its description.

Skill source

https://raw.githubusercontent.com/datadog-labs/agent-skills/main/dd-llmo/llm-obs-eval-bootstrap/SKILL.md
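
One way to add the skill is to save the SKILL.md file into a folder named after the skill. The following is a minimal sketch, not an official installer: the helper names (`skill_path`, `install_skill`, `fetch_skill`) are hypothetical, and the skills directory is an assumption you pass in yourself (for Claude Code, a personal skills folder such as `~/.claude/skills` is the likely target, but check your environment).

```python
from pathlib import Path
from urllib.request import urlopen

# URL taken from the "Skill source" entry above.
SKILL_URL = (
    "https://raw.githubusercontent.com/datadog-labs/agent-skills/"
    "main/dd-llmo/llm-obs-eval-bootstrap/SKILL.md"
)
SKILL_NAME = "llm-obs-eval-bootstrap"


def skill_path(skills_root: Path, name: str = SKILL_NAME) -> Path:
    """Return <skills_root>/<name>/SKILL.md (hypothetical helper)."""
    return skills_root / name / "SKILL.md"


def install_skill(content: str, skills_root: Path, name: str = SKILL_NAME) -> Path:
    """Write the skill text under skills_root and return the file path."""
    target = skill_path(skills_root, name)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target


def fetch_skill(url: str = SKILL_URL) -> str:
    """Download the raw SKILL.md text (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

Under these assumptions, `install_skill(fetch_skill(), Path.home() / ".claude" / "skills")` would download the file and place it at `~/.claude/skills/llm-obs-eval-bootstrap/SKILL.md`. Keeping the download separate from the write step makes it easy to review the skill text before installing it.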

Details

Platform: Claude
Category: AI & ML
Invocation: user-invocable
Model: any
Maintainer: datadog-labs
License: MIT
