exploring-llm-evaluations Guide

Name: exploring-llm-evaluations
Author: posthog

Investigate AI observability evaluations of both types — `hog` (deterministic code-based) and `llm_judge` (LLM-prompt-based). Find existing evaluations, inspect their configuration, run them against specific generations, query individual pass/fail results, and generate AI-powered summaries of patterns across many runs. Use when the user asks to debug why an evaluation is failing, surface common failure modes, compare results across filters, dry-run a Hog evaluator, prototype a new LLM-judge prompt, or manage the evaluation lifecycle (create, update, enable/disable, delete).
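As a rough illustration of what querying pass/fail results and surfacing common failure modes involves, here is a minimal sketch in plain Python. It is not Hog and does not call the PostHog API. The record shape and the names `EvalResult` and `summarize` are hypothetical, and the sample records are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class EvalResult:
    # Hypothetical record shape; real evaluation results may use other fields.
    evaluation_type: str  # "hog" or "llm_judge"
    passed: bool
    reason: str = ""


def summarize(results):
    """Return the total, the pass rate, and the most common failure reasons."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    reasons = Counter(r.reason for r in results if not r.passed)
    return {
        "total": total,
        "pass_rate": passed / total if total else 0.0,
        "top_failures": reasons.most_common(3),
    }


# Illustrative data only, not real evaluation output.
sample = [
    EvalResult("hog", True),
    EvalResult("hog", False, "output exceeded length limit"),
    EvalResult("llm_judge", False, "answer not grounded in context"),
    EvalResult("llm_judge", False, "answer not grounded in context"),
]
summary = summarize(sample)
```

Grouping failures by reason is only one possible view; the skill description also mentions comparing results across filters, which would mean adding a grouping key to the same kind of aggregation.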

34,722 stars by posthog

When to use exploring-llm-evaluations

How to use exploring-llm-evaluations

exploring-llm-evaluations is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source repository below, then it activates as a user-invocable skill when your task matches its description.

Skill source

https://raw.githubusercontent.com/posthog/posthog/master/products/llm_analytics/skills/exploring-llm-evaluations/SKILL.md
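One way to add the skill from this URL is to download the file into a skills directory. The sketch below assumes a `<skills_root>/<skill-name>/SKILL.md` layout; the helper names `fetch_text` and `install_skill` are hypothetical, and the right skills directory depends on your Claude environment.

```python
from pathlib import Path
from urllib.request import urlopen

SKILL_URL = (
    "https://raw.githubusercontent.com/posthog/posthog/master/"
    "products/llm_analytics/skills/exploring-llm-evaluations/SKILL.md"
)


def fetch_text(url):
    """Download a URL and return its body decoded as UTF-8 text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def install_skill(name, skills_root, fetch=fetch_text, url=SKILL_URL):
    """Write the skill file to <skills_root>/<name>/SKILL.md and return the path.

    The directory layout is an assumption, not something this page confirms.
    """
    dest = Path(skills_root).expanduser() / name / "SKILL.md"
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(fetch(url), encoding="utf-8")
    return dest
```

For example, `install_skill("exploring-llm-evaluations", "~/.claude/skills")` would place the file in a per-user skills directory, if that is where your setup looks for skills. The `fetch` parameter exists so the download step can be swapped out, for instance when testing offline.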

Details

Platform: Claude

Category: AI & ML

Invocation: user-invocable

Model: any

Maintainer: posthog

License: NOASSERTION
