agent-evaluation Guide

Name: agent-evaluation
Author: sickn33

Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, in a field where even top agents score below 50% on real-world benchmarks.

38,911 stars, by sickn33
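The description names behavioral testing and reliability metrics but does not show what a harness for them looks like. Below is a minimal sketch, not the skill's own method. It assumes an agent is any Python callable that maps a prompt to text, and that each expected behavior is checked by a predicate on that text. Every case runs several times, and the harness reports two numbers: a mean pass rate, and a stricter consistency score, which is the share of cases that passed on every trial. `EvalCase`, `evaluate`, and `flaky_agent` are hypothetical names used only for illustration.

```python
"""Minimal sketch of a behavioral test harness with reliability metrics.

Assumptions (hypothetical, not taken from the agent-evaluation skill):
- an "agent" is any callable that takes a prompt string and returns a string;
- each case pairs a prompt with a check function on the agent's output;
- each case is run n_trials times, because agent outputs can vary between runs.
"""
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # behavioral assertion on the agent's output


@dataclass
class CaseResult:
    name: str
    passes: int
    trials: int

    @property
    def pass_rate(self) -> float:
        # Fraction of trials that passed: a per-case success estimate.
        return self.passes / self.trials

    @property
    def consistent(self) -> bool:
        # True only if every trial passed (a strict reliability criterion).
        return self.passes == self.trials


def evaluate(agent: Callable[[str], str], cases: List[EvalCase],
             n_trials: int = 5) -> Dict[str, object]:
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    results = []
    for case in cases:
        passes = 0
        for _ in range(n_trials):
            try:
                ok = case.check(agent(case.prompt))
            except Exception:
                ok = False  # an agent or check crash counts as a failed trial
            passes += int(ok)
        results.append(CaseResult(case.name, passes, n_trials))
    n = len(results)
    mean_pass_rate = sum(r.pass_rate for r in results) / n if n else 0.0
    consistency = sum(r.consistent for r in results) / n if n else 0.0
    return {"results": results, "mean_pass_rate": mean_pass_rate,
            "consistency": consistency}


if __name__ == "__main__":
    import itertools

    # Hypothetical flaky agent: answers "4" on three of every four calls.
    counter = itertools.count()

    def flaky_agent(prompt: str) -> str:
        return "4" if next(counter) % 4 != 3 else "five"

    cases = [
        EvalCase("arithmetic", "What is 2 + 2?", lambda out: out.strip() == "4"),
        EvalCase("non-empty", "Say anything.", lambda out: len(out) > 0),
    ]
    report = evaluate(flaky_agent, cases, n_trials=4)
    print(report["mean_pass_rate"], report["consistency"])  # → 0.875 0.5
```

Repeated trials matter because a single run can hide flakiness. In the example, the arithmetic case passes 3 of 4 times. The mean pass rate of 0.875 partly masks this, while the consistency score of 0.5 exposes it.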

When to use agent-evaluation

How to use agent-evaluation

agent-evaluation is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source repository below. It then becomes available as a user-invocable skill and applies when your task matches its description.

Skill source

https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/agent-evaluation/SKILL.md

Details

Platform: Claude
Category: AI & ML
Invocation: user-invocable
Model: any
Maintainer: sickn33
License: MIT
