Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
This skill does not declare a tool allowlist. The agent host applies whatever default tools are available at runtime.
SKILL.md / Manifest
https://raw.githubusercontent.com/wshobson/agents/main/plugins/llm-application-dev/skills/llm-evaluation/SKILL.md
Registry
github (via claudemarketplaces.com)