llm-evaluation

Community

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

Claude

36,031 stars Updated 1 months ago

Allowed Tools

This skill does not declare a tool allowlist. The agent host applies whatever default tools are available at runtime.

Source

SKILL.md / Manifest

https://raw.githubusercontent.com/wshobson/agents/main/plugins/llm-application-dev/skills/llm-evaluation/SKILL.md

Registry

github (via claudemarketplaces.com)

Trust Score

53Fair

Verification10/30

Scope Tightness

llm-evaluation

Allowed Tools

Source

Trust Score

Details