Trust: 55/100 (Fair)
evaluating-llms-harness Guide
lm-eval-harness: benchmark LLMs (MMLU, GSM8K, etc.).
170,110 stars, by nousresearch
When to use evaluating-llms-harness
Use this skill when you need to benchmark LLMs with lm-eval-harness on tasks such as MMLU and GSM8K.
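This listing does not show the commands the skill itself runs. As a rough illustration of the kind of benchmark run the description refers to, here is a minimal sketch that assembles an `lm_eval` command line. It assumes lm-evaluation-harness is installed and exposes its `lm_eval` CLI. The helper `build_lm_eval_command` is hypothetical, and the model name is only a placeholder.

```python
import shlex


def build_lm_eval_command(model_name, tasks, batch_size=8, device="cuda:0"):
    """Assemble an lm_eval invocation as an argument list.

    Hypothetical helper for illustration; it only builds the command
    and does not execute it.
    """
    return [
        "lm_eval",
        "--model", "hf",                            # Hugging Face backend
        "--model_args", f"pretrained={model_name}",
        "--tasks", ",".join(tasks),                 # comma-separated task names
        "--device", device,
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    # Placeholder model; substitute the checkpoint you want to evaluate.
    cmd = build_lm_eval_command("EleutherAI/pythia-160m", ["mmlu", "gsm8k"])
    print(shlex.join(cmd))
```

Printing the command rather than running it lets you inspect it before committing to a long evaluation. Pass the list to `subprocess.run` once it looks right.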
How to use evaluating-llms-harness
evaluating-llms-harness is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source repository below. Once added, it is user-invocable and activates when your task matches its description.
Details
Platform: Claude
Category: Code & Development
Invocation: user-invocable
Model: any
Maintainer: nousresearch
License: MIT