metric-validation-harness Guide

Name: metric-validation-harness
Author: pproenca

Empirically validates a software metric before you trust or optimize it. Point it at any candidate metric (a command that takes a path and prints one number) plus a corpus, and it runs experiments that try to falsify each property a good metric must have. It checks determinism (same input, same number across runs and hash seeds), invariance to cosmetic edits (which doubles as an anti-gaming probe), monotonicity under construct-increasing edits, discrimination, robustness on edge inputs, near-linear tractability, and construct validity (convergent, discriminant vs LOC, predictive AUC, lift over a baseline). Trigger it whenever someone proposes, reviews, tunes, or ships a metric, score, or index; asks "is this metric any good"; suspects a score tracks LOC or jumps between runs; or builds a deterministic optimization target. It is the empirical companion to the deterministic-metric-design skill and is read-only.
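To make the determinism property concrete, here is a minimal sketch of what such a probe might look like. It is not the skill's own implementation: the function names `run_metric` and `check_determinism` and the two stand-in metric commands are hypothetical. It assumes only the contract stated above (a command that takes a path and prints one number), and it varies the `PYTHONHASHSEED` environment variable, which affects a metric only if that metric is itself a Python program.

```python
import os
import subprocess
import sys


def run_metric(cmd, path, hash_seed):
    """Run the metric command on `path` and parse the single number it prints."""
    # PYTHONHASHSEED controls str/bytes hash randomization in Python programs.
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    out = subprocess.run(
        [*cmd, path], env=env, capture_output=True, text=True, check=True
    ).stdout
    return float(out.strip())


def check_determinism(cmd, path, runs=3, hash_seeds=(0, 1, 2)):
    """Return (is_deterministic, sorted distinct values) across runs and seeds."""
    values = {
        run_metric(cmd, path, seed)
        for seed in hash_seeds
        for _ in range(runs)
    }
    return len(values) == 1, sorted(values)


# Hypothetical stand-in metrics so the sketch runs on its own.
# STABLE prints the length of the path argument.
# FLAKY prints a string hash, which changes with PYTHONHASHSEED.
STABLE = [sys.executable, "-c", "import sys; print(len(sys.argv[1]))"]
FLAKY = [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))"]

if __name__ == "__main__":
    print(check_determinism(STABLE, "corpus/a.py"))
    print(check_determinism(FLAKY, "corpus/a.py")[0])
```

A real run would loop this over every file in the corpus and report the inputs whose value set has more than one element. The probe only reads the metric's output and never writes to the corpus, in keeping with the read-only behavior described above.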

147 stars by pproenca

When to use metric-validation-harness

How to use metric-validation-harness

metric-validation-harness is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source repository below; it then activates as a user-invocable skill when your task matches its description.

Skill source

https://raw.githubusercontent.com/pproenca/dot-skills/master/skills/.experimental/metric-validation-harness/SKILL.md

Details

Platform: Claude

Category: Business & Workflow

Invocation: user-invocable

Model: any

Maintainer: pproenca

License: MIT
