metric-validation-harness

Community

Empirically validates a software metric before trusting or optimizing it — point it at any candidate metric (a command that takes a path and prints one number) plus a corpus, and it runs experiments that try to falsify each property a good metric must have. Checks determinism (same input, same number across runs and hash seeds), invariance to cosmetic edits (also an anti-gaming probe), monotonicity under construct-increasing edits, discrimination, robustness on edge inputs, near-linear tractability, and construct validity (convergent, discriminant vs LOC, predictive AUC, lift over a baseline). Trigger whenever someone proposes, reviews, tunes, or ships a metric, score, or index, asks "is this metric any good", suspects a score tracks LOC or jumps between runs, or builds a deterministic optimization target. It is the empirical companion to the deterministic-metric-design skill and is read-only.

Claude

147 stars Updated 1 months ago

Allowed Tools

This skill does not declare a tool allowlist. The agent host applies whatever default tools are available at runtime.

Source

SKILL.md / Manifest

https://raw.githubusercontent.com/pproenca/dot-skills/master/skills/.experimental/metric-validation-harness/SKILL.md

Registry

github (via claudemarketplaces.com)

Trust Score

53Fair

Verification10/30

metric-validation-harness

Allowed Tools

Source

Trust Score

Details