terminal-bench-loop Guide
Run a single Terminal-Bench problem through Paperclip in a bounded, human-in-the-loop improvement cycle until the smoke passes, the board rejects the next fix, the iteration budget is exhausted, or a real blocker is named. Each iteration runs a bounded smoke against an isolated Paperclip App worktree, captures artifacts, diagnoses the exact stop point with `/diagnose-why-work-stopped`, requests board confirmation before any product fix, then reruns against the same worktree. Use whenever an issue asks to "run Terminal-Bench in a loop", "drive Terminal-Bench until it passes", "loop fix-git through Paperclip", or otherwise points at a Terminal-Bench task and asks for bounded iteration with diagnosis.
When to use terminal-bench-loop
Run a single Terminal-Bench problem through Paperclip in a bounded, human-in-the-loop improvement cycle until the smoke passes, the board rejects the next fix, the iteration budget is exhausted, or a real blocker is named. Each iteration runs a bounded smoke against an isolated Paperclip App worktree, captures artifacts, diagnoses the exact stop point with `/diagnose-why-work-stopped`, requests board confirmation before any product fix, then reruns against the same worktree. Use whenever an issue asks to "run Terminal-Bench in a loop", "drive Terminal-Bench until it passes", "loop fix-git through Paperclip", or otherwise points at a Terminal-Bench task and asks for bounded iteration with diagnosis.
How to use terminal-bench-loop
terminal-bench-loop is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source repository below, then it activates as a user-invocable skill when your task matches its description.