Submitting to the leaderboard
Submissions are evaluated by a maintainer-operated runner against a hidden, ranked benchmark set. Because the test cases are never exposed to submitters, the ranking stays defensible against overfitting.
Overview
- Write and test your compactor against the public elite_practice suite.
- Open a PR adding your method source to submissions/<your-handle>/<method-name>/.
- A maintainer labels the PR for evaluation.
- The self-hosted runner executes your method against Elite Ranked cases.
- Scores are posted back to the PR as a comment.
- If your method qualifies, the PR is merged and the leaderboard updates.
Submission directory layout
submissions/
└─ your-handle/
   └─ method-name/
      ├─ method.py          # subclass of Compactor; exports one public class
      ├─ config.yaml        # method config (provider, model, compression tier, any method-specific knobs)
      ├─ requirements.txt   # third-party pip dependencies beyond compactbench[providers]
      └─ README.md          # one page on your approach
A starter scaffold lives at submissions/_template/.
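Before opening the PR, it can help to check the folder locally. The following is a minimal sketch, not part of compactbench: the helper name check_submission_dir is hypothetical, and it assumes that "pinned" means every requirement line uses an exact == version specifier.

```python
from pathlib import Path

# File names taken from the directory layout described above.
REQUIRED_FILES = ("method.py", "config.yaml", "requirements.txt", "README.md")


def check_submission_dir(method_dir: Path) -> list[str]:
    """Return problems found in submissions/<handle>/<method>/ (empty list if none).

    Hypothetical pre-flight helper; the maintainer runner performs the real checks.
    """
    problems = []
    for name in REQUIRED_FILES:
        if not (method_dir / name).is_file():
            problems.append(f"missing {name}")

    reqs = method_dir / "requirements.txt"
    if reqs.is_file():
        for lineno, raw in enumerate(reqs.read_text().splitlines(), start=1):
            line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
            if not line:
                continue
            # Assumption: only exact "==" pins count as pinned.
            if "==" not in line:
                problems.append(f"requirements.txt:{lineno}: not pinned with '==': {line}")
    return problems
```

Call it on your method folder, for example check_submission_dir(Path("submissions/your-handle/method-name")). An empty list means the four expected files exist and every dependency line carries an exact pin. Lines such as -r includes or pip options would also be flagged, so adapt the check if your requirements.txt uses them.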
Qualification requirements
Your submission must pass all of these to land on the leaderboard:
- Implements Compactor correctly and returns a valid CompactionArtifact on every call.
- Declares a compression tier (Elite-Light / Elite-Mid / Elite-Aggressive) and clears that tier's compression floor (≥ 2× / 4× / 8× respectively).
- Completes all configured cases and all configured drift cycles without errors.
- Contradiction rate ≤ 0.10.
- No single Elite family drops below a 0.40 case-level pass rate (category diversity guard).
- Dependencies are pinned in requirements.txt and install from PyPI.
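To sanity-check a local run against the numeric thresholds above, they can be expressed as a small predicate. This is a sketch under assumptions: RunSummary and its fields are hypothetical stand-ins for whatever your local run reports, and the compression ratio is assumed to be original size divided by compacted size. It covers only the numeric gates and error count, not the Compactor contract or dependency rules; the authoritative checks are the ones in the methodology, run by the maintainer runner.

```python
from dataclasses import dataclass

# Tier floors as stated above: Elite-Light >= 2x, Elite-Mid >= 4x, Elite-Aggressive >= 8x.
TIER_FLOORS = {"Elite-Light": 2.0, "Elite-Mid": 4.0, "Elite-Aggressive": 8.0}
MAX_CONTRADICTION_RATE = 0.10  # contradiction rate must be <= this
MIN_FAMILY_PASS_RATE = 0.40    # no family may drop below this


@dataclass
class RunSummary:
    # Hypothetical container for a local run's results; field names are illustrative.
    tier: str
    compression_ratio: float             # assumed: original size / compacted size
    contradiction_rate: float
    family_pass_rates: dict[str, float]  # Elite family -> case-level pass rate
    errors: int = 0                      # cases or drift cycles that raised errors


def qualification_failures(run: RunSummary) -> list[str]:
    """Return the reasons a run would not qualify (empty list if it passes these gates)."""
    failures = []
    floor = TIER_FLOORS.get(run.tier)
    if floor is None:
        failures.append(f"unknown tier {run.tier!r}")
    elif run.compression_ratio < floor:
        failures.append(f"compression {run.compression_ratio:.2f}x below {run.tier} floor {floor}x")
    if run.errors:
        failures.append(f"{run.errors} case(s) or drift cycle(s) failed with errors")
    if run.contradiction_rate > MAX_CONTRADICTION_RATE:
        failures.append(f"contradiction rate {run.contradiction_rate:.2f} > {MAX_CONTRADICTION_RATE}")
    for family, rate in sorted(run.family_pass_rates.items()):
        if rate < MIN_FAMILY_PASS_RATE:
            failures.append(f"family {family!r} pass rate {rate:.2f} < {MIN_FAMILY_PASS_RATE}")
    return failures
```

The boundaries follow the wording above: a contradiction rate of exactly 0.10 and a family pass rate of exactly 0.40 both pass, while anything worse fails.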
The full ranking formula and qualification details are described in the methodology page.
What gets published
On merge, the leaderboard publishes:
- Your chosen method name
- Your GitHub handle or declared org name
- Overall, drift, constraint retention, contradiction, compression, and per-family scores
- Benchmark version, scorer version, target model, method version
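If you want to handle published results programmatically, the fields listed above map naturally onto a flat record. The sketch below is illustrative only: LeaderboardEntry and its field names are hypothetical, not the leaderboard's actual export format. Consistent with the policy above, it has no field for hidden test content.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class LeaderboardEntry:
    # Illustrative shape only; not the leaderboard's actual schema.
    method_name: str
    submitter: str               # GitHub handle or declared org name
    overall: float
    drift: float
    constraint_retention: float
    contradiction: float
    compression: float
    per_family: dict[str, float] = field(default_factory=dict)  # Elite family -> score
    benchmark_version: str = ""
    scorer_version: str = ""
    target_model: str = ""
    method_version: str = ""


# Example: serialize an entry to a plain dict (e.g. before dumping it as JSON).
example = asdict(LeaderboardEntry("my-method", "your-handle", 0.8, 0.7, 0.9, 0.05, 4.2))
```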
The hidden test content itself is never published.
Resubmitting
Push new commits to the same PR. Re-evaluation is gated on a maintainer adding the reevaluate label so we control runner spend.
New versions of an already-ranked method should go under a new subfolder (method-name-v2) so the old entry stays pinned to its benchmark version.
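The versioning convention can be applied mechanically when choosing a folder name. This is a minimal sketch, assuming versions follow the -vN suffix shown above and that the unsuffixed folder counts as version 1. next_version_name is a hypothetical helper, not part of compactbench.

```python
import re


def next_version_name(method_name: str, existing: list[str]) -> str:
    """Suggest the next "<base>-vN" folder name given the sibling folder names."""
    base = re.sub(r"-v\d+$", "", method_name)  # strip an existing -vN suffix
    pattern = re.compile(rf"^{re.escape(base)}(?:-v(\d+))?$")
    highest = 0
    for name in existing:
        match = pattern.match(name)
        if match:
            # The unsuffixed folder is treated as version 1.
            highest = max(highest, int(match.group(1)) if match.group(1) else 1)
    return f"{base}-v{max(highest, 1) + 1}"
```

For example, next_version_name("my-method", ["my-method", "my-method-v2"]) suggests "my-method-v3", leaving both earlier folders pinned to their benchmark versions.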