Submitting to the leaderboard¶

Submissions are evaluated against a hidden ranked benchmark set by a maintainer-operated runner. This keeps the ranking defensible against overfitting.

Overview¶

Write and test your compactor against the public elite_practice suite.
Open a PR adding your method source to submissions/<your-handle>/<method-name>/.
A maintainer labels the PR for evaluation.
The self-hosted runner executes your method against Elite Ranked cases.
Scores are posted back to the PR as a comment.
If your method qualifies, the PR is merged and the leaderboard updates.

Submission directory layout¶

submissions/
└─ your-handle/
   └─ method-name/
      ├─ method.py          # subclass of Compactor, exports one public class
      ├─ config.yaml        # method config (provider, model, compression tier, any method-specific knobs)
      ├─ requirements.txt   # third-party pip dependencies beyond compactbench[providers]
      └─ README.md          # one page on your approach

A starter scaffold lives at submissions/_template/.

Qualification requirements¶

Your submission must pass all of these to land on the leaderboard:

Implements Compactor correctly; returns a valid CompactionArtifact on every call.
Declares a compression tier (Elite-Light / Elite-Mid / Elite-Aggressive) and clears the floor (≥ 2× / 4× / 8×).
Completes all configured cases and all configured drift cycles without errors.
Contradiction rate ≤ 0.10.
No single Elite family drops below 0.40 case-level pass rate (category diversity guard).
Dependencies are pinned in requirements.txt and install from PyPI.

Full ranking formula and qualification details: methodology.

What gets published¶

On merge, the leaderboard publishes:

Your chosen method name
Your GitHub handle or declared org name
Overall, drift, constraint retention, contradiction, compression, and per-family scores
Benchmark version, scorer version, target model, method version

The hidden test content itself is never published.

Resubmitting¶

Push new commits to the same PR. Re-evaluation is gated on a maintainer adding the reevaluate label so we control runner spend.

New versions of an already-ranked method should go under a new subfolder (method-name-v2) so the old entry stays pinned to its benchmark version.