Exclusions

Nothing is silently dropped.

Each excluded run or row carries a reason code, source file, run ID, and model slug where available. This log is frozen paper-release documentation, kept as a historical record; live pages carry current benchmark data.
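As an illustration only, the logging rule above can be sketched as a filter that never discards a row without recording it. The class name, field names, reason code, and file names below are hypothetical and are not taken from the PoliBench release files. The sketch assumes that only the reason code is always present and that the other three fields may be missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExclusionRecord:
    """One entry in a hypothetical exclusion log (field names are assumed)."""
    reason_code: str
    source_file: Optional[str] = None  # recorded where available
    run_id: Optional[str] = None       # recorded where available
    model_slug: Optional[str] = None   # recorded where available


def partition_rows(rows, check):
    """Split rows into kept rows and logged exclusions.

    `check(row)` returns None to keep a row, or a reason-code string
    to exclude it.
    """
    kept, excluded = [], []
    for row in rows:
        reason = check(row)
        if reason is None:
            kept.append(row)
        else:
            excluded.append(ExclusionRecord(
                reason_code=reason,
                source_file=row.get("source_file"),
                run_id=row.get("run_id"),
                model_slug=row.get("model_slug"),
            ))
    # Invariant: every input row is either kept or logged, never lost.
    assert len(kept) + len(excluded) == len(rows)
    return kept, excluded


def example_check(row):
    # Hypothetical rule: a run that did not complete is excluded.
    return None if row.get("status") == "completed" else "run_not_completed"


rows = [
    {"source_file": "runs_a.jsonl", "run_id": "run-001",
     "model_slug": "model-x", "status": "completed"},
    {"source_file": "runs_a.jsonl", "run_id": "run-002",
     "model_slug": "model-y", "status": "failed"},
    {"source_file": "runs_b.jsonl", "status": "failed"},
]
kept, excluded = partition_rows(rows, example_check)
```

The invariant check inside `partition_rows` is the point of the sketch: the kept and excluded counts must always add up to the input count, so a row cannot disappear without a logged reason.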

Claim Evidence

Each exclusion claim links to the pages that document the exclusion log, run table, and manifest, and those links appear before any sampled rows are shown.

The exclusion log is intentionally separate from the live leaderboard. It explains why a row was left out of the paper release, whether the issue came from run status, parsing, source data, or eligibility, and which artifact should be checked before drawing conclusions from the benchmark snapshot.
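A minimal sketch of how exclusions might be grouped by where the issue came from, using the four origins named above. The reason codes and their mapping to origins are invented for illustration; the actual codes are defined by the exclusion log itself.

```python
from collections import Counter
from enum import Enum


class IssueOrigin(Enum):
    """The four issue origins named in the text."""
    RUN_STATUS = "run_status"
    PARSING = "parsing"
    SOURCE_DATA = "source_data"
    ELIGIBILITY = "eligibility"


# Hypothetical reason codes; the real log may use different names.
REASON_TO_ORIGIN = {
    "run_not_completed": IssueOrigin.RUN_STATUS,
    "unparseable_response": IssueOrigin.PARSING,
    "missing_source_row": IssueOrigin.SOURCE_DATA,
    "model_not_eligible": IssueOrigin.ELIGIBILITY,
}


def tally_by_origin(reason_codes):
    """Count exclusions per origin.

    An unknown reason code raises KeyError rather than being skipped,
    so no exclusion goes uncounted.
    """
    counts = Counter()
    for code in reason_codes:
        counts[REASON_TO_ORIGIN[code]] += 1
    return counts


counts = tally_by_origin(
    ["run_not_completed", "unparseable_response", "run_not_completed"]
)
```

A tally like this is only a summary. The text above still applies: check the linked artifact for each row before drawing conclusions from the snapshot.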

Claim: Excluded rows and runs stay logged with reason codes instead of being silently dropped.
Evidence: Exclusions · Run table · Manifest

Columns: Entity · ID · Reason · Run · Model · Source

Evidence note

PoliBench is a public benchmark surface for model outputs under fixed political prompts. Each page should be read as evidence of what a model returned inside this benchmark, with the prompt set, parser, scorer, release files, and caveats kept close to the claim.

The site keeps the claims narrow on purpose. Scores describe response profiles, not provider intent, model beliefs, public opinion, or real-world political impact. Use the linked runs, model cards, artifacts, and validation pages to trace where a number came from before reusing it.

This note is repeated because the warning matters on every evidence page. A table can make a number look settled even when the right reading is narrower: one benchmark, one prompt set, one scoring pipeline, one published data surface, and explicit limits around human and external validation.