Paper release

Paper artifacts live outside the frontend repo.

PoliBench now keeps the website focused on model profiles backed by the current backend. Frozen analysis packets, raw response files, and paper-writing exports should be generated outside this public app repository.

Current public surface

The public pages show completed full-suite runs from the active Convex backend only. Rows from old question banks, old runners, incomplete executions, and profile-only diagnostics are not eligible for the live UI. Current evidence lives on the live runs, models, and items pages.
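The eligibility rule above can be read as a simple predicate over run rows. The following is a minimal sketch in Python for illustration only: the `RunRow` fields, the `CURRENT_BANK` and `CURRENT_RUNNER` identifiers, and the status strings are all hypothetical, since the actual Convex schema is not shown on this page.

```python
from dataclasses import dataclass

# Hypothetical row shape; the real backend schema may differ.
@dataclass(frozen=True)
class RunRow:
    run_id: str
    question_bank: str   # identifier of the question bank the run used
    runner: str          # identifier of the runner that produced the run
    status: str          # assumed values: "completed" or "incomplete"
    full_suite: bool     # True only when the whole suite was executed
    profile_only: bool   # True for profile-only diagnostic rows

# Assumed identifiers for the active question bank and runner.
CURRENT_BANK = "bank-current"
CURRENT_RUNNER = "runner-current"

def is_eligible(row: RunRow) -> bool:
    """Return True only for completed full-suite runs on the current bank and runner."""
    return (
        row.question_bank == CURRENT_BANK
        and row.runner == CURRENT_RUNNER
        and row.status == "completed"
        and row.full_suite
        and not row.profile_only
    )

def eligible_rows(rows):
    """Keep the rows that may appear in the live UI, preserving input order."""
    return [row for row in rows if is_eligible(row)]
```

Expressing the rule as one predicate keeps every exclusion (old bank, old runner, incomplete run, profile-only diagnostic) in a single place, so a row is either shown or not, with no partial states.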

That separation keeps the site from mixing two different jobs. The live app should help readers inspect current benchmark results, while a paper artifact should freeze exact files, manifests, exclusions, duplicate decisions, and truth-gate checks in a reproducible packet outside the frontend repository.
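One way to freeze exact files in a reproducible packet is to write a manifest of content hashes next to them. The sketch below is an assumption about how such a packet could be built, not a description of the PoliBench tooling: the `manifest.json` file name, the manifest layout, and the `exclusions` list of run identifiers are all hypothetical.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large raw response files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(packet_dir: Path, exclusions: list[str]) -> dict:
    """List every file in the packet with its size and SHA-256, in sorted order."""
    files = sorted(
        p for p in packet_dir.rglob("*")
        if p.is_file() and p.name != "manifest.json"  # hypothetical manifest name
    )
    return {
        "files": [
            {
                "path": p.relative_to(packet_dir).as_posix(),
                "sha256": sha256_of(p),
                "bytes": p.stat().st_size,
            }
            for p in files
        ],
        # Hypothetical: identifiers of runs deliberately left out of the packet.
        "exclusions": sorted(exclusions),
    }

def write_manifest(packet_dir: Path, exclusions: list[str]) -> Path:
    """Write the manifest as stable, sorted JSON inside the packet directory."""
    out = packet_dir / "manifest.json"
    manifest = build_manifest(packet_dir, exclusions)
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return out
```

Sorting the file list and the JSON keys means two builds over identical inputs should produce byte-identical manifests, which makes a later change to any frozen file easy to detect.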

Evidence note

PoliBench is a public benchmark surface for model outputs under fixed political prompts. Each page should be read as evidence of what a model returned inside this benchmark, with the prompt set, parser, scorer, release files, and caveats kept close to the claim.

The site keeps the claims narrow on purpose. Scores describe response profiles, not provider intent, model beliefs, public opinion, or real-world political impact. Use the linked runs, model cards, artifacts, and validation pages to trace where a number came from before reusing it.

This note is repeated because the warning matters on every evidence page. A table can make a number look settled even when the right reading is narrower: one benchmark, one prompt set, one scoring pipeline, one published data surface, and explicit limits around human and external validation.