Data dictionary

Rows need grains, not vibes.

The frozen release documents what each generated file represents and what is still missing. This dictionary is part of that paper release and is kept as a historical record; live pages carry current benchmark data.

Claim Evidence

Each claim about the dictionary links to the page that documents the status of the generated dictionaries and of the schema manifest used by release verification.

Claim	Evidence
Release file grains and field semantics are checked artifacts, not prose-only documentation.	Data dictionary · Field dictionary · Schema manifest
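
As an illustration of what checking a grain can mean in practice, the sketch below tests whether a set of key columns uniquely identifies each row. It is a minimal example under stated assumptions, not the release verification code: the function name, the column names (run_id, question_id), and the sample rows are all hypothetical, since this page does not publish the actual grains.

```python
from collections import Counter


def find_grain_violations(rows, grain):
    """Return the key tuples that occur more than once for the grain columns."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical rows; the real release files and their grains are not listed here.
rows = [
    {"run_id": "r1", "question_id": "q1", "answer": "A"},
    {"run_id": "r1", "question_id": "q2", "answer": "B"},
    {"run_id": "r1", "question_id": "q1", "answer": "C"},  # repeats the first key
]

violations = find_grain_violations(rows, grain=("run_id", "question_id"))
```

An empty result means the declared grain holds for the rows checked; any returned key marks rows that the grain fails to distinguish.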

File	Purpose	Rows
not-published	axisDefinitions	Unknown
not-published	axisDiagnostics	Unknown
not-published	axisIntervals	Unknown
not-published	canonicalResponses	Unknown
not-published	canonicalSample	Unknown
not-published	collectionReadinessJson	Unknown
not-published	dataDictionary	Unknown
not-published	duplicateResolution	Unknown
not-published	exclusions	Unknown
not-published	fieldDictionary	Unknown
not-published	itemDiagnostics	Unknown
not-published	limitations	Unknown
not-published	manifest	Unknown
not-published	modelCatalog	Unknown
not-published	modelRosterPreflight	Unknown
not-published	openEndedDiagnostics	Unknown
not-published	promptTemplateMd	Unknown
not-published	questionBankFlags	Unknown
not-published	questionReviewWaivers	Unknown
not-published	questions	Unknown
not-published	releaseSummary	Unknown
not-published	releaseValidation	Unknown
not-published	responseAttempts	Unknown
not-published	responseStyleControls	Unknown
not-published	runPlan	Unknown
not-published	runs	Unknown
not-published	schemaManifestJson	Unknown
not-published	scoringConfig	Unknown
not-published	truthGate	Unknown
not-published	validationManifest	Unknown

Field Dictionary

File	Field	Type	Nullable	Use	Description
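
The columns above (File, Field, Type, Nullable, Use, Description) are enough to drive a simple record check. The sketch below shows one way this could look; it is not the project's validator. The FieldSpec class, the check_record function, the set of type names ("string", "integer", "number", "boolean"), and the sample fields are assumptions made for illustration, because the published table has no rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field dictionary row, mirroring the table columns."""
    file: str
    field: str
    type: str        # assumed vocabulary: string, integer, number, boolean
    nullable: bool
    use: str
    description: str


# Assumed mapping from dictionary type names to Python types.
_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_record(record, specs):
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for spec in specs:
        if spec.field not in record:
            problems.append(f"{spec.field}: missing")
            continue
        value = record[spec.field]
        if value is None:
            if not spec.nullable:
                problems.append(f"{spec.field}: null not allowed")
            continue
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        is_stray_bool = isinstance(value, bool) and spec.type != "boolean"
        if is_stray_bool or not isinstance(value, _PY_TYPES[spec.type]):
            problems.append(
                f"{spec.field}: expected {spec.type}, got {type(value).__name__}"
            )
    return problems


# Hypothetical specs for a hypothetical file.
specs = [
    FieldSpec("runs", "run_id", "string", False, "key", "Run identifier"),
    FieldSpec("runs", "score", "number", True, "measure", "Scored value"),
]

ok = check_record({"run_id": "r1", "score": None}, specs)
bad = check_record({"run_id": None, "score": "high"}, specs)
```

Keeping nullability separate from the type check reflects the table's separate Nullable column: a null in a nullable field is valid regardless of the declared type.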

Evidence note

PoliBench is a public benchmark surface for model outputs under fixed political prompts. Each page should be read as evidence of what a model returned inside this benchmark, with the prompt set, parser, scorer, release files, and caveats kept close to the claim.

The site keeps the claims narrow on purpose. Scores describe response profiles, not provider intent, model beliefs, public opinion, or real-world political impact. Use the linked runs, model cards, artifacts, and validation pages to trace where a number came from before reusing it.

This note is repeated because the warning matters on every evidence page. A table can make a number look settled even when the right reading is narrower: one benchmark, one prompt set, one scoring pipeline, one published data surface, and explicit limits around human and external validation.