Frozen paper candidate · polibench-paper-v1.0.1
Verified model-output profiles, not model beliefs.
PoliBench measures model outputs under a standardized benchmark. By itself, it does not measure model beliefs, provider intent, training-data ideology, or real-world political impact.
- Full, completed, parse-valid, and under the default no-answer threshold.
- Latest valid completed full-suite run per model slug.
- Every removed row is logged in the duplicate table.
- Models passing the release truth gate and suppression checks.
- 7 static schemas are tracked for the release.
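The run-selection rules above can be read as an eligibility filter followed by a latest-run-per-slug deduplication that logs what it removes. The sketch below is a minimal illustration under stated assumptions, not PoliBench's implementation. The `Run` record, its field names, and the helpers `is_eligible` and `select_latest` are hypothetical. The default no-answer threshold is not given on this page, so it is passed in explicitly. "Under the threshold" is read as strictly less than.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Run:
    """Hypothetical run record; field names are illustrative only."""
    run_id: str
    model_slug: str
    run_date: str          # ISO 8601 UTC, e.g. "2026-04-24T18:07:53.082Z"
    full_suite: bool
    completed: bool
    parse_valid: bool
    no_answer_rate: float  # fraction of items with no answer, in [0, 1]


def parse_run_date(s: str) -> datetime:
    # fromisoformat accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(s.replace("Z", "+00:00"))


def is_eligible(run: Run, no_answer_threshold: float) -> bool:
    # Full, completed, parse-valid, and (assumed strictly) under the threshold.
    return (run.full_suite and run.completed and run.parse_valid
            and run.no_answer_rate < no_answer_threshold)


def select_latest(runs: list[Run], no_answer_threshold: float) -> tuple[list[Run], list[dict]]:
    """Keep the latest eligible run per model slug and log every superseded run."""
    eligible = [r for r in runs if is_eligible(r, no_answer_threshold)]
    # Newest first, so the first run seen for a slug is the one that is kept.
    eligible.sort(key=lambda r: parse_run_date(r.run_date), reverse=True)
    kept: dict[str, Run] = {}
    duplicate_log: list[dict] = []
    for r in eligible:
        if r.model_slug in kept:
            duplicate_log.append({"model_slug": r.model_slug,
                                  "removed_run": r.run_id,
                                  "kept_run": kept[r.model_slug].run_id})
        else:
            kept[r.model_slug] = r
    return list(kept.values()), duplicate_log
```

Sorting newest-first keeps the selection rule in one place: the first eligible run seen for a slug wins, and every older eligible run for that slug goes to the duplicate log. The release truth gate and suppression checks are not modeled, because their rules are not described here.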
Missing Evidence
These warnings are intentionally public. PoliBench should not look more certain than its evidence.
- no human baseline collected
- human-subjects status unresolved
- not externally validated
- model version unknown
Research Dashboard
| Model | Provider | Run date | Benchmark | Run health | Completion | Parse validity | No-answer | Neutral | Responses | Axis coverage | Score status | Evidence | Raw run |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Haiku 4.5 | anthropic | 2026-04-24T18:07:53.082Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 18.1% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Claude Opus 4.6 | anthropic | 2026-04-25T00:31:13.193Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 14.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Claude Opus 4.7 | anthropic | 2026-04-24T15:14:52.402Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 23.3% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Claude Sonnet 4.6 | anthropic | 2026-04-24T15:27:55.141Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 15.9% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Command A | cohere | 2026-04-24T23:53:28.223Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 9.6% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Cydonia 24B V4.1 | thedrummer | 2026-04-25T05:41:28.377Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 4.1% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek R1 | deepseek | 2026-04-25T03:19:43.654Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 4.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V3.1 | deepseek | 2026-04-25T17:50:59.358Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 14.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V3.1 Terminus | deepseek | 2026-04-25T07:03:45.762Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 13.7% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V3.2 | deepseek | 2026-04-24T19:17:09.502Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 13.7% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V3.2 Exp | deepseek | 2026-04-25T21:56:46.942Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 12.2% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V4 Flash | deepseek | 2026-04-24T16:30:42.873Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 7.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V4 Pro | deepseek | 2026-04-24T18:56:50.938Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 13% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemini 2.5 Flash | google | 2026-04-26T17:46:59.574Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 14.8% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemini 2.5 Flash Lite | google | 2026-04-26T18:27:12.291Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 23% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemini 3 Flash Preview | google | 2026-04-24T16:10:02.674Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 34.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemini 3.1 Flash Lite Preview | google | 2026-04-24T18:17:06.254Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 23% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemini 3.1 Pro Preview | google | 2026-04-24T16:00:37.258Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 84.8% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemma 4 26B A4B | google | 2026-04-26T20:07:55.585Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 41.5% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemma 4 31B | google | 2026-04-26T20:21:02.610Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 71.1% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| GLM 4.7 Flash | z-ai | 2026-04-26T18:31:26.596Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 2.2% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| GLM 5 | z-ai | 2026-04-25T17:28:56.883Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 5.9% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| GLM 5.1 | z-ai | 2026-04-24T17:03:18.935Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 12.2% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Goliath 120B | alpindale | 2026-04-25T06:16:43.685Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 70.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
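The per-model columns can be cross-checked with simple arithmetic. 9 axes x 30 items gives the 270 responses shown for every run. The sketch below assumes, without confirmation from the dashboard, that Neutral is a share of those 270 responses. Under that assumption, a displayed percentage maps back to an approximate response count: 18.1% of 270 is about 49, and 49/270 rounds back to 18.1%. The helper names `approx_neutral_count` and `row_is_clean` are hypothetical.

```python
# Constants taken from the "Axis coverage" column: 9 axes x 30 items.
AXES = 9
ITEMS_PER_AXIS = 30
EXPECTED_RESPONSES = AXES * ITEMS_PER_AXIS  # 270


def approx_neutral_count(neutral_pct: float, responses: int = EXPECTED_RESPONSES) -> int:
    """Back out an approximate neutral-response count from a rounded percentage.

    Assumption: "Neutral" is the share of all responses in the run.
    """
    return round(neutral_pct / 100 * responses)


def row_is_clean(completion_pct: float, parse_pct: float,
                 no_answer_pct: float, responses: int) -> bool:
    """Hypothetical health check: full completion, full parse validity,
    no no-answers, and the full 9 x 30 response count."""
    return (completion_pct == 100 and parse_pct == 100
            and no_answer_pct == 0 and responses == EXPECTED_RESPONSES)


if __name__ == "__main__":
    # Neutral percentages copied from three rows of the table.
    for model, pct in [("Claude Haiku 4.5", 18.1), ("Gemini 3.1 Pro Preview", 84.8),
                       ("GLM 4.7 Flash", 2.2)]:
        n = approx_neutral_count(pct)
        print(f"{model}: ~{n}/{EXPECTED_RESPONSES} neutral ({n / EXPECTED_RESPONSES:.1%})")
```

Because every listed row reports 100% completion, 100% parse validity, 0% no-answer, and 270 responses, differences between models in this view come down to the Neutral column.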