Frozen paper candidate · polibench-paper-v1.0.1

Verified model-output profiles, not model beliefs.

PoliBench measures model outputs under a standardized benchmark. On its own, it does not measure model beliefs, provider intent, training-data ideology, or real-world political impact.

Completed full-suite runs: 99

Before strict paper filtering.

Strict eligible runs: 95

Full, completed, parse-valid, and under the default no-answer threshold.
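
The four conditions above can be read as a single predicate over a run record. The following is a minimal sketch, not PoliBench's actual filter: the `Run` record, its field names, and the threshold value are all assumptions made for illustration, since the source does not state the default no-answer threshold or how a run is stored.

```python
from dataclasses import dataclass

# Placeholder only: the source does not state the default no-answer threshold.
ASSUMED_NO_ANSWER_THRESHOLD = 0.10


@dataclass(frozen=True)
class Run:
    """Hypothetical run record; every field name here is illustrative."""
    model_slug: str
    is_full_suite: bool
    is_completed: bool
    parse_valid: bool
    no_answer_rate: float  # fraction of items with no answer, 0.0 to 1.0


def is_strict_eligible(run: Run, threshold: float = ASSUMED_NO_ANSWER_THRESHOLD) -> bool:
    """Return True only when the run meets all four strict-filter conditions."""
    return (
        run.is_full_suite
        and run.is_completed
        and run.parse_valid
        and run.no_answer_rate < threshold  # "under" is read as strictly less than
    )
```

Keeping the threshold as a parameter makes the "default" explicit and lets a stricter or looser filter be applied without changing the predicate.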

Canonical profiles: 84

Latest valid completed full-suite run per model slug.
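
The "latest run per model slug" rule can be sketched as a single pass that keeps the newest timestamp seen for each slug. This is an illustrative sketch under stated assumptions: the input shape (pairs of slug and run date that have already passed the strict filter) and the function names are hypothetical, and only the ISO 8601 timestamp format is taken from the run dates shown in the dashboard.

```python
from datetime import datetime
from typing import Iterable


def parse_run_date(stamp: str) -> datetime:
    """Parse an ISO 8601 UTC stamp such as 2026-04-24T18:07:53.082Z."""
    # Replacing the trailing Z keeps this working on Python versions
    # whose datetime.fromisoformat does not accept the Z suffix.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def canonical_profiles(runs: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Map each model slug to the run date of its latest run.

    `runs` holds (model_slug, run_date) pairs that already passed the
    strict filter; this input shape is an assumption for illustration.
    """
    latest: dict[str, str] = {}
    for slug, stamp in runs:
        current = latest.get(slug)
        # Keep the first run seen for a slug, then replace it only when newer.
        if current is None or parse_run_date(stamp) > parse_run_date(current):
            latest[slug] = stamp
    return latest
```

Comparing parsed datetimes rather than raw strings avoids relying on every timestamp sharing the same precision and suffix.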

Duplicate pairs resolved: 24

Every removed row is logged in the duplicate table.

Renderable scores: 84

Models passing the release truth gate and suppression checks.

Schema checks: Pass

7 static schemas are tracked for the release.

Missing Evidence

These warnings are intentionally public. PoliBench should not look more certain than its evidence.

Research Dashboard

| Model | Provider | Run date | Benchmark | Run health | Completion | Parse validity | No-answer | Neutral | Responses | Axis coverage | Score status | Evidence | Raw run |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Haiku 4.5 | anthropic | 2026-04-24T18:07:53.082Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 18.1% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Claude Opus 4.6 | anthropic | 2026-04-25T00:31:13.193Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 14.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Claude Opus 4.7 | anthropic | 2026-04-24T15:14:52.402Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 23.3% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Claude Sonnet 4.6 | anthropic | 2026-04-24T15:27:55.141Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 15.9% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Command A | cohere | 2026-04-24T23:53:28.223Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 9.6% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Cydonia 24B V4.1 | thedrummer | 2026-04-25T05:41:28.377Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 4.1% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek R1 | deepseek | 2026-04-25T03:19:43.654Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 4.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V3.1 | deepseek | 2026-04-25T17:50:59.358Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 14.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V3.1 Terminus | deepseek | 2026-04-25T07:03:45.762Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 13.7% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V3.2 | deepseek | 2026-04-24T19:17:09.502Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 13.7% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V3.2 Exp | deepseek | 2026-04-25T21:56:46.942Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 12.2% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V4 Flash | deepseek | 2026-04-24T16:30:42.873Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 7.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| DeepSeek V4 Pro | deepseek | 2026-04-24T18:56:50.938Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 13% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemini 2.5 Flash | google | 2026-04-26T17:46:59.574Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 14.8% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemini 2.5 Flash Lite | google | 2026-04-26T18:27:12.291Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 23% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemini 3 Flash Preview | google | 2026-04-24T16:10:02.674Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 34.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemini 3.1 Flash Lite Preview | google | 2026-04-24T18:17:06.254Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 23% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemini 3.1 Pro Preview | google | 2026-04-24T16:00:37.258Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 84.8% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemma 4 26B A4B | google | 2026-04-26T20:07:55.585Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 41.5% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Gemma 4 31B | google | 2026-04-26T20:21:02.610Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 71.1% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| GLM 4.7 Flash | z-ai | 2026-04-26T18:31:26.596Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 2.2% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| GLM 5 | z-ai | 2026-04-25T17:28:56.883Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 5.9% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| GLM 5.1 | z-ai | 2026-04-24T17:03:18.935Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 12.2% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |
| Goliath 120B | alpindale | 2026-04-25T06:16:43.685Z | v1 | validated full-suite model-output profile | 100% | 100% | 0% | 70.4% | 270 | 9 axes x 30 items | renderable | Level 3 | View run |