Models
Every model card carries its evidence limits.
A model's version is treated as unknown unless the source artifacts document it independently.
Claim Evidence
The model index links evidence-level claims to release artifacts before showing model rows.
| Claim | Evidence |
|---|---|
| Model cards are sorted alphabetically and carry evidence levels, not leaderboard ranks. | Model catalog, Truth gate |
| Model version uncertainty is a visible limitation unless independently documented. | Limitations, Model roster preflight |
| Evidence levels are model-output evidence levels, not human or external validation. | Human status, External status |

| Model | Provider | Run | Completion | Parse | Evidence | Caveat |
|---|---|---|---|---|---|---|
| Claude Haiku 4.5 | anthropic | jn70fpqyr7an1bca1cn7fq93ys864cx0 | 100% | 100% | Level 2 | current Anthropic low-latency paid route |
| Claude Opus 4.5 | anthropic | jn7839n5vcfsf5zsyqg0098rwd864xxv | 100% | 100% | Level 2 | legacy Anthropic Opus comparison route |
| Claude Opus 4.7 | anthropic | jn7bedafgemk6hecfqtpd6e309864xhq | 100% | 100% | Level 2 | latest Anthropic Opus paid route |
| Claude Sonnet 4.6 | anthropic | jn74qyaygktq550zw4metb3xt5864hfv | 100% | 100% | Level 2 | current Anthropic Sonnet paid route |
| DeepSeek V3.2 | deepseek | jn76nae0e9j4pqakz7zwtj1yn186abyv | 100% | 100% | Level 2 | recent DeepSeek reasoning and agentic paid route |
| DeepSeek V4 Flash | deepseek | jn77m1pvwyaed4n2v6nb6btn1s864kct | 100% | 100% | Level 2 | latest available DeepSeek V4 route with healthy provider capacity |
| DeepSeek V4 Pro | deepseek | jn7aybgpr67x8zswpmegfqytyx869wkp | 100% | 100% | Level 2 | latest DeepSeek V4 Pro paid route |
| Devstral 2512 | mistralai | jn793dym6gfm0tssrp6mgh8es986b8nq | 100% | 100% | Level 2 | live completed Convex full-suite run |
| Gemini 2.0 Flash | | jn77f88qts7had7ywj89ncd1yd86715s | 100% | 100% | Level 2 | older Google Flash route for generational comparison |
| Gemini 2.0 Flash Lite | | jn71419q7s8pmwrg8y9095xx9n867qp2 | 100% | 100% | Level 2 | older ultra-cheap Google Flash Lite route |
| Gemini 2.5 Flash | | jn7367rpjc0ar1m1mcpkwq1ahs867sk5 | 100% | 100% | Level 2 | cheap Google Flash route for comparison against Gemini 3 Flash |
| Gemini 2.5 Flash Lite | | jn7a7eaaja7pmzfc76pq7syqc18679yc | 100% | 100% | Level 2 | cheap Google baseline route even though newer Gemini 3 routes are already covered |
| Gemini 3 Flash Preview | | jn72d06xfqwj8pds5qgdq6t2gs8623kp | 100% | 100% | Level 2 | current Google Flash paid route |
| Gemini 3.1 Flash Lite Preview | | jn7ez731knc67nfs7gfshenwhd86777p | 100% | 100% | Level 2 | current Google efficient preview paid route |
| Gemini 3.1 Pro Preview | | jn74r11x1a5denvej7jbyc4p8h8633y8 | 100% | 100% | Level 2 | current Google Pro preview paid route |
| Gemma 3 12B | | jn794e1cgp5ecmz53cx08v2vhx866t4w | 100% | 100% | Level 2 | Gemma 3 mid-size open-model comparison route |
| Gemma 3 27B | | jn79cccx49g16xgxtftyamstsx866ay9 | 100% | 100% | Level 2 | Gemma 3 large open-model comparison route |
| Gemma 3 4B | | jn74gs2nnjg5q8n7ysvsfmdhzh8662ad | 100% | 100% | Level 2 | Gemma 3 small open-model comparison route |
| Gemma 4 26B A4B | | jn770jvh5bx4s74v63yhxmnxnh865vgq | 100% | 100% | Level 2 | newer compact Google Gemma 4 route with public interest |
| Gemma 4 31B | | jn7fdzd95ngzbwn6j42yfs5kzx864qrk | 100% | 100% | Level 2 | recent Google open model route with strong public interest |
| GLM 4.7 | z-ai | jn7a14xnb15xckyfvyy41q6k4s86bhv1 | 100% | 100% | Level 2 | larger GLM 4.7 route to compare against GLM 4.7 Flash and GLM 5 |
| GLM 5 | z-ai | jn77hqg7j6vmamae2r3hwnv1t1869rww | 100% | 100% | Level 2 | current Z.ai GLM route with strong open-model benchmark interest |
| GLM 5.1 | z-ai | jn77es7pyamhprdbm0bb3dntz1869ydp | 100% | 100% | Level 2 | latest Z.ai flagship paid route |
| GPT OSS 120B | openai | jn74fwhmj5ehh9xmb5jy2rrxqx868gse | 100% | 100% | Level 2 | OpenAI open-weight route people will expect to see benchmarked |
| GPT OSS 20B | openai | jn7f62k61w8er0kyjr36fpph0n862d85 | 100% | 100% | Level 2 | small OpenAI open-weight route for efficient comparison coverage |
| GPT-4.1 Mini | openai | jn7ae2rzspfcdav901hm50bf71868yjf | 100% | 100% | Level 2 | cheap OpenAI workhorse |
| GPT-4.1 Nano | openai | jn70ff7ys17t6z347339a788kh868hka | 100% | 100% | Level 2 | ultra-cheap OpenAI baseline |
| GPT-5.1 | openai | jn7fbr2nfw808e81z8aszvp391864cr9 | 100% | 100% | Level 2 | legacy OpenAI flagship-generation comparison route |
| GPT-5.4 | openai | jn78dxdtvkys549w6ad5sfh6vh863tbm | 100% | 100% | Level 2 | latest OpenAI flagship paid route |
| GPT-5.4 Mini | openai | jn74tnhtxnqj7q7b7pqmg5a7nx863xhv | 100% | 100% | Level 2 | current OpenAI efficient paid route |
| GPT-5.5 | openai | jn7frn897xwxymwwpnbck45ejn8626rf | 100% | 100% | Level 2 | latest OpenAI flagship paid route |
| Granite 4.1 8b | ibm-granite | jn73khjf12dxsp17a9t1eg68ks863rwm | 100% | 100% | Level 2 | live completed Convex full-suite run |
| Grok 3 Mini | x-ai | jn7ftf922rmzy7k0ad1m8e18h5866wed | 100% | 100% | Level 2 | cheap xAI baseline for compact-model compass comparison |
| Grok 4 Fast | x-ai | jn7dsp0pxw3zk1yhe7swg846kh8624vq | 100% | 100% | Level 2 | popular xAI low-cost flagship-family route |
| Grok 4.1 Fast | x-ai | jn7322gpqzvnrj0p808perdjrn867ssk | 100% | 100% | Level 2 | popular current xAI fast paid route |
| Grok 4.20 | x-ai | jn7bpp64vsa9s9n3fj3g0mkdb18677j3 | 100% | 100% | Level 2 | latest xAI paid route |
| Grok 4.3 | x-ai | jn7cfwkqn38wj9715mw02tdxxh8630yc | 100% | 100% | Level 2 | live completed Convex full-suite run |
| Grok Code Fast 1 | x-ai | jn75acm3ttqh3n44gzgafkfqm58660ee | 100% | 100% | Level 2 | cheap xAI specialist route, useful as a weird compass comparison |
| Kimi K2.5 | moonshotai | jn7fgjneas7stn448cbt35fbcs8690g9 | 100% | 100% | Level 2 | recent Moonshot Kimi comparison route |
| Kimi K2.6 | moonshotai | jn7d4w89z1jma790phnzz9d8qh869t8a | 100% | 100% | Level 2 | latest Moonshot Kimi paid route |
| LFM2 24B A2B | liquid | jn790737dwtwgx14e1s31j6ycx867k67 | 100% | 100% | Level 2 | small efficient LiquidAI open-model comparison route |
| Ling 2.6 Flash | inclusionai | jn7ed5ge4j9xakj11zcvnsx2jd865y2k | 100% | 100% | Level 2 | live completed Convex full-suite run |
| Llama 3.3 70b Instruct | meta-llama | jn7bh1gqd6p23gdq346rc3jd1n869bqs | 100% | 100% | Level 2 | live completed Convex full-suite run |
| Llama 4 Maverick | meta-llama | jn7a64svgza9ah7n809x2cbqrx862g9r | 100% | 100% | Level 2 | current Meta Llama paid route |
| Llama 4 Scout | meta-llama | jn7deergpw9v49fk6rj2s0xwb1868tky | 100% | 100% | Level 2 | popular Meta Llama 4 comparison route |
| Mercury 2 | inception | jn7ejrx66rgxbgpy086rg1m665869aam | 100% | 100% | Level 2 | recent Inception comparison route |
| MiniMax M2 | minimax | jn705tsdrjcd0np9gz91ct01ks868wpr | 100% | 100% | Level 2 | cheap MiniMax route for historical small-model comparison coverage |
| MiniMax M2.1 | minimax | jn7f2kxcg3mg932p5txnza06tx8697w7 | 100% | 100% | Level 2 | cheap MiniMax route for small-model comparison coverage |
| MiniMax M2.5 | minimax | jn77fygwkbh1tcwk4sz25kmey58675ct | 100% | 100% | Level 2 | current MiniMax paid route with mandatory reasoning |
| MiniMax M2.7 | minimax | jn797b19f23bqm4ey1n6tr2z0h86980h | 100% | 100% | Level 2 | newer MiniMax route, cheap enough for broad comparison coverage |
| Ministral 3 14B 2512 | mistralai | jn78b7c7pyv9bwxfz63p58xrjs867dg5 | 100% | 100% | Level 2 | cheap Mistral small-model route with full-suite comparison value |
| Ministral 3 3B 2512 | mistralai | jn7cneszh8h4h169wp9m6ftj818669wz | 100% | 100% | Level 2 | tiny Mistral route for low-cost scale comparison |
| Ministral 3 8B 2512 | mistralai | jn72eek26kmhcfna69zsg8m0qs8667a2 | 100% | 100% | Level 2 | very cheap Mistral small route for scale and ideology stability checks |
| Mistral Large 3 2512 | mistralai | jn77znvpt5wtay1jkv1jp7y3an867fk7 | 100% | 100% | Level 2 | current Mistral large paid route |
| Mistral Medium 3.1 | mistralai | jn7egfgd1waqa15wwzyatnjk698673e7 | 100% | 100% | Level 2 | mid-size Mistral route for comparison against Ministral and Saba |
| Mistral Medium 3.5 | mistralai | jn72zhsrwq9m571zcf6mesd5rs865qss | 100% | 100% | Level 2 | current Mistral medium paid route |
| Mistral Saba | mistralai | jn7dn0ckwrdgp6nksteazb2rps866c2a | 100% | 100% | Level 2 | Mistral regional route for Middle East and South Asia comparison |
| Mistral Small 4 | mistralai | jn78dd95913j8fhzm2wpf1wxa1866pjq | 100% | 100% | Level 2 | current Mistral efficient paid route |
| Nemotron 3 Nano 30B A3B | nvidia | jn73p8tfdvn74nyaytf37zjve9867g4t | 100% | 100% | Level 2 | cheap NVIDIA Nemotron 3 route with open-model comparison value |
| Nemotron 3 Super | nvidia | jn79vwtvy4ew6phzgp37bkxncx866ngb | 100% | 100% | Level 2 | current NVIDIA reasoning-capable paid route |
| Nemotron Nano 9B V2 | nvidia | jn715rwtcrnae9trwpm6kwq74d867kba | 100% | 100% | Level 2 | very cheap NVIDIA route with small-model comparison value |
| OLMo 3.1 32B Instruct | allenai | jn70sha61z7rvxw9m1ebac4w298669rz | 100% | 100% | Level 2 | fully open Ai2 American instruct route |
| Phi 4 | microsoft | jn72y2njy87gzbvvymbanmm0b18686rt | 100% | 100% | Level 2 | popular small Microsoft model comparison route |
| Qwen3.5 397B A17B | qwen | jn745fmm7et4q7nq87x5r7yh1586ajta | 100% | 100% | Level 2 | large Qwen open-weight comparison route |
| Qwen3.5 Plus 20260420 | qwen | jn74kq8ve2jy1ms5czap7sqd4d86apss | 100% | 100% | Level 2 | live completed Convex full-suite run |
| Qwen3.6 35B A3B | qwen | jn72r49hq5g308wv4xv0y6rf9s864bjp | 100% | 100% | Level 2 | open-weight mid-size Qwen route for size-class coverage |
| Qwen3.6 Flash | qwen | jn7ccgp27qt3ecc85dq0vaabhh8659q1 | 100% | 100% | Level 2 | live completed Convex full-suite run |
| Qwen3.6 Max Preview | qwen | jn7b14sehdj2rhgte9k6pw824d864pmp | 100% | 100% | Level 2 | live completed Convex full-suite run |
| Reka Edge | rekaai | jn7ca900bsv6ychdfzf2r879js864j6a | 100% | 100% | Level 2 | new low-cost Reka edge-model comparison route |
| Solar Pro 3 | upstage | jn75x7fnknr6d2xwgq8px34pg5869a6m | 100% | 100% | Level 2 | Upstage Korean model route with regional comparison value |
| Trinity Large Preview | arcee-ai | jn76fvf4gfy08hdmsnfxmxdrbx869qc3 | 100% | 100% | Level 2 | high-usage US open-weight Arcee preview route |
| Trinity Large Thinking | arcee-ai | jn7bkw0f7z6fa6z1anr0t4xf2x869xp5 | 100% | 100% | Level 2 | US open-weight Arcee reasoning route |
| Trinity Mini | arcee-ai | jn7ayysp9g6ett652tdhefmpj586550k | 100% | 100% | Level 2 | small US open-weight Arcee MoE route |
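
The alphabetical-ordering claim and the seven-column row shape can be checked mechanically. The following is a minimal sketch, not the project's own tooling: `parse_row`, `check_rows`, and `EXPECTED_COLUMNS` are hypothetical names, and treating "alphabetically" as case-insensitive string order is an assumption. The sample rows are copied from the table above.

```python
# Hypothetical helper names; a sketch under the assumptions stated above.
EXPECTED_COLUMNS = 7  # Model, Provider, Run, Completion, Parse, Evidence, Caveat


def parse_row(line: str) -> list[str]:
    """Split one pipe-table row into trimmed cells."""
    text = line.strip()
    # Remove exactly one leading and one trailing pipe so empty cells survive.
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    return [cell.strip() for cell in text.split("|")]


def check_rows(rows: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    names = []
    for line in rows:
        cells = parse_row(line)
        if len(cells) != EXPECTED_COLUMNS:
            problems.append(
                f"{cells[0]}: expected {EXPECTED_COLUMNS} cells, found {len(cells)}"
            )
            continue
        name, provider, _run, completion, parse, _evidence, _caveat = cells
        names.append(name)
        if not provider:
            problems.append(f"{name}: provider cell is empty")
        for label, value in (("completion", completion), ("parse", parse)):
            # Percent cells are expected to look like "100%".
            if not (value.endswith("%") and value[:-1].isdigit()):
                problems.append(f"{name}: {label} value {value!r} is not a percentage")
    # Assumption: "alphabetical" means case-insensitive string order.
    if names != sorted(names, key=str.casefold):
        problems.append("rows are not in case-insensitive alphabetical order")
    return problems


SAMPLE_ROWS = [
    "| Claude Haiku 4.5 | anthropic | jn70fpqyr7an1bca1cn7fq93ys864cx0 | 100% | 100% | Level 2 | current Anthropic low-latency paid route |",
    "| GLM 5 | z-ai | jn77hqg7j6vmamae2r3hwnv1t1869rww | 100% | 100% | Level 2 | current Z.ai GLM route with strong open-model benchmark interest |",
    "| GPT OSS 20B | openai | jn7f62k61w8er0kyjr36fpph0n862d85 | 100% | 100% | Level 2 | small OpenAI open-weight route for efficient comparison coverage |",
]

if __name__ == "__main__":
    for problem in check_rows(SAMPLE_ROWS):
        print(problem)
```

A check like this only confirms ordering and row shape. It says nothing about evidence levels, which remain model-output evidence rather than human or external validation.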