Full Table
Detailed per-run data
| Rank | Model | Solved | Failed | Guesses | Cost | Time | Completed |
|---|---|---|---|---|---|---|---|
| #1 | GPT-5.4 Mini (high) | 77/100 | 0 | 695 | $2.74 | 5527.3s | 18/03/2026, 15:23:50 |
| #2 | Gemini 3 Flash (dynamic) | 75/100 | 0 | 716 | $0.28 | 665.5s | 12/03/2026, 03:08:38 |
| #3 | GPT-5 Mini (medium) | 71/100 | 3 | 855 | $0.92 | 4739.9s | 12/03/2026, 03:09:20 |
| #4 | Gemini 3.1 Flash Lite (medium) | 62/100 | 0 | 978 | $0.32 | 788.9s | 12/03/2026, 03:09:27 |
| #5 | GPT-5.4 Mini (medium) | 60/100 | 0 | 1019 | $0.84 | 1649.0s | 17/03/2026, 17:51:34 |
| #6 | Gemini 3.1 Flash Lite (minimal) | 46/100 | 0 | 1294 | $0.10 | 394.0s | 12/03/2026, 03:09:31 |
| #7 | GPT-5.4 Nano (medium) | 44/100 | 0 | 1326 | $0.25 | 1797.2s | 17/03/2026, 17:53:17 |
| #8 | Claude Haiku 4.5 | 33/100 | 0 | 1522 | $0.43 | 597.4s | 12/03/2026, 03:09:38 |