Models
6 weights on H100 · benchmarks live from SQLite
Every model we've pulled, with what it is (registry) and what it does (benchmarks). One SSR query per table, joined in memory, no client-side fetch. Fastest GPU wins the model's throughput row.
see also: raw catalog · /api/models · /api/models/catalog
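A minimal sketch of the "one query per table, joined in memory" pattern described above, with the fastest GPU picked as each model's headline row. The table names (`registry`, `benchmarks`), column names, and the `build_cards` helper are hypothetical; the page does not show its real schema or server code, and only a couple of rows from the cards are loaded here for illustration.

```python
import sqlite3

# Hypothetical schema: the real table and column names are not shown on the page.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registry (model TEXT PRIMARY KEY, params TEXT, size TEXT);
    CREATE TABLE benchmarks (model TEXT, gpu TEXT, tok_per_s REAL);
    INSERT INTO registry VALUES
        ('llama3.2:1b', '1.0B', '1.3 GB'),
        ('qwen3.6:35b-a3b', '35B', '23 GB');
    INSERT INTO benchmarks VALUES
        ('llama3.2:1b', 'A100', 48),
        ('llama3.2:1b', 'H100', 571),
        ('llama3.2:1b', 'A6000', 44);
""")

# One query per table, with no SQL JOIN between them.
registry = conn.execute("SELECT model, params, size FROM registry").fetchall()
samples = conn.execute("SELECT model, gpu, tok_per_s FROM benchmarks").fetchall()

# Group benchmark rows by model in memory.
by_model = {}
for model, gpu, tok_per_s in samples:
    by_model.setdefault(model, []).append((gpu, tok_per_s))


def build_cards(registry_rows, samples_by_model):
    """Attach benchmark rows to each registry entry, fastest GPU first."""
    cards = []
    for model, params, size in registry_rows:
        rows = sorted(samples_by_model.get(model, []), key=lambda r: r[1], reverse=True)
        cards.append({
            "model": model,
            "params": params,
            "size": size,
            # The fastest GPU becomes the headline; None means no samples yet.
            "headline": rows[0] if rows else None,
            "rows": rows,
        })
    return cards


cards = build_cards(registry, by_model)
```

A model with registry data but no benchmark rows still gets a card, which matches the "no benchmark samples yet" state shown for some entries.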
qwen3.6:35b-a3b
qwen 3.6 · moe · harness default · race
- params: 35B
- active: 3.0B
- size: 23 GB
- released: 2026-04-16
no benchmark samples yet
llama3.2:1b
llama 3.2 · dense · race
- params: 1.0B
- active: —
- size: 1.3 GB
- released: 2024-09-25
live throughput: 571 tok/s on H100
- H100: 571 tok/s, $1.21/M
- A100: 48 tok/s, $4.56/M
- A6000: 44 tok/s, $2.20/M
llama3.1:8b
llama 3.1 · dense · race
- params: 8.0B
- active: —
- size: 4.9 GB
- released: 2024-07-23
live throughput: 192 tok/s on H100
- H100: 192 tok/s, $3.60/M
- A100: 17 tok/s, $12.62/M
- A6000: 13 tok/s, $7.76/M
qwen2.5:72b
qwen 2.5 · dense · race
- params: 72B
- active: —
- size: 47 GB
- released: 2024-09-19
live throughput: 28 tok/s on H100
- H100: 28 tok/s, $24.33/M
qwen2.5:14b
qwen 2.5 · dense · race
- params: 14B
- active: —
- size: 9.0 GB
- released: 2024-09-19
live throughput: 106 tok/s on H100
- H100: 106 tok/s, $6.56/M
- A6000: 16 tok/s, $6.05/M
- A100: 12 tok/s, $18.74/M
qwen2.5:1.5b
qwen 2.5 · dense · race
- params: 1.5B
- active: —
- size: 986 MB
- released: 2024-09-19
live throughput: 185 tok/s on H100
- H100: 185 tok/s, $3.73/M
- A6000: 60 tok/s, $1.62/M
- A100: 38 tok/s, $5.70/M
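The `$/M` figures read as dollars per million generated tokens. The page does not state how they are derived or which hourly GPU prices it uses, so the following is only a sketch of one plausible calculation: hourly rental rate divided by tokens generated per hour. The function name and the hourly rate are assumptions for illustration, not values taken from the page.

```python
def cost_per_million_tokens(hourly_rate_usd: float, tok_per_s: float) -> float:
    """Dollars per one million generated tokens at a steady throughput."""
    if tok_per_s <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tok_per_s * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000


# Hypothetical hourly rate; the page does not publish its pricing inputs.
ASSUMED_H100_USD_PER_HOUR = 2.49

fast = cost_per_million_tokens(ASSUMED_H100_USD_PER_HOUR, 571)  # small model
slow = cost_per_million_tokens(ASSUMED_H100_USD_PER_HOUR, 192)  # larger model
print(f"571 tok/s: ${fast:.2f}/M")
print(f"192 tok/s: ${slow:.2f}/M")
```

Under this formula a slower GPU can still be cheaper per token when its hourly rate is low enough, which would be consistent with rows where the A6000 shows a lower `$/M` than the H100 despite lower throughput.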