run detail
thunder-h100-3800c438c8
H100 eval suite (backfill from profile.json)
started 4h ago · ended 1h ago · duration 153m · status completed · session thunder-backfill
lanes
| gpu | tok/s | latency (ms) | $/hr | $/1M tokens |
|---|---|---|---|---|
| H100 (winner) | 154.7 | 742 | $2.49 | $4.47 |
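
The $/1M tokens column appears to follow from the $/hr and tok/s columns. The sketch below assumes the straightforward conversion (hourly price divided by tokens generated per hour, scaled to one million tokens); the function name is hypothetical and the dashboard's actual formula is not shown in the source, but this arithmetic does reproduce the H100 figure.

```python
def cost_per_million_tokens(dollars_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly GPU price and a throughput into $ per 1M tokens.

    Assumption: the GPU is billed for the full hour and sustains the
    given throughput for the whole hour.
    """
    tokens_per_hour = tokens_per_second * 3600
    return dollars_per_hour / tokens_per_hour * 1_000_000


# H100 lane from the table: $2.49/hr at 154.7 tok/s
print(round(cost_per_million_tokens(2.49, 154.7), 2))  # 4.47
```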
eval scores · 120
Rubric-scored quality measurements on this run's model outputs. Higher composite = better.
| model | use case | test | composite | tok/s |
|---|---|---|---|---|
| qwen2.5:14b | chunking | chunk_technical_doc | 94.0 | 2.6 |
| qwen2.5:14b | chunking | chunk_technical_doc | 94.0 | 3.1 |
| qwen2.5:14b | chunking | chunk_mixed_content | 92.0 | 2.7 |
| qwen2.5:14b | chunking | chunk_mixed_content | 92.0 | 2.9 |
| qwen2.5:14b | chunking | chunk_code_narrative | 94.0 | 3.0 |
| qwen2.5:14b | chunking | chunk_code_narrative | 94.0 | 2.8 |
| qwen2.5:14b | chunking | chunk_short_text | 97.0 | 3.5 |
| qwen2.5:14b | chunking | chunk_short_text | 97.0 | 3.4 |
| qwen2.5:14b | search_query | sq_temporal_filter | 84.0 | 3.4 |
| qwen2.5:14b | search_query | sq_temporal_filter | 84.0 | 3.4 |
| qwen2.5:14b | search_query | sq_code_search | 100.0 | 3.4 |
| qwen2.5:14b | search_query | sq_code_search | 84.0 | 3.0 |
| qwen2.5:14b | search_query | sq_multi_source | 96.3 | 2.9 |
| qwen2.5:14b | search_query | sq_multi_source | 96.3 | 3.0 |
| qwen2.5:14b | search_query | sq_memory_recall | 100.0 | 3.1 |
| qwen2.5:14b | search_query | sq_memory_recall | 100.0 | 3.0 |
| qwen2.5:14b | search_query | sq_delta_search | 84.0 | 3.3 |
| qwen2.5:14b | search_query | sq_delta_search | 84.0 | 3.2 |
| qwen2.5:14b | context_synthesis | synth_architecture | 93.2 | 3.2 |
| qwen2.5:14b | context_synthesis | synth_architecture | 92.6 | 3.1 |
| qwen2.5:14b | context_synthesis | synth_dietary | 87.0 | 3.5 |
| qwen2.5:14b | context_synthesis | synth_dietary | 87.0 | 3.6 |
| qwen2.5:14b | context_synthesis | synth_conflicting | 67.0 | 3.5 |
| qwen2.5:14b | context_synthesis | synth_conflicting | 87.0 | 3.5 |
| qwen2.5:14b | memory_extraction | mem_dietary | 90.9 | 3.2 |
| qwen2.5:14b | memory_extraction | mem_dietary | 94.7 | 3.3 |
| qwen2.5:14b | memory_extraction | mem_incident | 100.0 | 3.3 |
| qwen2.5:14b | memory_extraction | mem_incident | 100.0 | 3.4 |
| qwen2.5:14b | memory_extraction | mem_preferences | 100.0 | 3.3 |
| llama3.1:8b | adapter_extraction | adapt_email | 100.0 | 4.7 |
| llama3.1:8b | adapter_extraction | adapt_email | 100.0 | 5.4 |
| llama3.1:8b | adapter_extraction | adapt_imessage | 100.0 | 5.4 |
| llama3.1:8b | adapter_extraction | adapt_imessage | 100.0 | 5.8 |
| llama3.1:8b | adapter_extraction | adapt_code_file | 58.0 | 5.2 |
| llama3.1:8b | adapter_extraction | adapt_code_file | 58.0 | 6.4 |
| llama3.1:8b | adapter_extraction | adapt_voice_memo | 100.0 | 6.2 |
| llama3.1:8b | adapter_extraction | adapt_voice_memo | 100.0 | 6.4 |
| llama3.1:8b | classification | cls_email | 100.0 | 6.4 |
| llama3.1:8b | classification | cls_email | 100.0 | 6.8 |
| llama3.1:8b | classification | cls_imessage | 100.0 | 6.5 |
showing 40 of 120
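
Most tests appear twice in the table, and a few repeats disagree (for example `sq_code_search` and `synth_conflicting`). A minimal sketch of how one might summarize repeated runs per test follows; the row subset is copied from the table above, while the function name and the 10-point spread threshold are hypothetical choices for illustration, not part of the eval suite.

```python
from collections import defaultdict

# (use case, test, composite) rows copied from the table above (subset only)
rows = [
    ("search_query", "sq_code_search", 100.0),
    ("search_query", "sq_code_search", 84.0),
    ("context_synthesis", "synth_conflicting", 67.0),
    ("context_synthesis", "synth_conflicting", 87.0),
    ("chunking", "chunk_short_text", 97.0),
    ("chunking", "chunk_short_text", 97.0),
]


def summarize(rows):
    """Group composite scores by test; return mean and max-min spread."""
    by_test = defaultdict(list)
    for _use_case, test, composite in rows:
        by_test[test].append(composite)
    return {
        test: {"mean": sum(scores) / len(scores), "spread": max(scores) - min(scores)}
        for test, scores in by_test.items()
    }


summary = summarize(rows)

# Hypothetical threshold: flag tests whose repeated runs differ by 10+ points
unstable = [test for test, stats in summary.items() if stats["spread"] >= 10.0]
print(unstable)  # ['sq_code_search', 'synth_conflicting']
```

A large spread between repeats suggests the composite for that test is sensitive to sampling, so a single score should be read with some caution.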