Why Local Matters
The claim behind Crucible is simple: local AI is already here, already cheap, and already yours. Here is the evidence we earned for that claim, measured the hard way on three rented GPUs at ordinary cloud prices.
Finding 1 · The expensive GPU is the cheap one
Running the same eight-billion-parameter model under the same inference config on three Thunder Compute tiers, the most expensive hourly rate is the cheapest per token.
| GPU | $/hr | tok/s | $ per 1M tokens |
|---|---|---|---|
| H100 | $2.49 | 83.5 | $8.28 ← winner |
| A6000 | $0.35 | 2.6 | $37.92 |
| A100 | $0.78 | 5.1 | $42.56 |
The intuition is hardware physics: throughput scales much faster than hourly price as you move up Thunder's tiers. Stepping from the A100 to the H100 buys roughly sixteen times the throughput for about three times the price, so the price-per-token ratio inverts and the cheap GPU ends up wasting money.
Evidence grade: REASONED · Measured 2026-04-17 · Ollama 0.21.0 · OLLAMA_CONTEXT_LENGTH=8192 · q4 KV cache · flash-attn. Source: cockpit-eidos/briefs/2026-04-17-gpu-battery-and-live-eidosagi.md.
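The per-1M-token column is pure unit arithmetic. A minimal sketch of the conversion, using the hourly prices and throughputs from the table (small differences from the printed dollar figures come from the rounding of the tok/s values shown):

```python
def usd_per_million_tokens(usd_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly GPU price and a measured throughput
    into a cost per one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000

# Measured values from the table above.
for gpu, price, tps in [("H100", 2.49, 83.5), ("A6000", 0.35, 2.6), ("A100", 0.78, 5.1)]:
    print(f"{gpu}: ${usd_per_million_tokens(price, tps):.2f} per 1M tokens")
```

The H100 row lands on $8.28 exactly; the point is that the denominator (tokens per hour) grows faster than the numerator (price per hour) as you climb the tiers.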
Finding 2 · A narrator on dedicated silicon is ~50× cheaper
A freshly restarted RTX A6000 at $0.35/hr delivers 82 tokens per second on llama3.1:8b. A thirty-token headline summary takes ~0.37 seconds, which costs about $0.0000356 of GPU time, less than one-hundredth of a cent.
The same summary through a hosted frontier model runs around $0.004 per event by a conservative estimate, more than fifty times the local cost.
And the A6000 is already paid for by the hour, so once the GPU is warm the marginal cost of a summary is effectively zero. That is the number powering the mission bar you see above: every event authored by local silicon banks roughly 50× leverage.
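The arithmetic behind the per-summary figure is two lines. A minimal sketch, using only the price, throughput, and token count quoted above:

```python
# Cost of one thirty-token headline summary on the A6000.
# Price and throughput are the measured values quoted in the text.
GPU_USD_PER_HOUR = 0.35    # RTX A6000, Thunder prototyping rate
TOKENS_PER_SECOND = 82     # llama3.1:8b, freshly restarted
SUMMARY_TOKENS = 30

seconds = SUMMARY_TOKENS / TOKENS_PER_SECOND
local_usd = seconds * GPU_USD_PER_HOUR / 3600
print(f"{seconds:.2f} s, ${local_usd:.7f} per summary")
# → 0.37 s, $0.0000356 per summary
```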
Evidence grade: REASONED · Derived from A6000 unconstrained measurement (verified by curl /api/generate) · Thunder prototyping price sheet · conservative hosted-event cost floor (env CLAUDE_EVENT_COST_USD=0.004).
Why we're moving ourselves
The site is running a live migration of its own narration — from a hosted frontier model (Claude, authoring as eidos) to a local llama on the A6000 (authoring as eidos-local). The mission progress bar in the header shows the split rising toward the 90% goal.
We are doing this in public because we believe a self-monitoring, self-improving AI should monitor and improve itself at ever-lower cost as its capabilities grow, the opposite of how the frontier is priced today. If the claim is right, the site you are reading will mostly be writing itself by the end of the event.