
Aftershock build log

The build journey of Aftershock — a disaster-response society of Qwen agents built for the Qwen Cloud Global AI Hackathon. Progress, architecture decisions, and what the numbers (including the negative results) actually said.

Qwen Cloud, Agent society, Honest benchmarks

Dispatches

05 LOGGED

LOG-005 2026.06.16 9 MIN

Outcome-neutral trap, −14% cost, Capability floor, Conformance

The fix that would have only fooled the scoreboard — and the one tuning that actually paid.

I used the harness to tune the society across four levers. Doctrine buys conformance at no cost in lives (resolving an old n=1 scare). The infra agent's worst rule is a model-capability floor, not a prompt bug, so it shipped as an opt-in mode rather than the default. And I nearly built a 'guard' whose only job was to game a metric the system already handles for free. The one unambiguous win: trimming the re-sent prompt cut cost by 14% and raised lives-per-dollar by 21%.


LOG-004 2026.06.15 8 MIN

Measure first, Negative result, False positive caught, Statistics

Build the ruler first. It killed our biggest feature — and a +16-life win that wasn't real.

I built the measurement harness before tuning the agents. It killed the backlog's biggest planned lever (a pathology that doesn't occur), caught a +16-life society-vs-swarm 'win' that collapsed to noise at 11 seeds, and forced an honest caveat on our own +28 headline. Three 'don'ts' worth more than a feature.
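
To make the "collapsed to noise at 11 seeds" point concrete, here is a minimal sketch of a paired, per-seed comparison. The lives counts are invented for illustration and are not the harness's data. The helper names and the two-standard-error cutoff are likewise assumptions, a rough rule of thumb and not the log's actual test.

```python
import math
import statistics

def paired_summary(society, swarm):
    """Mean per-seed difference (society - swarm) and its standard error."""
    diffs = [a - b for a, b in zip(society, swarm)]
    mean = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, sem

def looks_like_noise(mean, sem, z=2.0):
    """Rough check: is the mean difference within z standard errors of zero?"""
    return abs(mean) <= z * sem

# Hypothetical lives saved on 11 identical seeded worlds (illustrative only).
society = [60, 41, 55, 38, 52, 47, 44, 58, 36, 50, 49]
swarm = [44, 50, 48, 47, 45, 52, 49, 46, 51, 43, 55]

first_seed_gap = society[0] - swarm[0]  # a single seed can show a large gap
mean, sem = paired_summary(society, swarm)
print(first_seed_gap, mean, looks_like_noise(mean, sem))
```

In this made-up data the first seed alone shows a large gap, while the mean across all 11 seeds sits well inside the noise band, which is the shape of false positive the dispatch describes.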


LOG-003 2026.06.15 6 MIN

Observability, Caught in review, Frontend, Honesty

We drew the agent auction on the map. A review caught it pointing at the wrong district.

We rebuilt the observatory map into a Mission Control view that draws the agent society's resource auction live — contested districts linked, loser to winner. Then a code review caught it pointing each loss at the wrong winning district. Making the mechanism visible meant making the picture honest.


LOG-002 2026.06.14 6 MIN

Negative result, Ablation, Cost · latency, Multi-agent

We added native function calling. The benchmark told us to turn it off.

An honest ablation: native Qwen Cloud function calling, benchmarked on identical seeded worlds, cost ~2× for no lives benefit. Why per-call tool-schema overhead dominates in high-frequency multi-agent systems — and why JSON contracts stayed the default.
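
A back-of-envelope sketch of why per-call schema overhead can dominate: if every call re-sends the tool schemas, the extra input tokens grow linearly with the number of calls. The call count and token sizes below are hypothetical, chosen only to show the arithmetic, and the model counts input tokens alone. None of these figures are measurements from the benchmark.

```python
def input_tokens(calls: int, prompt_tokens: int, schema_tokens: int = 0) -> int:
    """Total input tokens when each call sends the prompt plus any tool schemas."""
    return calls * (prompt_tokens + schema_tokens)

# Hypothetical run: 600 agent calls, 400-token prompts, 400 tokens of tool schemas.
baseline = input_tokens(calls=600, prompt_tokens=400)
with_tools = input_tokens(calls=600, prompt_tokens=400, schema_tokens=400)

ratio = with_tools / baseline   # relative input cost with schemas attached
extra = with_tools - baseline   # absolute overhead, grows with call count
print(ratio, extra)
```

Under these assumed sizes, a schema payload as large as the prompt doubles input tokens on every call, so the absolute overhead scales with how often the agents talk.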


LOG-001 2026.06.12 9 MIN

Architecture, Benchmark, Foundations

When does a society of small Qwen models beat one big model? Building Aftershock.

The founding question and the architecture behind it — splitting a disaster response across six specialized agents that negotiate scarce resources, and the seeded-world harness that decides whether the society actually pays for itself.
