windags.ai / architecture

Local cascade. Server telemetry. One loop.

Every retrieval call runs on your machine. Anonymized usage events stream to a Cloudflare Worker at api.windags.ai. Server-side aggregates feed back into ranking so every install gets sharper as the network sees more tasks. Honest status of each piece is below.

The picture in one diagram

The MCP server ships with the skill catalog, the BM25 index, the Tool2Vec embeddings, the cross-encoder weights, and an attribution DB. The retrieval cascade runs there — no API key, no network round-trip on the hot path.

The Cloudflare Worker is a one-way street for writes (telemetry events) and a read-only API for aggregates (popular skills, gap clusters, per-skill health). Aggregates are what makes the flywheel turn across installs.

Solid arrows = synchronous request/response. Dashed grey = anonymous fire-and-forget telemetry, never blocks the user. Dashed blue = aggregates read-back (the cross-user feedback path — fetched by every MCP at startup and blended into cascade Stage 6).

Verify it yourself

The Worker is public-read. You can hit it right now:

# health check
$ curl https://api.windags.ai/v1/health
{"status":"ok","service":"windags-api",...}

# most-grafted skills in the last 7 days
$ curl "https://api.windags.ai/v1/popular?window=7d&limit=5"

# task clusters that scored below 0.4 — i.e. skills the catalog is missing
$ curl "https://api.windags.ai/v1/gaps?min_requests=3"

# aggregated health for one skill
$ curl https://api.windags.ai/v1/skills/api-architect/health

And you can simulate a telemetry event the way the MCP does:

$ curl -X POST https://api.windags.ai/v1/telemetry \
    -H "Content-Type: application/json" \
    -d '{
      "event_type":"graft",
      "installation_id":"my-test-id",
      "mcp_version":"2.3.1",
      "manifest_hash":"deadbeef",
      "runtime":"claude-code",
      "task_hash":"abc123",
      "selected_skills":["api-architect"],
      "top_score":0.82
    }'
{"ok":true}

That row is now visible in /v1/popular and contributing to /v1/skills/api-architect/health within a few seconds.

What's live, what's planned

Piece	Where	Status
Stage 1 — BM25 (stemmed + bigrams)	Local MCP, always on	live
Stage 2 — Tool2Vec bi-encoder	Local MCP, when embeddings load	live
Stage 3 — RRF fusion (k=60)	Local MCP, when Stage 2 ran	live
Stage 4 — Cross-encoder re-rank (MS-MARCO MiniLM)	Local MCP, when ML deps loaded	live
Stage 5 — Attribution k-NN over local outcomes	Local MCP, reads `.windags/triples/`	live
Telemetry write — `POST /v1/telemetry`	MCP → Worker → D1, fire-and-forget	live
Aggregate reads — `/v1/popular`, `/v1/gaps`, `/v1/skills/:id/health`	Worker → D1, public	live
Version & manifest check — `GET /v1/version`	MCP checks at startup	live
Cross-user feedback into ranking — MCP reads global priors (3-tier: manifest ⊆ exact ⊆ any) and blends as Stage 6	Worker: `/v1/priors/batch` + `/v1/priors/freshness`. Client: `GlobalPriorsClient` with adaptive refresh.	live
Gap-driven skill authoring — auto-PR from `/v1/gaps` clusters	Worker → GitHub Actions	planned

The loop is closed and live. The cascade runs locally, telemetry flows to the Worker, the Worker aggregates correctly, and POST /v1/priors/batch serves 3-tier priors (manifest_match ⊆ exact_match ⊆ any_match) back into ranking as cascade Stage 6. The MCP attaches a GlobalPriorsClient at startup, warms the cache once (chunked at 500 skills per call), then polls /v1/priors/freshness every five minutes. A skill re-fetches mid-session only when its new event count would measurably shift the mean (Δn ≥ 10 absolute AND Δn / n_cached ≥ 25% relative, rate-limited to one refresh per hour). Every install learns from every other.

The retrieval cascade (all local)

Each stage runs on your machine against the skill catalog that ships with the plugin. Each stage gracefully degrades — BM25 alone returns a coherent answer; later stages refine it as data and compute become available.

Why local? The skill bodies, the BM25 index, the embeddings, and the cross-encoder weights all ship with the plugin. Keeping the cascade local means no API key, no network on the hot path, no per-call cost, and no provider can see the prompts you're routing through it. Stage 5's attribution k-NN reads from ~/.windags/triples/ — also local.

The flywheel

Every graft, reference load, and feedback event the MCP fires becomes a row in the Worker's D1 table. SQL aggregates surface popularity, gap clusters, and per-skill health. The MCP fetches those aggregates at startup via /v1/priors/batch and blends them into cascade Stage 6 — three nested tiers (manifest ⊆ exact ⊆ any), decomposed into exclusive subsets, capped at a 0.2 blend weight. A background freshness probe keeps trending skills current without thundering the API.

1 · Use

Agent calls graft. Cascade returns primary skill bodies + reference manifests, all from disk.

2 · Observe

MCP fires anonymized event to /v1/telemetry. Worker inserts into telemetry_events.

3 · Learn

SQL aggregates: graft counts, top scores, gap clusters when top_score < 0.4, reference load rates.

4 · Apply

MCP pulls /v1/priors/batch at startup; blends 3-tier priors as cascade Stage 6 (capped 0.2). A background freshness probe refreshes only when Δ would measurably shift the mean.

5 · Improve

Skills with high acceptance climb across every install; stale ones demote; gaps surface to skill authors via /v1/gaps.

Privacy

Telemetry is opt-out via WINDAGS_TELEMETRY=off. Default is anonymous mode: task hash, not task text. The skill catalog, your prompts, your repo, your CLAUDE.md, and the actual graft response body never leave your machine.

off Disabled

No HTTP calls to api.windags.ai.
Plugin still works — 100% functional.
No flywheel contribution from this install.

anonymous Default

Sent: installation UUID, task hash, skill IDs, rating, top_score.
Never sent: raw task text, skill bodies, repo paths, CLAUDE.md, conversation summary.

full Opt-in

Adds: raw task text (helps cluster similar low-score queries into gaps).
Still never sent: skill bodies, repo paths, CLAUDE.md, file diffs.

D1 schema, abridged

Table	What's in it
telemetry_events	Append-only event log. `event_type ∈ {graft, reference, search, catalog}`. Indexed on type + installation + timestamp.
tool_call_events	Lighter-weight per-tool-call stream, sampled to once per 24h per machine. Drives the internal usage dashboard.
installations	One row per anonymous installation UUID. First/last seen, runtime, MCP version, event count.
skill_gaps	Auto-populated when a graft scores below 0.4. Same task pattern recurring → "skill that should exist."

Aggregates like /v1/popular are computed on the fly with json_each(selected_skills) over telemetry_events — no precomputed rollup tables today, which keeps the schema small and the truth in one place.