
Local cascade. Server telemetry. One loop.

Every retrieval call runs on your machine. Anonymized usage events stream to a Cloudflare Worker at api.windags.ai. Server-side aggregates feed back into ranking, so every install gets sharper as the network sees more tasks. The honest status of each piece is below.

The picture in one diagram

The MCP server ships with the skill catalog, the BM25 index, the Tool2Vec embeddings, the cross-encoder weights, and an attribution DB. The retrieval cascade runs there — no API key, no network round-trip on the hot path.

The Cloudflare Worker is a one-way street for writes (telemetry events) and a read-only API for aggregates (popular skills, gap clusters, per-skill health). Aggregates are what make the flywheel turn across installs.

[Diagram: editor/agent → local MCP server → Cloudflare Worker → D1]
  Editor / agent: Claude Code, Codex, Cursor, Cline. The /next-move slash skill runs a 5-stage meta-DAG, one MCP call per node; no API key needed.
  Local MCP server (windags-skills plugin): catalog, BM25, embeddings, cascade. Tools: windags_skill_search, windags_skill_graft / _graft_batch, windags_skill_reference, windags_history / _validate_dag, windags_estimate_cost / _node_requirements. An implicit telemetry hook fires and forgets after each tool call.
  Cloudflare Worker (api.windags.ai): writes via POST /v1/telemetry and POST /v1/events; reads via GET /v1/popular, GET /v1/gaps, GET /v1/skills/:id/health.
  D1 (SQLite) database windags-telemetry: events, installs, gaps, health.
  Arrows: MCP call and skill body (editor ↔ MCP); anonymous fire-and-forget telemetry (MCP → Worker); popularity / gap bias back to the MCP (cross-user feedback: live).

Solid arrows = synchronous request/response. Dashed grey = anonymous fire-and-forget telemetry, never blocks the user. Dashed blue = aggregates read-back (the cross-user feedback path — fetched by every MCP at startup and blended into cascade Stage 6).

Verify it yourself

The Worker is public-read. You can hit it right now:

# health check
$ curl https://api.windags.ai/v1/health
{"status":"ok","service":"windags-api",...}

# most-grafted skills in the last 7 days
$ curl "https://api.windags.ai/v1/popular?window=7d&limit=5"

# task clusters that scored below 0.4 — i.e. skills the catalog is missing
$ curl "https://api.windags.ai/v1/gaps?min_requests=3"

# aggregated health for one skill
$ curl https://api.windags.ai/v1/skills/api-architect/health

And you can simulate a telemetry event the way the MCP does:

$ curl -X POST https://api.windags.ai/v1/telemetry \
    -H "Content-Type: application/json" \
    -d '{
      "event_type":"graft",
      "installation_id":"my-test-id",
      "mcp_version":"2.3.1",
      "manifest_hash":"deadbeef",
      "runtime":"claude-code",
      "task_hash":"abc123",
      "selected_skills":["api-architect"],
      "top_score":0.82
    }'
{"ok":true}

Within a few seconds, that row shows up in /v1/popular and counts toward /v1/skills/api-architect/health.
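The MCP sends events like this one without waiting on the response, so a slow or unreachable Worker never blocks a graft. The following is a minimal Python sketch of that fire-and-forget pattern. It assumes a plain HTTPS POST issued from a daemon thread. The function names and the 2-second timeout are hypothetical and do not come from the plugin's actual code.

```python
import json
import threading
import urllib.request

TELEMETRY_URL = "https://api.windags.ai/v1/telemetry"


def post_json(url: str, payload: dict, timeout: float = 2.0) -> None:
    """Blocking JSON POST; only ever called from the background thread."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()


def fire_and_forget(payload: dict, post=post_json) -> threading.Thread:
    """Send telemetry on a daemon thread and return immediately (hypothetical helper)."""

    def run() -> None:
        try:
            post(TELEMETRY_URL, payload)
        except Exception:
            pass  # telemetry errors are dropped, never surfaced to the user

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

The returned thread exists only so tests can join it; callers ignore it. Because the thread is a daemon, an event still in flight when the process exits is simply lost. That is an acceptable trade for telemetry that must never block.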

What's live, what's planned

Piece | Where | Status
--- | --- | ---
Stage 1: BM25 (stemmed + bigrams) | Local MCP, always on | live
Stage 2: Tool2Vec bi-encoder | Local MCP, when embeddings load | live
Stage 3: RRF fusion (k=60) | Local MCP, when Stage 2 ran | live
Stage 4: Cross-encoder re-rank (MS-MARCO MiniLM) | Local MCP, when ML deps loaded | live
Stage 5: Attribution k-NN over local outcomes | Local MCP, reads .windags/triples/ | live
Telemetry write: POST /v1/telemetry | MCP → Worker → D1, fire-and-forget | live
Aggregate reads: /v1/popular, /v1/gaps, /v1/skills/:id/health | Worker → D1, public | live
Version & manifest check: GET /v1/version | MCP checks at startup | live
Cross-user feedback into ranking: MCP reads global priors (3-tier: manifest ⊆ exact ⊆ any) and blends them as Stage 6 | Worker: /v1/priors/batch + /v1/priors/freshness; client: GlobalPriorsClient with adaptive refresh | live
Gap-driven skill authoring: auto-PR from /v1/gaps clusters | Worker → GitHub Actions | planned

The loop is closed and live. The cascade runs locally, telemetry flows to the Worker, and the Worker aggregates it. POST /v1/priors/batch serves 3-tier priors (manifest_match ⊆ exact_match ⊆ any_match), which the MCP blends into ranking as cascade Stage 6.

At startup the MCP attaches a GlobalPriorsClient and warms its cache once, in chunks of 500 skills per call. After that it polls /v1/priors/freshness every five minutes. A skill's prior is re-fetched mid-session only when its new event count would measurably shift the mean. That requires Δn ≥ 10 absolute AND Δn / n_cached ≥ 25% relative, and refreshes are rate-limited to one per hour. Every install learns from every other.
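The mid-session refresh rule can be written as a pure predicate. This is a minimal Python sketch, not the GlobalPriorsClient source. Several parts of it are assumptions:

- the CachedPrior record;
- the idea that /v1/priors/freshness reports a current event count per skill;
- reading the hourly rate limit as per skill;
- how a prior cached with zero events is handled.

```python
from dataclasses import dataclass

MIN_ABS_DELTA = 10      # Δn ≥ 10 events
MIN_REL_DELTA = 0.25    # Δn / n_cached ≥ 25%
MIN_INTERVAL_S = 3600   # at most one refresh per hour (assumed per skill)
BATCH_SIZE = 500        # startup warm-up chunk for /v1/priors/batch


@dataclass
class CachedPrior:
    n_cached: int         # event count when this prior was fetched
    last_refresh: float   # unix seconds of the last fetch


def should_refresh(entry: CachedPrior, n_server: int, now: float) -> bool:
    """True only if new events would measurably shift the cached mean."""
    delta = n_server - entry.n_cached
    if delta < MIN_ABS_DELTA:
        return False
    # Assumption: a prior cached with zero events counts as an unbounded relative change.
    relative = delta / entry.n_cached if entry.n_cached > 0 else float("inf")
    if relative < MIN_REL_DELTA:
        return False
    return now - entry.last_refresh >= MIN_INTERVAL_S


def chunks(skill_ids: list, size: int = BATCH_SIZE):
    """Yield successive slices of at most `size` IDs for batch warm-up."""
    for start in range(0, len(skill_ids), size):
        yield skill_ids[start:start + size]
```

The two thresholds cover each other's blind spots. The absolute floor ignores noisy small counts, where going from 2 to 3 events is already a 50% jump. The relative floor ignores popular skills, where 10 extra events barely move a mean built on thousands.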

The retrieval cascade (all local)

Each stage runs on your machine against the skill catalog that ships with the plugin. Each stage gracefully degrades — BM25 alone returns a coherent answer; later stages refine it as data and compute become available.

Query → ranked skill list, through five local stages:
  Stage 1 · BM25: stemmed + bigrams, lexical baseline. Always on.
  Stage 2 · Tool2Vec: task embedding, cosine vs. skill vectors. When embeddings are present.
  Stage 3 · RRF fusion: reciprocal rank, k = 60. When Stage 2 ran.
  Stage 4 · Cross-encoder: MS-MARCO MiniLM re-ranks the top 20. When ML deps are loaded.
  Stage 5 · Attribution k-NN: blends outcomes of similar past tasks. When the DB has data.
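Stage 3 merges the BM25 and Tool2Vec rankings with Reciprocal Rank Fusion. RRF scores each skill by summing 1 / (k + rank) over every list the skill appears in. The following is a minimal Python sketch using k = 60 and 1-based ranks. The function name and the alphabetical tie-break are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(skill) = sum of 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, skill_id in enumerate(ranking, start=1):
            scores[skill_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by skill ID (assumed).
    return sorted(scores, key=lambda s: (-scores[s], s))


bm25 = ["a", "b", "c"]
tool2vec = ["b", "c", "a"]
fused = rrf_fuse([bm25, tool2vec])
top_20 = fused[:20]  # the slice Stage 4 would re-rank
```

RRF uses only ranks, so it needs no normalization between a lexical score and a cosine similarity. With a single list it returns that list unchanged, which matches the graceful-degradation story.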

Why local? The skill bodies, the BM25 index, the embeddings, and the cross-encoder weights all ship with the plugin. Keeping the cascade local means no API key, no network on the hot path, no per-call cost, and no provider can see the prompts you're routing through it. Stage 5's attribution k-NN reads from ~/.windags/triples/ — also local.
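One way to picture Stage 5 is as a similarity-weighted average of past outcomes. The following is a minimal Python sketch. It assumes each local triple is (task embedding, skill ID, outcome in [0, 1]); the real layout of ~/.windags/triples/ is not documented here. The default k = 5 is hypothetical.

```python
import math
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_outcome_scores(query_vec, triples, k: int = 5) -> dict[str, float]:
    """Average outcomes of the k most similar past tasks, weighted by similarity."""
    neighbours = sorted(
        ((cosine(query_vec, vec), skill, outcome) for vec, skill, outcome in triples),
        key=lambda item: item[0],
        reverse=True,
    )[:k]
    num: dict[str, float] = defaultdict(float)
    den: dict[str, float] = defaultdict(float)
    for sim, skill, outcome in neighbours:
        sim = max(sim, 0.0)  # dissimilar tasks contribute nothing
        num[skill] += sim * outcome
        den[skill] += sim
    return {skill: num[skill] / den[skill] for skill in num if den[skill] > 0}
```

Skills with no similar past task receive no score, so the stage only nudges candidates that local history has actually seen.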

The flywheel

Every graft, reference load, and feedback event the MCP fires becomes a row in the Worker's D1 table. SQL aggregates surface popularity, gap clusters, and per-skill health. The MCP fetches those aggregates at startup via /v1/priors/batch and blends them into cascade Stage 6. The priors come in three nested tiers (manifest ⊆ exact ⊆ any). They are decomposed into exclusive subsets so no event is counted twice, and the blend weight is capped at 0.2. A background freshness probe keeps trending skills current without flooding the API with refetches.
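The nested tiers can be split into exclusive slices: manifest, exact-but-not-manifest, and any-but-not-exact. That way each event contributes once, with a weight reflecting how closely it matched. The following is a minimal Python sketch. The tier weights, the confidence constant, the per-event outcome score, and the shape of the blend weight are all hypothetical. Only the 0.2 cap comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    n: int        # events in this tier
    total: float  # sum of per-event outcome scores (hypothetical, e.g. acceptance)


TIER_WEIGHTS = (1.0, 0.5, 0.25)  # hypothetical trust per exclusive slice
MAX_BLEND = 0.2                  # cap from the text
CONFIDENCE_K = 20.0              # hypothetical: effective events for half the cap


def exclusive_subsets(manifest: Tier, exact: Tier, any_: Tier) -> list[Tier]:
    """Nested tiers (manifest ⊆ exact ⊆ any) → disjoint slices, by subtraction."""
    return [
        manifest,
        Tier(exact.n - manifest.n, exact.total - manifest.total),
        Tier(any_.n - exact.n, any_.total - exact.total),
    ]


def global_prior(manifest: Tier, exact: Tier, any_: Tier):
    """Weighted mean outcome over the slices, plus the effective event count."""
    slices = exclusive_subsets(manifest, exact, any_)
    n_eff = sum(w * s.n for w, s in zip(TIER_WEIGHTS, slices))
    if n_eff == 0:
        return None, 0.0
    mean = sum(w * s.total for w, s in zip(TIER_WEIGHTS, slices)) / n_eff
    return mean, n_eff


def blend(local_score: float, manifest: Tier, exact: Tier, any_: Tier) -> float:
    """Stage 6: pull the local score toward the global prior, weight always < 0.2."""
    mean, n_eff = global_prior(manifest, exact, any_)
    if mean is None:
        return local_score
    w = MAX_BLEND * n_eff / (n_eff + CONFIDENCE_K)
    return (1 - w) * local_score + w * mean
```

Take tiers of 4, 10, and 30 events. The slices are 4, 6, and 20 events, giving 12 effective events. The weight is then 0.2 × 12 / 32 = 0.075, well under the cap, and it approaches 0.2 only as evidence accumulates.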

Flywheel: 1 · Use (graft locally) → 2 · Observe (events to D1) → 3 · Learn (aggregate + gap detect) → 4 · Apply (priors → Stage 6 blend) → 5 · Improve (sharper recs next time) → next user, next task, sharper graft.

1 · Use

Agent calls graft. Cascade returns primary skill bodies + reference manifests, all from disk.

2 · Observe

MCP fires anonymized event to /v1/telemetry. Worker inserts into telemetry_events.

3 · Learn

SQL aggregates: graft counts, top scores, gap clusters when top_score < 0.4, reference load rates.
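At its core, gap detection is a filter followed by a group-by. The following is a minimal Python sketch over in-memory events. It assumes clusters are exact task_hash matches, since anonymous mode only sends the hash; full mode's raw text would allow fuzzier clustering. It mirrors the min_requests parameter of /v1/gaps. The function name is hypothetical.

```python
from collections import Counter

GAP_THRESHOLD = 0.4  # from the text: grafts scoring below 0.4 signal a gap


def gap_clusters(events: list[dict], min_requests: int = 3) -> list[tuple[str, int]]:
    """Count low-scoring grafts per task_hash; keep clusters seen min_requests+ times."""
    counts = Counter(
        e["task_hash"]
        for e in events
        # Assumption: an event without top_score is not treated as a gap.
        if e.get("event_type") == "graft" and e.get("top_score", 1.0) < GAP_THRESHOLD
    )
    return sorted(
        ((task_hash, n) for task_hash, n in counts.items() if n >= min_requests),
        key=lambda item: (-item[1], item[0]),
    )
```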

4 · Apply

MCP pulls /v1/priors/batch at startup; blends 3-tier priors as cascade Stage 6 (capped 0.2). A background freshness probe refreshes only when Δ would measurably shift the mean.

5 · Improve

Skills with high acceptance climb across every install; stale ones demote; gaps surface to skill authors via /v1/gaps.

Privacy

Telemetry is opt-out via WINDAGS_TELEMETRY=off. The default is anonymous mode, which sends a task hash, never the task text. The skill catalog, your prompts, your repo, your CLAUDE.md, and the actual graft response body never leave your machine.

off · Disabled

  • No HTTP calls to api.windags.ai.
  • Plugin still works — 100% functional.
  • No flywheel contribution from this install.

anonymous · Default

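  • Sent: installation UUID, task hash, skill IDs, rating, top_score, plus the event metadata shown in the example event above (event type, MCP version, manifest hash, runtime).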
  • Never sent: raw task text, skill bodies, repo paths, CLAUDE.md, conversation summary.

full · Opt-in

  • Adds: raw task text (helps cluster similar low-score queries into gaps).
  • Still never sent: skill bodies, repo paths, CLAUDE.md, file diffs.
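An allowlist keeps the modes honest: any field not named is never sent, which covers skill bodies and repo paths. The following is a minimal Python sketch. The field names come from the example /v1/telemetry event, with one exception: task_text is a hypothetical name for the raw task text in full mode. The value WINDAGS_TELEMETRY=full is also an assumed spelling, because only =off is documented.

```python
import os

# Field names follow the example /v1/telemetry event.
ANON_FIELDS = (
    "event_type", "installation_id", "mcp_version", "manifest_hash",
    "runtime", "task_hash", "selected_skills", "rating", "top_score",
)
FULL_EXTRA = ("task_text",)  # hypothetical name for the raw task text


def telemetry_mode() -> str:
    """Read the mode; 'anonymous' is the default, 'off' disables telemetry."""
    return os.environ.get("WINDAGS_TELEMETRY", "anonymous").strip().lower()


def build_event(record: dict, mode: str):
    """Return the payload allowed for `mode`, or None when telemetry is off."""
    if mode == "off":
        return None
    allowed = ANON_FIELDS + (FULL_EXTRA if mode == "full" else ())
    # Allowlist, not blocklist: skill bodies, repo paths, CLAUDE.md, diffs are never copied.
    return {key: record[key] for key in allowed if key in record}
```

An unrecognized mode falls back to the anonymous allowlist rather than sending more.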

D1 schema, abridged

Table | What's in it
--- | ---
telemetry_events | Append-only event log. event_type ∈ {graft, reference, search, catalog}. Indexed on type + installation + timestamp.
tool_call_events | Lighter-weight per-tool-call stream, sampled to once per 24h per machine. Drives the internal usage dashboard.
installations | One row per anonymous installation UUID: first/last seen, runtime, MCP version, event count.
skill_gaps | Auto-populated when a graft scores below 0.4. The same task pattern recurring signals a "skill that should exist."

Aggregates like /v1/popular are computed on the fly with json_each(selected_skills) over telemetry_events — no precomputed rollup tables today, which keeps the schema small and the truth in one place.
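The following is a minimal sketch of the /v1/popular aggregate, run with Python's sqlite3 module because D1 is SQLite. The created_at column, the demo skill ID db-designer, and the exact query are assumptions. Only telemetry_events, event_type, selected_skills, and json_each come from the text.

```python
import sqlite3

# Demo schema: created_at is a hypothetical column; selected_skills holds JSON text.
DEMO_SCHEMA = """
CREATE TABLE telemetry_events (
    event_type TEXT NOT NULL,
    selected_skills TEXT NOT NULL,  -- JSON array of skill IDs
    created_at INTEGER NOT NULL     -- unix seconds
)
"""

POPULAR_SQL = """
SELECT je.value AS skill_id, COUNT(*) AS grafts
FROM telemetry_events AS te, json_each(te.selected_skills) AS je
WHERE te.event_type = 'graft' AND te.created_at >= ?
GROUP BY je.value
ORDER BY grafts DESC, skill_id
LIMIT ?
"""


def popular_skills(conn: sqlite3.Connection, since: int, limit: int = 5):
    """Expand each event's skill array with json_each and count grafts per skill."""
    return conn.execute(POPULAR_SQL, (since, limit)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DEMO_SCHEMA)
    conn.executemany(
        "INSERT INTO telemetry_events VALUES (?, ?, ?)",
        [
            ("graft", '["api-architect", "db-designer"]', 100),
            ("graft", '["api-architect"]', 200),
            ("search", '["api-architect"]', 200),
        ],
    )
    print(popular_skills(conn, since=50))  # → [('api-architect', 2), ('db-designer', 1)]
```

The query joins each event row to one row per skill ID in its JSON array. That is why no rollup table is needed: the count is derived directly from the append-only log.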