Agents & MCP · the remediation loop

Watchdog finds it. Your agent fixes it. The next scan proves it.

Watchdog is the independent codebase-assurance surveyor — it measures, it never fixes. It's read-only by doctrine, so instead of touching your code it serves every finding to your own coding agent over a Model Context Protocol server: prioritised by impact, briefed with the rule that fired and the exact file and line, and verified by the next scan. The hand on your code is always yours.

Works with any MCP client — Claude Code, GitHub Copilot, Cursor, opencode. C#/.NET · a measurement, not an opinion.

The closed loop

Measure → fix → prove, on a loop.

Watchdog scans on your cadence and turns the findings into a ranked, briefed task list. Your agent works the highest-leverage item in your own repo and records a provisional fix. The next scan re-measures and decides: the finding is credited as fixed only when it genuinely stops firing. Nothing the agent asserts ever moves the score.

Watchdog scans

On your calendar cadence — weekly, per sprint, monthly or quarterly — plus the daily security watch.

Prioritised tasks, over MCP

Every finding becomes a briefed task ranked by impact ÷ effort — the rule that fired, the file and line, the estimated point-gain.

Your agent fixes, in your own repo

Claude Code, Copilot, Cursor, opencode — the agent works the fix and records a provisional resolve. Watchdog never touches your code.

The re-scan verifies

The fingerprint delta on the next scan is what marks a finding fixed — never the agent's say-so. The loop repeats.

Read-only throughout — the next scan is the arbiter.

  1. Watchdog scans: on your calendar cadence (weekly, per sprint, monthly, quarterly)
  2. Prioritised tasks: served over MCP, ranked by impact ÷ effort, briefed to file:line
  3. Your agent fixes: in your own repo, recording a provisional resolve
  4. The re-scan verifies: the fingerprint delta marks it fixed, never the agent's say-so

The loop repeats: the score only moves when the re-scan proves it.

Built for an agent to act on

A ranked, briefed task list — not a wall of findings.

Point an agent at the repo and it starts with audit() — repo health, every lens score, and a remediation plan ranked by impact ÷ effort. next_task() hands back the single highest-leverage item, fully briefed in one object; get_task("D34") pulls the full packet for one dimension. No round-trips, no guessing.

One call, everything needed

Each task packet carries what the dimension measures, the current score, the estimated point-gain and effort, the remediation brief, and every open instance with its fingerprint and file:line.
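
As a rough illustration of what such a packet might look like once an agent has parsed it, here is a minimal sketch. The field names, the file path and the fingerprint value are assumptions made for this example; the actual schema returned by get_task is not shown on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FindingInstance:
    fingerprint: str  # stable identity of the finding (value below is made up)
    file: str
    line: int


@dataclass
class TaskPacket:
    # All field names are hypothetical; they mirror the prose, not a published schema.
    dimension: str         # e.g. "D34"
    measures: str          # what the dimension measures
    current_score: float
    estimated_gain: float  # estimated point-gain
    effort: float          # estimated effort (unit assumed)
    brief: str             # the remediation brief
    instances: list[FindingInstance] = field(default_factory=list)


packet = TaskPacket(
    dimension="D34",
    measures="(what the dimension measures)",
    current_score=6.0,
    estimated_gain=1.8,
    effort=3.0,
    brief="(remediation brief)",
    instances=[FindingInstance("fp-001", "src/Billing/Tax.cs", 42)],
)

# Every open instance already carries its location, so the agent never guesses one.
locations = [f"{i.file}:{i.line}" for i in packet.instances]
```

The point of the shape is that one object is enough to start work: the score, the expected gain, the brief and every location travel together.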

Ranked by leverage, not noise

The plan ranks whole dimensions by impact ÷ effort — "lift D34 from 6 to 8 for +1.8" — so the agent works what moves the grade, instead of clearing a hundred Info-level findings in an area that's already strong.
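
The ranking itself can be pictured as a plain sort on gain divided by effort. This is a sketch under assumptions: the PlanItem type, the effort unit and the sample numbers are invented for illustration, and the real plan may weigh more than these two inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanItem:
    dimension: str
    gain: float    # estimated point-gain
    effort: float  # estimated effort, in an assumed unit

    def __post_init__(self):
        if self.effort <= 0:
            raise ValueError("effort must be positive")

    @property
    def leverage(self) -> float:
        # impact ÷ effort, as described above
        return self.gain / self.effort


def rank_plan(items):
    """Highest-leverage dimension first."""
    return sorted(items, key=lambda item: item.leverage, reverse=True)


plan = rank_plan([
    PlanItem("D10", gain=0.6, effort=1.0),
    PlanItem("D34", gain=1.8, effort=2.0),
    PlanItem("D17", gain=0.2, effort=4.0),
])
ranked = [item.dimension for item in plan]
```

With these made-up numbers D34 comes first: its gain is the largest and it is not expensive enough to lose that lead.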

No hallucinated locations

Every finding is content-addressed — a stable fingerprint plus exact file and line. The agent never invents a location, and it can diff two scans (or two SARIF runs) and get the same adds and removes.
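
Because identity is a fingerprint, diffing two scans reduces to set arithmetic. A minimal sketch, assuming each scan has already been reduced to a set of fingerprint strings (the values below are placeholders):

```python
def diff_scans(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) fingerprints between two scans."""
    added = current - previous
    removed = previous - current
    return added, removed


before = {"fp-a", "fp-b", "fp-c"}
after = {"fp-b", "fp-c", "fp-d"}
added, removed = diff_scans(before, after)
```

The same function applies to fingerprints read from two SARIF runs, since nothing in it depends on where the fingerprints came from.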

Honest absence

When a dimension can't be measured — coverage on a suite that won't run, an npm scan on a Python repo — it returns not-measured with the reason, never a phantom 0.
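
Why this matters is easiest to see in code: a missing measurement and a zero behave very differently once anything is averaged. The sketch below is illustrative only. The DimensionResult type, the dimension names and the averaging helper are assumptions, not Watchdog's rubric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DimensionResult:
    dimension: str
    score: Optional[float]        # None means "not measured", never 0
    reason: Optional[str] = None  # why the measurement was not possible

    @property
    def measured(self) -> bool:
        return self.score is not None


def mean_of_measured(results):
    """Average only what was measured (an illustrative aggregate, not the real scoring)."""
    scores = [r.score for r in results if r.measured]
    return sum(scores) / len(scores) if scores else None


results = [
    DimensionResult("tests", 8.0),
    DimensionResult("coverage", None, reason="test suite does not run"),
]
average = mean_of_measured(results)
```

Had the unmeasured dimension been recorded as 0, the same average would have dropped to 4.0 for a reason that has nothing to do with the code.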

Scores are pinned to a frozen rubric version. Findings ship as SARIF too — helpUri to the rule's intent, partialFingerprints for stable cross-tool identity, security findings tagged with their CWE.

The hollow code that compiles

What we catch that line-scanners pass.

The failure mode of fast, high-volume code isn't bad syntax — a linter catches that. It's code that *looks finished and does nothing*: stubs under confident names, tests that assert nothing, errors quietly swallowed. It type-checks, it reads as done, it sails past a line-level scanner. Watchdog measures the hollowness itself — by shape, deterministically.

| Signature | What it looks like | Why a line-scanner passes it |
| --- | --- | --- |
| Stubs that look implemented (IC1) | A CalculateTax that returns 0, an async that never awaits, skeleton types, dead branches | Valid C#, type-checks. We weight a lone stub differently from pervasive scaffolding. |
| Tests that assert nothing (D10) | Green tests with no assertions; skipped tests dressed up as coverage | Coverage counts the test file; it never asks whether the test actually *checks* a result. |
| Errors made invisible (X3) | Empty catch {}, a throw ex; rethrow that resets the stack trace | Type-based checks pass; the failure disappears. |
| Untracked debt & dead code (D17) | TODO/FIXME/HACK, blanket suppressions, commented-out code, unreferenced symbols | Counted raw at best. |
| Copy-paste, never parameterised (D4) | Near-duplicate blocks | Ours is type-aware and scored by density across the codebase. |

One signature, whoever wrote it

We don't guess whether a machine typed it: stylometric "AI detection" is a credibility trap, and we don't make the claim. We measure the hollowness by shape, so a rushed human and an eager model produce the *same* finding and the *same* fix. And it compounds: each signature feeds a per-file quality reading with diminishing returns and floors, which is the "one stub versus a hundred stubs" distinction a flat rule can't make.

Why it can't be papered over

The re-scan is a ratchet — the score is earned, not talked up.

Because the measurement is deterministic and the next scan re-runs it, a finding can't be dressed up or churned away — it either stops firing or it doesn't. This holds on every scheduled scan, with or without an agent.

Line-insensitive identity

A finding's identity is a hash of its dimension, title and file — never the line number. Reformat, rename a variable, move the block: the finding stays put. You can't churn your way out of it.
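
A minimal sketch of a line-insensitive fingerprint follows. The hash function (SHA-256), the field separator, and the sample title and path are assumptions; the text above only states which fields go in and that the line number stays out.

```python
import hashlib


def fingerprint(dimension: str, title: str, file: str) -> str:
    """Hash of dimension, title and file. The line number is deliberately not an input."""
    # A unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    payload = "\x1f".join((dimension, title, file)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The same finding reported at line 42, then at line 97 after the block moved:
# the line never enters the hash, so both reports share one identity.
at_line_42 = fingerprint("X3", "Empty catch block", "src/Orders/Import.cs")
at_line_97 = fingerprint("X3", "Empty catch block", "src/Orders/Import.cs")
other_file = fingerprint("X3", "Empty catch block", "src/Orders/Export.cs")
```

Reformatting or moving the block changes the line but none of the hashed fields, so the identity is unchanged.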

Provisional until proven

An agent's "resolved" is a claim, not a verdict. The next scan re-measures and credits it only when the fingerprint is genuinely gone — and reopens it if the rule still fires. No self-reported fixes.
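
The verification rule can be sketched in a few lines. The status labels and the function name are hypothetical; the logic is the one described above, where only the next scan's fingerprints decide the outcome.

```python
def verify_resolves(provisional: set[str], next_scan: set[str]) -> dict[str, str]:
    """Settle each provisional resolve against the fingerprints in the next scan."""
    return {
        fp: "reopened" if fp in next_scan else "fixed"
        for fp in provisional
    }


claimed_fixed = {"fp-a", "fp-b"}     # what the agent recorded as resolved
still_firing = {"fp-b", "fp-c"}      # what the next scan actually found
verdicts = verify_resolves(claimed_fixed, still_firing)
```

Note that the agent's claim appears only as the set of candidates to check. It never appears on the right-hand side of the decision.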

Deterministic, floored scoring

Repeats of the same problem decay — each costs half the last — and every category has a floor, so you can't flood a file with cosmetic findings to game it. Same commit in, same score out.
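
One way to read "each costs half the last" is a geometric series, clamped by a floor. The sketch below assumes exactly that: the starting score, base penalty and floor values are invented, and the real rubric may apply the floor differently.

```python
def repeat_penalty(base: float, count: int) -> float:
    """Total penalty for `count` repeats, each costing half the previous one."""
    return sum(base * 0.5 ** k for k in range(count))


def category_score(start: float, base: float, count: int, floor: float) -> float:
    """Subtract the decayed penalty, but never drop below the category floor."""
    return max(floor, start - repeat_penalty(base, count))


three_repeats = repeat_penalty(1.0, 3)        # 1 + 0.5 + 0.25
hundred_repeats = repeat_penalty(1.0, 100)    # approaches 2 x base, never exceeds it
flooded = category_score(10.0, 4.0, 100, floor=3.0)
```

Under this reading a hundred repeats cost barely more than a handful, and no amount of flooding pushes a category under its floor. Both functions are pure, which is what "same commit in, same score out" requires.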

This runs the same whether the code came from a coding assistant, a contractor, or a 2 a.m. hotfix. The deterministic re-scan is the ratchet under all of it.

The MCP surface

Twelve tools, four jobs — and a way to push back.

Everything runs over one HTTPS endpoint (/mcp), Bearer-token-scoped to a single repo. There is no write-to-repo tool — no commit, no push, no open-PR.
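
For readers wiring up their own client, here is a sketch of what a single tool call might look like on the wire. MCP messages are JSON-RPC 2.0 and tools are invoked with the tools/call method; the host name and token below are placeholders, and a real client also performs the MCP initialize handshake first, which is omitted here. The request is built but not sent.

```python
import json
import urllib.request


def build_tool_call(endpoint, token, tool, arguments=None, request_id=1):
    """Build (but do not send) an HTTPS request that invokes one MCP tool."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments or {}},
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # the per-repo scoped token
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )


# Placeholder endpoint and token; the real values are shown per repository.
request = build_tool_call("https://watchdog.example/mcp", "REPO_SCOPED_TOKEN", "next_task")
payload = json.loads(request.data)
```

In practice an MCP client library handles this framing. The sketch only shows that the whole surface is one endpoint, one token and a tool name.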

| Job | Tool | What it does |
| --- | --- | --- |
| Read & plan | audit | Repo health, lens scores, and a plan ranked by impact ÷ effort; start here |
| Read & plan | next_task | The single highest-leverage item, fully briefed |
| Read & plan | get_task | The full packet for one dimension (e.g. D34) |
| Read & plan | list_findings | Latest-scan findings: fingerprint, dimension, level, file:line |
| Read & plan | get_finding | One finding by fingerprint |
| Work the fix | claim_finding | A short lease (≈45 min) so two agents don't collide |
| Work the fix | release_finding | Hand a claimed finding back to the open queue |
| Work the fix | resolve_finding | Record a provisional fix; the next scan confirms by fingerprint delta |
| Push back | dispute_finding | Flag a scored finding as a false positive → human triage: Fixed or Declined |
| Push back | flag_advisory | Flag an advisory LLM-judged note as unhelpful; a separate channel that never touches the score |
| Push back | report_detector_gap | Report something we missed → improves the detector, never your score |
| Verify | request_rescan | A verify re-scan: opt-in per repo, off by default, counts toward your scan budget |

The agent proposes; the measurement disposes

No dispute, claim or resolve ever moves the score or erases a finding. A dispute goes to a human, and if it is declined, the finding carries a maintainer's note explaining why it stands, so the same one isn't re-litigated next scan. Disagreement is a first-class signal; it just isn't a back door to the number.

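
The claim lease listed for claim_finding can be pictured as a small expiring lock. This is a sketch of the idea only: the ClaimBoard class, the exact 45-minute duration and the renewal behaviour are assumptions, not a description of how the server implements it.

```python
from datetime import datetime, timedelta, timezone

LEASE = timedelta(minutes=45)  # the table says roughly 45 minutes; exact value assumed


class ClaimBoard:
    """Hypothetical in-memory model of short-lived claims on findings."""

    def __init__(self):
        self._claims = {}  # fingerprint -> (agent, expires_at)

    def claim(self, fingerprint, agent, now):
        holder = self._claims.get(fingerprint)
        if holder and holder[0] != agent and holder[1] > now:
            return False  # another agent holds a live lease
        self._claims[fingerprint] = (agent, now + LEASE)
        return True

    def release(self, fingerprint, agent):
        holder = self._claims.get(fingerprint)
        if holder and holder[0] == agent:
            del self._claims[fingerprint]


board = ClaimBoard()
t0 = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
first = board.claim("fp-a", "agent-1", t0)
collision = board.claim("fp-a", "agent-2", t0 + timedelta(minutes=10))
after_expiry = board.claim("fp-a", "agent-2", t0 + timedelta(minutes=46))
```

A lease that expires on its own means a crashed or abandoned agent cannot hold a finding hostage. Nothing in the claim touches the score.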
Open, not closed

The hand on your code is always yours.

Watchdog runs against a throwaway clone and exposes no way to write to your repo. That's deliberate: a measurer that also rewrites the thing it grades can't stay neutral, and you'd lose chain-of-custody on every change. Tools that auto-refactor inside their own engine make the opposite trade.

What Watchdog does

Serves prioritised, briefed findings over MCP; records a provisional resolve; re-measures on the next scan and proves what genuinely moved.

What it never does

Edit, commit, push, or open a PR; move the score on an agent's say-so; touch your working tree. The change — and the credit, and the chain of custody — is always yours.

Connect an agent

Each repository has an MCP endpoint and a scoped bearer token. Add them to your agent's MCP configuration (Claude Code, GitHub Copilot, Cursor, opencode, or your own) and the task list is live. The exact endpoint URL and the per-repo token are shown for each repository once it's surveyed. Verify-now re-scans are opt-in per repo (off by default) and count toward your scan budget.

Stop triaging findings by hand. Hand them to your agent.

Sign in with GitHub · no card · C#/.NET · the first full report is €0.