What we measure — the Codebase Assurance Index
The Codebase Assurance Index (CAI) is one reproducible 0–100 score for a whole C#/.NET codebase. It rolls up ten lenses — most dimensions measured by deterministic tools, a few given an advisory, tolerance-banded LLM read. A measurement, not an opinion.
Sign in with GitHub · no card · C#/.NET · the first full report is €0.
What makes the Codebase Assurance Index different.
Reproducible
Every scored dimension is computed by a deterministic tool reading your code. Same commit, same frozen rubric, same advisory data: exactly one score.
Depth is never gated
Every survey — including the free first report — computes the full CAI: all dimensions, all lenses. You never pay for depth; you pay for breadth and cadence.
Verifiable
Read the exact rule each dimension was scored by, and run the same measurement on your own code — the algorithm and rubric are open.
Verify a score yourself →
Graded by the open CAI standard, across ten lenses.
Five are always on; five light up with your architecture. Watchdog doesn't grade by house style — it measures against CAI, the Codebase Assurance Index: an open, reproducible 0–100 standard. The full algorithm, the worst-first fold, the firewall, and the four git-history-mined dimensions all live on the standard — open to read, cite, or recompute.
Code health
Complexity, duplication, code shape and naming — how maintainable the code itself is.
Architecture
Module boundaries, coupling, cohesion and dependency direction — whether structure holds up as the repo grows.
Maturity
Docs, ADRs, comments and process signals — how well the project explains and governs itself.
Readiness
Tests, CI gates, observability, resilience and rollback — readiness to run in production.
Security & Compliance
Secrets, dependency CVEs, SAST and licence/PII posture — the deep-scan security lens.
Domain Modelling
DDD tactical health — aggregates, value objects and the invariants your business rules depend on.
Event-Driven
Messaging and integration discipline — outbox, async handlers and contract coupling.
Event Sourcing
Event-store correctness — immutable events, deterministic folds and PII-in-events.
Accessibility
Text alternatives, labels, keyboard semantics, ARIA and a11y enforcement.
Performance
Benchmarks, allocation-aware APIs and async hygiene.
The full vocabulary (every dimension, its evaluator and rubric version) lives on the open standard. Browse the catalog →
Nothing moves the number but the code.
The deterministic score sits on one side of a firewall; an advisory LLM read sits on the other — and it can never cross. That's the difference between a measurement and asking an LLM, which answers differently every time.
The AI only ever advises
A few findings get an advisory, tolerance-banded LLM read that can never, by construction, move the headline number. It explains in plain English; it never scores. The measurement stays pure.
Your inputs never score
Your compliance declarations, a suppressed finding, your contract profile — they change what the artifact says, never the CAI. A declaration is presentation; the score is measurement. Neither party to a contract can tilt the number — only the code changing moves it (or a disclosed advisory refresh like a new CVE).
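A minimal sketch of that separation, not Watchdog's actual code. The names `Measurement`, `Presentation`, `score` and `render` are hypothetical, and a plain mean stands in for the real weighted roll-up. The point it illustrates is structural: the scoring function has no parameter through which a declaration or suppression could reach the number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Deterministic tool output: the only input the score may see (hypothetical shape)."""
    lens_scores: dict  # lens name -> score on the 0-100 scale


@dataclass(frozen=True)
class Presentation:
    """Declarations and suppressions: they shape the report text only (hypothetical shape)."""
    declarations: tuple = ()
    suppressed_findings: frozenset = frozenset()


def score(measurement):
    # Illustrative plain mean; the real CAI is a weighted roll-up under the frozen rubric.
    values = list(measurement.lens_scores.values())
    return sum(values) / len(values)


def render(measurement, presentation):
    # The report may mention declarations, but the number comes from score() alone.
    declared = ", ".join(presentation.declarations) or "none"
    hidden = len(presentation.suppressed_findings)
    return f"CAI {score(measurement):.1f} | declared: {declared} | suppressed: {hidden}"
```

Rendering the same measurement with different declarations changes the text of the artifact and leaves the leading number untouched.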
How the lenses roll up.
The CAI is a weighted roll-up of the lens scores under the frozen rubric — not an average you can't see inside.
Core always counts; conditional lenses only when they apply
The five core lenses always contribute. The conditional lenses contribute only when the code calls for them, and the weights re-normalise — so a repo is never penalised for a lens that doesn't apply.
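As a rough illustration of the re-normalisation described above, assuming hypothetical lens names and weights (the real weights live in the published rubric): a lens that does not apply is passed as `None` and dropped from both the numerator and the weight total, so it is not counted as a zero.

```python
def roll_up(lens_scores, weights):
    """Weighted mean over applicable lenses; None marks a lens that does not apply."""
    applicable = {name: s for name, s in lens_scores.items() if s is not None}
    total_weight = sum(weights[name] for name in applicable)
    if total_weight == 0:
        raise ValueError("no applicable lens to score")
    # Dividing by the weight of the applicable lenses only is the re-normalisation step.
    return sum(weights[name] * s for name, s in applicable.items()) / total_weight


# Hypothetical weights, for illustration only.
weights = {"code_health": 2.0, "architecture": 2.0, "event_sourcing": 1.0}

not_applicable = roll_up(
    {"code_health": 80.0, "architecture": 60.0, "event_sourcing": None}, weights
)  # 70.0: the missing lens is ignored

counted_as_zero = roll_up(
    {"code_health": 80.0, "architecture": 60.0, "event_sourcing": 0.0}, weights
)  # 56.0: what the score would be if the lens were wrongly treated as a zero
```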
A critical lens caps the headline
The roll-up can't read Strong while a lens reads Critical: a single critical-band lens caps the CAI, so the one number can't hide a serious failure in one dimension behind strong scores elsewhere.
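A sketch of how such a cap could work, under stated assumptions: the thresholds come from the fixed band cuts (25 / 50 / 70 / 90), and the cap value itself is a placeholder set just under the Strong threshold, since the exact value is defined by the spec rather than by this page. `cap_headline` is a hypothetical name.

```python
CRITICAL_BELOW = 25.0     # scores under 25 read Critical
STRONG_FROM = 70.0        # scores from 70 read Strong
CAP = STRONG_FROM - 1.0   # assumption: placeholder cap just below Strong


def cap_headline(rolled_up, lens_scores):
    """Keep the headline out of Strong while any lens sits in the Critical band."""
    has_critical_lens = any(s < CRITICAL_BELOW for s in lens_scores.values())
    return min(rolled_up, CAP) if has_critical_lens else rolled_up
```

With one lens at 20, a rolled-up 82 is reported as 69 here; without a Critical lens the rolled-up value passes through unchanged.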
Mined from git history, not just the code
Four behavioural dimensions — hotspots, bus factor, knowledge freshness, change coupling — are read deterministically from your git history and scored into the same CAI.
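To make one of these concrete, here is a minimal sketch of change coupling, assuming each commit has already been reduced to the list of files it touched (for example from `git log --name-only`). The function name and the `min_shared` threshold are hypothetical, and the standard's actual evaluator may differ. The same history always yields the same counts.

```python
from collections import Counter
from itertools import combinations


def change_coupling(commits, min_shared=2):
    """Count how often each pair of files changed together in one commit."""
    pair_counts = Counter()
    for files in commits:
        # Sorting makes each pair's order canonical, so (a, b) and (b, a) count once.
        for pair in combinations(sorted(set(files)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_shared}


history = [
    ["Order.cs", "OrderTests.cs"],
    ["Order.cs", "OrderTests.cs", "Invoice.cs"],
    ["Invoice.cs"],
]
coupled = change_coupling(history)
# {("Order.cs", "OrderTests.cs"): 2}
```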
So a contract floor of CAI ≥ 80 means every always-on lens is Strong or better with no lens Critical — decomposable, not opaque. The authoritative spec: cai.canine.dev/spec
One fixed scale, five bands — and a pin that never moves for anyone.
Every score renders on the same worst→best scale: Critical / Weak / Adequate / Strong / Exemplary, cut at 25 / 50 / 70 / 90. The pin marks the score's exact spot — position on the fixed scale *is* the reading, never a corpus-relative rank.
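The fixed scale can be sketched as a lookup against the four cut points. One assumption is labelled here: a score that lands exactly on a cut is placed in the higher band, which the text above does not specify.

```python
from bisect import bisect_right

CUTS = (25, 50, 70, 90)
BANDS = ("Critical", "Weak", "Adequate", "Strong", "Exemplary")


def band(score):
    """Map a 0-100 score to its band on the fixed scale."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # bisect_right counts how many cuts are <= score, which is the band index.
    return BANDS[bisect_right(CUTS, score)]
```

Because the cut points are constants, `band(80)` reads Strong for every repository, regardless of how any other codebase scores.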
A reading you can act on — not a thousand findings to triage.
Watchdog is calibrated against a corpus of real .NET codebases, so the idioms a line-level checker trips over (a repository that coheres through a base class, a test that asserts through a harness, an interface a façade is obliged to implement) don't read as defects. Zero setup, no rule-tuning weekend: false positives are calibrated out before you ever see them.
Tuned on real code
Every detector is tested against a public reference corpus to confirm that it fires on the real defect and stays quiet on the idiom. On that corpus the typical repository's findings are over 95% real, and reference clean-architecture codebases exceed 99%.
Disagree, and it learns
Any scored finding can be disputed in one click and is routed to human triage; a confirmed false positive becomes a detector test so it can't recur. The instrument sharpens with use; the score never bends to the dispute.
Quiet by design
Findings are ranked by what moves the grade and folded so one stray stub barely registers. A lens that can't be measured returns not-measured with the reason, rather than a phantom zero. Volume is never mistaken for rigour.
Calibration is an ongoing programme — idiom-heavy codebases still surface residual noise we keep tuning down, and every dispute feeds the next round.
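To illustrate the kind of idiom noise a raw metric produces, here is a minimal sketch of the LCOM4 cohesion metric in its simplest form: methods are linked when they touch a common field, and the score is the number of disconnected groups. Method-to-method calls, which full LCOM4 also treats as links, are left out for brevity. The `Order` field names are hypothetical.

```python
def lcom4(method_fields):
    """Number of connected components among methods linked by shared fields."""
    remaining = {name: set(fields) for name, fields in method_fields.items()}
    components = 0
    while remaining:
        # Start a new component from any method not yet assigned to one.
        _, fields = remaining.popitem()
        components += 1
        grew = True
        while grew:
            grew = False
            for name in list(remaining):
                if fields & remaining[name]:
                    # Shares a field with the component: absorb it and rescan.
                    fields |= remaining.pop(name)
                    grew = True
    return components


# A hypothetical Order aggregate whose methods touch disjoint fields.
order = {
    "AddItem": {"_items"},
    "ChangeShipping": {"_shippingAddress"},
    "Cancel": {"_status"},
}
```

Here `lcom4(order)` returns 3, which a raw reading treats as three unrelated responsibilities even though the class guards a single business invariant.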
A domain aggregate (an Order with AddItem, ChangeShipping, Cancel) is cohesive by its invariant, yet looks like a god-class to the raw metric. Watchdog recognises the shapes LCOM4 provably mis-measures (domain aggregates, data-access repositories, source-generated view-models, contract-mandated plumbing) and exempts them, while still flagging the real god-object. The result: a cohesion signal you can act on, not a list to triage.
The hollow code that compiles, measured deterministically.
The failure mode of fast, high-volume code isn't bad syntax — a linter catches that. It's code that *looks finished and does nothing*. Watchdog measures the hollowness itself — by shape.
Stubs that look implemented
A CalculateTax that returns 0, an async method that never awaits, skeleton types, dead branches. Valid C#, type-checks — and does nothing.
Tests that assert nothing
Green tests with no assertions; skipped tests dressed up as coverage. Coverage counts the test file; it never asks whether the test actually checks a result.
Errors made invisible
Empty catch {} blocks, a throw ex; rethrow that resets the stack trace. Type-based checks pass; the failure disappears.
Untracked debt & dead code
TODO/FIXME/HACK, blanket suppressions, commented-out code, unreferenced symbols — counted raw at best by other tools; scored here.
Copy-paste, never parameterised
Near-duplicate blocks — type-aware detection, scored by density across the codebase.
We don't guess whether a machine typed it — stylometric "AI detection" is a credibility trap, and we refuse to make the claim. We measure the hollowness by shape, so a rushed human and an eager model produce the same finding and the same fix.
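As a rough illustration of measuring by shape, the sketch below counts empty `catch` blocks in C# source text. It is deliberately crude and is not Watchdog's detector: a regular expression over raw text knows nothing about comments, strings or types, and a production analyser would work on the syntax tree instead. The function name and the sample snippet are hypothetical.

```python
import re

# Matches `catch { }` and `catch (SomeException ex) { }` when the body holds only whitespace.
EMPTY_CATCH = re.compile(r"\bcatch\s*(\([^)]*\))?\s*\{\s*\}")


def count_empty_catches(csharp_source):
    """Count catch blocks whose body is empty, judged purely by textual shape."""
    return sum(1 for _ in EMPTY_CATCH.finditer(csharp_source))


sample = """
try { Save(order); } catch (Exception) { }
try { Send(order); } catch { }
try { Log(order); } catch (IOException ex) { logger.Error(ex); }
"""
```

On `sample` the count is 2: the first two handlers swallow the failure, while the third does something with it. The finding is the same whoever, or whatever, wrote the code.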
Freeze the rubric, keep the score constant.
Watchdog scores with a versioned rubric. Any change that can move a score for unchanged code bumps the rubric version.
Versioned and contestable
The rubric is contestable: a scoring change that isn't reflected in the published spec fails our CI, so every number stays re-derivable from a rule you can read.
Contract rubrics
Pin a repository to a frozen rubric and the ruler stops moving under you — the same commit re-scores to the same number under that rubric, so any movement you see is the asset changing, never the ruler. Advisory data still refreshes, so a new CVE can legitimately move a security finding — a real signal, disclosed in the changelog.
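One way to picture the pinned-rubric guarantee is a small record per scoring run and a check that attributes any movement to its cause. This is a sketch under assumptions: `ScoreRecord`, its field names and `explain_movement` are hypothetical and do not come from the published spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRecord:
    commit: str             # the commit that was scored
    rubric_version: str     # the pinned rubric
    advisory_snapshot: str  # identifier of the advisory data (e.g. CVE feed) used
    cai: float


def explain_movement(before, after):
    """List why the score may differ between two runs under the same pinned rubric."""
    if before.rubric_version != after.rubric_version:
        raise ValueError("records use different rubrics; scores are not comparable")
    causes = []
    if before.commit != after.commit:
        causes.append("code changed")
    if before.advisory_snapshot != after.advisory_snapshot:
        causes.append("advisory data refreshed")
    if not causes and before.cai != after.cai:
        # Identical inputs must give an identical score.
        raise AssertionError("score moved with identical inputs")
    return causes
```

Under a pinned rubric the only possible answers are a code change, a disclosed advisory refresh, or both; an empty list means the score cannot have moved.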
Get the measurement. No depth is ever gated.