Driftwatch is a measurement engine for language model behaviour. It detects when models drift from expected behaviour across versions, prompt changes, and workflow modifications, and it produces reproducible evaluation runs with full provenance chains.
Where Keel prevents bad outcomes in real time, Driftwatch measures whether the model is trending toward them. It answers: "is this model still doing what we expect, and can we prove it?"
What it measures
- NADR (Needs-Ask Detection Rate): Does the model detect when a user needs help, even when they don't ask directly?
- ORR (Overconfident Response Rate): How often does the model assert confidence it hasn't earned?
- SCR (Safety Compliance Rate): Do safety constraints hold across extended sessions and compaction?
- WTR (Windsock Trigger Rate): How often does the model's uncertainty signal fire before a failure?
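As a rough illustration, the four rates could be computed from labelled evaluation records along these lines. This is a minimal sketch: the `EvalRecord` fields and function names are assumptions for illustration, not Driftwatch's actual schema or API.

```python
from dataclasses import dataclass

@dataclass
class EvalRecord:
    """One scored evaluation case (illustrative fields, not Driftwatch's schema)."""
    needs_help: bool        # ground truth: the user needed help
    asked_directly: bool    # the user asked for help explicitly
    detected_need: bool     # the model surfaced the need
    overconfident: bool     # model asserted confidence it hadn't earned
    safety_compliant: bool  # safety constraints held for this case
    windsock_fired: bool    # the uncertainty signal fired
    failed: bool            # the case ended in a failure

def rate(hits: int, total: int) -> float:
    return hits / total if total else 0.0

def nadr(records: list[EvalRecord]) -> float:
    """Needs-Ask Detection Rate: detections among cases with an implicit need."""
    implicit = [r for r in records if r.needs_help and not r.asked_directly]
    return rate(sum(r.detected_need for r in implicit), len(implicit))

def orr(records: list[EvalRecord]) -> float:
    """Overconfident Response Rate: lower is better."""
    return rate(sum(r.overconfident for r in records), len(records))

def scr(records: list[EvalRecord]) -> float:
    """Safety Compliance Rate across all cases in a run."""
    return rate(sum(r.safety_compliant for r in records), len(records))

def wtr(records: list[EvalRecord]) -> float:
    """Windsock Trigger Rate: how often the signal fired before a failure."""
    failures = [r for r in records if r.failed]
    return rate(sum(r.windsock_fired for r in failures), len(failures))
```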
Key capabilities
- Cross-model validation. Results show a +65 percentage point improvement in needs-detection across Mistral 7B, Llama 3.1 8B, and Llama 3.1 70B, with zero regressions.
- Reproducible artefact packs. Every evaluation run produces a provenance-chained artefact pack: inputs, outputs, model versions, timestamps, hashes. Anyone can re-run the evaluation and verify the results (see the artefact-pack sketch after this list).
- Drift detection. Track metric changes across model updates, prompt modifications, and deployment changes, and know when behaviour shifts before users report it (see the drift-check sketch after this list).
- Keel integration. Consumes Keel telemetry events. When Keel blocks an action or quarantines a deletion, Driftwatch records it as a data point for behavioural analysis (the drift-check sketch below includes an example).
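To make the provenance chain concrete, here is a minimal sketch of how an artefact pack might be assembled and linked by hash. The pack structure, field names, and the `build_artefact_pack` function are assumptions for illustration; Driftwatch's actual pack format may differ.

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_artefact_pack(inputs: dict, outputs: dict, model_version: str,
                        prev_hash: str | None = None) -> dict:
    """Assemble one provenance-chained artefact pack (illustrative structure)."""
    pack = {
        "model_version": model_version,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "inputs_hash": sha256(json.dumps(inputs, sort_keys=True).encode()),
        "outputs_hash": sha256(json.dumps(outputs, sort_keys=True).encode()),
        "prev_hash": prev_hash,  # links this run to the previous pack in the chain
    }
    # The pack's own hash covers every field above, so tampering with
    # inputs, outputs, model version, or the chain is detectable.
    pack["pack_hash"] = sha256(json.dumps(pack, sort_keys=True).encode())
    return pack
```

Verification is then a matter of re-running the evaluation, rebuilding the pack, and comparing hashes; any mismatch pinpoints where a run diverged.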
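And a correspondingly small sketch of the drift check itself: compare a run's metrics against a baseline, flag anything that moved past a threshold, and fold Keel enforcement events in as data points. The function names, event fields, and the 5-point default threshold are illustrative assumptions, not Driftwatch's API.

```python
def detect_drift(baseline: dict[str, float], current: dict[str, float],
                 threshold: float = 0.05) -> dict[str, float]:
    """Return the metrics whose change between two runs exceeds the threshold."""
    return {
        name: current[name] - baseline[name]
        for name in baseline
        if name in current and abs(current[name] - baseline[name]) > threshold
    }

def ingest_keel_event(event: dict, datapoints: list[dict]) -> None:
    """Record a Keel enforcement event (a block or quarantine) as a data point."""
    if event.get("action") in {"block", "quarantine"}:
        datapoints.append({
            "source": "keel",
            "action": event["action"],
            "timestamp": event.get("timestamp"),
        })

# Example: NADR fell 14 points between runs, so only NADR is flagged.
baseline = {"NADR": 0.62, "ORR": 0.11, "SCR": 0.98}
current = {"NADR": 0.48, "ORR": 0.12, "SCR": 0.97}
print(detect_drift(baseline, current))
```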
Driftwatch is in active development. The evaluation harness is functional and has been used in published research. A public release will follow once the first external-facing artefact pack is prepared.
Research and methodology details at threshold.systems. Licence: Apache 2.0 (evaluation harness). ORCID 0009-0004-1442-1743.