ADR 0017: Queue health check and drift-detection layer¶
- Status: Accepted
- Date: 2026-05-25
Context¶
The framework has a queue schema validator (scripts/check-framework-task-queue.sh) that
verifies structural correctness: required fields, allowed status values, dependency
references, and uniqueness of task IDs. It also enforces the deterministic-preparation
rule that at most one dependency-ready pending task exists at a time.
However, schema correctness does not guarantee that queue state accurately reflects repository reality. Several failure modes can cause the queue to drift silently:
-
Stale
related_docsreferences — a queue entry references a doc that has been renamed, moved, or deleted. The schema validator does not check whether referenced files exist in the repository. -
Missed state transitions after merge — a task's implementation PR is merged but the queue entry is not updated from
in_progresstodone. The queue then has anin_progressitem with no corresponding active PR. -
Missed
blocked → pendingpromotion — a task remainsblockedafter all its dependencies reachdone, preventing merge-triggered preparation from selecting it. -
Premature
pendingmarking — a task is markedpendingbefore its dependencies aredone, which can cause merge-triggered preparation to select the wrong task. -
Superseded-dependency traps — a non-terminal task depends on a task that has been marked
superseded. Since onlydonesatisfies the dependency model, the blocked task can never become dependency-ready. -
Multiple concurrent
in_progressentries — more than one task inin_progresssimultaneously, which may indicate a missed state transition.
None of these drift patterns are caught by check-framework-task-queue.sh because they
require semantic awareness of state consistency, not just schema validity.
Decision¶
Introduce a dedicated lightweight queue health and drift-detection layer alongside the existing schema validator, following the same pattern as prior check additions (ADR 0015, ADR 0016).
1. scripts/check-queue-health.sh¶
A new script that checks semantic drift signals not covered by the schema validator:
-
Check 1: related_docs existence — every file path in every
related_docsarray must resolve to an actual file in the repository. Stale references are errors. -
Check 2: blocked tasks with all deps done — tasks in
blockedstatus whose entiredepends_onlist isdonemust be promoted topending. Remainingblockedis an error. -
Check 3: pending tasks with unresolved deps — tasks in
pendingstatus that have dependencies not yetdonehave drifted; they must beblockedor the dependencies must be resolved first. -
Check 4: superseded-dependency traps — non-terminal tasks (
blocked,pending,in_progress) that depend onsupersededtasks can never become dependency-ready. The dependency reference must be updated or replaced. -
Check 5: multiple in_progress simultaneously — more than one
in_progresstask at a time is reported as a warning. This may be intentional in some cases, but is unusual and worth human review.
The script follows the existing framework check pattern: set -euo pipefail, repo-root
detection via BASH_SOURCE, and an embedded Python block for JSON processing.
2. docs/runbooks/run-queue-health-check.md¶
A new operator runbook that provides:
- When to run the queue health check.
- Step-by-step procedure (automated + manual verification).
- A drift signal matrix mapping each failure mode to its recovery action.
- Mobile quick action guidance.
3. Integration into framework-audit.yml¶
The new script is added as a parallel queue-health job in the consolidated
framework-audit.yml workflow, so it runs:
- On the monthly scheduled audit (first of each month).
- On manual
workflow_dispatch. - On pull requests that touch the queue file, the new script, or the workflow itself.
Alternatives considered¶
-
Expand
check-framework-task-queue.shwith drift checks: rejected because the existing script has a focused single responsibility (schema validation). Mixing structural validation with semantic drift detection would make both harder to maintain and debug. Keeping them separate follows the principle of one job per script. -
GitHub API-based drift checking (check open issues/PRs against queue state): rejected because it requires a GitHub token with Issues read scope, introduces network dependencies, and couples the check to the GitHub API surface. Structural drift signals detectable from the queue file itself are sufficient for most drift patterns and are simpler, faster, and reusable in other repos.
-
Human-only drift detection (no script): rejected because the governance checklist already identifies queue drift as a recurring concern. Without a repeatable automated signal, drift accumulates silently between manual audits.
Consequences¶
Positive:
- Semantic drift in queue state is caught automatically at CI time and during monthly audits, not only during manual walkthrough.
- Four common drift patterns become errors rather than advisory notes.
- Multiple-in_progress is surfaced as a warning without blocking merge.
- The new runbook gives operators a structured walkthrough for cases that require human verification beyond what the script checks.
- The check is lightweight and reusable: any repo adopting this queue model can copy the script without additional dependencies.
Negative:
- Authors making queue state changes must ensure the file remains health-check-clean, not just schema-valid.
Follow-ups:
- Update
docs/framework-health.mdto reflect the new automated check. - Update
docs/governance-checklist.mdto reference the queue health check. - Update
docs/framework-queued-execution-memory.mdto link to the new runbook and expand drift guidance with the full signal matrix. - Update
AGENTS.mdvalidation commands to includecheck-queue-health.sh.