Ver: 1.1
2/3/26
Trent Carter
Product: Verdict PAS (Code Lane)
Status: Draft (peer review ready)
Feature name (user-facing): Guarded Consensus Mode
Internal name (suggested): guarded_consensus (policy) / validation_loop (engine)
⸻
1) Summary
Guarded Consensus Mode adds an optional, feature-gated workflow to the Verdict PAS Code Lane that enforces a deterministic Build → Validate → Retry loop and introduces two persistent workflow artifacts, spec.md (immutable input) and plan.md (the task-state ledger), to prevent drift and “false completion.”
When disabled, the system behaves exactly as it does today, with near-zero blast radius (no new required fields, no extra tool calls, no artifact creation).
⸻
2) Problem
Verdict’s current multi-agent coding flow can produce “done” work that is not objectively correct:
• Code changes that _look_ correct but fail tests/lint/build/typecheck
• Long-running chats that drift from the original intent
• Implicit state living in chat history instead of an explicit, parseable workflow ledger
This causes reliability issues and makes it hard to reproduce, audit, or resume work.
⸻
3) Goals
⸻
4) Non-goals
• No new permanent “Validator service” tier; reuse existing PAS roles and orchestration.
• No global DAG scheduler across all lanes (v1 is Code Lane only).
• No requirement to introduce a new sandbox platform in v1 (use existing safe execution path).
• No “auto-write perfect tests” guarantee (optional later).
⸻
5) Target users
• Power users who want reliable patches and reproducible runs.
• Teams who need auditability (what was intended, what was planned, what was validated).
• Developers who want a mode that prevents “it compiled in the agent’s imagination.”
⸻
6) Key concept: “Plan with Team” inside PAS
Guarded Consensus Mode borrows the “Plan with Team” concept (as seen in Claude-style multi-agent workflows) but maps it to PAS’s existing hierarchy:
• Architect/Manager creates the plan (plan.md)
• Programmer executes tasks (build)
• Manager enforces gates (validate + retry + completion authority)
Crucially: completion is not based on agent narrative; it is based on validation evidence.
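The evidence-based completion rule can be sketched as a small loop. This is a minimal illustration, not the engine design: `run_task`, `TaskResult`, and the callable signatures are hypothetical, and it assumes `max_retries` counts retries after the first attempt. The disposition strings are the ones listed under Observability.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskResult:
    disposition: str  # "completed" or "validation_failed_max_retries"
    attempts: int     # number of build attempts made


def run_task(build: Callable[[], None],
             gates: List[Callable[[], bool]],
             max_retries: int) -> TaskResult:
    """Build, validate, and retry until every gate passes or retries run out."""
    attempts = 0
    # Assumption: one initial attempt plus up to max_retries retries.
    while attempts <= max_retries:
        attempts += 1
        build()  # whatever the builder reports is ignored here
        # Only gate results can complete a task.
        if all(gate() for gate in gates):
            return TaskResult("completed", attempts)
    return TaskResult("validation_failed_max_retries", attempts)
```

A builder that claims success but never satisfies a gate ends in `validation_failed_max_retries`, regardless of what it says.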
⸻
7) Feature flagging and controls
7.1 Source of truth: RunPack
Guarded Consensus Mode is enabled/disabled per-run via RunPack configuration.
RunPack fields (required only when the mode is enabled; names can be finalized in peer review):
• features.guarded_consensus.enabled (bool)
• features.guarded_consensus.level (speed | balanced | strict)
• features.guarded_consensus.max_retries (int)
• features.guarded_consensus.fail_open (bool; default false)
• features.guarded_consensus.gates (list of gate definitions)
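To make the field list concrete, here is one way the RunPack section might be loaded and validated. It is a sketch under assumptions: the RunPack is taken to be an already-parsed dict, `GuardedConsensusConfig` and `load_config` are hypothetical names, and the defaults for `enabled`, `level`, and `max_retries` are illustrative choices (only the `fail_open` default of false comes from this document).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

LEVELS = ("speed", "balanced", "strict")


@dataclass(frozen=True)
class GuardedConsensusConfig:
    enabled: bool = False          # assumed default: feature off
    level: str = "balanced"        # assumed default
    max_retries: int = 2           # assumed default
    fail_open: bool = False        # default stated in the field list
    gates: List[Dict[str, Any]] = field(default_factory=list)


def load_config(runpack: Dict[str, Any]) -> GuardedConsensusConfig:
    """Read features.guarded_consensus from a parsed RunPack dict."""
    raw = runpack.get("features", {}).get("guarded_consensus", {})
    cfg = GuardedConsensusConfig(**raw)  # unknown keys raise TypeError
    if cfg.level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}, got {cfg.level!r}")
    if cfg.max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    return cfg
```

A RunPack with no `features.guarded_consensus` section resolves to a disabled config, which keeps the feature off unless it is asked for.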
7.2 Optional test UX: Chat checkbox
A chat UI checkbox may exist for testing/demos:
• It must apply only as a run override.
• It must display the resolved state (RunPack vs override).
• It must not create global configuration drift.
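The override rules above could be resolved roughly as follows. This is an illustrative sketch: `resolve_enabled` and `ResolvedFlag` are hypothetical names, and it assumes the checkbox is passed in as `None` when the user has not touched it.

```python
from typing import NamedTuple, Optional


class ResolvedFlag(NamedTuple):
    enabled: bool
    source: str  # "runpack" or "override", shown to the user


def resolve_enabled(runpack_enabled: bool,
                    ui_override: Optional[bool]) -> ResolvedFlag:
    """Compute the per-run state without writing anything back to config."""
    if ui_override is None:
        return ResolvedFlag(runpack_enabled, "runpack")
    return ResolvedFlag(ui_override, "override")
```

Because the function only returns a value for the current run, the RunPack itself is never modified, which is what keeps the checkbox from causing global drift.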
7.3 “Off means off”
When enabled=false:
• No spec/plan artifacts are created.
• No new orchestration states are emitted.
• No additional validation tool calls occur.
⸻
8) Workflow artifacts
Guarded Consensus Mode introduces a documented, mandatory artifact pair (plus an optional third):
8.1 Canonical storage location
To avoid polluting repos and to minimize conflicts, canonical paths are:
• .verdict/spec.md
• .verdict/plan.md
• .verdict/prime.md (optional)
(Compatibility aliases like repo-root spec.md/plan.md may exist only if explicitly enabled later; v1 does not require them.)
8.2 Artifact definitions
| Artifact | Role | Authority | Mutability | Purpose |
| --- | --- | --- | --- | --- |
| spec.md | Input | Orchestrator creates snapshot | Immutable for the run | “Source of truth” requirements |
| plan.md | Output + State | Orchestrator writes state | Mutable (checkbox progress only) | Shared ledger of tasks & completion |
| prime.md (optional) | Analysis | Orchestrator creates | Mutable | Initial repo scan / risk notes |
8.3 plan.md formatting requirement
plan.md must be formatted as a strict Markdown task list with [ ] / [x] checkboxes so it can be parsed deterministically:
• Unchecked task = next runnable work item
• Checked task = completed item
• The plan must remain stable and machine-readable across retries and partial runs
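A deterministic parser for such a ledger can be very small. The sketch below assumes task lines look like `- [ ] text` or `- [x] text` (lowercase `x` only) and that the first unchecked task is the next runnable one; `parse_plan`, `next_runnable`, and `Task` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed task syntax: optional indent, "-" or "*", then "[ ]" or "[x]".
TASK_RE = re.compile(r"^\s*[-*] \[( |x)\] (.+)$")


class Task(NamedTuple):
    line_no: int  # zero-based line index within plan.md
    done: bool
    text: str


def parse_plan(markdown: str) -> List[Task]:
    """Extract checkbox tasks in document order; other lines are ignored."""
    tasks = []
    for i, line in enumerate(markdown.splitlines()):
        m = TASK_RE.match(line)
        if m:
            tasks.append(Task(i, m.group(1) == "x", m.group(2).strip()))
    return tasks


def next_runnable(tasks: List[Task]) -> Optional[Task]:
    """Return the first unchecked task, or None when the plan is complete."""
    return next((t for t in tasks if not t.done), None)
```

Parsing the same file twice yields the same task list, so a resumed run can pick up where the previous one stopped.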
8.4 Write authority rule
To avoid concurrency conflicts:
• Only the orchestrator/manager authority may flip [ ] → [x] in plan.md.
• Builders may read plan/spec but do not directly edit plan state.
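One way to enforce the single-writer rule is to route every checkbox flip through a function that checks the caller's role. This is a sketch only: `mark_done` and the role string `"manager"` are hypothetical, and it assumes tasks are written as `- [ ] text` and identified by their exact text.

```python
def mark_done(plan_text: str, task_text: str, actor: str) -> str:
    """Return plan text with one task flipped from [ ] to [x].

    Only the manager/orchestrator role may change plan state.
    """
    if actor != "manager":  # hypothetical role name
        raise PermissionError(f"{actor!r} may not edit plan state")
    target = f"- [ ] {task_text}"
    lines = plan_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == target:
            # Flip only the checkbox; indentation and task text stay untouched.
            lines[i] = line.replace("[ ]", "[x]", 1)
            return "\n".join(lines) + "\n"
    raise KeyError(f"no unchecked task named {task_text!r}")
```

Since the only mutation is the checkbox itself, the plan stays stable and diffable across retries.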
⸻
9) Validation model
9.1 Gate types (v1)
Supported validation gates should be restricted and explicit. v1 supports:
• Shell command gate (run a configured command in repo context)
• File exists gate
• Regex match gate
• (Optional) Syntax check gate if it’s materially different from shell commands in your environments
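The two non-shell gates are simple enough to sketch directly. The function names and parameters below are hypothetical, and the regex gate assumes the pattern is applied to the full text of one UTF-8 file.

```python
import re
from pathlib import Path


def file_exists_gate(repo_root: Path, rel_path: str) -> bool:
    """Pass when the path exists as a regular file under the repo root."""
    return (repo_root / rel_path).is_file()


def regex_match_gate(repo_root: Path, rel_path: str, pattern: str) -> bool:
    """Pass when the pattern matches somewhere in the file's contents."""
    target = repo_root / rel_path
    if not target.is_file():
        return False  # a missing file fails the gate instead of raising
    text = target.read_text(encoding="utf-8")
    return re.search(pattern, text, re.MULTILINE) is not None
```

Both gates return a plain boolean, so they can sit in the same gate list as a shell command gate.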
9.2 Gate execution environment (v1)
Gates run in the same controlled execution pathway already used for code operations in Verdict (local runner, service tool, or equivalent). v1 does not require a new sandbox platform, but must enforce:
• timeouts
• output truncation
• resource bounds where available
• command allowlisting/denylisting policy (see Security)
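As an illustration of the timeout and truncation requirements, a shell command gate could wrap Python's standard `subprocess.run`. This is a sketch, not the Verdict execution pathway: `run_shell_gate`, `GateResult`, and the default limits are hypothetical, and resource bounds and command policy are left out.

```python
import subprocess
from typing import List, NamedTuple, Optional


class GateResult(NamedTuple):
    passed: bool
    exit_code: Optional[int]  # None when the command timed out
    output: str               # truncated stdout followed by stderr
    timed_out: bool


def run_shell_gate(argv: List[str], cwd: str,
                   timeout_s: float = 60.0,
                   max_output: int = 4000) -> GateResult:
    """Run one command without a shell, with a timeout and bounded output."""
    try:
        proc = subprocess.run(argv, cwd=cwd, capture_output=True,
                              text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return GateResult(False, None, "", True)
    output = (proc.stdout + proc.stderr)[:max_output]  # cap log size
    return GateResult(proc.returncode == 0, proc.returncode, output, False)
```

Passing the command as an argument list rather than a shell string avoids shell interpretation of characters such as `;` and `&&`.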
⸻
10) State machine
For each runnable plan task:
⸻
11) Levels: speed, balanced, strict
• Speed
  • Minimal gating (or gating disabled)
  • May allow fail_open=true (record failures, still proceed)
  • Intended for prototyping, not reliable delivery
• Balanced
  • Fast, high-signal checks (lint + targeted tests)
  • Default recommended mode for most users
• Strict
  • Strong checks (tests + lint + typecheck/build as configured)
  • fail_open=false default; completion requires passing evidence
(If you prefer, map these to your Parametric Equalizer naming so UX stays consistent.)
⸻
12) Observability and receipts
Each task lifecycle must produce an evidence bundle:
• Mode + level + retries configured
• Gates executed (names/types)
• Pass/fail per gate
• Command/pattern used (as configured)
• Exit codes, durations
• Truncated stdout/stderr (with pointers to full logs if stored)
• Retry count
• Final disposition (completed vs validation_failed_max_retries)
This evidence should be visible in the run timeline and available for audit/export.
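The bundle contents listed above map naturally onto a pair of records. The sketch below is one possible shape, not a fixed schema: the class and field names are hypothetical, and JSON is assumed as the export format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class GateEvidence:
    name: str
    gate_type: str               # e.g. "shell", "file_exists", "regex"
    command_or_pattern: str      # as configured
    passed: bool
    exit_code: Optional[int]     # None for non-shell gates
    duration_s: float
    output_excerpt: str          # truncated stdout/stderr


@dataclass
class EvidenceBundle:
    task: str
    mode: str                    # e.g. "guarded_consensus"
    level: str
    max_retries: int
    retry_count: int
    disposition: str             # "completed" or "validation_failed_max_retries"
    gates: List[GateEvidence] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize for the run timeline or an audit export."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Keeping one bundle per task lifecycle means an auditor can answer "what was validated, and how" without replaying the chat history.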
⸻
13) Security and safety requirements
Shell-based validation is inherently risky. v1 must include a defensible stance:
• Command policy: allowlist/denylist (project or system level)
• Timeouts: per gate and per task lifecycle
• Output limits: avoid log flooding and token blowups
• Workspace isolation: ensure commands run only within intended repo scope
• No secret exfiltration: logs must redact known secret patterns where feasible
If a strict allowlist cannot be enforced in v1, then Guarded Consensus Mode must default to disabled unless the user explicitly enables it and configures gates.
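Two of these controls, the command allowlist and log redaction, can be sketched with the standard library. Everything here is illustrative: `is_allowed`, `redact`, and the two secret patterns are hypothetical examples, not a complete policy, and the allowlist is assumed to match on the executable name only.

```python
import re
import shlex
from typing import Iterable

# Example patterns only; a real deployment would maintain its own list.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)(password|token|secret)\s*[=:]\s*\S+"),
]


def is_allowed(command: str, allowlist: Iterable[str]) -> bool:
    """Allow a command only if its first token is on the allowlist."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes and similar: reject outright
    return bool(argv) and argv[0] in set(allowlist)


def redact(text: str) -> str:
    """Replace known secret patterns before output is logged or stored."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

A check on the first token is only meaningful if the parsed argument list is then executed without a shell; otherwise chained commands could slip past it.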
⸻
14) Compatibility and performance
• When enabled, additional time cost is expected from validations and retries.
• When disabled, no measurable overhead should occur (no extra parsing, no artifact writes, no extra orchestration states).
• plan/spec artifacts must not interfere with normal developer workflows (kept under .verdict/).
⸻
15) Rollout plan
Phase 0 (internal)
• Balanced level only
• Gate types: shell command + file exists
• Max retries default 2–3
• .verdict/spec.md + .verdict/plan.md created and used
Phase 1
• Strict level
• Better evidence bundling and UI surfacing
• Regex match gate
Phase 2
• Optional prime.md
• Optional semantic reviewer pass (style/security) as a separate, non-blocking gate type
• Optional hardened sandbox
⸻
16) Acceptance criteria
⸻
17) Open questions for peer review
⸻