Trent Carter
⸻
1. Purpose & Concept
The Power Law Engine (PLE) is a cross-cutting analytics module that:
• Models fat-tailed risk (time, tokens, cost) from historical runs.
• Adjusts estimates & confidence intervals to reflect avalanche / outlier behavior.
• Monitors cascading failures (“avalanches”) and outlier events.
• Raises early warnings and suggests load shedding / controlled burns.
• Updates lane-level and system-level tail statistics (exponents, thresholds).
• Feeds back into Blueprint, pricing, and PLE-driven dashboards.
⸻
2. Goals / Non-Goals
Goals
• Capture heavy-tailed behavior in PAS runs (cost, time, retries, cascades).
• Provide tail-aware estimates to Blueprint / PLMS (90–99% CI, not just averages).
• Detect avalanches (large cascades of tasks/actions) in real time.
• Provide lane-level risk scores (tail-risk) consumable by:
• PLMS Blueprint
• PAS Resource Manager
• HMI dashboards
Non-Goals
• PLE does not change core PAS orchestration logic directly.
• PLE does not define product pricing by itself (it informs pricing).
• PLE does not perform ML training; it ingests telemetry and outputs analytics.
⸻
3. High-Level Architecture
3.1 Placement in Verdict System
flowchart LR
subgraph User/HMI
HMI[Verdict HMI\nDashboards & Blueprint UI]
end
subgraph Planning
BP[Blueprint / PLMS\nEstimation & Scoping]
end
subgraph Runtime
PAS["PAS Root & Agents\n(Dir/Mgr/Prog)"]
RM[Resource Manager]
end
subgraph Analytics
LOGS[(Action / Run Logs)]
PLE["Power Law Engine\n(Analytics Service)"]
end
PAS --> LOGS
RM --> LOGS
LOGS --> PLE
PLE --> BP
PLE --> RM
PLE --> HMI
Type: Separate internal service (FastAPI / gRPC style), stateless, reading from logs/warehouse, writing back summaries/flags.
⸻
4. Core Concepts & Data
4.1 Event & Avalanche
Event = a root-cause–grouped unit of work, typically:
• One Blueprint job or PAS project_run.
• Can contain 1..N actions / job cards across agents.
Avalanche = a large event where:
• A single triggering failure / mis-estimate leads to:
• Many retries
• Rollbacks / re-plans
• Test re-runs
• Large token/time overrun
Key metrics per event:
• event_id
• root_cause_id (if available)
• lane (Code, Docs, Data, QA, etc.)
• num_actions (job cards / tool calls / steps)
• tokens_actual, tokens_expected, tokens_delta
• time_actual, time_expected, time_delta
• cost_actual, cost_expected, cost_delta
• num_rollbacks, num_retries
PLE uses distributions of these to detect power-law tails.
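The per-event metrics above could be carried in a single record type. The sketch below is illustrative only: the field names follow the list, while the assumption that each delta is actual minus expected, and the `overrun_ratio` helper, are not specified by this PRD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventSummary:
    """One root-cause-grouped unit of work (fields mirror the metric list)."""
    event_id: str
    lane: str
    num_actions: int
    tokens_actual: float
    tokens_expected: float
    time_actual: float
    time_expected: float
    cost_actual: float
    cost_expected: float
    num_rollbacks: int = 0
    num_retries: int = 0
    root_cause_id: Optional[str] = None  # only set when available

    # Assumption: each *_delta is actual minus expected.
    @property
    def tokens_delta(self) -> float:
        return self.tokens_actual - self.tokens_expected

    @property
    def time_delta(self) -> float:
        return self.time_actual - self.time_expected

    @property
    def cost_delta(self) -> float:
        return self.cost_actual - self.cost_expected

    def overrun_ratio(self, metric: str) -> float:
        """Hypothetical helper: actual / expected for 'tokens', 'time', or 'cost'."""
        expected = getattr(self, f"{metric}_expected")
        actual = getattr(self, f"{metric}_actual")
        return actual / expected if expected > 0 else float("inf")
```

Ratios such as `overrun_ratio("cost")` are a convenient input for tail modeling because they are comparable across jobs of different size.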
⸻
5. Functional Requirements
5.1 Pre-Execution: Blueprint / PLMS Integration
Inputs:
• Historical event summaries (per lane, per project type).
• Draft Blueprint estimate: (time_est, cost_est, token_est, lane_mix, risk_profile).
Behavior:
• For each lane and project type, PLE maintains:
• Approximate tail exponent (α) for cost/time/overrun.
• Thresholds for “normal” vs “extreme” events per metric.
• Given a new Blueprint:
• Adjust cost / time estimates with tail-aware confidence bands (e.g. P50, P90, P99).
• Produce a Tail Risk Score (0–100) for the project.
• Display:
• “Expected” vs “worst-case (P90 / P95)” ranges.
• Lane contributions to tail risk (which lanes dominate the tail).
• Optionally: Recommend rehearsal intensity or sandboxing level for high tail-risk jobs.
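One way the tail-aware bands and the Tail Risk Score could be derived is sketched below. It assumes PLE has a list of historical actual/expected overrun ratios for the relevant lane and project type, and it uses plain empirical percentiles rather than a fitted tail model. The `tail_risk_score` formula and its `saturation` parameter are hypothetical; this PRD does not define how the score is computed.

```python
import statistics


def tail_bands(point_estimate, overrun_ratios):
    """Scale a draft estimate by empirical percentiles of historical actual/expected ratios."""
    if len(overrun_ratios) < 2:
        raise ValueError("need at least two historical ratios")
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(overrun_ratios, n=100, method="inclusive")
    return {
        "P50": point_estimate * cuts[49],
        "P90": point_estimate * cuts[89],
        "P99": point_estimate * cuts[98],
    }


def tail_risk_score(bands, saturation=10.0):
    """Hypothetical 0-100 score: how far P99 sits above P50, capped at `saturation`x."""
    if bands["P50"] <= 0:
        return 100.0
    spread = bands["P99"] / bands["P50"]
    return max(0.0, min(100.0, 100.0 * (spread - 1.0) / (saturation - 1.0)))
```

With small samples, empirical percentiles tend to understate P99 for fat-tailed data, which is one reason a fitted tail exponent may be preferable for the upper band.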
5.2 During Execution: PAS Runtime Monitoring
Inputs:
• Streaming (or batched) action_logs from PAS and Resource Manager:
• Action type, lane, timestamps.
• Tokens, retries, rollbacks, errors.
Behavior:
• Group actions into ongoing events (by project_run, run_id, or root_cause_id).
• Monitor for:
• num_actions > k × median for similar jobs.
• tokens_delta, time_delta, num_retries, num_rollbacks exceeding lane-specific thresholds.
• When crossing thresholds:
• Emit “Avalanche Warning” to:
• Resource Manager (for load shedding / pause / replan).
• HMI (visual alert and context).
• Maintain a per-lane load metric (“pile height”):
• queued complexity, active events, historical risk.
• When load exceeds safe band:
• Suggest slowing new intake to that lane.
• Optionally mark lane as “critical” in scheduling.
Outputs:
• avalanche_alerts (event_id, level, lane, suggested action).
• lane_load_state (per-lane load index, risk flag).
• run_risk_summary (for current project).
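The "num_actions > k × median" check and the shape of an avalanche alert might look like the following sketch. The two multipliers, the level names, and the suggested-action strings are placeholders chosen for illustration, not values defined by this PRD.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass
class AvalancheAlert:
    event_id: str
    level: str             # "warning" or "critical" (hypothetical levels)
    lane: str
    suggested_action: str  # advisory only; the consumer decides what to do


def check_event_size(event_id: str, lane: str, num_actions: int,
                     similar_job_sizes: Sequence[int],
                     k_warn: float = 3.0,
                     k_critical: float = 10.0) -> Optional[AvalancheAlert]:
    """Compare an ongoing event's size against k x the median of similar jobs."""
    baseline = median(similar_job_sizes)
    if num_actions > k_critical * baseline:
        return AvalancheAlert(event_id, "critical", lane, "pause_and_replan")
    if num_actions > k_warn * baseline:
        return AvalancheAlert(event_id, "warning", lane, "shed_load")
    return None
```

The same pattern would extend to tokens_delta, time_delta, num_retries, and num_rollbacks, each with its own lane-specific threshold.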
⸻
6. Non-Functional Requirements
• Low overhead:
• PLE must not block PAS execution.
• Operates on logs / async streams; outputs are advisory.
• Configurable sensitivity:
• Tail thresholds & alert levels tunable per lane.
• Explainable:
• HMI must show _why_ a job is high risk (e.g. “overrun 5× lane median for similar complexity”).
⸻
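7. User Experience (HMI)
Two primary UI surfaces: a pre-execution estimate view (Blueprint / PLMS) and a runtime view (per-project and per-lane).
• Estimate view shows: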
• “Average vs Tail” bars for cost/time.
• Tail Risk Score badge (e.g., Low / Medium / High).
• Tooltips: short explanations like:
“Historically, 5% of similar Code+QA projects cost ≥4× the median. We’ve widened your budget band accordingly.”
• Per-project view:
• Current event size vs typical.
• Avalanche alerts (timeline).
• Lane view:
• Load gauge (pile height).
• Tail-risk indicator (how often this lane produces big avalanches).
⸻
8. Data Flow (Simplified)
flowchart TD
subgraph Logs
A["Action Logs\n(tokens, time, errors,...)"]
E["Event Summaries\n(historical)"]
end
subgraph PLE
B[PLE Batch\nTail Modeling]
C[PLE Runtime\nAvalanche Monitor]
end
subgraph Consumers
PLMS[PLMS / Blueprint]
RM[Resource Manager]
HMI[HMI UI]
end
A --> C
A --> B
E --> B
B --> PLMS
B --> HMI
C --> RM
C --> HMI
⸻
9. Three Additional Power-Law Features for Verdict
These are extra PLE-driven features that exploit power-law dynamics beyond basic risk modeling:
9.1 Customer Value & “Whale Radar”
Observation: Customer revenue / usage typically follows a power law: a few whales contribute most of the revenue / usage, and the long tail contributes the rest.
Feature:
• PLE ingests:
• Per-account usage (tokens, projects, active seats).
• Revenue / plan tier.
• Models the tail of account value distribution.
• Outputs to:
• Growth / Success dashboards:
• Identify whales early.
• Warn when whales are doing high-tail-risk projects (combine PLE project risk × account value).
• Blueprint:
• For high-value customers, suggest more conservative risk bands or extra rehearsal.
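The "project risk × account value" combination can be sketched in a few lines. Both helpers below are hypothetical: `top_share` measures how concentrated account value is, and `whale_exposure` weights a 0–100 Tail Risk Score by account value so dashboards can rank risky projects run by large accounts.

```python
def top_share(values, fraction=0.1):
    """Share of the total held by the top `fraction` of accounts (at least one account)."""
    ranked = sorted(values, reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


def whale_exposure(tail_risk_score, account_value):
    """Project tail risk (0-100) weighted by account value; units follow account_value."""
    return (tail_risk_score / 100.0) * account_value
```

A `top_share` close to 1.0 for a small `fraction` is a quick indication that account value is concentrated in a few whales.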
⸻
9.2 Blueprint Component / Template “Hit Map”
Observation: Usage frequency of Blueprint components, templates, and code recipes is heavy-tailed: a small set accounts for most usage.
Feature:
• PLE tracks:
• Which Blueprint components are used in which projects.
• Downstream outcomes: overruns, bugs, SLO violations.
• Identifies:
• Power components: the small set (roughly 20%) of components used in about 80% of successful projects.
• Toxic components: components that disproportionately appear in tail failures.
• Uses:
• HMI marks power components with a “Golden Path” badge.
• PLMS preferentially suggests power components in auto-generated plans.
• PAS flags toxic components for refactor / replacement work (controlled burns).
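A minimal sketch of how power and toxic components could be identified, assuming each project is represented as a list of component names. The definitions are assumptions made for illustration: "power" is read as the smallest usage-ranked set covering a target share of component uses in successful projects, and "toxic" as a component whose appearance rate in tail failures is at least `min_lift` times its overall rate.

```python
from collections import Counter


def power_components(successful_projects, coverage=0.8):
    """Smallest usage-ranked set of components covering `coverage` of all uses."""
    counts = Counter(c for comps in successful_projects for c in comps)
    total = sum(counts.values())
    chosen, covered = [], 0
    for comp, n in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.append(comp)
        covered += n
    return chosen


def toxic_components(all_projects, tail_failures, min_lift=2.0):
    """Components over-represented in tail failures relative to all projects.

    Assumes tail_failures is a subset of all_projects.
    """
    overall = Counter(c for comps in all_projects for c in set(comps))
    in_tail = Counter(c for comps in tail_failures for c in set(comps))
    toxic = []
    for comp, n_tail in in_tail.items():
        tail_rate = n_tail / len(tail_failures)
        base_rate = overall[comp] / len(all_projects)
        if base_rate > 0 and tail_rate / base_rate >= min_lift:
            toxic.append(comp)
    return sorted(toxic)
```

Using a lift ratio rather than raw counts keeps ubiquitous components from being flagged just because they appear everywhere, including in failures.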
⸻
9.3 Incident Severity & Controlled Burn Planner
Observation: Incident severity (downtime, bug impact, hotfix cost) is typically power-law distributed: a few incidents dominate total pain.
Feature:
• PLE ingests:
• Incident logs (severity, impacted users, time-to-resolve, cost).
• Models:
• Severity distribution tail (α).
• Which lanes, components, and providers dominate catastrophic incidents.
• Outputs to:
• Controlled Burn Planner:
• Suggests specific maintenance / refactor projects with maximal tail-risk reduction per unit effort.
• Blueprint:
• For critical components (high incident tail), suggests extra testing / redundancy budgets.
• HMI:
• “Top 10 sources of catastrophic incidents” list.
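"Maximal tail-risk reduction per unit effort" suggests a ratio-ranked selection. The sketch below uses a simple greedy pass under an effort budget; the `BurnCandidate` fields, their units, and the greedy strategy itself are assumptions, and a real planner might use a different optimizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BurnCandidate:
    name: str
    risk_reduction: float  # estimated tail-risk reduction (hypothetical units)
    effort: float          # e.g. engineer-days; must be > 0


def plan_burns(candidates: List[BurnCandidate], effort_budget: float) -> List[BurnCandidate]:
    """Greedily pick candidates by risk reduction per unit effort until the budget is spent."""
    ranked = sorted(candidates, key=lambda c: c.risk_reduction / c.effort, reverse=True)
    plan, remaining = [], effort_budget
    for cand in ranked:
        if cand.effort <= remaining:
            plan.append(cand)
            remaining -= cand.effort
    return plan
```

Greedy selection is not guaranteed to be optimal for a fixed budget, but it is easy to explain in the HMI, which matters given the explainability requirement.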
⸻
10. Open Questions & Next Steps
• Do we define event_id strictly as project_run, or allow cross-project root causes (e.g., shared infra outage)?
• The exact tail-estimation method (e.g., Hill estimator + goodness-of-fit checks) can be decided later; this PRD only requires:
• Ability to identify heavy-tailed vs non-heavy-tailed metrics.
• Approximate exponents & thresholds.
• Which alerts are auto-surfaced to the user vs only used internally by Resource Manager?
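For reference, the Hill estimator mentioned above is small enough to sketch. This is one candidate method, not a decision: the choice of `k` (how many of the largest observations count as "the tail") is left open here, and goodness-of-fit checks are omitted.

```python
import math


def hill_alpha(samples, k):
    """Hill estimate of the tail exponent alpha from the k largest positive samples."""
    xs = sorted((x for x in samples if x > 0), reverse=True)
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < number of positive samples")
    threshold = xs[k]  # the (k+1)-th largest value acts as the tail threshold
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    if mean_log_excess == 0:
        raise ValueError("top-k samples are all equal to the threshold")
    return 1.0 / mean_log_excess
```

A smaller alpha means a heavier tail. In practice the estimate is usually inspected across a range of `k` values, since it can be unstable when `k` is too small or reaches into the body of the distribution.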
⸻
11. Tron Integration (System-of-Systems Layer)
Objective: Expose PLE’s tail-awareness and avalanche signals to Tron, so Tron can make global, cross-project decisions (intake, routing, safety modes, maintenance scheduling).
11.1 Tron’s Role w.r.t. PLE
At the Tron level, PLE is treated as a global risk oracle:
• Provides lane- and project-level tail risk.
• Surfaces avalanche patterns and fuel load (tech-debt / incident tail).
• Informs policy switches:
• When to accept / reject / defer incoming work.
• How aggressively to run rehearsal / sandboxing.
• When to schedule controlled burns.
High-level:
flowchart LR
subgraph Tron
T[Tron\nGlobal Policy & Governance]
end
subgraph Estimation & Planning
BP[Blueprint / PLMS]
end
subgraph Runtime
PAS[ PAS Root & Agents ]
RM[Resource Manager]
end
subgraph Analytics
PLE[Power Law Engine]
end
PAS --> PLE
RM --> PLE
BP --> PLE
PLE --> BP
PLE --> RM
PLE --> T
T --> BP
T --> RM
T --> PAS
11.2 Tron–PLE Interaction Points
1. Intake & Project Gating
• Input to Tron:
• Project-level Tail Risk Score from PLE (Low/Med/High).
• P50 / P90 / P99 cost & time bands.
• Tron policy:
• For High tail risk + high-value customer:
• Require rehearsal mode and/or manual approval.
• For High tail risk + low-value / trial user:
• Throttle size/scope or defer.
• For Low tail risk:
• Auto-approve, minimal guardrails.
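The gating rules above reduce to a small decision function. The sketch below encodes only the three cases listed; the returned strings are hypothetical labels, and the fallback for Medium tail risk is a placeholder because this PRD does not specify it.

```python
def gate_project(tail_risk: str, high_value_customer: bool) -> str:
    """Map PLE's Tail Risk Score bucket and customer value to a Tron intake decision."""
    if tail_risk == "High":
        if high_value_customer:
            return "require_rehearsal_or_manual_approval"
        return "throttle_or_defer"
    if tail_risk == "Low":
        return "auto_approve"
    # Medium risk is not covered by the policy list; placeholder behavior.
    return "standard_review"
```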
2. Global Lane Pressure & Intake Throttling
• Input to Tron:
• Per-lane load index (“pile height”) and recent avalanche frequency.
• Tron policy:
• If a lane’s load/risk exceeds threshold:
• Reduce intake from Blueprint into that lane.
• Prefer routing new work to alternative lanes / templates when possible.
• If a lane is consistently low risk & underutilized:
• Allow Tron to steer more experimental / high-variance work there.
3. Controlled Burn Scheduling
• Input to Tron:
• PLE’s incident tail analysis and fuel load metrics:
• Components / repos / lanes driving the largest catastrophic incidents.
• Tron policy:
• Create maintenance / refactor epics and feed them into Blueprint.
• Reserve a fixed fraction of global capacity (e.g., 10–20%) for Tron-initiated controlled burns targeted at:
• Top incident sources.
• Lanes with worst tail exponents.
11.3 Configuration Ownership
• PLE implementation & metrics: owned by PLE/Analytics.
• PLE thresholds, policies, and “what to do when…”:
• Owned by Tron config:
• Tail-risk level → required rehearsal level.
• Lane load index → intake multiplier (e.g., 1.0, 0.5, 0.0).
• Incident tail severity → maintenance budget rules.
Tron remains the source of truth for:
• Global safety posture.
• How aggressive or conservative Verdict should be, given current tail risk and resource posture.
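As an illustration of what Tron-owned configuration could look like, the sketch below maps a lane load index to the intake multipliers mentioned above (1.0, 0.5, 0.0). The breakpoints, the assumption that the load index is normalized to roughly 0–1, and the rehearsal level names are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical Tron config; only the multiplier values come from this PRD.
TRON_PLE_CONFIG = {
    "rehearsal_by_tail_risk": {"Low": "none", "Med": "standard", "High": "full"},
    "lane_load_breakpoints": [0.7, 0.9],   # load index assumed normalized to ~0..1
    "intake_multipliers": [1.0, 0.5, 0.0],  # one more entry than breakpoints
}


def intake_multiplier(load_index, config=TRON_PLE_CONFIG):
    """Return the intake multiplier for the band that load_index falls into."""
    band = bisect_right(config["lane_load_breakpoints"], load_index)
    return config["intake_multipliers"][band]
```

Keeping thresholds in data rather than code matches the ownership split: PLE computes the load index, and Tron config decides what each band means.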
11.4 Open Questions (Tron-specific)
• Does Tron treat PLE as a hard gate (can block work), or as a strong advisory (humans can override)?
• Where do we surface Tron+PLE decisions in HMI so users understand:
• “This project was throttled because global tail-risk is elevated.”
• “This maintenance epic was auto-created by Tron based on PLE incident data.”
⸻
11.5 Alert vs Action Policies & Tron UI
11.5.1 Alert vs Action: Two Independent Settings
Tron treats PLE signals with two separate policy axes, both user-configurable: an Alert Level and an Action Level.
Alert Level: dropdown in Settings → “Tron / PLE Notifications”:
• Silent:
• No popups.
• Status bar only (icon + color).
• Tron View still records everything.
• Normal (default):
• Status bar indicator.
• Inline toast for High-severity events (e.g., PLE avalanche, lane overload).
• Tron View error list entries.
• Verbose:
• All of the above plus extra warnings for Medium-level anomalies.
• Suitable for debugging / power users.
Action Level: separate dropdown “Tron Actions on PLE Events”:
• Off
• Tron never blocks or throttles based on PLE.
• Only logs & visualizes (pure advisory).
• Advisory Only
• Tron recommends actions (slow lane, rehearse, split project) but does not enforce.
• UI shows “Recommended by Tron” badges; user may apply or ignore.
• Soft Gate
• For high tail-risk projects or active avalanches:
• Tron pauses execution at decision points and prompts the user:
“Tron recommends pausing intake to Code lane. Proceed anyway?”
• User can override on a per-case basis.
• Hard Gate
• Tron blocks risky operations unless explicitly overridden with a strong action (e.g., “Force Run Anyway”).
• Intended for high-stakes / enterprise / compliance-heavy environments.
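Because the two axes are independent, they can be modeled as two separate enums with one function each. This is a sketch under assumptions: severities are taken to be "High" / "Medium" / "Low" strings, the return labels are hypothetical, and gating is assumed to apply only to High-severity events.

```python
from enum import Enum


class AlertLevel(Enum):
    SILENT = "silent"
    NORMAL = "normal"
    VERBOSE = "verbose"


class ActionLevel(Enum):
    OFF = "off"
    ADVISORY = "advisory"
    SOFT_GATE = "soft_gate"
    HARD_GATE = "hard_gate"


def shows_toast(alert: AlertLevel, severity: str) -> bool:
    """Whether an inline popup is shown; the status bar and Tron View always record."""
    if alert is AlertLevel.SILENT:
        return False
    if alert is AlertLevel.NORMAL:
        return severity == "High"
    return severity in ("High", "Medium")  # VERBOSE


def tron_response(action: ActionLevel, severity: str) -> str:
    """What Tron does with a PLE event (assumes only High severity triggers a gate)."""
    if action is ActionLevel.OFF:
        return "log_only"
    if action is ActionLevel.ADVISORY or severity != "High":
        return "recommend"
    if action is ActionLevel.SOFT_GATE:
        return "prompt_user"
    return "block_until_override"
```

Note that `shows_toast` never consults the Action Level and `tron_response` never consults the Alert Level, which is what keeps the two settings independent.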
Design note: Both settings are per-user (for UX noise) but can also be workspace-locked by policy in enterprise mode.
⸻
11.5.2 Status Bar Indicators
During project execution (in Verdict IDE / HMI main window):
• A dedicated Tron/PLE status item appears in the status bar with:
• Icon state:
• ✅ Green: No active Tron warnings.
• 🟡 Yellow: Warnings (medium tail-risk or lane load).
• 🔴 Red: Active Tron errors / hard gates.
• Badge count: number of active Tron events affecting current project.
Behavior:
• Clicking the Tron/PLE status item opens the Tron View, focused on:
• The current project_run.
• The most recent warning/error if any.
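The icon state and badge count can be derived directly from the list of active Tron events for the current project. In this sketch each event is represented only by a "warning" or "error" string, which is a simplification for illustration.

```python
def status_bar_state(active_events):
    """Derive icon color and badge count from active 'warning' / 'error' events."""
    if "error" in active_events:
        icon = "red"      # active Tron errors / hard gates
    elif "warning" in active_events:
        icon = "yellow"   # medium tail-risk or lane load warnings
    else:
        icon = "green"    # no active Tron warnings
    return {"icon": icon, "badge_count": len(active_events)}
```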
⸻
11.5.3 Tron View (Global PLE / Tron Console)
The Tron View is a dedicated panel / window that complements Sequence and Tree views:
• Entry points:
• Status bar Tron icon.
• Clicking any “Tron error” or “Tron warning” link from:
• Run logs.
• Notifications.
• Blueprint / PLMS risk panel.
• Layout (conceptual):
flowchart TD
A[Tron Error Toast / Status Bar] --> B["Tron View\n(Global Control Panel)"]
B --> C["Project Tab\n(Current run-focused)"]
B --> D[Lanes Tab\nLane load & tail risk]
B --> E[Incidents Tab\nHistorical PLE/Tron events]
Project Tab (default when opened during a run):
• Summary strip:
• Tail Risk Score for the project.
• Current Action Level (Off / Advisory / Soft / Hard).
• Current Alert Level (Silent / Normal / Verbose).
• Event timeline:
• PLE signals (avalanche warnings, lane overloads, incident risk spikes).
• Tron decisions (throttled intake, enforced rehearsal, blocked operation).
• Details panel:
• Clicking an event shows:
• Raw PLE metrics (event size, Δtokens/time, lane load).
• Tron’s decision (what it did / suggested).
• If applicable, user override logged (“Force run at 12:32 by Trent”).
Lanes Tab:
• Table / dashboard:
• Each lane with:
• Load index (“pile height”).
• Tail-risk rating (Low / Med / High).
• Recent avalanche count.
• Buttons or links to:
• Open lane configuration.
• Schedule maintenance (controlled burns) if PLE suggests.
Incidents Tab:
• Historical list of Tron+PLE events:
• Sortable by severity, date, lane, project, customer.
• Clicking any row jumps to the detailed incident view (same as Project Tab detail panel, but historical).
⸻
11.5.4 Tron Error Click-Through Behavior
Whenever Tron generates an error or hard gate:
• UI shows a clickable message, e.g.:
“Tron: Hard gate activated for Project — PLE detected extreme tail-risk in Code lane. [View Details]”
• Clicking “View Details”:
• Opens Tron View → Project Tab, focused on:
• That project.
• That specific Tron/PLE event highlighted.
• User can:
• See full context: metrics, distributions, why it triggered.
• Change Action Level (e.g., temporarily drop from Hard Gate → Soft Gate).
• Choose to override (where allowed by policy).
⸻