PP Artifact Stack - Spec System

What documents exist, what each is for, and what loads as agent context. Each layer has a concrete excerpt in the Examples section below. A future iteration will add direct "open full file" links per excerpt.

Constitution (CLAUDE.md)
Always loaded
Tech stack, conventions, security & compliance posture, agent ownership rules, "done" criteria, prior-bug registry pointer. The agent-context anchor every other artifact rides on top of.
Target: under 500 lines · Rarely changes
App Spec
Loaded per app
Anchor doc per app. Purpose, scope (with major feature sections inside), architecture, data model, cross-cutting AC, non-goals, glossary. Tickets reference it. Constitution `@include`s it.
Target: 2-3 pages · Stable across sprints · Major features live as sections inside, not as separate apps
Ticket Spec
Loaded per task
Per-task agent payload. Type: Feature, Bug, Refactor, or Spike. In-scope file list (load-bearing for scope discipline), prior fixes to preserve, definition of done, PM-annotated risks. Every ticket carries an outcome metric - delivery without outcome verification is incomplete.
Target: 1 page · One ticket = one unit of agent work
Acceptance Criteria
Embedded in ticket
Behavioral (Gherkin GIVEN / WHEN / THEN) and static. Concrete values only: "phone 678-555-0100", not "valid phone." Breaking this single rule fails the ticket at Gate 1 review.
Maps 1:1 to test cases · Drives test design, not the other way around
Tests
Executable proof
TDD-first: tests written from AC before the build agent starts, not from code after. One named test per AC. For Bug tickets, a named regression_<short> that fails before the fix and passes after is required. The test agent runs the suite plus continuous unhappy-path attacks via cheap models (Ben's framing from the May 7 standup). QA validates AC coverage was sufficient at Gate 2.
Tests written FROM AC BEFORE build, not from code AFTER build (TDD-first)

What loads when

Every agent session: Constitution
Working on an app: Constitution + App Spec
Working a feature ticket: Constitution + App Spec + Ticket + AC
Working a bug ticket: Constitution + App Spec + Ticket + AC + Prior-bug registry
Running tests: AC + Prior-bug registry
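
The loading rules above can be written down as a lookup table. The following is a minimal sketch, not an existing PP module: the `Activity` enum, `CONTEXT_BY_ACTIVITY`, and `artifacts_for` are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Activity(Enum):
    """Hypothetical labels for the five situations listed above."""
    ANY_SESSION = "every agent session"
    APP_WORK = "working on an app"
    FEATURE_TICKET = "working a feature ticket"
    BUG_TICKET = "working a bug ticket"
    RUNNING_TESTS = "running tests"


# Artifact names are copied from the list above; the structure is an assumption.
CONTEXT_BY_ACTIVITY = {
    Activity.ANY_SESSION: ["Constitution"],
    Activity.APP_WORK: ["Constitution", "App Spec"],
    Activity.FEATURE_TICKET: ["Constitution", "App Spec", "Ticket", "AC"],
    Activity.BUG_TICKET: ["Constitution", "App Spec", "Ticket", "AC", "Prior-bug registry"],
    Activity.RUNNING_TESTS: ["AC", "Prior-bug registry"],
}


def artifacts_for(activity: Activity) -> list[str]:
    """Return a copy of the artifact list to load for the given activity."""
    return list(CONTEXT_BY_ACTIVITY[activity])
```

In this sketch the bug-ticket context differs from the feature-ticket context only by the Prior-bug registry, which matches the Bug variant described in this section.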
Concrete-values rule: Every AC uses real example values (phone 678-555-0100, not <phone>). Abstract AC fail Gate 1 regardless of how complete the rest of the spec is. This is the single most load-bearing discipline in the system.
Conflict precedence: When tiers disagree, higher tiers win: Constitution > App Spec > Ticket > AC. Constitution rules (security posture, "done" criteria, agent fleet) supersede anything an App Spec or Ticket asserts. App Spec non-goals supersede Ticket scope creep. When a Ticket wins a conflict with its own AC, the AC was wrong: update the AC rather than shipping the conflict resolution.
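
The precedence rule can be sketched as a small resolver. This is an illustration under stated assumptions: the `Assertion` shape, the keys used in the usage note, and both function names are hypothetical, and real conflicts between documents would need a human to identify what is being asserted.

```python
from dataclasses import dataclass

# Lower index = higher precedence, per the ordering stated above.
TIER_ORDER = ["Constitution", "App Spec", "Ticket", "AC"]


@dataclass(frozen=True)
class Assertion:
    tier: str      # one of TIER_ORDER
    key: str       # hypothetical identifier for the thing being asserted
    value: object  # what the tier says about it


def resolve(assertions: list[Assertion]) -> dict[str, Assertion]:
    """For each key, keep the assertion from the highest-precedence tier."""
    winners: dict[str, Assertion] = {}
    for a in assertions:
        current = winners.get(a.key)
        if current is None or TIER_ORDER.index(a.tier) < TIER_ORDER.index(current.tier):
            winners[a.key] = a
    return winners


def acs_to_fix(assertions: list[Assertion]) -> list[str]:
    """Keys where a Ticket and its AC disagree, so the AC needs updating."""
    ticket = {a.key: a.value for a in assertions if a.tier == "Ticket"}
    ac = {a.key: a.value for a in assertions if a.tier == "AC"}
    return sorted(k for k in ticket.keys() & ac.keys() if ticket[k] != ac[k])
```

For example, if a Ticket says `link_expiry_minutes` is 30 and an AC still says 10, `resolve` picks the Ticket value and `acs_to_fix` lists `link_expiry_minutes` as an AC to correct before shipping.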
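Variant: when the ticket type is Bug
The same five-layer stack runs, but two additional dependencies kick in, both pointing at the Prior-bug registry (the "named regressions we never break again" list referenced from the Constitution): (1) the ticket's "Prior fixes to preserve" field is populated from the registry entries for the affected area, and (2) on merge, the ticket's regression_<short> test is appended to the registry. Net effect: the bug variant turns each fix into permanent immunity for the next agent that touches the same code path.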


Examples

Examples below show both paths through the system: the building path (what becomes a ticket and ships) and the killing path (what gets rejected at Gate 0 before becoming a ticket). Without the killing path, the system only ships - it never learns to say no.

Gate 0 rejection example - the killing path

Killed at Gate 0
Constructed example. Most diagrams only show what gets shipped. This shows what gets killed before becoming a Ticket. The structural fix that prevents "killed forever" from becoming the default is the re-validate trigger at the bottom.
# Feature request: Agent leaderboard

Source:      Mario, 2026-05-22 standup
Status:      REJECTED at Gate 0
Owner:       PM (Aaron)

## Gate 0 evaluation

### What customer problem?
Mario's framing: "agents need more competitive incentive to push harder."

### What user?
Writing agents (Evan, Richie, downline).
# But: no agent has asked for this. Mario is a proxy, not the user.

### What evidence?
- Mario observed Evan saying he likes when his numbers beat the team average. (n=1)
- No data showing leaderboards drive behavior change in commissioned roles.
- No interviews validating that agents currently feel "uncompetitive."
- Existing commission structure already creates measurable competition.

## Verdict: REJECTED

Reasons:
1. No validated problem. We don't have evidence agents are under-motivated.
2. Anecdote-of-one. Mario observing Evan is not a customer signal.
3. Compliance risk. Leaderboards in life-insurance sales can incentivize
   shortcuts on health screen / suitability - landmine in NAIC AI Model Bulletin scope.
4. Opportunity cost. 1-2 weeks of build that could go toward
   Pricing API substrate (v1.5) or PAL Coach latency (P2).

## What happens next

- Defer, don't kill. Logged in ops/gate0-rejections.md.
- Re-validate trigger: revisit if "I'd push harder if I could see ranking"
  surfaces as an unprompted theme in 3+ agent interview transcripts, OR at 6-month
  product review, whichever comes first.
- Discovery output: if re-validated, this rejection note becomes the
  starting evidence base for the next Gate 0 attempt.
A Gate 0 rejection isn't a hard kill - it's a defer with an explicit re-validate trigger. The trigger is what prevents "killed forever" from becoming the default response to good ideas presented without evidence. Without rejected examples documented, every input becomes a build candidate, and the Build Trap reasserts itself.
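
The re-validate trigger in the example above is a two-condition check, which can be sketched as follows. The `Rejection` dataclass and `should_revalidate` are hypothetical names, and treating "6-month product review" as 183 days after the rejection date is an assumption made only for this sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rejection:
    title: str
    rejected_on: date
    min_transcripts: int = 3      # "3+ agent interview transcripts"
    review_after_days: int = 183  # assumption: approximates the 6-month product review


def should_revalidate(rejection: Rejection, unprompted_mentions: int, today: date) -> bool:
    """True when either trigger condition holds, whichever comes first."""
    theme_surfaced = unprompted_mentions >= rejection.min_transcripts
    review_due = (today - rejection.rejected_on).days >= rejection.review_after_days
    return theme_surfaced or review_due
```

With the leaderboard request rejected on 2026-05-22, two unprompted mentions a month later would not reopen it, while a third mention or the passage of roughly six months would.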

Constitution example

Constitution
Not yet drafted: the PP Constitution is the highest-leverage missing artifact in the stack. Below is the structure it should take when drafted. Greenfield: no existing file to excerpt.
# Peach Pilot Engineering Constitution
# Loaded as agent context on every session. Update only when stack, conventions, or "done" criteria change.

## 1. Stack
Frontend:      Next.js (App Router), TypeScript, Tailwind
Backend:       Postgres on GCP + GKE Autopilot
Auth:          Server-side RBAC (5 roles, see App Spec)
SMS / phone:   Twilio (A2P-registered)
Hosting:       GCP, namespace-per-tenant

## 2. Code conventions
- All PII columns encrypted-at-rest via pgcrypto AES-256
- Timestamps stored TIMESTAMPTZ in UTC, displayed in user's local TZ
- All HTTP responses UTF-8
- Audit log is append-only (no UPDATE / DELETE endpoint)
- <...>

## 3. Agent fleet
Implementation agent:   <Hermes agent name TBD>
Test agent:             <Hermes agent name TBD>
Deploy agent:           <Hermes agent name TBD>
Orchestrator:           MCP

## 4. Definition of Done
# Delivery checks (output-level)
- All AC use concrete values, no placeholders
- All AC tests pass
- All "Prior fixes to preserve" tests still pass
- No files outside the "In scope" list were modified
- Human reviewer signs off (Ben for prod merges)
- (Bug fix) Regression test added per Test plan
# Outcome checks (post-release loop)
- Outcome metric defined & telemetry instrumented (per ticket §8)
- Outcome reviewed at T+14d post-merge by PM
- Decision logged: did the metric move? If not, ticket reopens for re-discovery

## 5. Standing references
App Specs:            @include product/right-quote-app-spec.md
Glossary:             @include product/glossary.md
Prior-bug registry:   @include ops/prior-bug-registry.md
Severity model:       @include ops/severity-model.md
Target: under 500 lines. Lives at repo root. The @include directives load downstream artifacts so agents have full context without re-establishing it per session.
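
One Definition of Done item in the draft above, "No files outside the 'In scope' list were modified", lends itself to an automated check. The sketch below assumes the ticket's in-scope section is a bullet list of `- path (note)` lines, as in the ticket examples in this document, and that the list of changed files comes from something like `git diff --name-only`. Both function names are hypothetical.

```python
import re

# Matches "- some/path.ext (optional note)" and captures the path.
IN_SCOPE_LINE = re.compile(r"^-\s+(\S+)")


def parse_in_scope(section_text: str) -> set[str]:
    """Extract file paths from a ticket's 'In scope (files)' bullet list."""
    paths = set()
    for line in section_text.splitlines():
        match = IN_SCOPE_LINE.match(line.strip())
        if match:
            paths.add(match.group(1))
    return paths


def out_of_scope_changes(in_scope: set[str], changed: set[str]) -> list[str]:
    """Return changed files that the ticket did not list, sorted for stable output."""
    return sorted(changed - in_scope)
```

An empty result means the scope check passes; any returned path is a file the agent touched without the ticket authorizing it.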

App Spec example - Right Quote

App Spec
Excerpt from product/right-quote-app-spec.md. Right Quote is a single app; admin features live as a section inside its scope, not as a separate App Spec.
# Right Quote v1 - App Spec

Status: Draft · Owner (PM): Aaron · Tech lead: Ben · Version: 1.0 (MVP)

## 1. Purpose
Right Quote replaces SafeLife agents' swivel-chair quoting workflow with a single
browser app: login, lead intake, demographics, health screen, quote results,
secure SMS handoff for SSN + banking, application handoff.

## 2. Scope (v1 MVP, ship target 2026-05-20)
In scope:
  - Right Quote agent flow (6 screens)
  - Admin features (carriers, products + comp grids, agents + state auth, health logic)
    # NOT a separate app - admin sits at /admin/* in the same Next.js codebase, role-gated
  - Audit log of all admin writes
  - RBAC: 5 roles (super admin / admin / agent / agent+downline / PP staff)
  - Twilio SMS for SSN + banking secure entry

Out of scope (v1):
  - Real-time pricing API integration (substrate decision pending - v1.5)
  - Zendesk inbound webhook (v2)
  - PAL Coach overlay (v2, separate product)
  - <...>

## 5. Cross-cutting acceptance criteria
AC-X1 (Auth): GIVEN an unauthenticated HTTP request WHEN it hits any route under
  /app/* or /api/* THEN the server returns HTTP 401 with body
  {"error": "unauthenticated"} and browser requests get redirected to /login.

AC-X2 (RBAC): GIVEN a user with role agent and email richie@safelife.com
  WHEN they request GET /admin/carriers THEN the server returns HTTP 403 with
  body {"error": "forbidden"} and the admin nav does not render in response HTML.

<sections 3, 4, 6, 7, 8 omitted for excerpt>
The full file has 8 sections including data model, non-goals, open questions, and glossary. Admin features sit inside Section 2 (Scope) and Section 5 (cross-cutting AC), not as a sibling spec.

Ticket Spec example - Feature ticket

Ticket / Feature
# Health screen - branching follow-ups for diabetes

Type: Feature
Parent App Spec: right-quote-app-spec.md
Status: Spec Approved
Implementation agent: <Hermes implementation agent>

## 1. What this feature does
On the health screen, when the agent flags Diabetes as Yes, surface two
follow-up questions: insulin use (Yes/No) and last A1C (numeric input).
Submit blocks until both are answered.

## 2. Acceptance criteria
AC1: GIVEN agent richie@safelife.com on the Health screen for quote 789
  WHEN they select "Diabetes: Yes"
  THEN two new fields appear: "Taking insulin? (Yes/No)" and
  "Last A1C value (number, 4.0-15.0)".

AC2: GIVEN AC1 state and the two follow-ups are unanswered
  WHEN agent clicks "Continue to Quote"
  THEN button is disabled and tooltip reads
  "Answer all diabetes follow-ups before continuing."

AC3: GIVEN the agent submits insulin=Yes, A1C=8.2
  WHEN the screen submits
  THEN quote_health_responses row is written with
  {condition_id: 'diabetes', response: 'yes', follow_ups: {insulin: 'yes', a1c: 8.2}}.

## 3. In scope (files)
- src/components/HealthScreen/DiabetesFollowups.tsx (new)
- src/api/quotes/health/save.ts (modify - extend payload schema)
- prisma/schema.prisma (modify - add follow_ups JSONB column to quote_health_responses)

## 4. Out of scope
- Do not modify other condition follow-up flows (heart, cancer, etc.)
- Do not change health-elimination rules - separate ticket

## 5. Test plan
| AC | Test name | What it asserts |
|----|-----------|-----------------|
| AC1 | test_diabetes_yes_surfaces_followups | Two follow-up fields render |
| AC2 | test_continue_blocked_until_followups | Disabled state + tooltip text |
| AC3 | test_health_response_saves_with_followups | DB row matches expected JSON |

## 6. Prior fixes to preserve
<none - greenfield area>

## 7. Definition of done
[x] All AC use concrete values, no placeholders
[ ] All AC tests pass
[ ] No files outside "In scope" modified
[ ] Ben PR-reviews and signs off
[ ] Outcome metric instrumented (see §8)
[ ] Outcome reviewed T+14d, decision logged

## 8. Outcome metric
Metric:      Diabetic-flag quote completion rate
Baseline:    18% complete (last 30 days, current health screen, n=42 diabetic flags)
Target:      ≥ 35% complete within 14 days post-launch
Window:      14 days post-merge
Telemetry:   quote_health_responses joined to quotes (status='quoted')
Owner:       PM (Aaron) reviews T+14, decides if follow-ups helped or added friction
# If metric moves < +5pts: ticket reopens, re-discover whether follow-ups are the right intervention.
Concrete values throughout: real email, real quote ID, real condition strings, real A1C bounds. Translates 1:1 to test cases. Section 8 (outcome metric) anchors the swimlane's "PM owns outcome metric" reframe at the ticket level - without it, outcome ownership is aspirational only.
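
The T+14 outcome review in Section 8 can be sketched as a three-way decision. This is an illustration, not the PM's actual process: `outcome_decision` and its return labels are hypothetical, and the document does not say what happens when the metric moves by 5 points or more but misses the target, so that case is only labeled here and left to PM judgment.

```python
def outcome_decision(
    baseline_pct: float,
    observed_pct: float,
    target_pct: float,
    min_move_pts: float = 5.0,  # "If metric moves < +5pts: ticket reopens"
) -> str:
    """Classify a post-release metric reading for a metric that should go UP."""
    delta = observed_pct - baseline_pct
    if delta < min_move_pts:
        return "reopen"  # metric did not move enough: re-discover
    if observed_pct >= target_pct:
        return "target_met"
    # Assumption: this in-between case is not defined in the ticket template.
    return "moved_below_target"
```

For the diabetes follow-up ticket (baseline 18, target 35), a reading of 21 would reopen the ticket and a reading of 36 would meet the target. A metric that should go down, such as an abandonment rate, would need the comparisons reversed.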

Ticket Spec example - Bug ticket

Ticket / Bug
Constructed example: no real bugs filed yet (greenfield). Below shows the bug-ticket structure with the bug-specific fields populated.
# SMS handoff link expires too quickly during long calls

Type: Bug
Severity: P2 # per severity model: customer-impacting but workaround exists
Parent App Spec: right-quote-app-spec.md
Reported by: Evan (writing agent), 2026-05-22
Status: Spec Approved

## 1. Bug description
The Twilio SMS link sent to customers for SSN + banking entry expires after
10 minutes. On long calls (~25 min average), the link expires before the
customer reaches the banking step.

## 2. Reproduction steps
1. Agent starts quote for customer phone 678-555-0100 at 09:00:00
2. Agent clicks "Send banking link" at 09:01:00 (link generated)
3. Customer ignores SMS for 12 minutes (typical mid-call delay)
4. Customer taps SMS link at 09:13:00
5. Observed: "Session expired" error page
6. Expected: Banking entry form loads

## 3. Acceptance criteria
AC1: GIVEN sms_handoff_session created at 09:00:00 with default expiry
  WHEN customer taps the link at 09:25:00
  THEN the banking entry form loads (HTTP 200), no "Session expired" error.
  # Requires expiry longer than 25 minutes; this ticket sets it to 30 minutes (see AC2 and §4).

AC2: GIVEN sms_handoff_session for quote 789 created at 09:00:00
  WHEN customer taps the link at 09:31:00
  THEN they see "Session expired" with a "Request new link" button
  that posts to /api/quotes/789/banking/resend.

## 4. In scope (files)
- src/api/quotes/banking/handoff.ts (modify - increase expires_at to 30 min)
- src/components/Banking/SessionExpired.tsx (new - resend button)

## 5. Test plan
| AC | Test name | What it asserts |
|----|-----------|-----------------|
| AC1 | regression_sms_link_30min_validity | Link valid 25 min after issue |
| AC2 | test_expired_session_resend_button | Resend button posts to correct endpoint |

## 6. Prior fixes to preserve
<none in this area yet>

## 7. Registry update (post-fix)
On merge, append to ops/prior-bug-registry.md:
  - regression_sms_link_30min_validity: SMS handoff link must remain valid
    for at least 30 min from issue. (Bug #BUG-014, fixed 2026-05-23)

## 8. Outcome metric
Metric:      SMS-handoff abandonment rate (sessions created but not completed)
Baseline:    23% within 14 days (Zendesk telemetry, last 30 days pre-fix)
Target:      drops to < 15% within 14 days post-fix
Window:      14 days post-merge
Telemetry:   sms_handoff_sessions table, status='abandoned' / total
Owner:       PM (Aaron) reviews T+14, validates that link expiry was the actual driver
# If abandonment doesn't move: bug fix landed but bug wasn't the real driver. Re-discover.
Three things make this a Bug ticket: (1) named regression_* test in the test plan, (2) explicit Section 7 listing the registry update that follows the fix, (3) Section 8 outcome metric proving the fix actually moved the customer-visible problem. Without §8, the regression test passes and the registry grows, but you can't prove the bug was the real driver.

Acceptance Criteria example - concrete vs abstract

AC
The single rule: every AC uses real example values. Below shows the same intent written abstractly (fails Gate 1) and concretely (passes).

× Abstract (fails Gate 1)

AC1: GIVEN an authenticated user
  WHEN they submit a valid quote
  THEN a quote record is created with
  the appropriate status.

AC2: GIVEN an unauthorized user
  WHEN they hit a protected route
  THEN the system returns an error.

AC3: GIVEN the agent sends an SMS link
  WHEN it fails
  THEN the agent sees a fallback option.

✓ Concrete (passes Gate 1)

AC1: GIVEN agent richie@safelife.com
  WHEN they submit demographics
  {dob: "1962-03-15", zip: "30303", gender: "F"}
  THEN quotes row written with
  status="in_progress", agent_id="richie@safelife.com".

AC2: GIVEN role="agent"
  WHEN GET /admin/carriers
  THEN HTTP 403, body
  {"error": "forbidden"}.

AC3: GIVEN Twilio returns HTTP 500
  WHEN agent clicks "Send banking link"
  THEN UI shows "SMS could not be sent. Copy this link manually:"
  with copyable URL https://app.peachpilot.ai/banking/789.
Why concrete: agents fill blanks in abstract AC with assumptions that diverge from intent. "Valid quote" becomes whatever the agent decides is valid. The concrete version passes because every value can be asserted in a test.
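
A rough pre-check for the concrete-values rule could flag obvious abstractions before a human Gate 1 review. The sketch below is a heuristic only: the `VAGUE_TERMS` list is a hypothetical starting set drawn from the abstract examples above, `lint_ac` is an invented name, and a keyword scan will miss many abstract phrasings and can misfire on quoted UI copy.

```python
import re

# Hypothetical starter list taken from the abstract AC shown above.
VAGUE_TERMS = (
    "valid",
    "appropriate",
    "an error",
    "protected route",
    "fallback option",
    "authenticated user",
    "unauthorized user",
)

# Angle-bracket placeholders such as <phone>.
PLACEHOLDER = re.compile(r"<[A-Za-z_][\w \-]*>")


def lint_ac(text: str) -> list[str]:
    """Return findings for vague terms and placeholders in one AC."""
    findings = []
    lowered = text.lower()
    for term in VAGUE_TERMS:
        # Word boundaries keep "valid" from matching "validity" or "invalid".
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            findings.append(f"vague term: {term!r}")
    for match in PLACEHOLDER.finditer(text):
        findings.append(f"placeholder: {match.group(0)}")
    return findings
```

An empty list does not prove an AC is concrete. It only means none of the listed patterns appeared, so the reviewer still decides whether every value can be asserted in a test.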

Note on error paths: AC3 above shows one error code (Twilio HTTP 500). The concrete-values rule applies per AC, not per API call. A complete error-path suite would have separate ACs for HTTP 429 (rate limit), HTTP 503 (degraded service), network timeout, and A2P-revocation - each concrete, each with its own observable result. "One concrete example" is not the same as "concrete for the whole error surface."

Tests example - feature test + regression test

Tests
Feature tests are named after the AC they verify. Regression tests use the regression_<short> prefix and are mandatory for Bug tickets.
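# Assumes pytest and freezegun. `client`, `create_quote`, and `create_sms_handoff`
# are project test helpers that are not shown in this excerpt.
from freezegun import freeze_time

# Feature test (Feature ticket: diabetes follow-ups, AC1)
def test_diabetes_yes_surfaces_followups(authenticated_agent):
    quote = create_quote(agent=authenticated_agent, customer_phone="678-555-0100")
    response = client.post(f"/api/quotes/{quote.id}/health", json={
        "diabetes": "yes"
    })
    assert response.status_code == 200
    # Both follow-up fields from AC1 must render on the health screen.
    page = client.get(f"/app/quote/{quote.id}/health")
    assert "Taking insulin?" in page.text
    assert "Last A1C value" in page.text

# Regression test (Bug ticket: SMS link expiry, AC1)
# pytest collects only test* functions by default, so the regression_ prefix needs
# python_functions = test_* regression_* in the pytest configuration.
def regression_sms_link_30min_validity(twilio_mock):
    # This test FAILED before the fix. It MUST pass after.
    session = create_sms_handoff(quote_id=789, issued_at="2026-05-23T09:00:00Z")
    with freeze_time("2026-05-23T09:25:00Z"):  # 25 minutes after issue
        response = client.get(f"/banking/{session.id}")
    assert response.status_code == 200
    assert "Session expired" not in response.text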
Test names are stable identifiers. The Prior-bug registry references regression test names as the load-bearing link from registry entry to executable proof.
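
The "one named test per AC" rule, and the requirement that Bug tickets carry a `regression_*` test, can be checked against a ticket's test-plan table. The sketch below assumes the markdown table layout used in the ticket examples in this document. `parse_test_plan` and `check_plan` are hypothetical helpers, not part of any existing tooling.

```python
import re

# Matches rows such as "| AC1 | test_name | What it asserts |".
# The header and separator rows do not match because they lack an AC number.
ROW = re.compile(r"^\|\s*(AC\d+)\s*\|\s*([A-Za-z_]\w*)\s*\|")


def parse_test_plan(table_text: str) -> dict[str, str]:
    """Map each AC id in the test-plan table to its test name."""
    plan = {}
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            plan[match.group(1)] = match.group(2)
    return plan


def check_plan(ac_ids: list[str], plan: dict[str, str], ticket_type: str) -> list[str]:
    """Return problems: ACs without a named test, or a Bug ticket with no regression test."""
    problems = [f"{ac} has no named test" for ac in ac_ids if ac not in plan]
    if ticket_type == "Bug" and not any(name.startswith("regression_") for name in plan.values()):
        problems.append("Bug ticket has no regression_* test")
    return problems
```

Run against the bug ticket's table, this would return no problems. Adding an AC3 to the ticket without a matching row would surface as "AC3 has no named test".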

Prior-bug registry example

Registry
Schema not yet defined: the registry is referenced from the Constitution and from every Bug ticket's "prior fixes to preserve" field, but the file format and storage location are not yet decided. Below shows the structure proposed.
# PP Prior-Bug Registry
# Source of truth for "named regressions we never break again."
# Read by: every Bug ticket's "prior fixes to preserve" field, populated by area.
# Written by: agent (or PR author) when a Bug ticket merges.

## Right Quote / SMS handoff

- regression_sms_link_30min_validity
  Bug:      #BUG-014
  Fixed:    2026-05-23
  Symptom:  SMS handoff link expired before customers tapped it on long calls.
  Files:    src/api/quotes/banking/handoff.ts
  Test:     tests/banking/test_handoff.py::regression_sms_link_30min_validity
  Note:     Link expiry is 30 min from issue. Do not lower without product sign-off.

## Right Quote / RBAC

- regression_admin_route_returns_403_for_agent_role
  Bug:      #BUG-007
  Fixed:    2026-05-15
  Symptom:  /admin/carriers returned HTML page (with admin nav hidden) for agent role.
  Files:    src/middleware/rbac.ts
  Test:     tests/rbac/test_admin_routes.py::regression_admin_route_returns_403_for_agent_role
  Note:     Server-side 403 is the load-bearing check. Hiding nav is not authorization.
Format proposed: grouped by app + functional area, each entry has bug ID, fix date, symptom, files touched, test name, and a "do not undo" note. Final format pending decision - this is a draft schema for discussion.
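
If the draft format above were adopted, populating a Bug ticket's "prior fixes to preserve" field by area could look roughly like this. The parser is a sketch against the proposed layout only (`## area` headings, `- test_name` entries, indented `Key: value` lines). `RegistryEntry`, `parse_registry`, and `prior_fixes_for` are hypothetical names, and the code would need to change with the final schema.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    area: str
    test_name: str
    fields: dict[str, str] = field(default_factory=dict)


def parse_registry(text: str) -> list[RegistryEntry]:
    """Parse the draft registry layout into entries grouped by area heading."""
    entries: list[RegistryEntry] = []
    area = ""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            area = line[3:].strip()
            current = None
        elif line.startswith("- "):
            current = RegistryEntry(area=area, test_name=line[2:].strip())
            entries.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon so values like "path::test" stay intact.
            key, _, value = line.partition(":")
            current.fields[key.strip()] = value.strip()
    return entries


def prior_fixes_for(entries: list[RegistryEntry], area: str) -> list[str]:
    """Regression test names a ticket in this area must keep passing."""
    return [e.test_name for e in entries if e.area == area]
```

A ticket touching "Right Quote / RBAC" would then inherit `regression_admin_route_returns_403_for_agent_role` as a fix to preserve.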

Sources: product/right-quote-app-spec.md, product/template-app-spec.md, product/template-ticket-spec.md, May 6 impromptu Ben+Mario architecture chat (re: regression coverage as the v1-platform failure mode).
Greenfield items in this diagram (Constitution, Prior-bug registry schema, severity model): not yet drafted as of 2026-05-07. Highest-leverage missing artifact is the Constitution - drafting it unblocks per-screen Right Quote PRDs.
Future iteration: each example will get an "Open full file →" link in addition to the inline excerpt.