dev-puppeteer/docs/plan.md
2026-05-13 08:39:19 +09:00

Devflow Implementation Plan v3 r12

0. Document Status

  • This document supersedes v2 and all earlier v3 drafts where conflicting.
  • Single-user, single-machine assumption. No auth, no retention policy, no observability dashboards, no multi-tenancy.
  • Target OS: macOS 13+ / Linux. No Windows.
  • All paths are Unix-style. All times are stored UTC.
  • Decisions in this document are locked unless explicitly marked (provisional). Override requires updating this document, not only code.
  • r1 applies CC-1 through CC-5.
  • r2 applies CC-6 through CC-10.
  • r3 applies CC-11 through CC-15.
  • r4 applies CC-16 through CC-18.
  • r5 applies CC-19.
  • r6 applies CC-20.
  • r7 applies CC-21 through CC-23.
  • r8 applies CC-24 through CC-26.
  • r9 applies CC-27 through CC-28.
  • r10 applies CC-29 through CC-31.
  • r11 applies CC-32.
  • r12 applies CC-33 through CC-35.

1. Stack Decisions

1.1 Workspace

  • pnpm 9 with workspaces. No Turbo.
  • Node 22 LTS, pinned by .nvmrc and package.json#engines.
  • TypeScript 5.6 with project references via tsc -b.
  • strict: true.
  • No any unless accompanied by an explicit annotation comment explaining why.

1.2 Tooling

  • Build:
    • tsup for libraries, CJS + ESM dual output.
    • vite for apps/web.
    • tsx for apps/cli, apps/api, and apps/worker in dev.
    • node for prod-ish local runs.
  • Test:
    • vitest with workspace config.
    • Coverage via @vitest/coverage-v8.
    • No coverage gate at M1.
    • M9 adds coverage gate: >=70% lines on packages/core, packages/session, packages/run-engine.
  • Lint/format:
    • biome.
    • One root config.
  • Pre-commit:
    • lefthook.
    • Runs biome check --write on staged files.
    • Runs tsc -p tsconfig.typecheck.json --noEmit.
    • Runs related Vitest tests on changed packages.

1.3 Database

  • Postgres 16 via Docker Compose.
  • Drizzle ORM + drizzle-kit generate.
  • Generated SQL migrations are committed.
  • Migrations are never auto-applied at runtime except through the explicit migration runner invoked by devflow up.
  • Migration runner:
    • scripts/migrate.ts.
    • Takes DATABASE_URL.
    • devflow up waits for Postgres health and then runs pending migrations.

1.4 Logging

  • pino.
  • pino-pretty in dev, JSON otherwise.
  • Standard fields:
    • time
    • level
    • module
    • runId?
    • phaseId?
    • role?
    • eventId?
  • Levels:
    • trace: transcript chunks only.
    • debug: internal state transitions.
    • info: run events.
    • warn: recoverable errors.
    • error: human-required or fatal errors.

1.5 Config

  • Single Zod schema in packages/core/src/config.ts.
  • Source precedence, high to low:
    • process.env
    • .env.local
    • .env
    • schema defaults
  • Config is loaded once at process start, validated, frozen, and exported as typed Config.
  • Config validation failure is fatal.
  • Required keys at M1:
    • DATABASE_URL
    • WORKSPACE_ROOT
    • LOG_LEVEL
  • M5 adds:
    • TEMPORAL_ADDRESS
  • Path canonicalization:
    • WORKSPACE_ROOT is resolved through fs.realpathSync and stored as an absolute path at config load.
    • Any path entering the system must be canonicalized before storage or hashing.
    • repo_path and worktree_root rules are defined in section 4.
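
The source precedence above can be illustrated with a small merge function. This is only a sketch: resolveConfigSources and its parameter names are hypothetical, and real loading would also parse the .env files and validate the merged result against the Zod schema.

```typescript
// Hypothetical helper: merges raw key/value sources in the documented precedence order.
type RawSource = Record<string, string | undefined>;

export function resolveConfigSources(
  processEnv: RawSource,
  envLocal: RawSource,
  dotEnv: RawSource,
  defaults: RawSource,
): Readonly<Record<string, string>> {
  // Lowest precedence first, so later sources overwrite earlier ones.
  const ordered = [defaults, dotEnv, envLocal, processEnv];
  const merged: Record<string, string> = {};
  for (const source of ordered) {
    for (const [key, value] of Object.entries(source)) {
      if (value !== undefined) merged[key] = value;
    }
  }
  // Frozen to mirror the "loaded once, validated, frozen" rule.
  return Object.freeze(merged);
}

const resolved = resolveConfigSources(
  { LOG_LEVEL: "debug" },
  { LOG_LEVEL: "info", DATABASE_URL: "postgres://local/devflow" },
  { WORKSPACE_ROOT: "/tmp/devflow" },
  { LOG_LEVEL: "warn" },
);
```

In this example LOG_LEVEL comes from process.env, DATABASE_URL from .env.local, and WORKSPACE_ROOT from .env.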

Backend registration:

const BackendConfig = z.object({
  id: Backend,                       // codex | claude | fake
  enabled: z.boolean(),
  binaryPath: z.string().optional(), // resolved from PATH if absent; must resolve for codex/claude
});
  • fake is always available.
  • codex and claude are available only when:
    • enabled=true
    • binary resolves at process start.
  • Resolution failure:
    • doctor warns.
    • binding fails fast at run start with human_required:backend_unavailable.
  • Binding reads from config.backends, never directly from PATH.

1.6 HTTP

  • fastify 5.
  • @fastify/sensible.
  • SSE primary strategy:
    • Try fastify-sse-v2.
    • Fastify 5 compatibility is not assumed.
    • M1 includes a smoke test.
  • SSE fallback:
    • Native reply.raw.
    • Headers:
      • content-type: text/event-stream
      • cache-control: no-cache
      • connection: keep-alive
    • Write data: <json>\n\n.
    • Manage heartbeats and reconnect manually.
  • WebSocket is deferred unless SSE fails under transcript volume.
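
A minimal sketch of the frames the native fallback would write. The function names are hypothetical, and the heartbeat format is an assumption: it uses an SSE comment line, which event-stream clients ignore.

```typescript
// Headers listed for the native reply.raw fallback.
export const SSE_HEADERS: Record<string, string> = {
  "content-type": "text/event-stream",
  "cache-control": "no-cache",
  connection: "keep-alive",
};

// Serializes one event as a single `data:` frame terminated by a blank line.
// JSON.stringify never emits raw newlines, so the payload stays on one line.
export function formatSseData(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Assumed heartbeat: a comment line (leading ":"), which clients discard.
export function formatSseHeartbeat(): string {
  return ": heartbeat\n\n";
}

const frame = formatSseData({ type: "run.started", seq: 1 });
```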

2. Directory Layout

devflow/
├── package.json
├── pnpm-workspace.yaml
├── tsconfig.base.json
├── biome.json
├── lefthook.yml
├── vitest.workspace.ts
├── docker-compose.yml
├── .nvmrc
├── .env.example
├── docs/
│   ├── plan.md
│   ├── adr/
│   └── schemas/
│       ├── artifacts/
│       ├── personas/
│       └── templates/
├── scripts/
│   ├── migrate.ts
│   └── seed.ts
├── packages/
│   ├── core/
│   │   └── src/
│   │       ├── config.ts
│   │       ├── enums.ts
│   │       ├── hash.ts
│   │       ├── errors.ts
│   │       ├── template.ts
│   │       ├── persona.ts
│   │       ├── binding.ts
│   │       ├── prompt-envelope.ts
│   │       ├── artifact-schema.ts
│   │       ├── run-event.ts
│   │       └── index.ts
│   ├── db/
│   │   └── src/
│   │       ├── schema/
│   │       ├── migrations/
│   │       ├── repositories/
│   │       └── client.ts
│   ├── session/
│   │   └── src/
│   │       ├── adapter.ts
│   │       ├── fake.ts
│   │       ├── tmux.ts
│   │       ├── profiles/
│   │       │   ├── codex.ts
│   │       │   └── claude.ts
│   │       ├── recovery.ts
│   │       └── transcript.ts
│   ├── harness/
│   │   └── src/
│   │       ├── git.ts
│   │       ├── worktree.ts
│   │       ├── runner.ts
│   │       ├── review.ts
│   │       └── backtest.ts
│   ├── run-engine/
│   │   └── src/
│   │       ├── engine.ts
│   │       ├── phase-executor.ts
│   │       └── approval.ts
│   └── workflows/
│       └── src/
│           ├── workflow.ts
│           └── activities.ts
├── apps/
│   ├── api/
│   ├── web/
│   ├── cli/
│   └── worker/
└── tests/
    ├── e2e/
    └── fixtures/

3. devflow doctor

Exit codes:

  • 0: all green.
  • 1: one or more red checks.
  • 2: internal or unknown error.

Each check emits:

  • name
  • status: pass | fail | warn
  • detail
  • remediation
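
The check shape and exit codes above could be modelled as follows. This is a sketch under one stated assumption: "red" is read as status fail, so warn-only results leave the exit code at 0. Exit code 2 would come from an error handler around the whole command and is not shown.

```typescript
type CheckStatus = "pass" | "fail" | "warn";

interface CheckResult {
  name: string;
  status: CheckStatus;
  detail: string;
  remediation: string;
}

// Assumption: only "fail" counts as red; "warn" does not change the exit code.
export function doctorExitCode(results: CheckResult[]): 0 | 1 {
  return results.some((r) => r.status === "fail") ? 1 : 0;
}

const sample: CheckResult[] = [
  { name: "node", status: "pass", detail: "22.3.0", remediation: "" },
  { name: "codex", status: "warn", detail: "not in PATH", remediation: "install codex" },
];
```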

Closed check list:

  1. Node version satisfies >=22.0.0 <23.
  2. pnpm version >=9.0.0.
  3. tmux exists, version >=3.3.
  4. git version >=2.40.
  5. Docker daemon reachable.
  6. Postgres container running, pg_isready ok, DATABASE_URL connects.
  7. No pending Drizzle migrations.
  8. WORKSPACE_ROOT exists and is writable.
  9. .env resolves to valid Config.
  10. codex in PATH, warn-only.
  11. claude in PATH, warn-only.
  12. Free disk on WORKSPACE_ROOT partition:
    • warn under 10GB.
    • fail under 2GB.
    • target green threshold: >=5GB.

Output:

  • Human table by default.
  • --json for machine-readable output.
  • --quiet prints only checks whose status is not pass (warn or fail).
  • --list-orphans lists orphaned worktrees only; it never removes them.

4. Database Schema

First migration prelude:

CREATE EXTENSION IF NOT EXISTS pgcrypto;

All tables use gen_random_uuid() primary keys unless noted. All times are timestamptz. Mutable rows include updated_at. JSON columns use jsonb.

4.1 workflow_templates

  • id uuid primary key default gen_random_uuid()
  • name text not null
  • version int not null
  • hash text not null unique
  • definition jsonb not null
  • created_at timestamptz not null default now()
  • unique (name, version)

4.2 agent_personas

  • id uuid primary key default gen_random_uuid()
  • name text not null
  • version int not null
  • hash text not null unique
  • definition jsonb not null
  • created_at timestamptz not null default now()
  • unique (name, version)

4.3 runs

  • id uuid primary key default gen_random_uuid()
  • template_id uuid not null references workflow_templates(id)
  • template_hash text not null
  • state text not null
  • repo_path text not null
    • canonical absolute path
    • resolved through fs.realpathSync before insert
  • base_branch text not null
  • worktree_root text not null
    • canonical absolute path under WORKSPACE_ROOT/<runId>/
  • current_phase_id uuid references run_phases(id) nullable and deferrable
  • started_at timestamptz
  • ended_at timestamptz
  • final_report_path text
  • paused_from_state text
    • set when transitioning to paused
    • cleared on resume
    • null when state is not paused
  • created_at timestamptz not null default now()
  • updated_at timestamptz

Active-run uniqueness:

CREATE UNIQUE INDEX ux_active_run_repo_base
ON runs (repo_path, base_branch)
WHERE state NOT IN ('completed', 'failed', 'aborted');

4.4 run_inputs

  • id uuid primary key default gen_random_uuid()
  • run_id uuid not null unique references runs(id) on delete cascade
  • requirements_md text not null
  • objective jsonb
  • extra jsonb
  • input_hash text not null

input_hash is the run input hash defined in section 6.2. It is based on:

  • template_hash
  • sorted binding hashes
  • requirements_md
  • objective
  • extra
  • canonical repo_path
  • base_branch

4.5 run_bindings

  • id uuid primary key default gen_random_uuid()
  • run_id uuid not null references runs(id) on delete cascade
  • role_id text not null
  • persona_id uuid not null references agent_personas(id)
  • persona_hash text not null
  • backend text not null
  • binding_hash text not null
  • unique (run_id, role_id)

4.6 run_phases

  • id uuid primary key default gen_random_uuid()
  • run_id uuid not null references runs(id) on delete cascade
  • phase_key text not null
  • seq int not null
  • state text not null
  • attempts int not null default 0
  • started_at timestamptz
  • ended_at timestamptz
  • unique (run_id, phase_key)

4.7 run_events

Append-only.

  • id bigserial primary key
  • run_id uuid not null references runs(id) on delete cascade
  • phase_id uuid references run_phases(id)
  • seq bigint not null
  • type text not null
  • payload jsonb not null
  • idempotency_key text not null
  • ts timestamptz not null default now()
  • unique (run_id, seq)
  • unique (run_id, idempotency_key)
  • index (run_id, ts)

Concurrency:

  • All inserts go through RunEventRepository.append().
  • Raw SQL inserts into run_events are forbidden.
  • append() takes pg_advisory_xact_lock(hash64('devflow:run-events', run_id)).
  • Inside that same transaction it assigns:
seq := COALESCE(MAX(seq), 0) + 1
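
The semantics of append() can be modelled in memory to make the seq and idempotency rules concrete. This is a sketch, not the repository implementation. advisoryLockKey is a hypothetical stand-in for hash64 (the first 8 bytes of SHA-256 read as a signed 64-bit integer), and returning the existing row for a repeated idempotency_key is an assumption, since the schema only states the unique constraint.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for hash64(): a signed 64-bit key suitable for pg_advisory_xact_lock.
export function advisoryLockKey(namespace: string, id: string): bigint {
  const digest = createHash("sha256").update(`${namespace}:${id}`).digest();
  return digest.readBigInt64BE(0);
}

interface StoredEvent {
  runId: string;
  seq: bigint;
  type: string;
  idempotencyKey: string;
}

// In-memory model of the append rules; the real repository would run inside one transaction.
export class InMemoryRunEvents {
  private rows: StoredEvent[] = [];

  append(runId: string, type: string, idempotencyKey: string): StoredEvent {
    // Assumed behavior: a duplicate (run_id, idempotency_key) returns the stored row.
    const existing = this.rows.find(
      (r) => r.runId === runId && r.idempotencyKey === idempotencyKey,
    );
    if (existing) return existing;
    // seq := COALESCE(MAX(seq), 0) + 1, scoped to the run.
    const maxSeq = this.rows
      .filter((r) => r.runId === runId)
      .reduce((max, r) => (r.seq > max ? r.seq : max), 0n);
    const row: StoredEvent = { runId, seq: maxSeq + 1n, type, idempotencyKey };
    this.rows.push(row);
    return row;
  }
}
```

The advisory lock matters because MAX(seq) + 1 is only safe when appends for the same run are serialized; the in-memory model gets that for free from being single-threaded.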

4.8 approval_requests

  • id uuid primary key default gen_random_uuid()
  • run_id uuid not null references runs(id)
  • phase_id uuid references run_phases(id)
  • gate_key text not null
  • state text not null
  • idempotency_key text not null
  • payload jsonb not null
  • created_at timestamptz not null default now()
  • resolved_at timestamptz
  • unique (idempotency_key)

4.9 approval_decisions

Append-only and immutable.

  • id uuid primary key default gen_random_uuid()
  • approval_request_id uuid not null references approval_requests(id)
  • action text not null
    • approve
    • reject
    • request_changes
    • abort
  • comment text
  • decided_at timestamptz not null default now()
  • idempotency_key text not null unique

pause is not an approval decision.

4.10 tui_sessions

  • id uuid primary key default gen_random_uuid()
  • run_id uuid not null references runs(id) on delete cascade
  • role_id text not null
  • backend text not null
  • cwd text not null
  • expected_artifact_path text
  • expected_schema text
  • last_prompt_hash text
  • last_prompt_at timestamptz
  • last_capture_seq bigint not null default 0
  • last_known_pane_pid int
  • tmux_session text
  • tmux_window text
  • state text not null
  • recovery_attempts int not null default 0
  • unique (run_id, role_id)

4.11 tui_transcript_chunks

Append-only.

  • id bigserial primary key
  • session_id uuid not null references tui_sessions(id) on delete cascade
  • seq bigint not null
  • content text not null
  • captured_at timestamptz not null default now()
  • unique (session_id, seq)

4.12 artifacts

  • id uuid primary key default gen_random_uuid()
  • run_id uuid not null references runs(id) on delete cascade
  • phase_id uuid references run_phases(id)
  • path text not null
  • schema_id text not null
  • hash text not null
  • valid boolean not null
  • validation_error jsonb
  • created_at timestamptz not null default now()
  • unique (run_id, path, hash)

4.13 commands

  • id uuid primary key default gen_random_uuid()
  • run_id uuid not null references runs(id) on delete cascade
  • phase_id uuid references run_phases(id)
  • kind text not null
    • git
    • test
    • e2e
    • doctor
    • backtest
    • other
  • argv text[] not null
  • cwd text not null
  • exit_code int
  • stdout_path text
  • stderr_path text
  • started_at timestamptz
  • ended_at timestamptz

4.14 review_findings

  • id uuid primary key default gen_random_uuid()
  • run_id uuid not null references runs(id) on delete cascade
  • phase_id uuid references run_phases(id)
  • reviewer_role text not null
  • severity text not null
    • info
    • low
    • medium
    • high
    • critical
  • category text not null
    • correctness
    • evidence
    • style
    • security
    • performance
    • other
  • file_path text
  • line int
  • summary text not null
  • evidence text
  • verifier_status text not null default 'unverified'
    • unverified
    • confirmed
    • rejected
  • created_at timestamptz not null default now()

4.15 Backtest Stub Tables

backtest_iterations and backtest_metrics are created at M1 as stub tables:

  • id uuid primary key default gen_random_uuid()
  • run_id uuid not null references runs(id) on delete cascade
  • payload jsonb
  • created_at timestamptz not null default now()

Full schema is deferred to M12.

5. Enums

All enums live in packages/core/src/enums.ts as TypeScript const objects and Zod enums.

5.1 Backend

  • codex
  • claude
  • fake

Future gemini support adds an enum entry and a BackendProfile; no design change.

5.2 Capability

  • spec_write
  • phase_planning
  • task_dag_planning
  • code_edit
  • test_first_development
  • code_review
  • evidence_check
  • command_execute
  • backtest_run
  • metric_extract
  • failure_mining
  • objective_eval
  • final_report_compose

5.3 RiskLevel

  • low
  • medium
  • high

Risk is declared per phase in the template. Persona has maxRiskLevel. Binding fails when phase.risk > persona.maxRiskLevel.
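
The comparison phase.risk > persona.maxRiskLevel needs an explicit ordering, because the enum values are strings. A minimal sketch, with riskExceeds as a hypothetical helper name:

```typescript
// Ordered from lowest to highest risk, matching the enum listing.
const RISK_ORDER = ["low", "medium", "high"] as const;
type RiskLevel = (typeof RISK_ORDER)[number];

// True when the phase risk is strictly above the persona ceiling, i.e. binding must fail.
export function riskExceeds(phaseRisk: RiskLevel, maxRiskLevel: RiskLevel): boolean {
  return RISK_ORDER.indexOf(phaseRisk) > RISK_ORDER.indexOf(maxRiskLevel);
}
```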

5.4 ApprovalDecisionAction

  • approve
  • reject
  • request_changes
  • abort

pause is a run-level control operation, not an approval decision.

5.5 ApprovalState

  • pending
  • approved
  • rejected
  • changes_requested
  • aborted
  • paused

paused is entered only through a gate timeout (section 14.1). It never represents an automatic approve or reject.

5.6 RunState

  • created
  • bound
  • planning
  • awaiting_approval
  • executing
  • paused
  • completed
  • failed
  • aborted

5.7 RunPhaseState

  • pending
  • running
  • awaiting_artifact
  • validating
  • awaiting_approval
  • completed
  • failed
  • skipped

5.8 SessionState

  • CREATED
  • BOOTSTRAPPING
  • READY
  • BUSY
  • WAITING_FOR_APPROVAL
  • ARTIFACT_TIMEOUT
  • HUNG
  • CRASHED
  • RESUMING
  • REBOOTSTRAPPED
  • FAILED_NEEDS_HUMAN

6. Content-Addressed Hashing

6.1 Canonical JSON

  • Object keys sorted lexicographically by UTF-16 code units.
  • No insignificant whitespace.
  • Strings use standard JSON escaping.
  • No Unicode normalization.
  • Numbers use shortest round-trippable representation.
  • Integers have no decimal point.
  • No leading zeros.
  • Arrays preserve order.
  • No trailing newline.

packages/core/src/hash.ts exports:

canonicalize(value: unknown): string
hash(value: unknown): string

hash() returns sha256hex(canonicalize(value)).
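
A minimal sketch of the two exports, following the canonical JSON rules in 6.1. Two behaviors are assumptions rather than stated rules: object properties whose value is undefined are dropped (as JSON.stringify does), and non-finite numbers and other non-JSON values throw.

```typescript
import { createHash } from "node:crypto";

export function canonicalize(value: unknown): string {
  if (value === null) return "null";
  switch (typeof value) {
    case "boolean":
      return value ? "true" : "false";
    case "number":
      // Assumption: NaN and Infinity are rejected because JSON cannot represent them.
      if (!Number.isFinite(value)) throw new TypeError("non-finite number");
      return JSON.stringify(value); // shortest round-trippable form
    case "string":
      return JSON.stringify(value); // standard JSON escaping, no normalization
    case "object": {
      if (Array.isArray(value)) {
        return `[${value.map((item) => canonicalize(item)).join(",")}]`;
      }
      const obj = value as Record<string, unknown>;
      // The default string sort compares UTF-16 code units.
      const keys = Object.keys(obj)
        .filter((key) => obj[key] !== undefined)
        .sort();
      const members = keys.map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
      return `{${members.join(",")}}`;
    }
    default:
      throw new TypeError(`unsupported type: ${typeof value}`);
  }
}

export function hash(value: unknown): string {
  return createHash("sha256").update(canonicalize(value), "utf8").digest("hex");
}
```

Because keys are sorted before serialization, hash({ a: 1, b: 2 }) and hash({ b: 2, a: 1 }) produce the same digest.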

6.2 Hash Subjects

  • Template hash:
    • { name, version, roles, phases, gates, capabilitiesRequired }
  • Persona hash:
    • { name, version, capabilities, backend, maxRiskLevel, allowedRoles, promptConfig, modelConfig }
  • Binding hash:
    • { runId, roleId, templateHash, personaHash, backend, override }
  • Run input hash:
    • { templateHash, bindings: sorted[bindingHash], requirementsMd, objective, repoPath, baseBranch, extra }
  • Prompt hash:
    • { runId, roleId, phaseKey, expectedArtifact, expectedSchema, instructions, attempt }
  • Artifact hash:
    • SHA-256 of file bytes.

Prompt hash uses phaseKey, not phaseId, because PromptEnvelope carries phaseKey.

7. Template, Persona, Binding

7.1 Template Schema

const TemplatePhase = z.object({
  key: z.string(),
  title: z.string(),
  risk: RiskLevel,
  roles: z.array(z.string()),
  expectedArtifact: z
    .object({
      path: z.string(),
      schema: z.string(),
    })
    .optional(),
  gates: z.array(z.string()).default([]),
  timeoutMs: z.number().int().positive().optional(),
});

const TemplateRole = z.object({
  id: z.string(),
  requiredCapabilities: z.array(Capability),
  preferredBackends: z.array(Backend).default([]),
  count: z.number().int().min(1).default(1),
  diversity: z
    .object({
      requireDifferentBackends: z.boolean().default(false),
    })
    .optional(),
});

const Template = z.object({
  name: z.string(),
  version: z.number().int().positive(),
  roles: z.array(TemplateRole),
  phases: z.array(TemplatePhase),
  defaultGates: z.array(z.string()).default([]),
});

7.2 Persona Schema

const Persona = z.object({
  name: z.string(),
  version: z.number().int().positive(),
  backend: Backend,
  capabilities: z.array(Capability),
  maxRiskLevel: RiskLevel,
  allowedRoles: z.array(z.string()).optional(),
  promptConfig: z
    .object({
      systemPrompt: z.string().optional(),
      instructionsPrelude: z.string().optional(),
    })
    .default({}),
  modelConfig: z.record(z.string(), z.unknown()).default({}),
});

7.3 Override Semantics

  • Override may swap persona for a role.
  • Override may constrain backend to a specific allowed backend.
  • Override cannot add capabilities.
  • Override cannot raise risk above persona maxRiskLevel.
  • Diversity rules apply after override.
  • Lock-time validation runs the full binding algorithm.
  • On first binding failure, the run does not start.

7.4 Binding Algorithm

For each role:

  1. Select override persona if present; otherwise run autoSelect.
  2. Assert backend is enabled in config.backends.
  3. Assert non-fake backend binary resolved at process start.
  4. Assert role id is in allowedRoles, unless allowedRoles is absent.
  5. Assert required capabilities are a subset of persona capabilities.
  6. Assert every phase using the role has risk <= persona maxRiskLevel.
  7. Expand roles with count > 1 into roleId#0, roleId#1, etc.
  8. Enforce diversity rules after expansion.
  9. Compute and persist binding_hash per role instance.

autoSelect is deterministic. Sort candidates by:

  1. role preferredBackends order.
  2. persona.version desc.
  3. persona.name asc.
  4. persona.hash asc.

Personas whose backend is not in preferredBackends are eligible only if all preferred-backend personas fail capability or risk checks.

Binding fails with human_required:no_eligible_persona if no persona satisfies requirements.
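
The autoSelect ordering can be sketched as a comparator. The sketch assumes its input already contains only personas that passed the capability and risk checks, so ranking non-preferred backends last is enough to satisfy the fallback rule. Candidate and compareText are hypothetical names.

```typescript
type Backend = "codex" | "claude" | "fake";

interface Candidate {
  name: string;
  version: number;
  hash: string;
  backend: Backend;
}

// Compares by UTF-16 code units to avoid locale-dependent ordering.
function compareText(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// `eligible` is assumed to hold only personas that passed capability and risk checks.
export function autoSelect(
  eligible: Candidate[],
  preferredBackends: Backend[],
): Candidate | undefined {
  const rank = (c: Candidate): number => {
    const index = preferredBackends.indexOf(c.backend);
    return index === -1 ? preferredBackends.length : index; // non-preferred sort last
  };
  const sorted = [...eligible].sort(
    (a, b) =>
      rank(a) - rank(b) ||
      b.version - a.version ||
      compareText(a.name, b.name) ||
      compareText(a.hash, b.hash),
  );
  return sorted[0]; // undefined maps to human_required:no_eligible_persona
}
```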

7.5 Seeding

Personas:

  • docs/schemas/personas/<name>@<version>.yaml
  • filename encodes immutable identity.
  • loader parses with Persona schema.
  • loader computes personaHash.
  • loader upserts keyed by (name, version).
  • hash mismatch on an existing row is fatal.

Templates:

  • docs/schemas/templates/<name>@<version>.yaml
  • same immutable version rule.

Deleting a published file is allowed only when no run references that hash.

8. Session Runtime

8.1 SessionAdapter Interface

export interface SessionAdapter {
  start(input: StartInput): Promise<SessionHandle>;
  sendPrompt(handle: SessionHandle, envelope: PromptEnvelope): Promise<{ promptId: string }>;
  probe(handle: SessionHandle): Promise<ProbeResult>;
  resume(handle: SessionHandle): Promise<SessionHandle>;
  rebootstrap(handle: SessionHandle): Promise<SessionHandle>;
  capture(handle: SessionHandle, fromSeq: bigint): AsyncIterable<TranscriptChunk>;
  dispose(handle: SessionHandle): Promise<void>;
}

export interface StartInput {
  runId: string;
  roleId: string;
  backend: Backend;
  cwd: string;
  expectedArtifactPath?: string;
  expectedSchema?: string;
  envelopePrelude?: string;
}

export interface SessionHandle {
  sessionId: string;
  pid?: number;
  tmuxSession?: string;
  tmuxWindow?: string;
}

export interface ProbeResult {
  alive: boolean;
  paneActive: boolean;
  lastOutputAt?: Date;
  hint?: string;
}

export interface TranscriptChunk {
  seq: bigint;
  content: string;
  capturedAt: Date;
}

8.2 Session State Machine

  • CREATED -> BOOTSTRAPPING -> READY
  • READY <-> BUSY
  • BUSY -> WAITING_FOR_APPROVAL
  • BUSY -> ARTIFACT_TIMEOUT
  • BUSY -> HUNG
  • BUSY -> CRASHED
  • HUNG | CRASHED | ARTIFACT_TIMEOUT -> RESUMING -> READY
  • RESUMING -> REBOOTSTRAPPED -> READY
  • Recovery counters exhausted (section 8.3) -> FAILED_NEEDS_HUMAN
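
The transitions above can be encoded as a lookup table. This is a sketch with two assumptions: exhaustion may move any state other than FAILED_NEEDS_HUMAN into FAILED_NEEDS_HUMAN, and WAITING_FOR_APPROVAL has no other outgoing edge because none is listed. canTransition is a hypothetical helper name.

```typescript
type SessionState =
  | "CREATED" | "BOOTSTRAPPING" | "READY" | "BUSY" | "WAITING_FOR_APPROVAL"
  | "ARTIFACT_TIMEOUT" | "HUNG" | "CRASHED" | "RESUMING" | "REBOOTSTRAPPED"
  | "FAILED_NEEDS_HUMAN";

const TRANSITIONS: Record<SessionState, readonly SessionState[]> = {
  CREATED: ["BOOTSTRAPPING"],
  BOOTSTRAPPING: ["READY"],
  READY: ["BUSY"],
  BUSY: ["READY", "WAITING_FOR_APPROVAL", "ARTIFACT_TIMEOUT", "HUNG", "CRASHED"],
  WAITING_FOR_APPROVAL: [], // no outgoing edge is listed in the plan
  ARTIFACT_TIMEOUT: ["RESUMING"],
  HUNG: ["RESUMING"],
  CRASHED: ["RESUMING"],
  RESUMING: ["READY", "REBOOTSTRAPPED"],
  REBOOTSTRAPPED: ["READY"],
  FAILED_NEEDS_HUMAN: [],
};

export function canTransition(from: SessionState, to: SessionState): boolean {
  // Assumption: exhaustion can fail a session from any state except the failed state itself.
  if (to === "FAILED_NEEDS_HUMAN") return from !== "FAILED_NEEDS_HUMAN";
  return TRANSITIONS[from].includes(to);
}
```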

8.3 Recovery Counters

  • sendPrompt retry: 2.
    • This means one initial send plus two adapter-level retries, for at most three physical send attempts.
  • resume retry: 2.
  • rebootstrap retry: 1.
  • artifact repair retry: 1.
  • max hung time: configurable; default 20 minutes.

Exhaustion creates a human gate with recoveryHint.

8.4 SessionManager Singleton

  • M4: hosted in apps/api.
  • M5+: hosted in apps/worker.
  • Only SessionManager may call mutating SessionAdapter methods.
  • Holds in-memory Map<sessionId, SessionHandle>.
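  • Takes pg_advisory_lock(hash64('devflow:session-manager')), acquired through the non-blocking pg_try_advisory_lock variant, because plain pg_advisory_lock waits for the current holder instead of failing.
  • A second instance that cannot acquire the lock exits with code 3.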
  • On start:
    • query non-terminal tui_sessions.
    • call adapter.resume(handle).
    • success: place handle in map.
    • failure: session -> FAILED_NEEDS_HUMAN, append session.failed, create recovery gate.
  • On SIGTERM/SIGINT:
    • refuse new prompts.
    • allow in-flight artifact polling up to 30s.
    • persist last_capture_seq.
    • release advisory lock.

9. Prompt Envelope

9.1 Wire Format

DEVFLOW_PROMPT_BEGIN <uuid>
Run: <run-id>
Role: <role-id>
Phase: <phase-key>
Attempt: <int>
Expected artifact: <absolute-path>
Expected schema: <schema-id>
Dedup-Key: <prompt-hash>
Instructions:
<freeform multi-line instructions>
DEVFLOW_PROMPT_END <uuid>
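
A minimal sketch of rendering an envelope into this wire format. renderEnvelope is a hypothetical helper, and the field names follow the PromptEnvelope schema in 9.2.

```typescript
interface PromptEnvelope {
  uuid: string;
  runId: string;
  roleId: string;
  phaseKey: string;
  attempt: number;
  expectedArtifact: string;
  expectedSchema: string;
  dedupKey: string;
  instructions: string;
}

// Emits the header lines in the documented order, then the free-form instructions,
// and closes with the END marker carrying the same uuid as the BEGIN marker.
export function renderEnvelope(e: PromptEnvelope): string {
  return [
    `DEVFLOW_PROMPT_BEGIN ${e.uuid}`,
    `Run: ${e.runId}`,
    `Role: ${e.roleId}`,
    `Phase: ${e.phaseKey}`,
    `Attempt: ${e.attempt}`,
    `Expected artifact: ${e.expectedArtifact}`,
    `Expected schema: ${e.expectedSchema}`,
    `Dedup-Key: ${e.dedupKey}`,
    "Instructions:",
    e.instructions,
    `DEVFLOW_PROMPT_END ${e.uuid}`,
  ].join("\n");
}
```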

9.2 Schema

const PromptEnvelope = z.object({
  uuid: z.string().uuid(),
  runId: z.string().uuid(),
  roleId: z.string(),
  phaseKey: z.string(),
  attempt: z.number().int().nonnegative(),
  expectedArtifact: z.string(),
  expectedSchema: z.string(),
  dedupKey: z.string(),
  instructions: z.string(),
});

9.3 Rules

  • Prompt identity is dedupKey.
  • Adapter treats duplicate dedupKey for the same session within a run lifetime as idempotent success and does not reprocess the prompt.
  • attempt increments only when the engine intentionally re-sends after timeout or repair.
  • Adapter-level retry does not increment attempt.
  • Completion is never inferred from transcript text.
  • Completion requires a schema-valid artifact.

9.4 Backend Prelude

Sent once at session bootstrap before the first envelope.

Required structure:

  1. Backend identity statement.
  2. Persona instructionsPrelude.
  3. Protocol declaration: completion is signaled only by writing expected artifact files.
  4. Envelope marker contract.
  5. Approval/probe contract: on receiving DEVFLOW_PROBE, the agent must respond with a single line, either READY or BUSY <reason>.

Codex and Claude-specific addenda live in packages/session/src/profiles/{codex,claude}.ts and are populated at M10.

10. Artifact Schema Registry

10.1 Layout

JSON Schema 2020-12 documents live at:

docs/schemas/artifacts/<schema_id>.json

schema_id format:

<domain>/<name>@<version>

Examples:

  • dev/spec@1
  • dev/phase-plan@1
  • dev/dag@1
  • dev/review-finding-batch@1
  • bt/objective@1
  • bt/iteration-result@1
  • common/final-report@1
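
A sketch of parsing a schema_id and deriving its file path from the layout above. The allowed character set (lowercase letters, digits, hyphens) is an assumption inferred from the examples, and both function names are hypothetical.

```typescript
interface SchemaId {
  domain: string;
  name: string;
  version: number;
}

// Assumption: domain and name use lowercase letters, digits, and hyphens, as in the examples.
const SCHEMA_ID_PATTERN = /^([a-z0-9-]+)\/([a-z0-9-]+)@([1-9][0-9]*)$/;

export function parseSchemaId(id: string): SchemaId {
  const match = SCHEMA_ID_PATTERN.exec(id);
  if (!match) throw new Error(`invalid schema id: ${id}`);
  return { domain: match[1], name: match[2], version: Number(match[3]) };
}

// Validates the id first, then maps it onto docs/schemas/artifacts/<schema_id>.json.
export function schemaFilePath(id: string): string {
  parseSchemaId(id);
  return `docs/schemas/artifacts/${id}.json`;
}
```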

10.2 Loader

packages/core/src/artifact-schema.ts exports:

function loadSchema(id: string): JsonSchema;
function validateArtifact(
  id: string,
  data: unknown
): { ok: true } | { ok: false; errors: ValidationError[] };

Unknown schema id is fatal.

10.3 Validation Flow

  1. Engine waits for expectedArtifactPath to appear.
  2. Debounce 500ms after last mtime change.
  3. Read file.
  4. Compute SHA-256.
  5. Validate against expectedSchema.
  6. Valid:
    • insert artifact row with valid=true.
    • append artifact.validated.
    • advance phase.
  7. Invalid:
    • insert artifact row with valid=false.
    • append artifact.invalid.
    • trigger one repair prompt.
    • after repair exhaustion, create human gate.
  8. Timeout:
    • append artifact.timeout.
    • probe session.
    • enter recovery flow.

10.4 Final Report

At terminal run state, write atomically:

  • <WORKSPACE_ROOT>/<runId>/<runId>.report.md
  • <WORKSPACE_ROOT>/<runId>/<runId>.report.json

Both are written on a best-effort basis even when the run ends in failed or aborted.

common/final-report@1 minimum fields:

  • runId
  • templateHash
  • bindings[]
  • inputs
  • phases[]
  • approvals[]
  • findings[]
  • commands[]
  • artifacts[]
  • events.tail
  • unresolved[]
  • endedAt
  • status

10.5 Backtest Objective Stub

bt/objective@1:

{
  "targets": [
    { "metric": "sharpe", "op": "gte", "value": 1.5, "weight": 1.0 },
    { "metric": "mdd", "op": "lte", "value": 0.15, "weight": 1.0 }
  ],
  "stopWhen": "all"
}
  • op: gte | lte | eq | gt | lt
  • stopWhen: all | weighted
  • weighted threshold is hardcoded at 0.8 at M12.
  • Full DSL deferred to M12.
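
A sketch of how the stub objective might be evaluated. The weighted rule is an assumption, since the plan only fixes the 0.8 threshold: here the objective is met when the weights of satisfied targets sum to at least 0.8 of the total weight. A missing metric counts as unmet, which is also an assumption.

```typescript
type Op = "gte" | "lte" | "eq" | "gt" | "lt";

interface Target {
  metric: string;
  op: Op;
  value: number;
  weight: number;
}

interface Objective {
  targets: Target[];
  stopWhen: "all" | "weighted";
}

const WEIGHTED_THRESHOLD = 0.8;

function satisfied(target: Target, actual: number | undefined): boolean {
  if (actual === undefined) return false; // assumption: missing metric is unmet
  switch (target.op) {
    case "gte": return actual >= target.value;
    case "lte": return actual <= target.value;
    case "eq": return actual === target.value;
    case "gt": return actual > target.value;
    case "lt": return actual < target.value;
  }
}

export function objectiveMet(objective: Objective, metrics: Record<string, number>): boolean {
  const met = objective.targets.filter((t) => satisfied(t, metrics[t.metric]));
  if (objective.stopWhen === "all") return met.length === objective.targets.length;
  // Assumed weighted rule: satisfied weight divided by total weight.
  const total = objective.targets.reduce((sum, t) => sum + t.weight, 0);
  const reached = met.reduce((sum, t) => sum + t.weight, 0);
  return total > 0 && reached / total >= WEIGHTED_THRESHOLD;
}
```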

11. Run Events

Closed event types:

run.created
run.started
run.paused
run.resumed
run.completed
run.failed
run.aborted
phase.started
phase.completed
phase.failed
phase.skipped
prompt.sent
prompt.repaired
artifact.expected
artifact.validated
artifact.invalid
artifact.timeout
approval.requested
approval.resolved
session.created
session.ready
session.busy
session.idle
session.crashed
session.recovered
session.failed
command.started
command.completed
command.failed
review.batch_recorded
finding.verifier_resolved
backtest.iteration_started
backtest.iteration_completed
backtest.objective_evaluated

11.1 Idempotency Keys

Every event append requires deterministic idempotency_key.

| Event family | Key formula |
| --- | --- |
| run.created, run.started, run.completed, run.failed, run.aborted | <type>:<run_id> |
| run.paused | run.paused:<run_id>:<cause> |
| run.resumed | run.resumed:<run_id>:<cause> |
| phase.started, phase.completed, phase.failed, phase.skipped | <type>:<phase_id>:<phase_attempt> |
| prompt.sent, prompt.repaired | <type>:<prompt_dedup_key> |
| artifact.expected, artifact.timeout | <type>:<phase_id>:<phase_attempt>:<expected_path> |
| artifact.validated, artifact.invalid | <type>:<phase_id>:<expected_path>:<artifact_hash> |
| approval.requested | approval.requested:<approval_idempotency_key> |
| approval.resolved | approval.resolved:<approval_request_id>:<action> |
| session.created, session.failed | <type>:<session_id> |
| session.busy, session.idle | <type>:<session_id>:<prompt_dedup_key> |
| session.ready, session.crashed, session.recovered | <type>:<session_id>:<recovery_attempts> |
| command.started, command.completed, command.failed | <type>:<command_id> |
| review.batch_recorded | review.batch_recorded:<phase_id>:<reviewer_role>:<phase_attempt> |
| finding.verifier_resolved | finding.verifier_resolved:<finding_id> |
| backtest.iteration_started, backtest.iteration_completed, backtest.objective_evaluated | <type>:<iteration_id> |

Definitions:

  • phase_attempt is incremented before event append.
  • recovery_attempts is incremented before event append.
  • prompt_dedup_key is the envelope dedup key.
  • approval_idempotency_key is from approval_requests.
  • Artifact expected/timeout events are per-attempt.
  • Artifact validated/invalid events are content-keyed by path + hash.
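
A few of the key formulas written out as builder functions, to show how the per-attempt and content-keyed variants differ. The function names are hypothetical; the formats follow the key formulas listed in this section.

```typescript
type RunLifecycleType =
  | "run.created" | "run.started" | "run.completed" | "run.failed" | "run.aborted";
type PhaseEventType = "phase.started" | "phase.completed" | "phase.failed" | "phase.skipped";

// <type>:<run_id>: at most one event of each lifecycle type per run.
export function runLifecycleKey(type: RunLifecycleType, runId: string): string {
  return `${type}:${runId}`;
}

// <type>:<phase_id>:<phase_attempt>: a new attempt yields a new key.
export function phaseEventKey(type: PhaseEventType, phaseId: string, phaseAttempt: number): string {
  return `${type}:${phaseId}:${phaseAttempt}`;
}

// <type>:<phase_id>:<expected_path>:<artifact_hash>: keyed by content, not by attempt.
export function artifactContentKey(
  type: "artifact.validated" | "artifact.invalid",
  phaseId: string,
  expectedPath: string,
  artifactHash: string,
): string {
  return `${type}:${phaseId}:${expectedPath}:${artifactHash}`;
}
```

Re-validating an unchanged file produces the same artifact key, so the append is deduplicated, while a rewritten file with a new hash produces a new event.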

12. Fake Session Adapter

12.1 Behavior

  • Deterministic.
  • In-process.
  • No PTY.
  • No tmux.
  • Drives engine end-to-end without real backends.

12.2 Sentinel Triggers

On sendPrompt, inspect expectedSchema.

Fixture path:

tests/fixtures/fake-artifacts/<expectedSchema>/<scenarioName>.json

scenarioName comes from instruction header:

Scenario: <name>

Default scenario: ok.

Scenarios:

  • ok: write fixture to expectedArtifactPath after 50ms by default.
  • invalid: write deliberately schema-invalid payload.
  • timeout: never write.
  • crash: throw RecoverableError.

12.3 Transcript

Fake adapter emits chunks such as:

[fake] received prompt <uuid>; will write <path> in 50ms

13. State Machines

13.1 Run State

States:

  • created
  • bound
  • planning
  • awaiting_approval
  • executing
  • paused
  • completed
  • failed
  • aborted

Transitions:

| From | Trigger | To | Side effects |
| --- | --- | --- | --- |
| created | lockBindings ok | bound | persist bindings; emit run.started |
| created | lockBindings fail | failed | emit run.failed |
| bound | phase plan needed | planning | emit phase.started |
| planning | plan artifact valid | awaiting_approval | request approval |
| awaiting_approval | approve | executing | emit approval.resolved, run.resumed |
| awaiting_approval | reject | failed | emit run.failed |
| awaiting_approval | request_changes | planning | increment phase attempts |
| awaiting_approval | timeout | paused | set paused_from_state='awaiting_approval' |
| executing | phase ok, more phases | executing | next phase |
| executing | normal workflow approval gate | awaiting_approval | request gate |
| executing | all phases done | completed | emit run.completed, write final report |
| executing | unrecoverable error | failed | emit run.failed |
| executing | manual pauseRun | paused | set paused_from_state='executing' |
| planning | manual pauseRun | paused | set paused_from_state='planning' |
| paused | resume | paused_from_state | emit run.resumed, clear paused_from_state |
| any non-terminal state | abortRun | aborted | emit run.aborted, dispose sessions |

Non-terminal states for abortRun:

  • created
  • bound
  • planning
  • awaiting_approval
  • executing
  • paused

13.2 Run Phase State

States:

  • pending
  • running
  • awaiting_artifact
  • validating
  • awaiting_approval
  • completed
  • failed
  • skipped

Transitions:

| From | Trigger | To |
| --- | --- | --- |
| pending | start | running |
| running | prompt sent, artifact expected | awaiting_artifact |
| awaiting_artifact | artifact appears | validating |
| awaiting_artifact | timeout | running after probe/repair, or failed after exhaustion |
| validating | valid | awaiting_approval if gate, else completed |
| validating | invalid | running after one repair, else failed |
| awaiting_approval | approve | completed |
| awaiting_approval | reject / abort | failed |
| awaiting_approval | request_changes | running, attempt + 1 |

Replay rules:

  • phase.started.payload.repair === true marks that attempt as the single allowed repair attempt. Replaying that attempt MUST use repair instructions, prompt.repaired, and must not start a third attempt.
  • Repair replay from running may reuse an existing READY / bootstrapped session even if last_prompt_hash still contains the previous attempt's prompt hash; current-attempt prompt send has not happened yet.
  • If phase state is running, existing artifact files are never accepted unless the current prompt event (prompt.sent or prompt.repaired) for the current dedup key is already recorded. Replay without prompt proof treats existing files as stale.
  • If phase state is running, session state is BUSY, and last_prompt_hash matches the current prompt but the matching prompt event is missing, replay waits for the artifact with the current file signature as the baseline. This preserves idempotency without validating a stale pre-existing artifact.
  • Baseline-protected waits must not synthesize durable prompt proof before the wait finishes. If replay crashes or is cancelled before validation, the next replay must still treat the existing artifact as baseline/stale unless real prompt proof already exists.
  • If phase state is validating and no artifact row exists yet, replay re-reads and validates the current expectedArtifactPath instead of treating the state as corruption.
  • If phase state is validating and artifact rows already exist for the same phase/path/schema, replay may reuse only an artifact row created at or after the current session last_prompt_at; older rows are treated as stale previous-attempt outputs and the file is revalidated.
  • Session bootstrap DB row/state changes and session.created / session.ready events are written in one DB transaction after adapter start succeeds.
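The running-state replay rules above can be read as a small decision function. The sketch below is illustrative only: ReplayInput, ReplayDecision, and decideRunningReplay are hypothetical names, and the three outcome labels are shorthand for the behaviors described in the bullets, not identifiers from the codebase.

```typescript
// Hypothetical model of what replay knows when the phase state is `running`.
interface ReplayInput {
  sessionState: 'READY' | 'BUSY' | 'OTHER';
  // True when prompt.sent or prompt.repaired is recorded for the current dedup key.
  promptEventRecorded: boolean;
  // True when the session's last_prompt_hash equals the current prompt hash.
  lastPromptHashMatches: boolean;
}

type ReplayDecision =
  | 'may_accept_existing_artifact' // real prompt proof exists
  | 'wait_with_baseline' // prompt likely sent; current file signature is the baseline
  | 'treat_existing_files_as_stale'; // no prompt proof of any kind

function decideRunningReplay(input: ReplayInput): ReplayDecision {
  // Only a recorded prompt event counts as durable prompt proof.
  if (input.promptEventRecorded) return 'may_accept_existing_artifact';
  // BUSY + matching hash but no event: wait, and never synthesize prompt proof.
  if (input.sessionState === 'BUSY' && input.lastPromptHashMatches) {
    return 'wait_with_baseline';
  }
  return 'treat_existing_files_as_stale';
}
```

Note that the baseline branch returns a decision only; in line with the rule on baseline-protected waits, it records nothing that a later replay could mistake for prompt proof.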

14. Approval State

States:

  • pending
  • approved
  • rejected
  • changes_requested
  • aborted
  • paused

14.1 Transitions

| From | Event | To | Side effects |
| --- | --- | --- | --- |
| pending | approve decision | approved | insert decision row |
| pending | reject decision | rejected | insert decision row; run -> failed |
| pending | request_changes decision | changes_requested | insert decision row; increment attempt |
| pending | abort decision | aborted | insert decision row; run -> aborted |
| pending | timeout | paused | run -> paused; no decision row |
| paused | unpause | pending | re-arm gate; no decision row |
| terminal states | any decision | unchanged | return 409 |

Rules:

  • A pending request can transition to at most one non-pending state per pending epoch.
  • Terminal approval states reject further decisions.
  • paused may return to pending only through unpause.
  • Manual pause is run-level pauseRun; it leaves approval gate in pending.
  • Only approve, reject, request_changes, and abort create approval_decisions rows.
  • Default timeout is null.
  • Timeout never auto-approves or auto-rejects.
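The transition table and rules above can be sketched as a pure function. This is a minimal illustration, not the actual implementation: TransitionResult and transition are hypothetical names, and rejecting any event the table does not list (for example unpause on a pending request) is an assumption.

```typescript
type ApprovalState =
  | 'pending' | 'approved' | 'rejected' | 'changes_requested' | 'aborted' | 'paused';
type ApprovalEvent =
  | 'approve' | 'reject' | 'request_changes' | 'abort' | 'timeout' | 'unpause';

interface TransitionResult {
  ok: boolean; // false corresponds to the 409 row of the table
  next: ApprovalState;
  insertsDecisionRow: boolean;
}

// Only these four events create approval_decisions rows.
const DECISION_TARGETS: Partial<Record<ApprovalEvent, ApprovalState>> = {
  approve: 'approved',
  reject: 'rejected',
  request_changes: 'changes_requested',
  abort: 'aborted',
};

function transition(state: ApprovalState, event: ApprovalEvent): TransitionResult {
  if (state === 'pending') {
    // Timeout freezes the gate; it never approves or rejects.
    if (event === 'timeout') return { ok: true, next: 'paused', insertsDecisionRow: false };
    const next = DECISION_TARGETS[event];
    if (next !== undefined) return { ok: true, next, insertsDecisionRow: true };
  }
  // paused may return to pending only through unpause.
  if (state === 'paused' && event === 'unpause') {
    return { ok: true, next: 'pending', insertsDecisionRow: false };
  }
  // Everything else leaves the state unchanged (assumed to surface as 409).
  return { ok: false, next: state, insertsDecisionRow: false };
}
```

Side effects on the run (run -> failed, run -> aborted, attempt increment) are deliberately left out so the sketch stays a pure state function.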

14.2 Decision Idempotency

  • GUI:
    • UUIDv4 per click.
    • reused across automatic UI retries for the same logical action.
  • CLI:
    • UUIDv4 per invocation.
    • --client-token=<uuid> override for scripted retry.
  • API:
    • existing (approval_request_id, action, client_token) returns existing row with status 200.
    • new decision inserts row and returns 201.
    • same token with different action returns 409.
    • decision on non-pending request returns 409.
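A minimal in-memory sketch of the API rules above, assuming the token lookup runs before the pending check so that a retried decision still returns 200 after it has been applied. DecisionStore, DecisionRow, and DecisionResponse are hypothetical names; the real path would go through the approval_decisions table.

```typescript
type DecisionAction = 'approve' | 'reject' | 'request_changes' | 'abort';

interface DecisionRow {
  approvalRequestId: string;
  action: DecisionAction;
  clientToken: string;
}

interface DecisionResponse {
  status: 200 | 201 | 409;
  row?: DecisionRow;
}

class DecisionStore {
  private rows = new Map<string, DecisionRow>(); // keyed by request id + client token
  private pending = new Set<string>();

  openRequest(approvalRequestId: string): void {
    this.pending.add(approvalRequestId);
  }

  submit(approvalRequestId: string, action: DecisionAction, clientToken: string): DecisionResponse {
    const key = `${approvalRequestId}:${clientToken}`;
    const existing = this.rows.get(key);
    if (existing !== undefined) {
      // Same token and action: idempotent replay. Same token, different action: conflict.
      return existing.action === action ? { status: 200, row: existing } : { status: 409 };
    }
    // A new token against a request that is no longer pending is rejected.
    if (!this.pending.has(approvalRequestId)) return { status: 409 };
    const row: DecisionRow = { approvalRequestId, action, clientToken };
    this.rows.set(key, row);
    this.pending.delete(approvalRequestId);
    return { status: 201, row };
  }
}
```

For example, a CLI retry with the same --client-token value would see 201 on the first call and 200 on the second, while a second click in the GUI (a fresh UUID) on an already-decided request would see 409.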

14.3 Destructive Command Enforcement

Devflow-direct commands have hard enforcement. TUI-agent commands have best-effort enforcement.

Hard-blocked Devflow-direct patterns:

  • rm -rf
  • git reset --hard
  • git clean
  • git push --force
  • git push --force-with-lease
  • git worktree remove --force
  • git branch -D
  • docker volume rm
  • docker compose down -v
  • DROP DATABASE
  • DROP SCHEMA
  • migration rollback
  • reads/writes touching .env*, ~/.ssh/, ~/.aws/, ~/.config/gcloud/, ~/.kube/
  • files matching *token*, *secret*, *credentials*, *.pem, *.key
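As a rough illustration of the hard block for Devflow-direct commands, the command-shaped entries above could be matched with a pattern list. This is a sketch under assumptions: BLOCKED_COMMAND_PATTERNS and isHardBlocked are hypothetical names, the regular expressions are illustrative and do not catch every spelling (for example rm -fr or split flags), and the migration-rollback, sensitive-path, and filename rules are omitted.

```typescript
// Illustrative patterns for the command-shaped entries only.
const BLOCKED_COMMAND_PATTERNS: RegExp[] = [
  /\brm\s+-rf\b/,
  /\bgit\s+reset\s+--hard\b/,
  /\bgit\s+clean\b/,
  /\bgit\s+push\s+.*--force(-with-lease)?\b/,
  /\bgit\s+worktree\s+remove\s+.*--force\b/,
  /\bgit\s+branch\s+-D\b/,
  /\bdocker\s+volume\s+rm\b/,
  /\bdocker\s+compose\s+down\s+.*-v\b/,
  /\bDROP\s+(DATABASE|SCHEMA)\b/i,
];

// Returns true when the command matches any hard-blocked pattern.
function isHardBlocked(command: string): boolean {
  return BLOCKED_COMMAND_PATTERNS.some((pattern) => pattern.test(command));
}
```

A production check would more likely parse the argv rather than match raw strings; the regex form is used here only to keep the example short.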

TUI-agent command enforcement is best-effort:

  1. Prelude prohibits destructive operations.
  2. Backend permission mode is set to the safest available mode.
  3. Transcript audit captures post-hoc evidence.
  4. Human intervention goes through devflow attach.
  5. Worktrees and branches are preserved by default.

v1 does not claim real-time blocking of TUI-internal commands.

15. Run Engine and Temporal Contract

The M4 RunEngine contract is frozen before M5. M5 reimplements the same interface through Temporal.

15.1 Public API

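interface RunEngine {
  // Starts a run and returns its id.
  startRun(input: RunStartInput): Promise<{ runId: string }>;
  // Applies an approval decision; clientToken is the idempotency token from §14.2.
  signalApproval(
    runId: string,
    approvalRequestId: string,
    action: ApprovalDecisionAction,
    clientToken: string,
    comment?: string
  ): Promise<void>;
  // Run-level pause; leaves any approval gate in pending (§14.1).
  pauseRun(runId: string): Promise<void>;
  resumeRun(runId: string): Promise<void>;
  // Allowed from any non-terminal run state (CC-17).
  abortRun(runId: string, reason: string): Promise<void>;
  getStatus(runId: string): Promise<RunStatus>;
}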

15.2 Temporal Shape

  • Namespace: devflow.
  • Task queue: devflow-runs.
  • Single worker process: apps/worker.
  • Workflow: runWorkflow(input: RunStartInput).
  • Signals:
    • approve
    • pause
    • resume
    • abort
    • unpause
  • No Updates in M5.
  • Status is read from DB.

Activities:

  • M5 compatibility activity surface:
    • prepareRunActivity(input)
    • lockBindingsActivity(runId)
    • failRunActivity(runId, reason)
    • advanceRunActivity(runId)
    • signalApprovalActivity(runId, approvalRequestId, action, clientToken, comment?)
    • pauseRunActivity(runId)
    • resumeRunActivity(runId)
    • abortRunActivity(runId, reason)
    • getStatusActivity(runId)
    • isRunTerminalActivity(runId)
    • composeFinalReportActivity(runId)
  • advanceRunActivity is the M5 parity wrapper over M4 phase advancement. It may internally perform prompt send, artifact wait/validation, event recording, and approval request creation through the same DB/idempotency contracts already locked in sections 8 through 14.
  • The granular activity split (sendPromptToSession, waitForArtifact, validateArtifact, recordEvent, requestApproval, runCommand) is deferred to a later hardening ADR. It is not an M5 acceptance gate.
  • Prompt/session mutation still occurs only inside worker-hosted activities through SessionManager. M5+ API code never mutates SessionAdapter directly.

Retry policy:

  • Default: max attempts 3; exponential backoff starting at 1s, capped at 30s.
  • composeFinalReportActivity: max attempts 1.
  • Activity-level failures serialize DevflowError; non-recoverable Devflow errors are rethrown as non-retryable Temporal failures.
  • advanceRunActivity is cancellation-aware and idempotent by DB state, event idempotency keys, prompt dedup keys, and artifact content keys.
  • Already-applied approval signal replay repairs missing final reports for every terminal run state: completed, failed, and aborted, regardless of whether the replayed approval action was approve, request_changes, reject, or abort.
  • API-side already-applied approval replay is report-repair only. It must not call SessionAdapter mutation methods; reject/abort session disposal belongs to the worker/session-manager path that originally applies the decision.
  • If a workflow closes before the API observes an approval signal result, closed-workflow settlement must first verify the requested decision was applied, then replay approval side effects, then wait for the terminal report.
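To make the default retry numbers concrete, the sketch below computes the wait before each retry. The RetryPolicySketch type and retryDelaysMs function are hypothetical and local to this example (they are not Temporal SDK types), and the backoff coefficient of 2 is an assumption, since the plan only fixes the start and the cap.

```typescript
// Local illustration type; field names are chosen for readability only.
interface RetryPolicySketch {
  maximumAttempts: number; // includes the initial attempt
  initialIntervalMs: number;
  maximumIntervalMs: number;
  backoffCoefficient: number; // assumed to be 2; not stated in the plan
}

const DEFAULT_POLICY: RetryPolicySketch = {
  maximumAttempts: 3,
  initialIntervalMs: 1_000,
  maximumIntervalMs: 30_000,
  backoffCoefficient: 2,
};

// Returns the delay before each retry; an N-attempt policy has N - 1 delays.
function retryDelaysMs(policy: RetryPolicySketch): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maximumAttempts - 1; retry++) {
    const raw = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, retry);
    delays.push(Math.min(raw, policy.maximumIntervalMs));
  }
  return delays;
}
```

Under these assumptions the default policy waits 1s and then 2s, so the 30s cap only matters if the attempt count is raised; composeFinalReportActivity, with max attempts 1, has no delays at all.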

15.3 Hard Constraints

  • Workflow code holds only serializable state.
  • No tmux handles in workflow state.
  • No PTY refs in workflow state.
  • No DB clients in workflow state.
  • M5+ session interaction happens through activities calling SessionManager in apps/worker.
  • M5+ API never calls mutating SessionAdapter methods.
  • SessionManager advisory lock prevents API/worker ownership conflict during M4 -> M5 transition.
  • Workflow code uses deterministic clock/randomness only.

16. WriteSet and Worktree

16.1 WriteSet

  • Each task declares writeSet: string[].
  • Patterns are relative to repo root.
  • Glob engine: fast-glob.
  • Options:
{
  cwd: worktreeRoot,
  dot: true,
  followSymbolicLinks: false,
  onlyFiles: true,
  suppressErrors: false
}

Conflict detection:

  1. Expand writeSets.
  2. Forbidden globs cause conflict if matched by more than one task:
    • pnpm-lock.yaml
    • package-lock.json
    • **/migrations/**
    • **/*.generated.*
    • root tsconfig*.json
    • biome.json
    • lefthook.yml
    • .github/**
    • .gitlab-ci.yml
  3. Pairwise file intersections must be empty.

A conflict creates a parallel_dag_approved gate.
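The conflict-detection steps can be sketched over writeSets that have already been expanded to file lists. Everything here is illustrative: ForbiddenRule, WriteSetConflict, and detectConflicts are hypothetical names, and the matcher functions are hand-written stand-ins for real glob matching so that the example has no dependencies.

```typescript
interface ForbiddenRule {
  name: string; // the glob as written in the plan
  matches: (path: string) => boolean; // stand-in for real glob matching
}

interface WriteSetConflict {
  kind: 'forbidden' | 'overlap';
  subject: string; // forbidden glob name or overlapping file path
  tasks: string[];
}

function detectConflicts(
  expanded: Record<string, string[]>, // taskId -> expanded file paths (step 1 done)
  forbidden: ForbiddenRule[],
): WriteSetConflict[] {
  const taskIds = Object.keys(expanded).sort();
  const conflicts: WriteSetConflict[] = [];

  // Step 2: a forbidden glob conflicts when more than one task matches it.
  for (const rule of forbidden) {
    const tasks = taskIds.filter((id) => (expanded[id] ?? []).some((p) => rule.matches(p)));
    if (tasks.length > 1) conflicts.push({ kind: 'forbidden', subject: rule.name, tasks });
  }

  // Step 3: pairwise file intersections must be empty.
  for (let i = 0; i < taskIds.length; i++) {
    const left = new Set(expanded[taskIds[i]] ?? []);
    for (let j = i + 1; j < taskIds.length; j++) {
      for (const file of expanded[taskIds[j]] ?? []) {
        if (left.has(file)) {
          conflicts.push({ kind: 'overlap', subject: file, tasks: [taskIds[i], taskIds[j]] });
        }
      }
    }
  }
  return conflicts;
}

// Example rules; the predicates only approximate the globs they are named after.
const EXAMPLE_RULES: ForbiddenRule[] = [
  { name: 'pnpm-lock.yaml', matches: (p) => p === 'pnpm-lock.yaml' },
  { name: '**/migrations/**', matches: (p) => p.split('/').slice(0, -1).indexOf('migrations') !== -1 },
];
```

Note the difference between the two checks: two tasks writing different files under a migrations directory still conflict through the forbidden rule, even though their file sets do not intersect.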

16.2 Worktree Lifecycle

  • Worktree root:
    • WORKSPACE_ROOT/<runId>/<laneId>
    • non-parallel main lane: WORKSPACE_ROOT/<runId>/main
  • Created via git worktree add.
  • Branch name:
devflow/<runId>/<laneId>
  • Terminal run state does not remove worktrees or branches.
  • Output branches are deliverables.
  • Disk growth is accepted.
  • Cleanup is manual:
devflow cleanup <run-id> [--lane=<id>]

Cleanup:

  • uses git worktree remove without --force by default.
  • refuses dirty worktrees.
  • --force requires an additional gate.
  • git branch -D is destructive and gated.
  • doctor --list-orphans lists only; it never removes.
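The naming and cleanup conventions above are simple enough to capture in three helpers. This is a sketch with hypothetical function names; using main as the lane id in the branch name of the non-parallel lane is an assumption carried over from the worktree path rule.

```typescript
const MAIN_LANE = 'main';

// WORKSPACE_ROOT/<runId>/<laneId>, with any trailing slashes on the root trimmed.
function worktreePath(workspaceRoot: string, runId: string, laneId: string = MAIN_LANE): string {
  const root = workspaceRoot.replace(/\/+$/, '');
  return `${root}/${runId}/${laneId}`;
}

// devflow/<runId>/<laneId>
function branchName(runId: string, laneId: string = MAIN_LANE): string {
  return `devflow/${runId}/${laneId}`;
}

// Arguments for `git worktree remove`; --force is added only after its gate is approved.
function worktreeRemoveArgs(path: string, forceGateApproved: boolean = false): string[] {
  const args = ['worktree', 'remove'];
  if (forceGateApproved) args.push('--force');
  args.push(path);
  return args;
}
```

The dirty-worktree refusal is left to git itself in this sketch: without --force, git worktree remove declines to remove a worktree that has modifications.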

17. SSE Contract

Endpoints:

  • GET /sse/runs/:runId
  • GET /sse/global

Heartbeat every 15 seconds.

Events:

| Event | Scope |
| --- | --- |
| run.state_changed | both |
| run.event_appended | run |
| phase.state_changed | run |
| approval.created | both |
| approval.resolved | both |
| session.state_changed | run |
| transcript.chunk_appended | run |
| artifact.validated | run |

Reconnect:

  • Last-Event-ID is last run_events.seq.
  • server replays seq > lastSeq.
  • non-run-event SSE types are not replayed; state is re-derived by fetch.
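The reconnect rules can be sketched as three small functions. The names are hypothetical, and two details are assumptions rather than part of the contract: a missing or malformed Last-Event-ID is treated as 0 (replay everything), and each replayed row is sent as a run.event_appended frame whose data is the JSON-encoded row.

```typescript
interface RunEventRow {
  seq: number; // run_events.seq
  type: string;
  payload: unknown;
}

// Parses the Last-Event-ID header; falls back to 0 when absent or invalid (assumption).
function parseLastEventId(header: string | undefined): number {
  if (header === undefined) return 0;
  const parsed = Number(header);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : 0;
}

// Server replays seq > lastSeq, in ascending order.
function eventsToReplay(all: RunEventRow[], lastSeq: number): RunEventRow[] {
  return all.filter((event) => event.seq > lastSeq).sort((a, b) => a.seq - b.seq);
}

// Formats one row as an SSE frame; the id field becomes the next Last-Event-ID.
function formatSseFrame(event: RunEventRow): string {
  return `id: ${event.seq}\nevent: run.event_appended\ndata: ${JSON.stringify(event)}\n\n`;
}
```

Frames for the other SSE types would carry no id under this scheme, which matches the rule that they are never replayed and that clients re-derive that state by fetching.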

18. Errors

packages/core/src/errors.ts:

type ErrorClass = 'recoverable' | 'human_required' | 'fatal';

class DevflowError extends Error {
  readonly class: ErrorClass;
  readonly code: string;
  readonly runId?: string;
  readonly phaseId?: string;
  readonly recoveryHint?: string;
  readonly cause?: unknown;
}

Recoverable:

  • network_blip
  • pane_briefly_unresponsive
  • prompt_send_transient
  • db_serialization_retry

Human required:

  • artifact_invalid_after_repair
  • artifact_timeout_exhausted
  • prompt_send_exhausted
  • destructive_command_blocked
  • secret_access_blocked
  • backend_unavailable
  • no_eligible_persona
  • writeset_conflict
  • merge_conflict
  • objective_not_met
  • review_dispute_unresolved

Fatal:

  • db_unreachable
  • workspace_permissions
  • internal_state_corruption
  • template_load_failed
  • artifact_schema_unknown
  • artifact_schema_load_failed
  • migration_pending
  • config_invalid

Mapping:

  • recoverable -> retry; exhausted -> human_required.
  • human_required / recovery gate -> run paused and gate created. This is distinct from normal workflow approval gates in §13.1, which use awaiting_approval.
  • fatal -> run failed, sessions disposed, final report best-effort.
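The mapping above reduces to a function from error class to next action. In this sketch, NextAction and nextAction are hypothetical names, and the retriesExhausted flag stands in for whatever retry accounting the caller already keeps.

```typescript
type ErrorClass = 'recoverable' | 'human_required' | 'fatal';

type NextAction =
  | 'retry'
  | 'pause_run_and_create_gate' // recovery gate, distinct from awaiting_approval
  | 'fail_run'; // sessions disposed, final report best-effort

function nextAction(errorClass: ErrorClass, retriesExhausted: boolean): NextAction {
  switch (errorClass) {
    case 'recoverable':
      // Exhausted recoverable errors escalate to human_required handling.
      return retriesExhausted ? 'pause_run_and_create_gate' : 'retry';
    case 'human_required':
      return 'pause_run_and_create_gate';
    case 'fatal':
      return 'fail_run';
  }
}
```

For instance, prompt_send_transient would map to retry until its budget runs out, at which point it is handled like prompt_send_exhausted and pauses the run behind a recovery gate.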

19. Concurrent Runs and Crash Recovery

19.1 Active Run Uniqueness

  • MAX_CONCURRENT_RUNS, default 4.
  • DB partial unique index is the source of truth:
    • one active run per (repo_path, base_branch).
  • repo_path is canonicalized before insert.
  • Advisory lock is auxiliary only:
pg_try_advisory_xact_lock(hash64('devflow:start-run', repoPath, baseBranch))
  • Unique-index violation returns HTTP 409 with body:
{ "currentRunId": "...", "currentState": "..." }
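One way to turn the unique-index violation into the 409 response is to inspect the driver error. The sketch assumes the Postgres driver exposes the SQLSTATE as err.code (as node-postgres does) and that the caller has already looked up the currently active run; isUniqueViolation, toConflictResponse, and ActiveRunSummary are hypothetical names.

```typescript
interface ActiveRunSummary {
  runId: string;
  state: string;
}

interface StartRunConflict {
  status: 409;
  body: { currentRunId: string; currentState: string };
}

// Postgres SQLSTATE for unique_violation.
const UNIQUE_VIOLATION = '23505';

function isUniqueViolation(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { code?: unknown }).code === UNIQUE_VIOLATION
  );
}

// Returns the 409 payload for a unique violation, or null so the caller can rethrow.
function toConflictResponse(err: unknown, active: ActiveRunSummary): StartRunConflict | null {
  if (!isUniqueViolation(err)) return null;
  return {
    status: 409,
    body: { currentRunId: active.runId, currentState: active.state },
  };
}
```

Because the partial unique index is the source of truth, this mapping works the same whether or not the auxiliary advisory lock was acquired.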

19.2 Crash Recovery

M4, no Temporal:

  • On apps/api startup, sweep non-terminal runs.
  • Mark them failed.
  • final_report_path = null.
  • Append synthesized run.failed with reason process_restart_unrecovered.
  • Cascade associated tui_sessions to FAILED_NEEDS_HUMAN.
  • Append session.failed.
  • This frees active-run uniqueness slots.
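The M4 startup sweep can be modeled in memory as follows. The row shapes and the sweepOnStartup name are hypothetical, the terminal run states are taken to be completed, failed, and aborted, and the real sweep would perform these updates and event appends against the database rather than mutating arrays.

```typescript
interface RunRow {
  id: string;
  state: string;
  finalReportPath: string | null;
}

interface TuiSessionRow {
  id: string;
  runId: string;
  state: string;
}

interface SweepEvent {
  runId: string;
  type: 'run.failed' | 'session.failed';
  reason?: string;
  sessionId?: string;
}

const TERMINAL_RUN_STATES = ['completed', 'failed', 'aborted'];

// Mutates the given rows in place and returns the events to append.
function sweepOnStartup(runs: RunRow[], sessions: TuiSessionRow[]): SweepEvent[] {
  const events: SweepEvent[] = [];
  for (const run of runs) {
    if (TERMINAL_RUN_STATES.indexOf(run.state) !== -1) continue;
    run.state = 'failed';
    run.finalReportPath = null;
    events.push({ runId: run.id, type: 'run.failed', reason: 'process_restart_unrecovered' });
    // Cascade every session that belongs to the swept run.
    for (const session of sessions) {
      if (session.runId !== run.id) continue;
      session.state = 'FAILED_NEEDS_HUMAN';
      events.push({ runId: run.id, type: 'session.failed', sessionId: session.id });
    }
  }
  return events;
}
```

Since paused and awaiting_approval are non-terminal, runs in those states are swept too, which is what frees their active-run uniqueness slots.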

M5+:

  • No sweep.
  • Temporal durability owns in-flight workflow recovery.
  • SessionManager resumes tmux sessions.
  • Active-run partial index blocks duplicate runs until completion or explicit abort.

20. Milestones

M1: Monorepo + Postgres + CLI Doctor

  • Scaffold workspace.
  • Add pnpm, tsconfig, biome, lefthook, Vitest.
  • Add Docker Compose for Postgres.
  • Add Drizzle and first migration.
  • Add devflow doctor.
  • Implement checks 1-9.
  • Stub checks 10-12 as warn where needed.
  • Add SSE compatibility smoke test:
    • minimal Fastify 5 server.
    • fastify-sse-v2 plugin.
    • 30-second integration test.
    • receive 3 events and reconnect.
    • if plugin fails, implement native reply.raw SSE helper before M1 is green.

M2: Core Schema + Registry + Binding

  • Implement enums.
  • Implement canonical hashing.
  • Implement Template schema.
  • Implement Persona schema.
  • Implement seed loader.
  • Implement binding algorithm.
  • Implement artifact schema registry.
  • Add first schemas:
    • dev/spec@1
    • dev/phase-plan@1
    • common/final-report@1
  • Tests:
    • schema validation.
    • override semantics.
    • risk enforcement.
    • diversity enforcement.
    • deterministic auto-select.

M3: Fake Session Runtime

  • Implement SessionAdapter.
  • Implement FakeSessionAdapter.
  • Implement prompt envelope.
  • Implement event recorder.
  • Implement fake sentinel scenarios.
  • Persist transcript chunks.
  • Tests:
    • prompt correlation.
    • artifact validation.
    • invalid artifact.
    • timeout.
    • fake crash.

M4: Minimal Run Engine

  • Implement packages/run-engine.
  • Used directly by apps/api.
  • No Temporal.
  • Supports:
    • start run.
    • lock bindings.
    • approval.
    • fake prompt.
    • artifact wait/validate.
    • final report.
  • Freeze the RunEngine contract.
  • Full fake development@1 minus reviewers.

M5: Temporal Integration

  • Reimplement RunEngine through Temporal.
  • Preserve M4 behavior.
  • Add parity tests using the same M4 scenarios.
  • M5+ SessionManager lives in apps/worker.

M6: Real tmux SessionManager

  • Implement TmuxSessionAdapter.
  • Decoupled from M5.
  • May begin after M3 is stable.
  • Pre-M5 real tmux is opt-in smoke only.
  • Production run path remains fake until both M5 and M6 are green.

M7: TUI Recovery State Machine

  • Implement session state transitions.
  • Implement recovery counters.
  • Implement escalation to human gates.

M8: API + GUI Minimum

  • Implement Fastify routes.
  • Implement SSE.
  • Implement GUI screens:
    • Dashboard.
    • Templates.
    • Personas.
    • New Run.
    • Run Detail.
    • Approvals.
    • TUI Sessions.

M9: development@1 Fake-Agent Full Run

  • Add curated development@1.
  • Add review consensus.
  • Add verifier flow with fake reviewers.
  • Add coverage gate >=70% lines for core/session/run-engine.

M10: Codex/Claude Opt-In Real Run

  • Implement profiles:
    • packages/session/src/profiles/codex.ts
    • packages/session/src/profiles/claude.ts
  • Real backends become production-default only after both M5 and M6 are green.
  • Until then real tmux/Codex/Claude are developer-flagged opt-in smoke only.

M11: Parallel Lanes

  • Add task DAG scheduler.
  • Add writeSet detection.
  • Add per-lane worktrees.
  • Add merge coordinator.
  • Add conflict gates.

M12: Backtest Workflow

  • Add backtest-strategy@1.
  • Add objective evaluator.
  • Add metric parser extension points.
  • Add failure mining artifacts.
  • Add Backtest Lab GUI.

M13: Template Factory

  • Generate draft template from natural language and repo discovery.
  • Add harness design.
  • Add template review.
  • Add dry-run and promote flow.

21. Out of Scope

  • Authentication.
  • Authorization.
  • Multi-user support.
  • Data retention or archival policy.
  • Observability dashboards.
  • Remote template/persona registries.
  • Multi-machine deployment.
  • HA.
  • Managed backups.
  • Web ingress.
  • TLS.
  • Reverse proxy.

22. Decision Log

Open Questions Closed

| # | Question | Resolution |
| --- | --- | --- |
| OQ-1 | Persona/template seeding format | Immutable YAML at docs/schemas/{personas,templates}/<name>@<version>.yaml |
| OQ-2 | Approval timeout | default null; timeout freezes only |
| OQ-3 | Final report format | Markdown and JSON |
| OQ-4 | Temporal namespace/queue | namespace devflow, task queue devflow-runs |
| OQ-5 | WriteSet glob engine | fast-glob |
| OQ-6 | Backtest objective DSL | Stub in M12, full DSL deferred |
| OQ-7 | Codex/Claude prompt prelude | Structure locked, exact text deferred to M10 |
Blocking Corrections Applied

| # | Issue | Resolution |
| --- | --- | --- |
| CC-1 | Terminal state deleted worktrees/branches | Preserve by default; manual gated cleanup only |
| CC-2 | SessionManager location conflict | M4 API, M5+ worker |
| CC-3 | Event duplicates under retry | run_events.idempotency_key |
| CC-4 | Destructive command enforcement overclaimed | Devflow-direct hard, TUI best-effort |
| CC-5 | UUID extension missing | CREATE EXTENSION IF NOT EXISTS pgcrypto |
| CC-6 | Advisory lock not enough for active-run uniqueness | partial unique index |
| CC-7 | Undefined transition sequence in event keys | cause-based keys |
| CC-8 | Approval paused transition missing | explicit approval transition table |
| CC-9 | AutoSelect order nondeterministic | deterministic sort |
| CC-10 | SSE plugin compatibility assumed | M1 smoke + native fallback |
| CC-11 | ApprovalAction included pause | split ApprovalDecisionAction; pauseRun is run-level |
| CC-12 | Artifact hash key collision | include phase id and path |
| CC-13 | Resume previous state not stored | runs.paused_from_state |
| CC-14 | repo path aliasing | canonical realpath storage |
| CC-15 | M4 sweep left tmux sessions ambiguous | cascade session state to FAILED_NEEDS_HUMAN; real tmux production-default only after M5+M6 |
| CC-16 | Prompt hash used phaseId but envelope uses phaseKey | prompt hash uses phaseKey |
| CC-17 | abortRun transition too narrow | abort from any non-terminal run state |
| CC-18 | approval pending transition wording conflicted with pause epoch | pending can transition once per pending epoch; paused may unpause to pending |
| CC-19 | tsc -b --noEmit is brittle with TypeScript 5.6 project references on clean worktrees | build still uses tsc -b; no-emit verification uses root tsconfig.typecheck.json |
| CC-20 | sendPrompt retry count was ambiguous against Temporal activity attempts | §8.3 now states retry budget means initial attempt plus retries; §15.2 remains Temporal-level attempts only |
| CC-21 | Duplicate prompt dedup handling conflicted with adapter retry idempotency | duplicate dedupKey returns idempotent success without reprocessing |
| CC-22 | Normal workflow approval gates and human-required recovery gates were easy to conflate | §13.1 names normal workflow gates; §18 keeps human_required recovery gates paused |
| CC-23 | Phase start and event append could diverge under retry/error | phase start and phase.started append occur in one DB transaction |
| CC-24 | Repair attempt replay lost repair prompt identity and one-repair budget | repair attempts are derived from phase.started.payload.repair, replay uses repair instructions and prompt.repaired, and cannot start attempt 3 |
| CC-25 | validating replay failed if crash happened before artifact row insert | replay revalidates the expected artifact file when state is validating but no artifact row exists |
| CC-26 | Session bootstrap state/events could diverge | session row/state and session.created / session.ready events are committed in one DB transaction |
| CC-27 | validating replay could reuse stale previous-attempt artifact rows | artifact-row replay requires artifact.created_at >= tui_sessions.last_prompt_at; otherwise the file is revalidated |
| CC-28 | repair running replay rejected existing READY sessions with previous attempt prompt hash | current-attempt repair prompt is considered unsent, so replay may reuse the session and send prompt.repaired |
| CC-29 | API Temporal approval replay omitted M4 approval side-effect repair | API approval signal reader now wires replayAppliedApprovalSideEffects, so already-applied terminal approval replays can repair missing final reports |
| CC-30 | running replay could validate stale artifacts without prompt proof | running replay requires matching prompt event proof; BUSY replay without prompt event uses current artifact signature as baseline and ignores stale files |
| CC-31 | M5 activity list over-specified granular activities not implemented by the M4 parity adapter | M5 locks the compatibility activity wrapper surface; granular activity split is deferred to a later hardening ADR |
| CC-32 | Already-applied approve / request_changes replay repaired missing reports for completed / failed but missed aborted | approval replay side-effect repair now composes missing final reports for all terminal states |
| CC-33 | API-side already-applied reject / abort replay tried to dispose sessions through DB-only replay validation runtime | API replay side effects are report-repair only; worker-side decision application owns session disposal |
| CC-34 | Closed-workflow approval settlement waited for reports but did not replay approval side effects | settlement now verifies the requested decision, replays side effects, then waits for the terminal report |
| CC-35 | Baseline-protected BUSY replay recorded synthetic prompt proof before the baseline wait was durable | baseline replay no longer records synthetic prompt events; replay without real prompt proof keeps treating existing files as stale |

Future Open Questions

  • FOQ-1, M12: full backtest objective DSL.
  • FOQ-2, M13: template factory generation prompts.
  • FOQ-3, post-M10: optional third backend such as Gemini.
  • FOQ-4, post-M8: WebSocket vs SSE if transcript pressure requires it.

23. Kickoff Order

  1. M1.1: repo + pnpm + tsconfig + biome + lefthook + vitest workspace.
  2. M1.2: docker-compose + Postgres healthcheck + drizzle-kit + first migration.
  3. M1.3: apps/cli skeleton + devflow doctor.
  4. M1.4: packages/core skeleton with config, enums, errors, hash, prompt-envelope, run-event types.
  5. M2.1: Zod schemas for Template/Persona, persona YAML loader, hashing.
  6. M2.2: Binding algorithm + tests.
  7. M2.3: Artifact schema registry + first three schemas.
  8. M3.1: SessionAdapter interface + FakeSessionAdapter.
  9. M3.2: Transcript chunk capture + DB persistence.
  10. M3.3: engine-shaped harness running a single fake phase end-to-end.
  11. M4: assemble run engine; lock contract; full fake development@1 minus reviewers.
  12. M5 in parallel with M6 once M4 is green.