# Devflow Implementation Plan v3 r4

## 0. Document Status

- This document supersedes v2 and all earlier v3 drafts where conflicting.
- Single-user, single-machine assumption. No auth, no retention policy, no observability dashboards, no multi-tenancy.
- Target OS: macOS 13+ / Linux. No Windows.
- All paths are Unix-style. All times are stored UTC.
- Decisions in this document are locked unless explicitly marked `(provisional)`. Override requires updating this document, not only code.
- r1 applied CC-1 through CC-5.
- r2 applied CC-6 through CC-10.
- r3 applied CC-11 through CC-15.
- r4 applies CC-16 through CC-18.

## 1. Stack Decisions

### 1.1 Workspace

- `pnpm 9` with workspaces. No Turbo.
- Node 22 LTS, pinned by `.nvmrc` and `package.json#engines`.
- TypeScript 5.6 with project references via `tsc -b`.
- `strict: true`.
- No `any` unless accompanied by an explicit annotation comment explaining why.

### 1.2 Tooling

- Build:
  - `tsup` for libraries, CJS + ESM dual output.
  - `vite` for `apps/web`.
  - `tsx` for `apps/cli`, `apps/api`, and `apps/worker` in dev.
  - `node` for prod-ish local runs.
- Test:
  - `vitest` with workspace config.
  - Coverage via `@vitest/coverage-v8`.
  - No coverage gate at M1.
  - M9 adds coverage gate: >=70% lines on `packages/core`, `packages/session`, `packages/run-engine`.
- Lint/format:
  - `biome`.
  - One root config.
- Pre-commit:
  - `lefthook`.
  - Runs `biome check --write` on staged files.
  - Runs `tsc -b --noEmit` on changed packages.
  - Runs related Vitest tests on changed packages.

### 1.3 Database

- Postgres 16 via Docker Compose.
- Drizzle ORM + `drizzle-kit generate`.
- Generated SQL migrations are committed.
- Migrations are never auto-applied at runtime except through the explicit migration runner invoked by `devflow up`.
- Migration runner:
  - `scripts/migrate.ts`.
  - Takes `DATABASE_URL`.
  - `devflow up` waits for Postgres health and then runs pending migrations.

### 1.4 Logging

- `pino`.
- `pino-pretty` in dev, JSON otherwise.
- Standard fields:
  - `time`
  - `level`
  - `module`
  - `runId?`
  - `phaseId?`
  - `role?`
  - `eventId?`
- Levels:
  - `trace`: transcript chunks only.
  - `debug`: internal state transitions.
  - `info`: run events.
  - `warn`: recoverable errors.
  - `error`: human-required or fatal errors.

### 1.5 Config

- Single Zod schema in `packages/core/src/config.ts`.
- Source precedence, high to low:
  - `process.env`
  - `.env.local`
  - `.env`
  - schema defaults
- Config is loaded once at process start, validated, frozen, and exported as typed `Config`.
- Config validation failure is fatal.
- Required keys at M1:
  - `DATABASE_URL`
  - `WORKSPACE_ROOT`
  - `LOG_LEVEL`
- M5 adds:
  - `TEMPORAL_ADDRESS`
- Path canonicalization:
  - `WORKSPACE_ROOT` is resolved through `fs.realpathSync` and stored as an absolute path at config load.
  - Any path entering the system must be canonicalized before storage or hashing.
  - `repo_path` and `worktree_root` rules are defined in section 4.
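
The source precedence above can be sketched as a pure merge over already-parsed sources. This is a minimal illustration, not the plan's loader: `mergeSources` and the `Source` type are hypothetical names, and `.env` parsing plus Zod validation are deliberately left out.

```ts
// Hypothetical helper: merges config sources by the documented precedence.
type Source = Readonly<Record<string, string | undefined>>;

// `sources` is ordered high to low: process.env, .env.local, .env, schema defaults.
function mergeSources(sources: readonly Source[]): Readonly<Record<string, string>> {
  const merged: Record<string, string> = {};
  // Walk from lowest to highest precedence so that later writes win.
  for (const source of [...sources].reverse()) {
    for (const [key, value] of Object.entries(source)) {
      if (value !== undefined) merged[key] = value;
    }
  }
  // Frozen to mirror the "loaded once, validated, frozen" rule.
  return Object.freeze(merged);
}

// Illustrative values only; each literal stands in for one source.
const config = mergeSources([
  { LOG_LEVEL: "debug" }, // process.env
  { DATABASE_URL: "postgres://localhost/devflow" }, // .env.local
  { LOG_LEVEL: "info", WORKSPACE_ROOT: "/tmp/devflow" }, // .env
  { LOG_LEVEL: "warn" }, // schema defaults
]);
```

Here `LOG_LEVEL` resolves to `debug` because the `process.env` stand-in is listed first; the merged object would then be passed to the Zod schema for validation.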

Backend registration:

```ts
const BackendConfig = z.object({
  id: Backend, // codex | claude | fake
  enabled: z.boolean(),
  binaryPath: z.string().optional(), // resolved from PATH if absent; required for codex/claude
});
```

- `fake` is always available.
- `codex` and `claude` are available only when:
  - `enabled=true`
  - binary resolves at process start.
- Resolution failure:
  - `doctor` warns.
  - binding fails fast at run start with `human_required:backend_unavailable`.
- Binding reads from `config.backends`, never directly from `PATH`.

### 1.6 HTTP

- `fastify` 5.
- `@fastify/sensible`.
- SSE primary strategy:
  - Try `fastify-sse-v2`.
  - Fastify 5 compatibility is not assumed.
  - M1 includes a smoke test.
- SSE fallback:
  - Native `reply.raw`.
  - Headers:
    - `content-type: text/event-stream`
    - `cache-control: no-cache`
    - `connection: keep-alive`
  - Write `data: <json>\n\n`.
  - Manage heartbeats and reconnect manually.
- WebSocket is deferred unless SSE fails under transcript volume.
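
For the native fallback, the frame layout can be sketched as two small formatters. The names `SSE_HEADERS`, `formatSseData`, and `formatSseHeartbeat` are hypothetical, and using an SSE comment line as the heartbeat is an assumption; the plan only says heartbeats are managed manually.

```ts
// Header set taken from the fallback rules above.
const SSE_HEADERS = {
  "content-type": "text/event-stream",
  "cache-control": "no-cache",
  connection: "keep-alive",
} as const;

// Serializes one event as `data: <json>\n\n`. JSON.stringify without an indent
// argument never emits raw newlines, so the payload stays on one `data:` line.
function formatSseData(payload: Record<string, unknown>): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Assumption: a comment line (leading colon) is used as the heartbeat.
// SSE clients ignore comment lines, so this keeps the connection warm.
function formatSseHeartbeat(): string {
  return ": heartbeat\n\n";
}
```

In a Fastify handler this would roughly be `reply.raw.writeHead(200, SSE_HEADERS)` followed by `reply.raw.write(formatSseData(event))` per event, with a timer writing the heartbeat.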

## 2. Directory Layout

```text
devflow/
├── package.json
├── pnpm-workspace.yaml
├── tsconfig.base.json
├── biome.json
├── lefthook.yml
├── vitest.workspace.ts
├── docker-compose.yml
├── .nvmrc
├── .env.example
├── docs/
│   ├── plan.md
│   ├── adr/
│   └── schemas/
│       ├── artifacts/
│       ├── personas/
│       └── templates/
├── scripts/
│   ├── migrate.ts
│   └── seed.ts
├── packages/
│   ├── core/
│   │   └── src/
│   │       ├── config.ts
│   │       ├── enums.ts
│   │       ├── hash.ts
│   │       ├── errors.ts
│   │       ├── template.ts
│   │       ├── persona.ts
│   │       ├── binding.ts
│   │       ├── prompt-envelope.ts
│   │       ├── artifact-schema.ts
│   │       ├── run-event.ts
│   │       └── index.ts
│   ├── db/
│   │   └── src/
│   │       ├── schema/
│   │       ├── migrations/
│   │       ├── repositories/
│   │       └── client.ts
│   ├── session/
│   │   └── src/
│   │       ├── adapter.ts
│   │       ├── fake.ts
│   │       ├── tmux.ts
│   │       ├── profiles/
│   │       │   ├── codex.ts
│   │       │   └── claude.ts
│   │       ├── recovery.ts
│   │       └── transcript.ts
│   ├── harness/
│   │   └── src/
│   │       ├── git.ts
│   │       ├── worktree.ts
│   │       ├── runner.ts
│   │       ├── review.ts
│   │       └── backtest.ts
│   ├── run-engine/
│   │   └── src/
│   │       ├── engine.ts
│   │       ├── phase-executor.ts
│   │       └── approval.ts
│   └── workflows/
│       └── src/
│           ├── workflow.ts
│           └── activities.ts
├── apps/
│   ├── api/
│   ├── web/
│   ├── cli/
│   └── worker/
└── tests/
    ├── e2e/
    └── fixtures/
```

## 3. `devflow doctor`

Exit codes:

- `0`: all green.
- `1`: one or more red checks.
- `2`: internal or unknown error.

Each check emits:

- `name`
- `status`: `pass` | `fail` | `warn`
- `detail`
- `remediation`

Closed check list:

1. Node version satisfies `>=22.0.0 <23`.
2. pnpm version `>=9.0.0`.
3. `tmux` exists, version `>=3.3`.
4. `git` version `>=2.40`.
5. Docker daemon reachable.
6. Postgres container running, `pg_isready` ok, `DATABASE_URL` connects.
7. No pending Drizzle migrations.
8. `WORKSPACE_ROOT` exists and is writable.
9. `.env` resolves to valid `Config`.
10. `codex` in `PATH`, warn-only.
11. `claude` in `PATH`, warn-only.
12. Free disk on `WORKSPACE_ROOT` partition:
    - warn under 10GB.
    - fail under 2GB.
    - target green threshold: >=5GB.

Output:

- Human table by default.
- `--json` for machine-readable output.
- `--quiet` prints only nonzero results.
- `--list-orphans` lists orphaned worktrees only; it never removes them.
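
The mapping from check results to the exit codes above can be sketched as follows. `CheckResult` mirrors the four emitted fields; `doctorExitCode` is a hypothetical name, and treating `warn` as non-red is an assumption drawn from the warn-only checks in the list.

```ts
type CheckStatus = "pass" | "fail" | "warn";

interface CheckResult {
  name: string;
  status: CheckStatus;
  detail: string;
  remediation: string;
}

// Returns 1 when any check is red (`fail`), otherwise 0.
// Assumption: `warn` does not affect the exit code. Exit code 2 is reserved
// for internal errors and would be set by a top-level error handler instead.
function doctorExitCode(checks: readonly CheckResult[]): 0 | 1 {
  return checks.some((check) => check.status === "fail") ? 1 : 0;
}

// Illustrative results only.
const sampleChecks: CheckResult[] = [
  { name: "node", status: "pass", detail: "v22.3.0", remediation: "" },
  { name: "codex", status: "warn", detail: "not in PATH", remediation: "install codex or disable the backend" },
];
```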

## 4. Database Schema

First migration prelude:

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;
```

All tables use `gen_random_uuid()` primary keys unless noted. All times are `timestamptz`. Mutable rows include `updated_at`. JSON columns use `jsonb`.

### 4.1 `workflow_templates`

- `id uuid primary key default gen_random_uuid()`
- `name text not null`
- `version int not null`
- `hash text not null unique`
- `definition jsonb not null`
- `created_at timestamptz not null default now()`
- unique `(name, version)`

### 4.2 `agent_personas`

- `id uuid primary key default gen_random_uuid()`
- `name text not null`
- `version int not null`
- `hash text not null unique`
- `definition jsonb not null`
- `created_at timestamptz not null default now()`
- unique `(name, version)`

### 4.3 `runs`

- `id uuid primary key default gen_random_uuid()`
- `template_id uuid not null references workflow_templates(id)`
- `template_hash text not null`
- `state text not null`
- `repo_path text not null`
  - canonical absolute path
  - resolved through `fs.realpathSync` before insert
- `base_branch text not null`
- `worktree_root text not null`
  - canonical absolute path under `WORKSPACE_ROOT/<runId>/`
- `current_phase_id uuid references run_phases(id)` nullable and deferrable
- `started_at timestamptz`
- `ended_at timestamptz`
- `final_report_path text`
- `paused_from_state text`
  - set when transitioning to `paused`
  - cleared on resume
  - null when state is not `paused`
- `created_at timestamptz not null default now()`
- `updated_at timestamptz`

Active-run uniqueness:

```sql
CREATE UNIQUE INDEX ux_active_run_repo_base
  ON runs (repo_path, base_branch)
  WHERE state NOT IN ('completed', 'failed', 'aborted');
```

### 4.4 `run_inputs`

- `id uuid primary key default gen_random_uuid()`
- `run_id uuid not null unique references runs(id) on delete cascade`
- `requirements_md text not null`
- `objective jsonb`
- `extra jsonb`
- `input_hash text not null`

`input_hash` is based on:

- `requirements_md`
- `objective`
- `extra`
- canonical `repo_path`
- `base_branch`

### 4.5 `run_bindings`

- `id uuid primary key default gen_random_uuid()`
- `run_id uuid not null references runs(id) on delete cascade`
- `role_id text not null`
- `persona_id uuid not null references agent_personas(id)`
- `persona_hash text not null`
- `backend text not null`
- `binding_hash text not null`
- unique `(run_id, role_id)`

### 4.6 `run_phases`

- `id uuid primary key default gen_random_uuid()`
- `run_id uuid not null references runs(id) on delete cascade`
- `phase_key text not null`
- `seq int not null`
- `state text not null`
- `attempts int not null default 0`
- `started_at timestamptz`
- `ended_at timestamptz`
- unique `(run_id, phase_key)`

### 4.7 `run_events`

Append-only.

- `id bigserial primary key`
- `run_id uuid not null references runs(id) on delete cascade`
- `phase_id uuid references run_phases(id)`
- `seq bigint not null`
- `type text not null`
- `payload jsonb not null`
- `idempotency_key text not null`
- `ts timestamptz not null default now()`
- unique `(run_id, seq)`
- unique `(run_id, idempotency_key)`
- index `(run_id, ts)`

Concurrency:

- All inserts go through `RunEventRepository.append()`.
- Raw SQL inserts into `run_events` are forbidden.
- `append()` takes `pg_advisory_xact_lock(hash64('devflow:run-events', run_id))`.
- Inside that same transaction it assigns:

```sql
seq := COALESCE(MAX(seq), 0) + 1
```

### 4.8 `approval_requests`

- `id uuid primary key default gen_random_uuid()`
- `run_id uuid not null references runs(id)`
- `phase_id uuid references run_phases(id)`
- `gate_key text not null`
- `state text not null`
- `idempotency_key text not null`
- `payload jsonb not null`
- `created_at timestamptz not null default now()`
- `resolved_at timestamptz`
- unique `(idempotency_key)`

### 4.9 `approval_decisions`

Append-only and immutable.

- `id uuid primary key default gen_random_uuid()`
- `approval_request_id uuid not null references approval_requests(id)`
- `action text not null`
  - `approve`
  - `reject`
  - `request_changes`
  - `abort`
- `comment text`
- `decided_at timestamptz not null default now()`
- `idempotency_key text not null unique`

`pause` is not an approval decision.

### 4.10 `tui_sessions`

- `id uuid primary key default gen_random_uuid()`
- `run_id uuid not null references runs(id) on delete cascade`
- `role_id text not null`
- `backend text not null`
- `cwd text not null`
- `expected_artifact_path text`
- `expected_schema text`
- `last_prompt_hash text`
- `last_prompt_at timestamptz`
- `last_capture_seq bigint not null default 0`
- `last_known_pane_pid int`
- `tmux_session text`
- `tmux_window text`
- `state text not null`
- `recovery_attempts int not null default 0`
- unique `(run_id, role_id)`

### 4.11 `tui_transcript_chunks`

Append-only.

- `id bigserial primary key`
- `session_id uuid not null references tui_sessions(id) on delete cascade`
- `seq bigint not null`
- `content text not null`
- `captured_at timestamptz not null default now()`
- unique `(session_id, seq)`

### 4.12 `artifacts`

- `id uuid primary key default gen_random_uuid()`
- `run_id uuid not null references runs(id) on delete cascade`
- `phase_id uuid references run_phases(id)`
- `path text not null`
- `schema_id text not null`
- `hash text not null`
- `valid boolean not null`
- `validation_error jsonb`
- `created_at timestamptz not null default now()`
- unique `(run_id, path, hash)`

### 4.13 `commands`

- `id uuid primary key default gen_random_uuid()`
- `run_id uuid not null references runs(id) on delete cascade`
- `phase_id uuid references run_phases(id)`
- `kind text not null`
  - `git`
  - `test`
  - `e2e`
  - `doctor`
  - `backtest`
  - `other`
- `argv text[] not null`
- `cwd text not null`
- `exit_code int`
- `stdout_path text`
- `stderr_path text`
- `started_at timestamptz`
- `ended_at timestamptz`

### 4.14 `review_findings`

- `id uuid primary key default gen_random_uuid()`
- `run_id uuid not null references runs(id) on delete cascade`
- `phase_id uuid references run_phases(id)`
- `reviewer_role text not null`
- `severity text not null`
  - `info`
  - `low`
  - `medium`
  - `high`
  - `critical`
- `category text not null`
  - `correctness`
  - `evidence`
  - `style`
  - `security`
  - `performance`
  - `other`
- `file_path text`
- `line int`
- `summary text not null`
- `evidence text`
- `verifier_status text not null default 'unverified'`
  - `unverified`
  - `confirmed`
  - `rejected`
- `created_at timestamptz not null default now()`

### 4.15 Backtest Stub Tables

`backtest_iterations` and `backtest_metrics` are created at M1 as stub tables:

- `id uuid primary key default gen_random_uuid()`
- `run_id uuid not null references runs(id) on delete cascade`
- `payload jsonb`
- `created_at timestamptz not null default now()`

Full schema is deferred to M12.

## 5. Enums

All enums live in `packages/core/src/enums.ts` as TypeScript `const` objects and Zod enums.

### 5.1 `Backend`

- `codex`
- `claude`
- `fake`

Future `gemini` support adds an enum entry and a `BackendProfile`; no design change.

### 5.2 `Capability`

- `spec_write`
- `phase_planning`
- `task_dag_planning`
- `code_edit`
- `test_first_development`
- `code_review`
- `evidence_check`
- `command_execute`
- `backtest_run`
- `metric_extract`
- `failure_mining`
- `objective_eval`
- `final_report_compose`

### 5.3 `RiskLevel`

- `low`
- `medium`
- `high`

Risk is declared per phase in the template. Persona has `maxRiskLevel`. Binding fails when `phase.risk > persona.maxRiskLevel`.
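
The comparison `phase.risk > persona.maxRiskLevel` needs an explicit ordering, because comparing the strings directly would sort `high` before `low`. A minimal sketch, where `RISK_ORDER` and `exceedsMaxRisk` are hypothetical names:

```ts
// Ordered from least to most risky; the array index is the rank.
const RISK_ORDER = ["low", "medium", "high"] as const;
type RiskLevel = (typeof RISK_ORDER)[number];

// True when the phase is riskier than the persona allows, i.e. binding must fail.
function exceedsMaxRisk(phaseRisk: RiskLevel, maxRiskLevel: RiskLevel): boolean {
  return RISK_ORDER.indexOf(phaseRisk) > RISK_ORDER.indexOf(maxRiskLevel);
}
```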

### 5.4 `ApprovalDecisionAction`

- `approve`
- `reject`
- `request_changes`
- `abort`

`pause` is a run-level control operation, not an approval decision.

### 5.5 `ApprovalState`

- `pending`
- `approved`
- `rejected`
- `changes_requested`
- `aborted`
- `paused`

`paused` is not an auto-decision.

### 5.6 `RunState`

- `created`
- `bound`
- `planning`
- `awaiting_approval`
- `executing`
- `paused`
- `completed`
- `failed`
- `aborted`

### 5.7 `RunPhaseState`

- `pending`
- `running`
- `awaiting_artifact`
- `validating`
- `awaiting_approval`
- `completed`
- `failed`
- `skipped`

### 5.8 `SessionState`

- `CREATED`
- `BOOTSTRAPPING`
- `READY`
- `BUSY`
- `WAITING_FOR_APPROVAL`
- `ARTIFACT_TIMEOUT`
- `HUNG`
- `CRASHED`
- `RESUMING`
- `REBOOTSTRAPPED`
- `FAILED_NEEDS_HUMAN`

## 6. Content-Addressed Hashing

### 6.1 Canonical JSON

- Object keys sorted lexicographically by UTF-16 code units.
- No insignificant whitespace.
- Strings use standard JSON escaping.
- No Unicode normalization.
- Numbers use shortest round-trippable representation.
- Integers have no decimal point.
- No leading zeros.
- Arrays preserve order.
- No trailing newline.

`packages/core/src/hash.ts` exports:

```ts
canonicalize(value: unknown): string
hash(value: unknown): string
```

`hash()` returns `sha256hex(canonicalize(value))`.
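
One way the two exports could look, as a sketch under stated assumptions rather than the actual `hash.ts`: inputs are assumed to be JSON-compatible, and this version rejects `undefined`, functions, `bigint`, and non-finite numbers instead of dropping them. It leans on `JSON.stringify` for string escaping and number formatting, and on the default `Array.prototype.sort`, which compares strings by UTF-16 code units.

```ts
import { createHash } from "node:crypto";

function canonicalize(value: unknown): string {
  if (value === null || typeof value === "boolean" || typeof value === "string") {
    return JSON.stringify(value);
  }
  if (typeof value === "number") {
    // Assumption: NaN and Infinity are errors, since JSON cannot represent them.
    if (!Number.isFinite(value)) throw new TypeError("non-finite number");
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Arrays preserve order; elements are canonicalized recursively.
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  if (typeof value === "object") {
    const record = value as Record<string, unknown>;
    // Default sort orders keys by UTF-16 code units.
    const keys = Object.keys(record).sort();
    const members = keys.map((key) => `${JSON.stringify(key)}:${canonicalize(record[key])}`);
    return `{${members.join(",")}}`;
  }
  throw new TypeError(`unsupported type: ${typeof value}`);
}

function hash(value: unknown): string {
  return createHash("sha256").update(canonicalize(value), "utf8").digest("hex");
}
```

Because keys are sorted before serialization, `hash({ a: 1, b: 2 })` and `hash({ b: 2, a: 1 })` produce the same digest, which is the property the hash subjects in the next section rely on.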

### 6.2 Hash Subjects

- Template hash:
  - `{ name, version, roles, phases, gates, capabilitiesRequired }`
- Persona hash:
  - `{ name, version, capabilities, backend, maxRiskLevel, allowedRoles, promptConfig, modelConfig }`
- Binding hash:
  - `{ runId, roleId, templateHash, personaHash, backend, override }`
- Run input hash:
  - `{ templateHash, bindings: sorted[bindingHash], requirementsMd, objective, repoPath, baseBranch, extra }`
- Prompt hash:
  - `{ runId, roleId, phaseKey, expectedArtifact, expectedSchema, instructions, attempt }`
- Artifact hash:
  - SHA-256 of file bytes.

Prompt hash uses `phaseKey`, not `phaseId`, because `PromptEnvelope` carries `phaseKey`.

## 7. Template, Persona, Binding

### 7.1 Template Schema

```ts
const TemplatePhase = z.object({
  key: z.string(),
  title: z.string(),
  risk: RiskLevel,
  roles: z.array(z.string()),
  expectedArtifact: z
    .object({
      path: z.string(),
      schema: z.string(),
    })
    .optional(),
  gates: z.array(z.string()).default([]),
  timeoutMs: z.number().int().positive().optional(),
});

const TemplateRole = z.object({
  id: z.string(),
  requiredCapabilities: z.array(Capability),
  preferredBackends: z.array(Backend).default([]),
  count: z.number().int().min(1).default(1),
  diversity: z
    .object({
      requireDifferentBackends: z.boolean().default(false),
    })
    .optional(),
});

const Template = z.object({
  name: z.string(),
  version: z.number().int().positive(),
  roles: z.array(TemplateRole),
  phases: z.array(TemplatePhase),
  defaultGates: z.array(z.string()).default([]),
});
```

### 7.2 Persona Schema

```ts
const Persona = z.object({
  name: z.string(),
  version: z.number().int().positive(),
  backend: Backend,
  capabilities: z.array(Capability),
  maxRiskLevel: RiskLevel,
  allowedRoles: z.array(z.string()).optional(),
  promptConfig: z
    .object({
      systemPrompt: z.string().optional(),
      instructionsPrelude: z.string().optional(),
    })
    .default({}),
  modelConfig: z.record(z.string(), z.unknown()).default({}),
});
```

### 7.3 Override Semantics

- Override may swap persona for a role.
- Override may constrain backend to a specific allowed backend.
- Override cannot add capabilities.
- Override cannot raise risk above persona `maxRiskLevel`.
- Diversity rules apply after override.
- Lock-time validation runs the full binding algorithm.
- On first binding failure, the run does not start.

### 7.4 Binding Algorithm

For each role:

1. Select override persona if present; otherwise run `autoSelect`.
2. Assert backend is enabled in `config.backends`.
3. Assert non-fake backend binary resolved at process start.
4. Assert role id is in `allowedRoles`, unless `allowedRoles` is absent.
5. Assert required capabilities are a subset of persona capabilities.
6. Assert every phase using the role has risk <= persona `maxRiskLevel`.
7. Expand roles with `count > 1` into `roleId#0`, `roleId#1`, etc.
8. Enforce diversity rules after expansion.
9. Compute and persist `binding_hash` per role instance.

`autoSelect` is deterministic. Sort candidates by:

1. role `preferredBackends` order.
2. `persona.version desc`.
3. `persona.name asc`.
4. `persona.hash asc`.

Personas whose backend is not in `preferredBackends` are eligible only if all preferred-backend personas fail capability or risk checks.

Binding fails with `human_required:no_eligible_persona` if no persona satisfies requirements.
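
The `autoSelect` ordering can be sketched as a comparator chain. This is an illustration only: the `Candidate` shape is a hypothetical projection of a persona row, and the input list is assumed to have already passed the capability and risk checks from the steps above.

```ts
interface Candidate {
  name: string;
  version: number;
  hash: string;
  backend: string;
}

// Plain code-unit comparison keeps the ordering locale-independent.
const compareStrings = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Returns the selected persona, or undefined when nothing is eligible
// (the caller would then fail with `human_required:no_eligible_persona`).
function autoSelect(
  eligible: readonly Candidate[],
  preferredBackends: readonly string[],
): Candidate | undefined {
  const rank = (candidate: Candidate): number => {
    const index = preferredBackends.indexOf(candidate.backend);
    return index === -1 ? Number.MAX_SAFE_INTEGER : index;
  };
  // Non-preferred backends are considered only when no preferred persona survived.
  const preferred = eligible.filter((c) => preferredBackends.includes(c.backend));
  const pool = preferred.length > 0 ? preferred : eligible;
  const sorted = [...pool].sort(
    (a, b) =>
      rank(a) - rank(b) ||
      b.version - a.version ||
      compareStrings(a.name, b.name) ||
      compareStrings(a.hash, b.hash),
  );
  return sorted[0];
}
```

The final tie-break on `hash` makes the result independent of database row order, which is what keeps selection deterministic.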

### 7.5 Seeding

Personas:

- `docs/schemas/personas/<name>@<version>.yaml`
- filename encodes immutable identity.
- loader parses with Persona schema.
- loader computes `personaHash`.
- loader upserts keyed by `(name, version)`.
- hash mismatch on an existing row is fatal.

Templates:

- `docs/schemas/templates/<name>@<version>.yaml`
- same immutable version rule.

Deleting a published file is allowed only when no run references that hash.

## 8. Session Runtime

### 8.1 SessionAdapter Interface

```ts
export interface SessionAdapter {
  start(input: StartInput): Promise<SessionHandle>;
  sendPrompt(handle: SessionHandle, envelope: PromptEnvelope): Promise<{ promptId: string }>;
  probe(handle: SessionHandle): Promise<ProbeResult>;
  resume(handle: SessionHandle): Promise<SessionHandle>;
  rebootstrap(handle: SessionHandle): Promise<SessionHandle>;
  capture(handle: SessionHandle, fromSeq: bigint): AsyncIterable<TranscriptChunk>;
  dispose(handle: SessionHandle): Promise<void>;
}

export interface StartInput {
  runId: string;
  roleId: string;
  backend: Backend;
  cwd: string;
  expectedArtifactPath?: string;
  expectedSchema?: string;
  envelopePrelude?: string;
}

export interface SessionHandle {
  sessionId: string;
  pid?: number;
  tmuxSession?: string;
  tmuxWindow?: string;
}

export interface ProbeResult {
  alive: boolean;
  paneActive: boolean;
  lastOutputAt?: Date;
  hint?: string;
}

export interface TranscriptChunk {
  seq: bigint;
  content: string;
  capturedAt: Date;
}
```

### 8.2 Session State Machine

- `CREATED -> BOOTSTRAPPING -> READY`
- `READY <-> BUSY`
- `BUSY -> WAITING_FOR_APPROVAL`
- `BUSY -> ARTIFACT_TIMEOUT`
- `BUSY -> HUNG`
- `BUSY -> CRASHED`
- `HUNG | CRASHED | ARTIFACT_TIMEOUT -> RESUMING -> READY`
- `RESUMING -> REBOOTSTRAPPED -> READY`
- exhausted errors -> `FAILED_NEEDS_HUMAN`

### 8.3 Recovery Counters

- `sendPrompt` retry: 2.
- `resume` retry: 2.
- `rebootstrap` retry: 1.
- artifact repair retry: 1.
- max hung time: configurable; default 20 minutes.

Exhaustion creates a human gate with `recoveryHint`.

### 8.4 SessionManager Singleton

- M4: hosted in `apps/api`.
- M5+: hosted in `apps/worker`.
- Only SessionManager may call mutating `SessionAdapter` methods.
- Holds in-memory `Map<sessionId, SessionHandle>`.
- Takes `pg_advisory_lock(hash64('devflow:session-manager'))`.
- Second instance exits code `3`.
- On start:
  - query non-terminal `tui_sessions`.
  - call `adapter.resume(handle)`.
  - success: place handle in map.
  - failure: session -> `FAILED_NEEDS_HUMAN`, append `session.failed`, create recovery gate.
- On SIGTERM/SIGINT:
  - refuse new prompts.
  - allow in-flight artifact polling up to 30s.
  - persist `last_capture_seq`.
  - release advisory lock.

## 9. Prompt Envelope

### 9.1 Wire Format

```text
DEVFLOW_PROMPT_BEGIN <uuid>
Run: <run-id>
Role: <role-id>
Phase: <phase-key>
Attempt: <int>
Expected artifact: <absolute-path>
Expected schema: <schema-id>
Dedup-Key: <prompt-hash>
Instructions:
<freeform multi-line instructions>
DEVFLOW_PROMPT_END <uuid>
```

### 9.2 Schema

```ts
const PromptEnvelope = z.object({
  uuid: z.string().uuid(),
  runId: z.string().uuid(),
  roleId: z.string(),
  phaseKey: z.string(),
  attempt: z.number().int().nonnegative(),
  expectedArtifact: z.string(),
  expectedSchema: z.string(),
  dedupKey: z.string(),
  instructions: z.string(),
});
```
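
Mapping the schema fields onto the wire format in 9.1 could look like the sketch below. `renderEnvelope` is a hypothetical name, and the plain interface stands in for the inferred Zod type so that the example has no dependencies.

```ts
// Mirrors the fields of the PromptEnvelope schema.
interface PromptEnvelope {
  uuid: string;
  runId: string;
  roleId: string;
  phaseKey: string;
  attempt: number;
  expectedArtifact: string;
  expectedSchema: string;
  dedupKey: string;
  instructions: string;
}

// Renders one envelope; the same uuid appears on the BEGIN and END markers.
function renderEnvelope(envelope: PromptEnvelope): string {
  return [
    `DEVFLOW_PROMPT_BEGIN ${envelope.uuid}`,
    `Run: ${envelope.runId}`,
    `Role: ${envelope.roleId}`,
    `Phase: ${envelope.phaseKey}`,
    `Attempt: ${envelope.attempt}`,
    `Expected artifact: ${envelope.expectedArtifact}`,
    `Expected schema: ${envelope.expectedSchema}`,
    `Dedup-Key: ${envelope.dedupKey}`,
    "Instructions:",
    envelope.instructions,
    `DEVFLOW_PROMPT_END ${envelope.uuid}`,
  ].join("\n");
}
```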

### 9.3 Rules

- Prompt identity is `dedupKey`.
- Adapter refuses duplicate `dedupKey` for the same session within a run lifetime.
- `attempt` increments only when the engine intentionally re-sends after timeout or repair.
- Adapter-level retry does not increment attempt.
- Completion is never inferred from transcript text.
- Completion requires a schema-valid artifact.

### 9.4 Backend Prelude

Sent once at session bootstrap before the first envelope.

Required structure:

1. Backend identity statement.
2. Persona `instructionsPrelude`.
3. Protocol declaration: completion is signaled only by writing expected artifact files.
4. Envelope marker contract.
5. Approval/probe contract: `DEVFLOW_PROBE` must respond with one line `READY` or `BUSY <reason>`.

Codex and Claude-specific addenda live in `packages/session/src/profiles/{codex,claude}.ts` and are populated at M10.

## 10. Artifact Schema Registry

### 10.1 Layout

JSON Schema 2020-12 documents live at:

```text
docs/schemas/artifacts/<schema_id>.json
```

`schema_id` format:

```text
<domain>/<name>@<version>
```

Examples:

- `dev/spec@1`
- `dev/phase-plan@1`
- `dev/dag@1`
- `dev/review-finding-batch@1`
- `bt/objective@1`
- `bt/iteration-result@1`
- `common/final-report@1`
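
Resolving a `schema_id` to its file could be sketched as below. The helper name `schemaPath` and the allowed character set (lowercase letters, digits, hyphens, and a positive integer version) are assumptions inferred from the examples, not rules stated by the plan.

```ts
// Assumption: ids look like the examples above, e.g. `dev/phase-plan@1`.
const SCHEMA_ID = /^[a-z0-9-]+\/[a-z0-9-]+@[1-9][0-9]*$/;

// Validating before building the path also rejects `..` segments,
// so an id can never escape the artifacts directory.
function schemaPath(schemaId: string): string {
  if (!SCHEMA_ID.test(schemaId)) {
    throw new Error(`invalid schema id: ${schemaId}`);
  }
  return `docs/schemas/artifacts/${schemaId}.json`;
}
```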

### 10.2 Loader

`packages/core/src/artifact-schema.ts` exports:

```ts
function loadSchema(id: string): JsonSchema;
function validateArtifact(
  id: string,
  data: unknown
): { ok: true } | { ok: false; errors: ValidationError[] };
```

Unknown schema id is fatal.

### 10.3 Validation Flow

1. Engine waits for `expectedArtifactPath` to appear.
2. Debounce 500ms after last `mtime` change.
3. Read file.
4. Compute SHA-256.
5. Validate against `expectedSchema`.
6. Valid:
   - insert artifact row with `valid=true`.
   - append `artifact.validated`.
   - advance phase.
7. Invalid:
   - insert artifact row with `valid=false`.
   - append `artifact.invalid`.
   - trigger one repair prompt.
   - after repair exhaustion, create human gate.
8. Timeout:
   - append `artifact.timeout`.
   - probe session.
   - enter recovery flow.

### 10.4 Final Report

At terminal run state, write atomically:

- `<WORKSPACE_ROOT>/<runId>/<runId>.report.md`
- `<WORKSPACE_ROOT>/<runId>/<runId>.report.json`

Both are written even on `failed` or `aborted`, best-effort.

`common/final-report@1` minimum fields:

- `runId`
- `templateHash`
- `bindings[]`
- `inputs`
- `phases[]`
- `approvals[]`
- `findings[]`
- `commands[]`
- `artifacts[]`
- `events.tail`
- `unresolved[]`
- `endedAt`
- `status`

### 10.5 Backtest Objective Stub

`bt/objective@1`:

```json
{
  "targets": [
    { "metric": "sharpe", "op": "gte", "value": 1.5, "weight": 1.0 },
    { "metric": "mdd", "op": "lte", "value": 0.15, "weight": 1.0 }
  ],
  "stopWhen": "all"
}
```

- `op`: `gte` | `lte` | `eq` | `gt` | `lt`
- `stopWhen`: `all` | `weighted`
- `weighted` threshold is hardcoded at 0.8 at M12.
- Full DSL deferred to M12.
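
Evaluating the stub objective against a set of metrics might look like this. The `all` branch follows directly from the stub; the `weighted` branch is an assumption (satisfied weight divided by total weight, compared against the 0.8 threshold), since the plan defers the full semantics to M12. All names are hypothetical.

```ts
type Op = "gte" | "lte" | "eq" | "gt" | "lt";

interface Target {
  metric: string;
  op: Op;
  value: number;
  weight: number;
}

interface Objective {
  targets: Target[];
  stopWhen: "all" | "weighted";
}

const OPS: Record<Op, (actual: number, target: number) => boolean> = {
  gte: (a, t) => a >= t,
  lte: (a, t) => a <= t,
  eq: (a, t) => a === t,
  gt: (a, t) => a > t,
  lt: (a, t) => a < t,
};

// A target whose metric is missing counts as not met.
function targetMet(target: Target, metrics: Readonly<Record<string, number>>): boolean {
  const actual = metrics[target.metric];
  return actual !== undefined && OPS[target.op](actual, target.value);
}

function objectiveMet(
  objective: Objective,
  metrics: Readonly<Record<string, number>>,
  weightedThreshold = 0.8,
): boolean {
  if (objective.stopWhen === "all") {
    return objective.targets.every((target) => targetMet(target, metrics));
  }
  // Assumed semantics for `weighted`: share of total weight that is satisfied.
  const total = objective.targets.reduce((sum, target) => sum + target.weight, 0);
  if (total <= 0) return false;
  const met = objective.targets
    .filter((target) => targetMet(target, metrics))
    .reduce((sum, target) => sum + target.weight, 0);
  return met / total >= weightedThreshold;
}
```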

## 11. Run Events

Closed event types:

```text
run.created
run.started
run.paused
run.resumed
run.completed
run.failed
run.aborted
phase.started
phase.completed
phase.failed
phase.skipped
prompt.sent
prompt.repaired
artifact.expected
artifact.validated
artifact.invalid
artifact.timeout
approval.requested
approval.resolved
session.created
session.ready
session.busy
session.idle
session.crashed
session.recovered
session.failed
command.started
command.completed
command.failed
review.batch_recorded
finding.verifier_resolved
backtest.iteration_started
backtest.iteration_completed
backtest.objective_evaluated
```

### 11.1 Idempotency Keys

Every event append requires deterministic `idempotency_key`.

| Event family | Key formula |
|---|---|
| `run.created`, `run.started`, `run.completed`, `run.failed`, `run.aborted` | `<type>:<run_id>` |
| `run.paused` | `run.paused:<run_id>:<cause>` |
| `run.resumed` | `run.resumed:<run_id>:<cause>` |
| `phase.started`, `phase.completed`, `phase.failed`, `phase.skipped` | `<type>:<phase_id>:<phase_attempt>` |
| `prompt.sent`, `prompt.repaired` | `<type>:<prompt_dedup_key>` |
| `artifact.expected`, `artifact.timeout` | `<type>:<phase_id>:<phase_attempt>:<expected_path>` |
| `artifact.validated`, `artifact.invalid` | `<type>:<phase_id>:<expected_path>:<artifact_hash>` |
| `approval.requested` | `approval.requested:<approval_idempotency_key>` |
| `approval.resolved` | `approval.resolved:<approval_request_id>:<action>` |
| `session.created`, `session.failed` | `<type>:<session_id>` |
| `session.busy`, `session.idle` | `<type>:<session_id>:<prompt_dedup_key>` |
| `session.ready`, `session.crashed`, `session.recovered` | `<type>:<session_id>:<recovery_attempts>` |
| `command.started`, `command.completed`, `command.failed` | `<type>:<command_id>` |
| `review.batch_recorded` | `review.batch_recorded:<phase_id>:<reviewer_role>:<phase_attempt>` |
| `finding.verifier_resolved` | `finding.verifier_resolved:<finding_id>` |
| `backtest.iteration_started`, `backtest.iteration_completed`, `backtest.objective_evaluated` | `<type>:<iteration_id>` |

Definitions:

- `phase_attempt` is incremented before event append.
- `recovery_attempts` is incremented before event append.
- `prompt_dedup_key` is the envelope dedup key.
- `approval_idempotency_key` is from `approval_requests`.
- Artifact expected/timeout events are per-attempt.
- Artifact validated/invalid events are content-keyed by path + hash.
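
A few of the key formulas above, written as small builders to show the difference between per-attempt and content-keyed events. The function and type names are hypothetical; only the key layouts come from the table.

```ts
type RunLifecycleEvent = "run.created" | "run.started" | "run.completed" | "run.failed" | "run.aborted";
type PhaseEvent = "phase.started" | "phase.completed" | "phase.failed" | "phase.skipped";
type ArtifactAttemptEvent = "artifact.expected" | "artifact.timeout";
type ArtifactContentEvent = "artifact.validated" | "artifact.invalid";

// `<type>:<run_id>`: at most one of each lifecycle event per run.
function runLifecycleKey(type: RunLifecycleEvent, runId: string): string {
  return `${type}:${runId}`;
}

// `<type>:<phase_id>:<phase_attempt>`: one per phase attempt.
function phaseEventKey(type: PhaseEvent, phaseId: string, phaseAttempt: number): string {
  return `${type}:${phaseId}:${phaseAttempt}`;
}

// Per-attempt: a retry of the same phase produces a new key.
function artifactAttemptKey(
  type: ArtifactAttemptEvent,
  phaseId: string,
  phaseAttempt: number,
  expectedPath: string,
): string {
  return `${type}:${phaseId}:${phaseAttempt}:${expectedPath}`;
}

// Content-keyed: re-validating identical bytes yields the same key, so the
// unique `(run_id, idempotency_key)` constraint deduplicates the event.
function artifactContentKey(
  type: ArtifactContentEvent,
  phaseId: string,
  expectedPath: string,
  artifactHash: string,
): string {
  return `${type}:${phaseId}:${expectedPath}:${artifactHash}`;
}
```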

## 12. Fake Session Adapter

### 12.1 Behavior

- Deterministic.
- In-process.
- No PTY.
- No tmux.
- Drives engine end-to-end without real backends.

### 12.2 Sentinel Triggers

On `sendPrompt`, inspect `expectedSchema`.

Fixture path:

```text
tests/fixtures/fake-artifacts/<expectedSchema>/<scenarioName>.json
```

`scenarioName` comes from instruction header:

```text
Scenario: <name>
```

Default scenario: `ok`.

Scenarios:

- `ok`: write fixture to `expectedArtifactPath` after 50ms by default.
- `invalid`: write deliberately schema-invalid payload.
- `timeout`: never write.
- `crash`: throw `RecoverableError`.
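
Scenario selection in the fake adapter could be sketched as below. Where exactly the `Scenario:` header sits inside the instructions is not specified, so this version accepts it on any line; rejecting unknown names is also an assumption. `parseScenario` and `fixturePath` are hypothetical names.

```ts
const SCENARIOS = ["ok", "invalid", "timeout", "crash"] as const;
type Scenario = (typeof SCENARIOS)[number];

// Reads `Scenario: <name>` from the instructions; defaults to `ok` when absent.
function parseScenario(instructions: string): Scenario {
  const name = /^Scenario:\s*(\S+)\s*$/m.exec(instructions)?.[1];
  if (name === undefined) return "ok";
  const known = SCENARIOS.find((scenario) => scenario === name);
  // Assumption: an unrecognized scenario is a test-authoring error.
  if (known === undefined) throw new Error(`unknown scenario: ${name}`);
  return known;
}

// Builds the fixture path from the layout above.
function fixturePath(expectedSchema: string, scenario: Scenario): string {
  return `tests/fixtures/fake-artifacts/${expectedSchema}/${scenario}.json`;
}
```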
|
|
|
|
### 12.3 Transcript

Fake adapter emits chunks such as:

```text
[fake] received prompt <uuid>; will write <path> in 50ms
```

## 13. State Machines

### 13.1 Run State

States:

- `created`
- `bound`
- `planning`
- `awaiting_approval`
- `executing`
- `paused`
- `completed`
- `failed`
- `aborted`

Transitions:

| From | Trigger | To | Side effects |
|---|---|---|---|
| `created` | `lockBindings ok` | `bound` | persist bindings; emit `run.started` |
| `created` | `lockBindings fail` | `failed` | emit `run.failed` |
| `bound` | phase plan needed | `planning` | emit `phase.started` |
| `planning` | plan artifact valid | `awaiting_approval` | request approval |
| `awaiting_approval` | approve | `executing` | emit `approval.resolved`, `run.resumed` |
| `awaiting_approval` | reject | `failed` | emit `run.failed` |
| `awaiting_approval` | request_changes | `planning` | increment phase attempts |
| `awaiting_approval` | timeout | `paused` | set `paused_from_state='awaiting_approval'` |
| `executing` | phase ok, more phases | `executing` | next phase |
| `executing` | phase needs gate | `awaiting_approval` | request gate |
| `executing` | all phases done | `completed` | emit `run.completed`, write final report |
| `executing` | unrecoverable error | `failed` | emit `run.failed` |
| `executing` | manual `pauseRun` | `paused` | set `paused_from_state='executing'` |
| `planning` | manual `pauseRun` | `paused` | set `paused_from_state='planning'` |
| `paused` | resume | state stored in `paused_from_state` | emit `run.resumed`, clear `paused_from_state` |
| any non-terminal state | `abortRun` | `aborted` | emit `run.aborted`, dispose sessions |

Non-terminal states for `abortRun`:

- `created`
- `bound`
- `planning`
- `awaiting_approval`
- `executing`
- `paused`

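
The pause, resume, and abort rows of the table above can be read as three small pure functions. This is a minimal sketch: `RunRow`, `pause`, `resume`, and `abort` are hypothetical names, event emission and session disposal are omitted, and clearing `pausedFromState` on abort is an assumption.

```typescript
type RunState =
  | 'created' | 'bound' | 'planning' | 'awaiting_approval'
  | 'executing' | 'paused' | 'completed' | 'failed' | 'aborted';

// States the table allows a run to be paused from.
type PausableState = 'planning' | 'awaiting_approval' | 'executing';

interface RunRow {
  state: RunState;
  pausedFromState: PausableState | null;
}

function isTerminal(state: RunState): boolean {
  return state === 'completed' || state === 'failed' || state === 'aborted';
}

// Covers manual pauseRun (planning, executing) and approval timeout (awaiting_approval).
function pause(run: RunRow): RunRow {
  const from = run.state;
  if (from !== 'planning' && from !== 'awaiting_approval' && from !== 'executing') {
    throw new Error(`cannot pause from ${from}`);
  }
  return { state: 'paused', pausedFromState: from };
}

// Returns the run to the remembered state and clears the marker.
function resume(run: RunRow): RunRow {
  const target = run.pausedFromState;
  if (run.state !== 'paused' || target === null) {
    throw new Error(`cannot resume from ${run.state}`);
  }
  return { state: target, pausedFromState: null };
}

// Allowed from every non-terminal state, including paused.
function abort(run: RunRow): RunRow {
  if (isTerminal(run.state)) {
    throw new Error(`cannot abort from terminal state ${run.state}`);
  }
  return { state: 'aborted', pausedFromState: null };
}
```

Storing the origin state on the row is what lets `resume` stay a pure lookup instead of guessing where the run was.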
### 13.2 Run Phase State

States:

- `pending`
- `running`
- `awaiting_artifact`
- `validating`
- `awaiting_approval`
- `completed`
- `failed`
- `skipped`

Transitions:

| From | Trigger | To |
|---|---|---|
| `pending` | start | `running` |
| `running` | prompt sent, artifact expected | `awaiting_artifact` |
| `awaiting_artifact` | artifact appears | `validating` |
| `awaiting_artifact` | timeout | `running` after probe/repair, or `failed` after exhaustion |
| `validating` | valid | `awaiting_approval` if gate, else `completed` |
| `validating` | invalid | `running` after one repair, else `failed` |
| `awaiting_approval` | approve | `completed` |
| `awaiting_approval` | reject / abort | `failed` |
| `awaiting_approval` | request_changes | `running`, attempt + 1 |

## 14. Approval State

States:

- `pending`
- `approved`
- `rejected`
- `changes_requested`
- `aborted`
- `paused`

### 14.1 Transitions

| From | Event | To | Side effects |
|---|---|---|---|
| `pending` | approve decision | `approved` | insert decision row |
| `pending` | reject decision | `rejected` | insert decision row; run -> `failed` |
| `pending` | request_changes decision | `changes_requested` | insert decision row; increment attempt |
| `pending` | abort decision | `aborted` | insert decision row; run -> `aborted` |
| `pending` | timeout | `paused` | run -> `paused`; no decision row |
| `paused` | unpause | `pending` | re-arm gate; no decision row |
| terminal states | any decision | unchanged | return 409 |

Rules:

- A `pending` request can transition to one non-pending state per pending epoch.
- Terminal approval states reject further decisions.
- `paused` may return to `pending` only through `unpause`.
- Manual pause is run-level `pauseRun`; it leaves approval gate in `pending`.
- Only `approve`, `reject`, `request_changes`, and `abort` create `approval_decisions` rows.
- Default timeout is null.
- Timeout never auto-approves or auto-rejects.

### 14.2 Decision Idempotency

- GUI:
  - UUIDv4 per click.
  - reused across automatic UI retries for the same logical action.
- CLI:
  - UUIDv4 per invocation.
  - `--client-token=<uuid>` override for scripted retry.
- API:
  - existing `(approval_request_id, action, client_token)` returns existing row with status 200.
  - new decision inserts row and returns 201.
  - same token with different action returns 409.
  - decision on non-pending request returns 409.

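
The API rules above imply an order of checks: a replayed token has to be recognized before the pending check, otherwise a legitimate retry of an already-applied decision would get 409 instead of 200. The sketch below illustrates that ordering with an in-memory store. `DecisionStore` and its fields are hypothetical, the real store is the `approval_decisions` table, and the pause/unpause epoch is not modeled.

```typescript
type ApprovalDecisionAction = 'approve' | 'reject' | 'request_changes' | 'abort';

interface DecisionRow {
  approvalRequestId: string;
  action: ApprovalDecisionAction;
  clientToken: string;
}

interface DecisionResult {
  status: 200 | 201 | 409;
  row?: DecisionRow;
}

class DecisionStore {
  private rows: DecisionRow[] = [];
  // Requests that already left `pending` (simplified: one decision resolves a request).
  private resolved = new Set<string>();

  decide(
    approvalRequestId: string,
    action: ApprovalDecisionAction,
    clientToken: string
  ): DecisionResult {
    // 1. Replay check first: same request and token.
    const prior = this.rows.find(
      (r) => r.approvalRequestId === approvalRequestId && r.clientToken === clientToken
    );
    if (prior) {
      // Same action is an idempotent replay; a different action is a conflict.
      return prior.action === action ? { status: 200, row: prior } : { status: 409 };
    }
    // 2. A new token on a request that is no longer pending is rejected.
    if (this.resolved.has(approvalRequestId)) return { status: 409 };
    // 3. First decision for a pending request.
    const row: DecisionRow = { approvalRequestId, action, clientToken };
    this.rows.push(row);
    this.resolved.add(approvalRequestId);
    return { status: 201, row };
  }
}
```

In a database-backed version the same effect would likely come from a unique constraint plus a conditional update inside one transaction, rather than from in-process lookups.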
### 14.3 Destructive Command Enforcement

Devflow-direct commands have hard enforcement. TUI-agent commands have best-effort enforcement.

Hard-blocked Devflow-direct patterns:

- `rm -rf`
- `git reset --hard`
- `git clean`
- `git push --force`
- `git push --force-with-lease`
- `git worktree remove --force`
- `git branch -D`
- `docker volume rm`
- `docker compose down -v`
- `DROP DATABASE`
- `DROP SCHEMA`
- migration rollback
- reads/writes touching `.env*`, `~/.ssh/`, `~/.aws/`, `~/.config/gcloud/`, `~/.kube/`
- files matching `*token*`, `*secret*`, `*credentials*`, `*.pem`, `*.key`

TUI-agent command enforcement is best-effort:

1. Prelude prohibits destructive operations.
2. Backend permission mode is set to safest available mode.
3. Transcript audit captures post-hoc evidence.
4. Human intervention goes through `devflow attach`.
5. Worktrees and branches are preserved by default.

v1 does not claim real-time blocking of TUI-internal commands.

## 15. Run Engine and Temporal Contract

The M4 `RunEngine` contract is frozen before M5. M5 reimplements the same interface through Temporal.

### 15.1 Public API

```ts
interface RunEngine {
  startRun(input: RunStartInput): Promise<{ runId: string }>;
  signalApproval(
    runId: string,
    approvalRequestId: string,
    action: ApprovalDecisionAction,
    clientToken: string,
    comment?: string
  ): Promise<void>;
  pauseRun(runId: string): Promise<void>;
  resumeRun(runId: string): Promise<void>;
  abortRun(runId: string, reason: string): Promise<void>;
  getStatus(runId: string): Promise<RunStatus>;
}
```

### 15.2 Temporal Shape

- Namespace: `devflow`.
- Task queue: `devflow-runs`.
- Single worker process: `apps/worker`.
- Workflow: `runWorkflow(input: RunStartInput)`.
- Signals:
  - `approve`
  - `pause`
  - `resume`
  - `abort`
  - `unpause`
- No Updates in M5.
- Status is read from DB.

Activities:

- `lockBindings(input)`
- `generatePhasePlan(runId, phaseKey, attempt)`
- `sendPromptToSession(sessionId, envelope)`
- `waitForArtifact(sessionId, expectedPath, expectedSchema, timeoutMs)`
- `validateArtifact(artifactPath, expectedSchema)`
- `recordEvent(runId, type, payload)`
- `requestApproval(runId, gateKey, phaseId, payload, idempotencyKey)`
- `runCommand(kind, argv, cwd, env)`
- `composeFinalReport(runId)`

Retry policy:

- Default: max attempts 3, exponential backoff start 1s, max 30s.
- `requestApproval`: max attempts 1.
- `composeFinalReport`: max attempts 1.
- `sendPromptToSession`: max attempts 2; further retry belongs to engine recovery.

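
The retry policy above can be kept as data and looked up per activity. The sketch below uses a local `RetryPolicyShape` interface whose field names follow the Temporal TypeScript SDK's retry policy (`maximumAttempts`, `initialInterval`, `maximumInterval`); the interface, the `retryFor` helper, and the assumption that `sendPromptToSession` keeps the default backoff intervals are illustrative, not taken from the plan.

```typescript
// Local stand-in for the SDK's retry policy type, so this sketch has no dependencies.
interface RetryPolicyShape {
  maximumAttempts: number;
  initialInterval?: string;
  maximumInterval?: string;
}

// Default: max attempts 3, exponential backoff starting at 1s, capped at 30s.
const DEFAULT_RETRY: RetryPolicyShape = {
  maximumAttempts: 3,
  initialInterval: '1s',
  maximumInterval: '30s',
};

// Per-activity overrides from the list above.
const RETRY_OVERRIDES = new Map<string, RetryPolicyShape>([
  ['requestApproval', { maximumAttempts: 1 }],
  ['composeFinalReport', { maximumAttempts: 1 }],
  // Assumption: only the attempt count changes; backoff stays at the default.
  ['sendPromptToSession', { ...DEFAULT_RETRY, maximumAttempts: 2 }],
]);

// Returns the retry policy for an activity name, falling back to the default.
function retryFor(activity: string): RetryPolicyShape {
  return RETRY_OVERRIDES.get(activity) ?? DEFAULT_RETRY;
}
```

Each returned object would then be passed as the `retry` option when the worker proxies the corresponding activity.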
### 15.3 Hard Constraints

- Workflow code holds only serializable state.
- No tmux handles in workflow state.
- No PTY refs in workflow state.
- No DB clients in workflow state.
- M5+ session interaction happens through activities calling SessionManager in `apps/worker`.
- M5+ API never calls mutating `SessionAdapter` methods.
- SessionManager advisory lock prevents API/worker ownership conflict during M4 -> M5 transition.
- Workflow code uses deterministic clock/randomness only.

## 16. WriteSet and Worktree

### 16.1 WriteSet

- Each task declares `writeSet: string[]`.
- Patterns are relative to repo root.
- Glob engine: `fast-glob`.
- Options:

```ts
{
  cwd: worktreeRoot,
  dot: true,
  followSymbolicLinks: false,
  onlyFiles: true,
  suppressErrors: false
}
```

Conflict detection:

1. Expand writeSets.
2. Forbidden globs cause conflict if matched by more than one task:
   - `pnpm-lock.yaml`
   - `package-lock.json`
   - `**/migrations/**`
   - `**/*.generated.*`
   - root `tsconfig*.json`
   - `biome.json`
   - `lefthook.yml`
   - `.github/**`
   - `.gitlab-ci.yml`
3. Pairwise file intersections must be empty.

Conflict creates `parallel_dag_approved` gate.

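
Steps 2 and 3 of the conflict detection above differ in what they compare: step 2 is per glob (two tasks touching different migration files still conflict), step 3 is per file. A minimal sketch of both checks follows. It assumes the writeSets are already expanded into file lists (step 1), and it takes each forbidden rule as a hand-written predicate instead of calling `fast-glob`; `ForbiddenRule`, `Conflict`, and `detectConflicts` are hypothetical names.

```typescript
interface ForbiddenRule {
  glob: string;                       // label only, for reporting
  matches: (file: string) => boolean; // stand-in for real glob matching
}

type Conflict =
  | { kind: 'forbidden'; glob: string; tasks: string[] }
  | { kind: 'overlap'; tasks: [string, string]; files: string[] };

function detectConflicts(
  expanded: Map<string, string[]>, // taskId -> expanded writeSet
  rules: ForbiddenRule[]
): Conflict[] {
  const conflicts: Conflict[] = [];
  const entries = Array.from(expanded.entries());

  // Step 2: a forbidden glob matched by more than one task is a conflict.
  for (const rule of rules) {
    const tasks = entries
      .filter(([, files]) => files.some((f) => rule.matches(f)))
      .map(([taskId]) => taskId);
    if (tasks.length > 1) conflicts.push({ kind: 'forbidden', glob: rule.glob, tasks });
  }

  // Step 3: every pair of tasks must have an empty file intersection.
  entries.forEach(([a, aFiles], i) => {
    entries.slice(i + 1).forEach(([b, bFiles]) => {
      const bSet = new Set(bFiles);
      const files = aFiles.filter((f) => bSet.has(f));
      if (files.length > 0) conflicts.push({ kind: 'overlap', tasks: [a, b], files });
    });
  });

  return conflicts;
}
```

A non-empty result is what would trigger the `parallel_dag_approved` gate.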
### 16.2 Worktree Lifecycle

- Worktree root:
  - `WORKSPACE_ROOT/<runId>/<laneId>`
  - non-parallel main lane: `WORKSPACE_ROOT/<runId>/main`
- Created via `git worktree add`.
- Branch name:

```text
devflow/<runId>/<laneId>
```

- Terminal run state does not remove worktrees or branches.
- Output branches are deliverables.
- Disk growth is accepted.
- Cleanup is manual:

```bash
devflow cleanup <run-id> [--lane=<id>]
```

Cleanup:

- uses `git worktree remove` without `--force` by default.
- refuses dirty worktrees.
- `--force` requires an additional gate.
- `git branch -D` is destructive and gated.
- `doctor --list-orphans` lists only; it never removes.

## 17. SSE Contract

Endpoints:

- `GET /sse/runs/:runId`
- `GET /sse/global`

Heartbeat every 15 seconds.

Events:

| Event | Scope |
|---|---|
| `run.state_changed` | both |
| `run.event_appended` | run |
| `phase.state_changed` | run |
| `approval.created` | both |
| `approval.resolved` | both |
| `session.state_changed` | run |
| `transcript.chunk_appended` | run |
| `artifact.validated` | run |

Reconnect:

- `Last-Event-ID` is last `run_events.seq`.
- server replays `seq > lastSeq`.
- non-run-event SSE types are not replayed; state is re-derived by fetch.

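
The reconnect rules above amount to a filter on `run_events.seq` plus standard SSE framing with the `id:` field set to the sequence number. The following sketch shows that shape. `RunEvent`, `parseLastEventId`, `eventsToReplay`, and `toSseFrame` are hypothetical names, and treating a missing or malformed header as "replay from the start" is an assumption.

```typescript
interface RunEvent {
  seq: number;      // run_events.seq, assumed to be a positive integer
  type: string;
  payload: unknown;
}

// Parses the Last-Event-ID header; falls back to 0 (assumption) when absent or invalid.
function parseLastEventId(header: string | undefined): number {
  if (header === undefined) return 0;
  const n = Number(header);
  return Number.isInteger(n) && n >= 0 ? n : 0;
}

// Server side of reconnect: replay only events with seq > lastSeq.
function eventsToReplay(events: RunEvent[], header: string | undefined): RunEvent[] {
  const lastSeq = parseLastEventId(header);
  return events.filter((e) => e.seq > lastSeq);
}

// Formats one run event as an SSE frame; the id becomes the client's next Last-Event-ID.
function toSseFrame(e: RunEvent): string {
  const data = JSON.stringify({ type: e.type, payload: e.payload });
  return `id: ${e.seq}\nevent: run.event_appended\ndata: ${data}\n\n`;
}
```

Only `run.event_appended` frames carry an `id:` in this sketch, which matches the rule that other SSE types are not replayed and are re-derived by fetch.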
## 18. Errors

`packages/core/src/errors.ts`:

```ts
type ErrorClass = 'recoverable' | 'human_required' | 'fatal';

class DevflowError extends Error {
  readonly class: ErrorClass;
  readonly code: string;
  readonly runId?: string;
  readonly phaseId?: string;
  readonly recoveryHint?: string;
  readonly cause?: unknown;
}
```

Recoverable:

- `network_blip`
- `pane_briefly_unresponsive`
- `prompt_send_transient`
- `db_serialization_retry`

Human required:

- `artifact_invalid_after_repair`
- `artifact_timeout_exhausted`
- `destructive_command_blocked`
- `secret_access_blocked`
- `backend_unavailable`
- `no_eligible_persona`
- `writeset_conflict`
- `merge_conflict`
- `objective_not_met`
- `review_dispute_unresolved`

Fatal:

- `db_unreachable`
- `workspace_permissions`
- `internal_state_corruption`
- `template_load_failed`
- `migration_pending`
- `config_invalid`

Mapping:

- recoverable -> retry; exhausted -> human_required.
- human_required -> run paused and gate created.
- fatal -> run failed, sessions disposed, final report best-effort.

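
The mapping above reduces to a small decision function over the error class and the attempt count. This is a sketch only: `ErrorOutcome` and `nextStep` are hypothetical names, and `attempt` is assumed to count attempts already made, starting at 1.

```typescript
type ErrorClass = 'recoverable' | 'human_required' | 'fatal';

type ErrorOutcome =
  | { kind: 'retry' }
  | { kind: 'pause_and_gate' } // run paused, approval gate created
  | { kind: 'fail_run' };      // run failed, sessions disposed, best-effort final report

// Decides what the engine does after an error of the given class.
function nextStep(cls: ErrorClass, attempt: number, maxAttempts: number): ErrorOutcome {
  switch (cls) {
    case 'recoverable':
      // Exhausted retries escalate to the human_required path.
      return attempt < maxAttempts ? { kind: 'retry' } : { kind: 'pause_and_gate' };
    case 'human_required':
      return { kind: 'pause_and_gate' };
    case 'fatal':
      return { kind: 'fail_run' };
  }
}
```

For example, a `network_blip` on the second of three attempts retries, while the same error on the third attempt pauses the run and opens a gate.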
## 19. Concurrent Runs and Crash Recovery

### 19.1 Active Run Uniqueness

- `MAX_CONCURRENT_RUNS`, default 4.
- DB partial unique index is the source of truth:
  - one active run per `(repo_path, base_branch)`.
  - `repo_path` is canonicalized before insert.
- Advisory lock is auxiliary only:

```text
pg_try_advisory_xact_lock(hash64('devflow:start-run', repoPath, baseBranch))
```

- Unique-index violation returns HTTP 409 with this body:

```json
{ "currentRunId": "...", "currentState": "..." }
```

### 19.2 Crash Recovery

M4, no Temporal:

- On `apps/api` startup, sweep non-terminal runs.
- Mark them `failed`.
- `final_report_path = null`.
- Append synthesized `run.failed` with reason `process_restart_unrecovered`.
- Cascade associated `tui_sessions` to `FAILED_NEEDS_HUMAN`.
- Append `session.failed`.
- This frees active-run uniqueness slots.

M5+:

- No sweep.
- Temporal durability owns in-flight workflow recovery.
- SessionManager resumes tmux sessions.
- Active-run partial index blocks duplicate runs until completion or explicit abort.

## 20. Milestones

### M1: Monorepo + Postgres + CLI Doctor

- Scaffold workspace.
- Add pnpm, tsconfig, biome, lefthook, Vitest.
- Add Docker Compose for Postgres.
- Add Drizzle and first migration.
- Add `devflow doctor`.
- Implement checks 1-9.
- Stub checks 10-12 as warn where needed.
- Add SSE compatibility smoke test:
  - minimal Fastify 5 server.
  - `fastify-sse-v2` plugin.
  - 30-second integration test.
  - receive 3 events and reconnect.
  - if plugin fails, implement native `reply.raw` SSE helper before M1 is green.

### M2: Core Schema + Registry + Binding

- Implement enums.
- Implement canonical hashing.
- Implement Template schema.
- Implement Persona schema.
- Implement seed loader.
- Implement binding algorithm.
- Implement artifact schema registry.
- Add first schemas:
  - `dev/spec@1`
  - `dev/phase-plan@1`
  - `common/final-report@1`
- Tests:
  - schema validation.
  - override semantics.
  - risk enforcement.
  - diversity enforcement.
  - deterministic auto-select.

### M3: Fake Session Runtime

- Implement `SessionAdapter`.
- Implement `FakeSessionAdapter`.
- Implement prompt envelope.
- Implement event recorder.
- Implement fake sentinel scenarios.
- Persist transcript chunks.
- Tests:
  - prompt correlation.
  - artifact validation.
  - invalid artifact.
  - timeout.
  - fake crash.

### M4: Minimal Run Engine

- Implement `packages/run-engine`.
- Used directly by `apps/api`.
- No Temporal.
- Supports:
  - start run.
  - lock bindings.
  - approval.
  - fake prompt.
  - artifact wait/validate.
  - final report.
- Freeze the `RunEngine` contract.
- Full fake `development@1` minus reviewers.

### M5: Temporal Integration

- Reimplement `RunEngine` through Temporal.
- Preserve M4 behavior.
- Add parity tests using the same M4 scenarios.
- M5+ SessionManager lives in `apps/worker`.

### M6: Real tmux SessionManager

- Implement `TmuxSessionAdapter`.
- Decoupled from M5.
- May begin after M3 is stable.
- Pre-M5 real tmux is opt-in smoke only.
- Production run path remains fake until both M5 and M6 are green.

### M7: TUI Recovery State Machine

- Implement session state transitions.
- Implement recovery counters.
- Implement escalation to human gates.

### M8: API + GUI Minimum

- Implement Fastify routes.
- Implement SSE.
- Implement GUI screens:
  - Dashboard.
  - Templates.
  - Personas.
  - New Run.
  - Run Detail.
  - Approvals.
  - TUI Sessions.

### M9: `development@1` Fake-Agent Full Run

- Add curated `development@1`.
- Add review consensus.
- Add verifier flow with fake reviewers.
- Add coverage gate >=70% lines for core/session/run-engine.

### M10: Codex/Claude Opt-In Real Run

- Implement profiles:
  - `packages/session/src/profiles/codex.ts`
  - `packages/session/src/profiles/claude.ts`
- Real backends become production-default only after both M5 and M6 are green.
- Until then real tmux/Codex/Claude are developer-flagged opt-in smoke only.

### M11: Parallel Lanes

- Add task DAG scheduler.
- Add writeSet detection.
- Add per-lane worktrees.
- Add merge coordinator.
- Add conflict gates.

### M12: Backtest Workflow

- Add `backtest-strategy@1`.
- Add objective evaluator.
- Add metric parser extension points.
- Add failure mining artifacts.
- Add Backtest Lab GUI.

### M13: Template Factory

- Generate draft template from natural language and repo discovery.
- Add harness design.
- Add template review.
- Add dry-run and promote flow.

## 21. Out of Scope

- Authentication.
- Authorization.
- Multi-user support.
- Data retention or archival policy.
- Observability dashboards.
- Remote template/persona registries.
- Multi-machine deployment.
- HA.
- Managed backups.
- Web ingress.
- TLS.
- Reverse proxy.

## 22. Decision Log

### Open Questions Closed

| # | Question | Resolution |
|---|---|---|
| OQ-1 | Persona/template seeding format | Immutable YAML at `docs/schemas/{personas,templates}/<name>@<version>.yaml` |
| OQ-2 | Approval timeout default | `null`; timeout freezes only |
| OQ-3 | Final report format | Markdown and JSON |
| OQ-4 | Temporal namespace/queue | namespace `devflow`, task queue `devflow-runs` |
| OQ-5 | WriteSet glob engine | `fast-glob` |
| OQ-6 | Backtest objective DSL | Stub in M12, full DSL deferred |
| OQ-7 | Codex/Claude prompt prelude | Structure locked, exact text deferred to M10 |

### Blocking Corrections Applied

| # | Issue | Resolution |
|---|---|---|
| CC-1 | Terminal state deleted worktrees/branches | Preserve by default; manual gated cleanup only |
| CC-2 | SessionManager location conflict | M4 API, M5+ worker |
| CC-3 | Event duplicates under retry | `run_events.idempotency_key` |
| CC-4 | Destructive command enforcement overclaimed | Devflow-direct hard, TUI best-effort |
| CC-5 | UUID extension missing | `CREATE EXTENSION IF NOT EXISTS pgcrypto` |
| CC-6 | Advisory lock not enough for active-run uniqueness | partial unique index |
| CC-7 | Undefined transition sequence in event keys | cause-based keys |
| CC-8 | Approval paused transition missing | explicit approval transition table |
| CC-9 | AutoSelect order nondeterministic | deterministic sort |
| CC-10 | SSE plugin compatibility assumed | M1 smoke + native fallback |
| CC-11 | ApprovalAction included pause | split `ApprovalDecisionAction`; `pauseRun` is run-level |
| CC-12 | Artifact hash key collision | include phase id and path |
| CC-13 | Resume previous state not stored | `runs.paused_from_state` |
| CC-14 | repo path aliasing | canonical realpath storage |
| CC-15 | M4 sweep left tmux sessions ambiguous | cascade session state to `FAILED_NEEDS_HUMAN`; real tmux production-default only after M5+M6 |
| CC-16 | Prompt hash used phaseId but envelope uses phaseKey | prompt hash uses phaseKey |
| CC-17 | abortRun transition too narrow | abort from any non-terminal run state |
| CC-18 | approval pending transition wording conflicted with pause epoch | pending can transition once per pending epoch; paused may unpause to pending |

### Future Open Questions

- FOQ-1, M12: full backtest objective DSL.
- FOQ-2, M13: template factory generation prompts.
- FOQ-3, post-M10: optional third backend such as Gemini.
- FOQ-4, post-M8: WebSocket vs SSE if transcript pressure requires it.

## 23. Kickoff Order

1. M1.1: repo + pnpm + tsconfig + biome + lefthook + vitest workspace.
2. M1.2: docker-compose + Postgres healthcheck + drizzle-kit + first migration.
3. M1.3: `apps/cli` skeleton + `devflow doctor`.
4. M1.4: `packages/core` skeleton with config, enums, errors, hash, prompt-envelope, run-event types.
5. M2.1: Zod schemas for Template/Persona, persona YAML loader, hashing.
6. M2.2: Binding algorithm + tests.
7. M2.3: Artifact schema registry + first three schemas.
8. M3.1: `SessionAdapter` interface + `FakeSessionAdapter`.
9. M3.2: Transcript chunk capture + DB persistence.
10. M3.3: engine-shaped harness running a single fake phase end-to-end.
11. M4: assemble run engine; lock contract; full fake `development@1` minus reviewers.
12. M5 in parallel with M6 once M4 is green.