Devflow Implementation Plan v3 r9
0. Document Status
- This document supersedes v2 and all earlier v3 drafts where conflicting.
- Single-user, single-machine assumption. No auth, no retention policy, no observability dashboards, no multi-tenancy.
- Target OS: macOS 13+ / Linux. No Windows.
- All paths are Unix-style. All times are stored UTC.
- Decisions in this document are locked unless explicitly marked (provisional). Override requires updating this document, not only code.
- r1 applied CC-1 through CC-5.
- r2 applied CC-6 through CC-10.
- r3 applied CC-11 through CC-15.
- r4 applies CC-16 through CC-18.
- r5 applies CC-19.
- r6 applies CC-20.
- r7 applies CC-21 through CC-23.
- r8 applies CC-24 through CC-26.
- r9 applies CC-27 through CC-28.
1. Stack Decisions
1.1 Workspace
- pnpm 9 with workspaces. No Turbo.
- Node 22 LTS, pinned by .nvmrc and package.json#engines.
- TypeScript 5.6 with project references via tsc -b.
- strict: true.
- No any unless accompanied by an explicit annotation comment explaining why.
1.2 Tooling
- Build:
  - tsup for libraries, CJS + ESM dual output.
  - vite for apps/web.
  - tsx for apps/cli, apps/api, and apps/worker in dev.
  - node for prod-ish local runs.
- Test:
  - vitest with workspace config.
  - Coverage via @vitest/coverage-v8.
  - No coverage gate at M1.
  - M9 adds coverage gate: >=70% lines on packages/core, packages/session, packages/run-engine.
- Lint/format:
  - biome.
  - One root config.
- Pre-commit:
  - lefthook.
  - Runs biome check --write on staged files.
  - Runs tsc -p tsconfig.typecheck.json --noEmit.
  - Runs related Vitest tests on changed packages.
1.3 Database
- Postgres 16 via Docker Compose.
- Drizzle ORM + drizzle-kit generate.
- Generated SQL migrations are committed.
- Migrations are never auto-applied at runtime except through the explicit migration runner invoked by devflow up.
- Migration runner: scripts/migrate.ts.
  - Takes DATABASE_URL.
  - devflow up waits for Postgres health and then runs pending migrations.
1.4 Logging
- pino.
- pino-pretty in dev, JSON otherwise.
- Standard fields:
  - time
  - level
  - module
  - runId?
  - phaseId?
  - role?
  - eventId?
- Levels:
  - trace: transcript chunks only.
  - debug: internal state transitions.
  - info: run events.
  - warn: recoverable errors.
  - error: human-required or fatal errors.
1.5 Config
- Single Zod schema in packages/core/src/config.ts.
- Source precedence, high to low:
  - process.env
  - .env.local
  - .env
  - schema defaults
- Config is loaded once at process start, validated, frozen, and exported as typed Config.
- Config validation failure is fatal.
- Required keys at M1:
  - DATABASE_URL
  - WORKSPACE_ROOT
  - LOG_LEVEL
- M5 adds:
  - TEMPORAL_ADDRESS
- Path canonicalization:
  - WORKSPACE_ROOT is resolved through fs.realpathSync and stored as an absolute path at config load.
  - Any path entering the system must be canonicalized before storage or hashing.
  - repo_path and worktree_root rules are defined in section 4.
Backend registration:
const BackendConfig = z.object({
id: Backend, // codex | claude | fake
enabled: z.boolean(),
binaryPath: z.string().optional(), // resolved from PATH if absent; required for codex/claude
});
- fake is always available.
- codex and claude are available only when:
  - enabled=true
  - binary resolves at process start.
- Resolution failure:
  - doctor warns.
  - binding fails fast at run start with human_required:backend_unavailable.
- Binding reads from config.backends, never directly from PATH.
1.6 HTTP
- fastify 5.
- @fastify/sensible.
- SSE primary strategy:
  - Try fastify-sse-v2.
  - Fastify 5 compatibility is not assumed.
  - M1 includes a smoke test.
- SSE fallback:
  - Native reply.raw.
  - Headers:
    - content-type: text/event-stream
    - cache-control: no-cache
    - connection: keep-alive
  - Write data: <json>\n\n.
  - Manage heartbeats and reconnect manually.
- WebSocket is deferred unless SSE fails under transcript volume.
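A minimal sketch of the fallback's frame format, written as pure functions so it can be unit-tested without a server. The names `SSE_HEADERS`, `formatSseData`, and `formatSseHeartbeat` are hypothetical, and sending heartbeats as SSE comment lines is an assumption; the plan only says heartbeats are managed manually.

```typescript
// Hypothetical helpers; none of these names come from the plan.
export const SSE_HEADERS = {
  "content-type": "text/event-stream",
  "cache-control": "no-cache",
  connection: "keep-alive",
} as const;

// JSON.stringify escapes newlines inside strings, so the payload
// always fits on a single "data:" line followed by a blank line.
export function formatSseData(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Assumption: a heartbeat is an SSE comment line (starts with ":"),
// which EventSource clients ignore.
export function formatSseHeartbeat(): string {
  return ": heartbeat\n\n";
}
```

With the native fallback, each frame would be passed to reply.raw.write() after the headers above are sent.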
2. Directory Layout
devflow/
├── package.json
├── pnpm-workspace.yaml
├── tsconfig.base.json
├── biome.json
├── lefthook.yml
├── vitest.workspace.ts
├── docker-compose.yml
├── .nvmrc
├── .env.example
├── docs/
│   ├── plan.md
│   ├── adr/
│   └── schemas/
│       ├── artifacts/
│       ├── personas/
│       └── templates/
├── scripts/
│   ├── migrate.ts
│   └── seed.ts
├── packages/
│   ├── core/
│   │   └── src/
│   │       ├── config.ts
│   │       ├── enums.ts
│   │       ├── hash.ts
│   │       ├── errors.ts
│   │       ├── template.ts
│   │       ├── persona.ts
│   │       ├── binding.ts
│   │       ├── prompt-envelope.ts
│   │       ├── artifact-schema.ts
│   │       ├── run-event.ts
│   │       └── index.ts
│   ├── db/
│   │   └── src/
│   │       ├── schema/
│   │       ├── migrations/
│   │       ├── repositories/
│   │       └── client.ts
│   ├── session/
│   │   └── src/
│   │       ├── adapter.ts
│   │       ├── fake.ts
│   │       ├── tmux.ts
│   │       ├── profiles/
│   │       │   ├── codex.ts
│   │       │   └── claude.ts
│   │       ├── recovery.ts
│   │       └── transcript.ts
│   ├── harness/
│   │   └── src/
│   │       ├── git.ts
│   │       ├── worktree.ts
│   │       ├── runner.ts
│   │       ├── review.ts
│   │       └── backtest.ts
│   ├── run-engine/
│   │   └── src/
│   │       ├── engine.ts
│   │       ├── phase-executor.ts
│   │       └── approval.ts
│   └── workflows/
│       └── src/
│           ├── workflow.ts
│           └── activities.ts
├── apps/
│   ├── api/
│   ├── web/
│   ├── cli/
│   └── worker/
└── tests/
    ├── e2e/
    └── fixtures/
3. devflow doctor
Exit codes:
- 0: all green.
- 1: one or more red checks.
- 2: internal or unknown error.
Each check emits:
- name
- status: pass | fail | warn
- detail
- remediation
Closed check list:
- Node version satisfies >=22.0.0 <23.
- pnpm version >=9.0.0.
- tmux exists, version >=3.3.
- git version >=2.40.
- Docker daemon reachable.
- Postgres container running, pg_isready ok, DATABASE_URL connects.
- No pending Drizzle migrations.
- WORKSPACE_ROOT exists and is writable.
- .env resolves to valid Config.
- codex in PATH, warn-only.
- claude in PATH, warn-only.
- Free disk on WORKSPACE_ROOT partition:
  - warn under 10GB.
  - fail under 2GB.
  - target green threshold: >=5GB.
Output:
- Human table by default.
- --json for machine-readable output.
- --quiet prints only nonzero results.
- --list-orphans lists orphaned worktrees only; it never removes them.
4. Database Schema
First migration prelude:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
All tables use gen_random_uuid() primary keys unless noted. All times are timestamptz. Mutable rows include updated_at. JSON columns use jsonb.
4.1 workflow_templates
- id uuid primary key default gen_random_uuid()
- name text not null
- version int not null
- hash text not null unique
- definition jsonb not null
- created_at timestamptz not null default now()
- unique (name, version)
4.2 agent_personas
- id uuid primary key default gen_random_uuid()
- name text not null
- version int not null
- hash text not null unique
- definition jsonb not null
- created_at timestamptz not null default now()
- unique (name, version)
4.3 runs
- id uuid primary key default gen_random_uuid()
- template_id uuid not null references workflow_templates(id)
- template_hash text not null
- state text not null
- repo_path text not null
  - canonical absolute path
  - resolved through fs.realpathSync before insert
- base_branch text not null
- worktree_root text not null
  - canonical absolute path under WORKSPACE_ROOT/<runId>/
- current_phase_id uuid references run_phases(id), nullable and deferrable
- started_at timestamptz
- ended_at timestamptz
- final_report_path text
- paused_from_state text
  - set when transitioning to paused
  - cleared on resume
  - null when state is not paused
- created_at timestamptz not null default now()
- updated_at timestamptz
Active-run uniqueness:
CREATE UNIQUE INDEX ux_active_run_repo_base
ON runs (repo_path, base_branch)
WHERE state NOT IN ('completed', 'failed', 'aborted');
4.4 run_inputs
- id uuid primary key default gen_random_uuid()
- run_id uuid not null unique references runs(id) on delete cascade
- requirements_md text not null
- objective jsonb
- extra jsonb
- input_hash text not null
input_hash is based on:
- requirements_md
- objective
- extra
- canonical repo_path
- base_branch
4.5 run_bindings
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- role_id text not null
- persona_id uuid not null references agent_personas(id)
- persona_hash text not null
- backend text not null
- binding_hash text not null
- unique (run_id, role_id)
4.6 run_phases
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- phase_key text not null
- seq int not null
- state text not null
- attempts int not null default 0
- started_at timestamptz
- ended_at timestamptz
- unique (run_id, phase_key)
4.7 run_events
Append-only.
- id bigserial primary key
- run_id uuid not null references runs(id) on delete cascade
- phase_id uuid references run_phases(id)
- seq bigint not null
- type text not null
- payload jsonb not null
- idempotency_key text not null
- ts timestamptz not null default now()
- unique (run_id, seq)
- unique (run_id, idempotency_key)
- index (run_id, ts)
Concurrency:
- All inserts go through RunEventRepository.append().
- Raw SQL inserts into run_events are forbidden.
- append() takes pg_advisory_xact_lock(hash64('devflow:run-events', run_id)).
- Inside that same transaction it assigns:
seq := COALESCE(MAX(seq), 0) + 1
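The plan does not define hash64, so the following is only one possible sketch of the lock-key derivation and the statements append() could issue inside its transaction. The SHA-256 truncation, the NUL separator, and the constant names `LOCK_SQL` and `NEXT_SEQ_SQL` are all assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

// Assumption: hash64 takes the first 8 bytes of SHA-256 over the joined
// parts and reads them as a signed 64-bit integer, because Postgres
// advisory locks accept a bigint key.
export function hash64(...parts: string[]): bigint {
  const digest = createHash("sha256").update(parts.join("\u0000")).digest();
  return digest.readBigInt64BE(0);
}

// Statements a repository might run in one transaction, in this order.
// The transaction-scoped lock is released automatically at commit or rollback.
export const LOCK_SQL = "SELECT pg_advisory_xact_lock($1)";
export const NEXT_SEQ_SQL =
  "SELECT COALESCE(MAX(seq), 0) + 1 AS seq FROM run_events WHERE run_id = $1";
```

Because the lock key is derived from run_id, appends to different runs would not serialize against each other, while appends to the same run see a stable MAX(seq).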
4.8 approval_requests
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id)
- phase_id uuid references run_phases(id)
- gate_key text not null
- state text not null
- idempotency_key text not null
- payload jsonb not null
- created_at timestamptz not null default now()
- resolved_at timestamptz
- unique (idempotency_key)
4.9 approval_decisions
Append-only and immutable.
- id uuid primary key default gen_random_uuid()
- approval_request_id uuid not null references approval_requests(id)
- action text not null
  - approve
  - reject
  - request_changes
  - abort
- comment text
- decided_at timestamptz not null default now()
- idempotency_key text not null unique
pause is not an approval decision.
4.10 tui_sessions
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- role_id text not null
- backend text not null
- cwd text not null
- expected_artifact_path text
- expected_schema text
- last_prompt_hash text
- last_prompt_at timestamptz
- last_capture_seq bigint not null default 0
- last_known_pane_pid int
- tmux_session text
- tmux_window text
- state text not null
- recovery_attempts int not null default 0
- unique (run_id, role_id)
4.11 tui_transcript_chunks
Append-only.
- id bigserial primary key
- session_id uuid not null references tui_sessions(id) on delete cascade
- seq bigint not null
- content text not null
- captured_at timestamptz not null default now()
- unique (session_id, seq)
4.12 artifacts
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- phase_id uuid references run_phases(id)
- path text not null
- schema_id text not null
- hash text not null
- valid boolean not null
- validation_error jsonb
- created_at timestamptz not null default now()
- unique (run_id, path, hash)
4.13 commands
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- phase_id uuid references run_phases(id)
- kind text not null
  - git
  - test
  - e2e
  - doctor
  - backtest
  - other
- argv text[] not null
- cwd text not null
- exit_code int
- stdout_path text
- stderr_path text
- started_at timestamptz
- ended_at timestamptz
4.14 review_findings
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- phase_id uuid references run_phases(id)
- reviewer_role text not null
- severity text not null
  - info
  - low
  - medium
  - high
  - critical
- category text not null
  - correctness
  - evidence
  - style
  - security
  - performance
  - other
- file_path text
- line int
- summary text not null
- evidence text
- verifier_status text not null default 'unverified'
  - unverified
  - confirmed
  - rejected
- created_at timestamptz not null default now()
4.15 Backtest Stub Tables
backtest_iterations and backtest_metrics are created at M1 as stub tables:
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- payload jsonb
- created_at timestamptz not null default now()
Full schema is deferred to M12.
5. Enums
All enums live in packages/core/src/enums.ts as TypeScript const objects and Zod enums.
5.1 Backend
- codex
- claude
- fake
Future gemini support adds an enum entry and a BackendProfile; no design change.
5.2 Capability
- spec_write
- phase_planning
- task_dag_planning
- code_edit
- test_first_development
- code_review
- evidence_check
- command_execute
- backtest_run
- metric_extract
- failure_mining
- objective_eval
- final_report_compose
5.3 RiskLevel
- low
- medium
- high
Risk is declared per phase in the template. Persona has maxRiskLevel. Binding fails when phase.risk > persona.maxRiskLevel.
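Since risk levels are string enums, the comparison in the binding rule needs an explicit ordering. A minimal sketch, assuming the order is the one the enum is listed in (low < medium < high); `RISK_ORDER` and `exceedsMaxRisk` are hypothetical names.

```typescript
// Assumption: ordinal order follows the enum listing: low < medium < high.
const RISK_ORDER = ["low", "medium", "high"] as const;
type RiskLevel = (typeof RISK_ORDER)[number];

// True when a phase's declared risk is above the persona's ceiling,
// which is the condition under which binding fails.
export function exceedsMaxRisk(phaseRisk: RiskLevel, maxRiskLevel: RiskLevel): boolean {
  return RISK_ORDER.indexOf(phaseRisk) > RISK_ORDER.indexOf(maxRiskLevel);
}
```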
5.4 ApprovalDecisionAction
- approve
- reject
- request_changes
- abort
pause is a run-level control operation, not an approval decision.
5.5 ApprovalState
- pending
- approved
- rejected
- changes_requested
- aborted
- paused
paused is not an auto-decision.
5.6 RunState
- created
- bound
- planning
- awaiting_approval
- executing
- paused
- completed
- failed
- aborted
5.7 RunPhaseState
- pending
- running
- awaiting_artifact
- validating
- awaiting_approval
- completed
- failed
- skipped
5.8 SessionState
- CREATED
- BOOTSTRAPPING
- READY
- BUSY
- WAITING_FOR_APPROVAL
- ARTIFACT_TIMEOUT
- HUNG
- CRASHED
- RESUMING
- REBOOTSTRAPPED
- FAILED_NEEDS_HUMAN
6. Content-Addressed Hashing
6.1 Canonical JSON
- Object keys sorted lexicographically by UTF-16 code units.
- No insignificant whitespace.
- Strings use standard JSON escaping.
- No Unicode normalization.
- Numbers use shortest round-trippable representation.
- Integers have no decimal point.
- No leading zeros.
- Arrays preserve order.
- No trailing newline.
packages/core/src/hash.ts exports:
canonicalize(value: unknown): string
hash(value: unknown): string
hash() returns sha256hex(canonicalize(value)).
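The canonical JSON rules in 6.1 can be sketched on top of JSON.stringify, which already produces standard string escaping and shortest round-trippable numbers. This is a sketch under assumptions, not the contents of hash.ts: skipping undefined-valued properties and rejecting non-finite numbers are choices the plan does not spell out, and the input is assumed to be plain JSON data (no Date, Map, or class instances).

```typescript
import { createHash } from "node:crypto";

export function canonicalize(value: unknown): string {
  if (value === null) return "null";
  switch (typeof value) {
    case "boolean":
      return value ? "true" : "false";
    case "string":
      return JSON.stringify(value); // standard JSON escaping, no normalization
    case "number":
      // Assumption: NaN and Infinity are rejected rather than encoded.
      if (!Number.isFinite(value)) throw new TypeError("non-finite number");
      return JSON.stringify(value); // shortest round-trippable form
    case "object": {
      if (Array.isArray(value)) {
        // Arrays preserve order.
        return `[${value.map((item) => canonicalize(item)).join(",")}]`;
      }
      const record = value as Record<string, unknown>;
      const members = Object.keys(record)
        // Assumption: undefined-valued properties are treated as absent.
        .filter((key) => record[key] !== undefined)
        // Default sort compares strings by UTF-16 code units.
        .sort()
        .map((key) => `${JSON.stringify(key)}:${canonicalize(record[key])}`);
      return `{${members.join(",")}}`;
    }
    default:
      throw new TypeError(`cannot canonicalize ${typeof value}`);
  }
}

export function hash(value: unknown): string {
  return createHash("sha256").update(canonicalize(value), "utf8").digest("hex");
}
```

Two objects that differ only in key order canonicalize to the same string and therefore share a hash, which is what makes the hashes in 6.2 usable as content addresses.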
6.2 Hash Subjects
- Template hash:
{ name, version, roles, phases, gates, capabilitiesRequired }
- Persona hash:
{ name, version, capabilities, backend, maxRiskLevel, allowedRoles, promptConfig, modelConfig }
- Binding hash:
{ runId, roleId, templateHash, personaHash, backend, override }
- Run input hash:
{ templateHash, bindings: sorted[bindingHash], requirementsMd, objective, repoPath, baseBranch, extra }
- Prompt hash:
{ runId, roleId, phaseKey, expectedArtifact, expectedSchema, instructions, attempt }
- Artifact hash:
- SHA-256 of file bytes.
Prompt hash uses phaseKey, not phaseId, because PromptEnvelope carries phaseKey.
7. Template, Persona, Binding
7.1 Template Schema
const TemplatePhase = z.object({
key: z.string(),
title: z.string(),
risk: RiskLevel,
roles: z.array(z.string()),
expectedArtifact: z
.object({
path: z.string(),
schema: z.string(),
})
.optional(),
gates: z.array(z.string()).default([]),
timeoutMs: z.number().int().positive().optional(),
});
const TemplateRole = z.object({
id: z.string(),
requiredCapabilities: z.array(Capability),
preferredBackends: z.array(Backend).default([]),
count: z.number().int().min(1).default(1),
diversity: z
.object({
requireDifferentBackends: z.boolean().default(false),
})
.optional(),
});
const Template = z.object({
name: z.string(),
version: z.number().int().positive(),
roles: z.array(TemplateRole),
phases: z.array(TemplatePhase),
defaultGates: z.array(z.string()).default([]),
});
7.2 Persona Schema
const Persona = z.object({
name: z.string(),
version: z.number().int().positive(),
backend: Backend,
capabilities: z.array(Capability),
maxRiskLevel: RiskLevel,
allowedRoles: z.array(z.string()).optional(),
promptConfig: z
.object({
systemPrompt: z.string().optional(),
instructionsPrelude: z.string().optional(),
})
.default({}),
modelConfig: z.record(z.string(), z.unknown()).default({}),
});
7.3 Override Semantics
- Override may swap persona for a role.
- Override may constrain backend to a specific allowed backend.
- Override cannot add capabilities.
- Override cannot raise risk above persona maxRiskLevel.
- Diversity rules apply after override.
- Lock-time validation runs the full binding algorithm.
- On first binding failure, the run does not start.
7.4 Binding Algorithm
For each role:
- Select override persona if present; otherwise run autoSelect.
- Assert backend is enabled in config.backends.
- Assert non-fake backend binary resolved at process start.
- Assert role id is in allowedRoles, unless allowedRoles is absent.
- Assert required capabilities are a subset of persona capabilities.
- Assert every phase using the role has risk <= persona maxRiskLevel.
- Expand roles with count > 1 into roleId#0, roleId#1, etc.
- Enforce diversity rules after expansion.
- Compute and persist binding_hash per role instance.
autoSelect is deterministic. Sort candidates by:
- role preferredBackends order.
- persona.version desc.
- persona.name asc.
- persona.hash asc.
Personas whose backend is not in preferredBackends are eligible only if all preferred-backend personas fail capability or risk checks.
Binding fails with human_required:no_eligible_persona if no persona satisfies requirements.
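The autoSelect ordering can be sketched as a single comparator. This assumes the candidate list has already been filtered to personas that pass the capability and risk checks; under that assumption, ranking non-preferred backends last yields the same first pick as the eligibility rule above. `Candidate` and `sortCandidates` are hypothetical names.

```typescript
interface Candidate {
  name: string;
  version: number;
  backend: string;
  hash: string;
}

// Plain code-unit comparison; avoids locale-dependent ordering.
const cmp = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

export function sortCandidates(
  candidates: readonly Candidate[],
  preferredBackends: readonly string[],
): Candidate[] {
  // Backends outside preferredBackends rank after every preferred one.
  const rank = (backend: string): number => {
    const index = preferredBackends.indexOf(backend);
    return index === -1 ? preferredBackends.length : index;
  };
  // Copy before sorting so the caller's array is not mutated.
  return [...candidates].sort(
    (a, b) =>
      rank(a.backend) - rank(b.backend) ||
      b.version - a.version ||
      cmp(a.name, b.name) ||
      cmp(a.hash, b.hash),
  );
}
```

The first element of the sorted list would be the selected persona; an empty list corresponds to human_required:no_eligible_persona.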
7.5 Seeding
Personas:
- docs/schemas/personas/<name>@<version>.yaml
- filename encodes immutable identity.
- loader parses with Persona schema.
- loader computes personaHash.
- loader upserts keyed by (name, version).
- hash mismatch on an existing row is fatal.
Templates:
- docs/schemas/templates/<name>@<version>.yaml
- same immutable version rule.
Deleting a published file is allowed only when no run references that hash.
8. Session Runtime
8.1 SessionAdapter Interface
export interface SessionAdapter {
start(input: StartInput): Promise<SessionHandle>;
sendPrompt(handle: SessionHandle, envelope: PromptEnvelope): Promise<{ promptId: string }>;
probe(handle: SessionHandle): Promise<ProbeResult>;
resume(handle: SessionHandle): Promise<SessionHandle>;
rebootstrap(handle: SessionHandle): Promise<SessionHandle>;
capture(handle: SessionHandle, fromSeq: bigint): AsyncIterable<TranscriptChunk>;
dispose(handle: SessionHandle): Promise<void>;
}
export interface StartInput {
runId: string;
roleId: string;
backend: Backend;
cwd: string;
expectedArtifactPath?: string;
expectedSchema?: string;
envelopePrelude?: string;
}
export interface SessionHandle {
sessionId: string;
pid?: number;
tmuxSession?: string;
tmuxWindow?: string;
}
export interface ProbeResult {
alive: boolean;
paneActive: boolean;
lastOutputAt?: Date;
hint?: string;
}
export interface TranscriptChunk {
seq: bigint;
content: string;
capturedAt: Date;
}
8.2 Session State Machine
- CREATED -> BOOTSTRAPPING -> READY
- READY <-> BUSY
- BUSY -> WAITING_FOR_APPROVAL
- BUSY -> ARTIFACT_TIMEOUT
- BUSY -> HUNG
- BUSY -> CRASHED
- HUNG | CRASHED | ARTIFACT_TIMEOUT -> RESUMING -> READY
- RESUMING -> REBOOTSTRAPPED -> READY
- exhausted errors -> FAILED_NEEDS_HUMAN
8.3 Recovery Counters
- sendPrompt retry: 2.
  - Means one initial send plus two adapter-level retries, three physical send attempts max.
- resume retry: 2.
- rebootstrap retry: 1.
- artifact repair retry: 1.
- max hung time: configurable; default 20 minutes.
Exhaustion creates a human gate with recoveryHint.
8.4 SessionManager Singleton
- M4: hosted in apps/api.
- M5+: hosted in apps/worker.
- Only SessionManager may call mutating SessionAdapter methods.
- Holds in-memory Map<sessionId, SessionHandle>.
- Takes pg_advisory_lock(hash64('devflow:session-manager')).
- A second instance exits with code 3.
- On start:
  - query non-terminal tui_sessions.
  - call adapter.resume(handle).
  - success: place handle in map.
  - failure: session -> FAILED_NEEDS_HUMAN, append session.failed, create recovery gate.
- On SIGTERM/SIGINT:
  - refuse new prompts.
  - allow in-flight artifact polling up to 30s.
  - persist last_capture_seq.
  - release advisory lock.
9. Prompt Envelope
9.1 Wire Format
DEVFLOW_PROMPT_BEGIN <uuid>
Run: <run-id>
Role: <role-id>
Phase: <phase-key>
Attempt: <int>
Expected artifact: <absolute-path>
Expected schema: <schema-id>
Dedup-Key: <prompt-hash>
Instructions:
<freeform multi-line instructions>
DEVFLOW_PROMPT_END <uuid>
9.2 Schema
const PromptEnvelope = z.object({
uuid: z.string().uuid(),
runId: z.string().uuid(),
roleId: z.string(),
phaseKey: z.string(),
attempt: z.number().int().nonnegative(),
expectedArtifact: z.string(),
expectedSchema: z.string(),
dedupKey: z.string(),
instructions: z.string(),
});
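A minimal sketch of rendering an envelope into the wire format from 9.1. The local `PromptEnvelope` interface mirrors the schema fields so the block stands alone; `renderEnvelope` is a hypothetical name.

```typescript
// Mirrors the fields of the PromptEnvelope schema above.
interface PromptEnvelope {
  uuid: string;
  runId: string;
  roleId: string;
  phaseKey: string;
  attempt: number;
  expectedArtifact: string;
  expectedSchema: string;
  dedupKey: string;
  instructions: string;
}

// Emits the header fields in wire order, then the free-form instructions,
// bracketed by BEGIN/END markers that carry the same uuid.
export function renderEnvelope(e: PromptEnvelope): string {
  return [
    `DEVFLOW_PROMPT_BEGIN ${e.uuid}`,
    `Run: ${e.runId}`,
    `Role: ${e.roleId}`,
    `Phase: ${e.phaseKey}`,
    `Attempt: ${e.attempt}`,
    `Expected artifact: ${e.expectedArtifact}`,
    `Expected schema: ${e.expectedSchema}`,
    `Dedup-Key: ${e.dedupKey}`,
    "Instructions:",
    e.instructions,
    `DEVFLOW_PROMPT_END ${e.uuid}`,
  ].join("\n");
}
```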
9.3 Rules
- Prompt identity is dedupKey.
- Adapter treats duplicate dedupKey for the same session within a run lifetime as idempotent success and does not reprocess the prompt.
- attempt increments only when the engine intentionally re-sends after timeout or repair.
- Adapter-level retry does not increment attempt.
- Completion is never inferred from transcript text.
- Completion requires a schema-valid artifact.
9.4 Backend Prelude
Sent once at session bootstrap before the first envelope.
Required structure:
- Backend identity statement.
- Persona instructionsPrelude.
- Protocol declaration: completion is signaled only by writing expected artifact files.
- Envelope marker contract.
- Approval/probe contract:
  - DEVFLOW_PROBE must respond with one line READY or BUSY <reason>.
Codex and Claude-specific addenda live in packages/session/src/profiles/{codex,claude}.ts and are populated at M10.
10. Artifact Schema Registry
10.1 Layout
JSON Schema 2020-12 documents live at:
docs/schemas/artifacts/<schema_id>.json
schema_id format:
<domain>/<name>@<version>
Examples:
- dev/spec@1
- dev/phase-plan@1
- dev/dag@1
- dev/review-finding-batch@1
- bt/objective@1
- bt/iteration-result@1
- common/final-report@1
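The schema_id format can be checked and mapped to a file path with a small parser. This is a sketch: the allowed character set (lowercase letters, digits, hyphens) is an assumption inferred from the examples, and `parseSchemaId` and `schemaPath` are hypothetical names.

```typescript
export interface SchemaId {
  domain: string;
  name: string;
  version: number;
}

// Assumption: domain and name use lowercase letters, digits, and hyphens;
// version is a positive integer without leading zeros.
const SCHEMA_ID = /^([a-z0-9-]+)\/([a-z0-9-]+)@([1-9][0-9]*)$/;

export function parseSchemaId(id: string): SchemaId {
  const match = SCHEMA_ID.exec(id);
  if (!match) throw new Error(`invalid schema id: ${id}`);
  const [, domain = "", name = "", version = ""] = match;
  return { domain, name, version: Number(version) };
}

// Maps a validated id onto the registry layout; the domain becomes a subdirectory.
export function schemaPath(id: string): string {
  parseSchemaId(id); // throws on malformed ids
  return `docs/schemas/artifacts/${id}.json`;
}
```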
10.2 Loader
packages/core/src/artifact-schema.ts exports:
function loadSchema(id: string): JsonSchema;
function validateArtifact(
id: string,
data: unknown
): { ok: true } | { ok: false; errors: ValidationError[] };
Unknown schema id is fatal.
10.3 Validation Flow
- Engine waits for expectedArtifactPath to appear.
- Debounce 500ms after last mtime change.
- Read file.
- Compute SHA-256.
- Validate against expectedSchema.
- Valid:
  - insert artifact row with valid=true.
  - append artifact.validated.
  - advance phase.
- Invalid:
  - insert artifact row with valid=false.
  - append artifact.invalid.
  - trigger one repair prompt.
  - after repair exhaustion, create human gate.
- Timeout:
  - append artifact.timeout.
  - probe session.
  - enter recovery flow.
10.4 Final Report
At terminal run state, write atomically:
- <WORKSPACE_ROOT>/<runId>/<runId>.report.md
- <WORKSPACE_ROOT>/<runId>/<runId>.report.json
Both are written even on failed or aborted, best-effort.
common/final-report@1 minimum fields:
- runId
- templateHash
- bindings[]
- inputs
- phases[]
- approvals[]
- findings[]
- commands[]
- artifacts[]
- events.tail
- unresolved[]
- endedAt
- status
10.5 Backtest Objective Stub
bt/objective@1:
{
"targets": [
{ "metric": "sharpe", "op": "gte", "value": 1.5, "weight": 1.0 },
{ "metric": "mdd", "op": "lte", "value": 0.15, "weight": 1.0 }
],
"stopWhen": "all"
}
- op: gte | lte | eq | gt | lt
- stopWhen: all | weighted
- weighted threshold is hardcoded at 0.8 at M12.
- Full DSL deferred to M12.
11. Run Events
Closed event types:
run.created
run.started
run.paused
run.resumed
run.completed
run.failed
run.aborted
phase.started
phase.completed
phase.failed
phase.skipped
prompt.sent
prompt.repaired
artifact.expected
artifact.validated
artifact.invalid
artifact.timeout
approval.requested
approval.resolved
session.created
session.ready
session.busy
session.idle
session.crashed
session.recovered
session.failed
command.started
command.completed
command.failed
review.batch_recorded
finding.verifier_resolved
backtest.iteration_started
backtest.iteration_completed
backtest.objective_evaluated
11.1 Idempotency Keys
Every event append requires deterministic idempotency_key.
| Event family | Key formula |
|---|---|
| run.created, run.started, run.completed, run.failed, run.aborted | <type>:<run_id> |
| run.paused | run.paused:<run_id>:<cause> |
| run.resumed | run.resumed:<run_id>:<cause> |
| phase.started, phase.completed, phase.failed, phase.skipped | <type>:<phase_id>:<phase_attempt> |
| prompt.sent, prompt.repaired | <type>:<prompt_dedup_key> |
| artifact.expected, artifact.timeout | <type>:<phase_id>:<phase_attempt>:<expected_path> |
| artifact.validated, artifact.invalid | <type>:<phase_id>:<expected_path>:<artifact_hash> |
| approval.requested | approval.requested:<approval_idempotency_key> |
| approval.resolved | approval.resolved:<approval_request_id>:<action> |
| session.created, session.failed | <type>:<session_id> |
| session.busy, session.idle | <type>:<session_id>:<prompt_dedup_key> |
| session.ready, session.crashed, session.recovered | <type>:<session_id>:<recovery_attempts> |
| command.started, command.completed, command.failed | <type>:<command_id> |
| review.batch_recorded | review.batch_recorded:<phase_id>:<reviewer_role>:<phase_attempt> |
| finding.verifier_resolved | finding.verifier_resolved:<finding_id> |
| backtest.iteration_started, backtest.iteration_completed, backtest.objective_evaluated | <type>:<iteration_id> |
Definitions:
- phase_attempt is incremented before event append.
- recovery_attempts is incremented before event append.
- prompt_dedup_key is the envelope dedup key.
- approval_idempotency_key is from approval_requests.
- Artifact expected/timeout events are per-attempt.
- Artifact validated/invalid events are content-keyed by path + hash.
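A few of the key formulas above, sketched as small builder functions so every call site derives keys the same way. The object name `idempotencyKey` and its method names are hypothetical; only the formulas come from the table.

```typescript
// Hypothetical builders; each one transcribes a row of the key table.
export const idempotencyKey = {
  // run.created, run.started, run.completed, run.failed, run.aborted
  runLifecycle: (type: string, runId: string): string => `${type}:${runId}`,

  // phase.started, phase.completed, phase.failed, phase.skipped
  phase: (type: string, phaseId: string, phaseAttempt: number): string =>
    `${type}:${phaseId}:${phaseAttempt}`,

  // artifact.expected, artifact.timeout: keyed per attempt
  artifactAttempt: (
    type: string,
    phaseId: string,
    phaseAttempt: number,
    expectedPath: string,
  ): string => `${type}:${phaseId}:${phaseAttempt}:${expectedPath}`,

  // artifact.validated, artifact.invalid: keyed by content (path + hash)
  artifactContent: (
    type: string,
    phaseId: string,
    expectedPath: string,
    artifactHash: string,
  ): string => `${type}:${phaseId}:${expectedPath}:${artifactHash}`,

  // approval.resolved
  approvalResolved: (approvalRequestId: string, action: string): string =>
    `approval.resolved:${approvalRequestId}:${action}`,
};
```

Because the per-attempt and content-keyed artifact formulas differ, re-validating identical bytes in a later attempt would produce the same artifact.validated key, while a second artifact.timeout in a new attempt would produce a new one.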
12. Fake Session Adapter
12.1 Behavior
- Deterministic.
- In-process.
- No PTY.
- No tmux.
- Drives engine end-to-end without real backends.
12.2 Sentinel Triggers
On sendPrompt, inspect expectedSchema.
Fixture path:
tests/fixtures/fake-artifacts/<expectedSchema>/<scenarioName>.json
scenarioName comes from instruction header:
Scenario: <name>
Default scenario: ok.
Scenarios:
- ok: write fixture to expectedArtifactPath after 50ms by default.
- invalid: write deliberately schema-invalid payload.
- timeout: never write.
- crash: throw RecoverableError.
12.3 Transcript
Fake adapter emits chunks such as:
[fake] received prompt <uuid>; will write <path> in 50ms
13. State Machines
13.1 Run State
States:
- created
- bound
- planning
- awaiting_approval
- executing
- paused
- completed
- failed
- aborted
Transitions:
| From | Trigger | To | Side effects |
|---|---|---|---|
| created | lockBindings ok | bound | persist bindings; emit run.started |
| created | lockBindings fail | failed | emit run.failed |
| bound | phase plan needed | planning | emit phase.started |
| planning | plan artifact valid | awaiting_approval | request approval |
| awaiting_approval | approve | executing | emit approval.resolved, run.resumed |
| awaiting_approval | reject | failed | emit run.failed |
| awaiting_approval | request_changes | planning | increment phase attempts |
| awaiting_approval | timeout | paused | set paused_from_state='awaiting_approval' |
| executing | phase ok, more phases | executing | next phase |
| executing | normal workflow approval gate | awaiting_approval | request gate |
| executing | all phases done | completed | emit run.completed, write final report |
| executing | unrecoverable error | failed | emit run.failed |
| executing | manual pauseRun | paused | set paused_from_state='executing' |
| planning | manual pauseRun | paused | set paused_from_state='planning' |
| paused | resume | paused_from_state | emit run.resumed, clear paused_from_state |
| any non-terminal state | abortRun | aborted | emit run.aborted, dispose sessions |
Non-terminal states for abortRun:
- created
- bound
- planning
- awaiting_approval
- executing
- paused
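The run-state transition table can be encoded as a lookup plus one rule for abortRun. A minimal sketch: `NEXT`, `TERMINAL`, and `canTransition` are hypothetical names, and the check covers only which target states are reachable, not triggers or side effects.

```typescript
type RunState =
  | "created"
  | "bound"
  | "planning"
  | "awaiting_approval"
  | "executing"
  | "paused"
  | "completed"
  | "failed"
  | "aborted";

const TERMINAL: ReadonlySet<RunState> = new Set<RunState>(["completed", "failed", "aborted"]);

// Allowed targets per source state, transcribed from the transition table.
// abortRun is handled separately because it applies to every non-terminal state.
const NEXT: Record<RunState, readonly RunState[]> = {
  created: ["bound", "failed"],
  bound: ["planning"],
  planning: ["awaiting_approval", "paused"],
  awaiting_approval: ["executing", "failed", "planning", "paused"],
  executing: ["executing", "awaiting_approval", "completed", "failed", "paused"],
  // resume returns to whichever state was stored in paused_from_state
  paused: ["planning", "awaiting_approval", "executing"],
  completed: [],
  failed: [],
  aborted: [],
};

export function canTransition(from: RunState, to: RunState): boolean {
  if (to === "aborted") return !TERMINAL.has(from);
  return NEXT[from].includes(to);
}
```

A stricter version would also verify that a resume from paused targets exactly the stored paused_from_state rather than any of the three candidates.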
13.2 Run Phase State
States:
- pending
- running
- awaiting_artifact
- validating
- awaiting_approval
- completed
- failed
- skipped
Transitions:
| From | Trigger | To |
|---|---|---|
| pending | start | running |
| running | prompt sent, artifact expected | awaiting_artifact |
| awaiting_artifact | artifact appears | validating |
| awaiting_artifact | timeout | running after probe/repair, or failed after exhaustion |
| validating | valid | awaiting_approval if gate, else completed |
| validating | invalid | running after one repair, else failed |
| awaiting_approval | approve | completed |
| awaiting_approval | reject / abort | failed |
| awaiting_approval | request_changes | running, attempt + 1 |
Replay rules:
- phase.started.payload.repair === true marks that attempt as the single allowed repair attempt. Replaying that attempt MUST use repair instructions and prompt.repaired, and MUST NOT start a third attempt.
- Repair replay from running may reuse an existing READY / bootstrapped session even if last_prompt_hash still contains the previous attempt's prompt hash; the current-attempt prompt send has not happened yet.
- If phase state is validating and no artifact row exists yet, replay re-reads and validates the current expectedArtifactPath instead of treating the state as corruption.
- If phase state is validating and artifact rows already exist for the same phase/path/schema, replay may reuse only an artifact row created at or after the current session last_prompt_at; older rows are treated as stale previous-attempt outputs and the file is revalidated.
- Session bootstrap DB row/state changes and session.created / session.ready events are written in one DB transaction after adapter start succeeds.
14. Approval State
States:
- pending
- approved
- rejected
- changes_requested
- aborted
- paused
14.1 Transitions
| From | Event | To | Side effects |
|---|---|---|---|
| pending | approve decision | approved | insert decision row |
| pending | reject decision | rejected | insert decision row; run -> failed |
| pending | request_changes decision | changes_requested | insert decision row; increment attempt |
| pending | abort decision | aborted | insert decision row; run -> aborted |
| pending | timeout | paused | run -> paused; no decision row |
| paused | unpause | pending | re-arm gate; no decision row |
| terminal states | any decision | unchanged | return 409 |
Rules:
- A pending request can transition to one non-pending state per pending epoch.
- Terminal approval states reject further decisions.
- paused may return to pending only through unpause.
- Manual pause is run-level pauseRun; it leaves approval gate in pending.
- Only approve, reject, request_changes, and abort create approval_decisions rows.
- Default timeout is null.
- Timeout never auto-approves or auto-rejects.
14.2 Decision Idempotency
- GUI:
  - UUIDv4 per click.
  - reused across automatic UI retries for the same logical action.
- CLI:
  - UUIDv4 per invocation.
  - --client-token=<uuid> override for scripted retry.
- API:
  - existing (approval_request_id, action, client_token) returns existing row with status 200.
  - new decision inserts row and returns 201.
  - same token with different action returns 409.
  - decision on non-pending request returns 409.
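The API status rules can be illustrated with an in-memory stand-in for the decisions table. This is a sketch under assumptions: the token replay check runs before the pending check (so a retry after resolution still returns 200), a token reused against a different request is treated as a 409, and the paused/unpause epoch is ignored. `DecisionStore` is a hypothetical name.

```typescript
type Action = "approve" | "reject" | "request_changes" | "abort";

interface Decision {
  approvalRequestId: string;
  action: Action;
  clientToken: string;
}

export class DecisionStore {
  private readonly byToken = new Map<string, Decision>();
  private readonly resolved = new Set<string>();

  submit(d: Decision): { status: 200 | 201 | 409; decision?: Decision } {
    const existing = this.byToken.get(d.clientToken);
    if (existing) {
      // Same (request, action, token): idempotent replay of an earlier insert.
      const same =
        existing.approvalRequestId === d.approvalRequestId && existing.action === d.action;
      return same ? { status: 200, decision: existing } : { status: 409 };
    }
    // A new token on a request that is no longer pending is rejected.
    if (this.resolved.has(d.approvalRequestId)) return { status: 409 };
    this.byToken.set(d.clientToken, d);
    this.resolved.add(d.approvalRequestId);
    return { status: 201, decision: d };
  }
}
```

In the database-backed version the same outcome would presumably come from the unique idempotency_key on approval_decisions rather than from in-memory maps.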
14.3 Destructive Command Enforcement
Devflow-direct commands have hard enforcement. TUI-agent commands have best-effort enforcement.
Hard-blocked Devflow-direct patterns:
- rm -rf
- git reset --hard
- git clean
- git push --force
- git push --force-with-lease
- git worktree remove --force
- git branch -D
- docker volume rm
- docker compose down -v
- DROP DATABASE
- DROP SCHEMA
- migration rollback
- reads/writes touching .env*, ~/.ssh/, ~/.aws/, ~/.config/gcloud/, ~/.kube/
- files matching *token*, *secret*, *credentials*, *.pem, *.key
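A rough sketch of how the argv-style entries in the list above might be matched for Devflow-direct commands. It is deliberately simplistic and labeled as such: it matches literal tokens only (so it would miss variants like rm -fr), and it does not cover the SQL statements, migration rollback, or the path and filename rules. `BLOCKED` and `isHardBlocked` are hypothetical names.

```typescript
// Each entry: the binary, then tokens that must all appear among its arguments.
const BLOCKED: readonly (readonly string[])[] = [
  ["rm", "-rf"],
  ["git", "reset", "--hard"],
  ["git", "clean"],
  ["git", "push", "--force"],
  ["git", "push", "--force-with-lease"],
  ["git", "worktree", "remove", "--force"],
  ["git", "branch", "-D"],
  ["docker", "volume", "rm"],
  ["docker", "compose", "down", "-v"],
];

// Simplification: literal token matching, order-insensitive after the binary.
export function isHardBlocked(argv: readonly string[]): boolean {
  const [binary, ...rest] = argv;
  return BLOCKED.some(
    ([patternBinary, ...tokens]) =>
      binary === patternBinary && tokens.every((token) => rest.includes(token)),
  );
}
```

Matching on the argv array (as stored in commands.argv) rather than on a shell string avoids having to parse quoting.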
TUI-agent command enforcement is best-effort:
- Prelude prohibits destructive operations.
- Backend permission mode is set to safest available mode.
- Transcript audit captures post-hoc evidence.
- Human intervention goes through devflow attach.
- Worktrees and branches are preserved by default.
v1 does not claim real-time blocking of TUI-internal commands.
15. Run Engine and Temporal Contract
The M4 RunEngine contract is frozen before M5. M5 reimplements the same interface through Temporal.
15.1 Public API
interface RunEngine {
startRun(input: RunStartInput): Promise<{ runId: string }>;
signalApproval(
runId: string,
approvalRequestId: string,
action: ApprovalDecisionAction,
clientToken: string,
comment?: string
): Promise<void>;
pauseRun(runId: string): Promise<void>;
resumeRun(runId: string): Promise<void>;
abortRun(runId: string, reason: string): Promise<void>;
getStatus(runId: string): Promise<RunStatus>;
}
15.2 Temporal Shape
- Namespace: devflow.
- Task queue: devflow-runs.
- Single worker process: apps/worker.
- Workflow: runWorkflow(input: RunStartInput).
- Signals:
  - approve
  - pause
  - resume
  - abort
  - unpause
- No Updates in M5.
- Status is read from DB.
Activities:
- lockBindings(input)
- generatePhasePlan(runId, phaseKey, attempt)
- sendPromptToSession(sessionId, envelope)
- waitForArtifact(sessionId, expectedPath, expectedSchema, timeoutMs)
- validateArtifact(artifactPath, expectedSchema)
- recordEvent(runId, type, payload)
- requestApproval(runId, gateKey, phaseId, payload, idempotencyKey)
- runCommand(kind, argv, cwd, env)
- composeFinalReport(runId)
Retry policy:
- Default: max attempts 3, exponential backoff start 1s, max 30s.
- requestApproval: max attempts 1.
- composeFinalReport: max attempts 1.
- sendPromptToSession: max attempts 2; further retry belongs to engine recovery.
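To make the default retry policy concrete, the sketch below computes the delay before each retry. The `RetryPolicy` shape and `retryDelaysMs` are hypothetical names, and the backoff coefficient of 2 is an assumption; this plan only fixes the attempt count, the 1s start, and the 30s cap.

```typescript
// Hypothetical sketch of the default policy: 3 attempts, 1s start, 30s cap.
interface RetryPolicy {
  maxAttempts: number;
  initialDelayMs: number;
  maxDelayMs: number;
  backoffCoefficient: number; // ASSUMPTION: 2; not stated in this plan
}

const DEFAULT_POLICY: RetryPolicy = {
  maxAttempts: 3,
  initialDelayMs: 1000,
  maxDelayMs: 30000,
  backoffCoefficient: 2,
};

// Delay before each retry. An n-attempt policy allows n - 1 retries,
// so maxAttempts 1 (requestApproval, composeFinalReport) yields no delays.
function retryDelaysMs(p: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < p.maxAttempts - 1; retry++) {
    const raw = p.initialDelayMs * Math.pow(p.backoffCoefficient, retry);
    delays.push(Math.min(raw, p.maxDelayMs));
  }
  return delays;
}
```

Under these assumptions the default policy waits 1s and then 2s; the 30s cap only matters for policies with more attempts.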
15.3 Hard Constraints
- Workflow code holds only serializable state.
- No tmux handles in workflow state.
- No PTY refs in workflow state.
- No DB clients in workflow state.
- M5+ session interaction happens through activities calling SessionManager in apps/worker.
- M5+ API never calls mutating SessionAdapter methods.
- SessionManager advisory lock prevents API/worker ownership conflict during M4 -> M5 transition.
- Workflow code uses deterministic clock/randomness only.
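One way to guard the serializable-state constraint in tests is a plain-data check over candidate workflow state. The `Json` type and `isJson` guard below are a hypothetical helper, not part of the plan; Temporal's own payload conversion is not modeled, and the input is assumed to be acyclic.

```typescript
// Hypothetical test helper: accept only JSON-shaped plain data.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isJson(value: unknown): value is Json {
  switch (typeof value) {
    case "boolean":
    case "string":
      return true;
    case "number":
      return Number.isFinite(value); // NaN and Infinity do not survive JSON
    case "object": {
      if (value === null) return true;
      if (Array.isArray(value)) return value.every(isJson);
      // Only plain objects qualify; class instances (DB clients, PTY or tmux
      // handles, Date, Map) are rejected.
      if (Object.getPrototypeOf(value) !== Object.prototype) return false;
      return Object.values(value as Record<string, unknown>).every(isJson);
    }
    default:
      return false; // undefined, function, symbol, bigint
  }
}
```

For example, `isJson({ runId: "r1", attempt: 2 })` holds, while any state object carrying a client or handle instance fails the check.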
16. WriteSet and Worktree
16.1 WriteSet
- Each task declares writeSet: string[].
- Patterns are relative to repo root.
- Glob engine: fast-glob.
- Options:

```typescript
{
  cwd: worktreeRoot,
  dot: true,
  followSymbolicLinks: false,
  onlyFiles: true,
  suppressErrors: false
}
```
Conflict detection:
- Expand writeSets.
- Forbidden globs cause conflict if matched by more than one task:
  - pnpm-lock.yaml
  - package-lock.json
  - **/migrations/**
  - **/*.generated.*
  - root tsconfig*.json
  - biome.json
  - lefthook.yml
  - .github/**
  - .gitlab-ci.yml
- Pairwise file intersections must be empty.
Conflict creates parallel_dag_approved gate.
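The pairwise-intersection step can be sketched as follows, operating on writeSets that have already been expanded to file lists. `WriteSetConflict` and `findWriteSetConflicts` are hypothetical names; glob expansion and the forbidden-glob rule are out of scope for this sketch.

```typescript
// Hypothetical sketch: report every pair of tasks whose expanded writeSets overlap.
interface WriteSetConflict {
  taskA: string;
  taskB: string;
  files: string[]; // files claimed by both tasks, sorted
}

function findWriteSetConflicts(
  expanded: ReadonlyMap<string, readonly string[]>
): WriteSetConflict[] {
  const tasks = Array.from(expanded.entries());
  const conflicts: WriteSetConflict[] = [];
  tasks.forEach(([taskA, filesA], i) => {
    const claimed = new Set(filesA);
    // Compare only against later tasks so each pair is checked once.
    for (const [taskB, filesB] of tasks.slice(i + 1)) {
      const files = filesB.filter((f) => claimed.has(f)).sort();
      if (files.length > 0) conflicts.push({ taskA, taskB, files });
    }
  });
  return conflicts;
}
```

An empty result means all pairwise intersections are empty; any entry would be grounds for the conflict gate.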
16.2 Worktree Lifecycle
- Worktree root: WORKSPACE_ROOT/<runId>/<laneId>
  - non-parallel main lane: WORKSPACE_ROOT/<runId>/main
- Created via git worktree add.
- Branch name: devflow/<runId>/<laneId>
- Terminal run state does not remove worktrees or branches.
- Output branches are deliverables.
- Disk growth is accepted.
- Cleanup is manual:
devflow cleanup <run-id> [--lane=<id>]
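A small sketch of the naming rules above. `laneWorktree` is a hypothetical helper; creating the branch with `-b` from the current HEAD and using `main` as the lane id of the non-parallel lane's branch are assumptions, since base-ref handling is not specified in this section.

```typescript
// Hypothetical helper deriving worktree path, branch name, and the git argv.
interface LaneWorktree {
  path: string;
  branch: string;
  addArgv: string[];
}

function laneWorktree(
  workspaceRoot: string,
  runId: string,
  laneId: string = "main" // non-parallel runs use the main lane
): LaneWorktree {
  const root = workspaceRoot.replace(/\/+$/, ""); // drop trailing slashes
  const path = `${root}/${runId}/${laneId}`;
  const branch = `devflow/${runId}/${laneId}`;
  // ASSUMPTION: the branch is created together with the worktree via -b.
  return { path, branch, addArgv: ["git", "worktree", "add", "-b", branch, path] };
}
```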
Cleanup:
- uses git worktree remove without --force by default.
- refuses dirty worktrees.
- --force requires an additional gate.
- git branch -D is destructive and gated.
- doctor --list-orphans lists only; it never removes.
17. SSE Contract
Endpoints:
- GET /sse/runs/:runId
- GET /sse/global
Heartbeat every 15 seconds.
Events:
| Event | Scope |
|---|---|
| run.state_changed | both |
| run.event_appended | run |
| phase.state_changed | run |
| approval.created | both |
| approval.resolved | both |
| session.state_changed | run |
| transcript.chunk_appended | run |
| artifact.validated | run |
Reconnect:
- Last-Event-ID is last run_events.seq.
- server replays seq > lastSeq.
- non-run-event SSE types are not replayed; state is re-derived by fetch.
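The reconnect rule can be sketched as a pure replay filter plus SSE framing. `RunEvent`, `parseLastEventId`, `eventsToReplay`, and `toSseFrame` are hypothetical names; treating a missing or malformed header as 0 and framing each replayed row as a `run.event_appended` event are assumptions.

```typescript
// Hypothetical sketch of Last-Event-ID replay over run_events.
interface RunEvent {
  seq: number;
  type: string;
  payload: unknown;
}

// ASSUMPTION: a missing or malformed header means "replay from the start".
function parseLastEventId(header: string | undefined): number {
  if (header === undefined) return 0;
  const n = Number(header);
  return Number.isSafeInteger(n) && n >= 0 ? n : 0;
}

// Server replays only events with seq > lastSeq, in seq order.
function eventsToReplay(log: readonly RunEvent[], lastSeq: number): RunEvent[] {
  return log.filter((e) => e.seq > lastSeq).sort((a, b) => a.seq - b.seq);
}

// One SSE frame per event; the id field is what the client echoes back
// as Last-Event-ID on reconnect.
function toSseFrame(e: RunEvent): string {
  return `id: ${e.seq}\nevent: run.event_appended\ndata: ${JSON.stringify(e)}\n\n`;
}
```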
18. Errors
packages/core/src/errors.ts:
```typescript
type ErrorClass = 'recoverable' | 'human_required' | 'fatal';

class DevflowError extends Error {
  readonly class: ErrorClass;
  readonly code: string;
  readonly runId?: string;
  readonly phaseId?: string;
  readonly recoveryHint?: string;
  readonly cause?: unknown;
}
```
Recoverable:
- network_blip
- pane_briefly_unresponsive
- prompt_send_transient
- db_serialization_retry
Human required:
- artifact_invalid_after_repair
- artifact_timeout_exhausted
- prompt_send_exhausted
- destructive_command_blocked
- secret_access_blocked
- backend_unavailable
- no_eligible_persona
- writeset_conflict
- merge_conflict
- objective_not_met
- review_dispute_unresolved
Fatal:
- db_unreachable
- workspace_permissions
- internal_state_corruption
- template_load_failed
- artifact_schema_unknown
- artifact_schema_load_failed
- migration_pending
- config_invalid
Mapping:
- recoverable -> retry; exhausted -> human_required.
- human_required / recovery gate -> run paused and gate created. This is distinct from normal workflow approval gates in §13.1, which use awaiting_approval.
- fatal -> run failed, sessions disposed, final report best-effort.
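The mapping above reads naturally as a small decision function. `Disposition` and `dispositionFor` are hypothetical names used only to illustrate the three outcomes; the `retriesExhausted` flag stands in for whatever retry budget the engine tracks.

```typescript
// Hypothetical sketch of the error-class mapping.
type ErrorClass = "recoverable" | "human_required" | "fatal";

type Disposition =
  | { kind: "retry" }
  | { kind: "pause_and_gate" } // run paused, recovery gate created
  | { kind: "fail_run" };      // run failed, sessions disposed, report best-effort

function dispositionFor(cls: ErrorClass, retriesExhausted: boolean): Disposition {
  switch (cls) {
    case "recoverable":
      // Exhausted recoverable errors escalate to human_required handling.
      return retriesExhausted ? { kind: "pause_and_gate" } : { kind: "retry" };
    case "human_required":
      return { kind: "pause_and_gate" };
    case "fatal":
      return { kind: "fail_run" };
  }
}
```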
19. Concurrent Runs and Crash Recovery
19.1 Active Run Uniqueness
- MAX_CONCURRENT_RUNS, default 4.
- DB partial unique index is the source of truth:
  - one active run per (repo_path, base_branch).
- repo_path is canonicalized before insert.
- Advisory lock is auxiliary only:
  pg_try_advisory_xact_lock(hash64('devflow:start-run', repoPath, baseBranch))
- Unique-index violation returns:
  { "currentRunId": "...", "currentState": "..." }
  with HTTP 409.
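An in-memory model of the uniqueness rule can be useful in tests before the partial index exists. `ActiveRunIndex` and `StartOutcome` are hypothetical; the 201 success status is an assumption, and `MAX_CONCURRENT_RUNS` enforcement and path canonicalization are left out.

```typescript
// Hypothetical in-memory stand-in for the partial unique index.
interface ActiveRun {
  runId: string;
  repoPath: string; // assumed to be canonicalized already
  baseBranch: string;
  state: string;
}

type StartOutcome =
  | { status: 201; runId: string } // ASSUMPTION: success status is not specified here
  | { status: 409; body: { currentRunId: string; currentState: string } };

class ActiveRunIndex {
  private readonly active = new Map<string, ActiveRun>();

  private static key(repoPath: string, baseBranch: string): string {
    return JSON.stringify([repoPath, baseBranch]);
  }

  // Mirrors an insert that may hit the unique index.
  tryStart(run: ActiveRun): StartOutcome {
    const key = ActiveRunIndex.key(run.repoPath, run.baseBranch);
    const current = this.active.get(key);
    if (current !== undefined) {
      return { status: 409, body: { currentRunId: current.runId, currentState: current.state } };
    }
    this.active.set(key, run);
    return { status: 201, runId: run.runId };
  }

  // Called when a run becomes terminal; frees the slot.
  release(repoPath: string, baseBranch: string): void {
    this.active.delete(ActiveRunIndex.key(repoPath, baseBranch));
  }
}
```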
19.2 Crash Recovery
M4, no Temporal:
- On apps/api startup, sweep non-terminal runs.
- Mark them failed. final_report_path = null.
- Append synthesized run.failed with reason process_restart_unrecovered.
- Cascade associated tui_sessions to FAILED_NEEDS_HUMAN.
- Append session.failed.
- This frees active-run uniqueness slots.
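The M4 sweep steps above can be sketched as a pure function over in-memory rows. The row shapes, `sweepOnStartup`, and the terminal state names other than `failed` are assumptions for illustration; a real sweep would run inside DB transactions. Cascading every session of a swept run, regardless of its current state, is also an assumption.

```typescript
// Hypothetical sketch of the M4 startup sweep.
interface Run { runId: string; state: string; finalReportPath: string | null }
interface Session { sessionId: string; runId: string; state: string }
interface SweepEvent {
  runId: string;
  type: "run.failed" | "session.failed";
  reason?: string;
  sessionId?: string;
}

// ASSUMPTION: terminal state names other than "failed" are placeholders.
const TERMINAL_RUN_STATES = new Set(["completed", "failed", "aborted"]);

function sweepOnStartup(runs: readonly Run[], sessions: readonly Session[]) {
  const events: SweepEvent[] = [];
  const swept = new Set<string>();
  const nextRuns = runs.map((run): Run => {
    if (TERMINAL_RUN_STATES.has(run.state)) return run;
    swept.add(run.runId);
    events.push({ runId: run.runId, type: "run.failed", reason: "process_restart_unrecovered" });
    return { ...run, state: "failed", finalReportPath: null };
  });
  const nextSessions = sessions.map((s): Session => {
    if (!swept.has(s.runId)) return s;
    events.push({ runId: s.runId, type: "session.failed", sessionId: s.sessionId });
    return { ...s, state: "FAILED_NEEDS_HUMAN" };
  });
  return { runs: nextRuns, sessions: nextSessions, events };
}
```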
M5+:
- No sweep.
- Temporal durability owns in-flight workflow recovery.
- SessionManager resumes tmux sessions.
- Active-run partial index blocks duplicate runs until completion or explicit abort.
20. Milestones
M1: Monorepo + Postgres + CLI Doctor
- Scaffold workspace.
- Add pnpm, tsconfig, biome, lefthook, Vitest.
- Add Docker Compose for Postgres.
- Add Drizzle and first migration.
- Add devflow doctor.
- Implement checks 1-9.
- Stub checks 10-12 as warn where needed.
- Add SSE compatibility smoke test:
  - minimal Fastify 5 server.
  - fastify-sse-v2 plugin.
  - 30-second integration test.
  - receive 3 events and reconnect.
  - if plugin fails, implement native reply.raw SSE helper before M1 is green.
M2: Core Schema + Registry + Binding
- Implement enums.
- Implement canonical hashing.
- Implement Template schema.
- Implement Persona schema.
- Implement seed loader.
- Implement binding algorithm.
- Implement artifact schema registry.
- Add first schemas:
  - dev/spec@1
  - dev/phase-plan@1
  - common/final-report@1
- Tests:
  - schema validation.
  - override semantics.
  - risk enforcement.
  - diversity enforcement.
  - deterministic auto-select.
M3: Fake Session Runtime
- Implement SessionAdapter.
- Implement FakeSessionAdapter.
- Implement prompt envelope.
- Implement event recorder.
- Implement fake sentinel scenarios.
- Persist transcript chunks.
- Tests:
  - prompt correlation.
  - artifact validation.
  - invalid artifact.
  - timeout.
  - fake crash.
M4: Minimal Run Engine
- Implement packages/run-engine.
- Used directly by apps/api.
- No Temporal.
- Supports:
  - start run.
  - lock bindings.
  - approval.
  - fake prompt.
  - artifact wait/validate.
  - final report.
- Freeze the RunEngine contract.
- Full fake development@1 minus reviewers.
M5: Temporal Integration
- Reimplement RunEngine through Temporal.
- Preserve M4 behavior.
- Add parity tests using the same M4 scenarios.
- M5+ SessionManager lives in apps/worker.
M6: Real tmux SessionManager
- Implement TmuxSessionAdapter.
- Decoupled from M5.
- May begin after M3 is stable.
- Pre-M5 real tmux is opt-in smoke only.
- Production run path remains fake until both M5 and M6 are green.
M7: TUI Recovery State Machine
- Implement session state transitions.
- Implement recovery counters.
- Implement escalation to human gates.
M8: API + GUI Minimum
- Implement Fastify routes.
- Implement SSE.
- Implement GUI screens:
  - Dashboard.
  - Templates.
  - Personas.
  - New Run.
  - Run Detail.
  - Approvals.
  - TUI Sessions.
M9: development@1 Fake-Agent Full Run
- Add curated development@1.
- Add review consensus.
- Add verifier flow with fake reviewers.
- Add coverage gate >=70% lines for core/session/run-engine.
M10: Codex/Claude Opt-In Real Run
- Implement profiles:
  - packages/session/src/profiles/codex.ts
  - packages/session/src/profiles/claude.ts
- Real backends become production-default only after both M5 and M6 are green.
- Until then real tmux/Codex/Claude are developer-flagged opt-in smoke only.
M11: Parallel Lanes
- Add task DAG scheduler.
- Add writeSet detection.
- Add per-lane worktrees.
- Add merge coordinator.
- Add conflict gates.
M12: Backtest Workflow
- Add backtest-strategy@1.
- Add objective evaluator.
- Add metric parser extension points.
- Add failure mining artifacts.
- Add Backtest Lab GUI.
M13: Template Factory
- Generate draft template from natural language and repo discovery.
- Add harness design.
- Add template review.
- Add dry-run and promote flow.
21. Out of Scope
- Authentication.
- Authorization.
- Multi-user support.
- Data retention or archival policy.
- Observability dashboards.
- Remote template/persona registries.
- Multi-machine deployment.
- HA.
- Managed backups.
- Web ingress.
- TLS.
- Reverse proxy.
22. Decision Log
Open Questions Closed
| # | Question | Resolution |
|---|---|---|
| OQ-1 | Persona/template seeding format | Immutable YAML at docs/schemas/{personas,templates}/<name>@<version>.yaml |
| OQ-2 | Approval timeout default | null; timeout freezes only |
| OQ-3 | Final report format | Markdown and JSON |
| OQ-4 | Temporal namespace/queue | namespace devflow, task queue devflow-runs |
| OQ-5 | WriteSet glob engine | fast-glob |
| OQ-6 | Backtest objective DSL | Stub in M12, full DSL deferred |
| OQ-7 | Codex/Claude prompt prelude | Structure locked, exact text deferred to M10 |
Blocking Corrections Applied
| # | Issue | Resolution |
|---|---|---|
| CC-1 | Terminal state deleted worktrees/branches | Preserve by default; manual gated cleanup only |
| CC-2 | SessionManager location conflict | M4 API, M5+ worker |
| CC-3 | Event duplicates under retry | run_events.idempotency_key |
| CC-4 | Destructive command enforcement overclaimed | Devflow-direct hard, TUI best-effort |
| CC-5 | UUID extension missing | CREATE EXTENSION IF NOT EXISTS pgcrypto |
| CC-6 | Advisory lock not enough for active-run uniqueness | partial unique index |
| CC-7 | Undefined transition sequence in event keys | cause-based keys |
| CC-8 | Approval paused transition missing | explicit approval transition table |
| CC-9 | AutoSelect order nondeterministic | deterministic sort |
| CC-10 | SSE plugin compatibility assumed | M1 smoke + native fallback |
| CC-11 | ApprovalAction included pause | split ApprovalDecisionAction; pauseRun is run-level |
| CC-12 | Artifact hash key collision | include phase id and path |
| CC-13 | Resume previous state not stored | runs.paused_from_state |
| CC-14 | repo path aliasing | canonical realpath storage |
| CC-15 | M4 sweep left tmux sessions ambiguous | cascade session state to FAILED_NEEDS_HUMAN; real tmux production-default only after M5+M6 |
| CC-16 | Prompt hash used phaseId but envelope uses phaseKey | prompt hash uses phaseKey |
| CC-17 | abortRun transition too narrow | abort from any non-terminal run state |
| CC-18 | approval pending transition wording conflicted with pause epoch | pending can transition once per pending epoch; paused may unpause to pending |
| CC-19 | tsc -b --noEmit is brittle with TypeScript 5.6 project references on clean worktrees | build still uses tsc -b; no-emit verification uses root tsconfig.typecheck.json |
| CC-20 | sendPrompt retry count was ambiguous against Temporal activity attempts | §8.3 now states retry budget means initial attempt plus retries; §15.2 remains Temporal-level attempts only |
| CC-21 | Duplicate prompt dedup handling conflicted with adapter retry idempotency | duplicate dedupKey returns idempotent success without reprocessing |
| CC-22 | Normal workflow approval gates and human-required recovery gates were easy to conflate | §13.1 names normal workflow gates; §18 keeps human_required recovery gates paused |
| CC-23 | Phase start and event append could diverge under retry/error | phase start and phase.started append occur in one DB transaction |
| CC-24 | Repair attempt replay lost repair prompt identity and one-repair budget | repair attempts are derived from phase.started.payload.repair, replay uses repair instructions and prompt.repaired, and cannot start attempt 3 |
| CC-25 | validating replay failed if crash happened before artifact row insert | replay revalidates the expected artifact file when state is validating but no artifact row exists |
| CC-26 | Session bootstrap state/events could diverge | session row/state and session.created / session.ready events are committed in one DB transaction |
| CC-27 | validating replay could reuse stale previous-attempt artifact rows | artifact-row replay requires artifact.created_at >= tui_sessions.last_prompt_at; otherwise the file is revalidated |
| CC-28 | repair running replay rejected existing READY sessions with previous attempt prompt hash | current-attempt repair prompt is considered unsent, so replay may reuse the session and send prompt.repaired |
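
The CC-25 and CC-27 resolutions combine into one replay decision for a phase in the validating state. The sketch below is illustrative only: `ReplayAction` and `validatingReplayAction` are hypothetical names, and the two timestamps are passed in as plain `Date` values rather than read from the DB.

```typescript
// Hypothetical sketch of the validating-state replay decision (CC-25 + CC-27).
type ReplayAction = "reuse_artifact_row" | "revalidate_file";

function validatingReplayAction(
  artifactCreatedAt: Date | null, // null when no artifact row exists
  lastPromptAt: Date              // tui_sessions.last_prompt_at
): ReplayAction {
  // CC-25: crash happened before the artifact row was inserted.
  if (artifactCreatedAt === null) return "revalidate_file";
  // CC-27: a row older than the last prompt belongs to a previous attempt.
  return artifactCreatedAt.getTime() >= lastPromptAt.getTime()
    ? "reuse_artifact_row"
    : "revalidate_file";
}
```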
Future Open Questions
- FOQ-1, M12: full backtest objective DSL.
- FOQ-2, M13: template factory generation prompts.
- FOQ-3, post-M10: optional third backend such as Gemini.
- FOQ-4, post-M8: WebSocket vs SSE if transcript pressure requires it.
23. Kickoff Order
- M1.1: repo + pnpm + tsconfig + biome + lefthook + vitest workspace.
- M1.2: docker-compose + Postgres healthcheck + drizzle-kit + first migration.
- M1.3: apps/cli skeleton + devflow doctor.
- M1.4: packages/core skeleton with config, enums, errors, hash, prompt-envelope, run-event types.
- M2.1: Zod schemas for Template/Persona, persona YAML loader, hashing.
- M2.2: Binding algorithm + tests.
- M2.3: Artifact schema registry + first three schemas.
- M3.1: SessionAdapter interface + FakeSessionAdapter.
- M3.2: Transcript chunk capture + DB persistence.
- M3.3: engine-shaped harness running a single fake phase end-to-end.
- M4: assemble run engine; lock contract; full fake development@1 minus reviewers.
- M5 in parallel with M6 once M4 is green.