Devflow Implementation Plan v3 r12
0. Document Status
- This document supersedes v2 and all earlier v3 drafts where conflicting.
- Single-user, single-machine assumption. No auth, no retention policy, no observability dashboards, no multi-tenancy.
- Target OS: macOS 13+ / Linux. No Windows.
- All paths are Unix-style. All times are stored UTC.
- Decisions in this document are locked unless explicitly marked (provisional). Override requires updating this document, not only code.
- r1 applied CC-1 through CC-5.
- r2 applied CC-6 through CC-10.
- r3 applied CC-11 through CC-15.
- r4 applies CC-16 through CC-18.
- r5 applies CC-19.
- r6 applies CC-20.
- r7 applies CC-21 through CC-23.
- r8 applies CC-24 through CC-26.
- r9 applies CC-27 through CC-28.
- r10 applies CC-29 through CC-31.
- r11 applies CC-32.
- r12 applies CC-33 through CC-35.
1. Stack Decisions
1.1 Workspace
- pnpm 9 with workspaces. No Turbo.
- Node 22 LTS, pinned by .nvmrc and package.json#engines.
- TypeScript 5.6 with project references via tsc -b.
- strict: true.
- No any unless accompanied by an explicit annotation comment explaining why.
1.2 Tooling
- Build:
  - tsup for libraries, CJS + ESM dual output.
  - vite for apps/web.
  - tsx for apps/cli, apps/api, and apps/worker in dev.
  - node for prod-ish local runs.
- Test:
  - vitest with workspace config.
  - Coverage via @vitest/coverage-v8.
  - No coverage gate at M1.
  - M9 adds coverage gate: >=70% lines on packages/core, packages/session, packages/run-engine.
- Lint/format:
  - biome.
  - One root config.
- Pre-commit:
  - lefthook.
  - Runs biome check --write on staged files.
  - Runs tsc -p tsconfig.typecheck.json --noEmit.
  - Runs related Vitest tests on changed packages.
1.3 Database
- Postgres 16 via Docker Compose.
- Drizzle ORM + drizzle-kit generate.
- Generated SQL migrations are committed.
- Migrations are never auto-applied at runtime except through the explicit migration runner invoked by devflow up.
- Migration runner:
  - scripts/migrate.ts.
  - Takes DATABASE_URL.
  - devflow up waits for Postgres health and then runs pending migrations.
1.4 Logging
- pino.
- pino-pretty in dev, JSON otherwise.
- Standard fields:
  - time
  - level
  - module
  - runId?
  - phaseId?
  - role?
  - eventId?
- Levels:
  - trace: transcript chunks only.
  - debug: internal state transitions.
  - info: run events.
  - warn: recoverable errors.
  - error: human-required or fatal errors.
1.5 Config
- Single Zod schema in packages/core/src/config.ts.
- Source precedence, high to low:
  - process.env
  - .env.local
  - .env
  - schema defaults
- Config is loaded once at process start, validated, frozen, and exported as typed Config.
- Config validation failure is fatal.
- Required keys at M1:
  - DATABASE_URL
  - WORKSPACE_ROOT
  - LOG_LEVEL
- M5 adds:
  - TEMPORAL_ADDRESS
- Path canonicalization:
  - WORKSPACE_ROOT is resolved through fs.realpathSync and stored as an absolute path at config load.
  - Any path entering the system must be canonicalized before storage or hashing.
  - repo_path and worktree_root rules are defined in section 4.
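The source precedence above can be sketched as a plain merge, lowest precedence first. This is a minimal illustration only: `resolveConfig`, `ConfigSources`, and `REQUIRED_KEYS_M1` are hypothetical names, and the sketch leaves out the Zod schema and the `fs.realpathSync` canonicalization that the real loader is specified to perform.

```typescript
// Hypothetical helper: merges already-parsed sources by the documented precedence.
type RawSource = Record<string, string | undefined>;

interface ConfigSources {
  processEnv: RawSource; // highest precedence
  envLocal: RawSource; // parsed .env.local
  env: RawSource; // parsed .env
  defaults: RawSource; // schema defaults, lowest precedence
}

// Required keys at M1, as listed in this section.
const REQUIRED_KEYS_M1 = ["DATABASE_URL", "WORKSPACE_ROOT", "LOG_LEVEL"] as const;

function resolveConfig(sources: ConfigSources): Readonly<Record<string, string>> {
  // Later entries overwrite earlier ones, so order runs low to high.
  const ordered = [sources.defaults, sources.env, sources.envLocal, sources.processEnv];
  const merged: Record<string, string> = {};
  for (const source of ordered) {
    for (const [key, value] of Object.entries(source)) {
      if (value !== undefined) merged[key] = value;
    }
  }
  // Validation failure is fatal: throw instead of returning a partial config.
  const missing = REQUIRED_KEYS_M1.filter((key) => !(key in merged));
  if (missing.length > 0) {
    throw new Error(`config validation failed; missing: ${missing.join(", ")}`);
  }
  return Object.freeze(merged);
}
```

Calling this once at process start and exporting the frozen result matches the "loaded once, validated, frozen" rule.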
Backend registration:
const BackendConfig = z.object({
  id: Backend, // codex | claude | fake
  enabled: z.boolean(),
  binaryPath: z.string().optional(), // resolved from PATH if absent; required for codex/claude
});
- fake is always available.
- codex and claude are available only when:
  - enabled=true
  - binary resolves at process start.
- Resolution failure:
  - doctor warns.
  - binding fails fast at run start with human_required:backend_unavailable.
- Binding reads from config.backends, never directly from PATH.
1.6 HTTP
- fastify 5.
- @fastify/sensible.
- SSE primary strategy:
  - Try fastify-sse-v2.
  - Fastify 5 compatibility is not assumed.
  - M1 includes a smoke test.
- SSE fallback:
  - Native reply.raw.
  - Headers:
    - content-type: text/event-stream
    - cache-control: no-cache
    - connection: keep-alive
  - Write data: <json>\n\n.
  - Manage heartbeats and reconnect manually.
- WebSocket is deferred unless SSE fails under transcript volume.
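A minimal sketch of the native fallback follows. `RawResponse` is a hypothetical structural type standing in for the Node response object behind `reply.raw`; `openSseStream`, `formatSseData`, and the 15-second heartbeat interval are assumptions, not decisions made by this plan. The heartbeat uses an SSE comment line, which clients ignore.

```typescript
// Hypothetical structural type; only the two methods the sketch needs.
interface RawResponse {
  writeHead(status: number, headers: Record<string, string>): unknown;
  write(chunk: string): unknown;
}

// Headers listed for the SSE fallback.
const SSE_HEADERS: Record<string, string> = {
  "content-type": "text/event-stream",
  "cache-control": "no-cache",
  connection: "keep-alive",
};

// One event frame: "data: <json>" followed by a blank line.
function formatSseData(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

function openSseStream(raw: RawResponse, heartbeatMs = 15_000) {
  raw.writeHead(200, SSE_HEADERS);
  // Lines starting with ":" are SSE comments, used here as a manual heartbeat.
  const timer = setInterval(() => raw.write(": heartbeat\n\n"), heartbeatMs);
  return {
    send: (payload: unknown): void => {
      raw.write(formatSseData(payload));
    },
    close: (): void => clearInterval(timer),
  };
}
```

Reconnect handling (for example replaying from a client-supplied sequence number) is left out and would still need to be managed manually.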
2. Directory Layout
devflow/
├── package.json
├── pnpm-workspace.yaml
├── tsconfig.base.json
├── biome.json
├── lefthook.yml
├── vitest.workspace.ts
├── docker-compose.yml
├── .nvmrc
├── .env.example
├── docs/
│   ├── plan.md
│   ├── adr/
│   └── schemas/
│       ├── artifacts/
│       ├── personas/
│       └── templates/
├── scripts/
│   ├── migrate.ts
│   └── seed.ts
├── packages/
│   ├── core/
│   │   └── src/
│   │       ├── config.ts
│   │       ├── enums.ts
│   │       ├── hash.ts
│   │       ├── errors.ts
│   │       ├── template.ts
│   │       ├── persona.ts
│   │       ├── binding.ts
│   │       ├── prompt-envelope.ts
│   │       ├── artifact-schema.ts
│   │       ├── run-event.ts
│   │       └── index.ts
│   ├── db/
│   │   └── src/
│   │       ├── schema/
│   │       ├── migrations/
│   │       ├── repositories/
│   │       └── client.ts
│   ├── session/
│   │   └── src/
│   │       ├── adapter.ts
│   │       ├── fake.ts
│   │       ├── tmux.ts
│   │       ├── profiles/
│   │       │   ├── codex.ts
│   │       │   └── claude.ts
│   │       ├── recovery.ts
│   │       └── transcript.ts
│   ├── harness/
│   │   └── src/
│   │       ├── git.ts
│   │       ├── worktree.ts
│   │       ├── runner.ts
│   │       ├── review.ts
│   │       └── backtest.ts
│   ├── run-engine/
│   │   └── src/
│   │       ├── engine.ts
│   │       ├── phase-executor.ts
│   │       └── approval.ts
│   └── workflows/
│       └── src/
│           ├── workflow.ts
│           └── activities.ts
├── apps/
│   ├── api/
│   ├── web/
│   ├── cli/
│   └── worker/
└── tests/
    ├── e2e/
    └── fixtures/
3. devflow doctor
Exit codes:
- 0: all green.
- 1: one or more red checks.
- 2: internal or unknown error.
Each check emits:
- name
- status: pass | fail | warn
- detail
- remediation
Closed check list:
- Node version satisfies >=22.0.0 <23.
- pnpm version >=9.0.0.
- tmux exists, version >=3.3.
- git version >=2.40.
- Docker daemon reachable.
- Postgres container running, pg_isready ok, DATABASE_URL connects.
- No pending Drizzle migrations.
- WORKSPACE_ROOT exists and is writable.
- .env resolves to valid Config.
- codex in PATH, warn-only.
- claude in PATH, warn-only.
- Free disk on WORKSPACE_ROOT partition:
  - warn under 10GB.
  - fail under 2GB.
  - target green threshold: >=5GB.
Output:
- Human table by default.
- --json for machine-readable output.
- --quiet prints only nonzero results.
- --list-orphans lists orphaned worktrees only; it never removes them.
4. Database Schema
First migration prelude:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
All tables use gen_random_uuid() primary keys unless noted. All times are timestamptz. Mutable rows include updated_at. JSON columns use jsonb.
4.1 workflow_templates
- id uuid primary key default gen_random_uuid()
- name text not null
- version int not null
- hash text not null unique
- definition jsonb not null
- created_at timestamptz not null default now()
- unique (name, version)
4.2 agent_personas
- id uuid primary key default gen_random_uuid()
- name text not null
- version int not null
- hash text not null unique
- definition jsonb not null
- created_at timestamptz not null default now()
- unique (name, version)
4.3 runs
- id uuid primary key default gen_random_uuid()
- template_id uuid not null references workflow_templates(id)
- template_hash text not null
- state text not null
- repo_path text not null
  - canonical absolute path
  - resolved through fs.realpathSync before insert
- base_branch text not null
- worktree_root text not null
  - canonical absolute path under WORKSPACE_ROOT/<runId>/
- current_phase_id uuid references run_phases(id), nullable and deferrable
- started_at timestamptz
- ended_at timestamptz
- final_report_path text
- paused_from_state text
  - set when transitioning to paused
  - cleared on resume
  - null when state is not paused
- created_at timestamptz not null default now()
- updated_at timestamptz
Active-run uniqueness:
CREATE UNIQUE INDEX ux_active_run_repo_base
ON runs (repo_path, base_branch)
WHERE state NOT IN ('completed', 'failed', 'aborted');
4.4 run_inputs
- id uuid primary key default gen_random_uuid()
- run_id uuid not null unique references runs(id) on delete cascade
- requirements_md text not null
- objective jsonb
- extra jsonb
- input_hash text not null
input_hash is based on:
- requirements_md
- objective
- extra
- canonical repo_path
- base_branch
4.5 run_bindings
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- role_id text not null
- persona_id uuid not null references agent_personas(id)
- persona_hash text not null
- backend text not null
- binding_hash text not null
- unique (run_id, role_id)
4.6 run_phases
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- phase_key text not null
- seq int not null
- state text not null
- attempts int not null default 0
- started_at timestamptz
- ended_at timestamptz
- unique (run_id, phase_key)
4.7 run_events
Append-only.
- id bigserial primary key
- run_id uuid not null references runs(id) on delete cascade
- phase_id uuid references run_phases(id)
- seq bigint not null
- type text not null
- payload jsonb not null
- idempotency_key text not null
- ts timestamptz not null default now()
- unique (run_id, seq)
- unique (run_id, idempotency_key)
- index (run_id, ts)
Concurrency:
- All inserts go through RunEventRepository.append().
- Raw SQL inserts into run_events are forbidden.
- append() takes pg_advisory_xact_lock(hash64('devflow:run-events', run_id)).
- Inside that same transaction it assigns:
seq := COALESCE(MAX(seq), 0) + 1
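The sequencing rule can be illustrated with an in-memory model. This is a sketch under stated assumptions, not the repository implementation: `InMemoryRunEventLog` is a hypothetical stand-in for `RunEventRepository`, `seq` is a `number` rather than a `bigint`, and returning the stored event on a repeated idempotency key is an assumed replay behavior (the schema only guarantees uniqueness). In Postgres, the body of `append` would run inside the transaction that holds the per-run advisory lock.

```typescript
interface RunEvent {
  runId: string;
  seq: number;
  type: string;
  payload: unknown;
  idempotencyKey: string;
}

// Hypothetical in-memory model of the append contract.
class InMemoryRunEventLog {
  private readonly events: RunEvent[] = [];

  append(runId: string, type: string, payload: unknown, idempotencyKey: string): RunEvent {
    // Critical section: in the real repository this is guarded by the advisory lock.
    const forRun = this.events.filter((e) => e.runId === runId);

    // Assumption: a duplicate (run_id, idempotency_key) returns the stored event.
    const existing = forRun.find((e) => e.idempotencyKey === idempotencyKey);
    if (existing) return existing;

    // seq := COALESCE(MAX(seq), 0) + 1, scoped to the run.
    const maxSeq = forRun.reduce((max, e) => Math.max(max, e.seq), 0);
    const event: RunEvent = { runId, seq: maxSeq + 1, type, payload, idempotencyKey };
    this.events.push(event);
    return event;
  }
}
```

Because the lock is keyed by run, appends for different runs do not serialize against each other, and each run gets its own gap-free sequence.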
4.8 approval_requests
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id)
- phase_id uuid references run_phases(id)
- gate_key text not null
- state text not null
- idempotency_key text not null
- payload jsonb not null
- created_at timestamptz not null default now()
- resolved_at timestamptz
- unique (idempotency_key)
4.9 approval_decisions
Append-only and immutable.
- id uuid primary key default gen_random_uuid()
- approval_request_id uuid not null references approval_requests(id)
- action text not null
  - approve
  - reject
  - request_changes
  - abort
- comment text
- decided_at timestamptz not null default now()
- idempotency_key text not null unique
pause is not an approval decision.
4.10 tui_sessions
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- role_id text not null
- backend text not null
- cwd text not null
- expected_artifact_path text
- expected_schema text
- last_prompt_hash text
- last_prompt_at timestamptz
- last_capture_seq bigint not null default 0
- last_known_pane_pid int
- tmux_session text
- tmux_window text
- state text not null
- recovery_attempts int not null default 0
- unique (run_id, role_id)
4.11 tui_transcript_chunks
Append-only.
- id bigserial primary key
- session_id uuid not null references tui_sessions(id) on delete cascade
- seq bigint not null
- content text not null
- captured_at timestamptz not null default now()
- unique (session_id, seq)
4.12 artifacts
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- phase_id uuid references run_phases(id)
- path text not null
- schema_id text not null
- hash text not null
- valid boolean not null
- validation_error jsonb
- created_at timestamptz not null default now()
- unique (run_id, path, hash)
4.13 commands
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- phase_id uuid references run_phases(id)
- kind text not null
  - git
  - test
  - e2e
  - doctor
  - backtest
  - other
- argv text[] not null
- cwd text not null
- exit_code int
- stdout_path text
- stderr_path text
- started_at timestamptz
- ended_at timestamptz
4.14 review_findings
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- phase_id uuid references run_phases(id)
- reviewer_role text not null
- severity text not null
  - info
  - low
  - medium
  - high
  - critical
- category text not null
  - correctness
  - evidence
  - style
  - security
  - performance
  - other
- file_path text
- line int
- summary text not null
- evidence text
- verifier_status text not null default 'unverified'
  - unverified
  - confirmed
  - rejected
- created_at timestamptz not null default now()
4.15 Backtest Stub Tables
backtest_iterations and backtest_metrics are created at M1 as stub tables:
- id uuid primary key default gen_random_uuid()
- run_id uuid not null references runs(id) on delete cascade
- payload jsonb
- created_at timestamptz not null default now()
Full schema is deferred to M12.
5. Enums
All enums live in packages/core/src/enums.ts as TypeScript const objects and Zod enums.
5.1 Backend
- codex
- claude
- fake
Future gemini support adds an enum entry and a BackendProfile; no design change.
5.2 Capability
- spec_write
- phase_planning
- task_dag_planning
- code_edit
- test_first_development
- code_review
- evidence_check
- command_execute
- backtest_run
- metric_extract
- failure_mining
- objective_eval
- final_report_compose
5.3 RiskLevel
- low
- medium
- high
Risk is declared per phase in the template. Persona has maxRiskLevel. Binding fails when phase.risk > persona.maxRiskLevel.
5.4 ApprovalDecisionAction
- approve
- reject
- request_changes
- abort
pause is a run-level control operation, not an approval decision.
5.5 ApprovalState
- pending
- approved
- rejected
- changes_requested
- aborted
- paused
paused is not an auto-decision.
5.6 RunState
- created
- bound
- planning
- awaiting_approval
- executing
- paused
- completed
- failed
- aborted
5.7 RunPhaseState
- pending
- running
- awaiting_artifact
- validating
- awaiting_approval
- completed
- failed
- skipped
5.8 SessionState
- CREATED
- BOOTSTRAPPING
- READY
- BUSY
- WAITING_FOR_APPROVAL
- ARTIFACT_TIMEOUT
- HUNG
- CRASHED
- RESUMING
- REBOOTSTRAPPED
- FAILED_NEEDS_HUMAN
6. Content-Addressed Hashing
6.1 Canonical JSON
- Object keys sorted lexicographically by UTF-16 code units.
- No insignificant whitespace.
- Strings use standard JSON escaping.
- No Unicode normalization.
- Numbers use shortest round-trippable representation.
- Integers have no decimal point.
- No leading zeros.
- Arrays preserve order.
- No trailing newline.
packages/core/src/hash.ts exports:
canonicalize(value: unknown): string
hash(value: unknown): string
hash() returns sha256hex(canonicalize(value)).
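A minimal sketch of the two exports, assuming inputs are plain JSON values. Two behaviors here are assumptions rather than stated rules: object properties whose value is `undefined` are skipped (mirroring `JSON.stringify`), and non-finite numbers or other non-JSON types throw. The sketch relies on `JSON.stringify` for string escaping and number formatting, and on the default `sort()`, which compares strings by UTF-16 code units.

```typescript
import { createHash } from "node:crypto";

// Serializes a JSON-compatible value following the canonical rules in 6.1.
export function canonicalize(value: unknown): string {
  if (value === null) return "null";
  if (typeof value === "boolean") return value ? "true" : "false";
  if (typeof value === "string") return JSON.stringify(value);
  if (typeof value === "number") {
    if (!Number.isFinite(value)) {
      throw new TypeError("non-finite numbers cannot be canonicalized");
    }
    return JSON.stringify(value); // shortest round-trippable form
  }
  if (Array.isArray(value)) {
    // Arrays preserve order.
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  if (typeof value === "object") {
    const record = value as Record<string, unknown>;
    // Assumption: undefined-valued properties are dropped, as JSON.stringify does.
    const keys = Object.keys(record)
      .filter((key) => record[key] !== undefined)
      .sort();
    const members = keys.map((key) => `${JSON.stringify(key)}:${canonicalize(record[key])}`);
    return `{${members.join(",")}}`;
  }
  throw new TypeError(`unsupported type: ${typeof value}`);
}

// sha256hex over the canonical form.
export function hash(value: unknown): string {
  return createHash("sha256").update(canonicalize(value), "utf8").digest("hex");
}
```

With this shape, two objects that differ only in key order produce the same hash, which is the property the hash subjects in 6.2 depend on.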
6.2 Hash Subjects
- Template hash:
{ name, version, roles, phases, gates, capabilitiesRequired }
- Persona hash:
{ name, version, capabilities, backend, maxRiskLevel, allowedRoles, promptConfig, modelConfig }
- Binding hash:
{ runId, roleId, templateHash, personaHash, backend, override }
- Run input hash:
{ templateHash, bindings: sorted[bindingHash], requirementsMd, objective, repoPath, baseBranch, extra }
- Prompt hash:
{ runId, roleId, phaseKey, expectedArtifact, expectedSchema, instructions, attempt }
- Artifact hash:
- SHA-256 of file bytes.
Prompt hash uses phaseKey, not phaseId, because PromptEnvelope carries phaseKey.
7. Template, Persona, Binding
7.1 Template Schema
const TemplatePhase = z.object({
  key: z.string(),
  title: z.string(),
  risk: RiskLevel,
  roles: z.array(z.string()),
  expectedArtifact: z
    .object({
      path: z.string(),
      schema: z.string(),
    })
    .optional(),
  gates: z.array(z.string()).default([]),
  timeoutMs: z.number().int().positive().optional(),
});
const TemplateRole = z.object({
  id: z.string(),
  requiredCapabilities: z.array(Capability),
  preferredBackends: z.array(Backend).default([]),
  count: z.number().int().min(1).default(1),
  diversity: z
    .object({
      requireDifferentBackends: z.boolean().default(false),
    })
    .optional(),
});
const Template = z.object({
  name: z.string(),
  version: z.number().int().positive(),
  roles: z.array(TemplateRole),
  phases: z.array(TemplatePhase),
  defaultGates: z.array(z.string()).default([]),
});
7.2 Persona Schema
const Persona = z.object({
  name: z.string(),
  version: z.number().int().positive(),
  backend: Backend,
  capabilities: z.array(Capability),
  maxRiskLevel: RiskLevel,
  allowedRoles: z.array(z.string()).optional(),
  promptConfig: z
    .object({
      systemPrompt: z.string().optional(),
      instructionsPrelude: z.string().optional(),
    })
    .default({}),
  modelConfig: z.record(z.string(), z.unknown()).default({}),
});
7.3 Override Semantics
- Override may swap persona for a role.
- Override may constrain backend to a specific allowed backend.
- Override cannot add capabilities.
- Override cannot raise risk above persona maxRiskLevel.
- Diversity rules apply after override.
- Lock-time validation runs the full binding algorithm.
- On first binding failure, the run does not start.
7.4 Binding Algorithm
For each role:
- Select override persona if present; otherwise run autoSelect.
- Assert backend is enabled in config.backends.
- Assert non-fake backend binary resolved at process start.
- Assert role id is in allowedRoles, unless allowedRoles is absent.
- Assert required capabilities are a subset of persona capabilities.
- Assert every phase using the role has risk <= persona maxRiskLevel.
- Expand roles with count > 1 into roleId#0, roleId#1, etc.
- Enforce diversity rules after expansion.
- Compute and persist binding_hash per role instance.
autoSelect is deterministic. Sort candidates by:
- role preferredBackends order.
- persona.version desc.
- persona.name asc.
- persona.hash asc.
Personas whose backend is not in preferredBackends are eligible only if all preferred-backend personas fail capability or risk checks.
Binding fails with human_required:no_eligible_persona if no persona satisfies requirements.
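The autoSelect ordering can be sketched as a single comparator. This is an illustration under assumptions: `Candidate` is a hypothetical reduced persona shape, and the input list is assumed to contain only personas that already passed the capability and risk checks. Ranking non-preferred backends after every preferred one then yields the fallback rule, since a non-preferred persona is only chosen when no preferred-backend persona remains eligible.

```typescript
type Backend = "codex" | "claude" | "fake";

// Hypothetical reduced persona shape, limited to the sort keys.
interface Candidate {
  name: string;
  version: number;
  hash: string;
  backend: Backend;
}

function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Assumes `eligible` already passed capability and risk checks.
function autoSelect(eligible: Candidate[], preferredBackends: Backend[]): Candidate | undefined {
  // Backends outside preferredBackends share the last rank.
  const rank = (backend: Backend): number => {
    const index = preferredBackends.indexOf(backend);
    return index === -1 ? preferredBackends.length : index;
  };
  const sorted = [...eligible].sort(
    (a, b) =>
      rank(a.backend) - rank(b.backend) || // role preferredBackends order
      b.version - a.version || // persona.version desc
      compareStrings(a.name, b.name) || // persona.name asc
      compareStrings(a.hash, b.hash), // persona.hash asc
  );
  return sorted[0];
}
```

An `undefined` result corresponds to the `human_required:no_eligible_persona` failure.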
7.5 Seeding
Personas:
- docs/schemas/personas/<name>@<version>.yaml
- filename encodes immutable identity.
- loader parses with Persona schema.
- loader computes personaHash.
- loader upserts keyed by (name, version).
- hash mismatch on an existing row is fatal.
Templates:
- docs/schemas/templates/<name>@<version>.yaml
- same immutable version rule.
Deleting a published file is allowed only when no run references that hash.
8. Session Runtime
8.1 SessionAdapter Interface
export interface SessionAdapter {
  start(input: StartInput): Promise<SessionHandle>;
  sendPrompt(handle: SessionHandle, envelope: PromptEnvelope): Promise<{ promptId: string }>;
  probe(handle: SessionHandle): Promise<ProbeResult>;
  resume(handle: SessionHandle): Promise<SessionHandle>;
  rebootstrap(handle: SessionHandle): Promise<SessionHandle>;
  capture(handle: SessionHandle, fromSeq: bigint): AsyncIterable<TranscriptChunk>;
  dispose(handle: SessionHandle): Promise<void>;
}
export interface StartInput {
  runId: string;
  roleId: string;
  backend: Backend;
  cwd: string;
  expectedArtifactPath?: string;
  expectedSchema?: string;
  envelopePrelude?: string;
}
export interface SessionHandle {
  sessionId: string;
  pid?: number;
  tmuxSession?: string;
  tmuxWindow?: string;
}
export interface ProbeResult {
  alive: boolean;
  paneActive: boolean;
  lastOutputAt?: Date;
  hint?: string;
}
export interface TranscriptChunk {
  seq: bigint;
  content: string;
  capturedAt: Date;
}
8.2 Session State Machine
- CREATED -> BOOTSTRAPPING -> READY
- READY <-> BUSY
- BUSY -> WAITING_FOR_APPROVAL
- BUSY -> ARTIFACT_TIMEOUT
- BUSY -> HUNG
- BUSY -> CRASHED
- HUNG | CRASHED | ARTIFACT_TIMEOUT -> RESUMING -> READY
- RESUMING -> REBOOTSTRAPPED -> READY
- exhausted errors -> FAILED_NEEDS_HUMAN
8.3 Recovery Counters
- sendPrompt retry: 2.
  - Means one initial send plus two adapter-level retries, three physical send attempts max.
- resume retry: 2.
- rebootstrap retry: 1.
- artifact repair retry: 1.
- max hung time: configurable; default 20 minutes.
Exhaustion creates a human gate with recoveryHint.
8.4 SessionManager Singleton
- M4: hosted in apps/api.
- M5+: hosted in apps/worker.
- Only SessionManager may call mutating SessionAdapter methods.
- Holds in-memory Map<sessionId, SessionHandle>.
- Takes pg_advisory_lock(hash64('devflow:session-manager')).
- A second instance exits with code 3.
- On start:
  - query non-terminal tui_sessions.
  - call adapter.resume(handle).
  - success: place handle in map.
  - failure: session -> FAILED_NEEDS_HUMAN, append session.failed, create recovery gate.
- On SIGTERM/SIGINT:
  - refuse new prompts.
  - allow in-flight artifact polling up to 30s.
  - persist last_capture_seq.
  - release advisory lock.
9. Prompt Envelope
9.1 Wire Format
DEVFLOW_PROMPT_BEGIN <uuid>
Run: <run-id>
Role: <role-id>
Phase: <phase-key>
Attempt: <int>
Expected artifact: <absolute-path>
Expected schema: <schema-id>
Dedup-Key: <prompt-hash>
Instructions:
<freeform multi-line instructions>
DEVFLOW_PROMPT_END <uuid>
9.2 Schema
const PromptEnvelope = z.object({
  uuid: z.string().uuid(),
  runId: z.string().uuid(),
  roleId: z.string(),
  phaseKey: z.string(),
  attempt: z.number().int().nonnegative(),
  expectedArtifact: z.string(),
  expectedSchema: z.string(),
  dedupKey: z.string(),
  instructions: z.string(),
});
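To show how the schema fields map onto the wire format in 9.1, here is a minimal rendering sketch. `renderEnvelope` and the plain `PromptEnvelopeFields` interface are hypothetical; the interface mirrors the Zod schema so the example stays self-contained.

```typescript
// Plain mirror of the PromptEnvelope schema fields (hypothetical name).
interface PromptEnvelopeFields {
  uuid: string;
  runId: string;
  roleId: string;
  phaseKey: string;
  attempt: number;
  expectedArtifact: string;
  expectedSchema: string;
  dedupKey: string;
  instructions: string;
}

// Renders the envelope in the field order shown in section 9.1.
function renderEnvelope(envelope: PromptEnvelopeFields): string {
  return [
    `DEVFLOW_PROMPT_BEGIN ${envelope.uuid}`,
    `Run: ${envelope.runId}`,
    `Role: ${envelope.roleId}`,
    `Phase: ${envelope.phaseKey}`,
    `Attempt: ${envelope.attempt}`,
    `Expected artifact: ${envelope.expectedArtifact}`,
    `Expected schema: ${envelope.expectedSchema}`,
    `Dedup-Key: ${envelope.dedupKey}`,
    "Instructions:",
    envelope.instructions, // freeform, may span multiple lines
    `DEVFLOW_PROMPT_END ${envelope.uuid}`,
  ].join("\n");
}
```

The same uuid appears on the BEGIN and END markers, which is what lets a reader of the transcript pair them even when the instructions span many lines.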
9.3 Rules
- Prompt identity is dedupKey.
- Adapter treats duplicate dedupKey for the same session within a run lifetime as idempotent success and does not reprocess the prompt.
- attempt increments only when the engine intentionally re-sends after timeout or repair.
- Adapter-level retry does not increment attempt.
- Completion is never inferred from transcript text.
- Completion requires a schema-valid artifact.
9.4 Backend Prelude
Sent once at session bootstrap before the first envelope.
Required structure:
- Backend identity statement.
- Persona instructionsPrelude.
- Protocol declaration: completion is signaled only by writing expected artifact files.
- Envelope marker contract.
- Approval/probe contract:
  - DEVFLOW_PROBE must respond with one line READY or BUSY <reason>.
Codex and Claude-specific addenda live in packages/session/src/profiles/{codex,claude}.ts and are populated at M10.
10. Artifact Schema Registry
10.1 Layout
JSON Schema 2020-12 documents live at:
docs/schemas/artifacts/<schema_id>.json
schema_id format:
<domain>/<name>@<version>
Examples:
- dev/spec@1
- dev/phase-plan@1
- dev/dag@1
- dev/review-finding-batch@1
- bt/objective@1
- bt/iteration-result@1
- common/final-report@1
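A small sketch of parsing a schema_id and mapping it to its file location. `parseSchemaId`, `schemaPath`, and the allowed character set in the pattern (lowercase letters, digits, hyphens) are assumptions inferred from the examples; the plan itself fixes only the `<domain>/<name>@<version>` shape.

```typescript
interface SchemaId {
  domain: string;
  name: string;
  version: number;
}

// Assumption: segments are lowercase alphanumerics with hyphens; version is a positive integer.
const SCHEMA_ID_PATTERN = /^([a-z][a-z0-9-]*)\/([a-z][a-z0-9-]*)@([1-9][0-9]*)$/;

function parseSchemaId(id: string): SchemaId {
  const match = SCHEMA_ID_PATTERN.exec(id);
  if (!match) throw new Error(`invalid schema id: ${id}`);
  const [, domain = "", name = "", version = ""] = match;
  return { domain, name, version: Number(version) };
}

// Maps a validated id onto the registry layout docs/schemas/artifacts/<schema_id>.json.
function schemaPath(id: string): string {
  parseSchemaId(id); // throws on malformed ids
  return `docs/schemas/artifacts/${id}.json`;
}
```

Validating the id before building the path also keeps malformed ids from escaping the registry directory.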
10.2 Loader
packages/core/src/artifact-schema.ts exports:
function loadSchema(id: string): JsonSchema;
function validateArtifact(
  id: string,
  data: unknown
): { ok: true } | { ok: false; errors: ValidationError[] };
Unknown schema id is fatal.
10.3 Validation Flow
- Engine waits for expectedArtifactPath to appear.
- Debounce 500ms after last mtime change.
- Read file.
- Compute SHA-256.
- Validate against expectedSchema.
- Valid:
  - insert artifact row with valid=true.
  - append artifact.validated.
  - advance phase.
- Invalid:
  - insert artifact row with valid=false.
  - append artifact.invalid.
  - trigger one repair prompt.
  - after repair exhaustion, create human gate.
- Timeout:
  - append artifact.timeout.
  - probe session.
  - enter recovery flow.
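The three branches of the validation flow can be summarized as a pure decision function. This is a sketch only: `nextStep`, `Step`, and the action names are hypothetical, and the single allowed repair comes from the "artifact repair retry: 1" counter in section 8.3.

```typescript
type Outcome = "valid" | "invalid" | "timeout";
type Action = "advance_phase" | "send_repair_prompt" | "create_human_gate" | "probe_and_recover";

interface Step {
  // Value for artifacts.valid; null means no artifact row is inserted.
  artifactValid: boolean | null;
  event: "artifact.validated" | "artifact.invalid" | "artifact.timeout";
  action: Action;
}

const MAX_ARTIFACT_REPAIRS = 1; // section 8.3: artifact repair retry: 1

// Decides what the engine does after one validation attempt.
function nextStep(outcome: Outcome, repairsUsed: number): Step {
  switch (outcome) {
    case "valid":
      return { artifactValid: true, event: "artifact.validated", action: "advance_phase" };
    case "invalid":
      return {
        artifactValid: false,
        event: "artifact.invalid",
        // One repair prompt, then escalate to a human gate.
        action: repairsUsed < MAX_ARTIFACT_REPAIRS ? "send_repair_prompt" : "create_human_gate",
      };
    case "timeout":
      return { artifactValid: null, event: "artifact.timeout", action: "probe_and_recover" };
  }
}
```

Keeping the decision separate from the file watching and database writes makes the repair budget easy to unit test.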
10.4 Final Report
At terminal run state, write atomically:
- <WORKSPACE_ROOT>/<runId>/<runId>.report.md
- <WORKSPACE_ROOT>/<runId>/<runId>.report.json
Both are written even on failed or aborted, best-effort.
common/final-report@1 minimum fields:
- runId
- templateHash
- bindings[]
- inputs
- phases[]
- approvals[]
- findings[]
- commands[]
- artifacts[]
- events.tail
- unresolved[]
- endedAt
- status
10.5 Backtest Objective Stub
bt/objective@1:
{
  "targets": [
    { "metric": "sharpe", "op": "gte", "value": 1.5, "weight": 1.0 },
    { "metric": "mdd", "op": "lte", "value": 0.15, "weight": 1.0 }
  ],
  "stopWhen": "all"
}
- op: gte | lte | eq | gt | lt
- stopWhen: all | weighted
- weighted threshold is hardcoded at 0.8 at M12.
- Full DSL deferred to M12.
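As an illustration of how such an objective might be evaluated, the sketch below checks each target against observed metrics. The semantics of `weighted` are not defined in this plan; the sketch assumes it means "the weight of satisfied targets divided by total weight reaches the 0.8 threshold". `shouldStop`, `compare`, and the treatment of a missing metric as unsatisfied are likewise assumptions.

```typescript
type Op = "gte" | "lte" | "eq" | "gt" | "lt";

interface Target {
  metric: string;
  op: Op;
  value: number;
  weight: number;
}

interface Objective {
  targets: Target[];
  stopWhen: "all" | "weighted";
}

const WEIGHTED_THRESHOLD = 0.8; // hardcoded at M12 per this section

function compare(op: Op, observed: number, value: number): boolean {
  switch (op) {
    case "gte": return observed >= value;
    case "lte": return observed <= value;
    case "eq": return observed === value;
    case "gt": return observed > value;
    case "lt": return observed < value;
  }
}

// Assumption: a metric missing from `metrics` counts as not satisfied.
function shouldStop(objective: Objective, metrics: Record<string, number>): boolean {
  const met = objective.targets.map((target) => {
    const observed = metrics[target.metric];
    return observed !== undefined && compare(target.op, observed, target.value);
  });
  if (objective.stopWhen === "all") return met.every(Boolean);

  // Assumed "weighted" semantics: satisfied weight share must reach the threshold.
  const total = objective.targets.reduce((sum, t) => sum + t.weight, 0);
  if (total <= 0) return false;
  const achieved = objective.targets.reduce((sum, t, i) => sum + (met[i] ? t.weight : 0), 0);
  return achieved / total >= WEIGHTED_THRESHOLD;
}
```

Under these assumptions, the stub above with sharpe 1.6 and mdd 0.2 does not stop in either mode, because only half of the total weight is satisfied.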
11. Run Events
Closed event types:
run.created
run.started
run.paused
run.resumed
run.completed
run.failed
run.aborted
phase.started
phase.completed
phase.failed
phase.skipped
prompt.sent
prompt.repaired
artifact.expected
artifact.validated
artifact.invalid
artifact.timeout
approval.requested
approval.resolved
session.created
session.ready
session.busy
session.idle
session.crashed
session.recovered
session.failed
command.started
command.completed
command.failed
review.batch_recorded
finding.verifier_resolved
backtest.iteration_started
backtest.iteration_completed
backtest.objective_evaluated
11.1 Idempotency Keys
Every event append requires deterministic idempotency_key.
| Event family | Key formula |
|---|---|
| run.created, run.started, run.completed, run.failed, run.aborted | <type>:<run_id> |
| run.paused | run.paused:<run_id>:<cause> |
| run.resumed | run.resumed:<run_id>:<cause> |
| phase.started, phase.completed, phase.failed, phase.skipped | <type>:<phase_id>:<phase_attempt> |
| prompt.sent, prompt.repaired | <type>:<prompt_dedup_key> |
| artifact.expected, artifact.timeout | <type>:<phase_id>:<phase_attempt>:<expected_path> |
| artifact.validated, artifact.invalid | <type>:<phase_id>:<expected_path>:<artifact_hash> |
| approval.requested | approval.requested:<approval_idempotency_key> |
| approval.resolved | approval.resolved:<approval_request_id>:<action> |
| session.created, session.failed | <type>:<session_id> |
| session.busy, session.idle | <type>:<session_id>:<prompt_dedup_key> |
| session.ready, session.crashed, session.recovered | <type>:<session_id>:<recovery_attempts> |
| command.started, command.completed, command.failed | <type>:<command_id> |
| review.batch_recorded | review.batch_recorded:<phase_id>:<reviewer_role>:<phase_attempt> |
| finding.verifier_resolved | finding.verifier_resolved:<finding_id> |
| backtest.iteration_started, backtest.iteration_completed, backtest.objective_evaluated | <type>:<iteration_id> |
Definitions:
- phase_attempt is incremented before event append.
- recovery_attempts is incremented before event append.
- prompt_dedup_key is the envelope dedup key.
- approval_idempotency_key is from approval_requests.
- Artifact expected/timeout events are per-attempt.
- Artifact validated/invalid events are content-keyed by path + hash.
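A few of the key formulas written out as small builder functions, to make the per-attempt versus content-keyed distinction concrete. The `idempotencyKeys` object and its member names are hypothetical; the string formats follow the table above.

```typescript
type RunLifecycleType = "run.created" | "run.started" | "run.completed" | "run.failed" | "run.aborted";
type PhaseType = "phase.started" | "phase.completed" | "phase.failed" | "phase.skipped";
type PromptType = "prompt.sent" | "prompt.repaired";

// Hypothetical builders for a subset of the event families.
const idempotencyKeys = {
  runLifecycle: (type: RunLifecycleType, runId: string): string => `${type}:${runId}`,

  phase: (type: PhaseType, phaseId: string, phaseAttempt: number): string =>
    `${type}:${phaseId}:${phaseAttempt}`,

  prompt: (type: PromptType, promptDedupKey: string): string => `${type}:${promptDedupKey}`,

  // Per-attempt: a new attempt yields a new key even for the same path.
  artifactAttempt: (
    type: "artifact.expected" | "artifact.timeout",
    phaseId: string,
    phaseAttempt: number,
    expectedPath: string,
  ): string => `${type}:${phaseId}:${phaseAttempt}:${expectedPath}`,

  // Content-keyed: identical bytes at the same path map to the same key.
  artifactContent: (
    type: "artifact.validated" | "artifact.invalid",
    phaseId: string,
    expectedPath: string,
    artifactHash: string,
  ): string => `${type}:${phaseId}:${expectedPath}:${artifactHash}`,
};
```

Because validated/invalid keys omit the attempt number, re-validating an unchanged file on replay collapses onto the already-recorded event instead of appending a duplicate.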
12. Fake Session Adapter
12.1 Behavior
- Deterministic.
- In-process.
- No PTY.
- No tmux.
- Drives engine end-to-end without real backends.
12.2 Sentinel Triggers
On sendPrompt, inspect expectedSchema.
Fixture path:
tests/fixtures/fake-artifacts/<expectedSchema>/<scenarioName>.json
scenarioName comes from instruction header:
Scenario: <name>
Default scenario: ok.
Scenarios:
- ok: write fixture to expectedArtifactPath after 50ms by default.
- invalid: write deliberately schema-invalid payload.
- timeout: never write.
- crash: throw RecoverableError.
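A minimal sketch of scenario selection and fixture lookup. `parseScenario` and `fixturePath` are hypothetical names, and two behaviors are assumptions: the `Scenario:` header is matched as a standalone line anywhere in the instructions, and an unknown scenario name throws rather than falling back to ok.

```typescript
type Scenario = "ok" | "invalid" | "timeout" | "crash";

const SCENARIOS: readonly string[] = ["ok", "invalid", "timeout", "crash"];

// Reads "Scenario: <name>" from the instructions; defaults to "ok" when absent.
function parseScenario(instructions: string): Scenario {
  const match = /^Scenario:[ \t]*(\S+)[ \t]*$/m.exec(instructions);
  if (!match) return "ok";
  const name = match[1] ?? "";
  // Assumption: unknown names are an error, not a silent fallback.
  if (!SCENARIOS.includes(name)) throw new Error(`unknown scenario: ${name}`);
  return name as Scenario;
}

// Fixture layout: tests/fixtures/fake-artifacts/<expectedSchema>/<scenarioName>.json
function fixturePath(expectedSchema: string, scenario: Scenario): string {
  return `tests/fixtures/fake-artifacts/${expectedSchema}/${scenario}.json`;
}
```

For example, a prompt for `dev/spec@1` with no scenario header resolves to `tests/fixtures/fake-artifacts/dev/spec@1/ok.json`.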
12.3 Transcript
Fake adapter emits chunks such as:
[fake] received prompt <uuid>; will write <path> in 50ms
13. State Machines
13.1 Run State
States:
- created
- bound
- planning
- awaiting_approval
- executing
- paused
- completed
- failed
- aborted
Transitions:
| From | Trigger | To | Side effects |
|---|---|---|---|
| created | lockBindings ok | bound | persist bindings; emit run.started |
| created | lockBindings fail | failed | emit run.failed |
| bound | phase plan needed | planning | emit phase.started |
| planning | plan artifact valid | awaiting_approval | request approval |
| awaiting_approval | approve | executing | emit approval.resolved, run.resumed |
| awaiting_approval | reject | failed | emit run.failed |
| awaiting_approval | request_changes | planning | increment phase attempts |
| awaiting_approval | timeout | paused | set paused_from_state='awaiting_approval' |
| executing | phase ok, more phases | executing | next phase |
| executing | normal workflow approval gate | awaiting_approval | request gate |
| executing | all phases done | completed | emit run.completed, write final report |
| executing | unrecoverable error | failed | emit run.failed |
| executing | manual pauseRun | paused | set paused_from_state='executing' |
| planning | manual pauseRun | paused | set paused_from_state='planning' |
| paused | resume | paused_from_state | emit run.resumed, clear paused_from_state |
| any non-terminal state | abortRun | aborted | emit run.aborted, dispose sessions |
Non-terminal states for abortRun:
- created
- bound
- planning
- awaiting_approval
- executing
- paused
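The pause, resume, and abort rows of the table can be sketched as pure functions over a reduced run row. `RunRow`, `toPaused`, `resumeRun`, and `abortRun` here are illustrative names, not the engine's actual functions; `toPaused` covers both the manual pauseRun rows and the approval timeout row, and event emission and session disposal are left out.

```typescript
type RunState =
  | "created" | "bound" | "planning" | "awaiting_approval"
  | "executing" | "paused" | "completed" | "failed" | "aborted";

// Reduced view of a runs row: only the columns these transitions touch.
interface RunRow {
  state: RunState;
  pausedFromState: RunState | null;
}

const TERMINAL: ReadonlySet<RunState> = new Set<RunState>(["completed", "failed", "aborted"]);
// States with a "-> paused" row in the transition table.
const PAUSABLE: ReadonlySet<RunState> = new Set<RunState>(["planning", "executing", "awaiting_approval"]);

function toPaused(run: RunRow): RunRow {
  if (!PAUSABLE.has(run.state)) throw new Error(`cannot pause from ${run.state}`);
  // paused_from_state remembers where resume must return to.
  return { state: "paused", pausedFromState: run.state };
}

function resumeRun(run: RunRow): RunRow {
  if (run.state !== "paused" || run.pausedFromState === null) {
    throw new Error(`cannot resume from ${run.state}`);
  }
  // Return to the remembered state and clear the marker.
  return { state: run.pausedFromState, pausedFromState: null };
}

function abortRun(run: RunRow): RunRow {
  if (TERMINAL.has(run.state)) throw new Error(`run already terminal: ${run.state}`);
  return { state: "aborted", pausedFromState: null };
}
```

Storing the origin state on the row is what lets resume be a single lookup rather than a replay of the event log.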
13.2 Run Phase State
States:
- pending
- running
- awaiting_artifact
- validating
- awaiting_approval
- completed
- failed
- skipped
Transitions:
| From | Trigger | To |
|---|---|---|
| pending | start | running |
| running | prompt sent, artifact expected | awaiting_artifact |
| awaiting_artifact | artifact appears | validating |
| awaiting_artifact | timeout | running after probe/repair, or failed after exhaustion |
| validating | valid | awaiting_approval if gate, else completed |
| validating | invalid | running after one repair, else failed |
| awaiting_approval | approve | completed |
| awaiting_approval | reject / abort | failed |
| awaiting_approval | request_changes | running, attempt + 1 |
Replay rules:
- phase.started.payload.repair === true marks that attempt as the single allowed repair attempt. Replaying that attempt MUST use repair instructions and prompt.repaired, and must not start a third attempt.
- Repair replay from running may reuse an existing READY / bootstrapped session even if last_prompt_hash still contains the previous attempt's prompt hash; the current-attempt prompt send has not happened yet.
- If phase state is running, existing artifact files are never accepted unless the current prompt event (prompt.sent or prompt.repaired) for the current dedup key is already recorded. Replay without prompt proof treats existing files as stale.
- If phase state is running, session state is BUSY, and last_prompt_hash matches the current prompt but the matching prompt event is missing, replay waits for the artifact with the current file signature as the baseline. This preserves idempotency without validating a stale pre-existing artifact.
- Baseline-protected waits must not synthesize durable prompt proof before the wait finishes. If replay crashes or is cancelled before validation, the next replay must still treat the existing artifact as baseline/stale unless real prompt proof already exists.
- If phase state is validating and no artifact row exists yet, replay re-reads and validates the current expectedArtifactPath instead of treating the state as corruption.
- If phase state is validating and artifact rows already exist for the same phase/path/schema, replay may reuse only an artifact row created at or after the current session last_prompt_at; older rows are treated as stale previous-attempt outputs and the file is revalidated.
- Session bootstrap DB row/state changes and session.created / session.ready events are written in one DB transaction after adapter start succeeds.
14. Approval State
States:
- pending
- approved
- rejected
- changes_requested
- aborted
- paused
14.1 Transitions
| From | Event | To | Side effects |
|---|---|---|---|
| pending | approve decision | approved | insert decision row |
| pending | reject decision | rejected | insert decision row; run -> failed |
| pending | request_changes decision | changes_requested | insert decision row; increment attempt |
| pending | abort decision | aborted | insert decision row; run -> aborted |
| pending | timeout | paused | run -> paused; no decision row |
| paused | unpause | pending | re-arm gate; no decision row |
| terminal states | any decision | unchanged | return 409 |
Rules:
- A pending request can transition to one non-pending state per pending epoch.
- Terminal approval states reject further decisions.
- paused may return to pending only through unpause.
- Manual pause is run-level pauseRun; it leaves approval gate in pending.
- Only approve, reject, request_changes, and abort create approval_decisions rows.
- Default timeout is null.
- Timeout never auto-approves or auto-rejects.
14.2 Decision Idempotency
- GUI:
  - UUIDv4 per click.
  - reused across automatic UI retries for the same logical action.
- CLI:
  - UUIDv4 per invocation.
  - --client-token=<uuid> override for scripted retry.
- API:
  - existing (approval_request_id, action, client_token) returns existing row with status 200.
  - new decision inserts row and returns 201.
  - same token with different action returns 409.
  - decision on non-pending request returns 409.
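The API rules can be modeled in memory to show the order of checks. This is a sketch under assumptions: `ApprovalDecisions` is a hypothetical stand-in for the API handler plus its tables, a token reused against a different request is treated as a conflict, and the token replay check runs before the pending check so that a retried request still returns 200 after the gate has been resolved.

```typescript
type ApprovalAction = "approve" | "reject" | "request_changes" | "abort";

interface Decision {
  approvalRequestId: string;
  action: ApprovalAction;
  clientToken: string;
}

interface DecisionResult {
  status: 200 | 201 | 409;
  decision?: Decision;
}

// Hypothetical in-memory model of the decision endpoint.
class ApprovalDecisions {
  private readonly byToken = new Map<string, Decision>();
  private readonly resolved = new Set<string>(); // request ids that are no longer pending

  decide(approvalRequestId: string, action: ApprovalAction, clientToken: string): DecisionResult {
    // 1. Token replay: same request and action returns the stored row.
    const prior = this.byToken.get(clientToken);
    if (prior) {
      const same = prior.approvalRequestId === approvalRequestId && prior.action === action;
      return same ? { status: 200, decision: prior } : { status: 409 };
    }
    // 2. A new token against a non-pending request is a conflict.
    if (this.resolved.has(approvalRequestId)) return { status: 409 };
    // 3. New decision: insert the row and leave the pending state.
    const decision: Decision = { approvalRequestId, action, clientToken };
    this.byToken.set(clientToken, decision);
    this.resolved.add(approvalRequestId);
    return { status: 201, decision };
  }
}
```

Checking the token first is the design choice that makes automatic UI retries and scripted `--client-token` retries safe.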
14.3 Destructive Command Enforcement
Devflow-direct commands have hard enforcement. TUI-agent commands have best-effort enforcement.
Hard-blocked Devflow-direct patterns:
- rm -rf
- git reset --hard
- git clean
- git push --force
- git push --force-with-lease
- git worktree remove --force
- git branch -D
- docker volume rm
- docker compose down -v
- DROP DATABASE
- DROP SCHEMA
- migration rollback
- reads/writes touching .env*, ~/.ssh/, ~/.aws/, ~/.config/gcloud/, ~/.kube/
- files matching *token*, *secret*, *credentials*, *.pem, *.key
TUI-agent command enforcement is best-effort:
- Prelude prohibits destructive operations.
- Backend permission mode is set to safest available mode.
- Transcript audit captures post-hoc evidence.
- Human intervention goes through devflow attach.
- Worktrees and branches are preserved by default.
v1 does not claim real-time blocking of TUI-internal commands.
15. Run Engine and Temporal Contract
The M4 RunEngine contract is frozen before M5. M5 reimplements the same interface through Temporal.
15.1 Public API
interface RunEngine {
  startRun(input: RunStartInput): Promise<{ runId: string }>;
  signalApproval(
    runId: string,
    approvalRequestId: string,
    action: ApprovalDecisionAction,
    clientToken: string,
    comment?: string
  ): Promise<void>;
  pauseRun(runId: string): Promise<void>;
  resumeRun(runId: string): Promise<void>;
  abortRun(runId: string, reason: string): Promise<void>;
  getStatus(runId: string): Promise<RunStatus>;
}
15.2 Temporal Shape
- Namespace: devflow.
- Task queue: devflow-runs.
- Single worker process: apps/worker.
- Workflow: runWorkflow(input: RunStartInput).
- Signals:
  - approve
  - pause
  - resume
  - abort
  - unpause
- No Updates in M5.
- Status is read from DB.
Activities:
- M5 compatibility activity surface:
  - prepareRunActivity(input)
  - lockBindingsActivity(runId)
  - failRunActivity(runId, reason)
  - advanceRunActivity(runId)
  - signalApprovalActivity(runId, approvalRequestId, action, clientToken, comment?)
  - pauseRunActivity(runId)
  - resumeRunActivity(runId)
  - abortRunActivity(runId, reason)
  - getStatusActivity(runId)
  - isRunTerminalActivity(runId)
  - composeFinalReportActivity(runId)
- advanceRunActivity is the M5 parity wrapper over M4 phase advancement. It may internally perform prompt send, artifact wait/validation, event recording, and approval request creation through the same DB/idempotency contracts already locked in sections 8 through 14.
- The granular activity split (sendPromptToSession, waitForArtifact, validateArtifact, recordEvent, requestApproval, runCommand) is deferred to a later hardening ADR. It is not an M5 acceptance gate.
- Prompt/session mutation still occurs only inside worker-hosted activities through SessionManager. M5+ API code never mutates SessionAdapter directly.
Retry policy:
- Default: max attempts 3, exponential backoff start 1s, max 30s.
- composeFinalReportActivity: max attempts 1.
- Activity-level failures serialize DevflowError; non-recoverable Devflow errors are rethrown as non-retryable Temporal failures.
- advanceRunActivity is cancellation-aware and idempotent by DB state, event idempotency keys, prompt dedup keys, and artifact content keys.
- Already-applied approval signal replay repairs missing final reports for every terminal run state: completed, failed, and aborted, regardless of whether the replayed approval action was approve, request_changes, reject, or abort.
- API-side already-applied approval replay is report-repair only. It must not call SessionAdapter mutation methods; reject/abort session disposal belongs to the worker/session-manager path that originally applies the decision.
- If a workflow closes before the API observes an approval signal result, closed-workflow settlement must first verify the requested decision was applied, then replay approval side effects, then wait for the terminal report.
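To make the default retry numbers concrete, the sketch below computes the wait before each retry. RetryPolicySketch and retryDelaysMs are hypothetical local names, not Temporal SDK types, and the backoff coefficient of 2 is an assumption, since the plan only gives the start and maximum intervals.

```typescript
// Hypothetical local shape; not the Temporal SDK type.
interface RetryPolicySketch {
  maximumAttempts: number;
  initialIntervalMs: number;
  backoffCoefficient: number; // assumed to be 2; the plan does not state it
  maximumIntervalMs: number;
}

const DEFAULT_RETRY: RetryPolicySketch = {
  maximumAttempts: 3,
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 30000,
};

// composeFinalReportActivity runs once and is never retried.
const FINAL_REPORT_RETRY: RetryPolicySketch = { ...DEFAULT_RETRY, maximumAttempts: 1 };

// Delay before each retry; N attempts means N - 1 retries.
function retryDelaysMs(policy: RetryPolicySketch): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maximumAttempts - 1; retry++) {
    const raw = policy.initialIntervalMs * policy.backoffCoefficient ** retry;
    delays.push(Math.min(raw, policy.maximumIntervalMs));
  }
  return delays;
}
```

Under these assumptions the default policy waits 1s and then 2s, so the 30s cap only matters if the attempt count is raised.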
15.3 Hard Constraints
- Workflow code holds only serializable state.
- No tmux handles in workflow state.
- No PTY refs in workflow state.
- No DB clients in workflow state.
- M5+ session interaction happens through activities calling SessionManager in apps/worker.
- M5+ API never calls mutating SessionAdapter methods.
- SessionManager advisory lock prevents API/worker ownership conflict during M4 -> M5 transition.
- Workflow code uses deterministic clock/randomness only.
16. WriteSet and Worktree
16.1 WriteSet
- Each task declares writeSet: string[].
- Patterns are relative to repo root.
- Glob engine: fast-glob.
- Options:
{
  cwd: worktreeRoot,
  dot: true,
  followSymbolicLinks: false,
  onlyFiles: true,
  suppressErrors: false
}
Conflict detection:
- Expand writeSets.
- Forbidden globs cause conflict if matched by more than one task:
  - pnpm-lock.yaml
  - package-lock.json
  - **/migrations/**
  - **/*.generated.*
  - root tsconfig*.json
  - biome.json
  - lefthook.yml
  - .github/**
  - .gitlab-ci.yml
- Pairwise file intersections must be empty.
Conflict creates parallel_dag_approved gate.
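The pairwise intersection rule can be sketched as follows. The input is assumed to be writeSets already expanded to file lists (the glob expansion and the forbidden-glob check are not modeled), and ExpandedTask, WriteSetConflict, and findWriteSetConflicts are hypothetical names.

```typescript
// Hypothetical shapes: each task carries its already-expanded file list.
interface ExpandedTask {
  taskId: string;
  files: string[];
}

interface WriteSetConflict {
  a: string;
  b: string;
  files: string[];
}

// Reports every pair of tasks whose expanded writeSets share at least one file.
function findWriteSetConflicts(tasks: ExpandedTask[]): WriteSetConflict[] {
  const conflicts: WriteSetConflict[] = [];
  tasks.forEach((left, index) => {
    const seen = new Set(left.files);
    for (const right of tasks.slice(index + 1)) {
      const shared = right.files.filter((file) => seen.has(file)).sort();
      if (shared.length > 0) {
        conflicts.push({ a: left.taskId, b: right.taskId, files: shared });
      }
    }
  });
  return conflicts;
}
```

An empty result means the lanes can run in parallel without a gate; any entry would be a reason to create the conflict gate described above.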
16.2 Worktree Lifecycle
- Worktree root:
  - WORKSPACE_ROOT/<runId>/<laneId>
  - non-parallel main lane: WORKSPACE_ROOT/<runId>/main
- Created via git worktree add.
- Branch name:
  - devflow/<runId>/<laneId>
- Terminal run state does not remove worktrees or branches.
- Output branches are deliverables.
- Disk growth is accepted.
- Cleanup is manual:
devflow cleanup <run-id> [--lane=<id>]
Cleanup:
- uses git worktree remove without --force by default.
- refuses dirty worktrees.
- --force requires an additional gate.
- git branch -D is destructive and gated.
- doctor --list-orphans lists only; it never removes.
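The path and branch naming rules in 16.2 can be sketched as plain string helpers. The function names are hypothetical, the use of main as the lane id in the branch name for non-parallel runs is an assumption, and the -b flag in worktreeAddArgs is one plausible way to create the branch together with the worktree.

```typescript
// Assumption: the non-parallel lane uses "main" for both path and branch name.
const MAIN_LANE = "main";

// WORKSPACE_ROOT/<runId>/<laneId>, with trailing slashes on the root trimmed.
function worktreePath(workspaceRoot: string, runId: string, laneId: string = MAIN_LANE): string {
  return `${workspaceRoot.replace(/\/+$/, "")}/${runId}/${laneId}`;
}

// devflow/<runId>/<laneId>
function branchName(runId: string, laneId: string = MAIN_LANE): string {
  return `devflow/${runId}/${laneId}`;
}

// Argument vector for git; the base commit is left to git's default (HEAD).
function worktreeAddArgs(workspaceRoot: string, runId: string, laneId: string = MAIN_LANE): string[] {
  return [
    "worktree",
    "add",
    "-b",
    branchName(runId, laneId),
    worktreePath(workspaceRoot, runId, laneId),
  ];
}
```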
17. SSE Contract
Endpoints:
- GET /sse/runs/:runId
- GET /sse/global
Heartbeat every 15 seconds.
Events:
| Event | Scope |
|---|---|
| run.state_changed | both |
| run.event_appended | run |
| phase.state_changed | run |
| approval.created | both |
| approval.resolved | both |
| session.state_changed | run |
| transcript.chunk_appended | run |
| artifact.validated | run |
Reconnect:
- Run-scoped /sse/runs/:runId:
  - Last-Event-ID is last run_events.seq for that run.
  - server replays run.event_appended for seq > lastSeq.
  - derived non-run.event_appended SSE types are not replayed for historical rows; state is re-derived by fetch.
- Global /sse/global:
  - Last-Event-ID is last global run_events.id, because run_events.seq is only monotonic within a run.
  - fresh connects start at the latest global event id and emit only new summary events.
  - reconnects replay rows with id > lastId.
  - global stream emits only scope=both events: run.state_changed, approval.created, approval.resolved.
  - global stream never emits run.event_appended.
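The two cursor rules can be sketched as row selection over an in-memory list. RunEventRow, parseCursor, runReplay, and globalReplay are hypothetical names; treating a missing Last-Event-ID on the run stream as "replay from the start" is an assumption, and the derivation of summary events from the selected global rows is not modeled.

```typescript
// Hypothetical row shape with both the global id and the per-run seq.
interface RunEventRow {
  id: number;
  runId: string;
  seq: number;
}

// Parses Last-Event-ID; returns null when the header is absent or not numeric.
function parseCursor(lastEventId: string | undefined): number | null {
  if (lastEventId === undefined || !/^\d+$/.test(lastEventId)) return null;
  return Number(lastEventId);
}

// Run stream: rows for this run with seq > lastSeq, in seq order.
function runReplay(rows: RunEventRow[], runId: string, lastEventId?: string): RunEventRow[] {
  const lastSeq = parseCursor(lastEventId) ?? 0; // assumption: no cursor replays from the start
  return rows
    .filter((row) => row.runId === runId && row.seq > lastSeq)
    .sort((a, b) => a.seq - b.seq);
}

// Global stream: rows with id > lastId; a fresh connect replays nothing.
function globalReplay(rows: RunEventRow[], lastEventId?: string): RunEventRow[] {
  const lastId = parseCursor(lastEventId);
  if (lastId === null) return [];
  return rows.filter((row) => row.id > lastId).sort((a, b) => a.id - b.id);
}
```

Keeping the two cursors separate avoids comparing a per-run seq against rows from other runs, which is the reason the global stream uses run_events.id.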
18. Errors
packages/core/src/errors.ts:
type ErrorClass = 'recoverable' | 'human_required' | 'fatal';

class DevflowError extends Error {
  readonly class: ErrorClass;
  readonly code: string;
  readonly runId?: string;
  readonly phaseId?: string;
  readonly recoveryHint?: string;
  readonly cause?: unknown;
}
Recoverable:
- network_blip
- pane_briefly_unresponsive
- prompt_send_transient
- db_serialization_retry
Human required:
- artifact_invalid_after_repair
- artifact_timeout_exhausted
- prompt_send_exhausted
- destructive_command_blocked
- secret_access_blocked
- backend_unavailable
- no_eligible_persona
- writeset_conflict
- merge_conflict
- objective_not_met
- review_dispute_unresolved
Fatal:
- db_unreachable
- workspace_permissions
- internal_state_corruption
- template_load_failed
- artifact_schema_unknown
- artifact_schema_load_failed
- migration_pending
- config_invalid
Mapping:
- recoverable -> retry; exhausted -> human_required.
- human_required / recovery gate -> run paused and gate created. This is distinct from normal workflow approval gates in §13.1, which use awaiting_approval.
- fatal -> run failed, sessions disposed, final report best-effort.
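The mapping can be sketched as a pure function from error class to outcome. The Outcome labels, the partial CODE_CLASS table, and the function names are hypothetical; only the codes and the three-way mapping come from the lists above.

```typescript
type ErrorClass = "recoverable" | "human_required" | "fatal";

// Hypothetical outcome labels for the three mapping rules.
type Outcome = "retry" | "pause_and_gate" | "fail_run";

// Partial lookup using a few codes from each list; not exhaustive.
const CODE_CLASS: Record<string, ErrorClass> = {
  network_blip: "recoverable",
  db_serialization_retry: "recoverable",
  writeset_conflict: "human_required",
  destructive_command_blocked: "human_required",
  db_unreachable: "fatal",
  config_invalid: "fatal",
};

// Returns undefined for codes that are not in the partial table.
function classOf(code: string): ErrorClass | undefined {
  return CODE_CLASS[code];
}

// recoverable -> retry until exhausted, then the human_required path.
function outcomeFor(errorClass: ErrorClass, retriesExhausted: boolean): Outcome {
  switch (errorClass) {
    case "recoverable":
      return retriesExhausted ? "pause_and_gate" : "retry";
    case "human_required":
      return "pause_and_gate";
    case "fatal":
      return "fail_run";
  }
}
```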
19. Concurrent Runs and Crash Recovery
19.1 Active Run Uniqueness
- MAX_CONCURRENT_RUNS, default 4.
- DB partial unique index is the source of truth:
  - one active run per (repo_path, base_branch).
- repo_path is canonicalized before insert.
- Advisory lock is auxiliary only:
pg_try_advisory_xact_lock(hash64('devflow:start-run', repoPath, baseBranch))
- Unique-index violation returns:
{ "currentRunId": "...", "currentState": "..." }
with HTTP 409.
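The uniqueness rule and its 409 body can be illustrated with an in-memory stand-in for the partial unique index. ActiveRunRegistry, the initial state name created, and the ok discriminant are hypothetical; repoPath is assumed to be canonicalized by the caller, and MAX_CONCURRENT_RUNS is not modeled.

```typescript
// Terminal run states as listed in section 15.2.
const TERMINAL_STATES = new Set(["completed", "failed", "aborted"]);

interface ActiveRun {
  runId: string;
  repoPath: string; // assumed already canonicalized
  baseBranch: string;
  state: string;
}

type StartResult =
  | { ok: true; runId: string }
  | { ok: false; status: 409; body: { currentRunId: string; currentState: string } };

// In-memory stand-in for "one active run per (repo_path, base_branch)".
class ActiveRunRegistry {
  private readonly runs: ActiveRun[] = [];

  start(runId: string, repoPath: string, baseBranch: string): StartResult {
    const current = this.runs.find(
      (run) =>
        run.repoPath === repoPath &&
        run.baseBranch === baseBranch &&
        !TERMINAL_STATES.has(run.state),
    );
    if (current) {
      return {
        ok: false,
        status: 409,
        body: { currentRunId: current.runId, currentState: current.state },
      };
    }
    this.runs.push({ runId, repoPath, baseBranch, state: "created" });
    return { ok: true, runId };
  }

  // Moving a run to a terminal state frees its slot.
  setState(runId: string, state: string): void {
    const run = this.runs.find((candidate) => candidate.runId === runId);
    if (run) run.state = state;
  }
}
```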
19.2 Crash Recovery
M4, no Temporal:
- On apps/api startup, sweep non-terminal runs.
- Mark them failed.
- final_report_path = null.
- Append synthesized run.failed with reason process_restart_unrecovered.
- Cascade associated tui_sessions to FAILED_NEEDS_HUMAN.
- Append session.failed.
- This frees active-run uniqueness slots.
M5+:
- No sweep.
- Temporal durability owns in-flight workflow recovery.
- SessionManager resumes tmux sessions.
- Active-run partial index blocks duplicate runs until completion or explicit abort.
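The M4 startup sweep can be sketched over in-memory rows. The row shapes and the sweepOnStartup name are hypothetical, and the real sweep would run against the database rather than mutate arrays; the sketch only shows the order of state changes and appended events.

```typescript
// Terminal run states as listed in section 15.2.
const TERMINAL_RUN_STATES = new Set(["completed", "failed", "aborted"]);

interface RunRow {
  runId: string;
  state: string;
  finalReportPath: string | null;
}

interface SessionRow {
  sessionId: string;
  runId: string;
  state: string;
}

interface SweepEvent {
  runId: string;
  type: "run.failed" | "session.failed";
  reason?: string;
  sessionId?: string;
}

// Marks every non-terminal run failed and cascades its sessions; mutates the rows in place.
function sweepOnStartup(runs: RunRow[], sessions: SessionRow[]): SweepEvent[] {
  const events: SweepEvent[] = [];
  for (const run of runs) {
    if (TERMINAL_RUN_STATES.has(run.state)) continue;
    run.state = "failed";
    run.finalReportPath = null;
    events.push({ runId: run.runId, type: "run.failed", reason: "process_restart_unrecovered" });
    for (const session of sessions) {
      if (session.runId !== run.runId) continue;
      session.state = "FAILED_NEEDS_HUMAN";
      events.push({ runId: run.runId, type: "session.failed", sessionId: session.sessionId });
    }
  }
  return events;
}
```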
20. Milestones
M1: Monorepo + Postgres + CLI Doctor
- Scaffold workspace.
- Add pnpm, tsconfig, biome, lefthook, Vitest.
- Add Docker Compose for Postgres.
- Add Drizzle and first migration.
- Add devflow doctor.
- Implement checks 1-9.
- Stub checks 10-12 as warn where needed.
- Add SSE compatibility smoke test:
  - minimal Fastify 5 server.
  - fastify-sse-v2 plugin.
  - 30-second integration test.
  - receive 3 events and reconnect.
  - if plugin fails, implement native reply.raw SSE helper before M1 is green.
M2: Core Schema + Registry + Binding
- Implement enums.
- Implement canonical hashing.
- Implement Template schema.
- Implement Persona schema.
- Implement seed loader.
- Implement binding algorithm.
- Implement artifact schema registry.
- Add first schemas:
  - dev/spec@1
  - dev/phase-plan@1
  - common/final-report@1
- Tests:
  - schema validation.
  - override semantics.
  - risk enforcement.
  - diversity enforcement.
  - deterministic auto-select.
M3: Fake Session Runtime
- Implement SessionAdapter.
- Implement FakeSessionAdapter.
- Implement prompt envelope.
- Implement event recorder.
- Implement fake sentinel scenarios.
- Persist transcript chunks.
- Tests:
  - prompt correlation.
  - artifact validation.
  - invalid artifact.
  - timeout.
  - fake crash.
M4: Minimal Run Engine
- Implement packages/run-engine.
- Used directly by apps/api.
- No Temporal.
- Supports:
  - start run.
  - lock bindings.
  - approval.
  - fake prompt.
  - artifact wait/validate.
  - final report.
- Freeze the RunEngine contract.
- Full fake development@1 minus reviewers.
M5: Temporal Integration
- Reimplement RunEngine through Temporal.
- Preserve M4 behavior.
- Add parity tests using the same M4 scenarios.
- M5+ SessionManager lives in apps/worker.
M6: Real tmux SessionManager
- Implement TmuxSessionAdapter.
- Decoupled from M5.
- May begin after M3 is stable.
- Pre-M5 real tmux is opt-in smoke only.
- Production run path remains fake until both M5 and M6 are green.
M7: TUI Recovery State Machine
- Implement session state transitions.
- Implement recovery counters.
- Implement escalation to human gates.
M8: API + GUI Minimum
- Implement Fastify routes.
- Implement SSE.
- Implement GUI screens:
  - Dashboard.
  - Templates.
  - Personas.
  - New Run.
  - Run Detail.
  - Approvals.
  - TUI Sessions.
M9: development@1 Fake-Agent Full Run
- Add curated development@1.
- Add review consensus.
- Add verifier flow with fake reviewers.
- Add coverage gate >=70% lines for core/session/run-engine.
M10: Codex/Claude Opt-In Real Run
- Implement profiles:
  - packages/session/src/profiles/codex.ts
  - packages/session/src/profiles/claude.ts
- Real backends become production-default only after both M5 and M6 are green.
- Until then real tmux/Codex/Claude are developer-flagged opt-in smoke only.
M11: Parallel Lanes
- Add task DAG scheduler.
- Add writeSet detection.
- Add per-lane worktrees.
- Add merge coordinator.
- Add conflict gates.
M12: Backtest Workflow
- Add backtest-strategy@1.
- Add objective evaluator.
- Add metric parser extension points.
- Add failure mining artifacts.
- Add Backtest Lab GUI.
M13: Template Factory
- Generate draft template from natural language and repo discovery.
- Add harness design.
- Add template review.
- Add dry-run and promote flow.
21. Out of Scope
- Authentication.
- Authorization.
- Multi-user support.
- Data retention or archival policy.
- Observability dashboards.
- Remote template/persona registries.
- Multi-machine deployment.
- HA.
- Managed backups.
- Web ingress.
- TLS.
- Reverse proxy.
22. Decision Log
Open Questions Closed
| # | Question | Resolution |
|---|---|---|
| OQ-1 | Persona/template seeding format | Immutable YAML at docs/schemas/{personas,templates}/<name>@<version>.yaml |
| OQ-2 | Approval timeout default | null; timeout freezes only |
| OQ-3 | Final report format | Markdown and JSON |
| OQ-4 | Temporal namespace/queue | namespace devflow, task queue devflow-runs |
| OQ-5 | WriteSet glob engine | fast-glob |
| OQ-6 | Backtest objective DSL | Stub in M12, full DSL deferred |
| OQ-7 | Codex/Claude prompt prelude | Structure locked, exact text deferred to M10 |
Blocking Corrections Applied
| # | Issue | Resolution |
|---|---|---|
| CC-1 | Terminal state deleted worktrees/branches | Preserve by default; manual gated cleanup only |
| CC-2 | SessionManager location conflict | M4 API, M5+ worker |
| CC-3 | Event duplicates under retry | run_events.idempotency_key |
| CC-4 | Destructive command enforcement overclaimed | Devflow-direct hard, TUI best-effort |
| CC-5 | UUID extension missing | CREATE EXTENSION IF NOT EXISTS pgcrypto |
| CC-6 | Advisory lock not enough for active-run uniqueness | partial unique index |
| CC-7 | Undefined transition sequence in event keys | cause-based keys |
| CC-8 | Approval paused transition missing | explicit approval transition table |
| CC-9 | AutoSelect order nondeterministic | deterministic sort |
| CC-10 | SSE plugin compatibility assumed | M1 smoke + native fallback |
| CC-11 | ApprovalAction included pause | split ApprovalDecisionAction; pauseRun is run-level |
| CC-12 | Artifact hash key collision | include phase id and path |
| CC-13 | Resume previous state not stored | runs.paused_from_state |
| CC-14 | repo path aliasing | canonical realpath storage |
| CC-15 | M4 sweep left tmux sessions ambiguous | cascade session state to FAILED_NEEDS_HUMAN; real tmux production-default only after M5+M6 |
| CC-16 | Prompt hash used phaseId but envelope uses phaseKey | prompt hash uses phaseKey |
| CC-17 | abortRun transition too narrow | abort from any non-terminal run state |
| CC-18 | approval pending transition wording conflicted with pause epoch | pending can transition once per pending epoch; paused may unpause to pending |
| CC-19 | tsc -b --noEmit is brittle with TypeScript 5.6 project references on clean worktrees | build still uses tsc -b; no-emit verification uses root tsconfig.typecheck.json |
| CC-20 | sendPrompt retry count was ambiguous against Temporal activity attempts | §8.3 now states retry budget means initial attempt plus retries; §15.2 remains Temporal-level attempts only |
| CC-21 | Duplicate prompt dedup handling conflicted with adapter retry idempotency | duplicate dedupKey returns idempotent success without reprocessing |
| CC-22 | Normal workflow approval gates and human-required recovery gates were easy to conflate | §13.1 names normal workflow gates; §18 keeps human_required recovery gates paused |
| CC-23 | Phase start and event append could diverge under retry/error | phase start and phase.started append occur in one DB transaction |
| CC-24 | Repair attempt replay lost repair prompt identity and one-repair budget | repair attempts are derived from phase.started.payload.repair, replay uses repair instructions and prompt.repaired, and cannot start attempt 3 |
| CC-25 | validating replay failed if crash happened before artifact row insert | replay revalidates the expected artifact file when state is validating but no artifact row exists |
| CC-26 | Session bootstrap state/events could diverge | session row/state and session.created / session.ready events are committed in one DB transaction |
| CC-27 | validating replay could reuse stale previous-attempt artifact rows | artifact-row replay requires artifact.created_at >= tui_sessions.last_prompt_at; otherwise the file is revalidated |
| CC-28 | repair running replay rejected existing READY sessions with previous attempt prompt hash | current-attempt repair prompt is considered unsent, so replay may reuse the session and send prompt.repaired |
| CC-29 | API Temporal approval replay omitted M4 approval side-effect repair | API approval signal reader now wires replayAppliedApprovalSideEffects, so already-applied terminal approval replays can repair missing final reports |
| CC-30 | running replay could validate stale artifacts without prompt proof | running replay requires matching prompt event proof; BUSY replay without prompt event uses current artifact signature as baseline and ignores stale files |
| CC-31 | M5 activity list over-specified granular activities not implemented by the M4 parity adapter | M5 locks the compatibility activity wrapper surface; granular activity split is deferred to a later hardening ADR |
| CC-32 | Already-applied approve / request_changes replay repaired missing reports for completed / failed but missed aborted | approval replay side-effect repair now composes missing final reports for all terminal states |
| CC-33 | API-side already-applied reject / abort replay tried to dispose sessions through DB-only replay validation runtime | API replay side effects are report-repair only; worker-side decision application owns session disposal |
| CC-34 | Closed-workflow approval settlement waited for reports but did not replay approval side effects | settlement now verifies the requested decision, replays side effects, then waits for the terminal report |
| CC-35 | Baseline-protected BUSY replay recorded synthetic prompt proof before the baseline wait was durable | baseline replay no longer records synthetic prompt events; replay without real prompt proof keeps treating existing files as stale |
| CC-36 | SSE reconnect wording used per-run seq for global stream even though seq is not globally monotonic | /sse/runs/:runId uses per-run seq; /sse/global uses global run_events.id and emits only scope=both summary events |
| CC-37 | Run SSE replay could emit historical derived events after the first page | run SSE drains historical rows up to a high-water seq with only run.event_appended, then switches to live derived events |
| CC-38 | Normal phase start changed run state to planning / executing without a summary event source | phase.started payload includes runState; SSE derives run.state_changed from that live event |
Future Open Questions
- FOQ-1, M12: full backtest objective DSL.
- FOQ-2, M13: template factory generation prompts.
- FOQ-3, post-M10: optional third backend such as Gemini.
- FOQ-4, post-M8: WebSocket vs SSE if transcript pressure requires it.
23. Kickoff Order
- M1.1: repo + pnpm + tsconfig + biome + lefthook + vitest workspace.
- M1.2: docker-compose + Postgres healthcheck + drizzle-kit + first migration.
- M1.3: apps/cli skeleton + devflow doctor.
- M1.4: packages/core skeleton with config, enums, errors, hash, prompt-envelope, run-event types.
- M2.1: Zod schemas for Template/Persona, persona YAML loader, hashing.
- M2.2: Binding algorithm + tests.
- M2.3: Artifact schema registry + first three schemas.
- M3.1: SessionAdapter interface + FakeSessionAdapter.
- M3.2: Transcript chunk capture + DB persistence.
- M3.3: engine-shaped harness running a single fake phase end-to-end.
- M4: assemble run engine; lock contract; full fake development@1 minus reviewers.
- M5 in parallel with M6 once M4 is green.