
chungyeong 1d0dfb273b docs: patch plan.md to v4 r1 (Python rewrite spec) + .gitignore node_modules

plan.md v4 r1 patches (per plan-v4-draft.md §0/§1/§2/§3/§8.5/§18/§22/§23):

- §0 header: v3 r13 → v4 r1 + note explaining the language migration. v3 CC
  counter frozen at CC-39; v4 begins its own series (DR-1 below).
- §1 Stack Decisions: full rewrite for Python (uv / pydantic v2 /
  pydantic-settings / SQLAlchemy 2 async + aiosqlite / typer + prompt_toolkit
  / structlog / FastAPI + sse-starlette).
- §2 Directory Layout: collapse v3 multi-package monorepo → single
  `my-deepagent/` project. TS `apps/`, `packages/`, `tests/`, `scripts/` are
  gone after `0e61b2d`.
- §3 doctor: 13-check (Node/pnpm/Docker/Drizzle) → 8-check (python/uv/git/
  workspace_root/config+governance/openrouter_api_key/openrouter_ping+pricing
  upsert/disk+sqlite integrity).
- §8.5 OpenRouter Adapter: full rewrite. v3 marker-extraction HTTP adapter
  (CC-39) is superseded by the deepagents 0.6.1 multi-turn tool-using agent
  driven by `my_deepagent.session.build_agent`. Native write_file/read_file/
  bash via LocalShellBackend; SafetyShellMiddleware enforces destructive
  command + deny-path policy; ArtifactWatcherMiddleware observes artifact
  writes; CostMiddleware records usage. Known v0.1.0 limits documented:
  usage_metadata empty on OpenRouter-forwarded responses, Anthropic-via-
  OpenRouter tool_calls.args ValidationError requires DeepSeek workaround.
- §18 Errors: add `token_budget_exceeded` and `tool_quota_exceeded` under
  human_required.
- §22 Decision Log: add DR-1 "v3 → v4 major bump" with rationale, scope,
  recovery path (pre-python-rewrite tag at c9fed71).
- §23 Kickoff Order: v3 historical order preserved + v4 Python step matrix
  showing Step 0~12 + Step 15 DONE, Step 13/14 (tmux/TUI recovery) DEFERRED.

§4~§17 (DB schema, enums, hashing, template/persona/binding, session
runtime, prompt envelope, artifact schema registry, run events, fake
adapter, state machines, approval state, run engine + Temporal contract,
WriteSet/worktree, SSE contract) are language-neutral domain spec and remain
unchanged for the Python implementation.

.gitignore: re-add `node_modules/` (legacy Node tree kept ignored until
`rm -rf` cleanup outside git).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 17:16:47 +09:00


Devflow Implementation Plan v4 r1

0. Document Status

v4 r1: language migration TS → Python. Major version bump; the TypeScript monorepo (apps/, packages/, tests/, scripts/, pnpm/tsconfig metadata) was deleted in 0e61b2d after being re-implemented under my-deepagent/. v3 CC counters are preserved as historical context; v4 begins its own CC series (DR-1 below; CC-Py-1 onward as new change clarifications land).
This document supersedes v2 and all earlier v3 drafts where conflicting.
Single-user, single-machine assumption. No auth, no retention policy, no observability dashboards, no multi-tenancy.
Target OS: macOS 13+ / Linux. No Windows.
All paths are Unix-style. All times are stored UTC.
Decisions in this document are locked unless explicitly marked (provisional). Override requires updating this document, not only code.
§1 Stack Decisions, §2 Directory Layout, §3 doctor checklist, §22 Decision Log have been rewritten for v4 r1. §4~§17 (DB schema, enums, hashing, template/persona/binding, session runtime, prompt envelope, artifact registry, run events, fake adapter, state machines, errors, write set, SSE contract) are language-neutral domain spec and remain valid for the Python implementation.
v3 CC history (informational):
r1 applied CC-1 through CC-5.
r2 applied CC-6 through CC-10.
r3 applied CC-11 through CC-15.
r4 applied CC-16 through CC-18.
r5 applied CC-19.
r6 applied CC-20.
r7 applied CC-21 through CC-23.
r8 applied CC-24 through CC-26.
r9 applied CC-27 through CC-28.
r10 applied CC-29 through CC-31.
r11 applied CC-32.
r12 applied CC-33 through CC-35.
r13 applied CC-39 (final v3 revision; superseded by v4 r1).

1. Stack Decisions

1.1 Workspace

Python 3.12+, managed by uv workspaces (uv sync, uv add, uv run).
Pinned via .python-version. No Node, no pnpm, no tsc.
pyproject.toml at repo root + per-package pyproject.toml under packages/<name>/ (uv workspace members).
Imports are absolute. No from . import *.

1.2 Tooling

Concern	Choice	Notes
Lint / format	ruff	One root `ruff.toml`. `ruff check .` + `ruff format --check .`.
Type check	mypy --strict	`mypy.ini` enables strict mode; tests relax `disallow_untyped_defs`.
Test	pytest + pytest-asyncio + pytest-httpx + respx	`pytest -q`.
Pre-commit	pre-commit (`.pre-commit-config.yaml`)	Runs ruff + mypy + pytest --collect-only.
Schema validation	pydantic v2 + pydantic-settings	Replaces zod.
YAML	PyYAML	Persona/template YAML loaders.
JSON Schema	jsonschema (2020-12)	Artifact registry.
HTTP client	httpx (async)	OpenRouter / pricing fetch.
Logging	structlog + rich	Replaces pino. `_scrub_processor` redacts secrets before stderr / JSON sinks.
CLI	typer + prompt_toolkit	Replaces commander; prompt_toolkit drives the interactive REPL.
OS dirs	platformdirs	XDG data / state / config dirs.
Secrets	keyring	macOS Keychain / Linux Secret Service / Windows Credential Store.

1.3 Database

SQLite 3 (WAL mode) via aiosqlite, ORM: SQLAlchemy 2.0 async.
Migrations: Alembic (baseline + per-feature revisions).
WAL + busy_timeout=5000 + PRAGMA foreign_keys=ON enforced at connect.
Postgres (the v3 default) is parked: single-machine + single-user removes the multi-process concurrency justification, and aiosqlite + the ux_active_run_repo_base partial unique index covers the active-run uniqueness invariant. Postgres can be reinstated for multi-tenant later.
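
The connect-time settings above can be sketched as follows. This is a minimal illustration that uses the standard-library sqlite3 module as a stand-in for the aiosqlite / SQLAlchemy async engine; the function name open_db is hypothetical.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    """Open a SQLite database and apply the connect-time PRAGMAs from section 1.3."""
    conn = sqlite3.connect(path)
    # WAL mode is stored in the database file; the other two are per-connection.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")  # milliseconds
    conn.execute("PRAGMA foreign_keys=ON")
    return conn
```

Because busy_timeout and foreign_keys are per-connection settings, they have to be re-applied on every new connection, which is why the spec pins them "at connect" rather than at migration time.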

1.4 Logging

structlog, JSON sink to stderr by default, rich pretty sink when stdout is a TTY.
Standard fields: time, level, module, run_id?, phase_id?, role?, event_id?, interactive_session_id?.
_scrub_processor redacts OpenRouter / Anthropic / OpenAI / LangSmith / GitHub / GitLab API keys and generic Bearer … tokens before emission.
Levels: same semantics as v3 (trace/debug/info/warn/error).
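
A redaction step of this kind could look like the sketch below. It follows the structlog processor calling convention (logger, method_name, event_dict), but the two regular expressions are illustrative assumptions only; the exact key formats matched by _scrub_processor are not listed in this document.

```python
import re
from typing import Any

# Illustrative patterns only (assumptions, not the project's real pattern list).
_PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"),  # generic bearer tokens
    re.compile(r"sk-[A-Za-z0-9_-]{8,}"),          # "sk-"-prefixed API keys
]
REDACTED = "[REDACTED]"


def scrub_processor(logger: Any, method_name: str, event_dict: dict) -> dict:
    """Redact secret-looking substrings in every string value of the event."""
    for key, value in list(event_dict.items()):
        if isinstance(value, str):
            for pattern in _PATTERNS:
                value = pattern.sub(REDACTED, value)
            event_dict[key] = value
    return event_dict
```

Placing such a processor before the renderer in the processor chain means both the JSON sink and the rich sink receive already-redacted values.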

1.5 Config

Single pydantic-settings BaseSettings in my_deepagent.config.Config with MYDEEPAGENT_ env prefix and optional TOML source.
Source precedence (high → low): explicit overrides → os.environ (with MYDEEPAGENT_ prefix) → .env → config.toml → schema defaults.
Config is loaded once at process start, validated, frozen, and re-exported as an immutable typed Config.
Validation failure is fatal (exit code 2).
Required keys at v0.1.0:
- MYDEEPAGENT_DATABASE_URL (default sqlite+aiosqlite:///<state_dir>/db.sqlite3)
- MYDEEPAGENT_WORKSPACE_ROOT
- MYDEEPAGENT_LOG_LEVEL
- MYDEEPAGENT_OPENROUTER_API_KEY when the OpenRouter backend is enabled (resolution order: config → env → OS keyring → error).
Path canonicalization: workspace_root is resolved via Path.resolve() at config load. Any path entering the system is canonicalized before storage or hashing.
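
The precedence order and the canonicalization rule can be illustrated with plain dictionaries. pydantic-settings performs this layering itself; the sketch below only shows the ordering, and the function name resolve_settings and its parameters are hypothetical.

```python
from collections import ChainMap
from pathlib import Path
from typing import Any, Mapping

ENV_PREFIX = "MYDEEPAGENT_"


def _strip_prefix(source: Mapping[str, Any]) -> dict:
    """Keep only MYDEEPAGENT_* keys and map them to lower-case field names."""
    return {
        key[len(ENV_PREFIX):].lower(): value
        for key, value in source.items()
        if key.startswith(ENV_PREFIX)
    }


def resolve_settings(overrides, environ, dotenv, toml, defaults) -> dict:
    """Merge sources so that earlier ones win: overrides > env > .env > toml > defaults."""
    merged = ChainMap(
        dict(overrides),
        _strip_prefix(environ),
        _strip_prefix(dotenv),
        dict(toml),       # assumed to already use lower-case field names
        dict(defaults),
    )
    settings = dict(merged)
    if "workspace_root" in settings:
        # Canonicalize before the value is stored or hashed anywhere.
        settings["workspace_root"] = Path(settings["workspace_root"]).resolve()
    return settings
```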

Backend registration (deepagents-flavored):

from pydantic import BaseModel

from my_deepagent.enums import Backend  # closed-set enum (§5.1); import path assumed from §2


class BackendConfig(BaseModel, frozen=True):
    id: Backend                 # openrouter | anthropic | openai | google | fake
    enabled: bool
    api_base_url: str | None = None  # openrouter default https://openrouter.ai/api/v1
    api_key_env: str | None = None   # default MYDEEPAGENT_OPENROUTER_API_KEY

fake is always available.
openrouter is available only when enabled and the resolved key is present.
Doctor warns on misconfig; binding fails fast at run start with human_required:backend_unavailable.

1.6 HTTP / SSE

FastAPI + uvicorn + sse-starlette for the M8-Py REST + SSE surface (v3 r13 §17 contract unchanged: same event types, same headers, same data: <json>\n\n framing).
Body validation via the same pydantic v2 models used elsewhere.
WebSocket remains deferred unless SSE fails under transcript volume.

2. Directory Layout

v4 r1 collapses the v3 multi-package monorepo into a single my-deepagent/ project. The TS apps/, packages/, tests/, scripts/ trees were deleted in 0e61b2d; v3 §4~§17 module-by-module spec still applies but each module now lives under my_deepagent/<name>.py instead of packages/<name>/src/<name>.ts.

<repo-root>/
├── docs/
│   ├── plan.md                          # this document
│   ├── plan-v4-draft.md                 # v4 r1 design memo (informational)
│   └── schemas/
│       ├── artifacts/                   # JSON Schema 2020-12 (language-neutral)
│       ├── personas/                    # YAML persona seed (language-neutral)
│       └── templates/                   # YAML workflow templates
├── docker-compose.yml                   # Postgres + Temporal (still relevant for M5-Py)
├── .env.example
├── .gitignore
├── my-deepagent-seed/                   # v0.1.0 bootstrap kit (historical, may be pruned)
└── my-deepagent/
    ├── pyproject.toml                   # uv workspace root
    ├── uv.lock
    ├── ruff.toml
    ├── mypy.ini
    ├── alembic.ini
    ├── .pre-commit-config.yaml
    ├── CHANGELOG.md
    ├── alembic/
    │   ├── env.py
    │   └── versions/                    # baseline + per-feature migrations
    ├── docs/schemas/                    # mirror of repo-root docs/schemas for loader convenience
    ├── src/my_deepagent/
    │   ├── config.py                    # pydantic-settings Config (replaces §1.5 zod schema)
    │   ├── enums.py                     # closed-set enums (§5)
    │   ├── errors.py                    # error taxonomy (§18)
    │   ├── hash.py                      # content-addressed hashing (§6)
    │   ├── persona.py                   # Persona + loader (§7.2)
    │   ├── workflow.py                  # WorkflowTemplate + loader (§7.1)
    │   ├── binding.py                   # autoSelect / override / consent store (§7.4)
    │   ├── artifact_schema.py           # JSON Schema 2020-12 registry (§10)
    │   ├── run_event.py                 # event types + idempotency keys (§11, §13.1)
    │   ├── prompt_envelope.py           # envelope builder (§9)
    │   ├── budget.py                    # BudgetTracker (v4-new)
    │   ├── secrets.py                   # config → env → keyring resolution chain
    │   ├── keys.py                      # OS keyring wrapper
    │   ├── audit.py                     # append-only JSONL audit log (v4-new)
    │   ├── logging.py                   # structlog + secret scrubber (§1.4)
    │   ├── governance.py                # first-run consent (v4-new)
    │   ├── i18n/                        # ko / en catalog
    │   ├── recovery.py                  # sweep_orphan_runs (§19)
    │   ├── session.py                   # deepagents adapter (§8.5, v4-new)
    │   ├── engine.py                    # WorkflowEngine — phase loop (§15)
    │   ├── persistence/
    │   │   ├── db.py                    # SQLAlchemy 2 async engine
    │   │   ├── models.py                # ORM models (§4)
    │   │   └── checkpointer.py          # LangGraph SqliteSaver context
    │   ├── middleware/
    │   │   ├── cost.py                  # CostMiddleware (v4-new)
    │   │   ├── budget.py                # BudgetMiddleware (v4-new)
    │   │   ├── audit.py                 # AuditToolMiddleware
    │   │   ├── safety.py                # SafetyShellMiddleware (deny-path / destructive command)
    │   │   └── artifact_watcher.py      # ArtifactWatcherMiddleware
    │   ├── monitoring/
    │   │   ├── pricing.py               # OpenRouter pricing cache
    │   │   └── cost_estimator.py        # pre-run preview
    │   ├── cli/                         # typer-driven CLI
    │   │   ├── main.py                  # entry (interactive REPL when no subcommand)
    │   │   ├── doctor.py                # §3 doctor checks (Python/uv version)
    │   │   ├── init.py
    │   │   ├── keys_cmd.py
    │   │   ├── run.py
    │   │   ├── runs.py
    │   │   ├── stats.py
    │   │   └── interactive.py           # prompt_toolkit REPL
    │   ├── tui/
    │   │   └── approval.py              # tri-state approval prompt
    │   └── slash.py                     # REPL slash commands
    └── tests/
        ├── unit/                        # pure-Python unit tests
        └── integration/                 # async + persistence + real OpenRouter (gated)

Future trees deferred:

apps/api/, apps/worker/ (M5-Py / M8-Py): FastAPI app and temporalio worker. v4 r1 keeps them out until M5 lands.
apps/web/: Web GUI port is out of scope for v0.1.0 (separate milestone).

3. `mydeepagent doctor`

Exit codes:

0: all green.
1: one or more red checks.
2: internal or unknown error.

Each check emits:

name
status: pass | fail | warn
detail
remediation
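
As a minimal sketch, the per-check record and the exit-code rule could be modelled as below. The names CheckResult and exit_code are hypothetical, and treating warn as non-fatal is an assumption: the exit-code list only says that red checks produce 1.

```python
from dataclasses import dataclass
from typing import Iterable, Literal

Status = Literal["pass", "fail", "warn"]


@dataclass(frozen=True)
class CheckResult:
    name: str
    status: Status
    detail: str = ""
    remediation: str = ""


def exit_code(results: Iterable[CheckResult]) -> int:
    """Return 1 if any check failed, else 0 (warn is assumed not to count as red)."""
    return 1 if any(result.status == "fail" for result in results) else 0
```

Exit code 2 is not produced here; it would be returned by the caller when the doctor command itself crashes.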

Closed check list (v4 r1, 8 checks — Node/pnpm/Docker/Drizzle dropped):

python: python --version satisfies >=3.12,<3.14.
uv: uv --version resolves (any).
git: git --version >=2.40.
workspace_root: MYDEEPAGENT_WORKSPACE_ROOT exists, is a directory, and is writable.
config+governance: Config loads from env + .env + config.toml without ValidationError; first-run governance consent file exists (or is created interactively on first run only).
openrouter_api_key: resolution chain (config → env → OS keyring) yields a non-empty value. Warn-only when the OpenRouter backend is not enabled.
openrouter_ping + pricing upsert: GET https://openrouter.ai/api/v1/models with the bearer key.
- 200 → pass; pricing rows are upserted into model_pricing for use by the mydeepagent run cost preview.
- 401 → fail.
- any other non-200 / network error → warn.
disk+sqlite integrity:
- Free disk on the workspace_root partition: fail under 2 GB, warn under 5 GB, green at ≥ 5 GB.
- SQLite DB file (if present) opens and PRAGMA integrity_check returns ok.
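
Two of the checks above reduce to small pure functions. The sketch below is illustrative: ping_status and sqlite_integrity_ok are hypothetical names, and the HTTP request itself is left to the caller.

```python
import sqlite3
from typing import Optional


def ping_status(http_status: Optional[int]) -> str:
    """Map the OpenRouter /models response to a check status.

    None stands for a network error (no HTTP response at all).
    """
    if http_status == 200:
        return "pass"
    if http_status == 401:
        return "fail"
    return "warn"


def sqlite_integrity_ok(db_path: str) -> bool:
    """Return True when PRAGMA integrity_check reports a single 'ok' row."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    return rows == [("ok",)]
```

Note that sqlite3.connect creates a missing file, so a caller honouring the "if present" clause would test for the file's existence before calling sqlite_integrity_ok.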

Output:

Rich human table by default.
--json for machine-readable output.
--quiet prints only checks whose status is not pass.

Notes:

tmux / Docker / Postgres / pg_isready / drizzle migration checks from v3 §3 are dropped in v4 r1 — the v0.1.0 runtime is SQLite-only and tmux is out of scope for the deepagents-driven session model.
--list-orphans and friends are owned by mydeepagent runs list/show (§19).

4. Database Schema

First migration prelude:

CREATE EXTENSION IF NOT EXISTS pgcrypto;

All tables use gen_random_uuid() primary keys unless noted. All times are timestamptz. Mutable rows include updated_at. JSON columns use jsonb.

4.1 `workflow_templates`

id uuid primary key default gen_random_uuid()
name text not null
version int not null
hash text not null unique
definition jsonb not null
created_at timestamptz not null default now()
unique (name, version)

4.2 `agent_personas`

id uuid primary key default gen_random_uuid()
name text not null
version int not null
hash text not null unique
definition jsonb not null
created_at timestamptz not null default now()
unique (name, version)

4.3 `runs`

id uuid primary key default gen_random_uuid()
template_id uuid not null references workflow_templates(id)
template_hash text not null
state text not null
repo_path text not null
- canonical absolute path
- resolved through fs.realpathSync before insert
base_branch text not null
worktree_root text not null
- canonical absolute path under WORKSPACE_ROOT/<runId>/
current_phase_id uuid references run_phases(id) nullable and deferrable
started_at timestamptz
ended_at timestamptz
final_report_path text
paused_from_state text
- set when transitioning to paused
- cleared on resume
- null when state is not paused
created_at timestamptz not null default now()
updated_at timestamptz

Active-run uniqueness:

CREATE UNIQUE INDEX ux_active_run_repo_base
ON runs (repo_path, base_branch)
WHERE state NOT IN ('completed', 'failed', 'aborted');
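
The effect of the partial unique index can be demonstrated against SQLite, which also supports partial indexes. The table below is deliberately reduced to the indexed columns plus an integer key, and start_run is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE runs (
        id INTEGER PRIMARY KEY,
        repo_path TEXT NOT NULL,
        base_branch TEXT NOT NULL,
        state TEXT NOT NULL
    );
    CREATE UNIQUE INDEX ux_active_run_repo_base
    ON runs (repo_path, base_branch)
    WHERE state NOT IN ('completed', 'failed', 'aborted');
    """
)


def start_run(repo_path: str, base_branch: str) -> int:
    """Insert a run in state 'created'; raises sqlite3.IntegrityError if one is already active."""
    cur = conn.execute(
        "INSERT INTO runs (repo_path, base_branch, state) VALUES (?, ?, 'created')",
        (repo_path, base_branch),
    )
    return cur.lastrowid
```

Rows in a terminal state fall outside the index predicate, so any number of completed, failed, or aborted runs may share the same repo_path and base_branch while at most one active run exists.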

4.4 `run_inputs`

id uuid primary key default gen_random_uuid()
run_id uuid not null unique references runs(id) on delete cascade
requirements_md text not null
objective jsonb
extra jsonb
input_hash text not null

input_hash is based on:

requirements_md
objective
extra
canonical repo_path
base_branch

4.5 `run_bindings`

id uuid primary key default gen_random_uuid()
run_id uuid not null references runs(id) on delete cascade
role_id text not null
persona_id uuid not null references agent_personas(id)
persona_hash text not null
backend text not null
binding_hash text not null
unique (run_id, role_id)

4.6 `run_phases`

id uuid primary key default gen_random_uuid()
run_id uuid not null references runs(id) on delete cascade
phase_key text not null
seq int not null
state text not null
attempts int not null default 0
started_at timestamptz
ended_at timestamptz
unique (run_id, phase_key)

4.7 `run_events`

Append-only.

id bigserial primary key
run_id uuid not null references runs(id) on delete cascade
phase_id uuid references run_phases(id)
seq bigint not null
type text not null
payload jsonb not null
idempotency_key text not null
ts timestamptz not null default now()
unique (run_id, seq)
unique (run_id, idempotency_key)
index (run_id, ts)

Concurrency:

All inserts go through RunEventRepository.append().
Raw SQL inserts into run_events are forbidden.
append() takes pg_advisory_xact_lock(hash64('devflow:run-events', run_id)).
Inside that same transaction it assigns:

seq := COALESCE(MAX(seq), 0) + 1
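
A sketch of the append path on SQLite follows. Two things here are assumptions rather than part of the spec: BEGIN IMMEDIATE stands in for the Postgres advisory lock as the serialization point, and a repeated idempotency_key returns the existing seq instead of raising.

```python
import json
import sqlite3


def open_events_db() -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions, so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        """
        CREATE TABLE run_events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            run_id TEXT NOT NULL,
            seq INTEGER NOT NULL,
            type TEXT NOT NULL,
            payload TEXT NOT NULL,
            idempotency_key TEXT NOT NULL,
            UNIQUE (run_id, seq),
            UNIQUE (run_id, idempotency_key)
        )
        """
    )
    return conn


def append(conn, run_id: str, event_type: str, payload: dict, idempotency_key: str) -> int:
    """Append one event and return its per-run sequence number."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX(seq)
    try:
        row = conn.execute(
            "SELECT seq FROM run_events WHERE run_id = ? AND idempotency_key = ?",
            (run_id, idempotency_key),
        ).fetchone()
        if row is None:
            seq = conn.execute(
                "SELECT COALESCE(MAX(seq), 0) + 1 FROM run_events WHERE run_id = ?",
                (run_id,),
            ).fetchone()[0]
            conn.execute(
                "INSERT INTO run_events (run_id, seq, type, payload, idempotency_key) "
                "VALUES (?, ?, ?, ?, ?)",
                (run_id, seq, event_type, json.dumps(payload), idempotency_key),
            )
        else:
            seq = row[0]  # duplicate delivery: reuse the stored sequence number
        conn.execute("COMMIT")
        return seq
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Reading MAX(seq) and inserting inside one locked transaction is what keeps seq gap-free and unique per run; the unique constraints act as a backstop if a writer bypasses the repository.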

4.8 `approval_requests`

id uuid primary key default gen_random_uuid()
run_id uuid not null references runs(id)
phase_id uuid references run_phases(id)
gate_key text not null
state text not null
idempotency_key text not null
payload jsonb not null
created_at timestamptz not null default now()
resolved_at timestamptz
unique (idempotency_key)

4.9 `approval_decisions`

Append-only and immutable.

id uuid primary key default gen_random_uuid()
approval_request_id uuid not null references approval_requests(id)
action text not null
- approve
- reject
- request_changes
- abort
comment text
decided_at timestamptz not null default now()
idempotency_key text not null unique

pause is not an approval decision.

4.10 `tui_sessions`

id uuid primary key default gen_random_uuid()
run_id uuid not null references runs(id) on delete cascade
role_id text not null
backend text not null
cwd text not null
expected_artifact_path text
expected_schema text
last_prompt_hash text
last_prompt_at timestamptz
last_capture_seq bigint not null default 0
last_known_pane_pid int
tmux_session text
tmux_window text
state text not null
recovery_attempts int not null default 0
unique (run_id, role_id)

4.11 `tui_transcript_chunks`

Append-only.

id bigserial primary key
session_id uuid not null references tui_sessions(id) on delete cascade
seq bigint not null
content text not null
captured_at timestamptz not null default now()
unique (session_id, seq)

4.12 `artifacts`

id uuid primary key default gen_random_uuid()
run_id uuid not null references runs(id) on delete cascade
phase_id uuid references run_phases(id)
path text not null
schema_id text not null
hash text not null
valid boolean not null
validation_error jsonb
created_at timestamptz not null default now()
unique (run_id, path, hash)

4.13 `commands`

id uuid primary key default gen_random_uuid()
run_id uuid not null references runs(id) on delete cascade
phase_id uuid references run_phases(id)
kind text not null
- git
- test
- e2e
- doctor
- backtest
- other
argv text[] not null
cwd text not null
exit_code int
stdout_path text
stderr_path text
started_at timestamptz
ended_at timestamptz

4.14 `review_findings`

id uuid primary key default gen_random_uuid()
run_id uuid not null references runs(id) on delete cascade
phase_id uuid references run_phases(id)
reviewer_role text not null
severity text not null
- info
- low
- medium
- high
- critical
category text not null
- correctness
- evidence
- style
- security
- performance
- other
file_path text
line int
summary text not null
evidence text
verifier_status text not null default 'unverified'
- unverified
- confirmed
- rejected
created_at timestamptz not null default now()

4.15 Backtest Stub Tables

backtest_iterations and backtest_metrics are created at M1 as stub tables:

id uuid primary key default gen_random_uuid()
run_id uuid not null references runs(id) on delete cascade
payload jsonb
created_at timestamptz not null default now()

Full schema is deferred to M12.

5. Enums

All enums live in packages/core/src/enums.ts as TypeScript const objects and Zod enums.

5.1 `Backend`

codex
claude
fake
openrouter

openrouter is HTTP-based and has no tmux/PTY; see §8.5.

Future gemini support adds an enum entry and a BackendProfile; no design change.

5.2 `Capability`

spec_write
phase_planning
task_dag_planning
code_edit
test_first_development
code_review
evidence_check
command_execute
backtest_run
metric_extract
failure_mining
objective_eval
final_report_compose

5.3 `RiskLevel`

low
medium
high

Risk is declared per phase in the template. Persona has maxRiskLevel. Binding fails when phase.risk > persona.maxRiskLevel.
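
One way to make the comparison phase.risk > persona.maxRiskLevel well defined is an ordered enum. This is a sketch only; the integer values and the helper name risk_allowed are assumptions.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def risk_allowed(phase_risk: RiskLevel, persona_max: RiskLevel) -> bool:
    """Binding is allowed only when the phase risk does not exceed the persona ceiling."""
    return phase_risk <= persona_max
```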

5.4 `ApprovalDecisionAction`

approve
reject
request_changes
abort

pause is a run-level control operation, not an approval decision.

5.5 `ApprovalState`

pending
approved
rejected
changes_requested
aborted
paused

paused is not an auto-decision.

5.6 `RunState`

created
bound
planning
awaiting_approval
executing
paused
completed
failed
aborted

5.7 `RunPhaseState`

pending
running
awaiting_artifact
validating
awaiting_approval
completed
failed
skipped

5.8 `SessionState`

CREATED
BOOTSTRAPPING
READY
BUSY
WAITING_FOR_APPROVAL
ARTIFACT_TIMEOUT
HUNG
CRASHED
RESUMING
REBOOTSTRAPPED
FAILED_NEEDS_HUMAN

6. Content-Addressed Hashing

6.1 Canonical JSON

Object keys sorted lexicographically by UTF-16 code units.
No insignificant whitespace.
Strings use standard JSON escaping.
No Unicode normalization.
Numbers use shortest round-trippable representation.
Integers have no decimal point.
No leading zeros.
Arrays preserve order.
No trailing newline.

packages/core/src/hash.ts exports:

canonicalize(value: unknown): string
hash(value: unknown): string

hash() returns sha256hex(canonicalize(value)).
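
A Python sketch of the canonicalization rules is given below. It is an illustration under stated assumptions, not the project's hash.py: the name content_hash is used to avoid shadowing the builtin hash, and float formatting relies on Python's repr, which is shortest round-trip but may differ from the v3 TypeScript output in exponent notation.

```python
import hashlib
import json
import math


def canonicalize(value) -> str:
    """Serialize a JSON-compatible value following the section 6.1 rules."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must be tested before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("NaN and infinity are not valid JSON numbers")
        # Integral floats are written without a decimal point.
        return str(int(value)) if value.is_integer() else repr(value)
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)  # standard escaping, no normalization
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(canonicalize(item) for item in value) + "]"
    if isinstance(value, dict):
        if not all(isinstance(key, str) for key in value):
            raise TypeError("object keys must be strings")
        # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
        items = sorted(value.items(), key=lambda kv: kv[0].encode("utf-16-be"))
        body = ",".join(
            json.dumps(key, ensure_ascii=False) + ":" + canonicalize(val)
            for key, val in items
        )
        return "{" + body + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def content_hash(value) -> str:
    """SHA-256 hex digest of the canonical form, encoded as UTF-8."""
    return hashlib.sha256(canonicalize(value).encode("utf-8")).hexdigest()
```

Sorting by UTF-16 code units rather than code points only matters for keys outside the Basic Multilingual Plane, but getting it wrong would silently change hashes for such keys.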

6.2 Hash Subjects

Template hash:
- { name, version, roles, phases, gates, capabilitiesRequired }
Persona hash:
- { name, version, capabilities, backend, maxRiskLevel, allowedRoles, promptConfig, modelConfig }
Binding hash:
- { runId, roleId, templateHash, personaHash, backend, override }
Run input hash:
- { templateHash, bindings: sorted[bindingHash], requirementsMd, objective, repoPath, baseBranch, extra }
Prompt hash:
- { runId, roleId, phaseKey, expectedArtifact, expectedSchema, instructions, attempt }
Artifact hash:
- SHA-256 of file bytes.

Prompt hash uses phaseKey, not phaseId, because PromptEnvelope carries phaseKey.

7. Template, Persona, Binding

7.1 Template Schema

const TemplatePhase = z.object({
  key: z.string(),
  title: z.string(),
  risk: RiskLevel,
  roles: z.array(z.string()),
  expectedArtifact: z
    .object({
      path: z.string(),
      schema: z.string(),
    })
    .optional(),
  gates: z.array(z.string()).default([]),
  timeoutMs: z.number().int().positive().optional(),
});

const TemplateRole = z.object({
  id: z.string(),
  requiredCapabilities: z.array(Capability),
  preferredBackends: z.array(Backend).default([]),
  count: z.number().int().min(1).default(1),
  diversity: z
    .object({
      requireDifferentBackends: z.boolean().default(false),
    })
    .optional(),
});

const Template = z.object({
  name: z.string(),
  version: z.number().int().positive(),
  roles: z.array(TemplateRole),
  phases: z.array(TemplatePhase),
  defaultGates: z.array(z.string()).default([]),
});

7.2 Persona Schema

const Persona = z.object({
  name: z.string(),
  version: z.number().int().positive(),
  backend: Backend,
  capabilities: z.array(Capability),
  maxRiskLevel: RiskLevel,
  allowedRoles: z.array(z.string()).optional(),
  promptConfig: z
    .object({
      systemPrompt: z.string().optional(),
      instructionsPrelude: z.string().optional(),
    })
    .default({}),
  modelConfig: z.record(z.string(), z.unknown()).default({}),
});

modelConfig conventions:

Personas bound to openrouter MUST set modelConfig.model to a routable OpenRouter model id, e.g. anthropic/claude-sonnet-4-5, deepseek/deepseek-chat, meta-llama/llama-3.1-70b-instruct.
Other supported keys: maxTokens, temperature, topP. All optional.
For tmux-based backends (codex, claude, fake), modelConfig.model is informational only and MAY be omitted.
Binding fails fast with human_required:model_unavailable when an openrouter persona has no modelConfig.model.

7.3 Override Semantics

Override may swap persona for a role.
Override may constrain backend to a specific allowed backend.
Override cannot add capabilities.
Override cannot raise risk above persona maxRiskLevel.
Diversity rules apply after override.
Lock-time validation runs the full binding algorithm.
On first binding failure, the run does not start.

7.4 Binding Algorithm

For each role:

Select override persona if present; otherwise run autoSelect.
Assert backend is enabled in config.backends.
Assert non-fake backend binary resolved at process start.
Assert role id is in allowedRoles, unless allowedRoles is absent.
Assert required capabilities are a subset of persona capabilities.
Assert every phase using the role has risk <= persona maxRiskLevel.
Expand roles with count > 1 into roleId#0, roleId#1, etc.
Enforce diversity rules after expansion.
Compute and persist binding_hash per role instance.

autoSelect is deterministic. Sort candidates by:

role preferredBackends order.
persona.version desc.
persona.name asc.
persona.hash asc.

Personas whose backend is not in preferredBackends are eligible only if all preferred-backend personas fail capability or risk checks.

Binding fails with human_required:no_eligible_persona if no persona satisfies requirements.
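
The deterministic ordering used by autoSelect can be sketched as a sort key. PersonaRef and auto_select are hypothetical names, and the candidate list is assumed to contain only personas that already passed the capability and risk assertions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PersonaRef:
    name: str
    version: int
    backend: str
    hash: str


def auto_select(
    candidates: Sequence[PersonaRef], preferred_backends: Sequence[str]
) -> Optional[PersonaRef]:
    """Pick one persona deterministically, or None when nothing is eligible."""
    preferred = [p for p in candidates if p.backend in preferred_backends]
    # Non-preferred personas are considered only when no preferred one survived.
    pool = preferred or list(candidates)
    if not pool:
        return None

    def sort_key(p: PersonaRef):
        rank = (
            preferred_backends.index(p.backend)
            if p.backend in preferred_backends
            else len(preferred_backends)
        )
        return (rank, -p.version, p.name, p.hash)

    return min(pool, key=sort_key)
```

A None result corresponds to the human_required:no_eligible_persona failure. Because the key ends with the persona hash, two runs over the same persona set always pick the same candidate.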

7.5 Seeding

Personas:

docs/schemas/personas/<name>@<version>.yaml
filename encodes immutable identity.
loader parses with Persona schema.
loader computes personaHash.
loader upserts keyed by (name, version).
hash mismatch on an existing row is fatal.

Templates:

docs/schemas/templates/<name>@<version>.yaml
same immutable version rule.

Deleting a published file is allowed only when no run references that hash.

8. Session Runtime

8.1 SessionAdapter Interface

export interface SessionAdapter {
  start(input: StartInput): Promise<SessionHandle>;
  sendPrompt(handle: SessionHandle, envelope: PromptEnvelope): Promise<{ promptId: string }>;
  probe(handle: SessionHandle): Promise<ProbeResult>;
  resume(handle: SessionHandle): Promise<SessionHandle>;
  rebootstrap(handle: SessionHandle): Promise<SessionHandle>;
  capture(handle: SessionHandle, fromSeq: bigint): AsyncIterable<TranscriptChunk>;
  dispose(handle: SessionHandle): Promise<void>;
}

export interface StartInput {
  runId: string;
  roleId: string;
  backend: Backend;
  cwd: string;
  expectedArtifactPath?: string;
  expectedSchema?: string;
  envelopePrelude?: string;
}

export interface SessionHandle {
  sessionId: string;
  pid?: number;
  tmuxSession?: string;
  tmuxWindow?: string;
}

export interface ProbeResult {
  alive: boolean;
  paneActive: boolean;
  lastOutputAt?: Date;
  hint?: string;
}

export interface TranscriptChunk {
  seq: bigint;
  content: string;
  capturedAt: Date;
}

For HTTP backends (openrouter) the SessionHandle.pid, tmuxSession, and tmuxWindow fields are always undefined. See §8.5 for the HTTP adapter mapping.

8.2 Session State Machine

CREATED -> BOOTSTRAPPING -> READY
READY <-> BUSY
BUSY -> WAITING_FOR_APPROVAL
BUSY -> ARTIFACT_TIMEOUT
BUSY -> HUNG
BUSY -> CRASHED
HUNG | CRASHED | ARTIFACT_TIMEOUT -> RESUMING -> READY
RESUMING -> REBOOTSTRAPPED -> READY
exhausted errors -> FAILED_NEEDS_HUMAN
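
The diagram above can be encoded as a transition table. The sketch below is one reading of it: which states count as "exhausted errors" sources for FAILED_NEEDS_HUMAN is an assumption, and can_transition is a hypothetical helper.

```python
# Allowed edges, transcribed from the state diagram.
TRANSITIONS = {
    "CREATED": {"BOOTSTRAPPING"},
    "BOOTSTRAPPING": {"READY"},
    "READY": {"BUSY"},
    "BUSY": {"READY", "WAITING_FOR_APPROVAL", "ARTIFACT_TIMEOUT", "HUNG", "CRASHED"},
    "HUNG": {"RESUMING"},
    "CRASHED": {"RESUMING"},
    "ARTIFACT_TIMEOUT": {"RESUMING"},
    "RESUMING": {"READY", "REBOOTSTRAPPED"},
    "REBOOTSTRAPPED": {"READY"},
}

# Assumption: retry exhaustion can only be reached from error or recovery states.
EXHAUSTIBLE = {"HUNG", "CRASHED", "ARTIFACT_TIMEOUT", "RESUMING"}


def can_transition(src: str, dst: str) -> bool:
    """Return True when the session may move from src to dst."""
    if dst == "FAILED_NEEDS_HUMAN":
        return src in EXHAUSTIBLE
    return dst in TRANSITIONS.get(src, set())
```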

8.3 Recovery Counters

sendPrompt retry: 2.
- Means one initial send plus two adapter-level retries, three physical send attempts max.
resume retry: 2.
rebootstrap retry: 1.
artifact repair retry: 1.
max hung time: configurable; default 20 minutes.

Exhaustion creates a human gate with recoveryHint.

8.4 SessionManager Singleton

M4: hosted in apps/api.
M5+: hosted in apps/worker.
Only SessionManager may call mutating SessionAdapter methods.
Holds in-memory Map<sessionId, SessionHandle>.
Takes pg_advisory_lock(hash64('devflow:session-manager')).
A second instance exits with code 3.
On start:
- query non-terminal tui_sessions.
- call adapter.resume(handle).
- success: place handle in map.
- failure: session -> FAILED_NEEDS_HUMAN, append session.failed, create recovery gate.
On SIGTERM/SIGINT:
- refuse new prompts.
- allow in-flight artifact polling up to 30s.
- persist last_capture_seq.
- release advisory lock.

8.5 OpenRouter Adapter — v4 r1 deepagents rewrite

Supersedes the v3 marker-extraction HTTP adapter (CC-39). In v4 the OpenRouter integration is a multi-turn, tool-using agent driven by LangChain deepagents 0.6.1 — no single-shot completions, no <<<DEVFLOW_ARTIFACT_*>>> markers, no transcript replay reconstruction.

Construction — my_deepagent.session.build_agent(persona, run_id, …):

llm = ChatOpenAI(
    model=persona.model,                  # e.g. "openrouter:deepseek/deepseek-chat"
    base_url=config.openrouter_api_base,  # https://openrouter.ai/api/v1
    api_key=resolve_openrouter_api_key(),
    timeout=persona.model_params.timeout,
)
agent = deepagents.create_deep_agent(
    model=llm,
    tools=[],                             # base tools come from LocalShellBackend
    instructions=persona.system_prompt,
    subagents=[_subagent_to_dict(s) for s in persona.subagents],
    middleware=[
        SafetyShellMiddleware(...),       # destructive command + deny-path guard
        AuditToolMiddleware(...),         # append-only JSONL audit log
        ArtifactWatcherMiddleware(...),   # write_file/edit_file detection
        CostMiddleware(...),              # usage_metadata + budget ledger
    ],
    backend=LocalShellBackend(            # bash + read_file + write_file + edit_file + ls
        cwd=worktree_root,
        # `permissions` kwarg is intentionally omitted for local_shell backend
        # (deepagents 0.6.1 NotImplementedError workaround — enforcement moves
        # to SafetyShellMiddleware).
    ),
)

Method mapping (driven by WorkflowEngine rather than a v3-style adapter interface):

Start: create_deep_agent returns a CompiledStateGraph per phase. No persistent session object is shared across phases — each phase is a fresh agent invocation parameterized by persona + envelope.
Send prompt: await agent.ainvoke({"messages": [HumanMessage(envelope)]}) where envelope is built by WorkflowEngine._build_envelope (§9 with the artifact JSON Schema inlined so the model sees the exact required fields).
Tool use: native read_file / write_file / edit_file / ls / bash calls are emitted by the model and dispatched through LocalShellBackend, recorded by AuditToolMiddleware, gated by SafetyShellMiddleware.
Probe / resume / rebootstrap / dispose: not applicable — the agent is ephemeral per phase. Crash recovery operates at the run/phase level via sweep_orphan_runs (§19), not at a session-adapter level.

Artifact production:

The model writes the artifact directly to expected_artifact_path via the write_file tool. ArtifactWatcherMiddleware observes the tool call and notifies the engine.
The envelope inlines the artifact's JSON Schema definition so the LLM has the exact required fields.
Schema validation is performed by ArtifactSchemaRegistry.validate on the written file (§10). On failure, the engine retries once with a repair prompt; second failure raises human_required:artifact_invalid_after_repair.

Error mapping (preserved from CC-39, applied per-call by the LangChain exception path):

HTTP 401 → human_required:backend_auth_failed.
HTTP 429 → recoverable:rate_limited (exponential backoff: 1 s, 2 s, 4 s, max 30 s, owned by langchain-openai retries).
HTTP 5xx → recoverable:network_blip.
HTTP 400 with model_not_found → human_required:model_unavailable.
BudgetTracker pre-call rejection → human_required:token_budget_exceeded.
SafetyShellMiddleware blocked tool call → human_required:tool_quota_exceeded.
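
The HTTP part of this mapping and the backoff schedule can be written as two small functions. This is a sketch: map_http_error and backoff_delays are hypothetical names, the error_code argument is assumed to be extracted from the response body by the caller, and in the real adapter the retries are owned by langchain-openai rather than by project code.

```python
from typing import List, Optional


def map_http_error(status: int, error_code: Optional[str] = None) -> Optional[str]:
    """Translate an HTTP failure into the error taxonomy; None means unmapped."""
    if status == 401:
        return "human_required:backend_auth_failed"
    if status == 429:
        return "recoverable:rate_limited"
    if status == 400 and error_code == "model_not_found":
        return "human_required:model_unavailable"
    if 500 <= status <= 599:
        return "recoverable:network_blip"
    return None


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(base * 2 ** attempt, cap) for attempt in range(retries)]
```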

Known v0.1.0 limitations:

usage_metadata is sometimes empty on responses forwarded by OpenRouter (deepagents wraps the underlying ChatOpenAI response so token counts may not surface). The recorder still fires and LlmCallRow is persisted, but input_tokens / output_tokens may read 0. v0.2 will probe additional response shapes (raw chunks / callbacks).
Anthropic models via OpenRouter currently fail with a tool_calls.args JSON-string vs dict ValidationError inside langchain-openai 1.2.1. Workaround: pin DeepSeek personas via BindingOverride. Tracked for v0.2.

9. Prompt Envelope

9.1 Wire Format

DEVFLOW_PROMPT_BEGIN <uuid>
Run: <run-id>
Role: <role-id>
Phase: <phase-key>
Attempt: <int>
Expected artifact: <absolute-path>
Expected schema: <schema-id>
Dedup-Key: <prompt-hash>
Instructions:
<freeform multi-line instructions>
DEVFLOW_PROMPT_END <uuid>
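
A round trip through the wire format above can be sketched with the standard library. This is an illustration under assumptions: the `Envelope` dataclass, its snake_case field names, and the `render` / `parse` helpers are hypothetical, and the parser assumes the headers always appear in the order shown.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Header labels in wire order, paired with hypothetical field names.
_HEADERS = [
    ("Run", "run_id"),
    ("Role", "role_id"),
    ("Phase", "phase_key"),
    ("Attempt", "attempt"),
    ("Expected artifact", "expected_artifact"),
    ("Expected schema", "expected_schema"),
    ("Dedup-Key", "dedup_key"),
]


@dataclass(frozen=True)
class Envelope:
    uuid: str
    run_id: str
    role_id: str
    phase_key: str
    attempt: int
    expected_artifact: str
    expected_schema: str
    dedup_key: str
    instructions: str


def render(env: Envelope) -> str:
    """Serialize an envelope between matching BEGIN/END markers."""
    lines = [f"DEVFLOW_PROMPT_BEGIN {env.uuid}"]
    lines += [f"{label}: {getattr(env, name)}" for label, name in _HEADERS]
    lines += ["Instructions:", env.instructions, f"DEVFLOW_PROMPT_END {env.uuid}"]
    return "\n".join(lines)


def parse(text: str) -> Envelope:
    """Parse the wire format; raise ValueError on any structural mismatch."""
    lines = text.split("\n")
    if len(lines) < 10:
        raise ValueError("envelope too short")
    begin = re.fullmatch(r"DEVFLOW_PROMPT_BEGIN (\S+)", lines[0])
    if begin is None or lines[-1] != f"DEVFLOW_PROMPT_END {begin.group(1)}":
        raise ValueError("envelope markers missing or uuid mismatch")
    values: dict[str, str] = {}
    for (label, name), line in zip(_HEADERS, lines[1:8]):
        prefix = f"{label}: "
        if not line.startswith(prefix):
            raise ValueError(f"expected header {label!r}")
        values[name] = line[len(prefix):]
    if lines[8] != "Instructions:":
        raise ValueError("expected 'Instructions:' line")
    return Envelope(
        uuid=begin.group(1),
        instructions="\n".join(lines[9:-1]),
        **{**values, "attempt": int(values["attempt"])},
    )
```

Requiring the END marker to repeat the BEGIN uuid lets a reader detect a truncated or interleaved envelope without inspecting the instructions.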

9.2 Schema

const PromptEnvelope = z.object({
  uuid: z.string().uuid(),
  runId: z.string().uuid(),
  roleId: z.string(),
  phaseKey: z.string(),
  attempt: z.number().int().nonnegative(),
  expectedArtifact: z.string(),
  expectedSchema: z.string(),
  dedupKey: z.string(),
  instructions: z.string(),
});

9.3 Rules

Prompt identity is dedupKey.
Adapter treats duplicate dedupKey for the same session within a run lifetime as idempotent success and does not reprocess the prompt.
attempt increments only when the engine intentionally re-sends after timeout or repair.
Adapter-level retry does not increment attempt.
Completion is never inferred from transcript text.
Completion requires a schema-valid artifact.
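
The duplicate-dedupKey rule can be illustrated with a small in-memory wrapper. This is a sketch under assumptions: `DedupingSender` and its `deliver` callback are hypothetical, and a real adapter would keep the seen keys in durable storage rather than in process memory.

```python
from __future__ import annotations

from typing import Callable


class DedupingSender:
    """Hypothetical wrapper: delivery is idempotent per (session, dedup key)."""

    def __init__(self, deliver: Callable[[str], None]) -> None:
        self._deliver = deliver
        self._seen: set[tuple[str, str]] = set()

    def send(self, session_id: str, dedup_key: str, prompt: str) -> bool:
        """Return True if delivered now, False for a duplicate (still a success)."""
        key = (session_id, dedup_key)
        if key in self._seen:
            return False  # idempotent success: the prompt is not reprocessed
        self._deliver(prompt)
        self._seen.add(key)  # recorded only after delivery did not raise
        return True
```

Because the key is recorded only after `deliver` returns, a failed delivery can be retried at the adapter level with the same dedup key and without touching attempt.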

9.4 Backend Prelude

Sent once at session bootstrap before the first envelope.

Required structure:

Backend identity statement.
Persona instructionsPrelude.
Protocol declaration: completion is signaled only by writing expected artifact files.
Envelope marker contract.
Approval/probe contract: DEVFLOW_PROBE must respond with one line READY or BUSY <reason>.

Codex and Claude-specific addenda live in packages/session/src/profiles/{codex,claude}.ts and are populated at M10.

10. Artifact Schema Registry

10.1 Layout

JSON Schema 2020-12 documents live at:

docs/schemas/artifacts/<schema_id>.json

schema_id format:

<domain>/<name>@<version>

Examples:

dev/spec@1
dev/phase-plan@1
dev/dag@1
dev/review-finding-batch@1
bt/objective@1
bt/iteration-result@1
common/final-report@1
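
A schema id can be split and mapped to its file path as follows. This is a sketch: `parse_schema_id` and `schema_path` are hypothetical helpers, and the allowed character set (lowercase letters, digits, hyphens, integer version) is an assumption inferred from the examples above rather than a stated rule.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Assumed grammar: <domain>/<name>@<version>, with an integer version.
_SCHEMA_ID = re.compile(r"(?P<domain>[a-z0-9-]+)/(?P<name>[a-z0-9-]+)@(?P<version>[0-9]+)")


def parse_schema_id(schema_id: str) -> tuple[str, str, int]:
    """Return (domain, name, version) or raise ValueError for a malformed id."""
    match = _SCHEMA_ID.fullmatch(schema_id)
    if match is None:
        raise ValueError(f"malformed schema id: {schema_id!r}")
    return match["domain"], match["name"], int(match["version"])


def schema_path(schema_id: str, root: str = "docs/schemas/artifacts") -> PurePosixPath:
    """Map a validated schema id to <root>/<schema_id>.json."""
    parse_schema_id(schema_id)  # reject malformed ids before building a path
    return PurePosixPath(root) / f"{schema_id}.json"
```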

10.2 Loader

packages/core/src/artifact-schema.ts exports:

function loadSchema(id: string): JsonSchema;
function validateArtifact(
  id: string,
  data: unknown
): { ok: true } | { ok: false; errors: ValidationError[] };

Unknown schema id is fatal.

10.3 Validation Flow

Engine waits for expectedArtifactPath to appear.
Debounce 500ms after last mtime change.
Read file.
Compute SHA-256.
Validate against expectedSchema.
Valid:
- insert artifact row with valid=true.
- append artifact.validated.
- advance phase.
Invalid:
- insert artifact row with valid=false.
- append artifact.invalid.
- trigger one repair prompt.
- after repair exhaustion, create human gate.
Timeout:
- append artifact.timeout.
- probe session.
- enter recovery flow.
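
The read, hash, and validate steps of this flow can be sketched as below. The function names, the result dictionary, and the `validate` callback (which returns a list of error strings) are assumptions made for illustration; the DB inserts, repair prompt, and timeout branch are left out.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path
from typing import Any, Callable

DEBOUNCE_SECONDS = 0.5  # 500ms after the last mtime change


def is_settled(last_mtime: float, now: float, debounce: float = DEBOUNCE_SECONDS) -> bool:
    """True once the file has been quiet for at least the debounce window."""
    return now - last_mtime >= debounce


def check_artifact(path: Path, validate: Callable[[Any], list[str]]) -> dict[str, Any]:
    """Read the artifact, hash the raw bytes, then validate the parsed JSON."""
    raw = path.read_bytes()
    digest = hashlib.sha256(raw).hexdigest()
    try:
        data = json.loads(raw)
    except ValueError as exc:  # covers JSONDecodeError and bad encodings
        errors = [f"not valid JSON: {exc}"]
    else:
        errors = validate(data)
    valid = not errors
    return {
        "sha256": digest,
        "valid": valid,
        "event": "artifact.validated" if valid else "artifact.invalid",
        "errors": errors,
    }
```

Hashing the raw bytes rather than the parsed document keeps the hash stable for the content-keyed idempotency keys in §11.1.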

10.4 Final Report

At terminal run state, write atomically:

<WORKSPACE_ROOT>/<runId>/<runId>.report.md
<WORKSPACE_ROOT>/<runId>/<runId>.report.json

Both files are written on a best-effort basis even when the run ends in failed or aborted.

common/final-report@1 minimum fields:

runId
templateHash
bindings[]
inputs
phases[]
approvals[]
findings[]
commands[]
artifacts[]
events.tail
unresolved[]
endedAt
status

10.5 Backtest Objective Stub

bt/objective@1:

{
  "targets": [
    { "metric": "sharpe", "op": "gte", "value": 1.5, "weight": 1.0 },
    { "metric": "mdd", "op": "lte", "value": 0.15, "weight": 1.0 }
  ],
  "stopWhen": "all"
}

op: gte | lte | eq | gt | lt
stopWhen: all | weighted
weighted threshold is hardcoded at 0.8 at M12.
Full DSL deferred to M12.
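
An evaluator for this stub might look like the sketch below. The meaning of weighted is not defined above, so the code assumes it means "the weight of satisfied targets divided by total weight reaches the threshold"; `objective_met` and the `metrics` mapping are hypothetical names.

```python
from __future__ import annotations

import operator
from typing import Any

_OPS = {
    "gte": operator.ge,
    "lte": operator.le,
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
}


def objective_met(
    objective: dict[str, Any],
    metrics: dict[str, float],
    weighted_threshold: float = 0.8,
) -> bool:
    """Evaluate a bt/objective@1 document against measured metrics."""
    targets = objective["targets"]
    met = [_OPS[t["op"]](metrics[t["metric"]], t["value"]) for t in targets]
    if objective["stopWhen"] == "all":
        return all(met)
    # Assumption: "weighted" compares the satisfied share of total weight
    # against the hardcoded threshold.
    total = sum(t["weight"] for t in targets)
    satisfied = sum(t["weight"] for t, ok in zip(targets, met) if ok)
    return satisfied / total >= weighted_threshold
```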

11. Run Events

Closed event types:

run.created
run.started
run.paused
run.resumed
run.completed
run.failed
run.aborted
phase.started
phase.completed
phase.failed
phase.skipped
prompt.sent
prompt.repaired
artifact.expected
artifact.validated
artifact.invalid
artifact.timeout
approval.requested
approval.resolved
session.created
session.ready
session.busy
session.idle
session.crashed
session.recovered
session.failed
command.started
command.completed
command.failed
review.batch_recorded
finding.verifier_resolved
backtest.iteration_started
backtest.iteration_completed
backtest.objective_evaluated

11.1 Idempotency Keys

Every event append requires deterministic idempotency_key.

Event family	Key formula
`run.created`, `run.started`, `run.completed`, `run.failed`, `run.aborted`	`<type>:<run_id>`
`run.paused`	`run.paused:<run_id>:<cause>`
`run.resumed`	`run.resumed:<run_id>:<cause>`
`phase.started`, `phase.completed`, `phase.failed`, `phase.skipped`	`<type>:<phase_id>:<phase_attempt>`
`prompt.sent`, `prompt.repaired`	`<type>:<prompt_dedup_key>`
`artifact.expected`, `artifact.timeout`	`<type>:<phase_id>:<phase_attempt>:<expected_path>`
`artifact.validated`, `artifact.invalid`	`<type>:<phase_id>:<expected_path>:<artifact_hash>`
`approval.requested`	`approval.requested:<approval_idempotency_key>`
`approval.resolved`	`approval.resolved:<approval_request_id>:<action>`
`session.created`, `session.failed`	`<type>:<session_id>`
`session.busy`, `session.idle`	`<type>:<session_id>:<prompt_dedup_key>`
`session.ready`, `session.crashed`, `session.recovered`	`<type>:<session_id>:<recovery_attempts>`
`command.started`, `command.completed`, `command.failed`	`<type>:<command_id>`
`review.batch_recorded`	`review.batch_recorded:<phase_id>:<reviewer_role>:<phase_attempt>`
`finding.verifier_resolved`	`finding.verifier_resolved:<finding_id>`
`backtest.iteration_started`, `backtest.iteration_completed`, `backtest.objective_evaluated`	`<type>:<iteration_id>`

Definitions:

phase_attempt is incremented before event append.
recovery_attempts is incremented before event append.
prompt_dedup_key is the envelope dedup key.
approval_idempotency_key is from approval_requests.
Artifact expected/timeout events are per-attempt.
Artifact validated/invalid events are content-keyed by path + hash.
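
The key formulas in the table all join the event type with a fixed list of parts, which a sketch can capture in one helper. `event_key` and the in-memory `EventLog` are hypothetical; the real store would enforce uniqueness on run_events.idempotency_key in the database.

```python
from __future__ import annotations


def event_key(event_type: str, *parts: object) -> str:
    """Build '<type>:<part>:<part>...' from the event type and its key parts."""
    return ":".join([event_type, *(str(p) for p in parts)])


class EventLog:
    """In-memory stand-in for an append that is unique per idempotency key."""

    def __init__(self) -> None:
        self._by_key: dict[str, dict] = {}

    def append(self, event_type: str, key: str, payload: dict | None = None) -> bool:
        """Return True if appended, False if the key was already recorded."""
        if key in self._by_key:
            return False
        self._by_key[key] = {"type": event_type, "payload": payload or {}}
        return True

    def __len__(self) -> int:
        return len(self._by_key)
```

For example, a retried append of run.paused with the same run id and cause produces the same key and is dropped as a duplicate.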

12. Fake Session Adapter

12.1 Behavior

Deterministic.
In-process.
No PTY.
No tmux.
Drives engine end-to-end without real backends.

12.2 Sentinel Triggers

On sendPrompt, inspect expectedSchema.

Fixture path:

tests/fixtures/fake-artifacts/<expectedSchema>/<scenarioName>.json

scenarioName comes from instruction header:

Scenario: <name>

Default scenario: ok.

Scenarios:

ok: write fixture to expectedArtifactPath after 50ms by default.
invalid: write deliberately schema-invalid payload.
timeout: never write.
crash: throw RecoverableError.
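
Scenario selection and fixture lookup can be sketched as below. The helper names are hypothetical, and the sketch assumes the Scenario header sits on a line of its own inside the instructions.

```python
from __future__ import annotations

import re

KNOWN_SCENARIOS = frozenset({"ok", "invalid", "timeout", "crash"})
_SCENARIO_LINE = re.compile(r"^Scenario:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def scenario_name(instructions: str) -> str:
    """Read 'Scenario: <name>' from the instructions; default to 'ok'."""
    match = _SCENARIO_LINE.search(instructions)
    name = match.group(1) if match else "ok"
    if name not in KNOWN_SCENARIOS:
        raise ValueError(f"unknown scenario: {name}")
    return name


def fixture_path(
    expected_schema: str,
    name: str,
    root: str = "tests/fixtures/fake-artifacts",
) -> str:
    """Build <root>/<expectedSchema>/<scenarioName>.json."""
    return f"{root}/{expected_schema}/{name}.json"
```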

12.3 Transcript

Fake adapter emits chunks such as:

[fake] received prompt <uuid>; will write <path> in 50ms

13. State Machines

13.1 Run State

States:

created
bound
planning
awaiting_approval
executing
paused
completed
failed
aborted

Transitions:

From	Trigger	To	Side effects
`created`	`lockBindings ok`	`bound`	persist bindings; emit `run.started`
`created`	`lockBindings fail`	`failed`	emit `run.failed`
`bound`	phase plan needed	`planning`	emit `phase.started`
`planning`	plan artifact valid	`awaiting_approval`	request approval
`awaiting_approval`	approve	`executing`	emit `approval.resolved`, `run.resumed`
`awaiting_approval`	reject	`failed`	emit `run.failed`
`awaiting_approval`	request_changes	`planning`	increment phase attempts
`awaiting_approval`	timeout	`paused`	set `paused_from_state='awaiting_approval'`
`executing`	phase ok, more phases	`executing`	next phase
`executing`	normal workflow approval gate	`awaiting_approval`	request gate
`executing`	all phases done	`completed`	emit `run.completed`, write final report
`executing`	unrecoverable error	`failed`	emit `run.failed`
`executing`	manual `pauseRun`	`paused`	set `paused_from_state='executing'`
`planning`	manual `pauseRun`	`paused`	set `paused_from_state='planning'`
`paused`	resume	`paused_from_state`	emit `run.resumed`, clear `paused_from_state`
any non-terminal state	`abortRun`	`aborted`	emit `run.aborted`, dispose sessions

Non-terminal states for abortRun:

created
bound
planning
awaiting_approval
executing
paused
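
The run transition table can be expressed as a lookup plus three special cases (pause, resume, abort). This is a sketch: the trigger labels such as `lock_bindings_ok` are invented for illustration, and side effects (events, reports, session disposal) are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass

TERMINAL = frozenset({"completed", "failed", "aborted"})

# (state, trigger) -> next state. Trigger names are illustrative labels.
TRANSITIONS = {
    ("created", "lock_bindings_ok"): "bound",
    ("created", "lock_bindings_fail"): "failed",
    ("bound", "plan_needed"): "planning",
    ("planning", "plan_valid"): "awaiting_approval",
    ("awaiting_approval", "approve"): "executing",
    ("awaiting_approval", "reject"): "failed",
    ("awaiting_approval", "request_changes"): "planning",
    ("executing", "phase_ok"): "executing",
    ("executing", "approval_gate"): "awaiting_approval",
    ("executing", "all_done"): "completed",
    ("executing", "unrecoverable_error"): "failed",
}

# Triggers that pause a run, with the states they are valid from.
PAUSE_TRIGGERS = {
    "approval_timeout": frozenset({"awaiting_approval"}),
    "pause_run": frozenset({"planning", "executing"}),
}


@dataclass
class Run:
    state: str = "created"
    paused_from_state: str | None = None


def apply(run: Run, trigger: str) -> str:
    """Apply one trigger in place and return the new state."""
    if trigger == "abort_run":
        if run.state in TERMINAL:
            raise ValueError(f"cannot abort terminal run in {run.state}")
        run.state = "aborted"
    elif trigger in PAUSE_TRIGGERS:
        if run.state not in PAUSE_TRIGGERS[trigger]:
            raise ValueError(f"{trigger} not allowed from {run.state}")
        run.paused_from_state, run.state = run.state, "paused"
    elif trigger == "resume":
        if run.state != "paused" or run.paused_from_state is None:
            raise ValueError("resume requires a paused run")
        run.state, run.paused_from_state = run.paused_from_state, None
    else:
        try:
            run.state = TRANSITIONS[(run.state, trigger)]
        except KeyError:
            raise ValueError(f"{trigger} not allowed from {run.state}") from None
    return run.state
```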

13.2 Run Phase State

States:

pending
running
awaiting_artifact
validating
awaiting_approval
completed
failed
skipped

Transitions:

From	Trigger	To
`pending`	start	`running`
`running`	prompt sent, artifact expected	`awaiting_artifact`
`awaiting_artifact`	artifact appears	`validating`
`awaiting_artifact`	timeout	`running` after probe/repair, or `failed` after exhaustion
`validating`	valid	`awaiting_approval` if gate, else `completed`
`validating`	invalid	`running` after one repair, else `failed`
`awaiting_approval`	approve	`completed`
`awaiting_approval`	reject / abort	`failed`
`awaiting_approval`	request_changes	`running`, attempt + 1

Replay rules:

phase.started.payload.repair === true marks that attempt as the single allowed repair attempt. Replaying that attempt MUST use repair instructions, prompt.repaired, and must not start a third attempt.
Repair replay from running may reuse an existing READY / bootstrapped session even if last_prompt_hash still contains the previous attempt's prompt hash; current-attempt prompt send has not happened yet.
If phase state is running, existing artifact files are never accepted unless the current prompt event (prompt.sent or prompt.repaired) for the current dedup key is already recorded. Replay without prompt proof treats existing files as stale.
If phase state is running, session state is BUSY, and last_prompt_hash matches the current prompt but the matching prompt event is missing, replay waits for the artifact with the current file signature as the baseline. This preserves idempotency without validating a stale pre-existing artifact.
Baseline-protected waits must not synthesize durable prompt proof before the wait finishes. If replay crashes or is cancelled before validation, the next replay must still treat the existing artifact as baseline/stale unless real prompt proof already exists.
If phase state is validating and no artifact row exists yet, replay re-reads and validates the current expectedArtifactPath instead of treating the state as corruption.
If phase state is validating and artifact rows already exist for the same phase/path/schema, replay may reuse only an artifact row created at or after the current session last_prompt_at; older rows are treated as stale previous-attempt outputs and the file is revalidated.
Session bootstrap DB row/state changes and session.created / session.ready events are written in one DB transaction after adapter start succeeds.

14. Approval State

States:

pending
approved
rejected
changes_requested
aborted
paused

14.1 Transitions

From	Event	To	Side effects
`pending`	approve decision	`approved`	insert decision row
`pending`	reject decision	`rejected`	insert decision row; run -> `failed`
`pending`	request_changes decision	`changes_requested`	insert decision row; increment attempt
`pending`	abort decision	`aborted`	insert decision row; run -> `aborted`
`pending`	timeout	`paused`	run -> `paused`; no decision row
`paused`	unpause	`pending`	re-arm gate; no decision row
terminal states	any decision	unchanged	return 409

Rules:

A pending request can transition to one non-pending state per pending epoch.
Terminal approval states reject further decisions.
paused may return to pending only through unpause.
Manual pause is run-level pauseRun; it leaves approval gate in pending.
Only approve, reject, request_changes, and abort create approval_decisions rows.
Default timeout is null.
Timeout never auto-approves or auto-rejects.

14.2 Decision Idempotency

GUI:
- UUIDv4 per click.
- reused across automatic UI retries for the same logical action.
CLI:
- UUIDv4 per invocation.
- --client-token=<uuid> override for scripted retry.
API:
- existing (approval_request_id, action, client_token) returns existing row with status 200.
- new decision inserts row and returns 201.
- same token with different action returns 409.
- decision on non-pending request returns 409.
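
The API rules above imply a check order: exact replay first, then token reuse, then request state. A minimal in-memory sketch, with a hypothetical `ApprovalRequest` holder and a `decide` function that returns the HTTP status:

```python
from __future__ import annotations

from dataclasses import dataclass, field

_NEXT_STATE = {
    "approve": "approved",
    "reject": "rejected",
    "request_changes": "changes_requested",
    "abort": "aborted",
}


@dataclass
class ApprovalRequest:
    state: str = "pending"
    decisions: dict[str, str] = field(default_factory=dict)  # client_token -> action


def decide(req: ApprovalRequest, action: str, client_token: str) -> int:
    """Return the HTTP status for a decision attempt on one approval request."""
    previous = req.decisions.get(client_token)
    if previous == action:
        return 200  # exact replay: return the existing row
    if previous is not None:
        return 409  # same token, different action
    if req.state != "pending":
        return 409  # request already left pending
    req.decisions[client_token] = action
    req.state = _NEXT_STATE[action]
    return 201
```

The replay check has to run before the state check; otherwise a retried approve would see the request as non-pending and return 409 instead of 200.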

14.3 Destructive Command Enforcement

Devflow-direct commands have hard enforcement. TUI-agent commands have best-effort enforcement.

Hard-blocked Devflow-direct patterns:

rm -rf
git reset --hard
git clean
git push --force
git push --force-with-lease
git worktree remove --force
git branch -D
docker volume rm
docker compose down -v
DROP DATABASE
DROP SCHEMA
migration rollback
reads/writes touching .env*, ~/.ssh/, ~/.aws/, ~/.config/gcloud/, ~/.kube/
files matching *token*, *secret*, *credentials*, *.pem, *.key
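
A naive version of this check can be sketched with substring and glob matching. It is illustrative only: `blocked_reason` is a hypothetical helper, it covers the command and file-name patterns but not the home-directory paths or migration rollback, and plain substring matching misses variants such as reordered flags.

```python
from __future__ import annotations

import fnmatch

# Case-sensitive shell fragments, so `git branch -d` is not confused with `-D`.
SHELL_PATTERNS = (
    "rm -rf",
    "git reset --hard",
    "git clean",
    "git push --force",  # also covers --force-with-lease
    "git worktree remove --force",
    "git branch -D",
    "docker volume rm",
    "docker compose down -v",
)
# SQL keywords are matched case-insensitively.
SQL_PATTERNS = ("DROP DATABASE", "DROP SCHEMA")
# File-name globs from the list above.
FILE_GLOBS = (".env*", "*token*", "*secret*", "*credentials*", "*.pem", "*.key")


def blocked_reason(command: str) -> str | None:
    """Return the first matching pattern, or None if nothing matches."""
    normalized = " ".join(command.split())  # collapse runs of whitespace
    for pattern in SHELL_PATTERNS:
        if pattern in normalized:
            return pattern
    upper = normalized.upper()
    for pattern in SQL_PATTERNS:
        if pattern in upper:
            return pattern
    for word in normalized.split(" "):
        name = word.strip("'\"").rsplit("/", 1)[-1]  # final path component
        for glob in FILE_GLOBS:
            if fnmatch.fnmatchcase(name, glob):
                return glob
    return None
```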

TUI-agent command enforcement is best-effort:

Prelude prohibits destructive operations.
Backend permission mode is set to safest available mode.
Transcript audit captures post-hoc evidence.
Human intervention goes through devflow attach.
Worktrees and branches are preserved by default.

v1 does not claim real-time blocking of TUI-internal commands.

15. Run Engine and Temporal Contract

The M4 RunEngine contract is frozen before M5. M5 reimplements the same interface through Temporal.

15.1 Public API

interface RunEngine {
  startRun(input: RunStartInput): Promise<{ runId: string }>;
  signalApproval(
    runId: string,
    approvalRequestId: string,
    action: ApprovalDecisionAction,
    clientToken: string,
    comment?: string
  ): Promise<void>;
  pauseRun(runId: string): Promise<void>;
  resumeRun(runId: string): Promise<void>;
  abortRun(runId: string, reason: string): Promise<void>;
  getStatus(runId: string): Promise<RunStatus>;
}

15.2 Temporal Shape

Namespace: devflow.
Task queue: devflow-runs.
Single worker process: apps/worker.
Workflow: runWorkflow(input: RunStartInput).
Signals:
- approve
- pause
- resume
- abort
- unpause
No Updates in M5.
Status is read from DB.

Activities:

M5 compatibility activity surface:
- prepareRunActivity(input)
- lockBindingsActivity(runId)
- failRunActivity(runId, reason)
- advanceRunActivity(runId)
- signalApprovalActivity(runId, approvalRequestId, action, clientToken, comment?)
- pauseRunActivity(runId)
- resumeRunActivity(runId)
- abortRunActivity(runId, reason)
- getStatusActivity(runId)
- isRunTerminalActivity(runId)
- composeFinalReportActivity(runId)
advanceRunActivity is the M5 parity wrapper over M4 phase advancement. It may internally perform prompt send, artifact wait/validation, event recording, and approval request creation through the same DB/idempotency contracts already locked in sections 8 through 14.
The granular activity split (sendPromptToSession, waitForArtifact, validateArtifact, recordEvent, requestApproval, runCommand) is deferred to a later hardening ADR. It is not an M5 acceptance gate.
Prompt/session mutation still occurs only inside worker-hosted activities through SessionManager. M5+ API code never mutates SessionAdapter directly.

Retry policy:

Default: max attempts 3, exponential backoff start 1s, max 30s.
composeFinalReportActivity: max attempts 1.
Activity-level failures serialize DevflowError; non-recoverable Devflow errors are rethrown as non-retryable Temporal failures.
advanceRunActivity is cancellation-aware and idempotent by DB state, event idempotency keys, prompt dedup keys, and artifact content keys.
Already-applied approval signal replay repairs missing final reports for every terminal run state: completed, failed, and aborted, regardless of whether the replayed approval action was approve, request_changes, reject, or abort.
API-side already-applied approval replay is report-repair only. It must not call SessionAdapter mutation methods; reject/abort session disposal belongs to the worker/session-manager path that originally applies the decision.
If a workflow closes before the API observes an approval signal result, closed-workflow settlement must first verify the requested decision was applied, then replay approval side effects, then wait for the terminal report.

15.3 Hard Constraints

Workflow code holds only serializable state.
No tmux handles in workflow state.
No PTY refs in workflow state.
No DB clients in workflow state.
M5+ session interaction happens through activities calling SessionManager in apps/worker.
M5+ API never calls mutating SessionAdapter methods.
SessionManager advisory lock prevents API/worker ownership conflict during M4 -> M5 transition.
Workflow code uses deterministic clock/randomness only.

16. WriteSet and Worktree

16.1 WriteSet

Each task declares writeSet: string[].
Patterns are relative to repo root.
Glob engine: fast-glob.
Options:

{
  cwd: worktreeRoot,
  dot: true,
  followSymbolicLinks: false,
  onlyFiles: true,
  suppressErrors: false
}

Conflict detection:

Expand writeSets.
Forbidden globs cause conflict if matched by more than one task:
- pnpm-lock.yaml
- package-lock.json
- **/migrations/**
- **/*.generated.*
- root tsconfig*.json
- biome.json
- lefthook.yml
- .github/**
- .gitlab-ci.yml
Pairwise file intersections must be empty.

Conflict creates parallel_dag_approved gate.
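
Conflict detection over already-expanded writeSets can be sketched as below. The glob expansion itself is not shown; `forbidden_label` approximates the forbidden globs with path checks instead of a real glob engine, and both function names and the result shape are assumptions.

```python
from __future__ import annotations

from itertools import combinations
from pathlib import PurePosixPath

_ROOT_FILES = {"pnpm-lock.yaml", "package-lock.json", "biome.json", "lefthook.yml", ".gitlab-ci.yml"}


def forbidden_label(path: str) -> str | None:
    """Approximate the forbidden globs; return the matching label or None."""
    parts = PurePosixPath(path).parts
    name = parts[-1]
    if len(parts) == 1 and name in _ROOT_FILES:
        return name
    if len(parts) == 1 and name.startswith("tsconfig") and name.endswith(".json"):
        return "tsconfig*.json"
    if "migrations" in parts[:-1]:
        return "**/migrations/**"
    if ".generated." in name:
        return "**/*.generated.*"
    if len(parts) > 1 and parts[0] == ".github":
        return ".github/**"
    return None


def find_conflicts(expanded: dict[str, set[str]]) -> dict[str, dict]:
    """Report shared files per task pair and forbidden globs hit by 2+ tasks."""
    shared: dict[tuple[str, str], list[str]] = {}
    for a, b in combinations(sorted(expanded), 2):
        common = sorted(expanded[a] & expanded[b])
        if common:
            shared[(a, b)] = common
    tasks_by_label: dict[str, set[str]] = {}
    for task, files in expanded.items():
        for path in files:
            label = forbidden_label(path)
            if label is not None:
                tasks_by_label.setdefault(label, set()).add(task)
    forbidden = {k: sorted(v) for k, v in tasks_by_label.items() if len(v) > 1}
    return {"shared_files": shared, "forbidden": forbidden}
```

Two tasks that each add a different migration file share no path, yet they still conflict through the forbidden rule, which is why it is checked separately from the pairwise intersection.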

16.2 Worktree Lifecycle

Worktree root:
- WORKSPACE_ROOT/<runId>/<laneId>
- non-parallel main lane: WORKSPACE_ROOT/<runId>/main
Created via git worktree add.
Branch name:

devflow/<runId>/<laneId>

Terminal run state does not remove worktrees or branches.
Output branches are deliverables.
Disk growth is accepted.
Cleanup is manual:

devflow cleanup <run-id> [--lane=<id>]

Cleanup:

uses git worktree remove without --force by default.
refuses dirty worktrees.
--force requires an additional gate.
git branch -D is destructive and gated.
doctor --list-orphans lists only; it never removes.

17. SSE Contract

Endpoints:

GET /sse/runs/:runId
GET /sse/global

Heartbeat every 15 seconds.

Events:

Event	Scope
`run.state_changed`	both
`run.event_appended`	run
`phase.state_changed`	run
`approval.created`	both
`approval.resolved`	both
`session.state_changed`	run
`transcript.chunk_appended`	run
`artifact.validated`	run

Reconnect:

Run-scoped /sse/runs/:runId:
- Last-Event-ID is last run_events.seq for that run.
- server replays run.event_appended for seq > lastSeq.
- derived SSE types (everything other than run.event_appended) are not replayed for historical rows; clients re-derive state by fetching.
Global /sse/global:
- Last-Event-ID is last global run_events.id, because run_events.seq is only monotonic within a run.
- fresh connects start at the latest global event id and emit only new summary events.
- reconnects replay rows with id > lastId.
- global stream emits only scope=both events: run.state_changed, approval.created, approval.resolved.
- global stream never emits run.event_appended.
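
The two reconnect cursors can be sketched as filters over run_events rows. The row shape (dicts with id, run_id, seq) and the function names are assumptions; how global summary events are derived from rows is not shown.

```python
from __future__ import annotations

# SSE types with scope=both, the only ones the global stream may emit.
GLOBAL_TYPES = frozenset({"run.state_changed", "approval.created", "approval.resolved"})


def replay_run_stream(rows: list[dict], run_id: str, last_seq: int) -> list[tuple[str, int]]:
    """Frames for /sse/runs/:runId: run.event_appended for seq > Last-Event-ID."""
    pending = [r for r in rows if r["run_id"] == run_id and r["seq"] > last_seq]
    return [("run.event_appended", r["seq"]) for r in sorted(pending, key=lambda r: r["seq"])]


def replay_global_ids(rows: list[dict], last_id: int) -> list[int]:
    """Row ids for /sse/global to replay: global id > Last-Event-ID, ascending."""
    return sorted(r["id"] for r in rows if r["id"] > last_id)


def is_global_event(sse_type: str) -> bool:
    """True if the SSE type is allowed on the global stream."""
    return sse_type in GLOBAL_TYPES
```

The run stream keys on seq because it is monotonic within one run, while the global stream needs the table-wide id to order rows across runs.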

18. Errors

v4: my_deepagent.errors.MyDeepAgentError (replaces v3 DevflowError 1:1):

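from enum import StrEnum  # requires Python 3.11+
from uuid import UUID


class ErrorClass(StrEnum):
    RECOVERABLE = "recoverable"
    HUMAN_REQUIRED = "human_required"
    FATAL = "fatal"


class MyDeepAgentError(Exception):
    error_class: ErrorClass  # drives the mapping rules at the end of this section
    code: str  # one of the codes listed below, e.g. "rate_limited"
    run_id: UUID | None
    phase_id: UUID | None
    recovery_hint: str | None
    cause: BaseException | None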

Recoverable:

network_blip
pane_briefly_unresponsive
prompt_send_transient
db_serialization_retry
rate_limited

Human required:

artifact_invalid_after_repair
artifact_timeout_exhausted
prompt_send_exhausted
destructive_command_blocked
secret_access_blocked
backend_unavailable
no_eligible_persona
writeset_conflict
merge_conflict
objective_not_met
review_dispute_unresolved
backend_auth_failed
model_unavailable
token_budget_exceeded (v4 r1: BudgetTracker rejects a call whose estimated cost would breach the per-run, per-day, or per-persona-daily cap with on_hit=block.)
tool_quota_exceeded (v4 r1: SafetyShellMiddleware blocked a tool call due to deny-path / destructive-command policy, or a per-phase tool-call cap was hit.)

Fatal:

db_unreachable
workspace_permissions
internal_state_corruption
template_load_failed
artifact_schema_unknown
artifact_schema_load_failed
migration_pending
config_invalid

Mapping:

recoverable -> retry; exhausted -> human_required.
human_required / recovery gate -> run paused and gate created. This is distinct from normal workflow approval gates in §13.1, which use awaiting_approval.
fatal -> run failed, sessions disposed, final report best-effort.

19. Concurrent Runs and Crash Recovery

19.1 Active Run Uniqueness

MAX_CONCURRENT_RUNS, default 4.
DB partial unique index is the source of truth:
- one active run per (repo_path, base_branch).
repo_path is canonicalized before insert.
Advisory lock is auxiliary only:

pg_try_advisory_xact_lock(hash64('devflow:start-run', repoPath, baseBranch))

Unique-index violation returns:

{ "currentRunId": "...", "currentState": "..." }

with HTTP 409.

19.2 Crash Recovery

M4, no Temporal:

On apps/api startup, sweep non-terminal runs.
Mark them failed.
final_report_path = null.
Append synthesized run.failed with reason process_restart_unrecovered.
Cascade associated tui_sessions to FAILED_NEEDS_HUMAN.
Append session.failed.
This frees active-run uniqueness slots.

M5+:

No sweep.
Temporal durability owns in-flight workflow recovery.
SessionManager resumes tmux sessions.
Active-run partial index blocks duplicate runs until completion or explicit abort.

20. Milestones

M1: Monorepo + Postgres + CLI Doctor

Scaffold workspace.
Add pnpm, tsconfig, biome, lefthook, Vitest.
Add Docker Compose for Postgres.
Add Drizzle and first migration.
Add devflow doctor.
Implement checks 1-9.
Stub checks 10-12 as warn where needed.
Add SSE compatibility smoke test:
- minimal Fastify 5 server.
- fastify-sse-v2 plugin.
- 30-second integration test.
- receive 3 events and reconnect.
- if plugin fails, implement native reply.raw SSE helper before M1 is green.

M2: Core Schema + Registry + Binding

Implement enums.
Implement canonical hashing.
Implement Template schema.
Implement Persona schema.
Implement seed loader.
Implement binding algorithm.
Implement artifact schema registry.
Add first schemas:
- dev/spec@1
- dev/phase-plan@1
- common/final-report@1
Tests:
- schema validation.
- override semantics.
- risk enforcement.
- diversity enforcement.
- deterministic auto-select.

M3: Fake Session Runtime

Implement SessionAdapter.
Implement FakeSessionAdapter.
Implement prompt envelope.
Implement event recorder.
Implement fake sentinel scenarios.
Persist transcript chunks.
Tests:
- prompt correlation.
- artifact validation.
- invalid artifact.
- timeout.
- fake crash.

M4: Minimal Run Engine

Implement packages/run-engine.
Used directly by apps/api.
No Temporal.
Supports:
- start run.
- lock bindings.
- approval.
- fake prompt.
- artifact wait/validate.
- final report.
Freeze the RunEngine contract.
Full fake development@1 minus reviewers.

M5: Temporal Integration

Reimplement RunEngine through Temporal.
Preserve M4 behavior.
Add parity tests using the same M4 scenarios.
M5+ SessionManager lives in apps/worker.

M6: Real tmux SessionManager

Implement TmuxSessionAdapter.
Decoupled from M5.
May begin after M3 is stable.
Pre-M5 real tmux is opt-in smoke only.
Production run path remains fake until both M5 and M6 are green.

M7: TUI Recovery State Machine

Implement session state transitions.
Implement recovery counters.
Implement escalation to human gates.

M8: API + GUI Minimum

Implement Fastify routes.
Implement SSE.
Implement GUI screens:
- Dashboard.
- Templates.
- Personas.
- New Run.
- Run Detail.
- Approvals.
- TUI Sessions.

M9: `development@1` Fake-Agent Full Run

Add curated development@1.
Add review consensus.
Add verifier flow with fake reviewers.
Add coverage gate >=70% lines for core/session/run-engine.

M10: Codex/Claude Opt-In Real Run

Implement profiles:
- packages/session/src/profiles/codex.ts
- packages/session/src/profiles/claude.ts
Real backends become production-default only after both M5 and M6 are green.
Until then real tmux/Codex/Claude are developer-flagged opt-in smoke only.

M11: Parallel Lanes

Add task DAG scheduler.
Add writeSet detection.
Add per-lane worktrees.
Add merge coordinator.
Add conflict gates.

M12: Backtest Workflow

Add backtest-strategy@1.
Add objective evaluator.
Add metric parser extension points.
Add failure mining artifacts.
Add Backtest Lab GUI.

M13: Template Factory

Generate draft template from natural language and repo discovery.
Add harness design.
Add template review.
Add dry-run and promote flow.

21. Out of Scope

Authentication.
Authorization.
Multi-user support.
Data retention or archival policy.
Observability dashboards.
Remote template/persona registries.
Multi-machine deployment.
HA.
Managed backups.
Web ingress.
TLS.
Reverse proxy.

22. Decision Log

Open Questions Closed

#	Question	Resolution
OQ-1	Persona/template seeding format	Immutable YAML at `docs/schemas/{personas,templates}/<name>@<version>.yaml`
OQ-2	Approval timeout default	`null`; timeout freezes only
OQ-3	Final report format	Markdown and JSON
OQ-4	Temporal namespace/queue	namespace `devflow`, task queue `devflow-runs`
OQ-5	WriteSet glob engine	`fast-glob`
OQ-6	Backtest objective DSL	Stub in M12, full DSL deferred
OQ-7	Codex/Claude prompt prelude	Structure locked, exact text deferred to M10

Blocking Corrections Applied

#	Issue	Resolution
CC-1	Terminal state deleted worktrees/branches	Preserve by default; manual gated cleanup only
CC-2	SessionManager location conflict	M4 API, M5+ worker
CC-3	Event duplicates under retry	`run_events.idempotency_key`
CC-4	Destructive command enforcement overclaimed	Devflow-direct hard, TUI best-effort
CC-5	UUID extension missing	`CREATE EXTENSION IF NOT EXISTS pgcrypto`
CC-6	Advisory lock not enough for active-run uniqueness	partial unique index
CC-7	Undefined transition sequence in event keys	cause-based keys
CC-8	Approval paused transition missing	explicit approval transition table
CC-9	AutoSelect order nondeterministic	deterministic sort
CC-10	SSE plugin compatibility assumed	M1 smoke + native fallback
CC-11	ApprovalAction included pause	split `ApprovalDecisionAction`; `pauseRun` is run-level
CC-12	Artifact hash key collision	include phase id and path
CC-13	Resume previous state not stored	`runs.paused_from_state`
CC-14	repo path aliasing	canonical realpath storage
CC-15	M4 sweep left tmux sessions ambiguous	cascade session state to `FAILED_NEEDS_HUMAN`; real tmux production-default only after M5+M6
CC-16	Prompt hash used phaseId but envelope uses phaseKey	prompt hash uses phaseKey
CC-17	abortRun transition too narrow	abort from any non-terminal run state
CC-18	approval pending transition wording conflicted with pause epoch	pending can transition once per pending epoch; paused may unpause to pending
CC-19	`tsc -b --noEmit` is brittle with TypeScript 5.6 project references on clean worktrees	build still uses `tsc -b`; no-emit verification uses root `tsconfig.typecheck.json`
CC-20	`sendPrompt` retry count was ambiguous against Temporal activity attempts	§8.3 now states retry budget means initial attempt plus retries; §15.2 remains Temporal-level attempts only
CC-21	Duplicate prompt dedup handling conflicted with adapter retry idempotency	duplicate `dedupKey` returns idempotent success without reprocessing
CC-22	Normal workflow approval gates and human-required recovery gates were easy to conflate	§13.1 names normal workflow gates; §18 keeps human_required recovery gates paused
CC-23	Phase start and event append could diverge under retry/error	phase start and `phase.started` append occur in one DB transaction
CC-24	Repair attempt replay lost repair prompt identity and one-repair budget	repair attempts are derived from `phase.started.payload.repair`, replay uses repair instructions and `prompt.repaired`, and cannot start attempt 3
CC-25	`validating` replay failed if crash happened before artifact row insert	replay revalidates the expected artifact file when state is `validating` but no artifact row exists
CC-26	Session bootstrap state/events could diverge	session row/state and `session.created` / `session.ready` events are committed in one DB transaction
CC-27	`validating` replay could reuse stale previous-attempt artifact rows	artifact-row replay requires `artifact.created_at >= tui_sessions.last_prompt_at`; otherwise the file is revalidated
CC-28	repair `running` replay rejected existing READY sessions with previous attempt prompt hash	current-attempt repair prompt is considered unsent, so replay may reuse the session and send `prompt.repaired`
CC-29	API Temporal approval replay omitted M4 approval side-effect repair	API approval signal reader now wires `replayAppliedApprovalSideEffects`, so already-applied terminal approval replays can repair missing final reports
CC-30	`running` replay could validate stale artifacts without prompt proof	`running` replay requires matching prompt event proof; BUSY replay without prompt event uses current artifact signature as baseline and ignores stale files
CC-31	M5 activity list over-specified granular activities not implemented by the M4 parity adapter	M5 locks the compatibility activity wrapper surface; granular activity split is deferred to a later hardening ADR
CC-32	Already-applied `approve` / `request_changes` replay repaired missing reports for `completed` / `failed` but missed `aborted`	approval replay side-effect repair now composes missing final reports for all terminal states
CC-33	API-side already-applied `reject` / `abort` replay tried to dispose sessions through DB-only replay validation runtime	API replay side effects are report-repair only; worker-side decision application owns session disposal
CC-34	Closed-workflow approval settlement waited for reports but did not replay approval side effects	settlement now verifies the requested decision, replays side effects, then waits for the terminal report
CC-35	Baseline-protected BUSY replay recorded synthetic prompt proof before the baseline wait was durable	baseline replay no longer records synthetic prompt events; replay without real prompt proof keeps treating existing files as stale
CC-36	SSE reconnect wording used per-run `seq` for global stream even though `seq` is not globally monotonic	`/sse/runs/:runId` uses per-run `seq`; `/sse/global` uses global `run_events.id` and emits only scope=`both` summary events
CC-37	Run SSE replay could emit historical derived events after the first page	run SSE drains historical rows up to a high-water `seq` with only `run.event_appended`, then switches to live derived events
CC-38	Normal phase start changed run state to `planning` / `executing` without a summary event source	`phase.started` payload includes `runState`; SSE derives `run.state_changed` from that live event
CC-39	No OpenRouter HTTP backend; users cannot pick cost-tuned per-persona models	add `openrouter` to Backend enum; HTTP `OpenRouterAdapter` in §8.5; persona `modelConfig.model` requirement; doctor check 13; new error codes `rate_limited`, `backend_auth_failed`, `model_unavailable` (final v3 entry — v4 reinterprets the OpenRouter integration as the deepagents-driven session adapter; the standalone HTTP `OpenRouterAdapter` from CC-39 is superseded by DR-1)

Decision Records (v4)

ID	Decision	Rationale	Impact
DR-1	v3 → v4 major bump: delete TS monorepo, rewrite in Python on LangChain `deepagents`.	(1) Claude/Anthropic direct API cost is prohibitive for a single-user toolchain. (2) OpenRouter cost-tuned models (DeepSeek, etc.) require a multi-turn, tool-using agent harness; `deepagents` is Python-only with no 1:1 TS port. (3) Switching languages is shorter than reimplementing the harness.	Step 0-purge (commit `0e61b2d`) deleted `apps/`, `packages/`, `tests/`, `scripts/`, and the pnpm/tsconfig metadata. The Python rewrite lives at `my-deepagent/` and reached Step 15 (real OpenRouter E2E PASS, ~$0.05/run) before the v3 codebase was removed. CC-39's separate `OpenRouterAdapter` is replaced by `my_deepagent.session.build_agent` (deepagents 0.6.1 with LocalShellBackend + SafetyShellMiddleware). v3 CC counters frozen; v4 begins its own series. Recovery: `git checkout pre-python-rewrite -- <path>`.

Future Open Questions

FOQ-1, M12: full backtest objective DSL.
FOQ-2, M13: template factory generation prompts.
FOQ-3, post-M10: optional third backend such as Gemini.
FOQ-4, post-M8: WebSocket vs SSE if transcript pressure requires it.

23. Kickoff Order

v3 historical order (TS, completed up to M8 before the v4 pivot):

M1.1: repo + pnpm + tsconfig + biome + lefthook + vitest workspace.
M1.2: docker-compose + Postgres healthcheck + drizzle-kit + first migration.
M1.3: apps/cli skeleton + devflow doctor.
M1.4: packages/core skeleton with config, enums, errors, hash, prompt-envelope, run-event types.
M2.1: Zod schemas for Template/Persona, persona YAML loader, hashing.
M2.2: Binding algorithm + tests.
M2.3: Artifact schema registry + first three schemas.
M3.1: SessionAdapter interface + FakeSessionAdapter.
M3.2: Transcript chunk capture + DB persistence.
M3.3: engine-shaped harness running a single fake phase end-to-end.
M4: assemble run engine; lock contract; full fake development@1 minus reviewers.
M5, M6: run in parallel once M4 is green.

v4 r1 order (Python, status as of v0.1.0):

Step	Scope	Status
Step 0	Scaffold `my-deepagent/` (uv workspace, ruff, mypy, alembic, .pre-commit)	DONE (`17ba5d7`)
Step 1	`devflow_core` → `my_deepagent.{config,enums,errors,hash,persona,prompt_envelope,run_event}`	DONE
Step 2	`devflow_db` → `my_deepagent.persistence.{db,models,checkpointer}` + Alembic baseline	DONE
Step 3	`mydeepagent doctor` (typer)	DONE
Step 4	Persona / workflow seeding + binding (`my_deepagent.{persona,workflow,binding}`)	DONE
Step 5	Artifact schema registry (`my_deepagent.artifact_schema`)	DONE
Step 6	Distribution: init/login/logout/keys, governance consent, i18n (ko/en)	DONE
Step 7	WorkflowEngine + ArtifactWatcherMiddleware (replaces v3 §15 in-process engine)	DONE
Step 8	Budget guardrails (`my_deepagent.budget` + cost preview + CostMiddleware)	DONE
Step 9	Crash recovery + concurrency (`my_deepagent.recovery` + `mydeepagent runs …`)	DONE
Step 10	Interactive REPL (`mydeepagent` no-subcommand + slash commands)	DONE
Step 11	Audit log + structlog secret scrubbing	DONE
Step 12	Doctor 8-check + OpenRouter pricing fetch + `mydeepagent pricing`	DONE
Step 13	Tmux adapter (M6-Py)	DEFERRED — not in v0.1.0
Step 14	TUI recovery (M7-Py)	DEFERRED — not in v0.1.0
Step 15	End-to-end real OpenRouter integration test	DONE (`733c9be`)
Step 0-purge	Delete v3 TS monorepo per DR-1	DONE (`0e61b2d`)
M5-Py	Temporal worker (`apps/worker`)	NEXT
M8-Py	FastAPI + SSE (`apps/api`)	NEXT
