docs: patch plan.md to v4 r1 (Python rewrite spec) + .gitignore node_modules

plan.md v4 r1 patches (per plan-v4-draft.md §0/§1/§2/§3/§8.5/§18/§22/§23):

- §0 header: v3 r13 → v4 r1 + note explaining the language migration. v3 CC
  counter frozen at CC-39; v4 begins its own series (DR-1 below).
- §1 Stack Decisions: full rewrite for Python (uv / pydantic v2 /
  pydantic-settings / SQLAlchemy 2 async + aiosqlite / typer + prompt_toolkit
  / structlog / FastAPI + sse-starlette).
- §2 Directory Layout: collapse v3 multi-package monorepo → single
  `my-deepagent/` project. TS `apps/`, `packages/`, `tests/`, `scripts/` are
  gone after `0e61b2d`.
- §3 doctor: 13-check (Node/pnpm/Docker/Drizzle) → 8-check (python/uv/git/
  workspace_root/config+governance/openrouter_api_key/openrouter_ping+pricing
  upsert/disk+sqlite integrity).
- §8.5 OpenRouter Adapter: full rewrite. v3 marker-extraction HTTP adapter
  (CC-39) is superseded by the deepagents 0.6.1 multi-turn tool-using agent
  driven by `my_deepagent.session.build_agent`. Native write_file/read_file/
  bash via LocalShellBackend; SafetyShellMiddleware enforces destructive
  command + deny-path policy; ArtifactWatcherMiddleware observes artifact
  writes; CostMiddleware records usage. Known v0.1.0 limits documented:
  usage_metadata empty on OpenRouter-forwarded responses, Anthropic-via-
  OpenRouter tool_calls.args ValidationError requires DeepSeek workaround.
- §18 Errors: add `token_budget_exceeded` and `tool_quota_exceeded` under
  human_required.
- §22 Decision Log: add DR-1 "v3 → v4 major bump" with rationale, scope,
  recovery path (pre-python-rewrite tag at c9fed71).
- §23 Kickoff Order: v3 historical order preserved + v4 Python step matrix
  showing Step 0~12 + Step 15 DONE, Step 13/14 (tmux/TUI recovery) DEFERRED.

§4~§17 (DB schema, enums, hashing, template/persona/binding, session
runtime, prompt envelope, artifact schema registry, run events, fake
adapter, state machines, approval state, run engine + Temporal contract,
WriteSet/worktree, SSE contract) are language-neutral domain spec and remain
unchanged for the Python implementation.

.gitignore: re-add `node_modules/` (legacy Node tree kept ignored until
`rm -rf` cleanup outside git).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chungyeong
2026-05-16 17:16:47 +09:00
parent 0e61b2d907
commit 1d0dfb273b
2 changed files with 323 additions and 252 deletions

3
.gitignore vendored

@@ -1,3 +1,6 @@
# Legacy Node tree — kept ignored until removed by `rm -rf node_modules`
node_modules/
.DS_Store .DS_Store
.env .env
.env.local .env.local


@@ -1,12 +1,23 @@
# Devflow Implementation Plan v3 r13 # Devflow Implementation Plan v4 r1
## 0. Document Status ## 0. Document Status
- **v4 r1: language migration TS → Python.** Major version bump; the TypeScript
monorepo (apps/, packages/, tests/, scripts/, pnpm/tsconfig metadata) was
deleted in `0e61b2d` after being re-implemented under `my-deepagent/`.
v3 CC counters are preserved as historical context; v4 begins its own CC
series (DR-1 below; CC-Py-1 onward as new change clarifications land).
- This document supersedes v2 and all earlier v3 drafts where conflicting. - This document supersedes v2 and all earlier v3 drafts where conflicting.
- Single-user, single-machine assumption. No auth, no retention policy, no observability dashboards, no multi-tenancy. - Single-user, single-machine assumption. No auth, no retention policy, no observability dashboards, no multi-tenancy.
- Target OS: macOS 13+ / Linux. No Windows. - Target OS: macOS 13+ / Linux. No Windows.
- All paths are Unix-style. All times are stored UTC. - All paths are Unix-style. All times are stored UTC.
- Decisions in this document are locked unless explicitly marked `(provisional)`. Override requires updating this document, not only code. - Decisions in this document are locked unless explicitly marked `(provisional)`. Override requires updating this document, not only code.
- §1 Stack Decisions, §2 Directory Layout, §3 doctor checklist, §22 Decision Log
have been rewritten for v4 r1. §4~§17 (DB schema, enums, hashing, template/
persona/binding, session runtime, prompt envelope, artifact registry, run
events, fake adapter, state machines, errors, write set, SSE contract) are
language-neutral domain spec and remain valid for the Python implementation.
- v3 CC history (informational):
- r1 applied CC-1 through CC-5. - r1 applied CC-1 through CC-5.
- r2 applied CC-6 through CC-10. - r2 applied CC-6 through CC-10.
- r3 applied CC-11 through CC-15. - r3 applied CC-11 through CC-15.
@@ -19,218 +30,188 @@
- r10 applies CC-29 through CC-31. - r10 applies CC-29 through CC-31.
- r11 applies CC-32. - r11 applies CC-32.
- r12 applies CC-33 through CC-35. - r12 applies CC-33 through CC-35.
- r13 applies CC-39. - r13 applied CC-39 (final v3 revision; superseded by v4 r1).
## 1. Stack Decisions ## 1. Stack Decisions
### 1.1 Workspace ### 1.1 Workspace
- `pnpm 9` with workspaces. No Turbo. - **Python 3.12+**, managed by **uv** workspaces (`uv sync`, `uv add`, `uv run`).
- Node 22 LTS, pinned by `.nvmrc` and `package.json#engines`. - Pinned via `.python-version`. No Node, no pnpm, no tsc.
- TypeScript 5.6 with project references via `tsc -b`. - `pyproject.toml` at repo root + per-package `pyproject.toml` under
- `strict: true`. `packages/<name>/` (uv workspace members).
- No `any` unless accompanied by an explicit annotation comment explaining why. - Imports are absolute. No `from . import *`.
### 1.2 Tooling ### 1.2 Tooling
- Build: | Concern | Choice | Notes |
- `tsup` for libraries, CJS + ESM dual output. |---------|--------|-------|
- `vite` for `apps/web`. | Lint / format | **ruff** | One root `ruff.toml`. `ruff check .` + `ruff format --check .`. |
- `tsx` for `apps/cli`, `apps/api`, and `apps/worker` in dev. | Type check | **mypy --strict** | `mypy.ini` enables strict mode; tests relax `disallow_untyped_defs`. |
- `node` for prod-ish local runs. | Test | **pytest** + **pytest-asyncio** + **pytest-httpx** + **respx** | `pytest -q`. |
- Test: | Pre-commit | **pre-commit** (`.pre-commit-config.yaml`) | Runs ruff + mypy + pytest --collect-only. |
- `vitest` with workspace config. | Schema validation | **pydantic v2** + **pydantic-settings** | Replaces zod. |
- Coverage via `@vitest/coverage-v8`. | YAML | **PyYAML** | Persona/template YAML loaders. |
- No coverage gate at M1. | JSON Schema | **jsonschema** (2020-12) | Artifact registry. |
- M9 adds coverage gate: >=70% lines on `packages/core`, `packages/session`, `packages/run-engine`. | HTTP client | **httpx** (async) | OpenRouter / pricing fetch. |
- Lint/format: | Logging | **structlog** + **rich** | Replaces pino. `_scrub_processor` redacts secrets before stderr / JSON sinks. |
- `biome`. | CLI | **typer** + **prompt_toolkit** | Replaces commander; prompt_toolkit drives the interactive REPL. |
- One root config. | OS dirs | **platformdirs** | XDG data / state / config dirs. |
- Pre-commit: | Secrets | **keyring** | macOS Keychain / Linux Secret Service / Windows Credential Store. |
- `lefthook`.
- Runs `biome check --write` on staged files.
- Runs `tsc -p tsconfig.typecheck.json --noEmit`.
- Runs related Vitest tests on changed packages.
### 1.3 Database ### 1.3 Database
- Postgres 16 via Docker Compose. - **SQLite 3 (WAL mode)** via **aiosqlite**, ORM: **SQLAlchemy 2.0 async**.
- Drizzle ORM + `drizzle-kit generate`. - Migrations: **Alembic** (baseline + per-feature revisions).
- Generated SQL migrations are committed. - WAL + `busy_timeout=5000` + `PRAGMA foreign_keys=ON` enforced at connect.
- Migrations are never auto-applied at runtime except through the explicit migration runner invoked by `devflow up`. - Postgres (the v3 default) is parked: single-machine + single-user removes the
- Migration runner: multi-process concurrency justification, and aiosqlite + the
- `scripts/migrate.ts`. `ux_active_run_repo_base` partial unique index covers the active-run
- Takes `DATABASE_URL`. uniqueness invariant. Postgres can be reinstated for multi-tenant later.
- `devflow up` waits for Postgres health and then runs pending migrations.
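The connect-time settings listed for v4 r1 (WAL, `busy_timeout=5000`, `foreign_keys=ON`) can be sketched as follows. This is a minimal illustration that uses the stdlib `sqlite3` driver rather than the aiosqlite + SQLAlchemy 2.0 async stack the plan names; `open_db` is a hypothetical helper, not a function from `my_deepagent`.

```python
import sqlite3
from pathlib import Path


def open_db(path: Path) -> sqlite3.Connection:
    """Open a SQLite file and apply the three connect-time settings (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # WAL only takes effect on file-backed databases; in-memory DBs report "memory".
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait up to 5000 ms for a competing writer before raising "database is locked".
    conn.execute("PRAGMA busy_timeout=5000")
    # Foreign keys are off by default in SQLite and must be enabled per connection.
    conn.execute("PRAGMA foreign_keys=ON")
    return conn
```

`busy_timeout` and `foreign_keys` are per-connection settings, which is presumably why the plan enforces them "at connect" rather than once at database creation.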
### 1.4 Logging ### 1.4 Logging
- `pino`. - **structlog**, JSON sink to stderr by default, rich pretty sink when stdout is
- `pino-pretty` in dev, JSON otherwise. a TTY.
- Standard fields: - Standard fields: `time`, `level`, `module`, `run_id?`, `phase_id?`, `role?`,
- `time` `event_id?`, `interactive_session_id?`.
- `level` - `_scrub_processor` redacts OpenRouter / Anthropic / OpenAI / LangSmith /
- `module` GitHub / GitLab API keys and generic `Bearer …` tokens before emission.
- `runId?` - Levels: same semantics as v3 (`trace`/`debug`/`info`/`warn`/`error`).
- `phaseId?`
- `role?`
- `eventId?`
- Levels:
- `trace`: transcript chunks only.
- `debug`: internal state transitions.
- `info`: run events.
- `warn`: recoverable errors.
- `error`: human-required or fatal errors.
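The secret redaction described for `_scrub_processor` could look roughly like the sketch below. It follows the structlog processor calling convention `(logger, method_name, event_dict)` but imports nothing from structlog. The two regex patterns are illustrative assumptions; the real pattern list is not shown in this plan.

```python
import re
from typing import Any

# Illustrative patterns only (assumption): generic bearer tokens and "sk-" style API keys.
_SECRET_PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9._\-]+"),
    re.compile(r"sk-[A-Za-z0-9_\-]{16,}"),
]


def scrub_processor(logger: Any, method_name: str, event_dict: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of event_dict with secret-looking substrings redacted."""
    cleaned: dict[str, Any] = {}
    for key, value in event_dict.items():
        if isinstance(value, str):
            for pattern in _SECRET_PATTERNS:
                value = pattern.sub("[REDACTED]", value)
        cleaned[key] = value
    return cleaned
```

Placing a processor like this ahead of the renderer in the processor chain means both the JSON sink and the pretty sink see only redacted values.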
### 1.5 Config ### 1.5 Config
- Single Zod schema in `packages/core/src/config.ts`. - Single `pydantic-settings` BaseSettings in
- Source precedence, high to low: `my_deepagent.config.Config` with `MYDEEPAGENT_` env prefix and optional TOML
- `process.env` source.
- `.env.local` - Source precedence (high → low): explicit overrides → `os.environ` (with
- `.env` `MYDEEPAGENT_` prefix) → `.env` → `config.toml` → schema defaults.
- schema defaults - Config is loaded once at process start, validated, frozen, and re-exported as
- Config is loaded once at process start, validated, frozen, and exported as typed `Config`. an immutable typed `Config`.
- Config validation failure is fatal. - Validation failure is fatal (exit code 2).
- Required keys at M1: - Required keys at v0.1.0:
- `DATABASE_URL` - `MYDEEPAGENT_DATABASE_URL` (default `sqlite+aiosqlite:///<state_dir>/db.sqlite3`)
- `WORKSPACE_ROOT` - `MYDEEPAGENT_WORKSPACE_ROOT`
- `LOG_LEVEL` - `MYDEEPAGENT_LOG_LEVEL`
- `MYDEEPAGENT_OPENROUTER_API_KEY` when the OpenRouter backend is enabled
(resolution order: config → env → OS keyring → error).
- Path canonicalization: `workspace_root` is resolved via `Path.resolve()` at
config load. Any path entering the system is canonicalized before storage or
hashing.
Additional required keys when `openrouter` backend is enabled: Backend registration (deepagents-flavored):
- `OPENROUTER_API_KEY` ```python
class BackendConfig(BaseModel, frozen=True):
- M5 adds: id: Backend # openrouter | anthropic | openai | google | fake
- `TEMPORAL_ADDRESS` enabled: bool
- Path canonicalization: api_base_url: str | None = None # openrouter default https://openrouter.ai/api/v1
- `WORKSPACE_ROOT` is resolved through `fs.realpathSync` and stored as an absolute path at config load. api_key_env: str | None = None # default MYDEEPAGENT_OPENROUTER_API_KEY
- Any path entering the system must be canonicalized before storage or hashing.
- `repo_path` and `worktree_root` rules are defined in section 4.
Backend registration:
```ts
const BackendConfig = z.object({
id: Backend, // codex | claude | fake | openrouter
enabled: z.boolean(),
binaryPath: z.string().optional(), // resolved from PATH if absent; required for codex/claude when enabled
apiBaseUrl: z.string().optional(), // openrouter only; default https://openrouter.ai/api/v1
apiKeyEnv: z.string().optional(), // openrouter only; default OPENROUTER_API_KEY
});
``` ```
- `fake` is always available. - `fake` is always available.
- `codex` and `claude` are available only when: - `openrouter` is available only when enabled and the resolved key is present.
- `enabled=true` - Doctor warns on misconfig; binding fails fast at run start with
- binary resolves at process start. `human_required:backend_unavailable`.
- `openrouter` is available only when:
- `enabled=true`
- the env var named by `apiKeyEnv` (default `OPENROUTER_API_KEY`) is present and non-empty.
- `binaryPath` is ignored for `openrouter`.
- Resolution failure:
- `doctor` warns.
- binding fails fast at run start with `human_required:backend_unavailable`.
- Binding reads from `config.backends`, never directly from `PATH`.
### 1.6 HTTP ### 1.6 HTTP / SSE
- `fastify` 5. - **FastAPI** + **uvicorn** + **sse-starlette** for the M8-Py REST + SSE
- `@fastify/sensible`. surface (v3 r13 §17 contract unchanged: same event types, same headers,
- SSE primary strategy: same `data: <json>\n\n` framing).
- Try `fastify-sse-v2`. - Body validation via the same pydantic v2 models used elsewhere.
- Fastify 5 compatibility is not assumed. - WebSocket remains deferred unless SSE fails under transcript volume.
- M1 includes a smoke test.
- SSE fallback:
- Native `reply.raw`.
- Headers:
- `content-type: text/event-stream`
- `cache-control: no-cache`
- `connection: keep-alive`
- Write `data: <json>\n\n`.
- Manage heartbeats and reconnect manually.
- WebSocket is deferred unless SSE fails under transcript volume.
## 2. Directory Layout ## 2. Directory Layout
v4 r1 collapses the v3 multi-package monorepo into a single `my-deepagent/`
project. The TS `apps/`, `packages/`, `tests/`, `scripts/` trees were deleted
in `0e61b2d`; v3 §4~§17 module-by-module spec still applies but each module
now lives under `my_deepagent/<name>.py` instead of
`packages/<name>/src/<name>.ts`.
```text ```text
devflow/ <repo-root>/
├── package.json
├── pnpm-workspace.yaml
├── tsconfig.base.json
├── biome.json
├── lefthook.yml
├── vitest.workspace.ts
├── docker-compose.yml
├── .nvmrc
├── .env.example
├── docs/ ├── docs/
│ ├── plan.md │ ├── plan.md # this document
│ ├── adr/ │ ├── plan-v4-draft.md # v4 r1 design memo (informational)
│ └── schemas/ │ └── schemas/
│ ├── artifacts/ │ ├── artifacts/ # JSON Schema 2020-12 (language-neutral)
│ ├── personas/ │ ├── personas/ # YAML persona seed (language-neutral)
│ └── templates/ │ └── templates/ # YAML workflow templates
├── scripts/ ├── docker-compose.yml # Postgres + Temporal (still relevant for M5-Py)
│ ├── migrate.ts ├── .env.example
│ └── seed.ts ├── .gitignore
├── packages/ ├── my-deepagent-seed/ # v0.1.0 bootstrap kit (historical, may be pruned)
│ ├── core/ └── my-deepagent/
│ │ ── src/ ── pyproject.toml # uv workspace root
│ │ ├── config.ts ├── uv.lock
│ │ ├── enums.ts ├── ruff.toml
│ │ ├── hash.ts ├── mypy.ini
│ │ ├── errors.ts ├── alembic.ini
│ │ ├── template.ts ├── .pre-commit-config.yaml
│ │ ├── persona.ts ├── CHANGELOG.md
│ │ ├── binding.ts ├── alembic/
├── prompt-envelope.ts │ ├── env.py
│ │ ├── artifact-schema.ts │ └── versions/ # baseline + per-feature migrations
│ │ ├── run-event.ts ├── docs/schemas/ # mirror of repo-root docs/schemas for loader convenience
│ │ └── index.ts ├── src/my_deepagent/
│ ├── db/ │ ├── config.py # pydantic-settings Config (replaces §1.5 zod schema)
── src/ ── enums.py # closed-set enums (§5)
│ │ ├── schema/ │ ├── errors.py # error taxonomy (§18)
├── migrations/ ├── hash.py # content-addressed hashing (§6)
├── repositories/ ├── persona.py # Persona + loader (§7.2)
└── client.ts ├── workflow.py # WorkflowTemplate + loader (§7.1)
│ ├── session/ │ ├── binding.py # autoSelect / override / consent store (§7.4)
── src/ ── artifact_schema.py # JSON Schema 2020-12 registry (§10)
│ │ ├── adapter.ts │ ├── run_event.py # event types + idempotency keys (§11, §13.1)
│ │ ├── fake.ts │ ├── prompt_envelope.py # envelope builder (§9)
│ │ ├── tmux.ts ├── budget.py # BudgetTracker (v4-new)
│ │ ├── profiles/ │ ├── secrets.py # config → env → keyring resolution chain
│ │ │ ├── codex.ts │ ├── keys.py # OS keyring wrapper
└── claude.ts ├── audit.py # append-only JSONL audit log (v4-new)
│ │ ├── recovery.ts │ ├── logging.py # structlog + secret scrubber (§1.4)
└── transcript.ts ├── governance.py # first-run consent (v4-new)
│ ├── harness/ │ ├── i18n/ # ko / en catalog
── src/ ── recovery.py # sweep_orphan_runs (§19)
├── git.ts ├── session.py # deepagents adapter (§8.5, v4-new)
│ │ ├── worktree.ts │ ├── engine.py # WorkflowEngine — phase loop (§15)
├── runner.ts │ ├── persistence/
├── review.ts ├── db.py # SQLAlchemy 2 async engine
│ │ └── backtest.ts │ │ ├── models.py # ORM models (§4)
│ ├── run-engine/ │ │ └── checkpointer.py # LangGraph SqliteSaver context
── src/ ── middleware/
├── engine.ts ├── cost.py # CostMiddleware (v4-new)
├── phase-executor.ts ├── budget.py # BudgetMiddleware (v4-new)
│ │ └── approval.ts │ │ ├── audit.py # AuditToolMiddleware
│ └── workflows/ │ │ ├── safety.py # SafetyShellMiddleware (deny-path / destructive command)
└── src/ │ │ └── artifact_watcher.py # ArtifactWatcherMiddleware
├── workflow.ts ├── monitoring/
── activities.ts ── pricing.py # OpenRouter pricing cache
├── apps/ │ │ └── cost_estimator.py # pre-run preview
│ ├── api/ │ ├── cli/ # typer-driven CLI
│ ├── web/ │ │ ├── main.py # entry (interactive REPL when no subcommand)
│ ├── cli/ │ │ ├── doctor.py # §3 doctor checks (Python/uv version)
│ └── worker/ │ │ ├── init.py
│ │ ├── keys_cmd.py
│ │ ├── run.py
│ │ ├── runs.py
│ │ ├── stats.py
│ │ └── interactive.py # prompt_toolkit REPL
│ ├── tui/
│ │ └── approval.py # tri-state approval prompt
│ └── slash.py # REPL slash commands
└── tests/ └── tests/
├── e2e/ ├── unit/ # pure-Python unit tests
└── fixtures/ └── integration/ # async + persistence + real OpenRouter (gated)
``` ```
## 3. `devflow doctor` Future trees deferred:
- `apps/api/` (M8-Py) and `apps/worker/` (M5-Py): FastAPI app and temporalio
worker. v4 r1 keeps each out until its milestone lands.
- `apps/web/`: Web GUI port is out of scope for v0.1.0 (separate milestone).
## 3. `mydeepagent doctor`
Exit codes: Exit codes:
@@ -245,34 +226,42 @@ Each check emits:
- `detail` - `detail`
- `remediation` - `remediation`
Closed check list: Closed check list (v4 r1, 8 checks — Node/pnpm/Docker/Drizzle dropped):
1. Node version satisfies `>=22.0.0 <23`. 1. **python**: `python --version` satisfies `>=3.12,<3.14`.
2. pnpm version `>=9.0.0`. 2. **uv**: `uv --version` resolves (any).
3. `tmux` exists, version `>=3.3`. 3. **git**: `git --version` `>=2.40`.
4. `git` version `>=2.40`. 4. **workspace_root**: `MYDEEPAGENT_WORKSPACE_ROOT` exists, is a directory,
5. Docker daemon reachable. and is writable.
6. Postgres container running, `pg_isready` ok, `DATABASE_URL` connects. 5. **config+governance**: `Config` loads from env + `.env` + `config.toml`
7. No pending Drizzle migrations. without ValidationError; first-run governance consent file exists (or is
8. `WORKSPACE_ROOT` exists and is writable. created interactively on first run only).
9. `.env` resolves to valid `Config`. 6. **openrouter_api_key**: resolution chain (config → env → OS keyring)
10. `codex` in `PATH`, warn-only. yields a non-empty value. Warn-only when the OpenRouter backend is not
11. `claude` in `PATH`, warn-only. enabled.
12. Free disk on `WORKSPACE_ROOT` partition: 7. **openrouter_ping + pricing upsert**: `GET https://openrouter.ai/api/v1/models`
- warn under 10GB. with the bearer key.
- fail under 2GB. - `200` → pass; pricing rows are upserted into `model_pricing` for use by
- target green threshold: >=5GB. the `mydeepagent run` cost preview.
13. OpenRouter API reachable: when `openrouter` backend is enabled, `GET ${apiBaseUrl}/models` with the bearer key. - `401` → fail.
- pass on `200`. - any other non-200 / network error → warn.
- fail on `401`. 8. **disk+sqlite integrity**:
- warn on any other non-200 or network error. - Free disk on the `workspace_root` partition: warn under 10 GB, fail under
2 GB, green target ≥ 5 GB.
- SQLite DB file (if present) opens and `PRAGMA integrity_check` returns
`ok`.
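Check 8 above could be approximated as in the sketch below, assuming the thresholds are binary gigabytes and that a missing DB file counts as a pass (the plan says "if present"). The function names are hypothetical, and the "green target ≥ 5 GB" figure is not modeled because its relation to the 10 GB warn threshold is not spelled out.

```python
import shutil
import sqlite3
from pathlib import Path

GIB = 1024**3  # assumption: the plan's "GB" is read as GiB here


def classify_free_disk(free_bytes: int) -> str:
    """Map free space to a doctor status: fail under 2 GiB, warn under 10 GiB."""
    if free_bytes < 2 * GIB:
        return "fail"
    if free_bytes < 10 * GIB:
        return "warn"
    return "pass"


def free_disk_status(workspace_root: Path) -> str:
    """Classify free space on the partition that holds workspace_root."""
    return classify_free_disk(shutil.disk_usage(workspace_root).free)


def check_sqlite_integrity(db_path: Path) -> str:
    """Return "pass" if the file is absent or integrity_check reports ok, else "fail"."""
    if not db_path.exists():
        return "pass"  # nothing to verify yet; also avoids creating the file
    try:
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute("PRAGMA integrity_check").fetchone()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        return "fail"  # e.g. "file is not a database"
    return "pass" if row is not None and row[0] == "ok" else "fail"
```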
Output: Output:
- Human table by default. - Rich human table by default.
- `--json` for machine-readable output. - `--json` for machine-readable output.
- `--quiet` prints only nonzero results. - `--quiet` prints only nonzero results.
- `--list-orphans` lists orphaned worktrees only; it never removes them.
Notes:
- `tmux` / `Docker` / `Postgres` / `pg_isready` / drizzle migration checks from
v3 §3 are dropped in v4 r1 — the v0.1.0 runtime is SQLite-only and tmux is
out of scope for the deepagents-driven session model.
- `--list-orphans` and friends are owned by `mydeepagent runs list/show` (§19).
## 4. Database Schema ## 4. Database Schema
@@ -882,53 +871,91 @@ Exhaustion creates a human gate with `recoveryHint`.
- persist `last_capture_seq`. - persist `last_capture_seq`.
- release advisory lock. - release advisory lock.
### 8.5 OpenRouter Adapter ### 8.5 OpenRouter Adapter — v4 r1 deepagents rewrite
HTTP-based `SessionAdapter` for the `openrouter` backend. No PTY, no tmux. **Supersedes the v3 marker-extraction HTTP adapter (CC-39).** In v4 the
OpenRouter integration is a multi-turn, tool-using agent driven by LangChain
`deepagents` 0.6.1 — no single-shot completions, no `<<<DEVFLOW_ARTIFACT_*>>>`
markers, no transcript replay reconstruction.
Method mapping: Construction — `my_deepagent.session.build_agent(persona, run_id, …)`:
- `start`: ```python
- allocate in-memory session state `{ messages: [], lastResponseAt }`. llm = ChatOpenAI(
- push the backend prelude (§9.4) as a `system` message. model=persona.model, # e.g. "openrouter:deepseek/deepseek-chat"
- `sendPrompt`: base_url=config.openrouter_api_base, # https://openrouter.ai/api/v1
- append the envelope `instructions` (full §9.1 envelope text) as a `user` message. api_key=resolve_openrouter_api_key(),
- POST `${apiBaseUrl}/chat/completions` with `Authorization: Bearer ${apiKey}` and body `{ model: persona.modelConfig.model, messages, max_tokens?, temperature?, top_p? }`. timeout=persona.model_params.timeout,
- append the assistant response as an `assistant` message. )
- `probe`: agent = deepagents.create_deep_agent(
- alive iff session state is held in the SessionManager map. model=llm,
- `paneActive` is always `true`. tools=[], # base tools come from LocalShellBackend
- `resume`: instructions=persona.system_prompt,
- in-memory messages are lost on process restart. subagents=[_subagent_to_dict(s) for s in persona.subagents],
- attempt restoration by replaying `tui_transcript_chunks` for the session into the messages array. middleware=[
- on irrecoverable failure, fall through to `rebootstrap`. SafetyShellMiddleware(...), # destructive command + deny-path guard
- `rebootstrap`: AuditToolMiddleware(...), # append-only JSONL audit log
- clear messages and re-push the prelude. ArtifactWatcherMiddleware(...), # write_file/edit_file detection
- `capture`: CostMiddleware(...), # usage_metadata + budget ledger
- split assistant responses into line-sized `TranscriptChunk`s and persist via the standard chunk pipeline. ],
- `dispose`: backend=LocalShellBackend( # bash + read_file + write_file + edit_file + ls
- drop the in-memory entry. cwd=worktree_root,
# `permissions` kwarg is intentionally omitted for local_shell backend
# (deepagents 0.6.1 NotImplementedError workaround — enforcement moves
# to SafetyShellMiddleware).
),
)
```
Method mapping (driven by `WorkflowEngine` rather than a v3-style adapter
interface):
- **Start**: `create_deep_agent` returns a `CompiledStateGraph` per phase.
No persistent session object is shared across phases — each phase is a
fresh agent invocation parameterized by persona + envelope.
- **Send prompt**: `await agent.ainvoke({"messages": [HumanMessage(envelope)]})`
where `envelope` is built by `WorkflowEngine._build_envelope` (§9 with the
artifact JSON Schema inlined so the model sees the exact required fields).
- **Tool use**: native `read_file` / `write_file` / `edit_file` / `ls` /
`bash` calls are emitted by the model and dispatched through
LocalShellBackend, recorded by AuditToolMiddleware, gated by
SafetyShellMiddleware.
- **Probe / resume / rebootstrap / dispose**: not applicable — the agent is
ephemeral per phase. Crash recovery operates at the run/phase level via
`sweep_orphan_runs` (§19), not at a session-adapter level.
Artifact production: Artifact production:
- HTTP agents cannot write to the workspace filesystem. The backend prelude (§9.4) instructs the model to emit the artifact body inside a single fenced block at the tail of the response: - The model writes the artifact directly to `expected_artifact_path` via the
`write_file` tool. ArtifactWatcherMiddleware observes the tool call and
notifies the engine.
- The envelope inlines the artifact's JSON Schema definition so the LLM has
the exact required fields.
- Schema validation is performed by `ArtifactSchemaRegistry.validate` on the
written file (§10). On failure, the engine retries once with a repair
prompt; second failure raises `human_required:artifact_invalid_after_repair`.
```text Error mapping (preserved from CC-39, applied per-call by the LangChain
<<<DEVFLOW_ARTIFACT_BEGIN>>> exception path):
{ "...": "..." }
<<<DEVFLOW_ARTIFACT_END>>>
```
- The adapter extracts the JSON between the markers and writes it atomically (temp file + rename) to `expectedArtifactPath`.
- Missing markers, multiple blocks, or JSON parse failure are treated as `artifact.invalid` and follow the standard repair/timeout flow in §10.3.
Error mapping:
- HTTP `401` → `human_required:backend_auth_failed`. - HTTP `401` → `human_required:backend_auth_failed`.
- HTTP `429` → `recoverable:rate_limited` (exponential backoff: 1s, 2s, 4s, max 30s). - HTTP `429` → `recoverable:rate_limited` (exponential backoff: 1 s, 2 s, 4 s,
max 30 s, owned by langchain-openai retries).
- HTTP `5xx` → `recoverable:network_blip`. - HTTP `5xx` → `recoverable:network_blip`.
- HTTP `400` with body code `model_not_found` → `human_required:model_unavailable`. - HTTP `400` with `model_not_found` → `human_required:model_unavailable`.
- Network error before any response → `recoverable:network_blip`. - BudgetTracker pre-call rejection → `human_required:token_budget_exceeded`.
- SafetyShellMiddleware blocked tool call → `human_required:tool_quota_exceeded`.
Known v0.1.0 limitations:
- `usage_metadata` is sometimes empty on responses forwarded by OpenRouter
(deepagents wraps the underlying ChatOpenAI response so token counts may
not surface). The recorder still fires and `LlmCallRow` is persisted, but
`input_tokens` / `output_tokens` may read 0. v0.2 will probe additional
response shapes (raw chunks / callbacks).
- Anthropic models via OpenRouter currently fail with a `tool_calls.args`
JSON-string vs dict ValidationError inside `langchain-openai` 1.2.1.
Workaround: pin DeepSeek personas via `BindingOverride`. A fix is tracked for v0.2.
## 9. Prompt Envelope ## 9. Prompt Envelope
@@ -1549,19 +1576,22 @@ Reconnect:
## 18. Errors ## 18. Errors
`packages/core/src/errors.ts`: v4: `my_deepagent.errors.MyDeepAgentError` (replaces v3 `DevflowError` 1:1):
```ts ```python
type ErrorClass = 'recoverable' | 'human_required' | 'fatal'; class ErrorClass(StrEnum):
RECOVERABLE = "recoverable"
HUMAN_REQUIRED = "human_required"
FATAL = "fatal"
class DevflowError extends Error {
readonly class: ErrorClass; class MyDeepAgentError(Exception):
readonly code: string; error_class: ErrorClass
readonly runId?: string; code: str
readonly phaseId?: string; run_id: UUID | None
readonly recoveryHint?: string; phase_id: UUID | None
readonly cause?: unknown; recovery_hint: str | None
} cause: BaseException | None
``` ```
Recoverable: Recoverable:
@@ -1587,6 +1617,12 @@ Human required:
- `review_dispute_unresolved` - `review_dispute_unresolved`
- `backend_auth_failed` - `backend_auth_failed`
- `model_unavailable` - `model_unavailable`
- `token_budget_exceeded` *(v4 r1: BudgetTracker rejects a call whose
estimated cost would breach the per-run, per-day, or per-persona-daily cap
with `on_hit=block`.)*
- `tool_quota_exceeded` *(v4 r1: SafetyShellMiddleware blocked a tool call
due to deny-path / destructive-command policy, or a per-phase tool-call
cap was hit.)*
Fatal: Fatal:
@@ -1857,7 +1893,13 @@ M5+:
| CC-36 | SSE reconnect wording used per-run `seq` for global stream even though `seq` is not globally monotonic | `/sse/runs/:runId` uses per-run `seq`; `/sse/global` uses global `run_events.id` and emits only scope=`both` summary events | | CC-36 | SSE reconnect wording used per-run `seq` for global stream even though `seq` is not globally monotonic | `/sse/runs/:runId` uses per-run `seq`; `/sse/global` uses global `run_events.id` and emits only scope=`both` summary events |
| CC-37 | Run SSE replay could emit historical derived events after the first page | run SSE drains historical rows up to a high-water `seq` with only `run.event_appended`, then switches to live derived events | | CC-37 | Run SSE replay could emit historical derived events after the first page | run SSE drains historical rows up to a high-water `seq` with only `run.event_appended`, then switches to live derived events |
| CC-38 | Normal phase start changed run state to `planning` / `executing` without a summary event source | `phase.started` payload includes `runState`; SSE derives `run.state_changed` from that live event | | CC-38 | Normal phase start changed run state to `planning` / `executing` without a summary event source | `phase.started` payload includes `runState`; SSE derives `run.state_changed` from that live event |
| CC-39 | No OpenRouter HTTP backend; users cannot pick cost-tuned per-persona models | add `openrouter` to Backend enum; HTTP `OpenRouterAdapter` in §8.5; persona `modelConfig.model` requirement; doctor check 13; new error codes `rate_limited`, `backend_auth_failed`, `model_unavailable` | | CC-39 | No OpenRouter HTTP backend; users cannot pick cost-tuned per-persona models | add `openrouter` to Backend enum; HTTP `OpenRouterAdapter` in §8.5; persona `modelConfig.model` requirement; doctor check 13; new error codes `rate_limited`, `backend_auth_failed`, `model_unavailable` (final v3 entry — v4 reinterprets the OpenRouter integration as the deepagents-driven session adapter; the standalone HTTP `OpenRouterAdapter` from CC-39 is **superseded by DR-1**) |
### Decision Records (v4)
| ID | Decision | Rationale | Impact |
|----|----------|-----------|--------|
| DR-1 | **v3 → v4 major bump: delete TS monorepo, rewrite in Python on LangChain `deepagents`.** | (1) Claude/Anthropic direct API cost is prohibitive for a single-user toolchain. (2) OpenRouter cost-tuned models (DeepSeek, etc.) require a multi-turn, tool-using agent harness; `deepagents` is Python-only with no 1:1 TS port. (3) Switching languages is shorter than reimplementing the harness. | Step 0 (commit `0e61b2d`) deleted `apps/`, `packages/`, `tests/`, `scripts/`, pnpm/tsconfig metadata. The Python rewrite lives at `my-deepagent/` and reached Step 15 (real OpenRouter E2E PASS, ~$0.05/run) before the v3 codebase was removed. CC-39's separate `OpenRouterAdapter` is replaced by `my_deepagent.session.build_agent` (deepagents 0.6.1 with LocalShellBackend + SafetyShellMiddleware). v3 CC counters frozen; v4 begins its own series. Recovery: `git checkout pre-python-rewrite -- <path>`. |
### Future Open Questions ### Future Open Questions
@@ -1868,6 +1910,8 @@ M5+:
## 23. Kickoff Order ## 23. Kickoff Order
v3 historical order (TS, completed up to M8 before the v4 pivot):
1. M1.1: repo + pnpm + tsconfig + biome + lefthook + vitest workspace. 1. M1.1: repo + pnpm + tsconfig + biome + lefthook + vitest workspace.
2. M1.2: docker-compose + Postgres healthcheck + drizzle-kit + first migration. 2. M1.2: docker-compose + Postgres healthcheck + drizzle-kit + first migration.
3. M1.3: `apps/cli` skeleton + `devflow doctor`. 3. M1.3: `apps/cli` skeleton + `devflow doctor`.
@@ -1880,3 +1924,27 @@ M5+:
10. M3.3: engine-shaped harness running a single fake phase end-to-end. 10. M3.3: engine-shaped harness running a single fake phase end-to-end.
11. M4: assemble run engine; lock contract; full fake `development@1` minus reviewers. 11. M4: assemble run engine; lock contract; full fake `development@1` minus reviewers.
12. M5 in parallel with M6 once M4 is green. 12. M5 in parallel with M6 once M4 is green.
v4 r1 order (Python, status as of v0.1.0):
| Step | Scope | Status |
|------|-------|--------|
| Step 0 | Scaffold `my-deepagent/` (uv workspace, ruff, mypy, alembic, .pre-commit) | DONE (`17ba5d7`) |
| Step 1 | `devflow_core` → `my_deepagent.{config,enums,errors,hash,persona,prompt_envelope,run_event}` | DONE |
| Step 2 | `devflow_db` → `my_deepagent.persistence.{db,models,checkpointer}` + Alembic baseline | DONE |
| Step 3 | `mydeepagent doctor` (typer) | DONE |
| Step 4 | Persona / workflow seeding + binding (`my_deepagent.{persona,workflow,binding}`) | DONE |
| Step 5 | Artifact schema registry (`my_deepagent.artifact_schema`) | DONE |
| Step 6 | Distribution: init/login/logout/keys, governance consent, i18n (ko/en) | DONE |
| Step 7 | WorkflowEngine + ArtifactWatcherMiddleware (replaces v3 §15 in-process engine) | DONE |
| Step 8 | Budget guardrails (`my_deepagent.budget` + cost preview + CostMiddleware) | DONE |
| Step 9 | Crash recovery + concurrency (`my_deepagent.recovery` + `mydeepagent runs …`) | DONE |
| Step 10 | Interactive REPL (`mydeepagent` no-subcommand + slash commands) | DONE |
| Step 11 | Audit log + structlog secret scrubbing | DONE |
| Step 12 | Doctor 8-check + OpenRouter pricing fetch + `mydeepagent pricing` | DONE |
| Step 13 | Tmux adapter (M6-Py) | DEFERRED — not in v0.1.0 |
| Step 14 | TUI recovery (M7-Py) | DEFERRED — not in v0.1.0 |
| Step 15 | End-to-end real OpenRouter integration test | DONE (`733c9be`) |
| Step 0-purge | Delete v3 TS monorepo per DR-1 | DONE (`0e61b2d`) |
| M5-Py | Temporal worker (`apps/worker`) | NEXT |
| M8-Py | FastAPI + SSE (`apps/api`) | NEXT |