diff --git a/.gitignore b/.gitignore index 186a48f..77f62de 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,6 @@ +# Legacy Node tree — kept ignored until removed by `rm -rf node_modules` +node_modules/ + .DS_Store .env .env.local diff --git a/docs/plan.md b/docs/plan.md index 27a69e2..ffdf325 100644 --- a/docs/plan.md +++ b/docs/plan.md @@ -1,12 +1,23 @@ -# Devflow Implementation Plan v3 r13 +# Devflow Implementation Plan v4 r1 ## 0. Document Status +- **v4 r1: language migration TS → Python.** Major version bump; the TypeScript + monorepo (apps/, packages/, tests/, scripts/, pnpm/tsconfig metadata) was + deleted in `0e61b2d` after being re-implemented under `my-deepagent/`. + v3 CC counters are preserved as historical context; v4 begins its own CC + series (DR-1 below; CC-Py-1 onward as new change clarifications land). - This document supersedes v2 and all earlier v3 drafts where conflicting. - Single-user, single-machine assumption. No auth, no retention policy, no observability dashboards, no multi-tenancy. - Target OS: macOS 13+ / Linux. No Windows. - All paths are Unix-style. All times are stored UTC. - Decisions in this document are locked unless explicitly marked `(provisional)`. Override requires updating this document, not only code. +- §1 Stack Decisions, §2 Directory Layout, §3 doctor checklist, §22 Decision Log + have been rewritten for v4 r1. §4~§17 (DB schema, enums, hashing, template/ + persona/binding, session runtime, prompt envelope, artifact registry, run + events, fake adapter, state machines, errors, write set, SSE contract) are + language-neutral domain spec and remain valid for the Python implementation. +- v3 CC history (informational): - r1 applied CC-1 through CC-5. - r2 applied CC-6 through CC-10. - r3 applied CC-11 through CC-15. @@ -19,218 +30,188 @@ - r10 applies CC-29 through CC-31. - r11 applies CC-32. - r12 applies CC-33 through CC-35. -- r13 applies CC-39. +- r13 applied CC-39 (final v3 revision; superseded by v4 r1). ## 1. 
Stack Decisions ### 1.1 Workspace -- `pnpm 9` with workspaces. No Turbo. -- Node 22 LTS, pinned by `.nvmrc` and `package.json#engines`. -- TypeScript 5.6 with project references via `tsc -b`. -- `strict: true`. -- No `any` unless accompanied by an explicit annotation comment explaining why. +- **Python 3.12+**, managed by **uv** workspaces (`uv sync`, `uv add`, `uv run`). +- Pinned via `.python-version`. No Node, no pnpm, no tsc. +- `pyproject.toml` at repo root + per-package `pyproject.toml` under + `packages//` (uv workspace members). +- Imports are absolute. No `from . import *`. ### 1.2 Tooling -- Build: - - `tsup` for libraries, CJS + ESM dual output. - - `vite` for `apps/web`. - - `tsx` for `apps/cli`, `apps/api`, and `apps/worker` in dev. - - `node` for prod-ish local runs. -- Test: - - `vitest` with workspace config. - - Coverage via `@vitest/coverage-v8`. - - No coverage gate at M1. - - M9 adds coverage gate: >=70% lines on `packages/core`, `packages/session`, `packages/run-engine`. -- Lint/format: - - `biome`. - - One root config. -- Pre-commit: - - `lefthook`. - - Runs `biome check --write` on staged files. - - Runs `tsc -p tsconfig.typecheck.json --noEmit`. - - Runs related Vitest tests on changed packages. +| Concern | Choice | Notes | +|---------|--------|-------| +| Lint / format | **ruff** | One root `ruff.toml`. `ruff check .` + `ruff format --check .`. | +| Type check | **mypy --strict** | `mypy.ini` enables strict mode; tests relax `disallow_untyped_defs`. | +| Test | **pytest** + **pytest-asyncio** + **pytest-httpx** + **respx** | `pytest -q`. | +| Pre-commit | **pre-commit** (`.pre-commit-config.yaml`) | Runs ruff + mypy + pytest --collect-only. | +| Schema validation | **pydantic v2** + **pydantic-settings** | Replaces zod. | +| YAML | **PyYAML** | Persona/template YAML loaders. | +| JSON Schema | **jsonschema** (2020-12) | Artifact registry. | +| HTTP client | **httpx** (async) | OpenRouter / pricing fetch. 
| +| Logging | **structlog** + **rich** | Replaces pino. `_scrub_processor` redacts secrets before stderr / JSON sinks. | +| CLI | **typer** + **prompt_toolkit** | Replaces commander; prompt_toolkit drives the interactive REPL. | +| OS dirs | **platformdirs** | XDG data / state / config dirs. | +| Secrets | **keyring** | macOS Keychain / Linux Secret Service / Windows Credential Store. | ### 1.3 Database -- Postgres 16 via Docker Compose. -- Drizzle ORM + `drizzle-kit generate`. -- Generated SQL migrations are committed. -- Migrations are never auto-applied at runtime except through the explicit migration runner invoked by `devflow up`. -- Migration runner: - - `scripts/migrate.ts`. - - Takes `DATABASE_URL`. - - `devflow up` waits for Postgres health and then runs pending migrations. +- **SQLite 3 (WAL mode)** via **aiosqlite**, ORM: **SQLAlchemy 2.0 async**. +- Migrations: **Alembic** (baseline + per-feature revisions). +- WAL + `busy_timeout=5000` + `PRAGMA foreign_keys=ON` enforced at connect. +- Postgres (the v3 default) is parked: single-machine + single-user removes the + multi-process concurrency justification, and aiosqlite + the + `ux_active_run_repo_base` partial unique index covers the active-run + uniqueness invariant. Postgres can be reinstated for multi-tenant later. ### 1.4 Logging -- `pino`. -- `pino-pretty` in dev, JSON otherwise. -- Standard fields: - - `time` - - `level` - - `module` - - `runId?` - - `phaseId?` - - `role?` - - `eventId?` -- Levels: - - `trace`: transcript chunks only. - - `debug`: internal state transitions. - - `info`: run events. - - `warn`: recoverable errors. - - `error`: human-required or fatal errors. +- **structlog**, JSON sink to stderr by default, rich pretty sink when stdout is + a TTY. +- Standard fields: `time`, `level`, `module`, `run_id?`, `phase_id?`, `role?`, + `event_id?`, `interactive_session_id?`. 
+- `_scrub_processor` redacts OpenRouter / Anthropic / OpenAI / LangSmith / + GitHub / GitLab API keys and generic `Bearer …` tokens before emission. +- Levels: same semantics as v3 (`trace`/`debug`/`info`/`warn`/`error`). ### 1.5 Config -- Single Zod schema in `packages/core/src/config.ts`. -- Source precedence, high to low: - - `process.env` - - `.env.local` - - `.env` - - schema defaults -- Config is loaded once at process start, validated, frozen, and exported as typed `Config`. -- Config validation failure is fatal. -- Required keys at M1: - - `DATABASE_URL` - - `WORKSPACE_ROOT` - - `LOG_LEVEL` +- Single `pydantic-settings` BaseSettings in + `my_deepagent.config.Config` with `MYDEEPAGENT_` env prefix and optional TOML + source. +- Source precedence (high → low): explicit overrides → `os.environ` (with + `MYDEEPAGENT_` prefix) → `.env` → `config.toml` → schema defaults. +- Config is loaded once at process start, validated, frozen, and re-exported as + an immutable typed `Config`. +- Validation failure is fatal (exit code 2). +- Required keys at v0.1.0: + - `MYDEEPAGENT_DATABASE_URL` (default `sqlite+aiosqlite:////db.sqlite3`) + - `MYDEEPAGENT_WORKSPACE_ROOT` + - `MYDEEPAGENT_LOG_LEVEL` + - `MYDEEPAGENT_OPENROUTER_API_KEY` when the OpenRouter backend is enabled + (resolution order: config → env → OS keyring → error). +- Path canonicalization: `workspace_root` is resolved via `Path.resolve()` at + config load. Any path entering the system is canonicalized before storage or + hashing. -Additional required keys when `openrouter` backend is enabled: +Backend registration (deepagents-flavored): -- `OPENROUTER_API_KEY` - -- M5 adds: - - `TEMPORAL_ADDRESS` -- Path canonicalization: - - `WORKSPACE_ROOT` is resolved through `fs.realpathSync` and stored as an absolute path at config load. - - Any path entering the system must be canonicalized before storage or hashing. - - `repo_path` and `worktree_root` rules are defined in section 4. 
- -Backend registration: - -```ts -const BackendConfig = z.object({ - id: Backend, // codex | claude | fake | openrouter - enabled: z.boolean(), - binaryPath: z.string().optional(), // resolved from PATH if absent; required for codex/claude when enabled - apiBaseUrl: z.string().optional(), // openrouter only; default https://openrouter.ai/api/v1 - apiKeyEnv: z.string().optional(), // openrouter only; default OPENROUTER_API_KEY -}); +```python +class BackendConfig(BaseModel, frozen=True): + id: Backend # openrouter | anthropic | openai | google | fake + enabled: bool + api_base_url: str | None = None # openrouter default https://openrouter.ai/api/v1 + api_key_env: str | None = None # default MYDEEPAGENT_OPENROUTER_API_KEY ``` - `fake` is always available. -- `codex` and `claude` are available only when: - - `enabled=true` - - binary resolves at process start. -- `openrouter` is available only when: - - `enabled=true` - - the env var named by `apiKeyEnv` (default `OPENROUTER_API_KEY`) is present and non-empty. - - `binaryPath` is ignored for `openrouter`. -- Resolution failure: - - `doctor` warns. - - binding fails fast at run start with `human_required:backend_unavailable`. -- Binding reads from `config.backends`, never directly from `PATH`. +- `openrouter` is available only when enabled and the resolved key is present. +- Doctor warns on misconfig; binding fails fast at run start with + `human_required:backend_unavailable`. -### 1.6 HTTP +### 1.6 HTTP / SSE -- `fastify` 5. -- `@fastify/sensible`. -- SSE primary strategy: - - Try `fastify-sse-v2`. - - Fastify 5 compatibility is not assumed. - - M1 includes a smoke test. -- SSE fallback: - - Native `reply.raw`. - - Headers: - - `content-type: text/event-stream` - - `cache-control: no-cache` - - `connection: keep-alive` - - Write `data: \n\n`. - - Manage heartbeats and reconnect manually. -- WebSocket is deferred unless SSE fails under transcript volume. 
+- **FastAPI** + **uvicorn** + **sse-starlette** for the M8-Py REST + SSE + surface (v3 r13 §17 contract unchanged: same event types, same headers, + same `data: \n\n` framing). +- Body validation via the same pydantic v2 models used elsewhere. +- WebSocket remains deferred unless SSE fails under transcript volume. ## 2. Directory Layout +v4 r1 collapses the v3 multi-package monorepo into a single `my-deepagent/` +project. The TS `apps/`, `packages/`, `tests/`, `scripts/` trees were deleted +in `0e61b2d`; v3 §4~§17 module-by-module spec still applies but each module +now lives under `my_deepagent/.py` instead of +`packages//src/.ts`. + ```text -devflow/ -├── package.json -├── pnpm-workspace.yaml -├── tsconfig.base.json -├── biome.json -├── lefthook.yml -├── vitest.workspace.ts -├── docker-compose.yml -├── .nvmrc -├── .env.example +/ ├── docs/ -│ ├── plan.md -│ ├── adr/ +│ ├── plan.md # this document +│ ├── plan-v4-draft.md # v4 r1 design memo (informational) │ └── schemas/ -│ ├── artifacts/ -│ ├── personas/ -│ └── templates/ -├── scripts/ -│ ├── migrate.ts -│ └── seed.ts -├── packages/ -│ ├── core/ -│ │ └── src/ -│ │ ├── config.ts -│ │ ├── enums.ts -│ │ ├── hash.ts -│ │ ├── errors.ts -│ │ ├── template.ts -│ │ ├── persona.ts -│ │ ├── binding.ts -│ │ ├── prompt-envelope.ts -│ │ ├── artifact-schema.ts -│ │ ├── run-event.ts -│ │ └── index.ts -│ ├── db/ -│ │ └── src/ -│ │ ├── schema/ -│ │ ├── migrations/ -│ │ ├── repositories/ -│ │ └── client.ts -│ ├── session/ -│ │ └── src/ -│ │ ├── adapter.ts -│ │ ├── fake.ts -│ │ ├── tmux.ts -│ │ ├── profiles/ -│ │ │ ├── codex.ts -│ │ │ └── claude.ts -│ │ ├── recovery.ts -│ │ └── transcript.ts -│ ├── harness/ -│ │ └── src/ -│ │ ├── git.ts -│ │ ├── worktree.ts -│ │ ├── runner.ts -│ │ ├── review.ts -│ │ └── backtest.ts -│ ├── run-engine/ -│ │ └── src/ -│ │ ├── engine.ts -│ │ ├── phase-executor.ts -│ │ └── approval.ts -│ └── workflows/ -│ └── src/ -│ ├── workflow.ts -│ └── activities.ts -├── apps/ -│ ├── api/ -│ ├── web/ -│ ├── cli/ -│ 
└── worker/ -└── tests/ - ├── e2e/ - └── fixtures/ +│ ├── artifacts/ # JSON Schema 2020-12 (language-neutral) +│ ├── personas/ # YAML persona seed (language-neutral) +│ └── templates/ # YAML workflow templates +├── docker-compose.yml # Postgres + Temporal (still relevant for M5-Py) +├── .env.example +├── .gitignore +├── my-deepagent-seed/ # v0.1.0 bootstrap kit (historical, may be pruned) +└── my-deepagent/ + ├── pyproject.toml # uv workspace root + ├── uv.lock + ├── ruff.toml + ├── mypy.ini + ├── alembic.ini + ├── .pre-commit-config.yaml + ├── CHANGELOG.md + ├── alembic/ + │ ├── env.py + │ └── versions/ # baseline + per-feature migrations + ├── docs/schemas/ # mirror of repo-root docs/schemas for loader convenience + ├── src/my_deepagent/ + │ ├── config.py # pydantic-settings Config (replaces §1.5 zod schema) + │ ├── enums.py # closed-set enums (§5) + │ ├── errors.py # error taxonomy (§18) + │ ├── hash.py # content-addressed hashing (§6) + │ ├── persona.py # Persona + loader (§7.2) + │ ├── workflow.py # WorkflowTemplate + loader (§7.1) + │ ├── binding.py # autoSelect / override / consent store (§7.4) + │ ├── artifact_schema.py # JSON Schema 2020-12 registry (§10) + │ ├── run_event.py # event types + idempotency keys (§11, §13.1) + │ ├── prompt_envelope.py # envelope builder (§9) + │ ├── budget.py # BudgetTracker (v4-new) + │ ├── secrets.py # config → env → keyring resolution chain + │ ├── keys.py # OS keyring wrapper + │ ├── audit.py # append-only JSONL audit log (v4-new) + │ ├── logging.py # structlog + secret scrubber (§1.4) + │ ├── governance.py # first-run consent (v4-new) + │ ├── i18n/ # ko / en catalog + │ ├── recovery.py # sweep_orphan_runs (§19) + │ ├── session.py # deepagents adapter (§8.5, v4-new) + │ ├── engine.py # WorkflowEngine — phase loop (§15) + │ ├── persistence/ + │ │ ├── db.py # SQLAlchemy 2 async engine + │ │ ├── models.py # ORM models (§4) + │ │ └── checkpointer.py # LangGraph SqliteSaver context + │ ├── middleware/ + │ │ ├── cost.py # 
CostMiddleware (v4-new) + │ │ ├── budget.py # BudgetMiddleware (v4-new) + │ │ ├── audit.py # AuditToolMiddleware + │ │ ├── safety.py # SafetyShellMiddleware (deny-path / destructive command) + │ │ └── artifact_watcher.py # ArtifactWatcherMiddleware + │ ├── monitoring/ + │ │ ├── pricing.py # OpenRouter pricing cache + │ │ └── cost_estimator.py # pre-run preview + │ ├── cli/ # typer-driven CLI + │ │ ├── main.py # entry (interactive REPL when no subcommand) + │ │ ├── doctor.py # §3 doctor checks (Python/uv version) + │ │ ├── init.py + │ │ ├── keys_cmd.py + │ │ ├── run.py + │ │ ├── runs.py + │ │ ├── stats.py + │ │ └── interactive.py # prompt_toolkit REPL + │ ├── tui/ + │ │ └── approval.py # tri-state approval prompt + │ └── slash.py # REPL slash commands + └── tests/ + ├── unit/ # pure-Python unit tests + └── integration/ # async + persistence + real OpenRouter (gated) ``` -## 3. `devflow doctor` +Future trees deferred: +- `apps/api/`, `apps/worker/` (M5-Py / M8-Py): FastAPI app and temporalio + worker. v4 r1 keeps them out until M5 lands. +- `apps/web/`: Web GUI port is out of scope for v0.1.0 (separate milestone). + +## 3. `mydeepagent doctor` Exit codes: @@ -245,34 +226,42 @@ Each check emits: - `detail` - `remediation` -Closed check list: +Closed check list (v4 r1, 8 checks — Node/pnpm/Docker/Drizzle dropped): -1. Node version satisfies `>=22.0.0 <23`. -2. pnpm version `>=9.0.0`. -3. `tmux` exists, version `>=3.3`. -4. `git` version `>=2.40`. -5. Docker daemon reachable. -6. Postgres container running, `pg_isready` ok, `DATABASE_URL` connects. -7. No pending Drizzle migrations. -8. `WORKSPACE_ROOT` exists and is writable. -9. `.env` resolves to valid `Config`. -10. `codex` in `PATH`, warn-only. -11. `claude` in `PATH`, warn-only. -12. Free disk on `WORKSPACE_ROOT` partition: - - warn under 10GB. - - fail under 2GB. - - target green threshold: >=5GB. -13. OpenRouter API reachable: when `openrouter` backend is enabled, `GET ${apiBaseUrl}/models` with the bearer key. 
- - pass on `200`. - - fail on `401`. - - warn on any other non-200 or network error. +1. **python**: `python --version` satisfies `>=3.12,<3.14`. +2. **uv**: `uv --version` resolves (any). +3. **git**: `git --version` `>=2.40`. +4. **workspace_root**: `MYDEEPAGENT_WORKSPACE_ROOT` exists, is a directory, + and is writable. +5. **config+governance**: `Config` loads from env + `.env` + `config.toml` + without ValidationError; first-run governance consent file exists (or is + created interactively on first run only). +6. **openrouter_api_key**: resolution chain (config → env → OS keyring) + yields a non-empty value. Warn-only when the OpenRouter backend is not + enabled. +7. **openrouter_ping + pricing upsert**: `GET https://openrouter.ai/api/v1/models` + with the bearer key. + - `200` → pass; pricing rows are upserted into `model_pricing` for use by + the `mydeepagent run` cost preview. + - `401` → fail. + - any other non-200 / network error → warn. +8. **disk+sqlite integrity**: + - Free disk on the `workspace_root` partition: warn under 10 GB, fail under + 2 GB, green target ≥ 5 GB. + - SQLite DB file (if present) opens and `PRAGMA integrity_check` returns + `ok`. Output: -- Human table by default. +- Rich human table by default. - `--json` for machine-readable output. - `--quiet` prints only nonzero results. -- `--list-orphans` lists orphaned worktrees only; it never removes them. + +Notes: +- `tmux` / `Docker` / `Postgres` / `pg_isready` / drizzle migration checks from + v3 §3 are dropped in v4 r1 — the v0.1.0 runtime is SQLite-only and tmux is + out of scope for the deepagents-driven session model. +- `--list-orphans` and friends are owned by `mydeepagent runs list/show` (§19). ## 4. Database Schema @@ -882,53 +871,91 @@ Exhaustion creates a human gate with `recoveryHint`. - persist `last_capture_seq`. - release advisory lock. 
-### 8.5 OpenRouter Adapter +### 8.5 OpenRouter Adapter — v4 r1 deepagents rewrite -HTTP-based `SessionAdapter` for the `openrouter` backend. No PTY, no tmux. +**Supersedes the v3 marker-extraction HTTP adapter (CC-39).** In v4 the +OpenRouter integration is a multi-turn, tool-using agent driven by LangChain +`deepagents` 0.6.1 — no single-shot completions, no `<<>>` +markers, no transcript replay reconstruction. -Method mapping: +Construction — `my_deepagent.session.build_agent(persona, run_id, …)`: -- `start`: - - allocate in-memory session state `{ messages: [], lastResponseAt }`. - - push the backend prelude (§9.4) as a `system` message. -- `sendPrompt`: - - append the envelope `instructions` (full §9.1 envelope text) as a `user` message. - - POST `${apiBaseUrl}/chat/completions` with `Authorization: Bearer ${apiKey}` and body `{ model: persona.modelConfig.model, messages, max_tokens?, temperature?, top_p? }`. - - append the assistant response as an `assistant` message. -- `probe`: - - alive iff session state is held in the SessionManager map. - - `paneActive` is always `true`. -- `resume`: - - in-memory messages are lost on process restart. - - attempt restoration by replaying `tui_transcript_chunks` for the session into the messages array. - - on irrecoverable failure, fall through to `rebootstrap`. -- `rebootstrap`: - - clear messages and re-push the prelude. -- `capture`: - - split assistant responses into line-sized `TranscriptChunk`s and persist via the standard chunk pipeline. -- `dispose`: - - drop the in-memory entry. +```python +llm = ChatOpenAI( + model=persona.model, # e.g. 
"openrouter:deepseek/deepseek-chat" + base_url=config.openrouter_api_base, # https://openrouter.ai/api/v1 + api_key=resolve_openrouter_api_key(), + timeout=persona.model_params.timeout, +) +agent = deepagents.create_deep_agent( + model=llm, + tools=[], # base tools come from LocalShellBackend + instructions=persona.system_prompt, + subagents=[_subagent_to_dict(s) for s in persona.subagents], + middleware=[ + SafetyShellMiddleware(...), # destructive command + deny-path guard + AuditToolMiddleware(...), # append-only JSONL audit log + ArtifactWatcherMiddleware(...), # write_file/edit_file detection + CostMiddleware(...), # usage_metadata + budget ledger + ], + backend=LocalShellBackend( # bash + read_file + write_file + edit_file + ls + cwd=worktree_root, + # `permissions` kwarg is intentionally omitted for local_shell backend + # (deepagents 0.6.1 NotImplementedError workaround — enforcement moves + # to SafetyShellMiddleware). + ), +) +``` + +Method mapping (driven by `WorkflowEngine` rather than a v3-style adapter +interface): + +- **Start**: `create_deep_agent` returns a `CompiledStateGraph` per phase. + No persistent session object is shared across phases — each phase is a + fresh agent invocation parameterized by persona + envelope. +- **Send prompt**: `await agent.ainvoke({"messages": [HumanMessage(envelope)]})` + where `envelope` is built by `WorkflowEngine._build_envelope` (§9 with the + artifact JSON Schema inlined so the model sees the exact required fields). +- **Tool use**: native `read_file` / `write_file` / `edit_file` / `ls` / + `bash` calls are emitted by the model and dispatched through + LocalShellBackend, recorded by AuditToolMiddleware, gated by + SafetyShellMiddleware. +- **Probe / resume / rebootstrap / dispose**: not applicable — the agent is + ephemeral per phase. Crash recovery operates at the run/phase level via + `sweep_orphan_runs` (§19), not at a session-adapter level. 
Artifact production: -- HTTP agents cannot write to the workspace filesystem. The backend prelude (§9.4) instructs the model to emit the artifact body inside a single fenced block at the tail of the response: +- The model writes the artifact directly to `expected_artifact_path` via the + `write_file` tool. ArtifactWatcherMiddleware observes the tool call and + notifies the engine. +- The envelope inlines the artifact's JSON Schema definition so the LLM has + the exact required fields. +- Schema validation is performed by `ArtifactSchemaRegistry.validate` on the + written file (§10). On failure, the engine retries once with a repair + prompt; second failure raises `human_required:artifact_invalid_after_repair`. -```text -<<>> -{ "...": "..." } -<<>> -``` - -- The adapter extracts the JSON between the markers and writes it atomically (temp file + rename) to `expectedArtifactPath`. -- Missing markers, multiple blocks, or JSON parse failure are treated as `artifact.invalid` and follow the standard repair/timeout flow in §10.3. - -Error mapping: +Error mapping (preserved from CC-39, applied per-call by the LangChain +exception path): - HTTP `401` → `human_required:backend_auth_failed`. -- HTTP `429` → `recoverable:rate_limited` (exponential backoff: 1s, 2s, 4s, max 30s). +- HTTP `429` → `recoverable:rate_limited` (exponential backoff: 1 s, 2 s, 4 s, + max 30 s, owned by langchain-openai retries). - HTTP `5xx` → `recoverable:network_blip`. -- HTTP `400` with body code `model_not_found` → `human_required:model_unavailable`. -- Network error before any response → `recoverable:network_blip`. +- HTTP `400` with `model_not_found` → `human_required:model_unavailable`. +- BudgetTracker pre-call rejection → `human_required:token_budget_exceeded`. +- SafetyShellMiddleware blocked tool call → `human_required:tool_quota_exceeded`. 
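The error mapping above can be sketched as a small lookup, purely for illustration. The function names `classify_http_error` and `backoff_delays` are hypothetical and do not appear in the plan; only the error codes and the 1 s / 2 s / 4 s / 30 s-cap schedule come from the list above. Because the plan says retries are owned by `langchain-openai`, the backoff helper only shows the shape of the schedule and is not a claim about how that library computes it.

```python
from __future__ import annotations


def classify_http_error(status: int, body_code: str | None = None) -> str | None:
    """Map an HTTP status to a '<class>:<code>' string per the list above.

    Hypothetical helper: returns None for statuses the list does not cover,
    rather than inventing an error code for them.
    """
    if status == 401:
        return "human_required:backend_auth_failed"
    if status == 429:
        return "recoverable:rate_limited"
    if status == 400 and body_code == "model_not_found":
        return "human_required:model_unavailable"
    if 500 <= status <= 599:
        return "recoverable:network_blip"
    return None


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential schedule in seconds: base, 2*base, 4*base, ... clamped at cap."""
    return [min(base * 2**i, cap) for i in range(attempts)]


if __name__ == "__main__":
    print(classify_http_error(429))
    print(backoff_delays(6))
```

Returning `None` for unmapped statuses keeps the sketch from guessing at codes the plan does not define; a real integration would presumably decide that case explicitly.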
+ +Known v0.1.0 limitations: + +- `usage_metadata` is sometimes empty on responses forwarded by OpenRouter + (deepagents wraps the underlying ChatOpenAI response so token counts may + not surface). The recorder still fires and `LlmCallRow` is persisted, but + `input_tokens` / `output_tokens` may read 0. v0.2 will probe additional + response shapes (raw chunks / callbacks). +- Anthropic models via OpenRouter currently fail with a `tool_calls.args` + JSON-string vs dict ValidationError inside `langchain-openai` 1.2.1. + Workaround: pin DeepSeek personas via `BindingOverride`. Tracking for v0.2. ## 9. Prompt Envelope @@ -1549,19 +1576,22 @@ Reconnect: ## 18. Errors -`packages/core/src/errors.ts`: +v4: `my_deepagent.errors.MyDeepAgentError` (replaces v3 `DevflowError` 1:1): -```ts -type ErrorClass = 'recoverable' | 'human_required' | 'fatal'; +```python +class ErrorClass(StrEnum): + RECOVERABLE = "recoverable" + HUMAN_REQUIRED = "human_required" + FATAL = "fatal" -class DevflowError extends Error { - readonly class: ErrorClass; - readonly code: string; - readonly runId?: string; - readonly phaseId?: string; - readonly recoveryHint?: string; - readonly cause?: unknown; -} + +class MyDeepAgentError(Exception): + error_class: ErrorClass + code: str + run_id: UUID | None + phase_id: UUID | None + recovery_hint: str | None + cause: BaseException | None ``` Recoverable: @@ -1587,6 +1617,12 @@ Human required: - `review_dispute_unresolved` - `backend_auth_failed` - `model_unavailable` +- `token_budget_exceeded` *(v4 r1: BudgetTracker rejects a call whose + estimated cost would breach the per-run, per-day, or per-persona-daily cap + with `on_hit=block`.)* +- `tool_quota_exceeded` *(v4 r1: SafetyShellMiddleware blocked a tool call + due to deny-path / destructive-command policy, or a per-phase tool-call + cap was hit.)* Fatal: @@ -1857,7 +1893,13 @@ M5+: | CC-36 | SSE reconnect wording used per-run `seq` for global stream even though `seq` is not globally monotonic | 
`/sse/runs/:runId` uses per-run `seq`; `/sse/global` uses global `run_events.id` and emits only scope=`both` summary events | | CC-37 | Run SSE replay could emit historical derived events after the first page | run SSE drains historical rows up to a high-water `seq` with only `run.event_appended`, then switches to live derived events | | CC-38 | Normal phase start changed run state to `planning` / `executing` without a summary event source | `phase.started` payload includes `runState`; SSE derives `run.state_changed` from that live event | -| CC-39 | No OpenRouter HTTP backend; users cannot pick cost-tuned per-persona models | add `openrouter` to Backend enum; HTTP `OpenRouterAdapter` in §8.5; persona `modelConfig.model` requirement; doctor check 13; new error codes `rate_limited`, `backend_auth_failed`, `model_unavailable` | +| CC-39 | No OpenRouter HTTP backend; users cannot pick cost-tuned per-persona models | add `openrouter` to Backend enum; HTTP `OpenRouterAdapter` in §8.5; persona `modelConfig.model` requirement; doctor check 13; new error codes `rate_limited`, `backend_auth_failed`, `model_unavailable` (final v3 entry: v4 reinterprets the OpenRouter integration as the deepagents-driven session adapter; the standalone HTTP `OpenRouterAdapter` from CC-39 is **superseded by DR-1**) | + +### Decision Records (v4) + +| ID | Decision | Rationale | Impact | +|----|----------|-----------|--------| +| DR-1 | **v3 → v4 major bump: delete TS monorepo, rewrite in Python on LangChain `deepagents`.** | (1) Claude/Anthropic direct API cost is prohibitive for a single-user toolchain. (2) OpenRouter cost-tuned models (DeepSeek, etc.) require a multi-turn, tool-using agent harness; `deepagents` is Python-only with no 1:1 TS port. (3) Switching languages is shorter than reimplementing the harness. | Step 0-purge (commit `0e61b2d`) deleted `apps/`, `packages/`, `tests/`, `scripts/`, pnpm/tsconfig metadata. 
The Python rewrite lives at `my-deepagent/` and reached Step 15 (real OpenRouter E2E PASS, ~$0.05/run) before the v3 codebase was removed. CC-39's separate `OpenRouterAdapter` is replaced by `my_deepagent.session.build_agent` (deepagents 0.6.1 with LocalShellBackend + SafetyShellMiddleware). v3 CC counters frozen; v4 begins its own series. Recovery: `git checkout pre-python-rewrite -- `. | ### Future Open Questions @@ -1868,6 +1910,8 @@ M5+: ## 23. Kickoff Order +v3 historical order (TS, completed up to M8 before the v4 pivot): + 1. M1.1: repo + pnpm + tsconfig + biome + lefthook + vitest workspace. 2. M1.2: docker-compose + Postgres healthcheck + drizzle-kit + first migration. 3. M1.3: `apps/cli` skeleton + `devflow doctor`. @@ -1880,3 +1924,27 @@ M5+: 10. M3.3: engine-shaped harness running a single fake phase end-to-end. 11. M4: assemble run engine; lock contract; full fake `development@1` minus reviewers. 12. M5 in parallel with M6 once M4 is green. + +v4 r1 order (Python, status as of v0.1.0): + +| Step | Scope | Status | +|------|-------|--------| +| Step 0 | Scaffold `my-deepagent/` (uv workspace, ruff, mypy, alembic, .pre-commit) | DONE (`17ba5d7`) | +| Step 1 | `devflow_core` → `my_deepagent.{config,enums,errors,hash,persona,prompt_envelope,run_event}` | DONE | +| Step 2 | `devflow_db` → `my_deepagent.persistence.{db,models,checkpointer}` + Alembic baseline | DONE | +| Step 3 | `mydeepagent doctor` (typer) | DONE | +| Step 4 | Persona / workflow seeding + binding (`my_deepagent.{persona,workflow,binding}`) | DONE | +| Step 5 | Artifact schema registry (`my_deepagent.artifact_schema`) | DONE | +| Step 6 | Distribution: init/login/logout/keys, governance consent, i18n (ko/en) | DONE | +| Step 7 | WorkflowEngine + ArtifactWatcherMiddleware (replaces v3 §15 in-process engine) | DONE | +| Step 8 | Budget guardrails (`my_deepagent.budget` + cost preview + CostMiddleware) | DONE | +| Step 9 | Crash recovery + concurrency (`my_deepagent.recovery` + `mydeepagent 
runs …`) | DONE | +| Step 10 | Interactive REPL (`mydeepagent` no-subcommand + slash commands) | DONE | +| Step 11 | Audit log + structlog secret scrubbing | DONE | +| Step 12 | Doctor 8-check + OpenRouter pricing fetch + `mydeepagent pricing` | DONE | +| Step 13 | Tmux adapter (M6-Py) | DEFERRED — not in v0.1.0 | +| Step 14 | TUI recovery (M7-Py) | DEFERRED — not in v0.1.0 | +| Step 15 | End-to-end real OpenRouter integration test | DONE (`733c9be`) | +| Step 0-purge | Delete v3 TS monorepo per DR-1 | DONE (`0e61b2d`) | +| M5-Py | Temporal worker (`apps/worker`) | NEXT | +| M8-Py | FastAPI + SSE (`apps/api`) | NEXT |