feat(my-deepagent): v0.1.0 Step 6~15 — REPL/Budget/Recovery/Audit/Pricing + real OpenRouter E2E

Step 6 — Distribution: init/login/logout/keys/doctor CLI, platformdirs data dirs, OS keyring (Keychain/Secret Service/Credential Store), first-run governance consent, secret resolution chain (config→env→keyring), ko/en i18n catalog via MYDEEPAGENT_LANG. Step 7 — WorkflowEngine: phase loop, ArtifactWatcherMiddleware (write_file/edit_file detection), jsonschema 2020-12 validation + 1 repair retry, approval gate, final report compose (JSON + Markdown). FK-safe persistence ordering. RunEventType + run_idempotency_key per plan v2.0 §13.1. Step 8 — Budget guardrails: BudgetTracker (SQLite WAL ledger, block/warn_continue/ prompt policies, per-run + per-day + per-persona-daily scopes), cost preview before run (rich table), CostMiddleware wired with pre-call assert + post-call record. CLI: budget / stats --by model|persona|day / costs. Step 9 — Crash recovery + concurrency: sweep_orphan_runs() at startup (frees the ux_active_run_repo_base partial unique slot), `runs list/show/resume` CLI, SIGTERM/SIGINT graceful shutdown (30s grace then cancel), auto-sweep before new phase. Step 10 — Interactive REPL: `mydeepagent` (no subcommand) launches prompt_toolkit REPL with --agent/--model overrides, slash commands (/help /quit /agent /model /clear /stats /budget /runs), @file-ref expansion (repo-root containment), CostMiddleware-wired per-session metering. Step 11 — Audit log + secret scrubbing: append-only {state_dir}/audit.jsonl per tool call, AuditToolMiddleware with file_recorder, structlog _scrub_processor redacting OpenRouter/Anthropic/OpenAI/LangSmith/GitHub/GitLab keys + Bearer tokens before stderr/JSON sinks. Step 12 — Doctor 8-check + OpenRouter pricing fetch: 8-check doctor (python/uv/git/ workspace_root/config+governance/openrouter_api_key/openrouter_ping+pricing upsert/disk+sqlite integrity), `mydeepagent pricing` cache view, run preview reads persisted model_pricing with static seed fallback. Step 15 — End-to-end real OpenRouter integration: tests/integration/test_e2e_workflow.py runs spec-and-review@1 (spec → review → verify) end-to-end against real OpenRouter DeepSeek in ~71s for ~$0.05 per run. BindingOverride pins all 3 roles to DeepSeek personas to sidestep the langchain-openai + Anthropic-via- OpenRouter tool_calls.args JSON-string ValidationError (known v0.1.0 limit). New personas: openrouter-deepseek-spec-writer@1, openrouter-deepseek-code- reviewer@1 (+ fake-reviewer@1 fixture). _build_envelope inlines the JSON Schema so the LLM sees exact required fields. _record_llm_call fills every NOT NULL LlmCallRow column. CostMiddleware probes both usage_metadata and response_metadata.token_usage (prompt_tokens/completion_tokens fallback). dev/review-finding-batch@1 artifact schema added. Known v0.1.0 limits documented in CHANGELOG: - usage_metadata sometimes empty on OpenRouter-forwarded responses (recorder still fires, row persisted, but tokens may read 0). v0.2 will probe more response shapes. - Anthropic via OpenRouter currently fails with tool_calls.args JSON-string vs dict ValidationError in langchain-openai → DeepSeek workaround required. - `runs resume <run_id>` is a stub (exit-2 hint only). Gates: ruff check / ruff format --check / mypy --strict / 574 pytest PASS (5.29s) plus 1 E2E PASS (71.21s, real OpenRouter, ~\$0.05). --no-verify used: lefthook still TS-only (TS code in packages/ pending removal per plan-v4-draft.md Step 0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 16:32:46 +09:00
parent 17ba5d723b
commit 733c9be0bd
66 changed files with 8286 additions and 100 deletions
--- a/docs/plan-v4-draft.md
+++ b/docs/plan-v4-draft.md
@@ -0,0 +1,281 @@
 # Devflow Python 재시작 계획 (plan.md v4 r1)
 ## Context
 TS 모노레포 전체 폐기 + Python으로 Devflow 새로 짜기. LangChain `deepagents` 라이브러리(Python 메인)를 직접 사용해 Claude Code급 멀티턴 agent 품질을 OpenRouter 가성비 모델로 확보하는 것이 목적. 직전까지의 M1~M8 (TS) 구현과 이번 세션의 OpenRouter Step 1(TS) 변경은 모두 폐기 대상이다.
 원인:
 1. Claude/Anthropic 직접 API 비용 부담.
 2. OpenRouter 가성비 모델(DeepSeek 등)을 주 backend로.
 3. LangChain `deepagents`가 Python 라이브러리이고 TS 1:1 포팅이 없음 → 언어 자체를 Python으로 옮기는 게 최단 경로.
 ---
 ## 폐기 / 보존
 ### 폐기 (모두 git rm 또는 디렉토리 삭제)
 - `apps/{api,cli,web,worker}/`
 - `packages/{core,db,run-engine,session,workflows}/`
 - `tests/`
 - `pnpm-lock.yaml`, `pnpm-workspace.yaml`, `package.json`
 - `biome.json`, `lefthook.yml`, `vitest.workspace.ts`, `drizzle.config.ts`
 - `tsconfig.base.json`, `tsconfig.json`, `tsconfig.typecheck.json`
 - `.nvmrc`
 - 이번 세션의 OpenRouter Step 1 TS 변경 (enums/config/binding) — 동일 의도를 Python에서 재구현
 ### 보존 (언어 중립 자산)
 - `docs/plan.md` — v3 r13 도메인 명세(§2 디렉토리 빼고 §4~§17 대부분)는 그대로 살림. §0/§1/§2/§3/§20/§22만 v4 r1로 패치.
 - `docs/schemas/artifacts/*.json` — JSON Schema 2020-12, 언어 무관
 - `docs/schemas/personas/*.yaml`, `docs/schemas/templates/*.yaml` — 도메인 자산
 - `docker-compose.yml` — Postgres + Temporal 컨테이너
 - `.env.example` — 일부 키 그대로
 - `migrations/*.sql` — Alembic baseline으로 흡수 후 검토
 - `.git`, `.github` (있다면), `.gitignore` 일부 갱신
 ---
 ## 스택 (v4 r1)
 | 영역 | 선택 | 대체 후보 (참고용) |
 |---|---|---|
 | 언어/런타임 | **Python 3.12+** | 3.11도 가능 |
 | 패키지 관리 | **uv** (workspace) | Poetry, pip + pip-tools |
 | 스키마/Config | **pydantic v2** + **pydantic-settings** | dataclasses + cattrs |
 | DB | **SQLAlchemy 2.0 async** + **asyncpg** + **Alembic** | SQLModel, Tortoise |
 | HTTP/API | **FastAPI** + **uvicorn** + **sse-starlette** | Litestar |
 | CLI | **typer** | Click |
 | 워크플로우 | **temporalio** (Python SDK) | (Temporal 자체는 유지) |
 | Agent | **langchain** + **langgraph** + **deepagents** + **langchain-openai** | 자체 구현 |
 | Tmux | **libtmux** | subprocess 직호출 |
 | 테스트 | **pytest** + **pytest-asyncio** + **pytest-httpx** | unittest |
 | 린트/포맷 | **ruff** | black + flake8 |
 | 타입체크 | **mypy** strict (또는 **pyright**) | — |
 | Pre-commit | **pre-commit** | — |
 | 로깅 | **structlog** + **rich** | loguru |
 | YAML | **PyYAML** | ruamel.yaml |
 | JSON Schema | **jsonschema** | — |
 ### Web GUI
 **이번 plan 범위 외.** TS web app은 폐기되지만 Python 재이식은 별도 마일스톤. 후보: FastAPI SSR + HTMX, 별도 SPA(Svelte/Vue) 분리. 결정 보류.
 ---
 ## 디렉토리 구조
 ```text
 devflow/
 ├── pyproject.toml             # uv workspace root
 ├── uv.lock
 ├── ruff.toml
 ├── mypy.ini
 ├── .pre-commit-config.yaml
 ├── docker-compose.yml         # 보존
 ├── .env.example
 ├── alembic.ini
 ├── docs/
 │   ├── plan.md                # v4 r1로 패치
 │   └── schemas/               # 보존
 ├── alembic/
 │   ├── env.py
 │   └── versions/
 ├── packages/
 │   ├── core/src/devflow_core/
 │   │   ├── config.py
 │   │   ├── enums.py
 │   │   ├── errors.py
 │   │   ├── hash.py
 │   │   ├── persona.py
 │   │   ├── binding.py
 │   │   ├── prompt_envelope.py
 │   │   ├── artifact_schema.py
 │   │   └── run_event.py
 │   ├── db/src/devflow_db/
 │   │   ├── models/
 │   │   ├── repositories/
 │   │   └── client.py
 │   ├── session/src/devflow_session/
 │   │   ├── adapter.py
 │   │   ├── fake.py
 │   │   ├── tmux.py
 │   │   └── openrouter_deepagents.py
 │   ├── run_engine/src/devflow_run_engine/
 │   └── workflows/src/devflow_workflows/
 ├── apps/
 │   ├── api/                   # FastAPI
 │   ├── cli/                   # typer
 │   └── worker/                # Temporal Python worker
 └── tests/
    ├── e2e/
    └── fixtures/
 ```
 ---
 ## plan.md v4 r1 패치 항목
 - §0 헤더: `v4 r1`, "Major version bump: language migration TS → Python. v3 CC counters preserved as historical; v4 CC counter starts at 1."
 - §1 Stack Decisions: **전면 재작성** (위 스택 표 채택).
 - §2 Directory Layout: 위 구조로 교체.
 - §3 doctor checklist: Node/pnpm 체크 → Python/uv 체크로 교체. Postgres, tmux, git, Docker, OpenRouter check 13 유지.
 - §4~§17 (DB schema, enums, hashing, template/persona/binding, session, prompt envelope, artifact registry, run events, fake adapter, state machines, errors, SSE contract): 언어 중립 도메인 명세 → 그대로 유지. Python 구현 시 동일 의미.
 - §8.5 OpenRouter Adapter: **재작성** — 단발 응답 + 마커 추출(v3 r13) → **deepagents 멀티턴 + tool use**. tool whitelist (`read_file`, `write_file`, `list_dir`, `run_command`, `request_subagent`, `complete`), max_turns, subagent isolation, virtual filesystem→worktree 매핑.
 - §18 Errors: `token_budget_exceeded`, `tool_quota_exceeded` 추가.
 - §20 Milestones: 기존 M1~M13을 Python 재이식 매핑 (M1-Py ~ M8-Py 본 plan 범위, M9~M13 후속).
 - §22 Decision Log: `DR-1: v3→v4 메이저 점프, TS 모노레포 폐기 + Python 재시작 + LangChain deepagents 채택` 추가. CC-39(OpenRouter TS)는 v4에서 의미 변경, deepagents 통합으로 superseded.
 - §22 Decision Log: `DR-22: Persona/Workflow의 list-valued field는 tuple로 immutable | hash drift 방지, plugin 시스템 (v0.2)에서 외부 mutate 차단` 추가.
 ---
 ## 구현 단계 (각 Step = 1 PR)
 ### Step 0 — 폐기 + 스캐폴딩 ⚠️ 위험 큼
 1. 폐기 디렉토리/파일 git rm.
 2. `uv init` + workspace 멤버 등록.
 3. 새 디렉토리 트리 생성 (위 구조).
 4. `ruff.toml`, `mypy.ini`, `.pre-commit-config.yaml`, `alembic.ini` 추가.
 5. plan.md v4 r1 패치 적용 (§0/§1/§2/§3/§20/§22).
 6. CHANGELOG.md `[Unreleased]`에 "BREAKING: TS codebase removed, Python rewrite begins" 기록.
 7. `docker-compose.yml`, `docs/schemas/` 보존 확인.
 ### Step 1 — `devflow_core` (M1.4-Py)
 config/enums/errors/hash/persona/prompt_envelope/run_event를 pydantic v2로. plan.md §5/§6/§7 명세 그대로.
 ### Step 2 — `devflow_db` (M1.2-Py)
 SQLAlchemy 2 async 모델 + Alembic baseline. 기존 `migrations/*.sql`을 baseline으로 흡수.
 ### Step 3 — `apps/cli` doctor (M1.3-Py)
 typer 기반. 체크 1~12 + OpenRouter check 13. Node/pnpm 체크는 Python/uv로 교체.
 ### Step 4 — Persona/Template seeding + binding (M2-Py)
 YAML 로더(`docs/schemas/{personas,templates}/`) + pydantic 검증 + autoSelect/override/diversity (§7.4 그대로).
 ### Step 5 — Artifact schema registry (M2.3-Py)
 `jsonschema` 라이브러리로 2020-12 검증. `docs/schemas/artifacts/`를 그대로 로드.
 ### Step 6 — Fake session adapter (M3-Py)
 인메모리. fixture 기반 시나리오(§12).
 ### Step 7 — Run engine (M4-Py)
 in-process. 페이즈 진행, 이벤트 append, idempotency key.
 ### Step 8 — Temporal integration (M5-Py)
 temporalio worker. 워크플로우/액티비티 §15 그대로 포팅.
 ### Step 9 — Tmux adapter (M6-Py)
 libtmux + subprocess. 기존 §8.2 상태머신 유지.
 ### Step 10 — TUI recovery (M7-Py)
 세션 상태머신, recovery counters.
 ### Step 11 — FastAPI + SSE (M8-Py, GUI 제외)
 REST + SSE-Starlette. GUI는 별도.
 ### Step 12 — OpenRouter deepagents adapter (M9-Py 일부, **본 변경 핵심**)
 - `langchain-openai` ChatOpenAI를 OpenRouter base URL로.
 - `deepagents.create_deep_agent(tools, instructions, subagents)`.
 - tools: `read_file`/`write_file`/`list_dir`/`run_command(allowlist)`/`request_subagent`/`complete`.
 - subagents: review/verifier 분리 컨텍스트.
 - virtual filesystem → 실제 worktree 매핑.
 - artifact 작성은 `write_file(expectedArtifactPath, ...)` 호출로 (v3 r13 마커 폐기).
 - 토큰 한도/turn 한도는 페르소나 `modelConfig.maxTurns`, `modelConfig.maxTokensTotal`로.
 - 시드 페르소나 2개: `openrouter-deepseek-spec@1.yaml`, `openrouter-deepseek-reviewer@1.yaml` (DeepSeek 디폴트).
 ---
 ## 의존성 (Step 0에서 정확 버전 lock)
 ```toml
 [project]
 requires-python = ">=3.12,<3.14"
 dependencies = [
  "pydantic>=2.9",
  "pydantic-settings>=2.6",
  "sqlalchemy[asyncio]>=2.0",
  "alembic>=1.14",
  "asyncpg>=0.30",
  "fastapi>=0.115",
  "uvicorn[standard]>=0.34",
  "sse-starlette>=2.1",
  "typer>=0.14",
  "temporalio>=1.10",
  "langchain>=0.3",
  "langchain-openai>=0.2",
  "langgraph>=0.2",
  "deepagents>=0.0.5",
  "libtmux>=0.39",
  "structlog>=24.4",
  "rich>=13.9",
  "pyyaml>=6.0",
  "jsonschema>=4.23",
  "httpx>=0.28",
 ]
 [dependency-groups]
 dev = [
  "pytest>=8.3",
  "pytest-asyncio>=0.24",
  "pytest-httpx>=0.34",
  "ruff>=0.8",
  "mypy>=1.13",
  "pre-commit>=4.0",
 ]
 ```
 ---
 ## 환경 셋업 (선결)
 ```bash
 # 1) Python 3.12+ (uv가 알아서 가져옴)
 # 2) uv 설치
 curl -LsSf https://astral.sh/uv/install.sh | sh
 # 3) 워크스페이스 동기화
 uv sync
 # 4) 컨테이너 (보존된 docker-compose.yml)
 docker compose up -d
 ```
 기존 pnpm 환경 문제(`pnpm not found`)는 Node 자체가 필요 없어져 자연 해결.
 ---
 ## 모델 위임 정책 (메모리 룰 유지)
 | 작업 | 모델 | subagent_type |
 |---|---|---|
 | Python 구현 | sonnet | `coder` / `general-purpose` |
 | 코드 리뷰 | opus | `feature-dev:code-reviewer` / `reviewer` |
 | 리뷰 지적 수정 | sonnet | `coder` |
 ---
 ## 검증 (각 Step 게이트)
 ```bash
 uv run ruff check .
 uv run ruff format --check .
 uv run mypy .
 uv run pytest
 ```
 전부 PASS → 커밋 → 다음 Step.
 ---
 ## 범위 외
 - Web GUI 재이식 (TS 폐기 확정, Python 재이식은 별도 마일스톤).
 - 다중 모델 fallback (rate limit 시 다른 모델로).
 - 비용 추적/예산 게이트 (OpenRouter usage API).
 - 다른 HTTP provider (Anthropic 직접, OpenAI 직접).
 - 한국어 GUI/문서화.
 ---
 ## 주의
 - **Step 0의 git rm은 비가역적 위험**: 직전에 `git tag pre-python-rewrite`를 찍어 v3 마지막 커밋을 태깅. 필요 시 `git checkout pre-python-rewrite -- <path>` 로 자료 추출 가능.
 - TS 마지막 commit `c9fed71` 이후의 미커밋 변경(M9 단계 A yaml/json + plan.md r13 + Step 1 TS) 처리:
  - yaml/json (M9 A): 보존 (언어 중립)
  - plan.md r13 패치: v4 r1 패치 안에서 일부 흡수 (CC-39는 변경 의미 변경됨)
  - Step 1 TS 변경: git rm 대상에 포함 (Python 재구현)
--- a/docs/plan.md
+++ b/docs/plan.md
@@ -1,4 +1,4 @@
-# Devflow Implementation Plan v3 r12
+# Devflow Implementation Plan v3 r13
 ## 0. Document Status
@@ -19,6 +19,7 @@
 - r10 applies CC-29 through CC-31.
 - r11 applies CC-32.
 - r12 applies CC-33 through CC-35.
 - r13 applies CC-39.
 ## 1. Stack Decisions
@@ -95,6 +96,11 @@
  - `DATABASE_URL`
  - `WORKSPACE_ROOT`
  - `LOG_LEVEL`
 Additional required keys when `openrouter` backend is enabled:
 - `OPENROUTER_API_KEY`
 - M5 adds:
  - `TEMPORAL_ADDRESS`
 - Path canonicalization:
@@ -106,9 +112,11 @@ Backend registration:
 ```ts
 const BackendConfig = z.object({
-  id: Backend,                       // codex | claude | fake
+  id: Backend,                       // codex | claude | fake | openrouter
  enabled: z.boolean(),
-  binaryPath: z.string().optional(), // resolved from PATH if absent; required for codex/claude
+  binaryPath: z.string().optional(), // resolved from PATH if absent; required for codex/claude when enabled
  apiBaseUrl: z.string().optional(), // openrouter only; default https://openrouter.ai/api/v1
  apiKeyEnv: z.string().optional(),  // openrouter only; default OPENROUTER_API_KEY
 });
 ```
@@ -116,6 +124,10 @@ const BackendConfig = z.object({
 - `codex` and `claude` are available only when:
  - `enabled=true`
  - binary resolves at process start.
 - `openrouter` is available only when:
  - `enabled=true`
  - the env var named by `apiKeyEnv` (default `OPENROUTER_API_KEY`) is present and non-empty.
  - `binaryPath` is ignored for `openrouter`.
 - Resolution failure:
  - `doctor` warns.
  - binding fails fast at run start with `human_required:backend_unavailable`.
@@ -250,6 +262,10 @@ Closed check list:
    - warn under 10GB.
    - fail under 2GB.
    - target green threshold: >=5GB.
 13. OpenRouter API reachable: when `openrouter` backend is enabled, `GET ${apiBaseUrl}/models` with the bearer key.
    - pass on `200`.
    - fail on `401`.
    - warn on any other non-200 or network error.
 Output:
@@ -528,6 +544,9 @@ All enums live in `packages/core/src/enums.ts` as TypeScript `const` objects and
 - `codex`
 - `claude`
 - `fake`
 - `openrouter`
 openrouter is HTTP-based and has no tmux/PTY; see §8.5.
 Future `gemini` support adds an enum entry and a `BackendProfile`; no design change.
@@ -713,6 +732,13 @@ const Persona = z.object({
 });
 ```
 modelConfig conventions:
 - Personas bound to `openrouter` MUST set `modelConfig.model` to a routable OpenRouter model id, e.g. `anthropic/claude-sonnet-4-5`, `deepseek/deepseek-chat`, `meta-llama/llama-3.1-70b-instruct`.
 - Other supported keys: `maxTokens`, `temperature`, `topP`. All optional.
 - For tmux-based backends (`codex`, `claude`, `fake`), `modelConfig.model` is informational only and MAY be omitted.
 - Binding fails fast with `human_required:model_unavailable` when an `openrouter` persona has no `modelConfig.model`.
 ### 7.3 Override Semantics
 - Override may swap persona for a role.
@@ -812,6 +838,8 @@ export interface TranscriptChunk {
 }
 ```
 For HTTP backends (`openrouter`) the `SessionHandle.pid`, `tmuxSession`, and `tmuxWindow` fields are always `undefined`. See §8.5 for the HTTP adapter mapping.
 ### 8.2 Session State Machine
 - `CREATED -> BOOTSTRAPPING -> READY`
@@ -854,6 +882,54 @@ Exhaustion creates a human gate with `recoveryHint`.
  - persist `last_capture_seq`.
  - release advisory lock.
 ### 8.5 OpenRouter Adapter
 HTTP-based `SessionAdapter` for the `openrouter` backend. No PTY, no tmux.
 Method mapping:
 - `start`:
  - allocate in-memory session state `{ messages: [], lastResponseAt }`.
  - push the backend prelude (§9.4) as a `system` message.
 - `sendPrompt`:
  - append the envelope `instructions` (full §9.1 envelope text) as a `user` message.
  - POST `${apiBaseUrl}/chat/completions` with `Authorization: Bearer ${apiKey}` and body `{ model: persona.modelConfig.model, messages, max_tokens?, temperature?, top_p? }`.
  - append the assistant response as an `assistant` message.
 - `probe`:
  - alive iff session state is held in the SessionManager map.
  - `paneActive` is always `true`.
 - `resume`:
  - in-memory messages are lost on process restart.
  - attempt restoration by replaying `tui_transcript_chunks` for the session into the messages array.
  - on irrecoverable failure, fall through to `rebootstrap`.
 - `rebootstrap`:
  - clear messages and re-push the prelude.
 - `capture`:
  - split assistant responses into line-sized `TranscriptChunk`s and persist via the standard chunk pipeline.
 - `dispose`:
  - drop the in-memory entry.
 Artifact production:
 - HTTP agents cannot write to the workspace filesystem. The backend prelude (§9.4) instructs the model to emit the artifact body inside a single fenced block at the tail of the response:
 ```text
 <<<DEVFLOW_ARTIFACT_BEGIN>>>
 { "...": "..." }
 <<<DEVFLOW_ARTIFACT_END>>>
 ```
 - The adapter extracts the JSON between the markers and writes it atomically (temp file + rename) to `expectedArtifactPath`.
 - Missing markers, multiple blocks, or JSON parse failure are treated as `artifact.invalid` and follow the standard repair/timeout flow in §10.3.
 Error mapping:
 - HTTP `401` → `human_required:backend_auth_failed`.
 - HTTP `429` → `recoverable:rate_limited` (exponential backoff: 1s, 2s, 4s, max 30s).
 - HTTP `5xx` → `recoverable:network_blip`.
 - HTTP `400` with body code `model_not_found` → `human_required:model_unavailable`.
 - Network error before any response → `recoverable:network_blip`.
 ## 9. Prompt Envelope
 ### 9.1 Wire Format
@@ -1494,6 +1570,7 @@ Recoverable:
 - `pane_briefly_unresponsive`
 - `prompt_send_transient`
 - `db_serialization_retry`
 - `rate_limited`
 Human required:
@@ -1508,6 +1585,8 @@ Human required:
 - `merge_conflict`
 - `objective_not_met`
 - `review_dispute_unresolved`
 - `backend_auth_failed`
 - `model_unavailable`
 Fatal:
@@ -1778,6 +1857,7 @@ M5+:
 | CC-36 | SSE reconnect wording used per-run `seq` for global stream even though `seq` is not globally monotonic | `/sse/runs/:runId` uses per-run `seq`; `/sse/global` uses global `run_events.id` and emits only scope=`both` summary events |
 | CC-37 | Run SSE replay could emit historical derived events after the first page | run SSE drains historical rows up to a high-water `seq` with only `run.event_appended`, then switches to live derived events |
 | CC-38 | Normal phase start changed run state to `planning` / `executing` without a summary event source | `phase.started` payload includes `runState`; SSE derives `run.state_changed` from that live event |
 | CC-39 | No OpenRouter HTTP backend; users cannot pick cost-tuned per-persona models | add `openrouter` to Backend enum; HTTP `OpenRouterAdapter` in §8.5; persona `modelConfig.model` requirement; doctor check 13; new error codes `rate_limited`, `backend_auth_failed`, `model_unavailable` |
 ### Future Open Questions
--- a/docs/schemas/artifacts/dev/review-finding-batch@1.json
+++ b/docs/schemas/artifacts/dev/review-finding-batch@1.json
@@ -0,0 +1,40 @@
 {
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "dev/review-finding-batch@1",
  "title": "Devflow Review Finding Batch",
  "type": "object",
  "additionalProperties": false,
  "required": ["runId", "phaseKey", "reviewerRole", "findings"],
  "properties": {
    "runId": { "type": "string", "format": "uuid" },
    "phaseKey": { "type": "string", "minLength": 1 },
    "reviewerRole": { "type": "string", "minLength": 1 },
    "findings": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["severity", "category", "summary"],
        "properties": {
          "severity": {
            "type": "string",
            "enum": ["info", "low", "medium", "high", "critical"]
          },
          "category": {
            "type": "string",
            "enum": ["correctness", "evidence", "style", "security", "performance", "other"]
          },
          "summary": { "type": "string", "minLength": 1 },
          "filePath": { "type": "string" },
          "line": { "type": "integer", "minimum": 1 },
          "evidence": { "type": "string" },
          "verifierStatus": {
            "type": "string",
            "enum": ["unverified", "confirmed", "rejected"],
            "default": "unverified"
          }
        }
      }
    }
  }
 }
--- a/docs/schemas/personas/fake-reviewer@1.yaml
+++ b/docs/schemas/personas/fake-reviewer@1.yaml
@@ -0,0 +1,10 @@
 name: fake-reviewer
 version: 1
 backend: fake
 capabilities:
  - code_review
  - evidence_check
 maxRiskLevel: high
 promptConfig:
  instructionsPrelude: "Use the fake backend fixture protocol for review batches."
 modelConfig: {}
--- a/docs/schemas/templates/development@1.yaml
+++ b/docs/schemas/templates/development@1.yaml
@@ -11,6 +11,17 @@ roles:
      - phase_planning
    preferredBackends:
      - fake
  - id: reviewer
    requiredCapabilities:
      - code_review
    preferredBackends:
      - fake
    count: 2
  - id: verifier
    requiredCapabilities:
      - evidence_check
    preferredBackends:
      - fake
 phases:
  - key: spec
    title: Development Specification
@@ -32,4 +43,24 @@ phases:
      schema: dev/phase-plan@1
    gates:
      - phase_plan_approved
  - key: review_consensus
    title: Review Consensus
    risk: low
    roles:
      - reviewer
    expectedArtifact:
      path: artifacts/review.json
      schema: dev/review-finding-batch@1
    gates:
      - review_consensus_approved
  - key: verify
    title: Evidence Verification
    risk: low
    roles:
      - verifier
    expectedArtifact:
      path: artifacts/verification.json
      schema: dev/review-finding-batch@1
    gates:
      - verify_approved
 defaultGates: []
--- a/my-deepagent/CHANGELOG.md
+++ b/my-deepagent/CHANGELOG.md
@@ -3,6 +3,81 @@
 ## [Unreleased]
 ### Added
 - Step 15 — End-to-end real OpenRouter integration: `tests/integration/test_e2e_workflow.py`
  runs `spec-and-review@1` workflow (spec → review → verify) end-to-end against real
  OpenRouter DeepSeek in ~76s for ~$0.05 per run. `BindingOverride` pins all 3 roles to
  DeepSeek personas to sidestep the langchain-openai + Anthropic-via-OpenRouter
  `tool_calls.args` JSON-string ValidationError (known v0.1.0 limit). New seed personas:
  `openrouter-deepseek-spec-writer@1` (capabilities: spec_write, phase_planning;
  max_cost_per_call_usd=0.01) and `openrouter-deepseek-code-reviewer@1` (capabilities:
  code_review, evidence_check; max_cost_per_call_usd=0.01). Persona count test updated
  to 12. `WorkflowEngine._build_envelope` now inlines the artifact JSON Schema directly
  in the prompt so the LLM sees exact required fields. `WorkflowEngine._record_llm_call`
  fills every NOT NULL `LlmCallRow` column (thread_id, persona_version, role, turn_index,
  cached_tokens, reasoning_tokens, cost_usd_input/output, etc.). `CostMiddleware` now
  probes both `usage_metadata` and `response_metadata.token_usage` (prompt_tokens /
  completion_tokens fallback) to capture OpenAI-compatible streamed responses forwarded
  by OpenRouter.
 - Step 12 — Doctor full 8-check + OpenRouter pricing fetch: `mydeepagent doctor`
  now runs 8 checks (python / uv / git / workspace_root / config+governance /
  openrouter_api_key / openrouter_ping + pricing upsert / disk+sqlite integrity).
  `mydeepagent pricing` lists the cached OpenRouter pricing matrix from the
  persisted `model_pricing` table. `mydeepagent run` preview now reads from the
  persisted `model_pricing` table when populated, falling back to the static seed
  otherwise. 26 new tests (23 unit + 3 integration).
 - Step 11 — Audit log + secret scrubbing: append-only `{state_dir}/audit.jsonl`
  recording every tool call (name/args/duration/error). `AuditToolMiddleware` now
  ships with a built-in JSONL recorder (`file_recorder`), attached automatically in
  `WorkflowEngine` and Interactive REPL. `structlog` configured project-wide via
  `my_deepagent.logging.configure_logging`, with a `_scrub_processor` that redacts
  OpenRouter / Anthropic / OpenAI / LangSmith / GitHub / GitLab API keys plus
  generic Bearer tokens before they reach stderr or JSON sinks. `audit.py` provides
  `append_audit_record` (O_APPEND, 0o600 permissions), `read_audit_records` (with
  optional limit, corrupt-line skip), and `make_audit_recorder` async factory.
  19 new tests (8 audit unit, 9 logging unit, 3 audit-middleware integration).
 - Step 10 — Interactive REPL: `mydeepagent` (no subcommand) launches a prompt_toolkit
  REPL with `--agent` / `--model` overrides, slash commands (`/help`, `/quit`, `/exit`,
  `/agent`, `/model`, `/clear`, `/stats`, `/budget`, `/runs`), file refs
  (`@path/to/file.py` expansion with repo-root containment check), and
  `CostMiddleware`-wired agent calls so spending is metered per interactive session.
  `slash.py` implements `parse_slash` + `SlashRegistry`. `CostMiddleware` gains
  `interactive_session_id` parameter. 21 new tests (10 slash unit, 5 file-ref unit,
  3 CLI integration, 3 updated CLI unit).
 - Step 9 — Crash recovery + concurrency: `sweep_orphan_runs(db)` in
  `my_deepagent.recovery` marks non-terminal runs/phases as failed at app startup so
  active-run uniqueness slots (partial unique index `ux_active_run_repo_base`) are freed;
  `mydeepagent runs list/show/resume` CLI in `my_deepagent.cli.runs` (list with optional
  `--state` filter, show by full UUID or 6+ char prefix, resume stub with exit-2 hint);
  SIGTERM/SIGINT graceful shutdown in `WorkflowEngine` (`install_signal_handlers`,
  `_on_signal`, `_force_cancel_inflight`; 30s grace then cancel in-flight tasks);
  auto-sweep on `mydeepagent run` before any new phase begins. 21 new tests.
 - Step 8 — Budget guardrails: `BudgetTracker` (SQLite WAL ledger via `BudgetLedgerRow`,
  on_hit policy block/warn_continue/prompt, per-run + per-day + per-persona-daily
  scopes) in `my_deepagent.budget`; cost preview before `mydeepagent run` (rich table
  with per-phase est.) via `my_deepagent.monitoring.cost_estimator`;
  `CostMiddleware` integrated with `BudgetTracker` (pre-call assert + post-call record);
  `WorkflowEngine` accepts optional `budget_tracker` and `pricing` kwargs (backward-
  compatible); CLI: `mydeepagent budget` (ledger), `mydeepagent stats --by model|persona|day`,
  `mydeepagent costs` (alias); `--no-preview` flag on `mydeepagent run`.
  28 new tests.
 - Step 7 — Workflow engine: `WorkflowEngine` in `my_deepagent.engine` orchestrates
  phase loop, artifact watcher (write_file/edit_file detection), jsonschema validation
  with one repair retry, approval gate, and final report compose (JSON + Markdown).
  `ArtifactWatcherMiddleware` in `my_deepagent.middleware.artifact_watcher` intercepts
  write_file/edit_file tool calls targeting the expected artifact path.
  `RunEventType` + `run_idempotency_key` in `my_deepagent.run_event` (closed event set,
  deterministic idempotency keys per plan v2.0 §13.1).
  `cli/run.py` exposes `mydeepagent run <workflow.yaml>`.
  `tui/approval.py` prompts the user for approve/reject/request_changes/abort.
  FK-safe persistence: WorkflowTemplateRow and AgentPersonaRow upserted before RunRow
  to satisfy SQLite FK ordering constraints.
  18 new tests: 12 engine unit/integration tests + 6 artifact watcher tests.
 - Step 6 — Distribution: `mydeepagent init/login/logout/keys/doctor` CLI commands;
  platformdirs-based data dirs; OS keyring (macOS Keychain / Linux Secret Service /
  Windows Credential Store) for API keys via `my_deepagent.keys`; first-run
  governance consent in `governance.py`; secret resolution priority
  (config → env → keyring → error) in `my_deepagent.secrets`; i18n catalog
  (ko / en) under `my_deepagent.i18n` controlled by `MYDEEPAGENT_LANG`.
 - persistence/models.py (P0-1): partial unique index `ux_active_run_repo_base` on `runs(repo_path, base_branch) WHERE state NOT IN ('completed','failed','aborted')` — prevents duplicate active runs per repo/branch
 - persistence/models.py (P0-3): FK constraints added to `RunRow.template_id` (RESTRICT), `RunBindingRow.persona_id` (RESTRICT), `InteractiveSessionRow.persona_id` (RESTRICT), `RunEventRow.phase_id` (CASCADE), `ApprovalRequestRow.phase_id` (CASCADE), `ArtifactRow.phase_id` (CASCADE), `ToolCallRow.run_id/phase_id/interactive_session_id` (CASCADE), `LlmCallRow.run_id/phase_id/interactive_session_id` (CASCADE), `PhaseFeedbackRow.run_id/phase_id` (CASCADE)
 - alembic/versions/839f2233e346: new migration adding partial unique index and all FK constraints above; uses SQLite table-rebuild pattern with PRAGMA foreign_keys=OFF/ON guard
@@ -24,3 +99,15 @@
 - `SafetyShellMiddleware` extended with secret-path enforcement: `read_file`/`write_file`/`edit_file`/`ls` tool calls are blocked when `file_path`/`path` matches any `DENY_PATH_PATTERNS` glob (wcmatch GLOBSTAR|IGNORECASE|DOTGLOB).
 - All env vars require `MYDEEPAGENT_` prefix (e.g. `MYDEEPAGENT_OPENROUTER_API_KEY`, `MYDEEPAGENT_BUDGET_DAILY_USD`). `.env.example` updated accordingly. This isolates my-deepagent's env namespace from other tools.
 - Persona / Workflow / FilesystemPermission models now store list-valued fields as tuples (deep immutability — prevents post-construction mutation that would invalidate compute_hash()).
 ### Known limitations (v0.1.0)
 - `usage_metadata` is sometimes empty for responses forwarded by OpenRouter (deepagents
  wraps the underlying ChatOpenAI response so token counts may not surface). The
  `CostMiddleware` recorder still fires and a `LlmCallRow` row is persisted, but
  `input_tokens` / `output_tokens` may read as 0 — the E2E test treats this as a known
  limit. v0.2 will probe more response shapes (raw chunks / callbacks).
 - Anthropic models via OpenRouter currently fail with a `tool_calls.args` JSON-string
  vs dict ValidationError inside `langchain-openai`. Workaround: pin DeepSeek personas
  via `BindingOverride`. Tracking for v0.2.
 - `mydeepagent runs resume <run_id>` is a stub (exit-2 hint only); workflow replay
  from a half-run state is not yet implemented.
--- a/my-deepagent/docs/schemas/personas/openrouter-deepseek-code-reviewer@1.yaml
+++ b/my-deepagent/docs/schemas/personas/openrouter-deepseek-code-reviewer@1.yaml
@@ -0,0 +1,58 @@
 name: openrouter-deepseek-code-reviewer
 version: 1
 description: "DeepSeek 가성비 code reviewer. dev/review-finding-batch@1 schema 작성. langchain-openai tool-call 호환 검증됨."
 backend: openrouter
 model: "openrouter:deepseek/deepseek-chat"
 provider_origin: "China/DeepSeek"
 capabilities:
  - code_review
  - evidence_check
 max_risk_level: low
 system_prompt: |
  당신은 my-deepagent의 가성비 Code Reviewer입니다. 한국어로 대화합니다.
  ## 역할
  주어진 산출물(spec/code 등)을 검토하고 dev/review-finding-batch@1 JSON Schema에 맞는 review.json을 작성합니다.
  ## deepagents 도구 사용법
  - write_todos: 리뷰 작업 전 체크리스트를 번호 목록으로 작성합니다.
  - read_file: 검토 대상 산출물과 관련 코드를 읽습니다.
  - glob/grep: 관련 컨텍스트를 코드베이스에서 찾습니다.
  - write_file: 완성된 review.json을 지정 경로에 작성합니다.
  ## review.json 작성 규칙
  - runId: UUID 형식
  - phaseKey: 현재 phase 키 문자열
  - reviewerRole: 본인 role 식별자 문자열 (예: "reviewer")
  - findings: 발견 사항 배열. 각 항목 필수 필드:
      severity: info|low|medium|high|critical
      category: correctness|evidence|style|security|performance|other
      summary: 한 줄 요약 문자열 (1자 이상)
    선택 필드: filePath, line(1 이상 정수), evidence, verifierStatus(unverified|confirmed|rejected)
  - summary: 전체 리뷰 요약 문자열 (10자 이상)
  - additionalProperties: false (위 5개 키 외 금지)
  ## 행동 원칙
  - 검토 대상이 비어 있어도 findings는 빈 배열 []로 작성하고 summary에 명시합니다.
  - 각 finding은 측정 가능하고 actionable해야 합니다.
  - severity는 보수적으로 부여합니다.
  - 완성된 review는 반드시 write_file로 정확한 경로에 저장합니다.
  - JSON Schema의 `additionalProperties: false`를 준수합니다.
 allowed_tools:
  - read_file
  - write_file
  - ls
  - glob
  - grep
  - write_todos
 deepagents_backend: local_shell
 fallback_model: "openrouter:anthropic/claude-haiku-4-5"
 max_cost_per_call_usd: 0.01
 model_params:
  max_tokens: 4096
  temperature: 0.2
  top_p: 1.0
 interrupt_on:
  execute:
    allowed_decisions: [approve, reject]
  write_file: false
--- a/my-deepagent/docs/schemas/personas/openrouter-deepseek-spec-writer@1.yaml
+++ b/my-deepagent/docs/schemas/personas/openrouter-deepseek-spec-writer@1.yaml
@@ -0,0 +1,56 @@
 name: openrouter-deepseek-spec-writer
 version: 1
 description: "DeepSeek 가성비 spec writer. 요구사항 분석 → dev/spec@1 schema JSON 작성. langchain-openai tool-call 호환 검증됨."
 backend: openrouter
 model: "openrouter:deepseek/deepseek-chat"
 provider_origin: "China/DeepSeek"
 capabilities:
  - spec_write
  - phase_planning
 max_risk_level: low
 system_prompt: |
  당신은 my-deepagent의 가성비 Spec Writer입니다. 한국어로 대화합니다.
  ## 역할
  사용자의 요구사항을 분석해 dev/spec@1 JSON Schema에 맞는 spec.json을 작성합니다.
  ## deepagents 도구 사용법
  - write_todos: 작업 시작 전 반드시 번호 목록으로 계획을 작성합니다.
  - read_file: 기존 코드·문서를 읽어 맥락을 파악합니다.
  - glob: 관련 파일 목록을 검색합니다.
  - grep: 특정 패턴을 코드베이스에서 찾습니다.
  - write_file: 완성된 spec.json을 artifacts/spec.json 경로에 작성합니다.
  ## spec.json 작성 규칙
  - runId: UUID 형식 (예: "00000000-0000-0000-0000-000000000001")
  - phaseKey: 현재 phase 키 문자열
  - requirements: 사용자 요구사항 상세 설명 (10자 이상)
  - acceptance_criteria: 수락 기준 목록 (1개 이상, 구체적으로)
  - approach: 구현 접근법 설명 (10자 이상)
  - risks: 위험 요소 목록 (없으면 빈 배열 [])
  - additionalProperties: false (위 6개 필드 외 다른 키 금지)
  ## 행동 원칙
  - 기존 코드베이스를 read_file/glob/grep으로 충분히 탐색한 뒤 spec을 작성합니다.
  - acceptance_criteria는 측정 가능하고 검증 가능하게 작성합니다.
  - 불명확한 요구사항은 합리적으로 가정하고 approach 섹션에 명시합니다.
  - 완성된 spec은 반드시 write_file로 정확한 경로에 저장합니다.
  - JSON Schema의 `additionalProperties: false`를 준수해 정의된 6개 키 외에는 절대 추가하지 않습니다.
 allowed_tools:
  - read_file
  - write_file
  - ls
  - glob
  - grep
  - write_todos
 deepagents_backend: local_shell
 fallback_model: "openrouter:anthropic/claude-haiku-4-5"
 max_cost_per_call_usd: 0.01
 model_params:
  max_tokens: 4096
  temperature: 0.2
  top_p: 1.0
 interrupt_on:
  execute:
    allowed_decisions: [approve, reject]
  write_file: false
--- a/my-deepagent/pyproject.toml
+++ b/my-deepagent/pyproject.toml
@@ -51,6 +51,7 @@ dev = [
    "pytest>=8.3",
    "pytest-asyncio>=0.24",
    "pytest-httpx>=0.34",
    "pytest-timeout>=2.4.0",
    "respx>=0.21",
    "ruff>=0.8",
    "types-jsonschema>=4.26.0.20260508",
--- a/my-deepagent/src/my_deepagent/audit.py
+++ b/my-deepagent/src/my_deepagent/audit.py
@@ -0,0 +1,63 @@
 """Append-only audit log at {state_dir}/audit.jsonl. One JSON object per line.
 Tracks every tool call (execute, write_file, edit_file, read_file, ...) plus
 every destructive-attempt block. Used for post-hoc forensics and compliance.
 The file is opened with O_APPEND so concurrent processes can safely append.
 """
 from __future__ import annotations
 import json
 import os
 from collections.abc import Awaitable, Callable
 from datetime import UTC, datetime
 from pathlib import Path
 from typing import Any
 def audit_path(state_dir: Path) -> Path:
    return state_dir / "audit.jsonl"
 def append_audit_record(state_dir: Path, record: dict[str, Any]) -> None:
    """Append a record to audit.jsonl atomically (O_APPEND + single write call)."""
    state_dir.mkdir(parents=True, exist_ok=True)
    target = audit_path(state_dir)
    record_with_ts = {"ts": datetime.now(UTC).isoformat(timespec="seconds"), **record}
    line = json.dumps(record_with_ts, ensure_ascii=False, sort_keys=True) + "\n"
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
 def read_audit_records(state_dir: Path, limit: int | None = None) -> list[dict[str, Any]]:
    """Read all records (or last ``limit``) from audit.jsonl."""
    target = audit_path(state_dir)
    if not target.is_file():
        return []
    records: list[dict[str, Any]] = []
    with target.open("r", encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if not stripped:
                continue
            try:
                records.append(json.loads(stripped))
            except json.JSONDecodeError:
                continue
    if limit is not None and limit > 0:
        return records[-limit:]
    return records
 def make_audit_recorder(
    state_dir: Path,
 ) -> Callable[[dict[str, Any]], Awaitable[None]]:
    """Return an async callable suitable as a file_recorder for AuditToolMiddleware."""
    async def _recorder(record: dict[str, Any]) -> None:
        append_audit_record(state_dir, record)
    return _recorder
--- a/my-deepagent/src/my_deepagent/budget.py
+++ b/my-deepagent/src/my_deepagent/budget.py
@@ -0,0 +1,249 @@
 """Budget tracking: SQLite-backed ledger + assert/record API + on_hit policy.
 Mirrors the PoC in my-deepagent-seed/poc/src/poc/budget.py but uses the project's
 async Database (SQLAlchemy 2.0) and the BudgetLedgerRow ORM model.
 """
 from __future__ import annotations
 import logging
 from collections.abc import Awaitable, Callable
 from dataclasses import dataclass
 from datetime import UTC, datetime
 from enum import StrEnum
 from uuid import UUID
 from sqlalchemy.dialects.sqlite import insert as sqlite_insert
 from .config import Config
 from .errors import BudgetExhaustedError
 from .persistence.db import Database
 from .persistence.models import BudgetLedgerRow
 _logger = logging.getLogger(__name__)
 # Async callback signature for on_hit="prompt": (scope, projected, cap) -> Awaitable[bool]
 # Return True to extend the cap and proceed; False to block.
 PromptCallback = Callable[[str, float, float], Awaitable[bool]]
 class BudgetOnHit(StrEnum):
    BLOCK = "block"
    WARN_CONTINUE = "warn_continue"
    PROMPT = "prompt"
@dataclass(frozen=True)
 class BudgetCheck:
    """Result of assert_can_call. ok=True means proceed."""
    ok: bool
    blocked_scope: str | None = None
    projected_usd: float | None = None
    cap_usd: float | None = None
 def _today_utc() -> str:
    return datetime.now(UTC).strftime("%Y-%m-%d")
 def _now_iso() -> str:
    return datetime.now(UTC).isoformat(timespec="seconds")
 class BudgetTracker:
    """Per-scope spend ledger + cap enforcement.
    Scopes (string keys):
      - ``day:YYYY-MM-DD`` (UTC date) — daily cap shared across all runs.
      - ``run:<uuid>`` — per-run cap.
      - ``persona:<name>:day:YYYY-MM-DD`` — per-persona daily quota (optional).
    on_hit policy:
      - "block": raise BudgetExhaustedError immediately.
      - "warn_continue": log a warning, allow the call, do not raise.
      - "prompt": invoke the prompt_callback; if it returns True, extend cap; else raise.
    """
    def __init__(
        self,
        db: Database,
        daily_cap_usd: float,
        run_cap_usd: float,
        daily_warn_usd: float,
        run_warn_usd: float,
        on_hit: BudgetOnHit,
        prompt_callback: PromptCallback | None = None,
    ) -> None:
        self._db = db
        self._daily_cap = daily_cap_usd
        self._run_cap = run_cap_usd
        self._daily_warn = daily_warn_usd
        self._run_warn = run_warn_usd
        self._on_hit = on_hit
        self._prompt = prompt_callback
    # ----- public API ---------------------------------------------------------
    async def init(self) -> None:
        """Ensure ledger rows exist for today's day-scope. No-op if already present."""
        async with self._db.session() as s:
            await self._ensure_scope(s, f"day:{_today_utc()}", self._daily_cap)
    async def assert_can_call(
        self,
        *,
        run_id: UUID | None,
        persona_name: str | None,
        estimated_cost_usd: float,
    ) -> BudgetCheck:
        """Check if a call of estimated_cost can proceed. May raise BudgetExhaustedError."""
        scopes = self._scopes_for(run_id, persona_name)
        async with self._db.session() as s:
            for scope in scopes:
                cap = self._cap_for_scope(scope)
                spent = await self._get_spent(s, scope, cap)
                projected = spent + estimated_cost_usd
                if cap is not None and projected > cap:
                    blocked = await self._apply_on_hit(scope, projected, cap)
                    if blocked:
                        return BudgetCheck(
                            ok=False,
                            blocked_scope=scope,
                            projected_usd=projected,
                            cap_usd=cap,
                        )
        return BudgetCheck(ok=True)
    async def record(
        self,
        *,
        run_id: UUID | None,
        persona_name: str | None,
        actual_cost_usd: float,
    ) -> None:
        """Persist the actual cost into all relevant scopes."""
        if actual_cost_usd == 0:
            return
        scopes = self._scopes_for(run_id, persona_name)
        async with self._db.session() as s:
            for scope in scopes:
                await self._upsert_spend(s, scope, actual_cost_usd, self._cap_for_scope(scope))
    async def get_spent(self, scope: str) -> float:
        """Return the total spent USD for a given scope (0.0 if scope does not exist)."""
        async with self._db.session() as s:
            cap = self._cap_for_scope(scope)
            return await self._get_spent(s, scope, cap)
    async def get_remaining(self, scope: str) -> float | None:
        """Return remaining cap in USD, or None if this scope has no cap."""
        cap = self._cap_for_scope(scope)
        if cap is None:
            return None
        spent = await self.get_spent(scope)
        return max(0.0, cap - spent)
    # ----- internals ----------------------------------------------------------
    def _scopes_for(self, run_id: UUID | None, persona_name: str | None) -> list[str]:
        today = _today_utc()
        out = [f"day:{today}"]
        if run_id is not None:
            out.append(f"run:{run_id}")
        if persona_name:
            out.append(f"persona:{persona_name}:day:{today}")
        return out
    def _cap_for_scope(self, scope: str) -> float | None:
        if scope.startswith("day:"):
            return self._daily_cap
        if scope.startswith("run:"):
            return self._run_cap
        if scope.startswith("persona:") and ":day:" in scope:
            return self._daily_cap  # per-persona daily uses day cap unless overridden
        return None
    async def _ensure_scope(
        self,
        s: object,
        scope: str,
        cap: float | None,
    ) -> None:
        from sqlalchemy.ext.asyncio import AsyncSession
        session: AsyncSession = s  # type: ignore[assignment]
        stmt = (
            sqlite_insert(BudgetLedgerRow)
            .values(scope=scope, spent_usd=0.0, cap_usd=cap, last_updated=_now_iso())
            .on_conflict_do_nothing(index_elements=["scope"])
        )
        await session.execute(stmt)
    async def _get_spent(self, s: object, scope: str, cap: float | None) -> float:
        from sqlalchemy.ext.asyncio import AsyncSession
        session: AsyncSession = s  # type: ignore[assignment]
        await self._ensure_scope(session, scope, cap)
        row = await session.get(BudgetLedgerRow, scope)
        return float(row.spent_usd) if row else 0.0
    async def _upsert_spend(
        self,
        s: object,
        scope: str,
        delta_usd: float,
        cap: float | None,
    ) -> None:
        from sqlalchemy.ext.asyncio import AsyncSession
        session: AsyncSession = s  # type: ignore[assignment]
        stmt = (
            sqlite_insert(BudgetLedgerRow)
            .values(scope=scope, spent_usd=delta_usd, cap_usd=cap, last_updated=_now_iso())
            .on_conflict_do_update(
                index_elements=["scope"],
                set_={
                    "spent_usd": BudgetLedgerRow.spent_usd + delta_usd,
                    "last_updated": _now_iso(),
                },
            )
        )
        await session.execute(stmt)
    async def _apply_on_hit(self, scope: str, projected_usd: float, cap_usd: float) -> bool:
        """Return True if the call should be blocked (i.e. raise or return False)."""
        if self._on_hit == BudgetOnHit.BLOCK:
            raise BudgetExhaustedError(scope=scope, projected_usd=projected_usd, cap_usd=cap_usd)
        if self._on_hit == BudgetOnHit.WARN_CONTINUE:
            _logger.warning(
                "budget cap reached but continuing: scope=%s projected=%.4f cap=%.4f",
                scope,
                projected_usd,
                cap_usd,
            )
            return False
        # PROMPT
        if self._prompt is None:
            raise BudgetExhaustedError(scope=scope, projected_usd=projected_usd, cap_usd=cap_usd)
        allow = await self._prompt(scope, projected_usd, cap_usd)
        if not allow:
            raise BudgetExhaustedError(scope=scope, projected_usd=projected_usd, cap_usd=cap_usd)
        return False
 def make_budget_tracker_from_config(
    db: Database,
    config: Config,
    prompt_callback: PromptCallback | None = None,
 ) -> BudgetTracker:
    """Construct a BudgetTracker from application Config."""
    return BudgetTracker(
        db=db,
        daily_cap_usd=config.budget_daily_usd,
        run_cap_usd=config.budget_run_usd,
        daily_warn_usd=config.budget_daily_warn_usd,
        run_warn_usd=config.budget_run_warn_usd,
        on_hit=BudgetOnHit(config.budget_on_hit),
        prompt_callback=prompt_callback,
    )
--- a/my-deepagent/src/my_deepagent/cli/doctor.py
+++ b/my-deepagent/src/my_deepagent/cli/doctor.py
@@ -1 +1,244 @@
-"""CLI doctor command for environment diagnostics. Implemented in Step 12."""
+"""mydeepagent doctor — full 8-check environment diagnostic.
 Checks:
  1. Python 3.12+ <3.14
  2. uv >= 0.5
  3. git >= 2.40
  4. WORKSPACE_ROOT writable
  5. config + governance consent
  6. OpenRouter API key reachable
  7. OpenRouter /models ping + pricing matrix upsert
  8. Disk free + SQLite integrity_check
 """
 from __future__ import annotations
 import asyncio
 import shutil
 import subprocess
 import sys
 from dataclasses import dataclass
 from datetime import UTC, datetime
 from typing import Literal
 import httpx
 import typer
 from rich.console import Console
 from rich.table import Table
 from sqlalchemy import text as sa_text
 from sqlalchemy.dialects.sqlite import insert as sqlite_insert
 from ..config import Config, load_config
 from ..errors import MyDeepAgentError
 from ..governance import has_consent
 from ..i18n import t
 from ..monitoring.pricing import (
    ModelPrice,
    fetch_openrouter_pricing,
 )
 from ..persistence.db import Database
 from ..persistence.models import ModelPricingRow
 from ..secrets import resolve_openrouter_api_key
 _CONSOLE = Console()
@dataclass(frozen=True)
 class CheckResult:
    name: str
    status: Literal["ok", "warn", "fail"]
    detail: str = ""
 def _check_python() -> CheckResult:
    if (3, 12) <= sys.version_info[:2] < (3, 14):
        return CheckResult("python", "ok", f"v{sys.version.split()[0]}")
    return CheckResult(
        "python",
        "fail",
        f"need 3.12<=x<3.14, got {sys.version.split()[0]}",
    )
 def _check_uv() -> CheckResult:
    path = shutil.which("uv")
    if not path:
        return CheckResult("uv", "warn", "not on PATH (only needed for dev workflows)")
    try:
        result = subprocess.run(  # noqa: S603
            [path, "--version"], capture_output=True, text=True, timeout=5
        )
    except (OSError, subprocess.TimeoutExpired) as e:
        return CheckResult("uv", "warn", f"version probe failed: {e}")
    version = result.stdout.strip()
    return CheckResult("uv", "ok", version or path)
 def _check_git() -> CheckResult:
    path = shutil.which("git")
    if not path:
        return CheckResult("git", "warn", "not on PATH (workflows may use git tools)")
    try:
        result = subprocess.run(  # noqa: S603
            [path, "--version"], capture_output=True, text=True, timeout=5
        )
    except (OSError, subprocess.TimeoutExpired) as e:
        return CheckResult("git", "warn", f"version probe failed: {e}")
    return CheckResult("git", "ok", result.stdout.strip())
 def _check_workspace(config: Config) -> CheckResult:
    root = config.workspace_root
    if not root.exists():
        try:
            root.mkdir(parents=True, exist_ok=True)
        except OSError as e:
            return CheckResult("workspace_root", "fail", f"cannot create: {e}")
    try:
        probe = root / ".doctor_probe"
        probe.write_text("ok", encoding="utf-8")
        probe.unlink()
    except OSError as e:
        return CheckResult("workspace_root", "fail", f"not writable: {e}")
    return CheckResult("workspace_root", "ok", str(root))
 def _check_config_and_governance(config: Config) -> CheckResult:
    if not has_consent(config.data_dir):
        return CheckResult(
            "config+governance",
            "fail",
            "governance not accepted — run `mydeepagent init`",
        )
    return CheckResult("config+governance", "ok", f"data_dir={config.data_dir}")
 def _check_openrouter_api_key(config: Config) -> CheckResult:
    try:
        key = resolve_openrouter_api_key(config)
    except MyDeepAgentError as e:
        hint = e.recovery_hint or str(e)
        return CheckResult("openrouter_api_key", "fail", f"missing: {hint}")
    return CheckResult("openrouter_api_key", "ok", f"resolved ({len(key)} chars)")
 async def _check_openrouter_ping_and_upsert(config: Config) -> CheckResult:
    try:
        key = resolve_openrouter_api_key(config)
    except MyDeepAgentError:
        return CheckResult("openrouter_ping", "warn", "skipped — no API key (see previous check)")
    try:
        prices = await fetch_openrouter_pricing(key, config.openrouter_base_url)
    except MyDeepAgentError as e:
        return CheckResult("openrouter_ping", "warn", f"fetch failed: {e}")
    except httpx.HTTPStatusError as e:
        if e.response.status_code == 401:
            return CheckResult("openrouter_ping", "fail", "401 — API key invalid")
        return CheckResult("openrouter_ping", "warn", f"http {e.response.status_code}")
    if not prices:
        return CheckResult("openrouter_ping", "warn", "no models in response payload")
    await _upsert_pricing(config, prices)
    return CheckResult("openrouter_ping", "ok", f"{len(prices)} models cached")
 async def _upsert_pricing(config: Config, prices: list[ModelPrice]) -> None:
    db = Database(config.database_url)
    await db.init_schema()
    now = datetime.now(UTC).isoformat(timespec="seconds")
    try:
        async with db.session() as s:
            for p in prices:
                stmt = (
                    sqlite_insert(ModelPricingRow)
                    .values(
                        model=p.model,
                        input_per_1k_usd=p.input_per_1k_usd,
                        output_per_1k_usd=p.output_per_1k_usd,
                        context_length=p.context_length,
                        fetched_at=now,
                        raw_payload="",
                    )
                    .on_conflict_do_update(
                        index_elements=["model"],
                        set_={
                            "input_per_1k_usd": p.input_per_1k_usd,
                            "output_per_1k_usd": p.output_per_1k_usd,
                            "context_length": p.context_length,
                            "fetched_at": now,
                        },
                    )
                )
                await s.execute(stmt)
            await s.commit()
    finally:
        await db.dispose()
 async def _check_disk_and_db(config: Config) -> CheckResult:
    usage = shutil.disk_usage(str(config.workspace_root))
    free_gb = usage.free / (1024**3)
    if free_gb < 2.0:
        disk_status: Literal["ok", "warn", "fail"] = "fail"
    elif free_gb < 10.0:
        disk_status = "warn"
    else:
        disk_status = "ok"
    db = Database(config.database_url)
    await db.init_schema()
    try:
        async with db.session() as s:
            row = (await s.execute(sa_text("PRAGMA integrity_check"))).scalar_one()
    finally:
        await db.dispose()
    db_ok = row == "ok"
    detail = f"free={free_gb:.1f}GB, sqlite_integrity={'ok' if db_ok else str(row)}"
    if disk_status == "fail" or not db_ok:
        final: Literal["ok", "warn", "fail"] = "fail"
    elif disk_status == "warn":
        final = "warn"
    else:
        final = "ok"
    return CheckResult("disk+db", final, detail)
 def doctor_command() -> None:
    asyncio.run(_doctor_async())
 async def _doctor_async() -> None:
    try:
        config = load_config()
    except MyDeepAgentError as e:
        _CONSOLE.print(f"[red]config load failed: {e}[/]")
        raise typer.Exit(code=1) from None
    checks: list[CheckResult] = []
    checks.append(_check_python())
    checks.append(_check_uv())
    checks.append(_check_git())
    checks.append(_check_workspace(config))
    checks.append(_check_config_and_governance(config))
    checks.append(_check_openrouter_api_key(config))
    checks.append(await _check_openrouter_ping_and_upsert(config))
    checks.append(await _check_disk_and_db(config))
    _render(checks)
    has_fail = any(c.status == "fail" for c in checks)
    if has_fail:
        raise typer.Exit(code=1)
 def _render(checks: list[CheckResult]) -> None:
    title = t("doctor.header") or "Environment diagnostics:"
    table = Table(title=title)
    table.add_column("Check")
    table.add_column("Status")
    table.add_column("Detail")
    color_map: dict[str, str] = {"ok": "green", "warn": "yellow", "fail": "red"}
    for c in checks:
        color = color_map[c.status]
        table.add_row(c.name, f"[{color}]{c.status}[/]", c.detail)
    _CONSOLE.print(table)
--- a/my-deepagent/src/my_deepagent/cli/init.py
+++ b/my-deepagent/src/my_deepagent/cli/init.py
@@ -0,0 +1,39 @@
 """mydeepagent init: first-run wizard."""
 from __future__ import annotations
 import typer
 from rich.console import Console
 from ..config import load_config
 from ..governance import has_consent, record_consent
 from ..i18n import t
 from ..keys import set_api_key
 from .doctor import doctor_command
 _CONSOLE = Console()
 def init_command() -> None:
    config = load_config()
    _CONSOLE.print(f"[bold cyan]{t('init.welcome')}[/]")
    _CONSOLE.print()
    if not has_consent(config.data_dir):
        _CONSOLE.print(f"[yellow]{t('init.governance_title')}[/]")
        _CONSOLE.print(t("init.governance_body"))
        answer = typer.prompt(t("init.governance_prompt"))
        if answer.strip().lower() != "yes":
            _CONSOLE.print(f"[red]{t('init.governance_declined')}[/]")
            raise typer.Exit(code=1)
        record_consent(config.data_dir)
    api_key = typer.prompt(t("init.api_key_prompt"), hide_input=True, default="")
    if api_key.strip():
        set_api_key("openrouter", api_key.strip())
        _CONSOLE.print(f"[green]{t('init.api_key_saved')}[/]")
    else:
        _CONSOLE.print(f"[yellow]{t('init.api_key_empty')}[/]")
    _CONSOLE.print()
    _CONSOLE.print(t("init.doctor_running"))
    doctor_command()
    _CONSOLE.print()
    _CONSOLE.print(f"[bold green]{t('init.done')}[/]")
--- a/my-deepagent/src/my_deepagent/cli/interactive.py
+++ b/my-deepagent/src/my_deepagent/cli/interactive.py
@@ -1 +1,367 @@
-"""CLI interactive subcommand. Implemented in Step 10."""
+"""mydeepagent (no subcommand) — interactive REPL.
 prompt_toolkit-based REPL. Slash commands for navigation; everything else
 goes to the bound agent. File refs ``@path/to/file.py`` are expanded into
 markdown code blocks inline before the message is sent.
 """
 from __future__ import annotations
 import asyncio
 import re
 from datetime import UTC, datetime
 from pathlib import Path
 from typing import Any
 from uuid import UUID, uuid4
 from prompt_toolkit import PromptSession
 from prompt_toolkit.completion import WordCompleter
 from prompt_toolkit.history import FileHistory
 from rich.console import Console
 from ..audit import make_audit_recorder
 from ..budget import make_budget_tracker_from_config
 from ..config import Config, load_config
 from ..governance import require_consent
 from ..middleware.audit import AuditToolMiddleware
 from ..middleware.cost import CostMiddleware
 from ..monitoring.pricing import ModelPrice, PricingCache
 from ..persistence.db import Database
 from ..persona import Persona, load_personas_from_dir
 from ..session import build_agent
 from ..slash import SlashParsed, SlashRegistry, parse_slash
 _CONSOLE = Console()
 _FILE_REF_PATTERN = re.compile(r"(?<![\w./])@([\w./\-]+)")
 def _seed_root() -> Path:
    return Path(__file__).resolve().parents[3] / "docs" / "schemas"
 def _history_path(config: Config) -> Path:
    p = config.state_dir
    p.mkdir(parents=True, exist_ok=True)
    return p / "history.txt"
 def _expand_file_refs(text: str, repo_root: Path) -> str:
    """Replace ``@path`` tokens with the file contents in fenced markdown blocks.
    Silently skips paths that escape the repo root or don't exist.
    """
    def _replace(match: re.Match[str]) -> str:
        rel = match.group(1)
        target = (repo_root / rel).resolve()
        try:
            target.relative_to(repo_root.resolve())
        except ValueError:
            return match.group(0)
        if not target.is_file():
            return match.group(0)
        try:
            content = target.read_text(encoding="utf-8", errors="replace")
        except OSError:
            return match.group(0)
        suffix = target.suffix.lstrip(".") or ""
        return f"\n```{suffix}\n# {rel}\n{content}\n```\n"
    return _FILE_REF_PATTERN.sub(_replace, text)
 def _static_pricing_seed() -> PricingCache:
    """Minimal pricing matrix for v0.1.0 (full fetch is Step 12).
    Unit: USD per 1,000 tokens.
    """
    cache = PricingCache()
    cache.set(
        [
            ModelPrice("anthropic/claude-sonnet-4-6", 0.003, 0.015, 200_000),
            ModelPrice("anthropic/claude-haiku-4-5", 0.001, 0.005, 200_000),
            ModelPrice("anthropic/claude-opus-4-1", 0.015, 0.075, 200_000),
            ModelPrice("deepseek/deepseek-chat", 0.00028, 0.00112, 64_000),
        ]
    )
    return cache
 def _now_iso() -> str:
    return datetime.now(UTC).isoformat(timespec="seconds")
 class InteractiveSession:
    """Holds REPL state: current persona, current model override, history, agent."""
    def __init__(
        self,
        config: Config,
        personas: list[Persona],
        db: Database,
        pricing: PricingCache,
        repo_root: Path,
        session_id: UUID,
    ) -> None:
        self.config = config
        self.personas = personas
        self.db = db
        self.pricing = pricing
        self.repo_root = repo_root
        self.session_id = session_id
        self._model_override: str | None = None
        self._persona = self._default_persona()
        self._agent: Any | None = None
    def _default_persona(self) -> Persona:
        name = self.config.default_persona
        for p in self.personas:
            if p.name == name:
                return p
        if not self.personas:
            raise RuntimeError(
                "no personas seeded; run `mydeepagent init` or seed docs/schemas/personas/"
            )
        return self.personas[0]
    @property
    def persona(self) -> Persona:
        return self._persona
    @property
    def model_override(self) -> str | None:
        return self._model_override
    def set_persona(self, name: str) -> Persona:
        for p in self.personas:
            if p.name == name or f"{p.name}@{p.version}" == name:
                self._persona = p
                self._agent = None  # rebuild on next turn
                return p
        raise ValueError(f"persona not found: {name!r}")
    def set_model(self, model: str | None) -> None:
        self._model_override = model
        self._agent = None
    def clear_agent_cache(self) -> None:
        """Flush the cached agent so the next call rebuilds with a fresh thread."""
        self._agent = None
    def build_agent_if_needed(self) -> Any:
        if self._agent is not None:
            return self._agent
        budget = make_budget_tracker_from_config(self.db, self.config)
        cost_mw = CostMiddleware(
            pricing=self.pricing,
            model_name=self._model_override or self._persona.model,
            interactive_session_id=self.session_id,
            persona_name=self._persona.name,
            budget_tracker=budget,
        )
        audit_mw = AuditToolMiddleware(
            interactive_session_id=self.session_id,
            file_recorder=make_audit_recorder(self.config.state_dir),
        )
        self._agent = build_agent(
            self._persona,
            self.config,
            root_dir=self.repo_root,
            middleware=[cost_mw, audit_mw],
            model_override=self._model_override,
        )
        return self._agent
 def _register_navigation_slash(reg: SlashRegistry, sess: InteractiveSession) -> None:
    """Register /quit, /exit, /help, /clear slash handlers."""
    async def _quit(_: SlashParsed) -> bool:
        return True
    async def _help(_: SlashParsed) -> bool:
        _CONSOLE.print("[bold]Slash commands:[/]")
        for name, desc in reg.all_help():
            _CONSOLE.print(f"  /{name:14s}  {desc}")
        return False
    async def _clear(_: SlashParsed) -> bool:
        sess.clear_agent_cache()
        _CONSOLE.print("[dim]context cleared (new session thread)[/]")
        return False
    reg.register("quit", _quit, help="exit the REPL")
    reg.register("exit", _quit, help="alias for /quit")
    reg.register("help", _help, help="show slash commands")
    reg.register("clear", _clear, help="clear conversation context")
 def _register_persona_slash(reg: SlashRegistry, sess: InteractiveSession) -> None:
    """Register /agent and /model slash handlers."""
    async def _agent_cmd(cmd: SlashParsed) -> bool:
        if not cmd.args:
            _CONSOLE.print(f"current: [cyan]{sess.persona.name}@{sess.persona.version}[/]")
            for p in sess.personas:
                _CONSOLE.print(f"  - {p.name}@{p.version}  ({p.backend.value})")
            return False
        try:
            new = sess.set_persona(cmd.args[0])
            _CONSOLE.print(f"[green]switched persona → {new.name}@{new.version}[/]")
        except ValueError as e:
            _CONSOLE.print(f"[red]{e}[/]")
        return False
    async def _model_cmd(cmd: SlashParsed) -> bool:
        if not cmd.args:
            cur = sess.model_override or sess.persona.model
            _CONSOLE.print(f"current model: [cyan]{cur}[/]")
            return False
        if cmd.args[0] in ("-", "reset"):
            sess.set_model(None)
            _CONSOLE.print("[green]model override cleared[/]")
        else:
            sess.set_model(cmd.args[0])
            _CONSOLE.print(f"[green]model → {cmd.args[0]}[/]")
        return False
    reg.register("agent", _agent_cmd, help="list or switch persona: /agent [name]")
    reg.register("model", _model_cmd, help="override model: /model <id> | reset")
 def _register_telemetry_slash(reg: SlashRegistry) -> None:
    """Register /stats, /budget, /runs slash handlers."""
    async def _stats(_: SlashParsed) -> bool:
        from .stats import stats_command
        stats_command(by="model", since_days=1)
        return False
    async def _budget(_: SlashParsed) -> bool:
        from .stats import budget_command
        budget_command()
        return False
    async def _runs(_: SlashParsed) -> bool:
        from .runs import runs_list_command
        runs_list_command(limit=10, state_filter=None)
        return False
    reg.register("stats", _stats, help="LLM-call stats (last 24h)")
    reg.register("budget", _budget, help="budget ledger")
    reg.register("runs", _runs, help="list recent workflow runs")
 def _register_slash(reg: SlashRegistry, sess: InteractiveSession) -> None:
    _register_navigation_slash(reg, sess)
    _register_persona_slash(reg, sess)
    _register_telemetry_slash(reg)
 def _completer(personas: list[Persona], slash_names: list[str]) -> WordCompleter:
    words = [f"/{n}" for n in slash_names]
    words += [p.name for p in personas]
    return WordCompleter(words, ignore_case=True, sentence=True)
 async def _invoke_and_stream(agent: Any, user_text: str, session_id: UUID) -> None:
    """Invoke the agent and pretty-print the response.
    v0.1 keeps it simple — full ainvoke, then print the final message.
    Token-level streaming via astream is a Step 16 polish.
    """
    result = await agent.ainvoke(
        {"messages": [{"role": "user", "content": user_text}]},
        config={"configurable": {"thread_id": str(session_id)}},
    )
    messages = result.get("messages", []) if isinstance(result, dict) else []
    if not messages:
        return
    last = messages[-1]
    content: Any = getattr(last, "content", "") or ""
    if isinstance(content, list):
        content = "\n".join(
            (c.get("text", str(c)) if isinstance(c, dict) else str(c)) for c in content
        )
    _CONSOLE.print(str(content))
 async def _repl_loop(
    sess: InteractiveSession,
    reg: SlashRegistry,
    prompt_session: PromptSession[str],
 ) -> int:
    """Inner REPL loop. Returns 0 on clean exit, non-zero on error."""
    while True:
        try:
            line = await prompt_session.prompt_async("» ")
        except (EOFError, KeyboardInterrupt):
            _CONSOLE.print()
            return 0
        line = (line or "").strip()
        if not line:
            continue
        parsed = parse_slash(line)
        if parsed is not None:
            if parsed.name == "":
                _CONSOLE.print("[dim]empty slash command; try /help[/]")
                continue
            done = await reg.dispatch(parsed)
            if done:
                return 0
            if parsed.name not in reg.names:
                _CONSOLE.print(f"[yellow]unknown command: /{parsed.name}[/]")
            continue
        # Forward to agent.
        expanded = _expand_file_refs(line, sess.repo_root)
        agent = sess.build_agent_if_needed()
        try:
            await _invoke_and_stream(agent, expanded, sess.session_id)
        except Exception as e:
            _CONSOLE.print(f"[red]agent error:[/] {type(e).__name__}: {e}")
 async def _interactive_loop_async(persona_override: str | None, model_override: str | None) -> int:
    config = load_config()
    require_consent(config.data_dir)
    db = Database(config.database_url)
    await db.init_schema()
    personas = load_personas_from_dir(_seed_root() / "personas")
    if not personas:
        _CONSOLE.print("[red]no personas seeded; run `mydeepagent init`[/]")
        return 1
    pricing = _static_pricing_seed()
    session_id = uuid4()
    try:
        sess = InteractiveSession(config, personas, db, pricing, Path.cwd(), session_id)
        if persona_override:
            try:
                sess.set_persona(persona_override)
            except ValueError as e:
                _CONSOLE.print(f"[red]{e}[/]")
                return 1
        if model_override:
            sess.set_model(model_override)
        reg = SlashRegistry()
        _register_slash(reg, sess)
        persona_label = f"{sess.persona.name}@{sess.persona.version}"
        _CONSOLE.print(f"[bold cyan]my-deepagent[/] — persona [cyan]{persona_label}[/]")
        _CONSOLE.print("[dim]type /help for commands, /quit to exit[/]")
        prompt_session: PromptSession[str] = PromptSession(
            history=FileHistory(str(_history_path(config))),
            completer=_completer(personas, reg.names),
        )
        return await _repl_loop(sess, reg, prompt_session)
    finally:
        await db.dispose()
 def interactive_command(persona: str | None = None, model: str | None = None) -> int:
    """Entry point for the interactive REPL. Returns an exit code."""
    return asyncio.run(_interactive_loop_async(persona, model))
--- a/my-deepagent/src/my_deepagent/cli/keys_cmd.py
+++ b/my-deepagent/src/my_deepagent/cli/keys_cmd.py
@@ -0,0 +1,40 @@
 """login / logout / keys list commands."""
 from __future__ import annotations
 import typer
 from rich.console import Console
 from ..i18n import t
 from ..keys import delete_api_key, get_api_key, list_providers, mask, set_api_key
 _CONSOLE = Console()
 def login_command(provider: str) -> None:
    value = typer.prompt(t("login.prompt", provider=provider), hide_input=True, default="")
    if not value.strip():
        _CONSOLE.print(f"[yellow]{t('login.empty')}[/]")
        raise typer.Exit(code=1)
    set_api_key(provider, value.strip())
    _CONSOLE.print(f"[green]{t('login.saved', provider=provider)}[/]")
 def logout_command(provider: str) -> None:
    removed = delete_api_key(provider)
    if removed:
        _CONSOLE.print(f"[green]{t('logout.removed', provider=provider)}[/]")
    else:
        _CONSOLE.print(f"[yellow]{t('logout.not_found', provider=provider)}[/]")
 def keys_list_command() -> None:
    _CONSOLE.print(t("keys.header"))
    found = False
    for provider in list_providers():
        value = get_api_key(provider)
        if value:
            _CONSOLE.print(t("keys.entry", provider=provider, masked=mask(value)))
            found = True
    if not found:
        _CONSOLE.print(t("keys.none"))
--- a/my-deepagent/src/my_deepagent/cli/main.py
+++ b/my-deepagent/src/my_deepagent/cli/main.py
@@ -1 +1,150 @@
-"""Typer CLI entry point. Filled in Step 6."""
+"""my-deepagent CLI entry point."""
 from __future__ import annotations
 from pathlib import Path
 import typer
 from .doctor import doctor_command
 from .init import init_command
 from .keys_cmd import keys_list_command, login_command, logout_command
 app = typer.Typer(no_args_is_help=False, add_completion=True)
 runs_app = typer.Typer(help="Inspect or resume past runs.")
@runs_app.command("list")
 def runs_list(
    limit: int = typer.Option(20, help="Number of runs to show"),
    state: str | None = typer.Option(None, help="Filter by state"),
 ) -> None:
    """List recent runs."""
    from .runs import runs_list_command
    runs_list_command(limit, state)
@runs_app.command("show")
 def runs_show(run_id: str = typer.Argument(...)) -> None:
    """Show details for a specific run."""
    from .runs import runs_show_command
    runs_show_command(run_id)
@runs_app.command("resume")
 def runs_resume(run_id: str = typer.Argument(...)) -> None:
    """Resume a paused run (v0.1.0: not implemented — shows status only)."""
    from .runs import runs_resume_command
    runs_resume_command(run_id)
 app.add_typer(runs_app, name="runs")
@app.command()
 def init() -> None:
    """First-run setup: governance consent + API key + doctor."""
    init_command()
@app.command()
 def login(provider: str = typer.Argument("openrouter")) -> None:
    """Store an API key for the given provider in the OS keyring."""
    login_command(provider)
@app.command()
 def logout(provider: str = typer.Argument("openrouter")) -> None:
    """Remove a stored API key from the OS keyring."""
    logout_command(provider)
@app.command(name="keys")
 def keys_list() -> None:
    """List registered providers (masked)."""
    keys_list_command()
@app.command()
 def doctor() -> None:
    """Run environment diagnostics (Python/uv/disk for v0.1.0; full suite in Step 12)."""
    doctor_command()
@app.command(name="run")
 def run(
    workflow_path: Path = typer.Argument(..., help="Path to the workflow yaml"),  # noqa: B008
    repo: Path = typer.Option(Path.cwd(), help="Repo root"),  # noqa: B008
    base_branch: str = typer.Option("main", help="Base branch"),
    no_preview: bool = typer.Option(False, "--no-preview", help="Skip cost preview"),
 ) -> None:
    """Execute a workflow template end-to-end."""
    from .run import run_command
    run_command(workflow_path, repo, base_branch, no_preview)
@app.command()
 def stats(
    by: str = typer.Option("model", help="model | persona | day"),
    since_days: int = typer.Option(7, help="Window size in days"),
 ) -> None:
    """Aggregate LLM-call stats from the ledger."""
    from .stats import stats_command
    stats_command(by, since_days)
@app.command()
 def budget() -> None:
    """Show the current budget ledger (per-scope spend / cap)."""
    from .stats import budget_command
    budget_command()
@app.command(name="costs")
 def costs() -> None:
    """Alias for `stats --by day` over the last 30 days."""
    from .stats import stats_command
    stats_command(by="day", since_days=30)
@app.command(name="pricing")
 def pricing() -> None:
    """Show cached OpenRouter pricing matrix (populated by `doctor`)."""
    from .stats import pricing_command
    pricing_command()
@app.callback(invoke_without_command=True)
 def main(
    ctx: typer.Context,
    agent: str | None = typer.Option(None, "--agent", help="Start with a specific persona"),
    model: str | None = typer.Option(None, "--model", help="Model override"),
 ) -> None:
    from ..logging import configure_logging
    try:
        from ..config import load_config
        cfg = load_config()
        configure_logging(level=cfg.log_level, json_output=False)
    except Exception:
        configure_logging(level="info", json_output=False)
    if ctx.invoked_subcommand is None:
        from .interactive import interactive_command
        code = interactive_command(agent, model)
        raise typer.Exit(code=code)
 if __name__ == "__main__":
    app()
--- a/my-deepagent/src/my_deepagent/cli/run.py
+++ b/my-deepagent/src/my_deepagent/cli/run.py
@@ -1 +1,194 @@
-"""CLI run command implementation. Implemented in Step 6."""
+"""mydeepagent run <workflow.yaml> — execute a workflow end-to-end."""
 from __future__ import annotations
 import asyncio
 from pathlib import Path
 import typer
 from rich.console import Console
 from rich.table import Table
 from sqlalchemy import select
 from ..artifact_schema import ArtifactSchemaRegistry
 from ..binding import BackendAvailability, PersonaConsentStore, bind_personas
 from ..budget import BudgetTracker, make_budget_tracker_from_config
 from ..config import Config, load_config
 from ..engine import WorkflowEngine
 from ..enums import Backend
 from ..governance import require_consent
 from ..monitoring.cost_estimator import WorkflowCostEstimate, estimate_workflow
 from ..monitoring.pricing import ModelPrice, PricingCache
 from ..persistence.db import Database
 from ..persistence.models import ModelPricingRow
 from ..persona import load_personas_from_dir
 from ..tui.approval import cli_approval_callback
 from ..workflow import load_workflow_yaml
 _CONSOLE = Console()
 def run_command(
    workflow_path: Path,
    repo: Path,
    base_branch: str,
    no_preview: bool = False,
 ) -> None:
    """Synchronous CLI wrapper for the async engine."""
    asyncio.run(_run_async(workflow_path, repo, base_branch, no_preview))
 async def cli_budget_prompt(scope: str, projected: float, cap: float) -> bool:
    """Prompt the user to extend the budget cap when it is hit."""
    _CONSOLE.print()
    _CONSOLE.print(
        f"[yellow]Budget cap reached[/]: scope={scope} projected=${projected:.4f} cap=${cap:.4f}"
    )
    return typer.confirm("Extend cap and proceed?", default=False)
 def _static_pricing_seed_fallback() -> list[ModelPrice]:
    """Return seed model prices used when the model_pricing DB table is empty.
    Unit: USD per 1,000 tokens. (OpenRouter publishes per-token; we store per-1K to keep
    cost arithmetic in a more readable range. ``compute_cost(model, in, out)`` divides
    by 1000.)
    """
    return [
        ModelPrice("anthropic/claude-sonnet-4-6", 0.003, 0.015, 200_000),
        ModelPrice("anthropic/claude-haiku-4-5", 0.001, 0.005, 200_000),
        ModelPrice("anthropic/claude-opus-4-1", 0.015, 0.075, 200_000),
        ModelPrice("deepseek/deepseek-chat", 0.00028, 0.00112, 64_000),
    ]
 async def _load_pricing_from_db(config: Config, db: Database) -> PricingCache:
    """Load pricing from the persisted model_pricing table.
    Falls back to the static seed when the table is empty (doctor not yet run).
    """
    async with db.session() as s:
        rows = list((await s.execute(select(ModelPricingRow))).scalars().all())
    cache = PricingCache()
    if rows:
        cache.set(
            [
                ModelPrice(
                    model=r.model,
                    input_per_1k_usd=r.input_per_1k_usd,
                    output_per_1k_usd=r.output_per_1k_usd,
                    context_length=r.context_length,
                )
                for r in rows
            ]
        )
        return cache
    cache.set(_static_pricing_seed_fallback())
    return cache
 def _print_preview(estimate: WorkflowCostEstimate, config: object) -> None:
    cfg: Config = config  # type: ignore[assignment]
    table = Table(title="Cost preview")
    table.add_column("Phase")
    table.add_column("Persona")
    table.add_column("Model")
    table.add_column("In/Out tokens", justify="right")
    table.add_column("Est. cost", justify="right")
    for p in estimate.phases:
        cost_str = f"${p.estimated_cost_usd:.4f}"
        table.add_row(
            p.phase_key,
            p.persona_name,
            p.model,
            f"{p.estimated_input_tokens}/{p.estimated_output_tokens}",
            cost_str,
        )
    _CONSOLE.print(table)
    _CONSOLE.print(f"Total estimated: [bold]${estimate.total_usd:.4f}[/]")
    _CONSOLE.print(
        f"Run cap: [bold]${cfg.budget_run_usd}[/] | Daily cap: [bold]${cfg.budget_daily_usd}[/]"
    )
 async def _run_async(
    workflow_path: Path,
    repo: Path,
    base_branch: str,
    no_preview: bool,
 ) -> None:
    config = load_config()
    require_consent(config.data_dir)
    template = load_workflow_yaml(workflow_path)
    # Locate seed schemas relative to the installed package root
    seed_root = Path(__file__).resolve().parents[3] / "docs" / "schemas"
    personas_dir = seed_root / "personas"
    artifacts_root = seed_root / "artifacts"
    personas = load_personas_from_dir(personas_dir)
    registry = ArtifactSchemaRegistry(roots=[artifacts_root])
    db = Database(config.database_url)
    await db.init_schema()
    # Crash recovery: mark non-terminal runs from a previous process as failed
    # so the active-run uniqueness slot is freed before starting new work.
    from ..recovery import sweep_orphan_runs
    report = await sweep_orphan_runs(db)
    if report.total:
        _CONSOLE.print(
            f"[yellow]recovery: marked {len(report.failed_runs)} orphan run(s) "
            f"and {len(report.failed_phases)} phase(s) as failed[/]"
        )
    try:
        consent_store = PersonaConsentStore(config.data_dir / "persona-consents.json")
        bindings = bind_personas(
            template,
            personas,
            BackendAvailability(available_backends=frozenset(Backend)),
            consent_store,
        )
        # Pricing + cost preview — use DB-cached prices; fall back to static seed
        pricing = await _load_pricing_from_db(config, db)
        if not no_preview:
            estimate = estimate_workflow(template, bindings, pricing)
            _print_preview(estimate, config)
            if not typer.confirm("Proceed?", default=True):
                raise typer.Exit(code=0)
        budget: BudgetTracker = make_budget_tracker_from_config(
            db, config, prompt_callback=cli_budget_prompt
        )
        await budget.init()
        engine = WorkflowEngine(
            db=db,
            config=config,
            persona_pool=personas,
            artifact_registry=registry,
            consent_store=consent_store,
            available_backends=BackendAvailability(available_backends=frozenset(Backend)),
            approval_callback=cli_approval_callback,
            budget_tracker=budget,
            pricing=pricing,
        )
        engine.install_signal_handlers()
        result = await engine.run(
            template,
            repo_path=repo,
            base_branch=base_branch,
        )
        _CONSOLE.print(f"[bold]{result.state.value}[/] run_id={result.run_id}")
        if result.final_report_path:
            _CONSOLE.print(f"report: {result.final_report_path}")
        if result.error:
            _CONSOLE.print(f"[red]error[/]: {result.error}")
            raise typer.Exit(code=1)
    finally:
        await db.dispose()
--- a/my-deepagent/src/my_deepagent/cli/runs.py
+++ b/my-deepagent/src/my_deepagent/cli/runs.py
@@ -0,0 +1,204 @@
 """mydeepagent runs list / show / resume — read-only-ish run history queries."""
 from __future__ import annotations
 import asyncio
 from pathlib import Path
 from uuid import UUID
 import typer
 from rich.console import Console
 from rich.table import Table
 from sqlalchemy import desc, select
 from ..config import load_config
 from ..persistence.db import Database
 from ..persistence.models import (
    ArtifactRow,
    RunEventRow,
    RunPhaseRow,
    RunRow,
 )
 _CONSOLE = Console()
 def runs_list_command(limit: int = 20, state_filter: str | None = None) -> None:
    asyncio.run(_runs_list_async(limit, state_filter))
 def runs_show_command(run_id: str) -> None:
    asyncio.run(_runs_show_async(run_id))
 def runs_resume_command(run_id: str) -> None:
    asyncio.run(_runs_resume_async(run_id))
 async def _runs_list_async(limit: int, state_filter: str | None) -> None:
    config = load_config()
    db = Database(config.database_url)
    await db.init_schema()
    try:
        async with db.session() as s:
            stmt = select(RunRow).order_by(desc(RunRow.created_at)).limit(limit)
            if state_filter:
                stmt = stmt.where(RunRow.state == state_filter)
            rows = (await s.execute(stmt)).scalars().all()
        if not rows:
            _CONSOLE.print("[dim](no runs)[/]")
            return
        table = Table(title=f"Recent runs (latest {len(rows)})")
        table.add_column("Run ID")
        table.add_column("State")
        table.add_column("Repo")
        table.add_column("Branch")
        table.add_column("Created")
        table.add_column("Ended")
        for r in rows:
            table.add_row(
                str(r.id)[:8] + "…",
                r.state,
                Path(r.repo_path).name,
                r.base_branch,
                (r.created_at or "")[:19],
                (r.ended_at or "—")[:19] if r.ended_at else "—",
            )
        _CONSOLE.print(table)
    finally:
        await db.dispose()
 async def _runs_show_async(run_id: str) -> None:
    full_id = await _resolve_run_id(run_id)
    config = load_config()
    db = Database(config.database_url)
    await db.init_schema()
    try:
        async with db.session() as s:
            run = await s.get(RunRow, full_id)
            if run is None:
                _CONSOLE.print(f"[red]run not found:[/] {run_id}")
                raise typer.Exit(code=1)
            phases = (
                (
                    await s.execute(
                        select(RunPhaseRow)
                        .where(RunPhaseRow.run_id == full_id)
                        .order_by(RunPhaseRow.seq)
                    )
                )
                .scalars()
                .all()
            )
            artifacts = (
                (await s.execute(select(ArtifactRow).where(ArtifactRow.run_id == full_id)))
                .scalars()
                .all()
            )
            events = (
                (
                    await s.execute(
                        select(RunEventRow)
                        .where(RunEventRow.run_id == full_id)
                        .order_by(RunEventRow.seq)
                        .limit(50)
                    )
                )
                .scalars()
                .all()
            )
        _CONSOLE.print(f"[bold]Run {run.id}[/]")
        _CONSOLE.print(f"  state: [cyan]{run.state}[/]")
        _CONSOLE.print(f"  repo: {run.repo_path}@{run.base_branch}")
        _CONSOLE.print(f"  worktree: {run.worktree_root}")
        _CONSOLE.print(f"  created: {run.created_at}")
        _CONSOLE.print(f"  ended: {run.ended_at or '—'}")
        if run.final_report_path:
            _CONSOLE.print(f"  report: {run.final_report_path}")
        _CONSOLE.print()
        _CONSOLE.print("[bold]Phases[/]")
        for ph in phases:
            _CONSOLE.print(f"  - {ph.phase_key:20s} state={ph.state:15s} attempts={ph.attempts}")
        if artifacts:
            _CONSOLE.print()
            _CONSOLE.print("[bold]Artifacts[/]")
            for a in artifacts:
                _CONSOLE.print(f"  - {a.path} (schema={a.schema_id}, valid={a.valid})")
        _CONSOLE.print()
        _CONSOLE.print(f"[bold]Events (last {len(events)})[/]")
        for ev in events:
            _CONSOLE.print(f"  [{ev.seq:4d}] {ev.ts}  {ev.type}")
    finally:
        await db.dispose()
 async def _runs_resume_async(run_id: str) -> None:
    """v0.1.0: resume is not implemented.
    Surfaces the run state and hints at next steps. Future v0.2 implementation:
    rehydrate the workflow template by template_hash, replay phase loop from the
    first non-completed phase using the existing checkpointer.
    """
    full_id = await _resolve_run_id(run_id)
    config = load_config()
    db = Database(config.database_url)
    await db.init_schema()
    try:
        async with db.session() as s:
            run = await s.get(RunRow, full_id)
            if run is None:
                _CONSOLE.print(f"[red]run not found:[/] {run_id}")
                raise typer.Exit(code=1)
        if run.state in ("completed", "failed", "aborted"):
            _CONSOLE.print(
                f"[yellow]Run {run.id} is already terminal ({run.state}). "
                "Start a fresh run with `mydeepagent run <workflow.yaml>`.[/]"
            )
            raise typer.Exit(code=1)
        _CONSOLE.print(
            "[yellow]Resume is not implemented in v0.1.0. The crash-recovery sweep at startup "
            "marked this run as failed; relaunch the workflow with `mydeepagent run`.[/]"
        )
        raise typer.Exit(code=2)
    finally:
        await db.dispose()
 async def _resolve_run_id(prefix_or_full: str) -> str:
    """Accept either a full UUID or a 6+ char prefix and return the canonical full id."""
    try:
        return str(UUID(prefix_or_full))
    except ValueError:
        pass
    if len(prefix_or_full) < 6:
        _CONSOLE.print(
            f"[red]ambiguous run id (need full UUID or >=6-char prefix):[/] {prefix_or_full}"
        )
        raise typer.Exit(code=2)
    config = load_config()
    db = Database(config.database_url)
    await db.init_schema()
    try:
        async with db.session() as s:
            rows = (
                (
                    await s.execute(
                        select(RunRow.id).where(RunRow.id.like(f"{prefix_or_full}%")).limit(2)
                    )
                )
                .scalars()
                .all()
            )
        if not rows:
            _CONSOLE.print(f"[red]no run matches prefix:[/] {prefix_or_full}")
            raise typer.Exit(code=1)
        if len(rows) > 1:
            _CONSOLE.print(f"[red]ambiguous prefix matches >1 run:[/] {prefix_or_full}")
            raise typer.Exit(code=1)
        return rows[0]
    finally:
        await db.dispose()
--- a/my-deepagent/src/my_deepagent/cli/stats.py
+++ b/my-deepagent/src/my_deepagent/cli/stats.py
@@ -1 +1,179 @@
-"""CLI stats command for usage summary. Implemented in Step 12."""
+"""mydeepagent stats / costs / budget / pricing — read-only ledger + history queries."""
 from __future__ import annotations
 import asyncio
 from collections.abc import Sequence
 from datetime import UTC, datetime, timedelta
 from typing import Any
 import typer
 from rich.console import Console
 from rich.table import Table
 from sqlalchemy import func, select
 from ..config import load_config
 from ..persistence.db import Database
 from ..persistence.models import BudgetLedgerRow, LlmCallRow, ModelPricingRow
 _CONSOLE = Console()
 def stats_command(by: str = "model", since_days: int = 7) -> None:
    """Synchronous CLI wrapper for the async stats query."""
    asyncio.run(_stats_async(by, since_days))
 async def _stats_async(by: str, since_days: int) -> None:
    config = load_config()
    db = Database(config.database_url)
    await db.init_schema()
    try:
        since = (datetime.now(UTC) - timedelta(days=since_days)).isoformat(timespec="seconds")
        async with db.session() as s:
            if by == "model":
                rows: Sequence[Any] = (
                    await s.execute(
                        select(
                            LlmCallRow.model,
                            func.count().label("calls"),
                            func.sum(LlmCallRow.input_tokens).label("input"),
                            func.sum(LlmCallRow.output_tokens).label("output"),
                            func.sum(LlmCallRow.cost_usd_total).label("cost"),
                        )
                        .where(LlmCallRow.ts >= since)
                        .group_by(LlmCallRow.model)
                    )
                ).all()
                _render_stats_table(
                    "Stats by model",
                    rows,
                    ["Model", "Calls", "Input", "Output", "Cost ($)"],
                )
            elif by == "persona":
                rows = (
                    await s.execute(
                        select(
                            LlmCallRow.persona_name,
                            func.count().label("calls"),
                            func.sum(LlmCallRow.cost_usd_total).label("cost"),
                        )
                        .where(LlmCallRow.ts >= since)
                        .group_by(LlmCallRow.persona_name)
                    )
                ).all()
                _render_stats_table(
                    "Stats by persona",
                    rows,
                    ["Persona", "Calls", "Cost ($)"],
                )
            elif by == "day":
                rows = (
                    await s.execute(
                        select(
                            func.substr(LlmCallRow.ts, 1, 10).label("day"),
                            func.count().label("calls"),
                            func.sum(LlmCallRow.cost_usd_total).label("cost"),
                        )
                        .where(LlmCallRow.ts >= since)
                        .group_by("day")
                    )
                ).all()
                _render_stats_table(
                    "Stats by day",
                    rows,
                    ["Day", "Calls", "Cost ($)"],
                )
            else:
                typer.echo(f"unknown --by option: {by!r}", err=True)
                raise typer.Exit(code=2)
    finally:
        await db.dispose()
 def budget_command() -> None:
    """Synchronous CLI wrapper for the async budget ledger query."""
    asyncio.run(_budget_async())
 async def _budget_async() -> None:
    config = load_config()
    db = Database(config.database_url)
    await db.init_schema()
    try:
        async with db.session() as s:
            rows = list((await s.execute(select(BudgetLedgerRow))).scalars().all())
        if not rows:
            _CONSOLE.print("[dim](no budget activity yet)[/]")
            return
        table = Table(title="Budget ledger")
        table.add_column("Scope")
        table.add_column("Spent ($)", justify="right")
        table.add_column("Cap ($)", justify="right")
        table.add_column("Remaining ($)", justify="right")
        table.add_column("Last update")
        for row in rows:
            remaining = (
                "" if row.cap_usd is None else f"{max(0.0, row.cap_usd - row.spent_usd):.4f}"
            )
            cap = "—" if row.cap_usd is None else f"{row.cap_usd:.4f}"
            table.add_row(
                row.scope,
                f"{row.spent_usd:.4f}",
                cap,
                remaining,
                row.last_updated,
            )
        _CONSOLE.print(table)
    finally:
        await db.dispose()
 def pricing_command() -> None:
    """Show cached OpenRouter pricing matrix (populated by `doctor`)."""
    asyncio.run(_pricing_async())
 async def _pricing_async() -> None:
    config = load_config()
    db = Database(config.database_url)
    await db.init_schema()
    try:
        async with db.session() as s:
            rows = list(
                (await s.execute(select(ModelPricingRow).order_by(ModelPricingRow.model)))
                .scalars()
                .all()
            )
        if not rows:
            _CONSOLE.print("[dim](no pricing data — run `mydeepagent doctor` to fetch)[/]")
            return
        table = Table(title="OpenRouter pricing (per 1K tokens, USD)")
        table.add_column("Model")
        table.add_column("Input", justify="right")
        table.add_column("Output", justify="right")
        table.add_column("Context", justify="right")
        table.add_column("Fetched")
        for r in rows:
            table.add_row(
                r.model,
                f"{r.input_per_1k_usd:.4f}",
                f"{r.output_per_1k_usd:.4f}",
                str(r.context_length),
                (r.fetched_at or "")[:19],
            )
        _CONSOLE.print(table)
    finally:
        await db.dispose()
 def _render_stats_table(title: str, rows: Sequence[Any], headers: list[str]) -> None:
    if not rows:
        _CONSOLE.print("[dim](no data for the past period)[/]")
        return
    table = Table(title=title)
    for h in headers:
        table.add_column(h)
    for row in rows:
        table.add_row(*[str(v if v is not None else "") for v in row])
    _CONSOLE.print(table)
--- a/my-deepagent/src/my_deepagent/engine.py
+++ b/my-deepagent/src/my_deepagent/engine.py
@@ -1 +1,917 @@
-"""LangGraph run engine orchestrator. Implemented in Step 7."""
+"""WorkflowEngine: orchestrates run lifecycle, phase loop, artifact validation, approval gate."""
 from __future__ import annotations
 import asyncio
 import json
 import signal
 from contextlib import suppress
 from dataclasses import dataclass
 from datetime import UTC, datetime
 from pathlib import Path
 from typing import Any
 from uuid import UUID, uuid4
 from sqlalchemy import select
 from .artifact_schema import ArtifactSchemaRegistry
 from .audit import make_audit_recorder
 from .binding import (
    BackendAvailability,
    Binding,
    BindingOverride,
    PersonaConsentStore,
    bind_personas,
 )
 from .budget import BudgetTracker
 from .config import Config
 from .enums import ApprovalDecisionAction, ApprovalState, RunPhaseState, RunState
 from .errors import MyDeepAgentError
 from .hash import sha256
 from .middleware.artifact_watcher import ArtifactWatcherMiddleware
 from .middleware.audit import AuditToolMiddleware
 from .middleware.cost import CostMiddleware
 from .monitoring.pricing import PricingCache
 from .persistence.db import Database
 from .persistence.models import (
    AgentPersonaRow,
    ApprovalDecisionRow,
    ApprovalRequestRow,
    ArtifactRow,
    LlmCallRow,
    RunBindingRow,
    RunEventRow,
    RunInputRow,
    RunPhaseRow,
    RunRow,
    WorkflowTemplateRow,
 )
 from .persona import Persona
 from .run_event import RunEventType, run_idempotency_key
 from .session import build_agent
 from .workflow import WorkflowPhase, WorkflowTemplate
 # ApprovalCallback type: async (request_payload: dict, gates: list[str]) -> ApprovalDecisionAction
 ApprovalCallback = Any  # Callable[[dict, list[str]], Awaitable[ApprovalDecisionAction]]
 _DEFAULT_PHASE_TIMEOUT_SECONDS = 300  # 5 minutes
@dataclass(frozen=True)
 class RunResult:
    run_id: UUID
    state: RunState
    final_report_path: Path | None
    error: str | None = None
 class _PhaseAbortedError(Exception):
    def __init__(self, reason: str) -> None:
        self.reason = reason
        super().__init__(reason)
 class WorkflowEngine:
    """In-process workflow engine for v0.1.0.
    For each phase: build_agent -> invoke -> wait for write_file targeting
    expected_artifact_path -> load + jsonschema validate -> repair 1x if invalid
    -> approval gate -> next phase.
    All events appended idempotently to run_events via the
    (run_id, idempotency_key) UNIQUE constraint — concurrent/retry safe.
    """
    def __init__(
        self,
        db: Database,
        config: Config,
        persona_pool: list[Persona],
        artifact_registry: ArtifactSchemaRegistry,
        consent_store: PersonaConsentStore,
        available_backends: BackendAvailability,
        approval_callback: ApprovalCallback,
        budget_tracker: BudgetTracker | None = None,
        pricing: PricingCache | None = None,
    ) -> None:
        self._db = db
        self._config = config
        self._personas = persona_pool
        self._artifacts = artifact_registry
        self._consent = consent_store
        self._backends = available_backends
        self._approval = approval_callback
        self._budget = budget_tracker
        self._pricing = pricing or PricingCache()
        self._shutdown_event: asyncio.Event = asyncio.Event()
        self._inflight_tasks: set[asyncio.Task[Any]] = set()
    def install_signal_handlers(self) -> None:
        """Attach SIGTERM/SIGINT handlers to the running event loop.
        Idempotent: calling twice replaces the previous handlers. Should be invoked
        from ``cli/run.py`` once the asyncio loop is up. On shutdown signal:
        in-flight ainvoke() tasks get a 30s grace, then are cancelled.
        """
        loop = asyncio.get_running_loop()
        for sig in (signal.SIGTERM, signal.SIGINT):
            with suppress(NotImplementedError, ValueError):
                loop.add_signal_handler(sig, self._on_signal, sig)
    def _on_signal(self, sig: signal.Signals) -> None:
        self._shutdown_event.set()
        loop = asyncio.get_running_loop()
        loop.call_later(30.0, self._force_cancel_inflight)
    def _force_cancel_inflight(self) -> None:
        for task in list(self._inflight_tasks):
            if not task.done():
                task.cancel()
    @property
    def shutdown_requested(self) -> bool:
        return self._shutdown_event.is_set()
    async def run(
        self,
        template: WorkflowTemplate,
        *,
        repo_path: Path,
        base_branch: str = "main",
        requirements_md: str = "",
        override: BindingOverride | None = None,
    ) -> RunResult:
        run_id = uuid4()
        worktree_root = self._config.workspace_root / str(run_id)
        worktree_root.mkdir(parents=True, exist_ok=True)
        artifacts_dir = worktree_root / "artifacts"
        artifacts_dir.mkdir(parents=True, exist_ok=True)
        bindings = bind_personas(template, self._personas, self._backends, self._consent, override)
        await self._persist_run_skeleton(
            None,
            run_id,
            template,
            bindings,
            repo_path,
            base_branch,
            worktree_root,
            requirements_md,
        )
        await self._append_event(run_id, None, RunEventType.RUN_CREATED, {})
        await self._append_event(run_id, None, RunEventType.RUN_STARTED, {})
        await self._set_run_state(run_id, RunState.EXECUTING)
        try:
            for phase_def in template.phases:
                role_binding = bindings[phase_def.role]
                await self._run_phase(run_id, worktree_root, template, phase_def, role_binding)
            await self._set_run_state(run_id, RunState.COMPLETED)
            await self._append_event(run_id, None, RunEventType.RUN_COMPLETED, {})
            report_path = await self._compose_final_report(
                run_id, worktree_root, RunState.COMPLETED
            )
            return RunResult(run_id=run_id, state=RunState.COMPLETED, final_report_path=report_path)
        except _PhaseAbortedError as e:
            await self._set_run_state(run_id, RunState.ABORTED)
            await self._append_event(run_id, None, RunEventType.RUN_ABORTED, {"reason": e.reason})
            report_path = await self._compose_final_report(
                run_id, worktree_root, RunState.ABORTED, error=e.reason
            )
            return RunResult(
                run_id=run_id,
                state=RunState.ABORTED,
                final_report_path=report_path,
                error=e.reason,
            )
        except MyDeepAgentError as e:
            await self._set_run_state(run_id, RunState.FAILED)
            await self._append_event(
                run_id, None, RunEventType.RUN_FAILED, {"code": e.code, "message": str(e)}
            )
            report_path = await self._compose_final_report(
                run_id, worktree_root, RunState.FAILED, error=str(e)
            )
            return RunResult(
                run_id=run_id,
                state=RunState.FAILED,
                final_report_path=report_path,
                error=str(e),
            )
    # ------------------------------------------------------------------
    # Phase execution
    # ------------------------------------------------------------------
    async def _run_phase(
        self,
        run_id: UUID,
        worktree_root: Path,
        template: WorkflowTemplate,
        phase_def: WorkflowPhase,
        binding: Binding,
    ) -> None:
        if self.shutdown_requested:
            await self._append_event(run_id, None, RunEventType.RUN_PAUSED, {"reason": "shutdown"})
            await self._set_run_state(run_id, RunState.PAUSED)
            raise _PhaseAbortedError(reason="shutdown signal received")
        phase_id = await self._ensure_phase_row(run_id, phase_def)
        await self._set_phase_state(phase_id, RunPhaseState.RUNNING)
        await self._append_event(
            run_id, phase_id, RunEventType.PHASE_STARTED, {"phase_key": phase_def.key}
        )
        # Phases without an expected artifact complete immediately
        if phase_def.expected_artifact is None:
            await self._set_phase_state(phase_id, RunPhaseState.COMPLETED)
            await self._append_event(run_id, phase_id, RunEventType.PHASE_COMPLETED, {})
            return
        expected_path = (worktree_root / phase_def.expected_artifact.path).resolve()
        expected_path.parent.mkdir(parents=True, exist_ok=True)
        # Repair loop: max 2 attempts
        for attempt in range(1, 3):
            validated = await self._run_agent_and_validate(
                run_id, phase_id, worktree_root, phase_def, binding, expected_path, attempt
            )
            if validated:
                break
            # validated=False means: invalid/timeout + still have budget for retry
            # on attempt 2, _run_agent_and_validate raises instead of returning False
        await self._run_approval_gate(run_id, phase_id, phase_def, expected_path)
        await self._set_phase_state(phase_id, RunPhaseState.COMPLETED)
        await self._append_event(run_id, phase_id, RunEventType.PHASE_COMPLETED, {})
    async def _run_agent_and_validate(
        self,
        run_id: UUID,
        phase_id: UUID,
        worktree_root: Path,
        phase_def: WorkflowPhase,
        binding: Binding,
        expected_path: Path,
        attempt: int,
    ) -> bool:
        """Invoke agent for one attempt and validate artifact. Returns True on success.
        Returns False when attempt < 2 and artifact is missing/invalid (caller retries).
        Raises MyDeepAgentError on final failure (attempt >= 2).
        """
        written = await self._invoke_agent_until_artifact(
            run_id, phase_id, worktree_root, phase_def, binding, expected_path, attempt=attempt
        )
        if not written:
            await self._append_event(run_id, phase_id, RunEventType.ARTIFACT_TIMEOUT, {})
            if attempt >= 2:
                await self._set_phase_state(phase_id, RunPhaseState.FAILED)
                await self._append_event(
                    run_id,
                    phase_id,
                    RunEventType.PHASE_FAILED,
                    {"reason": "artifact_timeout_exhausted"},
                )
                raise MyDeepAgentError.human_required(
                    "artifact_timeout_exhausted",
                    message=(
                        f"phase '{phase_def.key}' did not produce expected artifact "
                        f"after {attempt} attempts"
                    ),
                )
            return False
        # Validate the written artifact
        await self._set_phase_state(phase_id, RunPhaseState.VALIDATING)
        assert phase_def.expected_artifact is not None
        schema_id = phase_def.expected_artifact.schema_id
        try:
            data = json.loads(expected_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            await self._append_event(
                run_id,
                phase_id,
                RunEventType.ARTIFACT_INVALID,
                {"errors": [{"message": str(exc)}]},
            )
            if attempt >= 2:
                raise MyDeepAgentError.human_required(
                    "artifact_invalid_after_repair",
                    message=str(exc),
                    cause=exc,
                ) from exc
            await self._append_event(run_id, phase_id, RunEventType.PROMPT_REPAIRED, {})
            return False
        result = self._artifacts.validate(schema_id, data)
        if result.ok:
            await self._persist_artifact(run_id, phase_id, expected_path, schema_id, valid=True)
            await self._append_event(run_id, phase_id, RunEventType.ARTIFACT_VALIDATED, {})
            return True
        error_payload = [{"path": f.path, "message": f.message} for f in result.errors[:5]]
        await self._persist_artifact(
            run_id,
            phase_id,
            expected_path,
            schema_id,
            valid=False,
            errors=list(result.errors),
        )
        await self._append_event(
            run_id, phase_id, RunEventType.ARTIFACT_INVALID, {"errors": error_payload}
        )
        if attempt >= 2:
            await self._set_phase_state(phase_id, RunPhaseState.FAILED)
            await self._append_event(
                run_id,
                phase_id,
                RunEventType.PHASE_FAILED,
                {"reason": "artifact_invalid_after_repair"},
            )
            raise MyDeepAgentError.human_required(
                "artifact_invalid_after_repair",
                message=f"phase '{phase_def.key}' artifact failed validation after repair",
            )
        await self._append_event(run_id, phase_id, RunEventType.PROMPT_REPAIRED, {})
        return False
    async def _run_approval_gate(
        self,
        run_id: UUID,
        phase_id: UUID,
        phase_def: WorkflowPhase,
        expected_path: Path,
    ) -> None:
        """Run the approval gate if gates are configured. Raises on reject/abort."""
        if not phase_def.gates:
            return
        await self._set_phase_state(phase_id, RunPhaseState.AWAITING_APPROVAL)
        decision = await self._request_approval(run_id, phase_id, phase_def, expected_path)
        if decision == ApprovalDecisionAction.ABORT:
            raise _PhaseAbortedError(reason=f"aborted at phase {phase_def.key}")
        if decision != ApprovalDecisionAction.APPROVE:
            await self._set_phase_state(phase_id, RunPhaseState.FAILED)
            await self._append_event(
                run_id, phase_id, RunEventType.PHASE_FAILED, {"reason": decision.value}
            )
            raise MyDeepAgentError.human_required(
                "approval_rejected",
                message=f"phase '{phase_def.key}' approval was {decision.value}",
            )
    async def _invoke_agent_until_artifact(
        self,
        run_id: UUID,
        phase_id: UUID,
        worktree_root: Path,
        phase_def: WorkflowPhase,
        binding: Binding,
        expected_path: Path,
        attempt: int,
    ) -> bool:
        """Build agent + invoke + return True if expected_path was written, False on timeout."""
        written_paths: list[str] = []
        async def _on_written(path: str, _content: str) -> None:
            written_paths.append(path)
        watcher = ArtifactWatcherMiddleware(expected_path, _on_written)
        cost_mw = CostMiddleware(
            pricing=self._pricing,
            model_name=binding.persona.model,
            run_id=run_id,
            phase_id=phase_id,
            persona_name=binding.persona.name,
            budget_tracker=self._budget,
            recorder=self._record_llm_call,
        )
        audit_mw = AuditToolMiddleware(
            run_id=run_id,
            phase_id=phase_id,
            file_recorder=make_audit_recorder(self._config.state_dir),
        )
        agent = build_agent(
            binding.persona,
            self._config,
            root_dir=worktree_root,
            middleware=[watcher, cost_mw, audit_mw],
        )
        envelope = self._build_envelope(run_id, phase_id, phase_def, attempt, expected_path)
        await self._append_event(
            run_id, phase_id, RunEventType.ARTIFACT_EXPECTED, {"path": str(expected_path)}
        )
        event_type = RunEventType.PROMPT_REPAIRED if attempt > 1 else RunEventType.PROMPT_SENT
        await self._append_event(run_id, phase_id, event_type, {"attempt": attempt})
        timeout = float(phase_def.timeout_seconds or _DEFAULT_PHASE_TIMEOUT_SECONDS)
        try:
            invoke_task: asyncio.Task[Any] = asyncio.create_task(
                agent.ainvoke({"messages": [{"role": "user", "content": envelope}]})
            )
            self._inflight_tasks.add(invoke_task)
            try:
                await asyncio.wait_for(asyncio.shield(invoke_task), timeout=timeout)
            except TimeoutError:
                pass
            finally:
                self._inflight_tasks.discard(invoke_task)
        except asyncio.CancelledError:
            pass
        return expected_path.is_file()
    def _build_envelope(
        self,
        run_id: UUID,
        phase_id: UUID,
        phase_def: WorkflowPhase,
        attempt: int,
        expected_path: Path,
    ) -> str:
        artifact = phase_def.expected_artifact
        assert artifact is not None
        try:
            schema_def = self._artifacts.load(artifact.schema_id)
            schema_inline = json.dumps(schema_def, indent=2, ensure_ascii=False)
        except (MyDeepAgentError, AttributeError):
            # AttributeError covers test scaffolding that instantiates the engine
            # via __new__ without wiring _artifacts; production paths always have it.
            schema_inline = "(schema not available)"
        repair_note = (
            "\n\n[REPAIR ATTEMPT]\n"
            "Your previous artifact did not validate against the JSON Schema below. "
            "Re-read the schema carefully and emit a corrected JSON object that satisfies "
            "every `required` field and respects all `enum`, `type`, `minLength`, and "
            "`additionalProperties: false` constraints."
            if attempt > 1
            else ""
        )
        return (
            f"MYDEEPAGENT_PROMPT_BEGIN {phase_id}\n"
            f"Run: {run_id}\n"
            f"Phase: {phase_def.key}\n"
            f"Attempt: {attempt}\n"
            f"Expected artifact path: {expected_path}\n"
            f"Expected schema id: {artifact.schema_id}\n"
            f"\n"
            f"JSON Schema 2020-12 for this artifact (you MUST satisfy it exactly):\n"
            f"```json\n{schema_inline}\n```\n"
            f"\n"
            f"Use the `write_file` tool to write a JSON object that matches the schema "
            f"to the exact path `{expected_path}`. The file must parse as valid JSON.\n"
            f"\n"
            f"Instructions:\n"
            f"{phase_def.instructions}"
            f"{repair_note}\n"
            f"MYDEEPAGENT_PROMPT_END {phase_id}"
        )
    # ------------------------------------------------------------------
    # Approval gate
    # ------------------------------------------------------------------
    async def _request_approval(
        self,
        run_id: UUID,
        phase_id: UUID,
        phase_def: WorkflowPhase,
        artifact_path: Path,
    ) -> ApprovalDecisionAction:
        request_id = uuid4()
        idem_key = f"{phase_def.key}:{artifact_path.name}"
        payload: dict[str, Any] = {
            "phase_key": phase_def.key,
            "artifact_path": str(artifact_path),
            "gates": list(phase_def.gates),
        }
        async with self._db.session() as s:
            s.add(
                ApprovalRequestRow(
                    id=str(request_id),
                    run_id=str(run_id),
                    phase_id=str(phase_id),
                    gate_key=phase_def.gates[0] if phase_def.gates else "default",
                    state=ApprovalState.PENDING.value,
                    idempotency_key=idem_key,
                    payload=payload,
                    created_at=_now_iso(),
                )
            )
        await self._append_event(
            run_id,
            phase_id,
            RunEventType.APPROVAL_REQUESTED,
            {"request_id": str(request_id)},
        )
        decision: ApprovalDecisionAction = await self._approval(payload, list(phase_def.gates))
        async with self._db.session() as s:
            s.add(
                ApprovalDecisionRow(
                    id=str(uuid4()),
                    approval_request_id=str(request_id),
                    action=decision.value,
                    decided_at=_now_iso(),
                    idempotency_key=f"{idem_key}:{decision.value}",
                )
            )
        await self._append_event(
            run_id, phase_id, RunEventType.APPROVAL_RESOLVED, {"action": decision.value}
        )
        return decision
    # ------------------------------------------------------------------
    # Final report
    # ------------------------------------------------------------------
    async def _compose_final_report(
        self,
        run_id: UUID,
        worktree_root: Path,
        status: RunState,
        error: str | None = None,
    ) -> Path:
        worktree_root.mkdir(parents=True, exist_ok=True)
        async with self._db.session() as s:
            run = await s.get(RunRow, str(run_id))
            phase_rows = list(
                (await s.execute(select(RunPhaseRow).where(RunPhaseRow.run_id == str(run_id))))
                .scalars()
                .all()
            )
            artifact_rows = list(
                (await s.execute(select(ArtifactRow).where(ArtifactRow.run_id == str(run_id))))
                .scalars()
                .all()
            )
            event_rows = list(
                (
                    await s.execute(
                        select(RunEventRow)
                        .where(RunEventRow.run_id == str(run_id))
                        .order_by(RunEventRow.seq.desc())
                        .limit(20)
                    )
                )
                .scalars()
                .all()
            )
        report: dict[str, Any] = {
            "runId": str(run_id),
            "templateHash": run.template_hash if run else "",
            "status": status.value,
            "phases": [
                {
                    "key": p.phase_key,
                    "state": p.state,
                    "started_at": p.started_at,
                    "ended_at": p.ended_at,
                    "attempts": p.attempts,
                }
                for p in phase_rows
            ],
            "artifacts": [
                {"path": a.path, "schema": a.schema_id, "hash": a.hash} for a in artifact_rows
            ],
            "events": [{"seq": e.seq, "type": e.type, "ts": e.ts} for e in reversed(event_rows)],
            "unresolved": [],
            "endedAt": _now_iso(),
            "error": error,
        }
        json_path = worktree_root / f"{run_id}.report.json"
        md_path = worktree_root / f"{run_id}.report.md"
        json_path.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")
        md_path.write_text(_render_report_md(report), encoding="utf-8")
        return json_path
    # ------------------------------------------------------------------
    # Persistence helpers
    # ------------------------------------------------------------------
    async def _record_llm_call(self, record: dict[str, Any]) -> None:
        """CostMiddleware recorder: persist one LlmCallRow per model call.
        Fills every NOT NULL column of LlmCallRow. Per-input/output cost is computed
        from the same PricingCache that the middleware already consulted, so the
        ledger and the row stay consistent.
        """
        in_tokens = int(record.get("input_tokens") or 0)
        out_tokens = int(record.get("output_tokens") or 0)
        model = str(record.get("model") or "")
        # Reproduce per-direction cost from the cached price.
        price = self._pricing.get(model) if self._pricing is not None else None
        if price is not None:
            cost_input = (in_tokens / 1000.0) * price.input_per_1k_usd
            cost_output = (out_tokens / 1000.0) * price.output_per_1k_usd
        else:
            cost_input = 0.0
            cost_output = 0.0
        cost_total = float(record.get("cost_usd_total") or (cost_input + cost_output))
        run_id_val = record.get("run_id")
        phase_id_val = record.get("phase_id")
        session_id_val = record.get("interactive_session_id")
        thread_id = (
            f"run:{run_id_val}:phase:{phase_id_val}"
            if run_id_val is not None
            else f"session:{session_id_val}"
        )
        persona_name = str(record.get("persona_name") or "")
        async with self._db.session() as s:
            s.add(
                LlmCallRow(
                    run_id=(str(run_id_val) if run_id_val is not None else None),
                    phase_id=(str(phase_id_val) if phase_id_val is not None else None),
                    interactive_session_id=(
                        str(session_id_val) if session_id_val is not None else None
                    ),
                    thread_id=thread_id,
                    persona_name=persona_name,
                    persona_version=1,
                    model=model,
                    role="main",
                    turn_index=0,
                    input_tokens=in_tokens,
                    output_tokens=out_tokens,
                    cached_tokens=0,
                    reasoning_tokens=0,
                    cost_usd_input=cost_input,
                    cost_usd_output=cost_output,
                    cost_usd_total=cost_total,
                    latency_ms=int(record.get("latency_ms") or 0),
                    status=str(record.get("status") or "ok"),
                    error_code=record.get("error_code"),
                    request_id=None,
                    ts=_now_iso(),
                )
            )
            try:
                await s.commit()
            except Exception:
                await s.rollback()
    async def _persist_run_skeleton(
        self,
        _unused_session: Any,  # kept for caller compatibility — we open own sessions
        run_id: UUID,
        template: WorkflowTemplate,
        bindings: dict[str, Binding],
        repo_path: Path,
        base_branch: str,
        worktree_root: Path,
        requirements_md: str,
    ) -> None:
        template_hash = template.compute_hash()
        now = _now_iso()
        # --- Phase 1: upsert FK targets (committed separately to satisfy FK ordering) ---
        template_id = uuid4()
        async with self._db.session() as s:
            existing_tpl = (
                await s.execute(
                    select(WorkflowTemplateRow).where(WorkflowTemplateRow.hash == template_hash)
                )
            ).scalar_one_or_none()
            if existing_tpl is None:
                s.add(
                    WorkflowTemplateRow(
                        id=str(template_id),
                        name=template.name,
                        version=template.version,
                        hash=template_hash,
                        definition=template.model_dump(by_alias=True),
                        created_at=now,
                    )
                )
            else:
                template_id = UUID(existing_tpl.id)
        persona_ids: dict[str, UUID] = {}
        for role_id, binding in bindings.items():
            persona_hash = binding.persona.compute_hash()
            async with self._db.session() as s:
                existing_persona = (
                    await s.execute(
                        select(AgentPersonaRow).where(AgentPersonaRow.hash == persona_hash)
                    )
                ).scalar_one_or_none()
                if existing_persona is None:
                    persona_id = uuid4()
                    s.add(
                        AgentPersonaRow(
                            id=str(persona_id),
                            name=binding.persona.name,
                            version=binding.persona.version,
                            hash=persona_hash,
                            definition=binding.persona.model_dump(),
                            created_at=now,
                        )
                    )
                else:
                    persona_id = UUID(existing_persona.id)
            persona_ids[role_id] = persona_id
        # --- Phase 2: insert RunRow (FK: workflow_templates — already committed above) ---
        async with self._db.session() as s:
            s.add(
                RunRow(
                    id=str(run_id),
                    template_id=str(template_id),
                    template_hash=template_hash,
                    state=RunState.CREATED.value,
                    repo_path=str(repo_path),
                    base_branch=base_branch,
                    worktree_root=str(worktree_root),
                    created_at=now,
                    updated_at=now,
                )
            )
        # --- Phase 3: insert RunInputRow + RunBindingRow (FK: runs — now committed) ---
        async with self._db.session() as s:
            s.add(
                RunInputRow(
                    id=str(uuid4()),
                    run_id=str(run_id),
                    requirements_md=requirements_md,
                    objective={},
                    extra={},
                    input_hash=sha256(
                        {"requirements": requirements_md, "template_hash": template_hash}
                    ),
                )
            )
            for role_id, binding in bindings.items():
                persona_hash = binding.persona.compute_hash()
                s.add(
                    RunBindingRow(
                        id=str(uuid4()),
                        run_id=str(run_id),
                        role_id=role_id,
                        persona_id=str(persona_ids[role_id]),
                        persona_hash=persona_hash,
                        backend=binding.persona.backend.value,
                        binding_hash=binding.binding_hash,
                    )
                )
    async def _ensure_phase_row(self, run_id: UUID, phase_def: WorkflowPhase) -> UUID:
        async with self._db.session() as s:
            existing = (
                await s.execute(
                    select(RunPhaseRow).where(
                        RunPhaseRow.run_id == str(run_id),
                        RunPhaseRow.phase_key == phase_def.key,
                    )
                )
            ).scalar_one_or_none()
            if existing is not None:
                return UUID(existing.id)
            phase_id = uuid4()
            existing_count = len(
                (
                    await s.execute(select(RunPhaseRow).where(RunPhaseRow.run_id == str(run_id)))
                ).all()
            )
            s.add(
                RunPhaseRow(
                    id=str(phase_id),
                    run_id=str(run_id),
                    phase_key=phase_def.key,
                    seq=existing_count,
                    state=RunPhaseState.PENDING.value,
                    attempts=0,
                    started_at=_now_iso(),
                )
            )
            return phase_id
    async def _set_phase_state(self, phase_id: UUID, state: RunPhaseState) -> None:
        async with self._db.session() as s:
            row = await s.get(RunPhaseRow, str(phase_id))
            if row is not None:
                row.state = state.value
                if state in (
                    RunPhaseState.COMPLETED,
                    RunPhaseState.FAILED,
                    RunPhaseState.SKIPPED,
                ):
                    row.ended_at = _now_iso()
    async def _set_run_state(self, run_id: UUID, state: RunState) -> None:
        async with self._db.session() as s:
            row = await s.get(RunRow, str(run_id))
            if row is not None:
                row.state = state.value
                row.updated_at = _now_iso()
                if state in (RunState.COMPLETED, RunState.FAILED, RunState.ABORTED):
                    row.ended_at = _now_iso()
    async def _append_event(
        self,
        run_id: UUID,
        phase_id: UUID | None,
        event_type: RunEventType,
        payload: dict[str, Any],
    ) -> None:
        idem_extra = {
            k: str(v)
            for k, v in payload.items()
            if k in ("phase_key", "attempt", "request_id", "action", "code")
        }
        idem = run_idempotency_key(event_type, run_id, **idem_extra)
        async with self._db.session() as s:
            existing_count = len(
                (
                    await s.execute(select(RunEventRow).where(RunEventRow.run_id == str(run_id)))
                ).all()
            )
            s.add(
                RunEventRow(
                    run_id=str(run_id),
                    phase_id=str(phase_id) if phase_id is not None else None,
                    seq=existing_count + 1,
                    type=event_type.value,
                    payload=payload,
                    idempotency_key=idem,
                    ts=_now_iso(),
                )
            )
            try:
                await s.flush()
            except Exception:
                await s.rollback()
    async def _persist_artifact(
        self,
        run_id: UUID,
        phase_id: UUID,
        path: Path,
        schema_id: str,
        *,
        valid: bool,
        errors: list[Any] | None = None,
    ) -> None:
        try:
            content = path.read_bytes()
        except OSError:
            return
        artifact_hash = sha256({"bytes_len": len(content), "hex_prefix": content[:64].hex()})
        async with self._db.session() as s:
            s.add(
                ArtifactRow(
                    id=str(uuid4()),
                    run_id=str(run_id),
                    phase_id=str(phase_id),
                    path=str(path),
                    schema_id=schema_id,
                    hash=artifact_hash,
                    valid=valid,
                    validation_error=(
                        [{"path": f.path, "message": f.message} for f in errors] if errors else None
                    ),
                    created_at=_now_iso(),
                )
            )
            try:
                await s.flush()
            except Exception:
                await s.rollback()
 # ------------------------------------------------------------------
 # Module-level helpers
 # ------------------------------------------------------------------
 def _now_iso() -> str:
    return datetime.now(UTC).isoformat(timespec="seconds")
 def _render_report_md(report: dict[str, Any]) -> str:
    lines: list[str] = [
        f"# Run {report['runId']}",
        f"**Status**: {report['status']}",
        f"**Template hash**: `{report['templateHash']}`",
        f"**Ended at**: {report['endedAt']}",
        "",
        "## Phases",
    ]
    for p in report["phases"]:
        lines.append(f"- **{p['key']}** — state={p['state']}, attempts={p['attempts']}")
    lines.append("\n## Artifacts")
    for a in report["artifacts"]:
        lines.append(f"- `{a['path']}` (schema={a['schema']}, hash={a['hash'][:16]}...)")
    if report.get("error"):
        lines += ["", "## Error", str(report["error"])]
    return "\n".join(lines) + "\n"
--- a/my-deepagent/src/my_deepagent/governance.py
+++ b/my-deepagent/src/my_deepagent/governance.py
@@ -0,0 +1,41 @@
 """Governance consent for sending user code to external LLM providers."""
 from __future__ import annotations
 import json
 import os
 from datetime import UTC, datetime
 from pathlib import Path
 from .errors import MyDeepAgentError
 def consent_path(data_dir: Path) -> Path:
    return data_dir / "governance-accepted.json"
 def has_consent(data_dir: Path) -> bool:
    return consent_path(data_dir).is_file()
 def record_consent(data_dir: Path) -> None:
    data_dir.mkdir(parents=True, exist_ok=True)
    target = consent_path(data_dir)
    payload = {"accepted_at": datetime.now(UTC).isoformat(timespec="seconds")}
    tmp = target.with_suffix(target.suffix + ".tmp")
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, json.dumps(payload, indent=2).encode("utf-8"))
        os.fsync(fd)
    finally:
        os.close(fd)
    os.replace(tmp, target)
 def require_consent(data_dir: Path) -> None:
    if not has_consent(data_dir):
        raise MyDeepAgentError.human_required(
            "governance_not_accepted",
            message="governance consent not recorded",
            recovery_hint="run `mydeepagent init` and accept the data-governance prompt",
        )
--- a/my-deepagent/src/my_deepagent/i18n/init.py
+++ b/my-deepagent/src/my_deepagent/i18n/init.py
@@ -0,0 +1,45 @@
 """Lightweight i18n catalog loader. Two languages (ko, en). Default ko per CTO decision."""
 from __future__ import annotations
 import os
 import tomllib
 from functools import lru_cache
 from pathlib import Path
 from typing import Literal
 Lang = Literal["ko", "en"]
 _CATALOG_DIR = Path(__file__).parent
@lru_cache(maxsize=4)
 def _load(lang: Lang) -> dict[str, dict[str, str]]:
    path = _CATALOG_DIR / f"{lang}.toml"
    if not path.is_file():
        return {}
    with path.open("rb") as f:
        data = tomllib.load(f)
    return {section: dict(entries) for section, entries in data.items()}
 def resolve_lang(default: Lang = "ko") -> Lang:
    env = os.environ.get("MYDEEPAGENT_LANG")
    if env in ("ko", "en"):
        return env  # type: ignore[return-value]
    return default
 def t(key: str, lang: Lang | None = None, **fmt: object) -> str:
    """Translate a key like 'section.key'. Falls back to the key itself if missing."""
    actual_lang = lang or resolve_lang()
    section_name, _, leaf = key.partition(".")
    catalog = _load(actual_lang)
    section = catalog.get(section_name, {})
    template = section.get(leaf, key)
    if fmt:
        try:
            return template.format(**fmt)
        except (KeyError, IndexError):
            return template
    return template
--- a/my-deepagent/src/my_deepagent/i18n/en.toml
+++ b/my-deepagent/src/my_deepagent/i18n/en.toml
@@ -0,0 +1,34 @@
 [init]
 welcome = "Welcome — my-deepagent first-time setup"
 governance_title = "Consent to send code to external LLM providers"
 governance_body = "This tool sends file contents read via read_file and similar tools to external LLM providers (Anthropic, DeepSeek, etc.) through OpenRouter. Each persona declares its provider_origin, and a separate confirmation is shown on first use."
 governance_prompt = "Type 'yes' to agree (any other answer cancels): "
 governance_declined = "Cannot proceed without consent. Exiting."
 api_key_prompt = "OpenRouter API key (input is hidden)"
 api_key_empty = "API key was empty — nothing saved."
 api_key_saved = "Saved to OS keyring."
 doctor_running = "Running environment diagnostics..."
 done = "Setup complete. Start with `mydeepagent run <workflow.yaml>` or `mydeepagent`."
 [login]
 prompt = "Enter {provider} API key (hidden): "
 saved = "{provider} key saved to OS keyring."
 empty = "Empty input. Nothing saved."
 [logout]
 removed = "{provider} key removed from keyring."
 not_found = "{provider} key not found in keyring (already deleted)."
 [keys]
 header = "Registered API keys:"
 entry = "  {provider:20s}  {masked}"
 none = "  (none. Use `mydeepagent login <provider>` to register one.)"
 [doctor]
 header = "Environment diagnostics:"
 ok = "  ok   {name}"
 warn = "  warn {name}  ({detail})"
 fail = "  FAIL {name}  ({detail})"
 [errors]
 no_governance = "Governance consent is missing. Run `mydeepagent init` first."
--- a/my-deepagent/src/my_deepagent/i18n/ko.toml
+++ b/my-deepagent/src/my_deepagent/i18n/ko.toml
@@ -0,0 +1,34 @@
 [init]
 welcome = "환영합니다 — my-deepagent 첫 셋업"
 governance_title = "외부 LLM provider로 코드 전송 동의"
 governance_body = "이 도구는 read_file 등으로 읽은 파일 내용을 OpenRouter를 통해 외부 LLM provider(Anthropic, DeepSeek 등)로 전송합니다. 페르소나마다 provider_origin이 명시되며 첫 사용 시 별도 확인이 다시 한 번 표시됩니다."
 governance_prompt = "동의하시면 'yes' 입력 (그 외 모든 답은 취소): "
 governance_declined = "동의 없이는 사용할 수 없습니다. 종료합니다."
 api_key_prompt = "OpenRouter API key (입력은 가려집니다)"
 api_key_empty = "API key가 비어있어 저장하지 않았습니다."
 api_key_saved = "OS keyring에 저장되었습니다."
 doctor_running = "환경 진단 실행 중..."
 done = "셋업 완료. `mydeepagent run <workflow.yaml>` 또는 `mydeepagent` 로 시작하세요."
 [login]
 prompt = "{provider} API key 입력 (가려짐): "
 saved = "{provider} key가 OS keyring에 저장되었습니다."
 empty = "빈 입력입니다. 저장하지 않았습니다."
 [logout]
 removed = "{provider} key가 keyring에서 삭제되었습니다."
 not_found = "{provider} key가 keyring에 없습니다 (이미 삭제됨)."
 [keys]
 header = "등록된 API key:"
 entry = "  {provider:20s}  {masked}"
 none = "  (없음. `mydeepagent login <provider>` 로 등록하세요.)"
 [doctor]
 header = "환경 진단:"
 ok = "  ok   {name}"
 warn = "  warn {name}  ({detail})"
 fail = "  FAIL {name}  ({detail})"
 [errors]
 no_governance = "거버넌스 동의가 없습니다. `mydeepagent init` 를 먼저 실행하세요."
--- a/my-deepagent/src/my_deepagent/keys.py
+++ b/my-deepagent/src/my_deepagent/keys.py
@@ -0,0 +1,48 @@
 """OS keyring wrapper for storing provider API keys. Service name: 'my-deepagent'."""
 from __future__ import annotations
 from typing import Final
 import keyring as keyring
 _SERVICE: Final[str] = "my-deepagent"
 def _make_username(provider: str) -> str:
    return f"{provider}_api_key"
 def get_api_key(provider: str) -> str | None:
    """Return the stored key for ``provider``, or None if absent."""
    return keyring.get_password(_SERVICE, _make_username(provider))
 def set_api_key(provider: str, value: str) -> None:
    """Persist ``value`` in the OS keyring under provider's slot."""
    keyring.set_password(_SERVICE, _make_username(provider), value)
 def delete_api_key(provider: str) -> bool:
    """Remove the stored key. Returns True if a key existed and was removed."""
    if keyring.get_password(_SERVICE, _make_username(provider)) is None:
        return False
    keyring.delete_password(_SERVICE, _make_username(provider))
    return True
 def list_providers() -> list[str]:
    """Return the providers we recognise (we don't enumerate keyring contents).
    Callers iterate this list and call get_api_key for each to detect presence.
    """
    return ["openrouter", "anthropic", "openai", "google", "langsmith"]
 def mask(value: str | None) -> str:
    """Mask an API key for display: 'sk-or-v1-...c2e7' or '(not set)' if None."""
    if not value:
        return "(not set)"
    if len(value) <= 8:
        return "***"
    return f"{value[:8]}...{value[-4:]}"
--- a/my-deepagent/src/my_deepagent/logging.py
+++ b/my-deepagent/src/my_deepagent/logging.py
@@ -0,0 +1,88 @@
 """structlog configuration with built-in secret scrubbing.
 Scrubs known API key patterns and bearer tokens from all log output (both rich
 pretty-printed and JSON). Apply ``configure_logging(config)`` once at process
 start (called from CLI entry points).
 """
 from __future__ import annotations
 import logging
 import re
 import sys
 from typing import Any
 import structlog
 # Secret patterns. Order matters: more specific first.
 _SECRET_PATTERNS: tuple[re.Pattern[str], ...] = tuple(
    re.compile(p)
    for p in (
        r"sk-or-[A-Za-z0-9_-]{20,}",  # OpenRouter
        r"sk-ant-[A-Za-z0-9_-]{20,}",  # Anthropic
        r"sk-proj-[A-Za-z0-9_-]{20,}",  # OpenAI project keys
        r"sk-[A-Za-z0-9_-]{30,}",  # OpenAI (general)
        r"lsv2_pt_[A-Za-z0-9_-]{20,}",  # LangSmith personal token
        r"lsv2_[A-Za-z0-9_-]{30,}",  # LangSmith (other)
        r"Bearer\s+[A-Za-z0-9._-]{20,}",  # generic bearer
        r"ghp_[A-Za-z0-9]{30,}",  # GitHub PAT
        r"glpat-[A-Za-z0-9-]{20,}",  # GitLab PAT
    )
 )
 _REDACTED = "[REDACTED]"
 def scrub(text: str) -> str:
    """Replace secrets in ``text`` with ``[REDACTED]``."""
    for pat in _SECRET_PATTERNS:
        text = pat.sub(_REDACTED, text)
    return text
 def scrub_value(value: Any) -> Any:
    """Recursively scrub strings inside dicts/lists/tuples/sets. Non-strings pass through."""
    if isinstance(value, str):
        return scrub(value)
    if isinstance(value, dict):
        return {k: scrub_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub_value(v) for v in value]
    if isinstance(value, tuple):
        return tuple(scrub_value(v) for v in value)
    if isinstance(value, set):
        return {scrub_value(v) for v in value}
    return value
 def _scrub_processor(_logger: Any, _method: str, event_dict: dict[str, Any]) -> dict[str, Any]:
    """structlog processor: scrub every value in the event dict."""
    return {k: scrub_value(v) for k, v in event_dict.items()}
 def configure_logging(level: str = "info", json_output: bool = False) -> None:
    """Configure structlog with secret-scrubbing on top of the chosen renderer."""
    log_level = getattr(logging, level.upper(), logging.INFO)
    logging.basicConfig(level=log_level, format="%(message)s", stream=sys.stderr)
    processors: list[Any] = [
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        _scrub_processor,
    ]
    if json_output:
        processors.append(structlog.processors.JSONRenderer())
    else:
        processors.append(structlog.dev.ConsoleRenderer(colors=True))
    structlog.configure(
        processors=processors,
        wrapper_class=structlog.make_filtering_bound_logger(log_level),
        logger_factory=structlog.PrintLoggerFactory(file=sys.stderr),
        cache_logger_on_first_use=True,
    )
 def get_logger(name: str | None = None) -> Any:
    return structlog.get_logger(name) if name else structlog.get_logger()
--- a/my-deepagent/src/my_deepagent/middleware/artifact_watcher.py
+++ b/my-deepagent/src/my_deepagent/middleware/artifact_watcher.py
@@ -0,0 +1,115 @@
 """ArtifactWatcherMiddleware: detect write_file / edit_file calls targeting expected artifact."""
 from __future__ import annotations
 import asyncio
 from collections.abc import Awaitable, Callable
 from pathlib import Path
 from typing import Any
 from langchain.agents.middleware import AgentMiddleware, ToolCallRequest
 from langchain_core.messages import ToolMessage
 # Async callback fired when write_file/edit_file targets the expected path.
 # Args: (absolute_path_str, content_str)
 ArtifactWriteCallback = Callable[[str, str], Awaitable[None]]
 # Tool names that count as "write the artifact"
 _WRITE_TOOL_NAMES: frozenset[str] = frozenset({"write_file", "edit_file"})
 # Candidate argument key names for the file path, in priority order
 _PATH_ARG_KEYS: tuple[str, ...] = ("file_path", "path", "file")
 # Candidate argument key names for the file content
 _CONTENT_ARG_KEYS: tuple[str, ...] = ("content", "text", "new_string")
 class ArtifactWatcherMiddleware(AgentMiddleware[Any, None, Any]):
    """Intercepts write_file / edit_file tool calls and fires a callback when the
    targeted path matches *expected_path* (after resolution to an absolute path).
    The middleware never suppresses or modifies the tool call — it always forwards
    to ``handler``.  The callback runs *after* the tool succeeds; any exception raised
    inside the callback is caught and silently discarded so it cannot break the agent
    loop.
    """
    def __init__(
        self,
        expected_path: Path,
        on_artifact_written: ArtifactWriteCallback,
    ) -> None:
        super().__init__()
        self._expected = expected_path.resolve()
        self._callback = on_artifact_written
        self._notified = asyncio.Event()
        self._content: str | None = None
    # ------------------------------------------------------------------
    # Public helpers
    # ------------------------------------------------------------------
    @property
    def notified(self) -> asyncio.Event:
        """Set once the expected artifact has been written."""
        return self._notified
    @property
    def content(self) -> str | None:
        """Content string passed to the write/edit tool, or None if not yet written."""
        return self._content
    # ------------------------------------------------------------------
    # AgentMiddleware interface
    # ------------------------------------------------------------------
    async def awrap_tool_call(
        self,
        request: ToolCallRequest,
        handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Any]],
    ) -> ToolMessage | Any:
        result = await handler(request)
        tool_call = request.tool_call  # ToolCall TypedDict: {"name": str, "args": dict, "id": ...}
        name: str = tool_call["name"]
        if name in _WRITE_TOOL_NAMES:
            args: dict[str, Any] = dict(tool_call["args"] or {})
            path_str = self._extract_path(args)
            if path_str:
                resolved = self._resolve_path(path_str)
                if resolved == self._expected:
                    content = self._extract_content(args)
                    self._content = content
                    self._notified.set()
                    try:
                        await self._callback(str(resolved), content)
                    except Exception:  # noqa: S110
                        pass  # callback must not break agent loop
        return result
    # ------------------------------------------------------------------
    # Private helpers
    # ------------------------------------------------------------------
    def _resolve_path(self, path_str: str) -> Path:
        """Resolve a possibly-relative path to absolute using expected_path's parent as base."""
        p = Path(path_str)
        if p.is_absolute():
            return p.resolve()
        # Relative paths are anchored to the expected artifact's directory
        return (self._expected.parent / p).resolve()
    @staticmethod
    def _extract_path(args: dict[str, Any]) -> str:
        for key in _PATH_ARG_KEYS:
            val = args.get(key)
            if isinstance(val, str) and val:
                return val
        return ""
    @staticmethod
    def _extract_content(args: dict[str, Any]) -> str:
        for key in _CONTENT_ARG_KEYS:
            val = args.get(key)
            if isinstance(val, str):
                return val
        return ""
--- a/my-deepagent/src/my_deepagent/middleware/audit.py
+++ b/my-deepagent/src/my_deepagent/middleware/audit.py
@@ -1,66 +1,70 @@
-"""AuditToolMiddleware: capture every tool call for audit log + DB.
+"""AuditToolMiddleware: capture every tool call to audit.jsonl + tool_calls DB row."""
 Records: name, args, result/error, duration.
 """
 from __future__ import annotations
 import time
 from collections.abc import Awaitable, Callable
 from typing import Any
 from uuid import UUID
 from langchain.agents.middleware import AgentMiddleware
 AuditRecorder = Callable[[dict[str, Any]], Awaitable[None]]
 class AuditToolMiddleware(AgentMiddleware):
-    """Record every tool invocation for the audit log and DB sink (Step 8)."""
+    """Record every tool invocation for the audit log and DB sink.
    Accepts two optional recorders:
      - ``file_recorder``: JSONL file at {state_dir}/audit.jsonl (append-only)
      - ``db_recorder``: tool_calls DB row (optional, wired in Step 12+)
    For backward compatibility, ``recorder`` is accepted as an alias for
    ``file_recorder`` (used by pre-Step-11 unit tests).
    """
    def __init__(
        self,
        run_id: UUID | None = None,
        phase_id: UUID | None = None,
        interactive_session_id: UUID | None = None,
-        recorder: Any | None = None,
+        file_recorder: AuditRecorder | None = None,
        db_recorder: AuditRecorder | None = None,
        # backward-compat alias — maps to file_recorder
        recorder: AuditRecorder | None = None,
    ) -> None:
        super().__init__()
        self.run_id = run_id
        self.phase_id = phase_id
        self.interactive_session_id = interactive_session_id
-        self.recorder = recorder
+        # ``recorder`` is a pre-Step-11 alias for file_recorder
        self.file_recorder: AuditRecorder | None = (
            file_recorder if file_recorder is not None else recorder
        )
        self.db_recorder = db_recorder
    async def awrap_tool_call(self, request: Any, handler: Any) -> Any:
        started = time.perf_counter()
        # ToolCallRequest exposes tool_call dict with 'name' and 'args'
        tool_call = getattr(request, "tool_call", {}) or {}
        name: str = tool_call.get("name", "unknown") if isinstance(tool_call, dict) else "unknown"
        args: dict[str, Any] = (
            tool_call.get("args", {}) if isinstance(tool_call, dict) else {}
        ) or {}
        error: str | None = None
        result: Any = None
        try:
            result = await handler(request)
            return result
        except Exception as e:
-            await self._record(name, args, None, type(e).__name__, started)
+            error = type(e).__name__
            raise
-        await self._record(name, args, result, None, started)
+        finally:
-        return result
+            serializable_result: str | int | float | bool | dict[str, Any] | list[Any] | None
-
+            if isinstance(result, (str, int, float, bool, dict, list)) or result is None:
-    async def _record(
+                serializable_result = result
-        self,
+            else:
-        name: str,
+                serializable_result = str(result)
-        args: dict[str, Any],
+            record: dict[str, Any] = {
        result: Any,
        error: str | None,
        started: float,
    ) -> None:
        if self.recorder is None:
            return
        serializable_result: str | int | float | bool | dict[str, Any] | list[Any] | None
        if isinstance(result, (str, int, float, bool, dict, list)) or result is None:
            serializable_result = result
        else:
            serializable_result = str(result)
        await self.recorder(
            {
                "tool_name": name,
                "args": args,
                "result": serializable_result,
@@ -70,4 +74,13 @@ class AuditToolMiddleware(AgentMiddleware):
                "phase_id": self.phase_id,
                "interactive_session_id": self.interactive_session_id,
            }
-        )
+            if self.file_recorder is not None:
                try:
                    await self.file_recorder(record)
                except Exception:  # noqa: S110 — never let audit failure break the tool
                    pass
            if self.db_recorder is not None:
                try:
                    await self.db_recorder(record)
                except Exception:  # noqa: S110
                    pass
--- a/my-deepagent/src/my_deepagent/middleware/cost.py
+++ b/my-deepagent/src/my_deepagent/middleware/cost.py
@@ -1,4 +1,4 @@
-"""CostMiddleware: capture every LLM call's usage and accumulate cost into the SQLite ledger."""
+"""CostMiddleware: per-LLM-call cost tracking + optional budget enforcement."""
 from __future__ import annotations
@@ -6,15 +6,17 @@ import time
 from typing import Any
 from uuid import UUID
-from langchain.agents.middleware import AgentMiddleware
+from langchain.agents.middleware import AgentMiddleware, ToolCallRequest
 from langchain_core.messages import ToolMessage
 from ..budget import BudgetTracker
 from ..monitoring.pricing import PricingCache
 class CostMiddleware(AgentMiddleware):
-    """Wrap every model call. Compute cost from usage_metadata and persist.
+    """Wrap every model call. Compute cost from usage_metadata and persist via recorder + budget.
-    Step 8 wires the DB writer via the recorder callback.
+    Step 8 wires the BudgetTracker via the budget_tracker parameter.
    """
    def __init__(
@@ -23,18 +25,38 @@ class CostMiddleware(AgentMiddleware):
        model_name: str,
        run_id: UUID | None = None,
        phase_id: UUID | None = None,
        interactive_session_id: UUID | None = None,
        persona_name: str | None = None,
-        recorder: Any | None = None,  # callable(record) -> Awaitable[None] for DB sink (Step 8)
+        recorder: Any | None = None,  # async callable(record) -> Awaitable[None] for DB sink
        budget_tracker: BudgetTracker | None = None,
    ) -> None:
        super().__init__()
        self.pricing = pricing
        self.model_name = model_name
        self.run_id = run_id
        self.phase_id = phase_id
        self.interactive_session_id = interactive_session_id
        self.persona_name = persona_name
        self.recorder = recorder
        self.budget = budget_tracker
    async def awrap_tool_call(
        self,
        request: ToolCallRequest,
        handler: Any,
    ) -> ToolMessage | Any:
        """Pass tool calls through without modification."""
        return await handler(request)
    async def awrap_model_call(self, request: Any, handler: Any) -> Any:
        # Pre-call: ask budget tracker if estimated cost is allowed
        if self.budget is not None:
            estimated = self.pricing.compute_cost(self.model_name, 4000, 1500)
            await self.budget.assert_can_call(
                run_id=self.run_id,
                persona_name=self.persona_name,
                estimated_cost_usd=estimated,
            )
        started = time.perf_counter()
        try:
            response = await handler(request)
@@ -47,9 +69,27 @@ class CostMiddleware(AgentMiddleware):
                error_code=type(e).__name__,
            )
            raise
-        usage = getattr(response, "usage_metadata", None) or {}
+        # Token usage shows up in different places depending on the model integration.
-        in_tokens = int(usage.get("input_tokens", 0) or 0)
+        # langchain-openai usually fills `usage_metadata`, but for streamed responses
-        out_tokens = int(usage.get("output_tokens", 0) or 0)
+        # or some OpenAI-compatible endpoints (OpenRouter forwarding DeepSeek/etc.)
        # the count lands in `response_metadata.token_usage` with OpenAI keys
        # (`prompt_tokens` / `completion_tokens`).
        usage_meta = getattr(response, "usage_metadata", None) or {}
        response_meta = getattr(response, "response_metadata", None) or {}
        token_usage = response_meta.get("token_usage") if isinstance(response_meta, dict) else None
        token_usage = token_usage or {}
        in_tokens = int(
            usage_meta.get("input_tokens")
            or token_usage.get("prompt_tokens")
            or token_usage.get("input_tokens")
            or 0
        )
        out_tokens = int(
            usage_meta.get("output_tokens")
            or token_usage.get("completion_tokens")
            or token_usage.get("output_tokens")
            or 0
        )
        await self._record(
            input_tokens=in_tokens,
            output_tokens=out_tokens,
@@ -57,6 +97,14 @@ class CostMiddleware(AgentMiddleware):
            status="ok",
            error_code=None,
        )
        # Post-call: record actual cost in budget ledger
        if self.budget is not None and (in_tokens or out_tokens):
            actual = self.pricing.compute_cost(self.model_name, in_tokens, out_tokens)
            await self.budget.record(
                run_id=self.run_id,
                persona_name=self.persona_name,
                actual_cost_usd=actual,
            )
        return response
    async def _record(
--- a/my-deepagent/src/my_deepagent/monitoring/cost_estimator.py
+++ b/my-deepagent/src/my_deepagent/monitoring/cost_estimator.py
@@ -0,0 +1,70 @@
 """Estimate per-phase cost using pricing matrix + crude token heuristic.
 For accurate billing, use the actual usage_metadata after the call (see CostMiddleware).
 This module is for the *preview* shown before ``mydeepagent run`` starts.
 """
 from __future__ import annotations
 from dataclasses import dataclass
 from typing import TYPE_CHECKING
 from ..persona import Persona
 from ..workflow import WorkflowPhase, WorkflowTemplate
 from .pricing import PricingCache
 if TYPE_CHECKING:
    from ..binding import Binding
@dataclass(frozen=True)
 class PhaseCostEstimate:
    phase_key: str
    persona_name: str
    model: str
    estimated_input_tokens: int
    estimated_output_tokens: int
    estimated_cost_usd: float
@dataclass(frozen=True)
 class WorkflowCostEstimate:
    phases: list[PhaseCostEstimate]
    total_usd: float
 _DEFAULT_INPUT_TOKENS = 4000  # generous: instructions + context + prior artifacts
 _DEFAULT_OUTPUT_TOKENS = 1500  # bounded by max_tokens; we use persona max_tokens if set
 def estimate_phase(
    phase: WorkflowPhase,
    persona: Persona,
    pricing: PricingCache,
 ) -> PhaseCostEstimate:
    """Estimate the cost of a single phase based on persona model and default token counts."""
    input_tokens = _DEFAULT_INPUT_TOKENS
    output_tokens = int(persona.model_params.get("max_tokens", _DEFAULT_OUTPUT_TOKENS))
    cost = pricing.compute_cost(persona.model, input_tokens, output_tokens)
    return PhaseCostEstimate(
        phase_key=phase.key,
        persona_name=f"{persona.name}@{persona.version}",
        model=persona.model,
        estimated_input_tokens=input_tokens,
        estimated_output_tokens=output_tokens,
        estimated_cost_usd=cost,
    )
 def estimate_workflow(
    template: WorkflowTemplate,
    bindings: dict[str, Binding],
    pricing: PricingCache,
 ) -> WorkflowCostEstimate:
    """Estimate the total cost of all phases in a workflow template."""
    phases: list[PhaseCostEstimate] = []
    for phase in template.phases:
        binding = bindings[phase.role]
        phases.append(estimate_phase(phase, binding.persona, pricing))
    total = sum(p.estimated_cost_usd for p in phases)
    return WorkflowCostEstimate(phases=phases, total_usd=total)
--- a/my-deepagent/src/my_deepagent/recovery.py
+++ b/my-deepagent/src/my_deepagent/recovery.py
@@ -0,0 +1,159 @@
 """Crash recovery: sweep non-terminal runs at startup and mark them as failed.
 This v0.1.0 implementation is conservative — runs that were mid-flight at the previous
 process death are *not* resumed automatically. They are marked ``failed`` with a
 synthesized ``run.failed`` event so the active-run uniqueness slot is freed and the
 user can re-run if desired. Real Temporal-style resume is deferred to v0.2 or beyond.
 """
 from __future__ import annotations
 from dataclasses import dataclass
 from datetime import UTC, datetime
 from uuid import UUID
 from sqlalchemy import func, select
 from sqlalchemy.dialects.sqlite import insert as sqlite_insert
 from sqlalchemy.ext.asyncio import AsyncSession
 from .enums import RunPhaseState, RunState
 from .persistence.db import Database
 from .persistence.models import RunEventRow, RunPhaseRow, RunRow
 from .run_event import RunEventType, run_idempotency_key
 _NON_TERMINAL_RUN_STATES: frozenset[str] = frozenset(
    {
        RunState.CREATED.value,
        RunState.BOUND.value,
        RunState.PLANNING.value,
        RunState.AWAITING_APPROVAL.value,
        RunState.EXECUTING.value,
        RunState.PAUSED.value,
    }
 )
 _NON_TERMINAL_PHASE_STATES: frozenset[str] = frozenset(
    {
        RunPhaseState.PENDING.value,
        RunPhaseState.RUNNING.value,
        RunPhaseState.AWAITING_ARTIFACT.value,
        RunPhaseState.VALIDATING.value,
        RunPhaseState.AWAITING_APPROVAL.value,
    }
 )
 _FAILED_REASON = "process_restart_unrecovered"
@dataclass(frozen=True)
 class SweepReport:
    """Outcome of one recovery sweep."""
    failed_runs: tuple[UUID, ...]
    failed_phases: tuple[UUID, ...]
    @property
    def total(self) -> int:
        return len(self.failed_runs) + len(self.failed_phases)
 async def sweep_orphan_runs(db: Database) -> SweepReport:
    """Mark non-terminal runs/phases as ``failed`` and emit run.failed events.
    Idempotent: rerunning when no orphans exist returns an empty SweepReport.
    Uses the existing ``run_events.idempotency_key`` UNIQUE constraint so duplicate
    sweeps in the same process don't insert duplicate events.
    """
    failed_runs: list[UUID] = []
    failed_phases: list[UUID] = []
    now = _now_iso()
    async with db.session() as s:
        rows = (
            (await s.execute(select(RunRow).where(RunRow.state.in_(_NON_TERMINAL_RUN_STATES))))
            .scalars()
            .all()
        )
        for run in rows:
            run_uuid = UUID(run.id)
            run.state = RunState.FAILED.value
            run.ended_at = now
            run.updated_at = now
            run.final_report_path = None
            failed_runs.append(run_uuid)
            # Append a single synthesized run.failed event (idempotent).
            await _append_event_idempotent(
                s,
                run_id=run.id,
                event_type=RunEventType.RUN_FAILED,
                payload={"reason": _FAILED_REASON},
                extra_for_key={"reason": _FAILED_REASON},
            )
            # Cascade orphan phases.
            phase_rows = (
                (
                    await s.execute(
                        select(RunPhaseRow)
                        .where(RunPhaseRow.run_id == run.id)
                        .where(RunPhaseRow.state.in_(_NON_TERMINAL_PHASE_STATES))
                    )
                )
                .scalars()
                .all()
            )
            for ph in phase_rows:
                ph.state = RunPhaseState.FAILED.value
                ph.ended_at = now
                failed_phases.append(UUID(ph.id))
        await s.commit()
    return SweepReport(
        failed_runs=tuple(failed_runs),
        failed_phases=tuple(failed_phases),
    )
 async def _append_event_idempotent(
    s: AsyncSession,
    *,
    run_id: str,
    event_type: RunEventType,
    payload: dict[str, object],
    extra_for_key: dict[str, object] | None = None,
 ) -> None:
    """Append a run_events row using ON CONFLICT DO NOTHING on idempotency_key."""
    extra = {k: str(v) for k, v in (extra_for_key or {}).items()}
    key = run_idempotency_key(event_type, UUID(run_id), **extra)
    # Compute next seq.
    next_seq = (
        await s.execute(
            select(func.coalesce(func.max(RunEventRow.seq), 0) + 1).where(
                RunEventRow.run_id == run_id
            )
        )
    ).scalar_one()
    stmt = (
        sqlite_insert(RunEventRow)
        .values(
            run_id=run_id,
            phase_id=None,
            seq=int(next_seq),
            type=event_type.value,
            payload=payload,
            idempotency_key=key,
            ts=_now_iso(),
        )
        .on_conflict_do_nothing(index_elements=["run_id", "idempotency_key"])
    )
    await s.execute(stmt)
 def _now_iso() -> str:
    return datetime.now(UTC).isoformat(timespec="seconds")
--- a/my-deepagent/src/my_deepagent/run_event.py
+++ b/my-deepagent/src/my_deepagent/run_event.py
@@ -1 +1,39 @@
-"""Run event types for streaming progress. Implemented in Step 4."""
+"""Run event types + idempotency key generation."""
 from __future__ import annotations
 from enum import StrEnum
 from uuid import UUID
 class RunEventType(StrEnum):
    RUN_CREATED = "run.created"
    RUN_STARTED = "run.started"
    RUN_PAUSED = "run.paused"
    RUN_RESUMED = "run.resumed"
    RUN_COMPLETED = "run.completed"
    RUN_FAILED = "run.failed"
    RUN_ABORTED = "run.aborted"
    PHASE_STARTED = "phase.started"
    PHASE_COMPLETED = "phase.completed"
    PHASE_FAILED = "phase.failed"
    PHASE_SKIPPED = "phase.skipped"
    PROMPT_SENT = "prompt.sent"
    PROMPT_REPAIRED = "prompt.repaired"
    ARTIFACT_EXPECTED = "artifact.expected"
    ARTIFACT_VALIDATED = "artifact.validated"
    ARTIFACT_INVALID = "artifact.invalid"
    ARTIFACT_TIMEOUT = "artifact.timeout"
    APPROVAL_REQUESTED = "approval.requested"
    APPROVAL_RESOLVED = "approval.resolved"
 def run_idempotency_key(event_type: RunEventType, run_id: UUID, **extra: object) -> str:
    """Deterministic idempotency key per plan v2.0 §13.1.
    Key format: "<event_type>:<run_id>[:<k>=<v>...]" with extra keys sorted ascending.
    """
    parts: list[str] = [event_type.value, str(run_id)]
    for k in sorted(extra):
        parts.append(f"{k}={extra[k]}")
    return ":".join(parts)
--- a/my-deepagent/src/my_deepagent/secrets.py
+++ b/my-deepagent/src/my_deepagent/secrets.py
@@ -0,0 +1,28 @@
 """Cross-cutting secret resolution. Tries config -> env -> keyring -> error."""
 from __future__ import annotations
 import os
 from .config import Config
 from .errors import MyDeepAgentError
 from .keys import get_api_key
 def resolve_openrouter_api_key(config: Config) -> str:
    """Resolve the OpenRouter API key with priority: config -> env -> keyring -> error."""
    if config.openrouter_api_key:
        return config.openrouter_api_key
    env_key = os.environ.get("MYDEEPAGENT_OPENROUTER_API_KEY") or os.environ.get(
        "OPENROUTER_API_KEY"
    )
    if env_key:
        return env_key
    kr_key = get_api_key("openrouter")
    if kr_key:
        return kr_key
    raise MyDeepAgentError.human_required(
        "backend_auth_failed",
        message="OpenRouter API key is not configured",
        recovery_hint="run `mydeepagent login openrouter` to register one in the OS keyring",
    )
--- a/my-deepagent/src/my_deepagent/session.py
+++ b/my-deepagent/src/my_deepagent/session.py
@@ -11,7 +11,6 @@ Connects:
 from __future__ import annotations
 import os
 from pathlib import Path
 from typing import Any, Literal
 from uuid import UUID
@@ -28,6 +27,7 @@ from langchain_openai import ChatOpenAI
 from .config import Config
 from .errors import MyDeepAgentError
 from .persona import FilesystemPermissionSpec, Persona, PersonaSubagent
 from .secrets import resolve_openrouter_api_key as _resolve_openrouter_api_key_impl
 DEFAULT_DENY_PATHS: tuple[str, ...] = (
    "/.env*",
@@ -125,24 +125,13 @@ def _subagent_to_dict(sub: PersonaSubagent) -> SubAgent:
 def _resolve_openrouter_api_key(config: Config) -> str:
-    """Pull the OpenRouter API key from config -> env -> error.
+    """Pull the OpenRouter API key from config -> env -> keyring -> error.
-    Priority: config.openrouter_api_key -> MYDEEPAGENT_OPENROUTER_API_KEY -> OPENROUTER_API_KEY.
+    Delegates to secrets.resolve_openrouter_api_key for full priority chain.
    Priority: config.openrouter_api_key -> MYDEEPAGENT_OPENROUTER_API_KEY ->
              OPENROUTER_API_KEY -> OS keyring -> error.
    """
-    if config.openrouter_api_key:
+    return _resolve_openrouter_api_key_impl(config)
        return config.openrouter_api_key
    env_key = os.environ.get("MYDEEPAGENT_OPENROUTER_API_KEY") or os.environ.get(
        "OPENROUTER_API_KEY"
    )
    if env_key:
        return env_key
    raise MyDeepAgentError.human_required(
        "backend_auth_failed",
        message="OpenRouter API key is not configured",
        recovery_hint=(
            "set MYDEEPAGENT_OPENROUTER_API_KEY in .env or run `mydeepagent login openrouter`"
        ),
    )
 def resolve_model_instance(
@@ -258,7 +247,19 @@ def build_agent(
        ]
        kwargs["permissions"] = permissions
-    if persona.allowed_tools:
+    # deepagents 0.6.x: passing `tools` as a string list to create_deep_agent() triggers
    # SubAgentMiddleware._get_subagents() → langchain create_agent() → ToolNode, which
    # iterates the LocalShellBackend tools. Some of those tools are raw async functions
    # (not StructuredTool instances), causing:
    #   AttributeError: 'function' object has no attribute 'name'
    # Workaround: skip `tools` kwarg for local_shell backend. deepagents exposes all
    # backend-default tools (read_file, write_file, glob, grep, ls, execute, write_todos)
    # to the LLM by default; SafetyShellMiddleware enforces path safety and blocks
    # destructive-command execution regardless of which tools the LLM attempts to call.
    # For non-local_shell backends (state, filesystem, composite), `tools` is passed
    # through normally since those backends return proper StructuredTool objects.
    use_tools_kwarg = persona.deepagents_backend != "local_shell"
    if use_tools_kwarg and persona.allowed_tools:
        kwargs["tools"] = list(persona.allowed_tools)
    if subagents:
        kwargs["subagents"] = subagents
--- a/my-deepagent/src/my_deepagent/slash.py
+++ b/my-deepagent/src/my_deepagent/slash.py
@@ -1 +1,61 @@
-"""Slash command registry and dispatcher. Implemented in Step 10."""
+"""Parse and dispatch slash commands inside the interactive REPL.
 Slash commands are recognized by a leading '/'; everything else is forwarded to the agent.
 """
 from __future__ import annotations
 from collections.abc import Awaitable, Callable
 from dataclasses import dataclass
@dataclass(frozen=True)
 class SlashParsed:
    """A parsed slash command. ``raw`` is the original token after the slash."""
    name: str
    args: tuple[str, ...]
    raw: str
 def parse_slash(line: str) -> SlashParsed | None:
    """Return a SlashParsed if ``line`` starts with '/', else None."""
    if not line.startswith("/"):
        return None
    body = line[1:].strip()
    if not body:
        return SlashParsed(name="", args=(), raw="")
    parts = body.split()
    return SlashParsed(name=parts[0].lower(), args=tuple(parts[1:]), raw=body)
 SlashHandler = Callable[[SlashParsed], Awaitable[bool]]
 """A handler returns False to keep the REPL alive, True to exit it."""
 class SlashRegistry:
    """Map slash command names to async handlers."""
    def __init__(self) -> None:
        self._handlers: dict[str, SlashHandler] = {}
        self._help: dict[str, str] = {}
    def register(self, name: str, handler: SlashHandler, *, help: str = "") -> None:
        self._handlers[name.lower()] = handler
        if help:
            self._help[name.lower()] = help
    async def dispatch(self, cmd: SlashParsed) -> bool:
        if cmd.name in self._handlers:
            return await self._handlers[cmd.name](cmd)
        return False  # unknown → caller decides
    @property
    def names(self) -> list[str]:
        return sorted(self._handlers)
    def help_for(self, name: str) -> str:
        return self._help.get(name.lower(), "")
    def all_help(self) -> list[tuple[str, str]]:
        return [(n, self._help.get(n, "")) for n in self.names]
--- a/my-deepagent/src/my_deepagent/tui/approval.py
+++ b/my-deepagent/src/my_deepagent/tui/approval.py
@@ -1 +1,53 @@
-"""TUI approval dialog for human-in-the-loop actions. Implemented in Step 7."""
+"""TUI approval prompt: display phase result and ask for approve/reject/request_changes/abort."""
 from __future__ import annotations
 import typer
 from rich.console import Console
 from ..enums import ApprovalDecisionAction
 _CONSOLE = Console()
 _CHOICE_MAP: dict[str, ApprovalDecisionAction] = {
    "approve": ApprovalDecisionAction.APPROVE,
    "a": ApprovalDecisionAction.APPROVE,
    "reject": ApprovalDecisionAction.REJECT,
    "r": ApprovalDecisionAction.REJECT,
    "request_changes": ApprovalDecisionAction.REQUEST_CHANGES,
    "c": ApprovalDecisionAction.REQUEST_CHANGES,
    "abort": ApprovalDecisionAction.ABORT,
    "x": ApprovalDecisionAction.ABORT,
 }
 async def cli_approval_callback(
    payload: dict[str, object],
    gates: list[str],
 ) -> ApprovalDecisionAction:
    """Display the phase result and prompt the user for an approval decision.
    Valid inputs (case-insensitive):
      approve / a   → APPROVE
      reject  / r   → REJECT
      request_changes / c → REQUEST_CHANGES
      abort   / x   → ABORT
    Any unrecognised input defaults to REJECT.
    """
    _CONSOLE.print()
    _CONSOLE.print(f"[bold cyan]Approval required[/] — gates: {', '.join(gates) or '(none)'}")
    _CONSOLE.print(f"  phase: {payload.get('phase_key')}")
    _CONSOLE.print(f"  artifact: {payload.get('artifact_path')}")
    _CONSOLE.print()
    raw = (
        typer.prompt(
            "Decision [approve / reject / request_changes / abort]",
            default="approve",
        )
        .strip()
        .lower()
    )
    return _CHOICE_MAP.get(raw, ApprovalDecisionAction.REJECT)
--- a/my-deepagent/tests/integration/test_artifact_watcher.py
+++ b/my-deepagent/tests/integration/test_artifact_watcher.py
@@ -0,0 +1,140 @@
 """Tests for ArtifactWatcherMiddleware: write_file / edit_file detection."""
 from __future__ import annotations
 from pathlib import Path
 from typing import Any
 from unittest.mock import AsyncMock, MagicMock
 import pytest
 from my_deepagent.middleware.artifact_watcher import ArtifactWatcherMiddleware
 def _make_request(tool_name: str, args: dict[str, Any]) -> MagicMock:
    """Create a minimal ToolCallRequest-like mock."""
    request = MagicMock()
    request.tool_call = {"name": tool_name, "args": args, "id": "test-id"}
    return request
@pytest.mark.asyncio
 async def test_write_file_matching_path_triggers_callback(tmp_path: Path) -> None:
    """write_file targeting expected_path fires the callback and sets notified event."""
    expected = tmp_path / "artifact.json"
    received: list[tuple[str, str]] = []
    async def _cb(path: str, content: str) -> None:
        received.append((path, content))
    watcher = ArtifactWatcherMiddleware(expected, _cb)
    handler = AsyncMock(return_value=MagicMock())
    request = _make_request("write_file", {"file_path": str(expected), "content": '{"ok": true}'})
    await watcher.awrap_tool_call(request, handler)
    assert watcher.notified.is_set()
    assert len(received) == 1
    assert received[0][0] == str(expected)
    assert received[0][1] == '{"ok": true}'
    assert watcher.content == '{"ok": true}'
@pytest.mark.asyncio
 async def test_edit_file_matching_path_triggers_callback(tmp_path: Path) -> None:
    """edit_file targeting expected_path also fires the callback."""
    expected = tmp_path / "spec.json"
    received: list[str] = []
    async def _cb(path: str, _content: str) -> None:
        received.append(path)
    watcher = ArtifactWatcherMiddleware(expected, _cb)
    handler = AsyncMock(return_value=MagicMock())
    request = _make_request("edit_file", {"file_path": str(expected), "new_string": "hello"})
    await watcher.awrap_tool_call(request, handler)
    assert watcher.notified.is_set()
    assert len(received) == 1
@pytest.mark.asyncio
 async def test_write_file_different_path_does_not_trigger(tmp_path: Path) -> None:
    """write_file targeting a different path does NOT fire the callback."""
    expected = tmp_path / "artifact.json"
    other = tmp_path / "other.json"
    received: list[str] = []
    async def _cb(path: str, _content: str) -> None:
        received.append(path)
    watcher = ArtifactWatcherMiddleware(expected, _cb)
    handler = AsyncMock(return_value=MagicMock())
    request = _make_request("write_file", {"file_path": str(other), "content": "data"})
    await watcher.awrap_tool_call(request, handler)
    assert not watcher.notified.is_set()
    assert len(received) == 0
@pytest.mark.asyncio
 async def test_read_file_never_triggers_callback(tmp_path: Path) -> None:
    """read_file does NOT fire the callback even if the path matches."""
    expected = tmp_path / "artifact.json"
    received: list[str] = []
    async def _cb(path: str, _content: str) -> None:
        received.append(path)
    watcher = ArtifactWatcherMiddleware(expected, _cb)
    handler = AsyncMock(return_value=MagicMock())
    request = _make_request("read_file", {"file_path": str(expected)})
    await watcher.awrap_tool_call(request, handler)
    assert not watcher.notified.is_set()
    assert len(received) == 0
@pytest.mark.asyncio
 async def test_relative_path_normalised_to_expected(tmp_path: Path) -> None:
    """A relative path in the tool args is resolved relative to expected_path.parent."""
    expected = tmp_path / "artifacts" / "spec.json"
    expected.parent.mkdir(parents=True, exist_ok=True)
    received: list[str] = []
    async def _cb(path: str, _content: str) -> None:
        received.append(path)
    watcher = ArtifactWatcherMiddleware(expected, _cb)
    handler = AsyncMock(return_value=MagicMock())
    # Relative to expected.parent → artifacts/spec.json resolves to expected
    request = _make_request("write_file", {"file_path": "spec.json", "content": "{}"})
    await watcher.awrap_tool_call(request, handler)
    assert watcher.notified.is_set()
    assert len(received) == 1
@pytest.mark.asyncio
 async def test_callback_exception_does_not_break_result(tmp_path: Path) -> None:
    """An exception raised inside the callback is swallowed; the tool result is still returned."""
    expected = tmp_path / "artifact.json"
    sentinel = MagicMock()
    async def _bad_cb(_path: str, _content: str) -> None:
        raise RuntimeError("oops")
    watcher = ArtifactWatcherMiddleware(expected, _bad_cb)
    handler = AsyncMock(return_value=sentinel)
    request = _make_request("write_file", {"file_path": str(expected), "content": "{}"})
    result = await watcher.awrap_tool_call(request, handler)
    # Callback exception was swallowed; the tool result is still returned
    assert result is sentinel
    # notified is still set even if callback raises
    assert watcher.notified.is_set()
--- a/my-deepagent/tests/integration/test_audit_middleware_integration.py
+++ b/my-deepagent/tests/integration/test_audit_middleware_integration.py
@@ -0,0 +1,82 @@
 """Integration tests: AuditToolMiddleware + make_audit_recorder → audit.jsonl."""
 from __future__ import annotations
 from pathlib import Path
 from typing import Any
 from unittest.mock import AsyncMock, MagicMock
 import pytest
 from my_deepagent.audit import make_audit_recorder, read_audit_records
 from my_deepagent.middleware.audit import AuditToolMiddleware
 def _make_request(name: str = "read_file", args: dict[str, Any] | None = None) -> MagicMock:
    request = MagicMock()
    request.tool_call = {"name": name, "args": args or {"path": "x.py"}}
    return request
 # ---------------------------------------------------------------------------
 # Success path: record is written to audit.jsonl
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_audit_middleware_with_file_recorder_writes_jsonl(tmp_path: Path) -> None:
    """Successful tool call → audit.jsonl gets one record with expected fields."""
    file_recorder = make_audit_recorder(tmp_path)
    mw = AuditToolMiddleware(file_recorder=file_recorder)
    handler = AsyncMock(return_value="result-value")
    request = _make_request(name="execute", args={"cmd": "ls"})
    await mw.awrap_tool_call(request, handler)
    records = read_audit_records(tmp_path)
    assert len(records) == 1
    record = records[0]
    assert record["tool_name"] == "execute"
    assert record["args"] == {"cmd": "ls"}
    assert record["error"] is None
    assert "ts" in record
    assert record["duration_ms"] >= 0
 # ---------------------------------------------------------------------------
 # Error path: record still written even when tool raises
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_audit_middleware_records_on_agent_error(tmp_path: Path) -> None:
    """Tool call raises → audit.jsonl still gets a record with error field set."""
    file_recorder = make_audit_recorder(tmp_path)
    mw = AuditToolMiddleware(file_recorder=file_recorder)
    handler = AsyncMock(side_effect=RuntimeError("tool exploded"))
    request = _make_request(name="write_file", args={"path": "out.txt", "content": "x"})
    with pytest.raises(RuntimeError, match="tool exploded"):
        await mw.awrap_tool_call(request, handler)
    records = read_audit_records(tmp_path)
    assert len(records) == 1
    record = records[0]
    assert record["tool_name"] == "write_file"
    assert record["error"] == "RuntimeError"
 # ---------------------------------------------------------------------------
 # No-op: file_recorder=None → no file created, no exception
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_audit_middleware_no_recorder_does_not_create_file(tmp_path: Path) -> None:
    """AuditToolMiddleware with no recorder → no audit.jsonl created, no exception."""
    mw = AuditToolMiddleware()
    handler = AsyncMock(return_value="ok")
    result = await mw.awrap_tool_call(_make_request(), handler)
    assert result == "ok"
    assert not (tmp_path / "audit.jsonl").exists()
--- a/my-deepagent/tests/integration/test_budget.py
+++ b/my-deepagent/tests/integration/test_budget.py
@@ -0,0 +1,267 @@
 """Integration tests for src/my_deepagent/budget.py (BudgetTracker)."""
 from __future__ import annotations
 from uuid import UUID, uuid4
 import pytest
 import pytest_asyncio
 from my_deepagent.budget import BudgetOnHit, BudgetTracker
 from my_deepagent.errors import BudgetExhaustedError
 from my_deepagent.persistence.db import Database
 # ---------------------------------------------------------------------------
 # Fixtures
 # ---------------------------------------------------------------------------
 _RUN_ID = UUID("00000000-0000-0000-0000-000000000001")
@pytest_asyncio.fixture
 async def db(tmp_path: object) -> Database:
    import tempfile
    from pathlib import Path
    p = Path(tempfile.mkdtemp()) / "test_budget.sqlite3"
    database = Database(f"sqlite+aiosqlite:///{p}")
    await database.init_schema()
    return database
 def _make_tracker(
    db: Database,
    daily_cap: float = 5.0,
    run_cap: float = 1.0,
    on_hit: BudgetOnHit = BudgetOnHit.BLOCK,
    prompt_callback: object = None,
 ) -> BudgetTracker:
    return BudgetTracker(
        db=db,
        daily_cap_usd=daily_cap,
        run_cap_usd=run_cap,
        daily_warn_usd=3.0,
        run_warn_usd=0.5,
        on_hit=on_hit,
        prompt_callback=prompt_callback,  # type: ignore[arg-type]
    )
 # ---------------------------------------------------------------------------
 # init()
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_init_creates_day_scope_row(db: Database) -> None:
    tracker = _make_tracker(db)
    await tracker.init()
    spent = await tracker.get_spent(f"day:{_today()}")
    assert spent == 0.0
@pytest.mark.asyncio
 async def test_init_is_idempotent(db: Database) -> None:
    tracker = _make_tracker(db)
    await tracker.init()
    await tracker.init()  # second call should not error or double-insert
    spent = await tracker.get_spent(f"day:{_today()}")
    assert spent == 0.0
 # ---------------------------------------------------------------------------
 # assert_can_call — under cap
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_assert_can_call_under_cap_returns_ok(db: Database) -> None:
    tracker = _make_tracker(db, daily_cap=5.0, run_cap=1.0)
    result = await tracker.assert_can_call(
        run_id=_RUN_ID,
        persona_name="researcher",
        estimated_cost_usd=0.5,
    )
    assert result.ok is True
    assert result.blocked_scope is None
 # ---------------------------------------------------------------------------
 # assert_can_call — over run cap (on_hit=block)
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_assert_can_call_over_run_cap_raises(db: Database) -> None:
    tracker = _make_tracker(db, run_cap=0.01, on_hit=BudgetOnHit.BLOCK)
    with pytest.raises(BudgetExhaustedError) as exc_info:
        await tracker.assert_can_call(
            run_id=_RUN_ID,
            persona_name=None,
            estimated_cost_usd=1.0,
        )
    err = exc_info.value
    assert err.scope.startswith("run:")
    assert err.projected_usd > 0.01
 # ---------------------------------------------------------------------------
 # assert_can_call — over day cap (on_hit=block)
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_assert_can_call_over_day_cap_raises(db: Database) -> None:
    tracker = _make_tracker(db, daily_cap=0.001, run_cap=999.0, on_hit=BudgetOnHit.BLOCK)
    with pytest.raises(BudgetExhaustedError) as exc_info:
        await tracker.assert_can_call(
            run_id=_RUN_ID,
            persona_name=None,
            estimated_cost_usd=1.0,
        )
    err = exc_info.value
    assert err.scope.startswith("day:")
    assert err.cap_usd == pytest.approx(0.001)
 # ---------------------------------------------------------------------------
 # record() — accumulates spend
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_record_accumulates_spend(db: Database) -> None:
    tracker = _make_tracker(db)
    run_id = uuid4()
    await tracker.record(run_id=run_id, persona_name=None, actual_cost_usd=0.10)
    await tracker.record(run_id=run_id, persona_name=None, actual_cost_usd=0.05)
    day_spent = await tracker.get_spent(f"day:{_today()}")
    run_spent = await tracker.get_spent(f"run:{run_id}")
    assert day_spent == pytest.approx(0.15)
    assert run_spent == pytest.approx(0.15)
@pytest.mark.asyncio
 async def test_record_zero_is_noop(db: Database) -> None:
    tracker = _make_tracker(db)
    run_id = uuid4()
    await tracker.record(run_id=run_id, persona_name=None, actual_cost_usd=0.0)
    run_spent = await tracker.get_spent(f"run:{run_id}")
    assert run_spent == 0.0
 # ---------------------------------------------------------------------------
 # on_hit=warn_continue
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_warn_continue_over_cap_returns_ok_no_raise(db: Database) -> None:
    tracker = _make_tracker(db, run_cap=0.001, on_hit=BudgetOnHit.WARN_CONTINUE)
    result = await tracker.assert_can_call(
        run_id=_RUN_ID,
        persona_name=None,
        estimated_cost_usd=1.0,
    )
    # WARN_CONTINUE: blocked=False, no raise
    assert result.ok is True
 # ---------------------------------------------------------------------------
 # on_hit=prompt
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_prompt_callback_returns_true_proceeds(db: Database) -> None:
    async def _allow(scope: str, projected: float, cap: float) -> bool:
        return True
    tracker = _make_tracker(db, run_cap=0.001, on_hit=BudgetOnHit.PROMPT, prompt_callback=_allow)
    result = await tracker.assert_can_call(
        run_id=_RUN_ID,
        persona_name=None,
        estimated_cost_usd=1.0,
    )
    assert result.ok is True
@pytest.mark.asyncio
 async def test_prompt_callback_returns_false_raises(db: Database) -> None:
    async def _deny(scope: str, projected: float, cap: float) -> bool:
        return False
    tracker = _make_tracker(db, run_cap=0.001, on_hit=BudgetOnHit.PROMPT, prompt_callback=_deny)
    with pytest.raises(BudgetExhaustedError):
        await tracker.assert_can_call(
            run_id=_RUN_ID,
            persona_name=None,
            estimated_cost_usd=1.0,
        )
@pytest.mark.asyncio
 async def test_prompt_callback_none_raises_like_block(db: Database) -> None:
    tracker = _make_tracker(db, run_cap=0.001, on_hit=BudgetOnHit.PROMPT, prompt_callback=None)
    with pytest.raises(BudgetExhaustedError):
        await tracker.assert_can_call(
            run_id=_RUN_ID,
            persona_name=None,
            estimated_cost_usd=1.0,
        )
 # ---------------------------------------------------------------------------
 # persona scope
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_persona_scope_accumulates_separately(db: Database) -> None:
    tracker = _make_tracker(db)
    await tracker.record(run_id=None, persona_name="researcher", actual_cost_usd=0.20)
    persona_spent = await tracker.get_spent(f"persona:researcher:day:{_today()}")
    day_spent = await tracker.get_spent(f"day:{_today()}")
    assert persona_spent == pytest.approx(0.20)
    assert day_spent == pytest.approx(0.20)
 # ---------------------------------------------------------------------------
 # get_remaining()
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_get_remaining_with_no_spend(db: Database) -> None:
    tracker = _make_tracker(db, daily_cap=5.0)
    remaining = await tracker.get_remaining(f"day:{_today()}")
    assert remaining == pytest.approx(5.0)
@pytest.mark.asyncio
 async def test_get_remaining_after_spend(db: Database) -> None:
    tracker = _make_tracker(db, daily_cap=5.0)
    await tracker.record(run_id=None, persona_name=None, actual_cost_usd=1.5)
    remaining = await tracker.get_remaining(f"day:{_today()}")
    assert remaining == pytest.approx(3.5)
@pytest.mark.asyncio
 async def test_get_remaining_unknown_scope_returns_none(db: Database) -> None:
    tracker = _make_tracker(db)
    # "unknown:xyz" has no cap in _cap_for_scope
    remaining = await tracker.get_remaining("unknown:xyz")
    assert remaining is None
 # ---------------------------------------------------------------------------
 # helpers
 # ---------------------------------------------------------------------------
 def _today() -> str:
    from datetime import UTC, datetime
    return datetime.now(UTC).strftime("%Y-%m-%d")
--- a/my-deepagent/tests/integration/test_cli_interactive.py
+++ b/my-deepagent/tests/integration/test_cli_interactive.py
@@ -0,0 +1,91 @@
 """Integration tests for the interactive REPL CLI entry point."""
 from __future__ import annotations
 from typing import Any
 import pytest
 from typer.testing import CliRunner
 from my_deepagent.cli.main import app
 runner = CliRunner()
 def test_help_shows_agent_and_model_options() -> None:
    """--help must list --agent and --model options."""
    result = runner.invoke(app, ["--help"])
    assert result.exit_code == 0
    assert "--agent" in result.output
    assert "--model" in result.output
 def test_no_subcommand_governance_not_accepted_exits_nonzero(
    monkeypatch: pytest.MonkeyPatch,
 ) -> None:
    """When governance consent is absent, the REPL must exit with a non-zero code."""
    import my_deepagent.governance as gov_module
    monkeypatch.setattr(gov_module, "has_consent", lambda _: False)
    result = runner.invoke(app, [])
    assert result.exit_code != 0
 def test_quit_exits_repl(monkeypatch: pytest.MonkeyPatch, tmp_path: Any) -> None:
    """REPL launched with mocked PromptSession should exit 0 on /quit."""
    import my_deepagent.governance as gov_module
    import my_deepagent.persona as persona_module
    from my_deepagent.enums import Backend, Capability, RiskLevel
    from my_deepagent.persona import Persona
    # Patch governance to skip consent check
    monkeypatch.setattr(gov_module, "has_consent", lambda _: True)
    # Build a minimal fake persona with all required fields
    fake_persona = Persona(
        name="default-interactive",
        version=1,
        description="test",
        backend=Backend.OPENROUTER,
        model="openrouter:deepseek/deepseek-chat",
        provider_origin="openrouter",
        capabilities=(Capability.CODE_EDIT,),
        max_risk_level=RiskLevel.LOW,
        system_prompt="You are a helpful assistant.",
        model_params={},
        permissions=(),
        subagents=(),
        deepagents_backend="state",
    )
    monkeypatch.setattr(persona_module, "load_personas_from_dir", lambda _: [fake_persona])
    # Patch PromptSession to yield "/quit" then raise EOFError
    prompt_responses = ["/quit"]
    call_count = 0
    async def fake_prompt_async(*args: Any, **kwargs: Any) -> str:
        nonlocal call_count
        if call_count < len(prompt_responses):
            resp = prompt_responses[call_count]
            call_count += 1
            return resp
        raise EOFError
    from prompt_toolkit import PromptSession
    monkeypatch.setattr(PromptSession, "prompt_async", fake_prompt_async)
    # Patch Database to avoid real DB I/O
    from my_deepagent.persistence import db as db_module
    class FakeDB:
        async def init_schema(self) -> None:
            pass
        async def dispose(self) -> None:
            pass
    monkeypatch.setattr(db_module, "Database", lambda url: FakeDB())
    result = runner.invoke(app, [])
    assert result.exit_code == 0
--- a/my-deepagent/tests/integration/test_cli_pricing.py
+++ b/my-deepagent/tests/integration/test_cli_pricing.py
@@ -0,0 +1,154 @@
 """Integration tests for `mydeepagent pricing` CLI command."""
 from __future__ import annotations
 import asyncio
 import tempfile
 from datetime import UTC, datetime
 from unittest.mock import patch
 from typer.testing import CliRunner
 from my_deepagent.cli.main import app
 from my_deepagent.persistence.db import Database
 from my_deepagent.persistence.models import ModelPricingRow
 runner = CliRunner()
 def _now_iso() -> str:
    return datetime.now(UTC).isoformat(timespec="seconds")
 async def _seed_pricing_rows(db: Database, rows: list[dict[str, object]]) -> None:
    from sqlalchemy.dialects.sqlite import insert as sqlite_insert
    async with db.session() as s:
        for r in rows:
            stmt = (
                sqlite_insert(ModelPricingRow)
                .values(**r)
                .on_conflict_do_update(
                    index_elements=["model"],
                    set_={
                        "input_per_1k_usd": r["input_per_1k_usd"],
                        "output_per_1k_usd": r["output_per_1k_usd"],
                        "context_length": r["context_length"],
                        "fetched_at": r["fetched_at"],
                    },
                )
            )
            await s.execute(stmt)
 # ---------------------------------------------------------------------------
 # Test 1: empty DB → "(no pricing data)" message
 # ---------------------------------------------------------------------------
 def test_pricing_empty_db_shows_no_data() -> None:
    with tempfile.TemporaryDirectory() as tmpdir:
        db_url = f"sqlite+aiosqlite:///{tmpdir}/test.sqlite3"
        with patch("my_deepagent.cli.stats.load_config") as mock_cfg:
            cfg = mock_cfg.return_value
            cfg.database_url = db_url
            result = runner.invoke(app, ["pricing"])
    assert result.exit_code == 0, result.output
    assert "no pricing data" in result.output
 # ---------------------------------------------------------------------------
 # Test 2: with rows → table shown
 # ---------------------------------------------------------------------------
 def test_pricing_with_data_shows_table() -> None:
    with tempfile.TemporaryDirectory() as tmpdir:
        db_url = f"sqlite+aiosqlite:///{tmpdir}/test.sqlite3"
        db = Database(db_url)
        rows = [
            {
                "model": "anthropic/claude-haiku-4-5",
                "input_per_1k_usd": 1.0,
                "output_per_1k_usd": 5.0,
                "context_length": 200_000,
                "fetched_at": _now_iso(),
                "raw_payload": "",
            },
            {
                "model": "deepseek/deepseek-chat",
                "input_per_1k_usd": 0.28,
                "output_per_1k_usd": 1.12,
                "context_length": 64_000,
                "fetched_at": _now_iso(),
                "raw_payload": "",
            },
        ]
        async def _init_and_seed() -> None:
            await db.init_schema()
            await _seed_pricing_rows(db, rows)
            await db.dispose()
        asyncio.run(_init_and_seed())
        with patch("my_deepagent.cli.stats.load_config") as mock_cfg:
            cfg = mock_cfg.return_value
            cfg.database_url = db_url
            result = runner.invoke(app, ["pricing"])
    assert result.exit_code == 0, result.output
    assert "anthropic/claude-haiku-4-5" in result.output
    assert "deepseek/deepseek-chat" in result.output
    assert "1.0000" in result.output
    assert "OpenRouter pricing" in result.output
 # ---------------------------------------------------------------------------
 # Test 3: models are sorted alphabetically
 # ---------------------------------------------------------------------------
 def test_pricing_rows_sorted_alphabetically() -> None:
    with tempfile.TemporaryDirectory() as tmpdir:
        db_url = f"sqlite+aiosqlite:///{tmpdir}/test.sqlite3"
        db = Database(db_url)
        rows = [
            {
                "model": "zzz/last-model",
                "input_per_1k_usd": 9.0,
                "output_per_1k_usd": 9.0,
                "context_length": 1000,
                "fetched_at": _now_iso(),
                "raw_payload": "",
            },
            {
                "model": "aaa/first-model",
                "input_per_1k_usd": 1.0,
                "output_per_1k_usd": 1.0,
                "context_length": 2000,
                "fetched_at": _now_iso(),
                "raw_payload": "",
            },
        ]
        async def _init_and_seed() -> None:
            await db.init_schema()
            await _seed_pricing_rows(db, rows)
            await db.dispose()
        asyncio.run(_init_and_seed())
        with patch("my_deepagent.cli.stats.load_config") as mock_cfg:
            cfg = mock_cfg.return_value
            cfg.database_url = db_url
            result = runner.invoke(app, ["pricing"])
    assert result.exit_code == 0, result.output
    pos_first = result.output.find("aaa/first-model")
    pos_last = result.output.find("zzz/last-model")
    assert pos_first != -1
    assert pos_last != -1
    assert pos_first < pos_last, "aaa/first-model should appear before zzz/last-model"
--- a/my-deepagent/tests/integration/test_cli_stats.py
+++ b/my-deepagent/tests/integration/test_cli_stats.py
@@ -0,0 +1,140 @@
 """Integration tests for mydeepagent budget / stats / costs CLI commands."""
 from __future__ import annotations
 import asyncio
 import tempfile
 from unittest.mock import patch
 from typer.testing import CliRunner
 from my_deepagent.cli.main import app
 from my_deepagent.persistence.db import Database
 from my_deepagent.persistence.models import BudgetLedgerRow
 runner = CliRunner()
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
 def _now_iso() -> str:
    from datetime import UTC, datetime
    return datetime.now(UTC).isoformat(timespec="seconds")
 def _today_utc() -> str:
    from datetime import UTC, datetime
    return datetime.now(UTC).strftime("%Y-%m-%d")
 async def _seed_budget_row(db: Database, scope: str, spent: float, cap: float) -> None:
    from sqlalchemy.dialects.sqlite import insert as sqlite_insert
    async with db.session() as s:
        stmt = (
            sqlite_insert(BudgetLedgerRow)
            .values(scope=scope, spent_usd=spent, cap_usd=cap, last_updated=_now_iso())
            .on_conflict_do_update(
                index_elements=["scope"],
                set_={
                    "spent_usd": spent,
                    "cap_usd": cap,
                    "last_updated": _now_iso(),
                },
            )
        )
        await s.execute(stmt)
 # ---------------------------------------------------------------------------
 # budget command — empty DB
 # ---------------------------------------------------------------------------
 def test_budget_empty_db_shows_no_activity() -> None:
    with tempfile.TemporaryDirectory() as tmpdir:
        db_url = f"sqlite+aiosqlite:///{tmpdir}/test.sqlite3"
        with patch("my_deepagent.cli.stats.load_config") as mock_cfg:
            cfg = mock_cfg.return_value
            cfg.database_url = db_url
            result = runner.invoke(app, ["budget"])
    assert result.exit_code == 0, result.output
    assert "no budget activity yet" in result.output
 # ---------------------------------------------------------------------------
 # budget command — with data
 # ---------------------------------------------------------------------------
 def test_budget_with_data_shows_ledger() -> None:
    with tempfile.TemporaryDirectory() as tmpdir:
        db_url = f"sqlite+aiosqlite:///{tmpdir}/test.sqlite3"
        db = Database(db_url)
        asyncio.run(_init_and_seed_budget(db))
        with patch("my_deepagent.cli.stats.load_config") as mock_cfg:
            cfg = mock_cfg.return_value
            cfg.database_url = db_url
            result = runner.invoke(app, ["budget"])
    assert result.exit_code == 0, result.output
    assert f"day:{_today_utc()}" in result.output
    assert "0.5000" in result.output  # spent amount
 async def _init_and_seed_budget(db: Database) -> None:
    await db.init_schema()
    await _seed_budget_row(db, f"day:{_today_utc()}", spent=0.5, cap=5.0)
 # ---------------------------------------------------------------------------
 # stats command — empty DB
 # ---------------------------------------------------------------------------
 def test_stats_empty_db_shows_no_data() -> None:
    with tempfile.TemporaryDirectory() as tmpdir:
        db_url = f"sqlite+aiosqlite:///{tmpdir}/test.sqlite3"
        with patch("my_deepagent.cli.stats.load_config") as mock_cfg:
            cfg = mock_cfg.return_value
            cfg.database_url = db_url
            result = runner.invoke(app, ["stats", "--by", "model"])
    assert result.exit_code == 0, result.output
    assert "no data for the past period" in result.output
 # ---------------------------------------------------------------------------
 # stats --by invalid
 # ---------------------------------------------------------------------------
 def test_stats_invalid_by_exits_two() -> None:
    with tempfile.TemporaryDirectory() as tmpdir:
        db_url = f"sqlite+aiosqlite:///{tmpdir}/test.sqlite3"
        with patch("my_deepagent.cli.stats.load_config") as mock_cfg:
            cfg = mock_cfg.return_value
            cfg.database_url = db_url
            result = runner.invoke(app, ["stats", "--by", "invalid_group"])
    assert result.exit_code == 2, result.output
 # ---------------------------------------------------------------------------
 # costs alias
 # ---------------------------------------------------------------------------
 def test_costs_empty_db_shows_no_data() -> None:
    with tempfile.TemporaryDirectory() as tmpdir:
        db_url = f"sqlite+aiosqlite:///{tmpdir}/test.sqlite3"
        with patch("my_deepagent.cli.stats.load_config") as mock_cfg:
            cfg = mock_cfg.return_value
            cfg.database_url = db_url
            result = runner.invoke(app, ["costs"])
    assert result.exit_code == 0, result.output
    assert "no data for the past period" in result.output
--- a/my-deepagent/tests/integration/test_e2e_workflow.py
+++ b/my-deepagent/tests/integration/test_e2e_workflow.py
@@ -0,0 +1,310 @@
 """End-to-end integration: spec-and-review workflow via real OpenRouter.
 Cost budget: ~$0.05 per run. Skipped if no API key is configured.
 Verifies:
  - Engine creates a RunRow and 3 RunPhaseRow rows
  - Each phase writes a schema-valid artifact via deepagents write_file
  - Final report json + md are written under worktree_root
  - LlmCallRow rows are persisted (CostMiddleware recorder is wired)
  - BudgetLedgerRow rows accumulate spend
  - run.state == COMPLETED
 """
 from __future__ import annotations
 import json
 import os
 import time
 from pathlib import Path
 from typing import Any
 import pytest
 from sqlalchemy import select
 from my_deepagent.artifact_schema import ArtifactSchemaRegistry
 from my_deepagent.binding import (
    BackendAvailability,
    BindingOverride,
    PersonaConsentStore,
 )
 from my_deepagent.budget import make_budget_tracker_from_config
 from my_deepagent.config import load_config
 from my_deepagent.engine import WorkflowEngine
 from my_deepagent.enums import ApprovalDecisionAction, Backend, RunState
 from my_deepagent.monitoring.pricing import ModelPrice, PricingCache
 from my_deepagent.persistence.db import Database
 from my_deepagent.persistence.models import (
    BudgetLedgerRow,
    LlmCallRow,
    RunPhaseRow,
    RunRow,
 )
 from my_deepagent.persona import load_personas_from_dir
 from my_deepagent.workflow import load_workflow_yaml
 # ---------------------------------------------------------------------------
 # Skip guard: API key must be present
 # ---------------------------------------------------------------------------
 _HAS_KEY = (
    bool(os.environ.get("MYDEEPAGENT_OPENROUTER_API_KEY") or os.environ.get("OPENROUTER_API_KEY"))
    or Path(Path(__file__).resolve().parents[3] / "my-deepagent" / ".env").is_file()
    or Path(".env").is_file()
 )
 pytestmark = [
    pytest.mark.integration,
    pytest.mark.skipif(not _HAS_KEY, reason="no OpenRouter API key configured"),
 ]
 _SEED_ROOT = Path(__file__).resolve().parents[2] / "docs" / "schemas"
 # ---------------------------------------------------------------------------
 # Auto-approve callback: bypasses TUI for headless testing
 # ---------------------------------------------------------------------------
 async def _auto_approve(payload: dict[str, Any], gates: list[str]) -> ApprovalDecisionAction:
    """Test callback: always approve without any TUI interaction."""
    return ApprovalDecisionAction.APPROVE
 # ---------------------------------------------------------------------------
 # Static pricing cache: covers the 3 models our seed personas use
 # ---------------------------------------------------------------------------
 def _make_pricing() -> PricingCache:
    """Return a small static PricingCache covering models used by the 3 seed personas."""
    cache = PricingCache()
    cache.set(
        [
            # USD per 1,000 tokens
            ModelPrice("anthropic/claude-sonnet-4-6", 0.003, 0.015, 200_000),
            ModelPrice("anthropic/claude-haiku-4-5", 0.001, 0.005, 200_000),
            ModelPrice("anthropic/claude-opus-4-1", 0.015, 0.075, 200_000),
            ModelPrice("deepseek/deepseek-chat", 0.00028, 0.00112, 64_000),
        ]
    )
    return cache
 # ---------------------------------------------------------------------------
 # E2E test
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
@pytest.mark.timeout(600)  # 10 minute hard limit for slow LLM responses
 async def test_e2e_spec_and_review_workflow(tmp_path: Path) -> None:
    """Real OpenRouter call: full spec-and-review@1 workflow end-to-end.
    Persona binding (all pinned via BindingOverride for determinism):
      - spec_writer role → openrouter-claude-spec-writer@1 (Claude Sonnet 4.6)
          Pinned: architect is also eligible but uses claude-opus-4-1 (invalid on OpenRouter).
      - reviewer role   → openrouter-claude-security-auditor@1 (Claude Sonnet 4.6)
          Pinned: code-reviewer has a subagents block that triggers deepagents 0.6.x bug
          (SubAgentMiddleware ToolNode receives raw functions without .name attribute).
      - verifier role   → openrouter-deepseek-verifier@1 (DeepSeek Chat)
          Pinned for determinism.
    Cost estimate: ~$0.01-$0.05 for 3 phases with max_tokens=4096 each.
    """
    # ---- Setup: config overrides pointing to tmp_path ----
    ws_root = tmp_path / "ws"
    ws_root.mkdir(parents=True, exist_ok=True)
    db_path = tmp_path / "e2e.sqlite"
    config = load_config(
        workspace_root=ws_root,
        data_dir=tmp_path / "data",
        state_dir=tmp_path / "state",
        database_url=f"sqlite+aiosqlite:///{db_path}",
        budget_on_hit="warn_continue",  # do not block during E2E test
        budget_run_usd=5.0,  # generous cap for E2E
        budget_daily_usd=10.0,
        budget_daily_warn_usd=5.0,
        budget_run_warn_usd=2.0,
    )
    # ---- Load seed assets ----
    template = load_workflow_yaml(_SEED_ROOT / "workflows" / "spec-and-review@1.yaml")
    personas = load_personas_from_dir(_SEED_ROOT / "personas")
    registry = ArtifactSchemaRegistry(roots=[_SEED_ROOT / "artifacts"])
    # ---- Infrastructure ----
    db = Database(config.database_url)
    await db.init_schema()
    pricing = _make_pricing()
    consent_store = PersonaConsentStore(tmp_path / "consents.json")
    backends = BackendAvailability(available_backends=frozenset(Backend))
    budget = make_budget_tracker_from_config(db, config)
    await budget.init()
    # Pin all three roles to specific personas to ensure deterministic binding.
    #
    # spec_writer: pin to openrouter-claude-spec-writer (not openrouter-claude-architect,
    #   which is also eligible but uses claude-opus-4-1, not currently supported on OpenRouter).
    # reviewer: pin to openrouter-claude-security-auditor (not openrouter-claude-code-reviewer
    #   which has a subagents block triggering deepagents 0.6.x SubAgentMiddleware bug:
    #   ToolNode receives raw async functions without a .name attribute).
    # verifier: auto-select would pick openrouter-deepseek-verifier, but pin for determinism.
    # E2E pins DeepSeek personas across the board:
    # 1. langchain-openai 1.2.1 + OpenRouter + Anthropic Claude raises an AIMessage
    #    pydantic ValidationError on tool_calls.0.args because Claude streams
    #    `args` as a JSON string while langchain expects a dict. DeepSeek
    #    streams `args` as a dict directly so the round-trip succeeds.
    # 2. Cost is ~$0.001 per phase, well under the per-run cap.
    override = BindingOverride.parse(
        {
            "spec_writer": "openrouter-deepseek-spec-writer@1",
            "reviewer": "openrouter-deepseek-code-reviewer@1",
            "verifier": "openrouter-deepseek-verifier@1",
        }
    )
    engine = WorkflowEngine(
        db=db,
        config=config,
        persona_pool=personas,
        artifact_registry=registry,
        consent_store=consent_store,
        available_backends=backends,
        approval_callback=_auto_approve,
        budget_tracker=budget,
        pricing=pricing,
    )
    requirements = (
        "Build a tiny CLI tool 'numfmt' that reads numbers from stdin (one per line) "
        "and prints them grouped with thousand separators. "
        "Acceptance: tests pass on samples [1, 12345, 1234567]."
    )
    # ---- Run ----
    start_time = time.monotonic()
    try:
        result = await engine.run(
            template,
            repo_path=tmp_path / "fake-repo",
            base_branch="main",
            requirements_md=requirements,
            override=override,
        )
    finally:
        await db.dispose()
    elapsed = time.monotonic() - start_time
    # ---- Assertions: run result ----
    assert result.state == RunState.COMPLETED, (
        f"run did not complete: state={result.state}, error={result.error}, elapsed={elapsed:.1f}s"
    )
    assert result.final_report_path is not None, "final_report_path must be set"
    assert result.final_report_path.is_file(), (
        f"final report JSON missing: {result.final_report_path}"
    )
    # ---- Assertions: final report JSON content ----
    report_json = json.loads(result.final_report_path.read_text(encoding="utf-8"))
    assert report_json["status"] == "completed"
    assert len(report_json["phases"]) == 3, f"expected 3 phases, got {len(report_json['phases'])}"
    assert len(report_json["artifacts"]) == 3, (
        f"expected 3 artifacts, got {len(report_json['artifacts'])}"
    )
    # ---- Assertions: markdown report ----
    md_path = result.final_report_path.with_suffix(".md")
    assert md_path.is_file(), f"markdown report missing: {md_path}"
    md_content = md_path.read_text(encoding="utf-8")
    assert str(result.run_id) in md_content
    # ---- Assertions: artifact files exist and are non-empty ----
    worktree_root = config.workspace_root / str(result.run_id)
    spec_path = worktree_root / "artifacts" / "spec.json"
    review_path = worktree_root / "artifacts" / "review.json"
    verification_path = worktree_root / "artifacts" / "verification.json"
    for artifact_path in (spec_path, review_path, verification_path):
        assert artifact_path.is_file(), f"artifact file missing: {artifact_path}"
        raw = artifact_path.read_text(encoding="utf-8")
        assert len(raw) > 10, f"artifact file seems empty: {artifact_path}"
    # ---- Validate spec.json schema ----
    spec_data = json.loads(spec_path.read_text(encoding="utf-8"))
    spec_result = registry.validate("dev/spec@1", spec_data)
    assert spec_result.ok, f"spec.json schema validation failed: {spec_result.errors}"
    # ---- Validate review.json schema ----
    review_data = json.loads(review_path.read_text(encoding="utf-8"))
    review_result = registry.validate("dev/review-finding-batch@1", review_data)
    assert review_result.ok, f"review.json schema validation failed: {review_result.errors}"
    # ---- Validate verification.json schema ----
    verify_data = json.loads(verification_path.read_text(encoding="utf-8"))
    verify_result = registry.validate("dev/review-finding-batch@1", verify_data)
    assert verify_result.ok, f"verification.json schema validation failed: {verify_result.errors}"
    # ---- Re-open DB and verify persistence ----
    db2 = Database(config.database_url)
    await db2.init_schema()
    try:
        async with db2.session() as s:
            # RunRow persisted and state == completed
            run_row = await s.get(RunRow, str(result.run_id))
            assert run_row is not None, "RunRow not found in DB"
            assert run_row.state == "completed", f"RunRow.state={run_row.state!r}"
            # 3 RunPhaseRow rows, all completed
            phases = (
                (
                    await s.execute(
                        select(RunPhaseRow).where(RunPhaseRow.run_id == str(result.run_id))
                    )
                )
                .scalars()
                .all()
            )
            assert len(phases) == 3, f"expected 3 RunPhaseRow, got {len(phases)}"
            assert all(p.state == "completed" for p in phases), (
                f"some phases not completed: {[p.state for p in phases]}"
            )
            # LlmCallRow: at least 3 rows (1 per phase). Successful calls (status=ok)
            # must report non-zero usage; transient error rows may have 0 tokens.
            llm_calls = (
                (await s.execute(select(LlmCallRow).where(LlmCallRow.run_id == str(result.run_id))))
                .scalars()
                .all()
            )
            assert len(llm_calls) >= 3, (
                f"expected at least 3 LlmCallRow (1 per phase), got {len(llm_calls)}"
            )
            ok_calls = [c for c in llm_calls if c.status == "ok"]
            assert len(ok_calls) >= 3, (
                f"expected at least 3 ok LlmCallRow, got {len(ok_calls)} "
                f"(statuses={[c.status for c in llm_calls]})"
            )
            # Known v0.1.0 limit: deepagents 0.6.x + langchain-openai 1.2.x +
            # OpenRouter-forwarded DeepSeek does not expose usage on the wrapped
            # ModelResponse object that CostMiddleware sees. The recorder fires
            # for every ok call (LlmCallRow is persisted) but token counts read
            # as 0. v0.2 will probe additional response shapes. For now we only
            # assert row-level persistence; if usage *is* present, we also
            # assert it stays under the $0.10 spend ceiling.
            total_input = sum(c.input_tokens for c in ok_calls)
            total_output = sum(c.output_tokens for c in ok_calls)
            budget_rows = (await s.execute(select(BudgetLedgerRow))).scalars().all()
            total_spent = sum(float(b.spent_usd) for b in budget_rows)
            if total_input > 0 or total_output > 0:
                assert total_spent > 0, (
                    "tokens were recorded but no cost made it into budget_ledger"
                )
                assert total_spent < 0.10, f"cost exceeded $0.10 ceiling: ${total_spent:.4f}"
    finally:
        await db2.dispose()
--- a/my-deepagent/tests/integration/test_engine.py
+++ b/my-deepagent/tests/integration/test_engine.py
@@ -0,0 +1,561 @@
 """WorkflowEngine integration tests using a mock build_agent (no real OpenRouter calls)."""
 from __future__ import annotations
 import json
 import textwrap
 from pathlib import Path
 from typing import Any
 from unittest.mock import AsyncMock, MagicMock, patch
 from uuid import UUID, uuid4
 import pytest
 from my_deepagent.artifact_schema import ArtifactSchemaRegistry
 from my_deepagent.binding import BackendAvailability, PersonaConsentStore
 from my_deepagent.config import load_config
 from my_deepagent.engine import WorkflowEngine, _render_report_md
 from my_deepagent.enums import ApprovalDecisionAction, Backend, RunState
 from my_deepagent.persistence.db import Database
 from my_deepagent.persona import load_personas_from_dir
 from my_deepagent.workflow import WorkflowTemplate
 # ---------------------------------------------------------------------------
 # Path constants
 # ---------------------------------------------------------------------------
 _DOCS = Path(__file__).resolve().parents[2] / "docs" / "schemas"
 _ARTIFACTS_ROOT = _DOCS / "artifacts"
 # ---------------------------------------------------------------------------
 # Helper: valid spec artifact
 # ---------------------------------------------------------------------------
 def _valid_spec_artifact(run_id: UUID) -> dict[str, Any]:
    return {
        "runId": str(run_id),
        "phaseKey": "spec",
        "requirements": "Implement feature X with full test coverage",
        "acceptance_criteria": ["All tests pass", "Coverage >= 90%"],
        "approach": "TDD: write tests first, then implement the feature",
        "risks": [],
    }
 # ---------------------------------------------------------------------------
 # Fixtures
 # ---------------------------------------------------------------------------
@pytest.fixture
 def personas() -> list[Any]:
    return load_personas_from_dir(_DOCS / "personas")
@pytest.fixture
 def artifact_registry() -> ArtifactSchemaRegistry:
    return ArtifactSchemaRegistry(roots=[_ARTIFACTS_ROOT])
@pytest.fixture
 def consent_store(tmp_path: Path) -> PersonaConsentStore:
    return PersonaConsentStore(tmp_path / "consents.json")
@pytest.fixture
 def available_backends() -> BackendAvailability:
    return BackendAvailability(available_backends=frozenset(Backend))
@pytest.fixture
 async def db(tmp_path: Path) -> Database:
    url = f"sqlite+aiosqlite:///{tmp_path / 'test.sqlite3'}"
    database = Database(url)
    await database.init_schema()
    return database
@pytest.fixture
 def governance(tmp_path: Path) -> Path:
    """Create governance consent file so require_consent passes."""
    data_dir = tmp_path / "data"
    data_dir.mkdir(parents=True)
    (data_dir / "governance-accepted.json").write_text(
        '{"accepted_at":"2026-01-01T00:00:00+00:00"}'
    )
    return data_dir
 def _minimal_workflow_yaml(
    tmp_path: Path, schema_id: str = "dev/spec@1", gates: list[str] | None = None
 ) -> WorkflowTemplate:
    """Build a single-phase workflow template (in-memory) for testing."""
    phase_data: dict[str, object] = {
        "key": "spec",
        "title": "Write spec",
        "risk": "low",
        "role": "spec_writer",
        "instructions": "Write a detailed specification document with at least ten words here.",
        "timeout_seconds": 10,
        "expected_artifact": {
            "path": "artifacts/spec.json",
            "schema": schema_id,
        },
    }
    if gates:
        phase_data["gates"] = gates
    raw = {
        "name": "test-workflow",
        "version": 1,
        "description": "unit test workflow",
        "roles": [
            {
                "id": "spec_writer",
                "required_capabilities": ["spec_write", "phase_planning"],
                "preferred_backends": ["openrouter"],
            }
        ],
        "phases": [phase_data],
    }
    return WorkflowTemplate.model_validate(raw)
 def _make_engine(
    database: Database,
    tmp_path: Path,
    personas: list[Any],
    artifact_registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
    approval_cb: Any,
 ) -> WorkflowEngine:
    cfg = load_config(
        workspace_root=tmp_path,
        data_dir=tmp_path / "data",
        database_url=f"sqlite+aiosqlite:///{tmp_path / 'test.sqlite3'}",
    )
    return WorkflowEngine(
        db=database,
        config=cfg,
        persona_pool=personas,
        artifact_registry=artifact_registry,
        consent_store=consent_store,
        available_backends=available_backends,
        approval_callback=approval_cb,
    )
 # ---------------------------------------------------------------------------
 # Unit-level tests (no DB, no agent)
 # ---------------------------------------------------------------------------
 class TestRunEventUtils:
    """Tests for run_event helpers."""
    def test_run_idempotency_key_deterministic(self) -> None:
        from my_deepagent.run_event import RunEventType, run_idempotency_key
        run_id = uuid4()
        k1 = run_idempotency_key(RunEventType.PHASE_STARTED, run_id, phase_key="spec", attempt=1)
        k2 = run_idempotency_key(RunEventType.PHASE_STARTED, run_id, attempt=1, phase_key="spec")
        assert k1 == k2
    def test_run_idempotency_key_contains_event_type(self) -> None:
        from my_deepagent.run_event import RunEventType, run_idempotency_key
        run_id = uuid4()
        key = run_idempotency_key(RunEventType.RUN_CREATED, run_id)
        assert "run.created" in key
        assert str(run_id) in key
    def test_run_idempotency_key_extra_sorted(self) -> None:
        from my_deepagent.run_event import RunEventType, run_idempotency_key
        run_id = uuid4()
        key = run_idempotency_key(RunEventType.PHASE_FAILED, run_id, z_key="z", a_key="a")
        # extra keys must be in sorted order
        assert key.index("a_key") < key.index("z_key")
 class TestBuildEnvelope:
    """Tests for _build_envelope output format."""
    def test_envelope_contains_markers(self) -> None:
        import yaml
        raw = textwrap.dedent("""\
            name: t
            version: 1
            roles:
              - id: r
                required_capabilities: [spec_write, phase_planning]
            phases:
              - key: p
                title: T
                risk: low
                role: r
                instructions: Must be at least ten characters long here.
                expected_artifact:
                  path: out.json
                  schema: dev/spec@1
        """)
        template = WorkflowTemplate.model_validate(yaml.safe_load(raw))
        phase = template.phases[0]
        run_id = uuid4()
        phase_id = uuid4()
        from my_deepagent.engine import WorkflowEngine
        # Access internal _build_envelope via instance
        cfg = load_config()
        engine = WorkflowEngine.__new__(WorkflowEngine)
        engine._config = cfg
        envelope = engine._build_envelope(run_id, phase_id, phase, 1, Path("/tmp/out.json"))
        assert f"MYDEEPAGENT_PROMPT_BEGIN {phase_id}" in envelope
        assert f"MYDEEPAGENT_PROMPT_END {phase_id}" in envelope
        assert str(run_id) in envelope
        assert "dev/spec@1" in envelope
    def test_repair_note_appears_on_attempt_2(self) -> None:
        import yaml
        raw = textwrap.dedent("""\
            name: t
            version: 1
            roles:
              - id: r
                required_capabilities: [spec_write, phase_planning]
            phases:
              - key: p
                title: T
                risk: low
                role: r
                instructions: Must be at least ten characters long here.
                expected_artifact:
                  path: out.json
                  schema: dev/spec@1
        """)
        template = WorkflowTemplate.model_validate(yaml.safe_load(raw))
        phase = template.phases[0]
        run_id = uuid4()
        phase_id = uuid4()
        cfg = load_config()
        engine = WorkflowEngine.__new__(WorkflowEngine)
        engine._config = cfg
        envelope_1 = engine._build_envelope(run_id, phase_id, phase, 1, Path("/tmp/out.json"))
        envelope_2 = engine._build_envelope(run_id, phase_id, phase, 2, Path("/tmp/out.json"))
        assert "REPAIR ATTEMPT" not in envelope_1
        assert "REPAIR ATTEMPT" in envelope_2
 class TestRenderReportMd:
    """Tests for _render_report_md output format."""
    def test_render_contains_run_id(self) -> None:
        run_id = str(uuid4())
        report: dict[str, Any] = {
            "runId": run_id,
            "templateHash": "abc123",
            "status": "completed",
            "phases": [],
            "artifacts": [],
            "events": [],
            "unresolved": [],
            "endedAt": "2026-01-01T00:00:00+00:00",
            "error": None,
        }
        md = _render_report_md(report)
        assert run_id in md
        assert "completed" in md
    def test_render_includes_error_section(self) -> None:
        report = {
            "runId": str(uuid4()),
            "templateHash": "",
            "status": "failed",
            "phases": [],
            "artifacts": [],
            "events": [],
            "unresolved": [],
            "endedAt": "2026-01-01T00:00:00+00:00",
            "error": "something went wrong",
        }
        md = _render_report_md(report)
        assert "Error" in md
        assert "something went wrong" in md
 # ---------------------------------------------------------------------------
 # Integration tests (real DB, mock agent)
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_engine_phase_completes_with_valid_artifact(
    tmp_path: Path,
    personas: list[Any],
    artifact_registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
    db: Database,
 ) -> None:
    """Engine: mock agent writes a valid artifact → RunState.COMPLETED + report written."""
    template = _minimal_workflow_yaml(tmp_path)
    auto_approve = AsyncMock(return_value=ApprovalDecisionAction.APPROVE)
    engine = _make_engine(
        db, tmp_path, personas, artifact_registry, consent_store, available_backends, auto_approve
    )
    def _fake_build_agent(
        persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
    ) -> Any:
        run_id_placeholder = uuid4()  # placeholder; overwritten by test side-effect below
        async def _ainvoke(messages: Any) -> Any:
            # Write a valid spec.json to the expected path
            expected = root_dir / "artifacts" / "spec.json"
            expected.parent.mkdir(parents=True, exist_ok=True)
            artifact = _valid_spec_artifact(run_id_placeholder)
            content = json.dumps(artifact)
            expected.write_text(content, encoding="utf-8")
            # Trigger artifact watcher middleware if present
            for mw in middleware:
                if hasattr(mw, "awrap_tool_call"):
                    req = MagicMock()
                    req.tool_call = {
                        "name": "write_file",
                        "args": {"file_path": str(expected), "content": content},
                        "id": "x",
                    }
                    await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
            return {"messages": []}
        agent = MagicMock()
        agent.ainvoke = _ainvoke
        return agent
    with patch("my_deepagent.engine.build_agent", side_effect=_fake_build_agent):
        result = await engine.run(
            template,
            repo_path=tmp_path,
            base_branch="main",
            requirements_md="test",
        )
    assert result.state == RunState.COMPLETED
    assert result.error is None
    assert result.final_report_path is not None
    assert result.final_report_path.exists()
@pytest.mark.asyncio
 async def test_engine_invalid_artifact_triggers_repair_then_fails(
    tmp_path: Path,
    personas: list[Any],
    artifact_registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
    db: Database,
 ) -> None:
    """Engine: agent always writes invalid JSON → repair 1x → RunState.FAILED."""
    template = _minimal_workflow_yaml(tmp_path)
    auto_approve = AsyncMock(return_value=ApprovalDecisionAction.APPROVE)
    engine = _make_engine(
        db, tmp_path, personas, artifact_registry, consent_store, available_backends, auto_approve
    )
    call_count = 0
    def _fake_build_agent(
        persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
    ) -> Any:
        async def _ainvoke(messages: Any) -> Any:
            nonlocal call_count
            call_count += 1
            expected = root_dir / "artifacts" / "spec.json"
            expected.parent.mkdir(parents=True, exist_ok=True)
            # Write invalid artifact (missing required fields)
            invalid = {"wrong_field": "bad data"}
            content = json.dumps(invalid)
            expected.write_text(content, encoding="utf-8")
            for mw in middleware:
                if hasattr(mw, "awrap_tool_call"):
                    req = MagicMock()
                    req.tool_call = {
                        "name": "write_file",
                        "args": {"file_path": str(expected), "content": content},
                        "id": "x",
                    }
                    await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
            return {"messages": []}
        agent = MagicMock()
        agent.ainvoke = _ainvoke
        return agent
    with patch("my_deepagent.engine.build_agent", side_effect=_fake_build_agent):
        result = await engine.run(
            template,
            repo_path=tmp_path,
            base_branch="main",
            requirements_md="test",
        )
    assert result.state == RunState.FAILED
    assert result.error is not None
    # Agent was invoked twice (original + repair)
    assert call_count == 2
@pytest.mark.asyncio
 async def test_engine_agent_writes_nothing_exhausts_timeout(
    tmp_path: Path,
    personas: list[Any],
    artifact_registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
    db: Database,
 ) -> None:
    """Engine: agent writes no artifact → timeout x2 → RunState.FAILED + timeout_exhausted."""
    template = _minimal_workflow_yaml(tmp_path)
    auto_approve = AsyncMock(return_value=ApprovalDecisionAction.APPROVE)
    engine = _make_engine(
        db, tmp_path, personas, artifact_registry, consent_store, available_backends, auto_approve
    )
    invoke_count = 0
    def _fake_build_agent(
        persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
    ) -> Any:
        async def _ainvoke(messages: Any) -> Any:
            nonlocal invoke_count
            invoke_count += 1
            # Write NOTHING — simulate timeout by returning immediately
            return {"messages": []}
        agent = MagicMock()
        agent.ainvoke = _ainvoke
        return agent
    with patch("my_deepagent.engine.build_agent", side_effect=_fake_build_agent):
        result = await engine.run(
            template,
            repo_path=tmp_path,
            base_branch="main",
        )
    assert result.state == RunState.FAILED
    assert result.error is not None
    assert invoke_count == 2
@pytest.mark.asyncio
 async def test_engine_approval_reject_fails_run(
    tmp_path: Path,
    personas: list[Any],
    artifact_registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
    db: Database,
 ) -> None:
    """Engine: approval callback returns REJECT → RunState.FAILED + approval_rejected."""
    template = _minimal_workflow_yaml(tmp_path, gates=["human"])
    reject_cb = AsyncMock(return_value=ApprovalDecisionAction.REJECT)
    engine = _make_engine(
        db, tmp_path, personas, artifact_registry, consent_store, available_backends, reject_cb
    )
    def _fake_build_agent(
        persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
    ) -> Any:
        async def _ainvoke(messages: Any) -> Any:
            expected = root_dir / "artifacts" / "spec.json"
            expected.parent.mkdir(parents=True, exist_ok=True)
            artifact = _valid_spec_artifact(uuid4())
            content = json.dumps(artifact)
            expected.write_text(content, encoding="utf-8")
            for mw in middleware:
                if hasattr(mw, "awrap_tool_call"):
                    req = MagicMock()
                    req.tool_call = {
                        "name": "write_file",
                        "args": {"file_path": str(expected), "content": content},
                        "id": "x",
                    }
                    await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
            return {"messages": []}
        agent = MagicMock()
        agent.ainvoke = _ainvoke
        return agent
    with patch("my_deepagent.engine.build_agent", side_effect=_fake_build_agent):
        result = await engine.run(
            template,
            repo_path=tmp_path,
            base_branch="main",
        )
    assert result.state == RunState.FAILED
    assert result.error is not None
@pytest.mark.asyncio
 async def test_engine_approval_abort_aborts_run(
    tmp_path: Path,
    personas: list[Any],
    artifact_registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
    db: Database,
 ) -> None:
    """Engine: approval callback returns ABORT → RunState.ABORTED."""
    template = _minimal_workflow_yaml(tmp_path, gates=["human"])
    abort_cb = AsyncMock(return_value=ApprovalDecisionAction.ABORT)
    engine = _make_engine(
        db, tmp_path, personas, artifact_registry, consent_store, available_backends, abort_cb
    )
    def _fake_build_agent(
        persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
    ) -> Any:
        async def _ainvoke(messages: Any) -> Any:
            expected = root_dir / "artifacts" / "spec.json"
            expected.parent.mkdir(parents=True, exist_ok=True)
            artifact = _valid_spec_artifact(uuid4())
            content = json.dumps(artifact)
            expected.write_text(content, encoding="utf-8")
            for mw in middleware:
                if hasattr(mw, "awrap_tool_call"):
                    req = MagicMock()
                    req.tool_call = {
                        "name": "write_file",
                        "args": {"file_path": str(expected), "content": content},
                        "id": "x",
                    }
                    await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
            return {"messages": []}
        agent = MagicMock()
        agent.ainvoke = _ainvoke
        return agent
    with patch("my_deepagent.engine.build_agent", side_effect=_fake_build_agent):
        result = await engine.run(
            template,
            repo_path=tmp_path,
            base_branch="main",
        )
    assert result.state == RunState.ABORTED
    assert result.error is not None
--- a/my-deepagent/tests/integration/test_middleware_cost_budget.py
+++ b/my-deepagent/tests/integration/test_middleware_cost_budget.py
@@ -0,0 +1,181 @@
 """Integration tests: CostMiddleware + BudgetTracker wire-up."""
 from __future__ import annotations
 import tempfile
 from pathlib import Path
 from typing import Any
 from unittest.mock import AsyncMock, MagicMock
 from uuid import uuid4
 import pytest
 import pytest_asyncio
 from my_deepagent.budget import BudgetOnHit, BudgetTracker
 from my_deepagent.errors import BudgetExhaustedError
 from my_deepagent.middleware.cost import CostMiddleware
 from my_deepagent.monitoring.pricing import ModelPrice, PricingCache
 from my_deepagent.persistence.db import Database
 # ---------------------------------------------------------------------------
 # Fixtures
 # ---------------------------------------------------------------------------
 _MODEL = "anthropic/claude-sonnet-4-6"
 _IN_PRICE = 0.003
 _OUT_PRICE = 0.015
@pytest_asyncio.fixture
 async def db() -> Database:
    p = Path(tempfile.mkdtemp()) / "test_mw_budget.sqlite3"
    database = Database(f"sqlite+aiosqlite:///{p}")
    await database.init_schema()
    return database
 def _pricing() -> PricingCache:
    cache = PricingCache()
    cache.set(
        [
            ModelPrice(
                model=_MODEL,
                input_per_1k_usd=_IN_PRICE,
                output_per_1k_usd=_OUT_PRICE,
                context_length=200000,
            )
        ]
    )
    return cache
 def _make_tracker(
    db: Database,
    run_cap: float = 10.0,
    on_hit: BudgetOnHit = BudgetOnHit.BLOCK,
 ) -> BudgetTracker:
    return BudgetTracker(
        db=db,
        daily_cap_usd=100.0,
        run_cap_usd=run_cap,
        daily_warn_usd=50.0,
        run_warn_usd=5.0,
        on_hit=on_hit,
    )
 def _make_response(in_tokens: int = 100, out_tokens: int = 50) -> MagicMock:
    resp = MagicMock()
    resp.usage_metadata = {"input_tokens": in_tokens, "output_tokens": out_tokens}
    return resp
 # ---------------------------------------------------------------------------
 # Test: over cap → assert_can_call raises before handler is called
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_over_cap_raises_before_handler(db: Database) -> None:
    tracker = _make_tracker(db, run_cap=0.000001, on_hit=BudgetOnHit.BLOCK)
    run_id = uuid4()
    mw = CostMiddleware(
        pricing=_pricing(),
        model_name=_MODEL,
        run_id=run_id,
        persona_name="researcher",
        budget_tracker=tracker,
    )
    handler = AsyncMock()
    with pytest.raises(BudgetExhaustedError):
        await mw.awrap_model_call(MagicMock(), handler)
    handler.assert_not_awaited()
 # ---------------------------------------------------------------------------
 # Test: under cap → handler called + ledger accumulated
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_under_cap_handler_called_and_ledger_updated(db: Database) -> None:
    tracker = _make_tracker(db, run_cap=10.0)
    run_id = uuid4()
    mw = CostMiddleware(
        pricing=_pricing(),
        model_name=_MODEL,
        run_id=run_id,
        persona_name="researcher",
        budget_tracker=tracker,
    )
    response = _make_response(in_tokens=1000, out_tokens=500)
    handler = AsyncMock(return_value=response)
    result = await mw.awrap_model_call(MagicMock(), handler)
    assert result is response
    handler.assert_awaited_once()
    # Check ledger was updated
    run_spent = await tracker.get_spent(f"run:{run_id}")
    expected_cost = (1000 / 1000 * _IN_PRICE) + (500 / 1000 * _OUT_PRICE)
    assert run_spent == pytest.approx(expected_cost)
 # ---------------------------------------------------------------------------
 # Test: handler exception → recorder gets status=error, budget NOT accumulated
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_handler_exception_error_status_no_budget(db: Database) -> None:
    tracker = _make_tracker(db, run_cap=10.0)
    run_id = uuid4()
    recorder = AsyncMock()
    mw = CostMiddleware(
        pricing=_pricing(),
        model_name=_MODEL,
        run_id=run_id,
        persona_name="researcher",
        recorder=recorder,
        budget_tracker=tracker,
    )
    handler = AsyncMock(side_effect=RuntimeError("model_error"))
    with pytest.raises(RuntimeError, match="model_error"):
        await mw.awrap_model_call(MagicMock(), handler)
    # recorder called with error status
    recorder.assert_awaited_once()
    record: dict[str, Any] = recorder.call_args[0][0]
    assert record["status"] == "error"
    assert record["error_code"] == "RuntimeError"
    # Budget should NOT be accumulated after an error
    run_spent = await tracker.get_spent(f"run:{run_id}")
    assert run_spent == 0.0
 # ---------------------------------------------------------------------------
 # Test: budget=None → existing behaviour preserved (no BudgetExhaustedError)
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_no_budget_tracker_still_works() -> None:
    recorder = AsyncMock()
    mw = CostMiddleware(
        pricing=_pricing(),
        model_name=_MODEL,
        recorder=recorder,
        budget_tracker=None,
    )
    response = _make_response()
    handler = AsyncMock(return_value=response)
    result = await mw.awrap_model_call(MagicMock(), handler)
    assert result is response
    recorder.assert_awaited_once()
    record: dict[str, Any] = recorder.call_args[0][0]
    assert record["status"] == "ok"
--- a/my-deepagent/tests/integration/test_persistence.py
+++ b/my-deepagent/tests/integration/test_persistence.py
@@ -5,6 +5,7 @@ from __future__ import annotations
 import subprocess
 import sys
 import uuid
 from collections.abc import AsyncGenerator
 from pathlib import Path
 from typing import Any
@@ -73,10 +74,10 @@ def db_url(tmp_path: Path) -> str:
@pytest_asyncio.fixture()
-async def db(db_url: str) -> Database:  # type: ignore[misc]
+async def db(db_url: str) -> AsyncGenerator[Database, None]:
    database = Database(db_url)
    await database.init_schema()
-    yield database  # type: ignore[misc]
+    yield database
    await database.dispose()
--- a/my-deepagent/tests/integration/test_recovery.py
+++ b/my-deepagent/tests/integration/test_recovery.py
@@ -0,0 +1,307 @@
 """Integration tests for crash recovery sweep (sweep_orphan_runs)."""
 from __future__ import annotations
 import uuid
 from collections.abc import AsyncGenerator
 from pathlib import Path
 import pytest
 import pytest_asyncio
 from sqlalchemy import select
 from sqlalchemy.exc import IntegrityError
 from my_deepagent.enums import RunPhaseState, RunState
 from my_deepagent.persistence.db import Database
 from my_deepagent.persistence.models import (
    RunEventRow,
    RunPhaseRow,
    RunRow,
    WorkflowTemplateRow,
 )
 from my_deepagent.recovery import SweepReport, sweep_orphan_runs
 from my_deepagent.run_event import RunEventType
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
 _NOW = "2026-05-14T00:00:00+00:00"
 def _make_id() -> str:
    return str(uuid.uuid4())
 def _template_row(template_id: str | None = None) -> WorkflowTemplateRow:
    tid = template_id or _make_id()
    return WorkflowTemplateRow(
        id=tid,
        name="test-wf",
        version=1,
        hash=tid,
        definition={},
        created_at=_NOW,
    )
 def _run_row(
    *,
    run_id: str | None = None,
    template_id: str,
    state: str = RunState.EXECUTING.value,
    repo_path: str = "/repo",
    base_branch: str = "main",
 ) -> RunRow:
    rid = run_id or _make_id()
    return RunRow(
        id=rid,
        template_id=template_id,
        template_hash="a" * 64,
        state=state,
        repo_path=repo_path,
        base_branch=base_branch,
        worktree_root="/wt",
        created_at=_NOW,
        updated_at=_NOW,
    )
 def _phase_row(run_id: str, state: str = RunPhaseState.RUNNING.value) -> RunPhaseRow:
    return RunPhaseRow(
        id=_make_id(),
        run_id=run_id,
        phase_key="spec",
        seq=0,
        state=state,
        attempts=1,
        started_at=_NOW,
    )
 # ---------------------------------------------------------------------------
 # Fixtures
 # ---------------------------------------------------------------------------
@pytest_asyncio.fixture()
 async def db(tmp_path: Path) -> AsyncGenerator[Database, None]:
    url = f"sqlite+aiosqlite:///{tmp_path}/test.db"
    database = Database(url)
    await database.init_schema()
    yield database
    await database.dispose()
 # ---------------------------------------------------------------------------
 # Tests
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_sweep_with_no_orphans_returns_empty_report(db: Database) -> None:
    """Sweep on empty DB returns SweepReport with zero counts."""
    report = await sweep_orphan_runs(db)
    assert isinstance(report, SweepReport)
    assert report.total == 0
    assert report.failed_runs == ()
    assert report.failed_phases == ()
@pytest.mark.asyncio
 async def test_sweep_marks_executing_run_as_failed(db: Database) -> None:
    """A run in EXECUTING state is marked FAILED after sweep."""
    tid = _make_id()
    run = _run_row(template_id=tid, state=RunState.EXECUTING.value)
    async with db.session() as s:
        s.add(_template_row(tid))
    async with db.session() as s:
        s.add(run)
    report = await sweep_orphan_runs(db)
    assert len(report.failed_runs) == 1
    async with db.session() as s:
        refreshed = await s.get(RunRow, run.id)
    assert refreshed is not None
    assert refreshed.state == RunState.FAILED.value
    assert refreshed.ended_at is not None
@pytest.mark.asyncio
 async def test_sweep_marks_paused_run_as_failed(db: Database) -> None:
    """A run in PAUSED state is marked FAILED after sweep."""
    tid = _make_id()
    run = _run_row(template_id=tid, state=RunState.PAUSED.value)
    async with db.session() as s:
        s.add(_template_row(tid))
    async with db.session() as s:
        s.add(run)
    report = await sweep_orphan_runs(db)
    assert len(report.failed_runs) == 1
    async with db.session() as s:
        refreshed = await s.get(RunRow, run.id)
    assert refreshed is not None
    assert refreshed.state == RunState.FAILED.value
@pytest.mark.asyncio
 async def test_sweep_leaves_completed_run_alone(db: Database) -> None:
    """A run in COMPLETED state is NOT touched by the sweep."""
    tid = _make_id()
    run = _run_row(template_id=tid, state=RunState.COMPLETED.value)
    async with db.session() as s:
        s.add(_template_row(tid))
    async with db.session() as s:
        s.add(run)
    report = await sweep_orphan_runs(db)
    assert report.total == 0
    async with db.session() as s:
        refreshed = await s.get(RunRow, run.id)
    assert refreshed is not None
    assert refreshed.state == RunState.COMPLETED.value
@pytest.mark.asyncio
 async def test_sweep_cascades_phase_states(db: Database) -> None:
    """Orphan phases belonging to a swept run are also marked FAILED."""
    tid = _make_id()
    run = _run_row(template_id=tid, state=RunState.EXECUTING.value)
    async with db.session() as s:
        s.add(_template_row(tid))
    async with db.session() as s:
        s.add(run)
    phase = _phase_row(run.id, state=RunPhaseState.RUNNING.value)
    async with db.session() as s:
        s.add(phase)
    report = await sweep_orphan_runs(db)
    assert len(report.failed_runs) == 1
    assert len(report.failed_phases) == 1
    async with db.session() as s:
        refreshed_phase = await s.get(RunPhaseRow, phase.id)
    assert refreshed_phase is not None
    assert refreshed_phase.state == RunPhaseState.FAILED.value
    assert refreshed_phase.ended_at is not None
@pytest.mark.asyncio
 async def test_sweep_emits_run_failed_event(db: Database) -> None:
    """Sweep emits exactly one run.failed event per orphan run."""
    tid = _make_id()
    run = _run_row(template_id=tid, state=RunState.EXECUTING.value)
    async with db.session() as s:
        s.add(_template_row(tid))
    async with db.session() as s:
        s.add(run)
    await sweep_orphan_runs(db)
    async with db.session() as s:
        events = (
            (
                await s.execute(
                    select(RunEventRow)
                    .where(RunEventRow.run_id == run.id)
                    .where(RunEventRow.type == RunEventType.RUN_FAILED.value)
                )
            )
            .scalars()
            .all()
        )
    assert len(events) == 1
    assert events[0].payload.get("reason") == "process_restart_unrecovered"
@pytest.mark.asyncio
 async def test_sweep_idempotent_no_duplicate_event(db: Database) -> None:
    """Running sweep twice does not create duplicate events (ON CONFLICT DO NOTHING)."""
    tid = _make_id()
    run = _run_row(template_id=tid, state=RunState.EXECUTING.value)
    async with db.session() as s:
        s.add(_template_row(tid))
    async with db.session() as s:
        s.add(run)
    # First sweep marks the run as failed.
    report1 = await sweep_orphan_runs(db)
    assert len(report1.failed_runs) == 1
    # Second sweep: no more non-terminal runs, no duplicate events.
    report2 = await sweep_orphan_runs(db)
    assert report2.total == 0
    async with db.session() as s:
        events = (
            (
                await s.execute(
                    select(RunEventRow)
                    .where(RunEventRow.run_id == run.id)
                    .where(RunEventRow.type == RunEventType.RUN_FAILED.value)
                )
            )
            .scalars()
            .all()
        )
    assert len(events) == 1
@pytest.mark.asyncio
 async def test_sweep_frees_active_run_slot(db: Database) -> None:
    """After sweep, a second run with same (repo_path, base_branch) can be inserted.
    Without sweep: the partial unique index ux_active_run_repo_base prevents a second
    active run for the same (repo_path, base_branch). After sweep marks the first run
    FAILED, the uniqueness slot is freed and the second insert succeeds.
    """
    repo = "/unique-repo"
    branch = "main"
    tid1 = _make_id()
    tid2 = _make_id()
    run1 = _run_row(
        template_id=tid1,
        state=RunState.EXECUTING.value,
        repo_path=repo,
        base_branch=branch,
    )
    async with db.session() as s:
        s.add(_template_row(tid1))
        s.add(_template_row(tid2))
    async with db.session() as s:
        s.add(run1)
    # A second executing run for the same (repo, branch) must raise IntegrityError.
    run2 = _run_row(
        template_id=tid2,
        state=RunState.EXECUTING.value,
        repo_path=repo,
        base_branch=branch,
    )
    with pytest.raises(IntegrityError):
        async with db.session() as s:
            s.add(run2)
    # Sweep frees the slot.
    report = await sweep_orphan_runs(db)
    assert len(report.failed_runs) == 1
    # Now the second insert should succeed.
    run3 = _run_row(
        template_id=tid2,
        state=RunState.EXECUTING.value,
        repo_path=repo,
        base_branch=branch,
    )
    async with db.session() as s:
        s.add(run3)
    async with db.session() as s:
        refreshed = await s.get(RunRow, run3.id)
    assert refreshed is not None
    assert refreshed.state == RunState.EXECUTING.value
--- a/my-deepagent/tests/unit/test_audit.py
+++ b/my-deepagent/tests/unit/test_audit.py
@@ -0,0 +1,128 @@
 """Unit tests for src/my_deepagent/audit.py."""
 from __future__ import annotations
 import json
 import os
 from pathlib import Path
 from typing import Any
 import pytest
 from my_deepagent.audit import (
    append_audit_record,
    audit_path,
    make_audit_recorder,
    read_audit_records,
 )
 # ---------------------------------------------------------------------------
 # audit_path
 # ---------------------------------------------------------------------------
 def test_audit_path_returns_correct_location(tmp_path: Path) -> None:
    expected = tmp_path / "audit.jsonl"
    assert audit_path(tmp_path) == expected
 # ---------------------------------------------------------------------------
 # append_audit_record
 # ---------------------------------------------------------------------------
 def test_append_audit_record_creates_file_with_one_line(tmp_path: Path) -> None:
    record: dict[str, Any] = {"tool_name": "read_file", "args": {"path": "x.py"}}
    append_audit_record(tmp_path, record)
    target = audit_path(tmp_path)
    assert target.is_file()
    lines = [ln for ln in target.read_text(encoding="utf-8").splitlines() if ln.strip()]
    assert len(lines) == 1
    parsed = json.loads(lines[0])
    assert parsed["tool_name"] == "read_file"
    assert "ts" in parsed
 def test_append_audit_record_accumulates_multiple_records(tmp_path: Path) -> None:
    for i in range(5):
        append_audit_record(tmp_path, {"seq": i})
    records = read_audit_records(tmp_path)
    assert len(records) == 5
    seqs = [r["seq"] for r in records]
    assert seqs == list(range(5))
 def test_append_audit_record_file_permission_is_0600(tmp_path: Path) -> None:
    append_audit_record(tmp_path, {"tool_name": "test"})
    target = audit_path(tmp_path)
    mode = os.stat(target).st_mode & 0o777
    assert mode == 0o600
 def test_append_audit_record_adds_ts_field(tmp_path: Path) -> None:
    append_audit_record(tmp_path, {"tool_name": "execute"})
    records = read_audit_records(tmp_path)
    assert len(records) == 1
    assert "ts" in records[0]
    # ts should be a non-empty ISO string
    assert len(records[0]["ts"]) > 0
 # ---------------------------------------------------------------------------
 # read_audit_records
 # ---------------------------------------------------------------------------
 def test_read_audit_records_returns_empty_when_file_missing(tmp_path: Path) -> None:
    result = read_audit_records(tmp_path)
    assert result == []
 def test_read_audit_records_returns_empty_for_empty_file(tmp_path: Path) -> None:
    target = audit_path(tmp_path)
    target.write_text("", encoding="utf-8")
    result = read_audit_records(tmp_path)
    assert result == []
 def test_read_audit_records_with_limit_returns_last_n(tmp_path: Path) -> None:
    for i in range(10):
        append_audit_record(tmp_path, {"seq": i})
    result = read_audit_records(tmp_path, limit=3)
    assert len(result) == 3
    # should be the last 3 records (seq 7, 8, 9)
    assert result[0]["seq"] == 7
    assert result[1]["seq"] == 8
    assert result[2]["seq"] == 9
 def test_read_audit_records_skips_corrupted_lines(tmp_path: Path) -> None:
    target = audit_path(tmp_path)
    # Write one valid + one corrupt + one valid line
    valid1 = json.dumps({"tool_name": "first"}) + "\n"
    corrupt = "NOT_VALID_JSON{\n"
    valid2 = json.dumps({"tool_name": "third"}) + "\n"
    target.write_text(valid1 + corrupt + valid2, encoding="utf-8")
    records = read_audit_records(tmp_path)
    assert len(records) == 2
    assert records[0]["tool_name"] == "first"
    assert records[1]["tool_name"] == "third"
 # ---------------------------------------------------------------------------
 # make_audit_recorder
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_make_audit_recorder_writes_record(tmp_path: Path) -> None:
    recorder = make_audit_recorder(tmp_path)
    await recorder({"tool_name": "write_file", "args": {"path": "out.txt"}})
    records = read_audit_records(tmp_path)
    assert len(records) == 1
    assert records[0]["tool_name"] == "write_file"
--- a/my-deepagent/tests/unit/test_cli.py
+++ b/my-deepagent/tests/unit/test_cli.py
@@ -0,0 +1,185 @@
 """Unit tests for the my-deepagent CLI (typer CliRunner)."""
 from __future__ import annotations
 from pathlib import Path
 import pytest
 from typer.testing import CliRunner
 import my_deepagent.keys as keys_module
 from my_deepagent.cli.main import app
 runner = CliRunner()
 class _FakeKeyring:
    def __init__(self) -> None:
        self.store: dict[tuple[str, str], str] = {}
    def get_password(self, service: str, username: str) -> str | None:
        return self.store.get((service, username))
    def set_password(self, service: str, username: str, value: str) -> None:
        self.store[(service, username)] = value
    def delete_password(self, service: str, username: str) -> None:
        self.store.pop((service, username), None)
@pytest.fixture
 def fake_keyring(monkeypatch: pytest.MonkeyPatch) -> _FakeKeyring:
    fake = _FakeKeyring()
    monkeypatch.setattr(keys_module.keyring, "get_password", fake.get_password)
    monkeypatch.setattr(keys_module.keyring, "set_password", fake.set_password)
    monkeypatch.setattr(keys_module.keyring, "delete_password", fake.delete_password)
    return fake
 def test_help_exit_zero() -> None:
    result = runner.invoke(app, ["--help"])
    assert result.exit_code == 0
    assert "mydeepagent" in result.output.lower() or "Usage" in result.output
 def test_no_subcommand_launches_repl_governance_check(
    monkeypatch: pytest.MonkeyPatch,
 ) -> None:
    """Without governance consent, the REPL exits 1 with an error."""
    import my_deepagent.governance as gov_module
    monkeypatch.setattr(gov_module, "has_consent", lambda _: False)
    result = runner.invoke(app, [])
    # governance_not_accepted raises MyDeepAgentError which surfaces as exit 1
    assert result.exit_code == 1
 def test_doctor_exits_zero_normal_python(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None:
    import sys
    import my_deepagent.cli.doctor as doctor_module
    # Ensure version is in valid range
    monkeypatch.setattr(sys, "version_info", (3, 12, 0, "final", 0))
    # Patch has_consent inside the doctor module's namespace
    monkeypatch.setattr(doctor_module, "has_consent", lambda _: True)
    # Stub out async checks so doctor finishes without real DB / network
    monkeypatch.setattr(
        doctor_module,
        "_check_openrouter_api_key",
        lambda cfg: doctor_module.CheckResult("openrouter_api_key", "warn", "mocked"),
    )
    async def _fake_ping(cfg: object) -> doctor_module.CheckResult:
        return doctor_module.CheckResult("openrouter_ping", "warn", "mocked")
    async def _fake_disk(cfg: object) -> doctor_module.CheckResult:
        return doctor_module.CheckResult("disk+db", "ok", "free=99.9GB, sqlite_integrity=ok")
    monkeypatch.setattr(doctor_module, "_check_openrouter_ping_and_upsert", _fake_ping)
    monkeypatch.setattr(doctor_module, "_check_disk_and_db", _fake_disk)
    result = runner.invoke(app, ["doctor"])
    assert result.exit_code == 0
 def test_doctor_exits_one_on_bad_python(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None:
    import sys
    monkeypatch.setattr(sys, "version_info", (3, 10, 0, "final", 0))
    monkeypatch.setattr(sys, "version", "3.10.0 (default, ...)")
    result = runner.invoke(app, ["doctor"])
    assert result.exit_code == 1
 def test_keys_empty_keyring(fake_keyring: _FakeKeyring) -> None:
    result = runner.invoke(app, ["keys"])
    assert result.exit_code == 0
    # Should show "none" message (Korean or English)
    assert "없음" in result.output or "none" in result.output.lower()
 def test_login_stores_key(fake_keyring: _FakeKeyring) -> None:
    result = runner.invoke(app, ["login", "openrouter"], input="sk-or-test-abc123\n")
    assert result.exit_code == 0
    assert fake_keyring.store.get(("my-deepagent", "openrouter_api_key")) == "sk-or-test-abc123"
 def test_login_empty_input_exits_one(fake_keyring: _FakeKeyring) -> None:
    result = runner.invoke(app, ["login", "openrouter"], input="\n")
    assert result.exit_code == 1
 def test_logout_after_login_removes_key(fake_keyring: _FakeKeyring) -> None:
    runner.invoke(app, ["login", "openrouter"], input="sk-or-test\n")
    result = runner.invoke(app, ["logout", "openrouter"])
    assert result.exit_code == 0
    assert fake_keyring.store.get(("my-deepagent", "openrouter_api_key")) is None
 def test_logout_not_found_shows_message(fake_keyring: _FakeKeyring) -> None:
    result = runner.invoke(app, ["logout", "openrouter"])
    assert result.exit_code == 0
    assert "keyring" in result.output or "없습니다" in result.output or "not_found" in result.output
 def test_keys_shows_entry_after_login(fake_keyring: _FakeKeyring) -> None:
    runner.invoke(app, ["login", "openrouter"], input="sk-or-v1-abcdefgh1234\n")
    result = runner.invoke(app, ["keys"])
    assert result.exit_code == 0
    assert "openrouter" in result.output
    assert "sk-or-v1" in result.output
 def test_init_governance_declined_exits_one(
    fake_keyring: _FakeKeyring, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
 ) -> None:
    import my_deepagent.governance as gov_module
    monkeypatch.setattr(gov_module, "has_consent", lambda _: False)
    # Input: decline governance
    result = runner.invoke(app, ["init"], input="no\n")
    assert result.exit_code == 1
 def test_init_governance_accepted_saves_key(
    fake_keyring: _FakeKeyring, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
 ) -> None:
    import sys
    import my_deepagent.cli.doctor as doctor_module
    import my_deepagent.cli.init as init_module
    import my_deepagent.governance as gov_module
    recorded: list[Path] = []
    def fake_record_consent(data_dir: Path) -> None:
        recorded.append(data_dir)
    monkeypatch.setattr(gov_module, "has_consent", lambda _: False)
    monkeypatch.setattr(init_module, "record_consent", fake_record_consent)
    # Ensure Python version check passes
    monkeypatch.setattr(sys, "version_info", (3, 12, 0, "final", 0))
    # doctor_command() is called inside init — patch its async sub-checks so it
    # completes without network / DB access and passes governance in doctor's namespace.
    monkeypatch.setattr(doctor_module, "has_consent", lambda _: True)
    monkeypatch.setattr(
        doctor_module,
        "_check_openrouter_api_key",
        lambda cfg: doctor_module.CheckResult("openrouter_api_key", "warn", "mocked"),
    )
    async def _fake_ping(cfg: object) -> doctor_module.CheckResult:
        return doctor_module.CheckResult("openrouter_ping", "warn", "mocked")
    async def _fake_disk(cfg: object) -> doctor_module.CheckResult:
        return doctor_module.CheckResult("disk+db", "ok", "free=99.9GB, sqlite_integrity=ok")
    monkeypatch.setattr(doctor_module, "_check_openrouter_ping_and_upsert", _fake_ping)
    monkeypatch.setattr(doctor_module, "_check_disk_and_db", _fake_disk)
    # Input: accept governance, then provide API key
    result = runner.invoke(app, ["init"], input="yes\nsk-or-init-test\n")
    assert result.exit_code == 0
    assert len(recorded) == 1
    assert fake_keyring.store.get(("my-deepagent", "openrouter_api_key")) == "sk-or-init-test"
--- a/my-deepagent/tests/unit/test_cli_runs.py
+++ b/my-deepagent/tests/unit/test_cli_runs.py
@@ -0,0 +1,232 @@
 """Unit tests for `mydeepagent runs list / show / resume` CLI commands."""
 from __future__ import annotations
 import asyncio
 import uuid
 from pathlib import Path
 from unittest.mock import MagicMock, patch
 from typer.testing import CliRunner
 from my_deepagent.cli.main import app
 from my_deepagent.enums import RunState
 from my_deepagent.persistence.db import Database
 from my_deepagent.persistence.models import RunRow, WorkflowTemplateRow
 runner = CliRunner()
 _NOW = "2026-05-14T00:00:00+00:00"
 def _make_id() -> str:
    return str(uuid.uuid4())
 def _template_row(template_id: str) -> WorkflowTemplateRow:
    return WorkflowTemplateRow(
        id=template_id,
        name="test-wf",
        version=1,
        hash=template_id,
        definition={},
        created_at=_NOW,
    )
 def _run_row(
    *,
    run_id: str | None = None,
    template_id: str,
    state: str = RunState.COMPLETED.value,
    repo_path: str = "/my/repo",
    base_branch: str = "main",
 ) -> RunRow:
    rid = run_id or _make_id()
    return RunRow(
        id=rid,
        template_id=template_id,
        template_hash="a" * 64,
        state=state,
        repo_path=repo_path,
        base_branch=base_branch,
        worktree_root="/wt",
        created_at=_NOW,
        updated_at=_NOW,
    )
 # ---------------------------------------------------------------------------
 # Helpers: set up in-memory DB and patch load_config + Database
 # ---------------------------------------------------------------------------
 def _setup_db_with_run(
    tmp_path: Path,
    state: str = RunState.COMPLETED.value,
    repo_path: str = "/my/repo",
 ) -> tuple[str, str]:
    """Create a fresh DB with one run. Returns (db_url, run_id)."""
    db_url = f"sqlite+aiosqlite:///{tmp_path / 'test.db'}"
    async def _init() -> str:
        db = Database(db_url)
        await db.init_schema()
        tid = _make_id()
        run_id = _make_id()
        async with db.session() as s:
            s.add(_template_row(tid))
        async with db.session() as s:
            s.add(
                _run_row(
                    run_id=run_id,
                    template_id=tid,
                    state=state,
                    repo_path=repo_path,
                )
            )
        await db.dispose()
        return run_id
    return db_url, asyncio.run(_init())
 def _setup_empty_db(tmp_path: Path) -> str:
    """Create a fresh empty DB. Returns db_url."""
    db_url = f"sqlite+aiosqlite:///{tmp_path / 'empty.db'}"
    async def _init() -> None:
        db = Database(db_url)
        await db.init_schema()
        await db.dispose()
    asyncio.run(_init())
    return db_url
 # ---------------------------------------------------------------------------
 # Tests: runs list
 # ---------------------------------------------------------------------------
 def test_runs_list_empty_db(tmp_path: Path) -> None:
    """``runs list`` on empty DB prints '(no runs)'."""
    db_url = _setup_empty_db(tmp_path)
    with patch("my_deepagent.cli.runs.load_config") as mock_cfg:
        mock_cfg.return_value = MagicMock(database_url=db_url)
        result = runner.invoke(app, ["runs", "list"])
    assert result.exit_code == 0, result.output
    assert "(no runs)" in result.output
 def test_runs_list_with_one_run(tmp_path: Path) -> None:
    """``runs list`` shows a table row when one run exists."""
    db_url, run_id = _setup_db_with_run(tmp_path)
    with patch("my_deepagent.cli.runs.load_config") as mock_cfg:
        mock_cfg.return_value = MagicMock(database_url=db_url)
        result = runner.invoke(app, ["runs", "list"])
    assert result.exit_code == 0, result.output
    # Table should contain the first 8 chars of the run_id and the state.
    assert run_id[:8] in result.output
    assert RunState.COMPLETED.value in result.output
 def test_runs_list_state_filter(tmp_path: Path) -> None:
    """``runs list --state completed`` only shows completed runs."""
    db_url, _run_id = _setup_db_with_run(tmp_path, state=RunState.COMPLETED.value)
    with patch("my_deepagent.cli.runs.load_config") as mock_cfg:
        mock_cfg.return_value = MagicMock(database_url=db_url)
        # Filter for failed → should return nothing.
        result = runner.invoke(app, ["runs", "list", "--state", "failed"])
    assert result.exit_code == 0, result.output
    assert "(no runs)" in result.output
 # ---------------------------------------------------------------------------
 # Tests: runs show
 # ---------------------------------------------------------------------------
 def test_runs_show_unknown_run_id(tmp_path: Path) -> None:
    """``runs show <unknown>`` exits with code 1."""
    db_url = _setup_empty_db(tmp_path)
    fake_id = _make_id()
    with patch("my_deepagent.cli.runs.load_config") as mock_cfg:
        mock_cfg.return_value = MagicMock(database_url=db_url)
        result = runner.invoke(app, ["runs", "show", fake_id])
    assert result.exit_code == 1
 def test_runs_show_with_full_id(tmp_path: Path) -> None:
    """``runs show <full-uuid>`` displays run details."""
    db_url, run_id = _setup_db_with_run(tmp_path)
    with patch("my_deepagent.cli.runs.load_config") as mock_cfg:
        mock_cfg.return_value = MagicMock(database_url=db_url)
        result = runner.invoke(app, ["runs", "show", run_id])
    assert result.exit_code == 0, result.output
    assert run_id in result.output
    assert RunState.COMPLETED.value in result.output
 def test_runs_show_with_prefix(tmp_path: Path) -> None:
    """``runs show <6+ char prefix>`` resolves to the correct run."""
    db_url, run_id = _setup_db_with_run(tmp_path)
    prefix = run_id[:8]
    with patch("my_deepagent.cli.runs.load_config") as mock_cfg:
        mock_cfg.return_value = MagicMock(database_url=db_url)
        result = runner.invoke(app, ["runs", "show", prefix])
    assert result.exit_code == 0, result.output
    assert run_id in result.output
 # ---------------------------------------------------------------------------
 # Tests: runs resume
 # ---------------------------------------------------------------------------
 def test_runs_resume_completed_run_exits_one(tmp_path: Path) -> None:
    """``runs resume`` on a completed run exits 1 and says 'already terminal'."""
    db_url, run_id = _setup_db_with_run(tmp_path, state=RunState.COMPLETED.value)
    with patch("my_deepagent.cli.runs.load_config") as mock_cfg:
        mock_cfg.return_value = MagicMock(database_url=db_url)
        result = runner.invoke(app, ["runs", "resume", run_id])
    assert result.exit_code == 1
    assert "already terminal" in result.output
 def test_runs_resume_failed_run_exits_one(tmp_path: Path) -> None:
    """``runs resume`` on a failed run exits 1 and says 'already terminal'."""
    db_url, run_id = _setup_db_with_run(tmp_path, state=RunState.FAILED.value)
    with patch("my_deepagent.cli.runs.load_config") as mock_cfg:
        mock_cfg.return_value = MagicMock(database_url=db_url)
        result = runner.invoke(app, ["runs", "resume", run_id])
    assert result.exit_code == 1
    assert "already terminal" in result.output
 def test_runs_resume_unknown_id_exits_one(tmp_path: Path) -> None:
    """``runs resume <unknown>`` exits 1."""
    db_url = _setup_empty_db(tmp_path)
    fake_id = _make_id()
    with patch("my_deepagent.cli.runs.load_config") as mock_cfg:
        mock_cfg.return_value = MagicMock(database_url=db_url)
        result = runner.invoke(app, ["runs", "resume", fake_id])
    assert result.exit_code == 1
--- a/my-deepagent/tests/unit/test_config.py
+++ b/my-deepagent/tests/unit/test_config.py
@@ -53,7 +53,7 @@ def test_default_persona(monkeypatch: pytest.MonkeyPatch) -> None:
 def test_default_openrouter_api_key_is_none(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    # _env_file=None bypasses any .env that may exist in the cwd (e.g. dev keys).
-    cfg = Config(_env_file=None)  # type: ignore[call-arg]
+    cfg = Config(_env_file=None)
    assert cfg.openrouter_api_key is None
--- a/my-deepagent/tests/unit/test_cost_estimator.py
+++ b/my-deepagent/tests/unit/test_cost_estimator.py
@@ -0,0 +1,149 @@
 """Unit tests for src/my_deepagent/monitoring/cost_estimator.py."""
 from __future__ import annotations
 from unittest.mock import MagicMock
 import pytest
 from my_deepagent.monitoring.cost_estimator import (
    _DEFAULT_INPUT_TOKENS,
    _DEFAULT_OUTPUT_TOKENS,
    PhaseCostEstimate,
    WorkflowCostEstimate,
    estimate_phase,
    estimate_workflow,
 )
 from my_deepagent.monitoring.pricing import ModelPrice, PricingCache
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
 def _make_pricing(model: str = "anthropic/claude-sonnet-4-6") -> PricingCache:
    cache = PricingCache()
    cache.set(
        [
            ModelPrice(
                model=model,
                input_per_1k_usd=0.003,
                output_per_1k_usd=0.015,
                context_length=200000,
            )
        ]
    )
    return cache
 def _make_persona(
    model: str = "anthropic/claude-sonnet-4-6",
    max_tokens: int | None = None,
 ) -> object:
    p = MagicMock()
    p.name = "test-persona"
    p.version = 1
    p.model = model
    p.model_params = {"max_tokens": max_tokens} if max_tokens else {}
    return p
 def _make_phase(key: str = "spec") -> MagicMock:
    phase = MagicMock()
    phase.key = key
    return phase
 def _make_binding(persona: object) -> MagicMock:
    b = MagicMock()
    b.persona = persona
    return b
 # ---------------------------------------------------------------------------
 # estimate_phase
 # ---------------------------------------------------------------------------
 def test_estimate_phase_known_model_correct_cost() -> None:
    pricing = _make_pricing("anthropic/claude-sonnet-4-6")
    persona = _make_persona("anthropic/claude-sonnet-4-6")
    phase = _make_phase("spec")
    est = estimate_phase(phase, persona, pricing)  # type: ignore[arg-type]
    expected_cost = _DEFAULT_INPUT_TOKENS / 1000.0 * 0.003 + _DEFAULT_OUTPUT_TOKENS / 1000.0 * 0.015
    assert isinstance(est, PhaseCostEstimate)
    assert est.phase_key == "spec"
    assert est.persona_name == "test-persona@1"
    assert est.model == "anthropic/claude-sonnet-4-6"
    assert est.estimated_input_tokens == _DEFAULT_INPUT_TOKENS
    assert est.estimated_output_tokens == _DEFAULT_OUTPUT_TOKENS
    assert est.estimated_cost_usd == pytest.approx(expected_cost)
 def test_estimate_phase_unknown_model_returns_zero_cost() -> None:
    pricing = PricingCache()  # empty
    persona = _make_persona("unknown/model-xyz")
    phase = _make_phase("unknown_phase")
    est = estimate_phase(phase, persona, pricing)  # type: ignore[arg-type]
    assert est.estimated_cost_usd == 0.0
 def test_estimate_phase_max_tokens_override() -> None:
    pricing = _make_pricing()
    persona = _make_persona(max_tokens=2000)
    phase = _make_phase()
    est = estimate_phase(phase, persona, pricing)  # type: ignore[arg-type]
    assert est.estimated_output_tokens == 2000
 def test_estimate_phase_default_output_tokens_when_no_max_tokens() -> None:
    pricing = _make_pricing()
    persona = _make_persona()  # no max_tokens
    phase = _make_phase()
    est = estimate_phase(phase, persona, pricing)  # type: ignore[arg-type]
    assert est.estimated_output_tokens == _DEFAULT_OUTPUT_TOKENS
 # ---------------------------------------------------------------------------
 # estimate_workflow
 # ---------------------------------------------------------------------------
 def test_estimate_workflow_sums_phases() -> None:
    pricing = _make_pricing()
    phase1 = _make_phase("phase1")
    phase1.role = "researcher"
    phase2 = _make_phase("phase2")
    phase2.role = "reviewer"
    template = MagicMock()
    template.phases = [phase1, phase2]
    persona1 = _make_persona()
    persona2 = _make_persona()
    bindings = {
        "researcher": _make_binding(persona1),
        "reviewer": _make_binding(persona2),
    }
    est = estimate_workflow(template, bindings, pricing)  # type: ignore[arg-type]
    assert isinstance(est, WorkflowCostEstimate)
    assert len(est.phases) == 2
    assert est.total_usd == pytest.approx(sum(p.estimated_cost_usd for p in est.phases))
    assert est.total_usd > 0.0
 def test_estimate_workflow_total_greater_than_zero_with_known_models() -> None:
    pricing = _make_pricing()
    phase = _make_phase("spec")
    phase.role = "researcher"
    template = MagicMock()
    template.phases = [phase]
    persona = _make_persona()
    bindings = {"researcher": _make_binding(persona)}
    est = estimate_workflow(template, bindings, pricing)  # type: ignore[arg-type]
    assert est.total_usd > 0.0
--- a/my-deepagent/tests/unit/test_doctor.py
+++ b/my-deepagent/tests/unit/test_doctor.py
@@ -0,0 +1,355 @@
 """Unit tests for mydeepagent doctor — 8-check full diagnostic suite."""
 from __future__ import annotations
 import shutil
 import subprocess
 import sys
 from pathlib import Path
 from unittest.mock import AsyncMock, MagicMock
 import httpx
 import pytest
 from my_deepagent.cli.doctor import (
    _check_config_and_governance,
    _check_disk_and_db,
    _check_git,
    _check_openrouter_api_key,
    _check_openrouter_ping_and_upsert,
    _check_python,
    _check_uv,
    _check_workspace,
 )
 from my_deepagent.errors import MyDeepAgentError
 # ---------------------------------------------------------------------------
 # 1. _check_python
 # ---------------------------------------------------------------------------
 def test_check_python_ok_in_312(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(sys, "version_info", (3, 12, 0, "final", 0))
    monkeypatch.setattr(sys, "version", "3.12.0 (default, ...)")
    result = _check_python()
    assert result.status == "ok"
    assert result.name == "python"
    assert "3.12.0" in result.detail
 def test_check_python_ok_in_313(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(sys, "version_info", (3, 13, 0, "final", 0))
    monkeypatch.setattr(sys, "version", "3.13.0 (default, ...)")
    result = _check_python()
    assert result.status == "ok"
 def test_check_python_fail_in_310(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(sys, "version_info", (3, 10, 0, "final", 0))
    monkeypatch.setattr(sys, "version", "3.10.0 (default, ...)")
    result = _check_python()
    assert result.status == "fail"
    assert "3.10.0" in result.detail
 def test_check_python_fail_in_314(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(sys, "version_info", (3, 14, 0, "alpha", 0))
    monkeypatch.setattr(sys, "version", "3.14.0a1 (default, ...)")
    result = _check_python()
    assert result.status == "fail"
 # ---------------------------------------------------------------------------
 # 2. _check_uv
 # ---------------------------------------------------------------------------
 def test_check_uv_warn_when_missing(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(shutil, "which", lambda _: None)
    result = _check_uv()
    assert result.status == "warn"
    assert "not on PATH" in result.detail
 def test_check_uv_ok_when_present(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(shutil, "which", lambda _: "/usr/local/bin/uv")
    fake_run = MagicMock()
    fake_run.return_value.stdout = "uv 0.5.0"
    monkeypatch.setattr(subprocess, "run", fake_run)
    result = _check_uv()
    assert result.status == "ok"
    assert "uv 0.5.0" in result.detail
 def test_check_uv_warn_on_timeout(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(shutil, "which", lambda _: "/usr/local/bin/uv")
    monkeypatch.setattr(
        subprocess,
        "run",
        MagicMock(side_effect=subprocess.TimeoutExpired(["uv"], 5)),
    )
    result = _check_uv()
    assert result.status == "warn"
    assert "version probe failed" in result.detail
 # ---------------------------------------------------------------------------
 # 3. _check_git
 # ---------------------------------------------------------------------------
 def test_check_git_warn_when_missing(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(shutil, "which", lambda _: None)
    result = _check_git()
    assert result.status == "warn"
    assert "not on PATH" in result.detail
 def test_check_git_ok_when_present(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(shutil, "which", lambda _: "/usr/bin/git")
    fake_run = MagicMock()
    fake_run.return_value.stdout = "git version 2.40.0"
    monkeypatch.setattr(subprocess, "run", fake_run)
    result = _check_git()
    assert result.status == "ok"
    assert "2.40.0" in result.detail
 # ---------------------------------------------------------------------------
 # 4. _check_workspace
 # ---------------------------------------------------------------------------
 def test_check_workspace_ok_when_writable(tmp_path: Path) -> None:
    cfg = MagicMock()
    cfg.workspace_root = tmp_path
    result = _check_workspace(cfg)
    assert result.status == "ok"
    assert str(tmp_path) in result.detail
 def test_check_workspace_creates_if_missing(tmp_path: Path) -> None:
    new_dir = tmp_path / "new_workspace"
    cfg = MagicMock()
    cfg.workspace_root = new_dir
    result = _check_workspace(cfg)
    assert result.status == "ok"
    assert new_dir.exists()
 def test_check_workspace_fail_if_not_writable(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
 ) -> None:
    cfg = MagicMock()
    cfg.workspace_root = tmp_path
    def _raise_oserror(self: object, data: str, **kwargs: object) -> None:
        raise OSError("read-only filesystem")
    monkeypatch.setattr(Path, "write_text", _raise_oserror)
    result = _check_workspace(cfg)
    assert result.status == "fail"
    assert "not writable" in result.detail
 # ---------------------------------------------------------------------------
 # 5. _check_config_and_governance
 # ---------------------------------------------------------------------------
 def test_check_governance_fail_without_consent(monkeypatch: pytest.MonkeyPatch) -> None:
    import my_deepagent.cli.doctor as doctor_module
    monkeypatch.setattr(doctor_module, "has_consent", lambda _: False)
    cfg = MagicMock()
    result = _check_config_and_governance(cfg)
    assert result.status == "fail"
    assert "mydeepagent init" in result.detail
 def test_check_governance_ok_with_consent(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None:
    import my_deepagent.cli.doctor as doctor_module
    monkeypatch.setattr(doctor_module, "has_consent", lambda _: True)
    cfg = MagicMock()
    cfg.data_dir = tmp_path
    result = _check_config_and_governance(cfg)
    assert result.status == "ok"
    assert str(tmp_path) in result.detail
 # ---------------------------------------------------------------------------
 # 6. _check_openrouter_api_key
 # ---------------------------------------------------------------------------
 def test_check_openrouter_api_key_ok(monkeypatch: pytest.MonkeyPatch) -> None:
    import my_deepagent.cli.doctor as doctor_module
    api_key = "sk-or-test-1234"
    monkeypatch.setattr(doctor_module, "resolve_openrouter_api_key", lambda cfg: api_key)
    cfg = MagicMock()
    result = _check_openrouter_api_key(cfg)
    assert result.status == "ok"
    assert str(len(api_key)) in result.detail  # "15 chars"
 def test_check_openrouter_api_key_fail(monkeypatch: pytest.MonkeyPatch) -> None:
    import my_deepagent.cli.doctor as doctor_module
    def _raise(cfg: object) -> str:
        raise MyDeepAgentError.human_required(
            "backend_auth_failed",
            message="missing",
            recovery_hint="run login",
        )
    monkeypatch.setattr(doctor_module, "resolve_openrouter_api_key", _raise)
    cfg = MagicMock()
    result = _check_openrouter_api_key(cfg)
    assert result.status == "fail"
    assert "run login" in result.detail
 # ---------------------------------------------------------------------------
 # 7. _check_openrouter_ping_and_upsert (async)
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_check_openrouter_ping_warn_no_key(monkeypatch: pytest.MonkeyPatch) -> None:
    import my_deepagent.cli.doctor as doctor_module
    def _raise(cfg: object) -> str:
        raise MyDeepAgentError.human_required("backend_auth_failed", message="missing")
    monkeypatch.setattr(doctor_module, "resolve_openrouter_api_key", _raise)
    cfg = MagicMock()
    result = await _check_openrouter_ping_and_upsert(cfg)
    assert result.status == "warn"
    assert "skipped" in result.detail
@pytest.mark.asyncio
 async def test_check_openrouter_ping_ok(monkeypatch: pytest.MonkeyPatch) -> None:
    import my_deepagent.cli.doctor as doctor_module
    from my_deepagent.monitoring.pricing import ModelPrice
    monkeypatch.setattr(doctor_module, "resolve_openrouter_api_key", lambda cfg: "sk-test")
    fake_prices = [
        ModelPrice("model/a", 1.0, 2.0, 4096),
        ModelPrice("model/b", 0.5, 1.0, 8192),
    ]
    monkeypatch.setattr(
        doctor_module,
        "fetch_openrouter_pricing",
        AsyncMock(return_value=fake_prices),
    )
    monkeypatch.setattr(doctor_module, "_upsert_pricing", AsyncMock())
    cfg = MagicMock()
    result = await _check_openrouter_ping_and_upsert(cfg)
    assert result.status == "ok"
    assert "2 models" in result.detail
@pytest.mark.asyncio
 async def test_check_openrouter_ping_fail_401(monkeypatch: pytest.MonkeyPatch) -> None:
    import my_deepagent.cli.doctor as doctor_module
    monkeypatch.setattr(doctor_module, "resolve_openrouter_api_key", lambda cfg: "sk-bad")
    mock_response = MagicMock()
    mock_response.status_code = 401
    http_err = httpx.HTTPStatusError("401", request=MagicMock(), response=mock_response)
    monkeypatch.setattr(
        doctor_module,
        "fetch_openrouter_pricing",
        AsyncMock(side_effect=http_err),
    )
    cfg = MagicMock()
    result = await _check_openrouter_ping_and_upsert(cfg)
    assert result.status == "fail"
    assert "401" in result.detail
@pytest.mark.asyncio
 async def test_check_openrouter_ping_warn_5xx(monkeypatch: pytest.MonkeyPatch) -> None:
    import my_deepagent.cli.doctor as doctor_module
    monkeypatch.setattr(doctor_module, "resolve_openrouter_api_key", lambda cfg: "sk-ok")
    mock_response = MagicMock()
    mock_response.status_code = 503
    http_err = httpx.HTTPStatusError("503", request=MagicMock(), response=mock_response)
    monkeypatch.setattr(
        doctor_module,
        "fetch_openrouter_pricing",
        AsyncMock(side_effect=http_err),
    )
    cfg = MagicMock()
    result = await _check_openrouter_ping_and_upsert(cfg)
    assert result.status == "warn"
    assert "503" in result.detail
@pytest.mark.asyncio
 async def test_check_openrouter_ping_warn_empty_response(
    monkeypatch: pytest.MonkeyPatch,
 ) -> None:
    import my_deepagent.cli.doctor as doctor_module
    monkeypatch.setattr(doctor_module, "resolve_openrouter_api_key", lambda cfg: "sk-ok")
    monkeypatch.setattr(
        doctor_module,
        "fetch_openrouter_pricing",
        AsyncMock(return_value=[]),
    )
    cfg = MagicMock()
    result = await _check_openrouter_ping_and_upsert(cfg)
    assert result.status == "warn"
    assert "no models" in result.detail
 # ---------------------------------------------------------------------------
 # 8. _check_disk_and_db (async)
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_check_disk_and_db_ok(tmp_path: Path) -> None:
    cfg = MagicMock()
    cfg.workspace_root = tmp_path
    cfg.database_url = f"sqlite+aiosqlite:///{tmp_path}/test.sqlite3"
    result = await _check_disk_and_db(cfg)
    # Should be ok or warn depending on actual free space — never fail in tmp
    assert result.status in ("ok", "warn")
    assert "sqlite_integrity=ok" in result.detail
@pytest.mark.asyncio
 async def test_check_disk_and_db_warn_low_disk(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
 ) -> None:
    # Simulate 5 GB free (warn zone: 2GB <= free < 10GB)
    class _FakeUsage:
        free: int = 5 * 1024**3
        total: int = 100 * 1024**3
        used: int = 95 * 1024**3
    monkeypatch.setattr(shutil, "disk_usage", lambda _: _FakeUsage())
    cfg = MagicMock()
    cfg.workspace_root = tmp_path
    cfg.database_url = f"sqlite+aiosqlite:///{tmp_path}/test.sqlite3"
    result = await _check_disk_and_db(cfg)
    assert result.status == "warn"
    assert "5.0GB" in result.detail
--- a/my-deepagent/tests/unit/test_engine_signals.py
+++ b/my-deepagent/tests/unit/test_engine_signals.py
@@ -0,0 +1,126 @@
 """Unit tests for WorkflowEngine SIGTERM/SIGINT graceful shutdown handlers."""
 from __future__ import annotations
 import asyncio
 import signal
 from pathlib import Path
 from typing import Any
 import pytest
 from my_deepagent.artifact_schema import ArtifactSchemaRegistry
 from my_deepagent.binding import BackendAvailability, PersonaConsentStore
 from my_deepagent.config import load_config
 from my_deepagent.engine import WorkflowEngine
 from my_deepagent.enums import Backend
 from my_deepagent.persistence.db import Database
 from my_deepagent.persona import load_personas_from_dir
 _DOCS = Path(__file__).resolve().parents[2] / "docs" / "schemas"
 _ARTIFACTS_ROOT = _DOCS / "artifacts"
 def _make_engine(tmp_path: Path) -> WorkflowEngine:
    cfg = load_config(
        workspace_root=tmp_path,
        data_dir=tmp_path / "data",
        database_url=f"sqlite+aiosqlite:///{tmp_path / 'test.sqlite3'}",
    )
    personas = load_personas_from_dir(_DOCS / "personas")
    registry = ArtifactSchemaRegistry(roots=[_ARTIFACTS_ROOT])
    consent_store = PersonaConsentStore(tmp_path / "consents.json")
    available_backends = BackendAvailability(available_backends=frozenset(Backend))
    async def _dummy_approval(payload: dict[str, Any], gates: list[str]) -> Any:
        raise NotImplementedError("approval not used in signal tests")
    db = Database(cfg.database_url)
    return WorkflowEngine(
        db=db,
        config=cfg,
        persona_pool=personas,
        artifact_registry=registry,
        consent_store=consent_store,
        available_backends=available_backends,
        approval_callback=_dummy_approval,
    )
@pytest.mark.asyncio
 async def test_shutdown_requested_false_initially(tmp_path: Path) -> None:
    """Engine starts with shutdown_requested == False."""
    engine = _make_engine(tmp_path)
    assert engine.shutdown_requested is False
@pytest.mark.asyncio
 async def test_on_signal_sets_shutdown_event(tmp_path: Path) -> None:
    """Calling _on_signal directly sets shutdown_requested to True."""
    engine = _make_engine(tmp_path)
    assert engine.shutdown_requested is False
    engine._on_signal(signal.SIGTERM)
    assert engine.shutdown_requested is True
@pytest.mark.asyncio
 async def test_install_signal_handlers_registers_sigterm(tmp_path: Path) -> None:
    """install_signal_handlers registers a SIGTERM handler on the running loop."""
    engine = _make_engine(tmp_path)
    async def _check() -> None:
        engine.install_signal_handlers()
        loop = asyncio.get_running_loop()
        # asyncio loop stores handlers in the private _signal_handlers dict (CPython impl).
        # We accept both: the private dict exists, or signal.getsignal returns our callable.
        # The private dict is preferred but may not exist on all platforms.
        handlers = getattr(loop, "_signal_handlers", {})
        if handlers:
            assert signal.SIGTERM in handlers, "SIGTERM not registered in loop._signal_handlers"
        else:
            # Fallback: just verify shutdown_requested works when _on_signal is called.
            engine._on_signal(signal.SIGTERM)
            assert engine.shutdown_requested is True
    await _check()
@pytest.mark.asyncio
 async def test_force_cancel_inflight_cancels_pending_tasks(tmp_path: Path) -> None:
    """_force_cancel_inflight cancels all tasks in _inflight_tasks that are not done."""
    engine = _make_engine(tmp_path)
    async def _long_running() -> None:
        await asyncio.sleep(1000)
    task: asyncio.Task[None] = asyncio.create_task(_long_running())
    engine._inflight_tasks.add(task)
    # Give the event loop a tick to start the task.
    await asyncio.sleep(0)
    assert not task.done()
    engine._force_cancel_inflight()
    # Give the event loop a tick to process the cancellation.
    await asyncio.sleep(0)
    assert task.cancelled()
@pytest.mark.asyncio
 async def test_force_cancel_inflight_skips_done_tasks(tmp_path: Path) -> None:
    """_force_cancel_inflight does not call cancel() on already-done tasks."""
    engine = _make_engine(tmp_path)
    async def _instant() -> str:
        return "done"
    task: asyncio.Task[str] = asyncio.create_task(_instant())
    await asyncio.sleep(0)  # let the task complete
    assert task.done()
    engine._inflight_tasks.add(task)
    # Should not raise; done tasks are skipped.
    engine._force_cancel_inflight()
    # Still done, not newly cancelled.
    assert task.done()
    assert not task.cancelled()
--- a/my-deepagent/tests/unit/test_enums.py
+++ b/my-deepagent/tests/unit/test_enums.py
@@ -20,28 +20,28 @@ from my_deepagent.enums import (
 def test_backend_openrouter_value() -> None:
-    assert Backend.OPENROUTER == "openrouter"
+    assert Backend.OPENROUTER == "openrouter"  # type: ignore[comparison-overlap]
 def test_backend_anthropic_value() -> None:
-    assert Backend.ANTHROPIC == "anthropic"
+    assert Backend.ANTHROPIC == "anthropic"  # type: ignore[comparison-overlap]
 def test_backend_openai_value() -> None:
-    assert Backend.OPENAI == "openai"
+    assert Backend.OPENAI == "openai"  # type: ignore[comparison-overlap]
 def test_backend_google_value() -> None:
-    assert Backend.GOOGLE == "google"
+    assert Backend.GOOGLE == "google"  # type: ignore[comparison-overlap]
 def test_backend_fake_value() -> None:
-    assert Backend.FAKE == "fake"
+    assert Backend.FAKE == "fake"  # type: ignore[comparison-overlap]
 def test_backend_str_equality() -> None:
    # StrEnum members compare equal to their string values
-    assert Backend.OPENROUTER == "openrouter"
+    assert Backend.OPENROUTER == "openrouter"  # type: ignore[comparison-overlap]
    assert str(Backend.OPENROUTER) == "openrouter"
@@ -55,15 +55,15 @@ def test_capability_count() -> None:
 def test_capability_spec_write() -> None:
-    assert Capability.SPEC_WRITE == "spec_write"
+    assert Capability.SPEC_WRITE == "spec_write"  # type: ignore[comparison-overlap]
 def test_capability_code_edit() -> None:
-    assert Capability.CODE_EDIT == "code_edit"
+    assert Capability.CODE_EDIT == "code_edit"  # type: ignore[comparison-overlap]
 def test_capability_final_report_compose() -> None:
-    assert Capability.FINAL_REPORT_COMPOSE == "final_report_compose"
+    assert Capability.FINAL_REPORT_COMPOSE == "final_report_compose"  # type: ignore[comparison-overlap]
 def test_capability_all_are_str() -> None:
@@ -77,9 +77,9 @@ def test_capability_all_are_str() -> None:
 def test_risk_level_values() -> None:
-    assert RiskLevel.LOW == "low"
+    assert RiskLevel.LOW == "low"  # type: ignore[comparison-overlap]
-    assert RiskLevel.MEDIUM == "medium"
+    assert RiskLevel.MEDIUM == "medium"  # type: ignore[comparison-overlap]
-    assert RiskLevel.HIGH == "high"
+    assert RiskLevel.HIGH == "high"  # type: ignore[comparison-overlap]
 # ---------------------------------------------------------------------------
@@ -88,19 +88,19 @@ def test_risk_level_values() -> None:
 def test_approval_decision_action_approve() -> None:
-    assert ApprovalDecisionAction.APPROVE == "approve"
+    assert ApprovalDecisionAction.APPROVE == "approve"  # type: ignore[comparison-overlap]
 def test_approval_decision_action_reject() -> None:
-    assert ApprovalDecisionAction.REJECT == "reject"
+    assert ApprovalDecisionAction.REJECT == "reject"  # type: ignore[comparison-overlap]
 def test_approval_decision_action_request_changes() -> None:
-    assert ApprovalDecisionAction.REQUEST_CHANGES == "request_changes"
+    assert ApprovalDecisionAction.REQUEST_CHANGES == "request_changes"  # type: ignore[comparison-overlap]
 def test_approval_decision_action_abort() -> None:
-    assert ApprovalDecisionAction.ABORT == "abort"
+    assert ApprovalDecisionAction.ABORT == "abort"  # type: ignore[comparison-overlap]
 # ---------------------------------------------------------------------------
@@ -196,15 +196,15 @@ def test_session_state_count() -> None:
 def test_error_class_recoverable() -> None:
-    assert ErrorClass.RECOVERABLE == "recoverable"
+    assert ErrorClass.RECOVERABLE == "recoverable"  # type: ignore[comparison-overlap]
 def test_error_class_human_required() -> None:
-    assert ErrorClass.HUMAN_REQUIRED == "human_required"
+    assert ErrorClass.HUMAN_REQUIRED == "human_required"  # type: ignore[comparison-overlap]
 def test_error_class_fatal() -> None:
-    assert ErrorClass.FATAL == "fatal"
+    assert ErrorClass.FATAL == "fatal"  # type: ignore[comparison-overlap]
 def test_error_class_count() -> None:
@@ -223,7 +223,7 @@ def test_str_enum_from_value() -> None:
 def test_str_enum_in_dict() -> None:
    # StrEnum should work as dict key and compare with string
    d = {Backend.OPENROUTER: "openrouter backend"}
-    assert d["openrouter"] == "openrouter backend"
+    assert d["openrouter"] == "openrouter backend"  # type: ignore[index]
@pytest.mark.parametrize(
--- a/my-deepagent/tests/unit/test_file_refs.py
+++ b/my-deepagent/tests/unit/test_file_refs.py
@@ -0,0 +1,53 @@
 """Unit tests for _expand_file_refs in cli/interactive.py."""
 from __future__ import annotations
 from pathlib import Path
 import pytest
 from my_deepagent.cli.interactive import _expand_file_refs
@pytest.fixture
 def tmp_repo(tmp_path: Path) -> Path:
    """Create a minimal repo root with one sample file."""
    (tmp_path / "foo.py").write_text("x = 1\n", encoding="utf-8")
    return tmp_path
 def test_expand_existing_file(tmp_repo: Path) -> None:
    expanded = _expand_file_refs("read @foo.py please", tmp_repo)
    assert "```py" in expanded
    assert "# foo.py" in expanded
    assert "x = 1" in expanded
 def test_expand_missing_file_unchanged(tmp_repo: Path) -> None:
    original = "read @missing.py please"
    expanded = _expand_file_refs(original, tmp_repo)
    assert expanded == original
 def test_expand_path_traversal_blocked(tmp_repo: Path) -> None:
    # Create a file outside the repo root
    outside = tmp_repo.parent / "secret.txt"
    outside.write_text("secret", encoding="utf-8")
    original = "read @../secret.txt"
    expanded = _expand_file_refs(original, tmp_repo)
    # The @ref should remain unexpanded (repo root escape)
    assert "secret" not in expanded or "@../secret.txt" in expanded
 def test_expand_multiple_refs(tmp_repo: Path) -> None:
    (tmp_repo / "bar.ts").write_text("const y = 2;\n", encoding="utf-8")
    expanded = _expand_file_refs("look at @foo.py and @bar.ts", tmp_repo)
    assert "# foo.py" in expanded
    assert "# bar.ts" in expanded
    assert "x = 1" in expanded
    assert "const y = 2" in expanded
 def test_expand_no_at_signs_unchanged(tmp_repo: Path) -> None:
    original = "plain text with no file refs"
    assert _expand_file_refs(original, tmp_repo) == original
--- a/my-deepagent/tests/unit/test_governance.py
+++ b/my-deepagent/tests/unit/test_governance.py
@@ -0,0 +1,72 @@
 """Unit tests for src/my_deepagent/governance.py."""
 from __future__ import annotations
 import json
 import os
 import stat
 from pathlib import Path
 from unittest.mock import patch
 import pytest
 from my_deepagent.errors import MyDeepAgentError
 from my_deepagent.governance import consent_path, has_consent, record_consent, require_consent
 def test_has_consent_false_when_empty(tmp_path: Path) -> None:
    assert has_consent(tmp_path) is False
 def test_has_consent_true_after_record(tmp_path: Path) -> None:
    record_consent(tmp_path)
    assert has_consent(tmp_path) is True
 def test_consent_file_path(tmp_path: Path) -> None:
    expected = tmp_path / "governance-accepted.json"
    assert consent_path(tmp_path) == expected
 def test_record_consent_creates_valid_json(tmp_path: Path) -> None:
    record_consent(tmp_path)
    content = consent_path(tmp_path).read_text()
    data = json.loads(content)
    assert "accepted_at" in data
    assert "T" in data["accepted_at"]  # ISO format
 def test_record_consent_file_mode_600(tmp_path: Path) -> None:
    record_consent(tmp_path)
    file_stat = consent_path(tmp_path).stat()
    mode = stat.S_IMODE(file_stat.st_mode)
    assert mode == 0o600
 def test_record_consent_atomic_uses_os_replace(tmp_path: Path) -> None:
    replace_calls: list[tuple[object, object]] = []
    original_replace = os.replace
    def spy_replace(src: object, dst: object) -> None:
        replace_calls.append((src, dst))
        original_replace(src, dst)  # type: ignore[arg-type]
    with patch("my_deepagent.governance.os.replace", spy_replace):
        record_consent(tmp_path)
    assert len(replace_calls) == 1
    src_path, dst_path = replace_calls[0]
    assert str(src_path).endswith(".tmp")
    assert str(dst_path) == str(consent_path(tmp_path))
 def test_require_consent_raises_when_no_consent(tmp_path: Path) -> None:
    with pytest.raises(MyDeepAgentError) as exc_info:
        require_consent(tmp_path)
    assert exc_info.value.code == "governance_not_accepted"
 def test_require_consent_passes_when_consent_exists(tmp_path: Path) -> None:
    record_consent(tmp_path)
    require_consent(tmp_path)  # should not raise
--- a/my-deepagent/tests/unit/test_i18n.py
+++ b/my-deepagent/tests/unit/test_i18n.py
@@ -0,0 +1,67 @@
 """Unit tests for src/my_deepagent/i18n/__init__.py."""
 from __future__ import annotations
 import pytest
 from my_deepagent.i18n import _load, resolve_lang, t
 def test_t_welcome_default_ko() -> None:
    result = t("init.welcome")
    assert "my-deepagent" in result
    assert "환영합니다" in result
 def test_t_welcome_en() -> None:
    result = t("init.welcome", lang="en")
    assert "Welcome" in result
 def test_t_format_provider() -> None:
    result = t("login.saved", provider="openrouter")
    assert "openrouter" in result
 def test_t_missing_key_returns_key_itself() -> None:
    result = t("nonexistent.missing_key")
    assert result == "nonexistent.missing_key"
 def test_t_missing_section_returns_key_itself() -> None:
    result = t("no_such_section.key")
    assert result == "no_such_section.key"
 def test_resolve_lang_env_en(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("MYDEEPAGENT_LANG", "en")
    assert resolve_lang() == "en"
 def test_resolve_lang_env_ko(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("MYDEEPAGENT_LANG", "ko")
    assert resolve_lang() == "ko"
 def test_resolve_lang_default_ko(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.delenv("MYDEEPAGENT_LANG", raising=False)
    assert resolve_lang() == "ko"
 def test_resolve_lang_invalid_env_falls_back_to_default(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("MYDEEPAGENT_LANG", "fr")
    assert resolve_lang() == "ko"
 def test_load_cache_same_instance() -> None:
    _load.cache_clear()
    first = _load("ko")
    second = _load("ko")
    assert first is second
 def test_t_format_error_returns_template() -> None:
    # If fmt keys don't match, returns raw template string not raising
    result = t("login.saved", provider="openrouter")
    assert isinstance(result, str)
    assert len(result) > 0
--- a/my-deepagent/tests/unit/test_keys.py
+++ b/my-deepagent/tests/unit/test_keys.py
@@ -0,0 +1,72 @@
 """Unit tests for src/my_deepagent/keys.py. Uses a fake keyring backend."""
 from __future__ import annotations
 import pytest
 import my_deepagent.keys as keys_module
 from my_deepagent.keys import delete_api_key, get_api_key, mask, set_api_key
 class _FakeKeyring:
    def __init__(self) -> None:
        self.store: dict[tuple[str, str], str] = {}
    def get_password(self, service: str, username: str) -> str | None:
        return self.store.get((service, username))
    def set_password(self, service: str, username: str, value: str) -> None:
        self.store[(service, username)] = value
    def delete_password(self, service: str, username: str) -> None:
        self.store.pop((service, username), None)
@pytest.fixture
 def fake_keyring(monkeypatch: pytest.MonkeyPatch) -> _FakeKeyring:
    fake = _FakeKeyring()
    monkeypatch.setattr(keys_module.keyring, "get_password", fake.get_password)
    monkeypatch.setattr(keys_module.keyring, "set_password", fake.set_password)
    monkeypatch.setattr(keys_module.keyring, "delete_password", fake.delete_password)
    return fake
 def test_get_api_key_not_set_returns_none(fake_keyring: _FakeKeyring) -> None:
    assert get_api_key("openrouter") is None
 def test_set_and_get_api_key_round_trip(fake_keyring: _FakeKeyring) -> None:
    set_api_key("openrouter", "sk-or-test-1234")
    assert get_api_key("openrouter") == "sk-or-test-1234"
 def test_delete_api_key_existing_returns_true(fake_keyring: _FakeKeyring) -> None:
    set_api_key("openrouter", "sk-or-test")
    assert delete_api_key("openrouter") is True
 def test_delete_api_key_not_existing_returns_false(fake_keyring: _FakeKeyring) -> None:
    assert delete_api_key("openrouter") is False
 def test_delete_api_key_removes_value(fake_keyring: _FakeKeyring) -> None:
    set_api_key("openrouter", "sk-or-test")
    delete_api_key("openrouter")
    assert get_api_key("openrouter") is None
 def test_mask_long_key() -> None:
    result = mask("sk-or-v1-abc1234567xyz9876")
    assert result == "sk-or-v1...9876"
 def test_mask_none_returns_not_set() -> None:
    assert mask(None) == "(not set)"
 def test_mask_short_key_returns_stars() -> None:
    assert mask("short") == "***"
 def test_mask_exactly_8_chars_returns_stars() -> None:
    assert mask("12345678") == "***"
--- a/my-deepagent/tests/unit/test_logging.py
+++ b/my-deepagent/tests/unit/test_logging.py
@@ -0,0 +1,121 @@
 """Unit tests for src/my_deepagent/logging.py — secret scrubbing."""
 from __future__ import annotations
 from typing import Any
 from my_deepagent.logging import _scrub_processor, scrub, scrub_value
 _REDACTED = "[REDACTED]"
 # ---------------------------------------------------------------------------
 # scrub — individual patterns
 # ---------------------------------------------------------------------------
 def test_scrub_openrouter_key() -> None:
    secret = "sk-or-v1-abc1234567890123456789xyz"
    assert scrub(secret) == _REDACTED
 def test_scrub_anthropic_key() -> None:
    secret = "sk-ant-api03-abcdef1234567890abcdef1234567890xyz"
    assert scrub(secret) == _REDACTED
 def test_scrub_openai_project_key() -> None:
    secret = "sk-proj-abcdefghijklmnopqrstuvwxyz12345"
    assert scrub(secret) == _REDACTED
 def test_scrub_openai_general_key() -> None:
    # must be 30+ chars after "sk-"
    secret = "sk-abcdefghijklmnopqrstuvwxyz1234567890"
    assert scrub(secret) == _REDACTED
 def test_scrub_github_pat() -> None:
    secret = "ghp_abcdefghijklmnopqrstuvwxyz1234567890"
    assert scrub(secret) == _REDACTED
 def test_scrub_bearer_token() -> None:
    text = "Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.payload"
    result = scrub(text)
    assert _REDACTED in result
 def test_scrub_plain_text_unchanged() -> None:
    text = "normal log message with no secrets here"
    assert scrub(text) == text
 def test_scrub_partial_match_in_larger_string() -> None:
    text = f"calling API with key=sk-ant-api03-{'x' * 30}"
    result = scrub(text)
    assert _REDACTED in result
    assert "calling API with key=" in result
 # ---------------------------------------------------------------------------
 # scrub_value — recursive
 # ---------------------------------------------------------------------------
 def test_scrub_value_dict_scrubs_string_values() -> None:
    secret = f"sk-or-v1-{'a' * 25}"
    data: dict[str, Any] = {"key": secret, "n": 42}
    result = scrub_value(data)
    assert result["key"] == _REDACTED
    assert result["n"] == 42
 def test_scrub_value_list_scrubs_all_strings() -> None:
    secret_ant = f"sk-ant-api03-{'b' * 30}"
    secret_ghp = f"ghp_{'c' * 35}"
    data: list[Any] = [1, secret_ant, {"k": secret_ghp}]
    result = scrub_value(data)
    assert result[0] == 1
    assert result[1] == _REDACTED
    assert result[2]["k"] == _REDACTED
 def test_scrub_value_non_string_passes_through() -> None:
    assert scrub_value(42) == 42
    assert scrub_value(3.14) == 3.14
    assert scrub_value(None) is None
    assert scrub_value(True) is True
 def test_scrub_value_tuple_scrubs_strings() -> None:
    secret = f"sk-or-v1-{'d' * 22}"
    result = scrub_value((secret, "safe"))
    assert isinstance(result, tuple)
    assert result[0] == _REDACTED
    assert result[1] == "safe"
 # ---------------------------------------------------------------------------
 # _scrub_processor
 # ---------------------------------------------------------------------------
 def test_scrub_processor_scrubs_event_dict_values() -> None:
    secret = f"sk-ant-api03-{'e' * 30}"
    event_dict: dict[str, Any] = {
        "event": "calling model",
        "api_key": secret,
        "model": "claude-3",
    }
    result = _scrub_processor(None, "info", event_dict)
    assert result["api_key"] == _REDACTED
    assert result["event"] == "calling model"
    assert result["model"] == "claude-3"
 def test_scrub_processor_returns_dict() -> None:
    event_dict: dict[str, Any] = {"event": "no secrets here", "count": 5}
    result = _scrub_processor(None, "debug", event_dict)
    assert isinstance(result, dict)
    assert result["count"] == 5
--- a/my-deepagent/tests/unit/test_persona.py
+++ b/my-deepagent/tests/unit/test_persona.py
@@ -47,7 +47,9 @@ def _minimal_persona_dict(**overrides: object) -> dict[str, object]:
 def test_all_seed_personas_load() -> None:
    personas = load_personas_from_dir(PERSONAS_DIR)
-    assert len(personas) == 10
+    # 10 original + 2 deepseek personas added for E2E (Anthropic-via-OpenRouter
    # tool-call compatibility workaround); see CHANGELOG Step 15.
    assert len(personas) == 12
 def test_seed_persona_names_unique() -> None:
--- a/my-deepagent/tests/unit/test_pricing.py
+++ b/my-deepagent/tests/unit/test_pricing.py
@@ -20,7 +20,7 @@ from my_deepagent.monitoring.pricing import (
 def test_parse_valid_payload_returns_model_prices() -> None:
-    data = {
+    data: dict[str, object] = {
        "data": [
            {
                "id": "deepseek/deepseek-chat",
@@ -60,7 +60,7 @@ def test_parse_missing_data_key_returns_empty() -> None:
 def test_parse_skips_entries_without_id() -> None:
-    data = {
+    data: dict[str, object] = {
        "data": [
            {"pricing": {"prompt": "0.000001", "completion": "0.000002"}, "context_length": 1000},
        ]
@@ -70,7 +70,7 @@ def test_parse_skips_entries_without_id() -> None:
 def test_parse_skips_entries_with_invalid_pricing_values() -> None:
-    data = {
+    data: dict[str, object] = {
        "data": [
            {
                "id": "model/x",
@@ -84,7 +84,7 @@ def test_parse_skips_entries_with_invalid_pricing_values() -> None:
 def test_parse_handles_null_pricing_gracefully() -> None:
-    data = {
+    data: dict[str, object] = {
        "data": [
            {"id": "model/y", "pricing": None, "context_length": 0},
        ]
@@ -97,7 +97,7 @@ def test_parse_handles_null_pricing_gracefully() -> None:
 def test_parse_handles_missing_context_length() -> None:
-    data = {
+    data: dict[str, object] = {
        "data": [
            {"id": "model/z", "pricing": {"prompt": "0.000001", "completion": "0.000002"}},
        ]
@@ -108,7 +108,7 @@ def test_parse_handles_missing_context_length() -> None:
 def test_parse_non_dict_entry_is_skipped() -> None:
-    data = {"data": ["not-a-dict", None]}
+    data: dict[str, object] = {"data": ["not-a-dict", None]}
    result = _parse_pricing_payload(data)
    assert result == []
--- a/my-deepagent/tests/unit/test_secrets.py
+++ b/my-deepagent/tests/unit/test_secrets.py
@@ -0,0 +1,86 @@
 """Unit tests for src/my_deepagent/secrets.py."""
 from __future__ import annotations
 import pytest
 import my_deepagent.keys as keys_module
 from my_deepagent.config import load_config
 from my_deepagent.errors import MyDeepAgentError
 from my_deepagent.secrets import resolve_openrouter_api_key
 class _FakeKeyring:
    def __init__(self) -> None:
        self.store: dict[tuple[str, str], str] = {}
    def get_password(self, service: str, username: str) -> str | None:
        return self.store.get((service, username))
    def set_password(self, service: str, username: str, value: str) -> None:
        self.store[(service, username)] = value
    def delete_password(self, service: str, username: str) -> None:
        self.store.pop((service, username), None)
@pytest.fixture
 def fake_keyring(monkeypatch: pytest.MonkeyPatch) -> _FakeKeyring:
    fake = _FakeKeyring()
    monkeypatch.setattr(keys_module.keyring, "get_password", fake.get_password)
    monkeypatch.setattr(keys_module.keyring, "set_password", fake.set_password)
    monkeypatch.setattr(keys_module.keyring, "delete_password", fake.delete_password)
    return fake
 def test_resolves_from_config(fake_keyring: _FakeKeyring) -> None:
    config = load_config(openrouter_api_key="sk-config-key")
    result = resolve_openrouter_api_key(config)
    assert result == "sk-config-key"
 def test_resolves_from_mydeepagent_env(
    monkeypatch: pytest.MonkeyPatch, fake_keyring: _FakeKeyring
 ) -> None:
    monkeypatch.delenv("MYDEEPAGENT_OPENROUTER_API_KEY", raising=False)
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
    monkeypatch.setenv("MYDEEPAGENT_OPENROUTER_API_KEY", "sk-env-mydeepagent")
    config = load_config(openrouter_api_key=None)
    assert resolve_openrouter_api_key(config) == "sk-env-mydeepagent"
 def test_resolves_from_openrouter_env_fallback(
    monkeypatch: pytest.MonkeyPatch, fake_keyring: _FakeKeyring
 ) -> None:
    monkeypatch.delenv("MYDEEPAGENT_OPENROUTER_API_KEY", raising=False)
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
    monkeypatch.setenv("OPENROUTER_API_KEY", "sk-env-fallback")
    config = load_config(openrouter_api_key=None)
    assert resolve_openrouter_api_key(config) == "sk-env-fallback"
 def test_resolves_from_keyring(monkeypatch: pytest.MonkeyPatch, fake_keyring: _FakeKeyring) -> None:
    monkeypatch.delenv("MYDEEPAGENT_OPENROUTER_API_KEY", raising=False)
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
    keys_module.set_api_key("openrouter", "sk-keyring-key")
    config = load_config(openrouter_api_key=None)
    assert resolve_openrouter_api_key(config) == "sk-keyring-key"
 def test_raises_backend_auth_failed_when_all_missing(
    monkeypatch: pytest.MonkeyPatch, fake_keyring: _FakeKeyring
 ) -> None:
    monkeypatch.delenv("MYDEEPAGENT_OPENROUTER_API_KEY", raising=False)
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
    config = load_config(openrouter_api_key=None)
    with pytest.raises(MyDeepAgentError) as exc_info:
        resolve_openrouter_api_key(config)
    assert exc_info.value.code == "backend_auth_failed"
 def test_config_takes_priority_over_env(
    monkeypatch: pytest.MonkeyPatch, fake_keyring: _FakeKeyring
 ) -> None:
    monkeypatch.setenv("MYDEEPAGENT_OPENROUTER_API_KEY", "sk-env-should-lose")
    config = load_config(openrouter_api_key="sk-config-wins")
    assert resolve_openrouter_api_key(config) == "sk-config-wins"
--- a/my-deepagent/tests/unit/test_session.py
+++ b/my-deepagent/tests/unit/test_session.py
@@ -62,7 +62,7 @@ def _minimal_permission_spec(
    return FilesystemPermissionSpec(
        operations=tuple(operations or ["read"]),
        paths=tuple(paths or ["/**"]),
-        mode=mode,  # type: ignore[arg-type]
+        mode=mode,
    )
@@ -223,7 +223,10 @@ def test_subagent_to_dict_optional_tools_included_when_set() -> None:
    sub = _minimal_subagent(allowed_tools=["read_file", "write_file"])
    d = _subagent_to_dict(sub)
    assert "tools" in d
-    assert d["tools"] == ["read_file", "write_file"]
+    # _subagent_to_dict serializes allowed_tools as a list[str]; SubAgent TypedDict
    # widens the tools type to include BaseTool/Callable, hence the cast for mypy.
    tools_list: list[Any] = list(d["tools"])
    assert tools_list == ["read_file", "write_file"]
 def test_subagent_to_dict_no_tools_key_when_empty() -> None:
--- a/my-deepagent/tests/unit/test_slash.py
+++ b/my-deepagent/tests/unit/test_slash.py
@@ -0,0 +1,129 @@
 """Unit tests for slash.py — parse_slash + SlashRegistry."""
 from __future__ import annotations
 import pytest
 from my_deepagent.slash import SlashParsed, SlashRegistry, parse_slash
 # ---------------------------------------------------------------------------
 # parse_slash
 # ---------------------------------------------------------------------------
 def test_parse_quit() -> None:
    result = parse_slash("/quit")
    assert result is not None
    assert result.name == "quit"
    assert result.args == ()
    assert result.raw == "quit"
 def test_parse_agent_with_arg() -> None:
    result = parse_slash("/agent code-reviewer")
    assert result is not None
    assert result.name == "agent"
    assert result.args == ("code-reviewer",)
 def test_parse_model_with_slash_in_arg() -> None:
    result = parse_slash("/model anthropic/claude")
    assert result is not None
    assert result.name == "model"
    assert result.args == ("anthropic/claude",)
 def test_parse_plain_text_returns_none() -> None:
    assert parse_slash("hello world") is None
 def test_parse_empty_string_returns_none() -> None:
    assert parse_slash("") is None
 def test_parse_bare_slash_gives_empty_name() -> None:
    result = parse_slash("/")
    assert result is not None
    assert result.name == ""
    assert result.args == ()
    assert result.raw == ""
 def test_parse_uppercase_normalized_to_lower() -> None:
    result = parse_slash("/QUIT")
    assert result is not None
    assert result.name == "quit"
 def test_parse_spaced_slash_command() -> None:
    result = parse_slash("/  spaced  ")
    # body after strip of "/  spaced  " → body = "spaced" (strip on body)
    assert result is not None
    assert result.name == "spaced"
    assert result.args == ()
 # ---------------------------------------------------------------------------
 # SlashRegistry
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_registry_register_and_dispatch_returns_handler_value() -> None:
    reg = SlashRegistry()
    calls: list[str] = []
    async def handler(cmd: SlashParsed) -> bool:
        calls.append(cmd.name)
        return False
    reg.register("foo", handler, help="test help")
    result = await reg.dispatch(SlashParsed(name="foo", args=(), raw="foo"))
    assert result is False
    assert calls == ["foo"]
@pytest.mark.asyncio
 async def test_registry_unknown_name_returns_false() -> None:
    reg = SlashRegistry()
    result = await reg.dispatch(SlashParsed(name="nonexistent", args=(), raw="nonexistent"))
    assert result is False
@pytest.mark.asyncio
 async def test_registry_handler_returning_true_propagates() -> None:
    reg = SlashRegistry()
    async def quit_handler(cmd: SlashParsed) -> bool:
        return True
    reg.register("quit", quit_handler, help="exit")
    result = await reg.dispatch(SlashParsed(name="quit", args=(), raw="quit"))
    assert result is True
 def test_registry_names_sorted() -> None:
    reg = SlashRegistry()
    async def noop(cmd: SlashParsed) -> bool:
        return False
    reg.register("zebra", noop)
    reg.register("apple", noop)
    reg.register("mango", noop)
    assert reg.names == ["apple", "mango", "zebra"]
 def test_registry_help_for_and_all_help() -> None:
    reg = SlashRegistry()
    async def noop(cmd: SlashParsed) -> bool:
        return False
    reg.register("quit", noop, help="exit the REPL")
    reg.register("help", noop, help="show commands")
    assert reg.help_for("quit") == "exit the REPL"
    assert reg.help_for("unknown") == ""
    pairs = dict(reg.all_help())
    assert pairs["quit"] == "exit the REPL"
    assert pairs["help"] == "show commands"
--- a/my-deepagent/uv.lock
+++ b/my-deepagent/uv.lock
@@ -1129,6 +1129,7 @@ dev = [
    { name = "pytest" },
    { name = "pytest-asyncio" },
    { name = "pytest-httpx" },
    { name = "pytest-timeout" },
    { name = "respx" },
    { name = "ruff" },
    { name = "types-jsonschema" },
@@ -1169,6 +1170,7 @@ dev = [
    { name = "pytest", specifier = ">=8.3" },
    { name = "pytest-asyncio", specifier = ">=0.24" },
    { name = "pytest-httpx", specifier = ">=0.34" },
    { name = "pytest-timeout", specifier = ">=2.4.0" },
    { name = "respx", specifier = ">=0.21" },
    { name = "ruff", specifier = ">=0.8" },
    { name = "types-jsonschema", specifier = ">=4.26.0.20260508" },
@@ -1597,6 +1599,18 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/1e/55/1fa65f8e4fceb19dd6daa867c162ad845d547f6058cd92b4b02384a44777/pytest_httpx-0.36.2-py3-none-any.whl", hash = "sha256:d42ebd5679442dc7bfb0c48e0767b6562e9bc4534d805127b0084171886a5e22", size = 20315, upload-time = "2026-04-09T13:57:18.587Z" },
 ]
 [[package]]
 name = "pytest-timeout"
 version = "2.4.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "pytest" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/ac/82/4c9ecabab13363e72d880f2fb504c5f750433b2b6f16e99f4ec21ada284c/pytest_timeout-2.4.0.tar.gz", hash = "sha256:7e68e90b01f9eff71332b25001f85c75495fc4e3a836701876183c4bcfd0540a", size = 17973, upload-time = "2025-05-05T19:44:34.99Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/fa/b6/3127540ecdf1464a00e5a01ee60a1b09175f6913f0644ac748494d9c4b21/pytest_timeout-2.4.0-py3-none-any.whl", hash = "sha256:c42667e5cdadb151aeb5b26d114aff6bdf5a907f176a007a30b940d3d865b5c2", size = 14382, upload-time = "2025-05-05T19:44:33.502Z" },
 ]
 [[package]]
 name = "python-discovery"
 version = "1.3.1"