dev-puppeteer/my-deepagent/CHANGELOG.md
chungyeong f8335e4515 feat(my-deepagent): v0.3 PR #1 — interactive session persistence + LangGraph saver wiring
Foundation for v0.3. Both the REPL and the GUI now persist long-running
conversations, so a session can be resumed from anywhere via
`mydeepagent --session <id>` or `GET /api/sessions/{id}`. Equivalent to
Claude Code's `claude --resume`.

Data model
- `persistence/models.py`:
  - New `MessageRow` table: (session_id, seq) UNIQUE, role/content/
    tool_calls/token_count/is_summary/archived/ts. The LangGraph checkpoint
    is the source of truth; this table is a fast-lookup mirror for the
    GUI/CLI. There is no mechanism to rebuild it on divergence (simplicity
    first).
  - 8 columns added to `InteractiveSessionRow`:
      total_input_tokens, total_output_tokens (to be made precise with tiktoken in PR #2),
      model, project_key (sha256(realpath(repo_path))[:16]),
      title (first 50 chars of the first user msg), plan_mode (PR #5),
      parent_session_id (PR #6), depth (PR #6, sub-agent depth ≤ 3).
- `alembic/versions/684e70f4536a_*.py` (new):
  - Uses `op.batch_alter_table` to work around SQLite's lack of ALTER
    support for constraints. Postgres uses native DDL.
  - The drop statements that autogenerate proposed for the LangGraph tables
    (`checkpoints`, etc.) were intentionally removed
    (langgraph-checkpoint-postgres manages them itself).
  - `server_default` is set so existing rows stay valid.

CLI
- `cli/interactive.py`:
  - On REPL entry, opens the `get_checkpointer_ctx(config.database_url)`
    context and keeps it open for the whole REPL. `build_agent(..., checkpointer=saver)`
    wires the LangGraph saver into deepagents. The CostMiddleware /
    AuditToolMiddleware from v0.2 PR #10 are preserved.
  - `_invoke_and_stream` explicitly inserts a MessageRow before and after
    ainvoke (user → ainvoke → assistant). It also updates last_message_at,
    accumulates total_*_tokens, and sets the title automatically from the
    first user msg.
  - Introduces `InteractiveSession.thread_suffix`. Calling /model, /agent,
    or /clear bumps the suffix, so the LangGraph thread_id becomes
    `{session_id}:{suffix}` and a fresh deepagents context starts (same
    pattern as compaction; reused in PR #2).
  - New `--session <id|prefix>` option: loads an existing row (rejected if
    ended) or inserts a new row (AgentPersonaRow upsert + project_key set).
  - `/clear` slash command updated: sets messages.archived=True and starts a
    new thread. The session itself stays alive and remains viewable via
    `sessions show <id> --all`.
- `cli/sessions.py` (new): `mydeepagent sessions list/show/resume/end`.
  `show <id> [--all]` includes archived messages. 6+ char prefix matching,
  with an explicit error on ambiguous prefixes.
- `cli/main.py`: --session option + sessions subcommand + extended
  interactive_command signature.

HTTP API
- `api/models.py`: new SessionSummary / MessageInfo / SessionDetail /
  CreateSessionRequest / PostMessageRequest / SessionAck DTOs (all
  extra="forbid").
- `api/routes/sessions.py` (new):
    GET  /api/sessions?limit=&state=
    GET  /api/sessions/{id}?all=true     (last 200 messages)
    POST /api/sessions                    (persona_name, model_override, repo_path)
    POST /api/sessions/{id}/messages      (appends a user message, synchronous persist;
                                            background ainvoke to be added in the PR #7 GUI)
    GET  /api/sessions/{id}/stream        (SSE: 0.5s polling; both the last-event-id
                                            header and ?last_seq are supported)
    POST /api/sessions/{id}/end
- `api/app.py`: mounts the sessions router.

Tests
- `tests/integration/test_session_persist.py` (5 scenarios):
    1. create + post → row, messages, title, and accumulated token counts are persisted
    2. list includes all 3 newly created sessions
    3. prefix resolution + 404
    4. messages rejected after end (409)
    5. ?all=true surfaces archived messages

Gates
- ruff check + ruff format + mypy --strict: PASS (124 source files)
- pytest non-E2E: 608 PASS (25.86 s); +5 new over the 603 after v0.2 PR #3
- pytest E2E real OpenRouter on Postgres: PASS 82.07 s (within the 60–122 s
  baseline range; passes the DR-3 +20% threshold)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:06:21 +09:00

# Changelog
## [Unreleased]
### Added
- **v0.3 PR #1 — interactive session persistence + LangGraph saver wiring**.
Foundation for v0.3. Introduces the data model, CLI, and HTTP API together so
that both the REPL and the GUI can persist long-running conversations.
Equivalent to Claude Code's `claude --resume`.
- `persistence/models.py`:
- New `MessageRow` table: `(session_id, seq)` UNIQUE, role/content/
tool_calls/token_count/is_summary/archived/ts. The LangGraph checkpoint is
the source of truth and this table is a fast-lookup mirror for the GUI/CLI
(no rebuild on divergence is assumed).
- 8 columns added to `InteractiveSessionRow`: `total_input_tokens`,
`total_output_tokens`, `model`, `project_key`, `title`, `plan_mode`
(for PR #5), `parent_session_id` + `depth` (for PR #6, self FK CASCADE).
- `alembic/versions/684e70f4536a_v0_3_pr_1_session_messages_8_columns.py`
(new): uses `op.batch_alter_table` to work around SQLite's lack of ALTER
support for constraints. The drop statements that autogenerate proposed for
LangGraph's own tables (`checkpoints` / `checkpoint_writes` /
`checkpoint_blobs` / `checkpoint_migrations`) were intentionally removed
(langgraph-checkpoint-postgres manages them itself). `server_default` is set
so existing rows are safely backfilled with NULL/0/false.
- `cli/interactive.py`:
- On REPL entry, opens the `get_checkpointer_ctx(config.database_url)`
context and keeps it open for the lifetime of the REPL.
`build_agent(..., checkpointer=saver)` wires the LangGraph saver into
deepagents. The `CostMiddleware` / `AuditToolMiddleware` added in v0.2 PR #10
are preserved.
- `_invoke_and_stream` explicitly inserts a `MessageRow` before and after
ainvoke (`role=user` → ainvoke → `role=assistant`). It also updates
`last_message_at`, accumulates `total_*_tokens`, and sets `title`
automatically from the first user message (truncated to 50 characters).
- Introduces `thread_suffix` on the `InteractiveSession` class. Calling
`/model`, `/agent`, or `/clear` bumps the suffix, so the LangGraph thread_id
becomes `{session_id}:{suffix}` and a fresh deepagents context starts (same
pattern as compaction; to be reused in PR #2).
- Handles the new `--session <id|prefix>` option: loads an existing row
(rejected if `state == "ended"`) or inserts a new row (`AgentPersonaRow`
upsert + `project_key` = `sha256(realpath(repo_path))[:16]`).
- `/clear` slash command updated: sets `MessageRow.archived=True` on every
message in the current session and starts a new thread. The session itself
stays alive (viewable via `sessions show <id> --all`).
- `cli/sessions.py` (new): `mydeepagent sessions list/show/resume/end`.
`show <id> [--all]` also displays archived messages. 6+ char prefix matching,
with an explicit error on ambiguous prefixes.
- `cli/main.py`: `--session` option + `sessions` subcommand + extended
`interactive_command` signature.
- `api/models.py`: new `SessionSummary` / `MessageInfo` / `SessionDetail` /
`CreateSessionRequest` / `PostMessageRequest` / `SessionAck` DTOs
(all `extra="forbid"`).
- `api/routes/sessions.py` (new):
- `GET /api/sessions?limit=&state=`: list
- `GET /api/sessions/{id}?all=true`: detail + last 200 messages
- `POST /api/sessions`: creates a new session (persona_name / model_override /
repo_path)
- `POST /api/sessions/{id}/messages`: appends a user message (synchronous
persist only within the scope of v0.3 PR #1; background ainvoke to be added
in the PR #7 Web GUI)
- `GET /api/sessions/{id}/stream`: SSE. 0.5s polling; both the `last-event-id`
header and `?last_seq=` are supported. Sends `event: done` and closes on
termination.
- `POST /api/sessions/{id}/end`: marks the session as ended.
- `api/app.py`: mounts the sessions router (`/api/sessions`).
- `tests/integration/test_session_persist.py` (new, 5 cases): create +
post + persist / list membership / prefix resolution + 404 / message
rejection after end / surfacing archived messages via ?all=true.
- Regression: ruff / mypy --strict / pytest 608 PASS / E2E real OpenRouter on
Postgres 82.07s (within the 60–122s baseline range).
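The identifiers named in the v0.3 PR #1 entry can be illustrated with a minimal sketch. The function names, the UTF-8 encoding of the path, the hex digest, and the integer suffix are assumptions made for illustration; only the formulas `sha256(realpath(repo_path))[:16]`, `{session_id}:{suffix}`, and the 50-character title truncation come from the entry itself.

```python
import hashlib
import os


def project_key(repo_path: str) -> str:
    """First 16 hex chars of sha256 over the canonical repo path (encoding assumed)."""
    canonical = os.path.realpath(repo_path)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def thread_id(session_id: str, suffix: int) -> str:
    """LangGraph thread id; bumping the suffix starts a fresh context."""
    return f"{session_id}:{suffix}"


def make_title(first_user_message: str, limit: int = 50) -> str:
    """Title taken from the first user message, truncated to `limit` characters."""
    return first_user_message[:limit]


# Hypothetical usage: /clear would bump the suffix from 0 to 1.
before = thread_id("3f2a9c", 0)
after = thread_id("3f2a9c", 1)
```

Because the key is computed over the resolved path, two different spellings of the same directory (for example a relative path and its absolute form) map to the same `project_key`.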
### Fixed
- **bugfix(engine): two production bugs surfaced by manual Web-GUI verification
(`mydeepagent serve` + real OpenRouter run via /api/runs)**.
1. `_compose_final_report` wrote the report files to disk and returned the
path but **did not update `RunRow.final_report_path`**. CLI users
received the path via the `RunResult` return value and never noticed;
API/GUI users read from the DB and got `null`. Fix: after writing the
JSON / MD report, update the RunRow column in a fresh session inside
`_compose_final_report` itself so both consumers see the path.
2. `_run_approval_gate` built `idempotency_key = f"{phase_key}:{artifact_name}"`
for `approval_requests`. The column has a UNIQUE constraint, so the
**second run** of the same workflow against a non-empty DB raised
`asyncpg.UniqueViolationError` on the first approval gate — the
background task died, the run stayed `executing` forever, the GUI
never updated. Confirmed by reading the server log after the GUI got
stuck on a 2nd run. Fix: prefix with `run_id`
(`f"{run_id}:{phase_key}:{artifact_name}"`) so each run has its own
key namespace; same-run replay still collides idempotently as intended.
E2E + integration tests did not catch this because each test uses a
fresh sqlite tmp_path or per-test Postgres DB — no second run ever
hit the same table.
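The second bug above can be reproduced in miniature. The sketch below uses a plain set as a stand-in for the UNIQUE column on `approval_requests`; the `UniqueColumn` class and the sample phase and artifact names are hypothetical, while the two key formats are the ones quoted in the entry.

```python
from uuid import UUID, uuid4


def old_key(phase_key: str, artifact_name: str) -> str:
    # Pre-fix format: identical for every run of the same workflow.
    return f"{phase_key}:{artifact_name}"


def new_key(run_id: UUID, phase_key: str, artifact_name: str) -> str:
    # Post-fix format: each run gets its own key namespace.
    return f"{run_id}:{phase_key}:{artifact_name}"


class UniqueColumn:
    """Stand-in for a column with a UNIQUE constraint."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def insert(self, key: str) -> bool:
        """Return False where a real database would raise a unique violation."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


run_a, run_b = uuid4(), uuid4()

old_column = UniqueColumn()
old_first = old_column.insert(old_key("spec", "spec.json"))   # first run
old_second = old_column.insert(old_key("spec", "spec.json"))  # second run collides

new_column = UniqueColumn()
new_first = new_column.insert(new_key(run_a, "spec", "spec.json"))
new_second = new_column.insert(new_key(run_b, "spec", "spec.json"))  # distinct run: accepted
new_replay = new_column.insert(new_key(run_a, "spec", "spec.json"))  # same-run replay: collides
```

The same-run replay still collides, which is the idempotency behavior the fix is meant to keep.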
### Added
- **v0.2 PR #3 — FastAPI + SSE + minimal Web GUI (`mydeepagent serve`)**.
Localhost Web UI for run start / list / detail / resume / abort + live
event stream. Closes the v0.1.0 gap "no GUI" from the user's first
session requirements. No auth, no multi-tenant; single uvicorn worker
(per DR-3).
- `pyproject.toml`: runtime deps `fastapi>=0.115`,
`uvicorn[standard]>=0.30`, `sse-starlette>=2.1` (8 transitive deps).
- `src/my_deepagent/api/` (new tree):
- `app.py`: `create_app(config=None) -> FastAPI` factory. lifespan
stores `db`/`config`/`personas`/`workflows` on `app.state`.
`CORSMiddleware(allow_origin_regex=r"^http://localhost(:\d+)?$")`.
Static frontend mounted under `/static`, plus `/`, `/{page}.html`.
- `models.py` — pydantic v2 DTOs (`RunSummary`, `RunDetail`,
`PhaseInfo`, `ArtifactInfo`, `EventInfo`, `StartRunRequest`,
`StartRunResponse`, `PersonaSummary`, `WorkflowSummary`,
`BudgetSummary`, `BudgetScopeEntry`). All `extra="forbid"` so typos
surface at 422 deserialization time.
- `deps.py`: `get_db`, `get_config`, `get_personas`, `get_workflows`,
`seed_root`. Annotated[...] wrappers in each route module.
- `runner.py`: `start_new_run` / `start_resume` /
`is_running`. Pre-allocates a UUID and passes it to
`WorkflowEngine.run(pre_allocated_run_id=...)` so the route can
return the run_id before the phase loop starts. In-memory
`_tasks: dict[UUID, asyncio.Task]` prevents GC of in-flight tasks.
- `sse.py`: `run_events_stream(db, run_id, last_event_id)`.
0.5 s polling against `run_events.seq > last_event_id`; emits
`ServerSentEvent` per row; sends `event: done` and HTTP-200-closes
when run reaches terminal state.
- `routes/runs.py` — GET `/api/runs?limit=&state=`, GET `/api/runs/{id}`,
POST `/api/runs` (start), POST `/api/runs/{id}/resume`,
POST `/api/runs/{id}/abort`, GET `/api/runs/{id}/events` (SSE).
`Last-Event-ID` HTTP header honored alongside `?last_event_id=`.
- `routes/personas.py` — GET `/api/personas`.
- `routes/workflows.py` — GET `/api/workflows`.
- `routes/budget.py` — GET `/api/budget` (day / runs / personas
buckets with cap + warn thresholds from `Config`).
- `src/my_deepagent/cli/serve.py` (new) — `mydeepagent serve [--host
127.0.0.1] [--port 8000]`. Loud stderr warning when host is not
loopback (the API is unauthenticated). Uses uvicorn factory form +
forces `workers=1`.
- `src/my_deepagent/cli/main.py` — `serve` command registered.
- `src/my_deepagent/engine.py` — `WorkflowEngine.run` gained
`pre_allocated_run_id: UUID | None = None` so the FastAPI runner can
return the run_id immediately. Default behavior unchanged.
- `static/` (new) — vanilla HTML/JS/CSS, no build system:
- `index.html`: run list + budget (data-page="index")
- `new.html`: new-run form (workflow select, repo path, requirements,
per-role persona override) (data-page="new")
- `run.html`: run detail + live SSE events + resume/abort buttons
(data-page="run")
- `app.js` — fetch + EventSource. **XSS policy hardcoded at the top
of the file**: `element.textContent` only, `innerHTML` /
`insertAdjacentHTML` / `outerHTML` forbidden.
- `style.css` — dark theme, single file.
- Tests (new):
- `tests/integration/test_api_read.py` — 5 cases (list empty, get 404,
personas seed count, workflows seed, budget empty).
- `tests/integration/test_api_write.py` — 5 cases (missing template
400, extra field 422, resume 404, abort 404, mock-runner happy path).
- `tests/integration/test_api_sse.py` — 1 case: seed terminal run +
events, drain stream, assert types present and stream closes.
- `tests/integration/test_api_static.py` — 5 cases: index/new/run
HTML 200, app.js content-type + XSS-policy substring, style.css
content-type.
All tests use `httpx.ASGITransport` + `app.router.lifespan_context`
(httpx does not auto-trigger FastAPI lifespan) and sqlite tmp_path.
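The polling loop described for `sse.py` can be sketched without any web framework. This is an assumption-laden illustration, not the project's code: `poll_events`, `Event`, and the two callables are hypothetical names, and the real implementation reads `run_events` from the database and emits `ServerSentEvent` objects rather than dicts.

```python
import asyncio
from collections.abc import AsyncIterator, Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int
    type: str


async def poll_events(
    fetch_after: Callable[[int], list[Event]],
    is_terminal: Callable[[], bool],
    last_event_id: int = 0,
    interval_s: float = 0.5,
) -> AsyncIterator[dict[str, str]]:
    """Yield events with seq > cursor, then a final `done` once the run is terminal."""
    cursor = last_event_id
    while True:
        # Check the terminal flag before fetching so that events written just
        # before the run finished are still drained on the last pass.
        finished = is_terminal()
        for event in fetch_after(cursor):
            cursor = event.seq
            yield {"id": str(event.seq), "event": event.type}
        if finished:
            yield {"event": "done"}
            return
        await asyncio.sleep(interval_s)


# In-memory stand-in for the run_events table.
STORED = [Event(1, "run.resumed"), Event(2, "phase.skipped"), Event(3, "run.resumed")]


async def collect() -> list[dict[str, str]]:
    def fetch(cursor: int) -> list[Event]:
        return [e for e in STORED if e.seq > cursor]

    # A client reconnecting with Last-Event-ID: 1 only receives seq 2 and 3.
    return [m async for m in poll_events(fetch, lambda: True, last_event_id=1, interval_s=0)]


messages = asyncio.run(collect())
```

Using the last delivered `seq` as the cursor is what makes reconnects with `Last-Event-ID` cheap: the stream resumes where it stopped instead of replaying the whole run.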
- **v0.2 PR #2b — `mydeepagent runs resume <id>` real implementation**.
Closes the v0.1.0 KNOWN LIMIT where resume was an exit-2 stub. Reuses
v0.2 PR #2a's LangGraph wiring + sweep_orphan_runs's DB state machine,
no Temporal (DR-3).
- `src/my_deepagent/engine.py`:
- New `WorkflowEngine.resume(run_id)` async method. Loads `RunRow`,
rejects terminal states with `MyDeepAgentError.human_required("run_already_terminal")`,
reloads `worktree_root` + `WorkflowTemplate` (via `_reload_template`) +
bindings (via `_reload_bindings`) from DB. Does **not** call
`bind_personas` again — locks in the original binding so consent /
pool changes don't silently shift roles.
- New `_execute_run` helper (shared phase loop) extracted from `run()`.
Skips already-`completed` phases (emits `phase.skipped` event) and
re-executes the rest. Both `run` (new) and `resume` dispatch through
it.
- New helpers: `_get_run_or_raise`, `_reload_template`,
`_reload_bindings` (rebuilds `{role_id: Binding}` from
`run_bindings` ⨝ `agent_personas`; corrupt persona rows are logged
and skipped, surfacing as `run_metadata_missing` if no bindings remain),
`_get_completed_phase_keys`.
- New `RunEventType.RUN_RESUMED` and `RunEventType.PHASE_SKIPPED` are
now actually emitted (the enum members existed already from v0.1.0).
- `src/my_deepagent/cli/runs.py` `_runs_resume_async`: stub → real impl.
Validates run exists + non-terminal, loads seed personas + artifact
schemas (`docs/schemas/`), constructs `WorkflowEngine` with an
"abort-on-new-approval" callback, calls `engine.resume(UUID(id))`,
prints final state + report path. Catches `MyDeepAgentError` and prints
a red error with exit 1.
- `tests/integration/test_resume.py` (new, 5 scenarios):
1. 2-phase workflow: phase 1 succeeds, phase 2 fails → flip run row
back to executing → resume → phase 2 completes; assert phase 1 was
skipped (`phase.skipped` event present) and `run.resumed` event emitted.
2. Terminal run → `resume()` raises `MyDeepAgentError(code="run_already_terminal")`.
3. Unknown run id → raises `MyDeepAgentError(code="run_not_found")`.
4. RunBindingRow rows missing → raises `MyDeepAgentError(code="run_metadata_missing")`.
5. workflow_templates.definition is malformed → raises `MyDeepAgentError(code="template_load_failed")`.
- E2E real OpenRouter regression PASS 78.52 s (baseline 71–122 s);
within DR-3 acceptance threshold (+20%).
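The skip-and-re-execute behavior of the shared phase loop can be pictured with a small planning function. `plan_resume` is a hypothetical helper written for this illustration; the actual `_execute_run` also persists state and emits events, which is omitted here.

```python
from collections.abc import Iterable


def plan_resume(phase_keys: Iterable[str], completed: set[str]) -> list[tuple[str, str]]:
    """Pair each phase with either a `phase.skipped` event or an `execute` action."""
    plan: list[tuple[str, str]] = []
    for key in phase_keys:
        action = "phase.skipped" if key in completed else "execute"
        plan.append((action, key))
    return plan


# A spec -> review -> verify run that stopped after the first phase.
plan = plan_resume(["spec", "review", "verify"], completed={"spec"})
```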
- **v0.2 PR #2a — LangGraph `AsyncPostgresSaver` engine wiring** (foundation
for `runs resume`). v0.2 PR #1 added the dependency; this commit actually
uses it.
- `src/my_deepagent/engine.py`:
- `WorkflowEngine.__init__` accepts `checkpointer_url: str | None` (defaults
to `config.database_url`).
- New `_maybe_open_saver` async context: opens `get_checkpointer_ctx` for
`postgresql{,+asyncpg,+psycopg}://` URLs, yields `None` for `sqlite+aiosqlite://`
(test affordance — production always Postgres per DR-2 / DR-3).
- `WorkflowEngine.run()` opens the saver **once per run** and shares it
across all phases via `self._saver` — opening per-phase would re-connect
5+ times for no isolation gain (checkpoints are keyed by `thread_id`, not
saver instance).
- `_invoke_agent_until_artifact` forwards `checkpointer=self._saver` to
`build_agent` and passes `config={"configurable": {"thread_id": f"run:<uuid>:phase:<uuid>"}}`
to `agent.ainvoke`. Same `thread_id` format already used by
`LlmCallRow.thread_id` (cost ledger), so one key namespace covers both.
- `tests/integration/test_engine_checkpointer_wiring.py` (new):
1. **Contract 1 — engine wiring**: `build_agent` receives a non-None saver;
`agent.ainvoke` receives `config.configurable.thread_id` in the
expected `run:<uuid>:phase:<uuid>` format.
2. **Contract 2 — LangGraph thread isolation**: two distinct `thread_id`s
write independent rows in the auto-created `checkpoints` table; aput /
aget round-trip preserves per-thread identity (sanity check against
future deepagents wrap regressions).
- `tests/integration/test_engine.py` — 5 mock-agent tests: fake `_ainvoke`
signature widened with `**_kwargs` to accept the new `config=` arg.
- E2E real OpenRouter regression PASS 75.99 s (baseline 71–122 s); within
DR-3 acceptance threshold (+20%).
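The per-phase `thread_id` format from this entry is easy to show in isolation. The helper name `phase_thread_config` and the validation regex are assumptions; the `run:<uuid>:phase:<uuid>` format and the `{"configurable": {"thread_id": ...}}` shape are taken from the entry.

```python
import re
from uuid import UUID

# Loose check for the run:<uuid>:phase:<uuid> shape (36-char canonical UUIDs).
THREAD_ID_RE = re.compile(r"^run:[0-9a-f-]{36}:phase:[0-9a-f-]{36}$")


def phase_thread_config(run_id: UUID, phase_id: UUID) -> dict[str, dict[str, str]]:
    """Build the config mapping that would be passed as `config=` to `agent.ainvoke`."""
    return {"configurable": {"thread_id": f"run:{run_id}:phase:{phase_id}"}}


run_id = UUID("11111111-1111-1111-1111-111111111111")
phase_id = UUID("22222222-2222-2222-2222-222222222222")
config = phase_thread_config(run_id, phase_id)
```

Since checkpoints are keyed by `thread_id`, one saver shared across phases still keeps each phase's state separate.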
- **v0.2 PR #1 — Postgres migration**: production backing store switched from
SQLite to PostgreSQL 16 ahead of M8-Py (FastAPI) per DR-2.
- `pyproject.toml`: `asyncpg>=0.30` + `psycopg[binary]>=3.2` +
`langgraph-checkpoint-postgres>=2.0.0` added to runtime; `aiosqlite>=0.20`
moved to `[dependency-groups].dev` (test-only); `langgraph-checkpoint-sqlite`
removed.
- `src/my_deepagent/persistence/db.py`: dialect-aware connect listener —
SQLite still gets `WAL` + `busy_timeout=5000` + `foreign_keys=ON`, Postgres
gets `SET TIME ZONE 'UTC'`. New `Database.dialect_name` property + `drop_schema`
method for tests.
- `src/my_deepagent/persistence/checkpointer.py`: `SqliteSaver` →
`AsyncPostgresSaver`. API is now async (`async with`) and takes a
connection string; SQLAlchemy URL prefixes (`postgresql+asyncpg://`,
`postgresql+psycopg://`) are auto-stripped to a plain libpq DSN. New
`_to_psycopg_dsn` helper covered by 4 unit tests.
- `src/my_deepagent/persistence/upsert.py` (new): `insert_for(session)` —
dialect-aware UPSERT helper. Picks `postgresql.insert` or `sqlite.insert`
based on the bound engine's dialect. Replaces 5 hardcoded `sqlite_insert`
call sites in `budget.py`, `recovery.py`, and `cli/doctor.py`.
- `src/my_deepagent/config.py`: `database_url` default switched from
`sqlite+aiosqlite:///<data_dir>/database.sqlite3` to
`postgresql+asyncpg://devflow:devflow@localhost:55432/mydeepagent`. The v3
`devflow` DB is preserved untouched; v4 lives in a fresh `mydeepagent` DB.
- `src/my_deepagent/persistence/models.py`: `RunRow.__table_args__` partial
unique index now declares **both** `postgresql_where=` and `sqlite_where=`
so the index is partial on both dialects.
- `src/my_deepagent/cli/doctor.py`: check 8 (`disk+db`) is now dialect-aware
— Postgres path runs `SELECT 1` (pg_isready equivalent: proves
reachability + auth + DB exists); SQLite path keeps
`PRAGMA integrity_check`. Doctor docstring updated.
- `alembic/env.py`: env-aware URL resolution — `MYDEEPAGENT_DATABASE_URL` >
`DATABASE_URL` > Postgres default. Async driver prefixes
(`+asyncpg`, `+aiosqlite`) are mapped to the sync equivalents alembic
needs (`+psycopg`, plain `sqlite`).
- `alembic/versions/9f2a6c79667e_v0_2_baseline_schema_postgres.py` (new):
fresh baseline autogenerated against live Postgres. Old SQLite baseline
`79945fdc2649` + partial-index migration `839f2233e346` deleted.
- `tests/conftest.py` (new): `pg_db_url` async fixture. Creates a fresh
Postgres database per test (against docker-compose `devflow-postgres`)
via the maintenance `postgres` DB; drops on teardown after terminating
any lingering backends. Used by the E2E suite and the new checkpointer
tests.
- `tests/integration/test_checkpointer.py`: rewritten for AsyncPostgresSaver
(4 pure DSN-converter tests + 3 async context-manager tests).
- `tests/integration/test_e2e_workflow.py`: switched from `sqlite+aiosqlite`
tmp_path to `pg_db_url`. Real OpenRouter E2E now exercises the production
Postgres path end-to-end (~122 s, ~$0.05/run).
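The DSN conversion mentioned for `persistence/checkpointer.py` might look roughly like the following. This is a sketch of the idea only: the real `_to_psycopg_dsn` may handle more cases, and the error behavior for non-Postgres URLs is an assumption.

```python
def to_psycopg_dsn(url: str) -> str:
    """Strip a SQLAlchemy driver suffix so the URL is a plain libpq-style DSN."""
    scheme, separator, rest = url.partition("://")
    if not separator:
        raise ValueError(f"not a URL: {url!r}")
    base, _, _driver = scheme.partition("+")
    if base != "postgresql":
        # Assumed behavior: reject anything that is not a Postgres URL.
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return f"postgresql://{rest}"


dsn = to_psycopg_dsn("postgresql+asyncpg://devflow:devflow@localhost:55432/mydeepagent")
```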
### Migration trigger (per DR-2)
- The bound is *two concurrent writers* on `runs` / `run_phases` / `llm_calls`.
Today the CLI is the only writer — but M8-Py (FastAPI) introduces a second
one, and SQLite WAL allows only a single concurrent writer. Doing the move
*before* M8-Py lands gives the test surface time to harden.
- Recovery: previous SQLite database at
`~/Library/Application Support/my-deepagent/database.sqlite3` (macOS) /
`$XDG_DATA_HOME/my-deepagent/database.sqlite3` is **not migrated** —
v0.1.0 was the only release that wrote to it and v0.2 starts a fresh
history. Set `MYDEEPAGENT_DATABASE_URL=sqlite+aiosqlite:///<path>` to
read the legacy file if needed.
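Pointing the tooling at the legacy SQLite file relies on the URL resolution order described for `alembic/env.py`. A minimal sketch of that order and of the async-to-sync driver mapping follows; the function names and the mapping table are assumptions, while the precedence (`MYDEEPAGENT_DATABASE_URL` > `DATABASE_URL` > Postgres default) and the default URL come from the entries above.

```python
from collections.abc import Mapping

DEFAULT_URL = "postgresql+asyncpg://devflow:devflow@localhost:55432/mydeepagent"

# Async driver suffix -> sync suffix that alembic can use.
SYNC_DRIVER = {"+asyncpg": "+psycopg", "+aiosqlite": ""}


def resolve_database_url(env: Mapping[str, str]) -> str:
    """Project-specific variable wins, then the generic one, then the default."""
    return env.get("MYDEEPAGENT_DATABASE_URL") or env.get("DATABASE_URL") or DEFAULT_URL


def to_sync_url(url: str) -> str:
    """Swap an async driver suffix in the scheme for its sync equivalent."""
    scheme, separator, rest = url.partition("://")
    for async_suffix, sync_suffix in SYNC_DRIVER.items():
        if scheme.endswith(async_suffix):
            scheme = scheme[: -len(async_suffix)] + sync_suffix
            break
    return f"{scheme}{separator}{rest}"


legacy_env = {
    "MYDEEPAGENT_DATABASE_URL": "sqlite+aiosqlite:///tmp/database.sqlite3",
    "DATABASE_URL": "postgresql+asyncpg://other/db",
}
legacy_sync = to_sync_url(resolve_database_url(legacy_env))
default_sync = to_sync_url(resolve_database_url({}))
```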
### Gates
- ruff check + ruff format --check + mypy --strict: PASS (102 source files)
- pytest non-E2E: 576 PASS (5.46 s) — bulk on sqlite tmp_path, new
checkpointer suite on Postgres `pg_db_url`
- pytest E2E real OpenRouter: 1 PASS (122.93 s) on Postgres backend
## [0.1.0] - 2026-05-16
First tagged milestone of the Python rewrite. The pre-Python-rewrite TypeScript
monorepo has been removed (commit `0e61b2d`); recovery is available via the
`pre-python-rewrite` tag at `c9fed71`.
### Added
- Step 15 — End-to-end real OpenRouter integration: `tests/integration/test_e2e_workflow.py`
runs `spec-and-review@1` workflow (spec → review → verify) end-to-end against real
OpenRouter DeepSeek in ~76s for ~$0.05 per run. `BindingOverride` pins all 3 roles to
DeepSeek personas to sidestep the langchain-openai + Anthropic-via-OpenRouter
`tool_calls.args` JSON-string ValidationError (known v0.1.0 limit). New seed personas:
`openrouter-deepseek-spec-writer@1` (capabilities: spec_write, phase_planning;
max_cost_per_call_usd=0.01) and `openrouter-deepseek-code-reviewer@1` (capabilities:
code_review, evidence_check; max_cost_per_call_usd=0.01). Persona count test updated
to 12. `WorkflowEngine._build_envelope` now inlines the artifact JSON Schema directly
in the prompt so the LLM sees exact required fields. `WorkflowEngine._record_llm_call`
fills every NOT NULL `LlmCallRow` column (thread_id, persona_version, role, turn_index,
cached_tokens, reasoning_tokens, cost_usd_input/output, etc.). `CostMiddleware` now
probes both `usage_metadata` and `response_metadata.token_usage` (prompt_tokens /
completion_tokens fallback) to capture OpenAI-compatible streamed responses forwarded
by OpenRouter.
- Step 12 — Doctor full 8-check + OpenRouter pricing fetch: `mydeepagent doctor`
now runs 8 checks (python / uv / git / workspace_root / config+governance /
openrouter_api_key / openrouter_ping + pricing upsert / disk+sqlite integrity).
`mydeepagent pricing` lists the cached OpenRouter pricing matrix from the
persisted `model_pricing` table. `mydeepagent run` preview now reads from the
persisted `model_pricing` table when populated, falling back to the static seed
otherwise. 26 new tests (23 unit + 3 integration).
- Step 11 — Audit log + secret scrubbing: append-only `{state_dir}/audit.jsonl`
recording every tool call (name/args/duration/error). `AuditToolMiddleware` now
ships with a built-in JSONL recorder (`file_recorder`), attached automatically in
`WorkflowEngine` and Interactive REPL. `structlog` configured project-wide via
`my_deepagent.logging.configure_logging`, with a `_scrub_processor` that redacts
OpenRouter / Anthropic / OpenAI / LangSmith / GitHub / GitLab API keys plus
generic Bearer tokens before they reach stderr or JSON sinks. `audit.py` provides
`append_audit_record` (O_APPEND, 0o600 permissions), `read_audit_records` (with
optional limit, corrupt-line skip), and `make_audit_recorder` async factory.
19 new tests (8 audit unit, 9 logging unit, 3 audit-middleware integration).
- Step 10 — Interactive REPL: `mydeepagent` (no subcommand) launches a prompt_toolkit
REPL with `--agent` / `--model` overrides, slash commands (`/help`, `/quit`, `/exit`,
`/agent`, `/model`, `/clear`, `/stats`, `/budget`, `/runs`), file refs
(`@path/to/file.py` expansion with repo-root containment check), and
`CostMiddleware`-wired agent calls so spending is metered per interactive session.
`slash.py` implements `parse_slash` + `SlashRegistry`. `CostMiddleware` gains
`interactive_session_id` parameter. 21 new tests (10 slash unit, 5 file-ref unit,
3 CLI integration, 3 updated CLI unit).
- Step 9 — Crash recovery + concurrency: `sweep_orphan_runs(db)` in
`my_deepagent.recovery` marks non-terminal runs/phases as failed at app startup so
active-run uniqueness slots (partial unique index `ux_active_run_repo_base`) are freed;
`mydeepagent runs list/show/resume` CLI in `my_deepagent.cli.runs` (list with optional
`--state` filter, show by full UUID or 6+ char prefix, resume stub with exit-2 hint);
SIGTERM/SIGINT graceful shutdown in `WorkflowEngine` (`install_signal_handlers`,
`_on_signal`, `_force_cancel_inflight`; 30s grace then cancel in-flight tasks);
auto-sweep on `mydeepagent run` before any new phase begins. 21 new tests.
- Step 8 — Budget guardrails: `BudgetTracker` (SQLite WAL ledger via `BudgetLedgerRow`,
on_hit policy block/warn_continue/prompt, per-run + per-day + per-persona-daily
scopes) in `my_deepagent.budget`; cost preview before `mydeepagent run` (rich table
with per-phase est.) via `my_deepagent.monitoring.cost_estimator`;
`CostMiddleware` integrated with `BudgetTracker` (pre-call assert + post-call record);
`WorkflowEngine` accepts optional `budget_tracker` and `pricing` kwargs (backward-
compatible); CLI: `mydeepagent budget` (ledger), `mydeepagent stats --by model|persona|day`,
`mydeepagent costs` (alias); `--no-preview` flag on `mydeepagent run`.
28 new tests.
- Step 7 — Workflow engine: `WorkflowEngine` in `my_deepagent.engine` orchestrates
phase loop, artifact watcher (write_file/edit_file detection), jsonschema validation
with one repair retry, approval gate, and final report compose (JSON + Markdown).
`ArtifactWatcherMiddleware` in `my_deepagent.middleware.artifact_watcher` intercepts
write_file/edit_file tool calls targeting the expected artifact path.
`RunEventType` + `run_idempotency_key` in `my_deepagent.run_event` (closed event set,
deterministic idempotency keys per plan v2.0 §13.1).
`cli/run.py` exposes `mydeepagent run <workflow.yaml>`.
`tui/approval.py` prompts the user for approve/reject/request_changes/abort.
FK-safe persistence: WorkflowTemplateRow and AgentPersonaRow upserted before RunRow
to satisfy SQLite FK ordering constraints.
18 new tests: 12 engine unit/integration tests + 6 artifact watcher tests.
- Step 6 — Distribution: `mydeepagent init/login/logout/keys/doctor` CLI commands;
platformdirs-based data dirs; OS keyring (macOS Keychain / Linux Secret Service /
Windows Credential Store) for API keys via `my_deepagent.keys`; first-run
governance consent in `governance.py`; secret resolution priority
(config → env → keyring → error) in `my_deepagent.secrets`; i18n catalog
(ko / en) under `my_deepagent.i18n` controlled by `MYDEEPAGENT_LANG`.
- persistence/models.py (P0-1): partial unique index `ux_active_run_repo_base` on `runs(repo_path, base_branch) WHERE state NOT IN ('completed','failed','aborted')` — prevents duplicate active runs per repo/branch
- persistence/models.py (P0-3): FK constraints added to `RunRow.template_id` (RESTRICT), `RunBindingRow.persona_id` (RESTRICT), `InteractiveSessionRow.persona_id` (RESTRICT), `RunEventRow.phase_id` (CASCADE), `ApprovalRequestRow.phase_id` (CASCADE), `ArtifactRow.phase_id` (CASCADE), `ToolCallRow.run_id/phase_id/interactive_session_id` (CASCADE), `LlmCallRow.run_id/phase_id/interactive_session_id` (CASCADE), `PhaseFeedbackRow.run_id/phase_id` (CASCADE)
- alembic/versions/839f2233e346: new migration adding partial unique index and all FK constraints above; uses SQLite table-rebuild pattern with PRAGMA foreign_keys=OFF/ON guard
- persistence/checkpointer.py (P0-4): removed `get_checkpointer` (leaking connection helper); only `get_checkpointer_ctx` context manager is now exported
- tests/integration/test_checkpointer.py: 5 tests for checkpointer ctx lifecycle (file creation, parent dir, connection cleanup, lock-free concurrent use)
- tests/integration/test_persistence.py: 7 new P0 verification tests (active-run partial index blocks/allows, cascade-delete of phase_feedback+run_phases, RESTRICT on template delete, index exists in sqlite_master)
- tests/unit/test_session.py: full rewrite to deepagents dataclass API — FilesystemPermission attribute access (.mode/.paths/.operations), build_backend type dispatch (5 cases), _map_operations deduplication (8 cases), _spec_to_permission mapping, updated _subagent_to_dict and _resolve_openrouter_api_key tests; 47 unit tests total
- tests/integration/test_openrouter_smoke.py: real OpenRouter/DeepSeek smoke test (3 tests, ~$0.001-$0.003/run, max_tokens=50); skipped automatically when no API key is configured; validates ChatOpenAI response, usage_metadata tokens, and deepagents CompiledStateGraph end-to-end
- pyproject.toml: registered `integration` pytest marker to silence --strict-markers error
- v0.1.0 scaffolding (Step 0): src/tests/docs trees, ruff/mypy/pre-commit/alembic config
- Seed assets copied to docs/schemas/ (personas/workflows/artifacts validated)
- Core module (Step 1): config, enums, errors, hash + unit tests
- Persona / Workflow / Binding module (Step 2): pydantic schemas, YAML loaders, deterministic auto-select, override, consent store with atomic write
- Step 1 review patches (P0/P1): exception chain context suppression, classmethod LSP fix, workspace_root realpath canonicalization, config_invalid error mapping
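The append-only audit log from Step 11 above can be approximated with the standard library alone. This sketch assumes one JSON object per line and treats `limit` as "stop after this many valid records"; the real `append_audit_record` and `read_audit_records` may differ on both points.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Optional


def append_audit_record(path: Path, record: dict[str, Any]) -> None:
    """Append one JSON line; the file is created with 0o600 if it is missing."""
    line = json.dumps(record, ensure_ascii=False) + "\n"
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)


def read_audit_records(path: Path, limit: Optional[int] = None) -> list[dict[str, Any]]:
    """Read records in order, skipping lines that are not valid JSON objects."""
    records: list[dict[str, Any]] = []
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            if limit is not None and len(records) >= limit:
                break
            try:
                parsed = json.loads(raw)
            except json.JSONDecodeError:
                continue  # corrupt line: skip rather than fail the whole read
            if isinstance(parsed, dict):
                records.append(parsed)
    return records


audit_path = Path(tempfile.mkdtemp()) / "audit.jsonl"
append_audit_record(audit_path, {"name": "read_file", "duration_ms": 12, "error": None})
with open(audit_path, "a", encoding="utf-8") as handle:
    handle.write("{truncated\n")  # simulate a torn write
append_audit_record(audit_path, {"name": "write_file", "duration_ms": 30, "error": None})
all_records = read_audit_records(audit_path)
```

Opening with `O_APPEND` makes each write land at the end of the file, which is what lets several writers share one log without overwriting each other's lines.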
### Changed
- deepagents 0.6.1 LocalShellBackend + permissions conflict workaround: removed `permissions` block from all 10 seed personas; `SafetyShellMiddleware` now enforces destructive-command + secret-path policy at the tool layer for local_shell backend agents.
- `build_agent` automatically prepends `SafetyShellMiddleware` to every agent and skips `permissions` kwarg when `deepagents_backend == "local_shell"`.
- `SafetyShellMiddleware` extended with secret-path enforcement: `read_file`/`write_file`/`edit_file`/`ls` tool calls are blocked when `file_path`/`path` matches any `DENY_PATH_PATTERNS` glob (wcmatch GLOBSTAR|IGNORECASE|DOTGLOB).
- All env vars require `MYDEEPAGENT_` prefix (e.g. `MYDEEPAGENT_OPENROUTER_API_KEY`, `MYDEEPAGENT_BUDGET_DAILY_USD`). `.env.example` updated accordingly. This isolates my-deepagent's env namespace from other tools.
- Persona / Workflow / FilesystemPermission models now store list-valued fields as tuples (deep immutability — prevents post-construction mutation that would invalidate compute_hash()).
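The reason for storing list-valued fields as tuples can be shown with a frozen dataclass standing in for the pydantic models. `PersonaSketch` and this particular `compute_hash` body (sha256 over canonical JSON) are hypothetical; only the idea that mutation after construction would invalidate the hash comes from the entry.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class PersonaSketch:
    name: str
    capabilities: tuple[str, ...]  # a tuple cannot be appended to in place

    def compute_hash(self) -> str:
        """Hash a canonical JSON rendering of the fields (assumed scheme)."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


persona = PersonaSketch(name="spec-writer", capabilities=("spec_write", "phase_planning"))
digest = persona.compute_hash()

try:
    persona.name = "other"  # frozen: attribute assignment is rejected
    reassigned = True
except FrozenInstanceError:
    reassigned = False
```

With a list field, `persona.capabilities.append(...)` would succeed even on a frozen model and silently change what `compute_hash()` returns; a tuple removes that path.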
### Known limitations (v0.1.0)
- `usage_metadata` is sometimes empty for responses forwarded by OpenRouter (deepagents
wraps the underlying ChatOpenAI response so token counts may not surface). The
`CostMiddleware` recorder still fires and a `LlmCallRow` row is persisted, but
`input_tokens` / `output_tokens` may read as 0 — the E2E test treats this as a known
limit. v0.2 will probe more response shapes (raw chunks / callbacks).
- Anthropic models via OpenRouter currently fail with a `tool_calls.args` JSON-string
vs dict ValidationError inside `langchain-openai`. Workaround: pin DeepSeek personas
via `BindingOverride`. Tracking for v0.2.
- `mydeepagent runs resume <run_id>` is a stub (exit-2 hint only); workflow replay
from a half-run state is not yet implemented.
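The first limitation concerns where token counts are read from. A minimal sketch of the two-step probe described in Step 15 (`usage_metadata` first, then `response_metadata.token_usage`) is shown below; it uses a plain mapping as a stand-in for the LangChain message object, and `extract_token_usage` is a hypothetical name.

```python
from collections.abc import Mapping
from typing import Any


def extract_token_usage(message: Mapping[str, Any]) -> tuple[int, int]:
    """Return (input_tokens, output_tokens), falling back to OpenAI-style keys."""
    usage = message.get("usage_metadata") or {}
    input_tokens = usage.get("input_tokens") or 0
    output_tokens = usage.get("output_tokens") or 0
    if input_tokens or output_tokens:
        return int(input_tokens), int(output_tokens)

    # Fallback: OpenAI-compatible shape forwarded by the provider.
    token_usage = (message.get("response_metadata") or {}).get("token_usage") or {}
    return (
        int(token_usage.get("prompt_tokens") or 0),
        int(token_usage.get("completion_tokens") or 0),
    )


primary = extract_token_usage({"usage_metadata": {"input_tokens": 120, "output_tokens": 45}})
fallback = extract_token_usage(
    {
        "usage_metadata": None,
        "response_metadata": {"token_usage": {"prompt_tokens": 80, "completion_tokens": 20}},
    }
)
missing = extract_token_usage({})  # the known-limit case: both counts read as 0
```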