Files
dev-puppeteer/my-deepagent/CHANGELOG.md
chungyeong ac428ba747 fix(engine): bugs surfaced by manual Web-GUI verification
Both bugs landed during `mydeepagent serve` + real OpenRouter run via
/api/runs. Neither was caught by the test suite — each test uses a fresh
sqlite tmp_path or per-test Postgres DB, so a "second run against existing
data" code path was never exercised.

1. `_compose_final_report` did not persist `RunRow.final_report_path`
   - CLI users received the path from the RunResult return value and never
     noticed.
   - API/GUI users read from the DB → got `null` → no link to the report
     showed up in the run detail page.
   - Fix: at the end of `_compose_final_report`, open a DB session, load
     the RunRow, set `final_report_path = str(json_path)`, commit. Both
     code paths now see the path.

2. `_run_approval_gate` built `idempotency_key = f"{phase_key}:{artifact_name}"`
   - The 2nd run of the same workflow on a populated DB hit
     `approval_requests_idempotency_key_key` UNIQUE violation on the first
     approval gate (`spec:spec.json` already existed from the previous run).
   - The background task died; the run stayed `executing` forever; the GUI
     loop never updated.
   - Fix: prefix with `run_id`: `f"{run_id}:{phase_key}:{artifact_name}"`.
     Same-run replay (resume / repair retry) still collides idempotently as
     intended. ApprovalDecisionRow inherits the new key shape automatically.

Verification
- 4th /api/runs POST against the populated Postgres DB completed in ~2 min,
  spec + review + verify all `completed`, 3 artifacts schema-valid, and
  `RunRow.final_report_path` now resolves to the report .json path.
- Gates: ruff / mypy --strict / 19 engine+resume+wiring tests PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:52:35 +09:00

346 lines
24 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Changelog
## [Unreleased]
### Fixed
- **bugfix(engine): two production bugs surfaced by manual Web-GUI verification
(`mydeepagent serve` + real OpenRouter run via /api/runs)**.
1. `_compose_final_report` wrote the report files to disk and returned the
path but **did not update `RunRow.final_report_path`**. CLI users
received the path via the `RunResult` return value and never noticed;
API/GUI users read from the DB and got `null`. Fix: after writing the
JSON / MD report, update the RunRow column in a fresh session inside
`_compose_final_report` itself so both consumers see the path.
2. `_run_approval_gate` built `idempotency_key = f"{phase_key}:{artifact_name}"`
for `approval_requests`. The column has a UNIQUE constraint, so the
**second run** of the same workflow against a non-empty DB raised
`asyncpg.UniqueViolationError` on the first approval gate — the
background task died, the run stayed `executing` forever, the GUI
never updated. Confirmed by reading the server log after the GUI got
stuck on a 2nd run. Fix: prefix with `run_id`
(`f"{run_id}:{phase_key}:{artifact_name}"`) so each run has its own
key namespace; same-run replay still collides idempotently as intended.
E2E + integration tests did not catch this because each test uses a
fresh sqlite tmp_path or per-test Postgres DB — no second run ever
hit the same table.
### Added
- **v0.2 PR #3 — FastAPI + SSE + minimal Web GUI (`mydeepagent serve`)**.
Localhost Web UI for run start / list / detail / resume / abort + live
event stream. Closes the v0.1.0 gap "GUI 미존재" from the user's first
session requirements. No auth, no multi-tenant; single uvicorn worker
(per DR-3).
- `pyproject.toml`: runtime deps `fastapi>=0.115`,
`uvicorn[standard]>=0.30`, `sse-starlette>=2.1` (8 transitive deps).
- `src/my_deepagent/api/` (new tree):
- `app.py``create_app(config=None) -> FastAPI` factory. lifespan
stores `db`/`config`/`personas`/`workflows` on `app.state`.
`CORSMiddleware(allow_origin_regex=r"^http://localhost(:\d+)?$")`.
Static frontend mounted under `/static`, plus `/`, `/{page}.html`.
- `models.py` — pydantic v2 DTOs (`RunSummary`, `RunDetail`,
`PhaseInfo`, `ArtifactInfo`, `EventInfo`, `StartRunRequest`,
`StartRunResponse`, `PersonaSummary`, `WorkflowSummary`,
`BudgetSummary`, `BudgetScopeEntry`). All `extra="forbid"` so typos
surface at 422 deserialization time.
- `deps.py``get_db`, `get_config`, `get_personas`, `get_workflows`,
`seed_root`. Annotated[...] wrappers in each route module.
- `runner.py``start_new_run` / `start_resume` /
`is_running`. Pre-allocates a UUID and passes it to
`WorkflowEngine.run(pre_allocated_run_id=...)` so the route can
return the run_id before the phase loop starts. In-memory
`_tasks: dict[UUID, asyncio.Task]` prevents GC of in-flight tasks.
- `sse.py``run_events_stream(db, run_id, last_event_id)`.
0.5 s polling against `run_events.seq > last_event_id`; emits
`ServerSentEvent` per row; sends `event: done` and HTTP-200-closes
when run reaches terminal state.
- `routes/runs.py` — GET `/api/runs?limit=&state=`, GET `/api/runs/{id}`,
POST `/api/runs` (start), POST `/api/runs/{id}/resume`,
POST `/api/runs/{id}/abort`, GET `/api/runs/{id}/events` (SSE).
`Last-Event-ID` HTTP header honored alongside `?last_event_id=`.
- `routes/personas.py` — GET `/api/personas`.
- `routes/workflows.py` — GET `/api/workflows`.
- `routes/budget.py` — GET `/api/budget` (day / runs / personas
buckets with cap + warn thresholds from `Config`).
- `src/my_deepagent/cli/serve.py` (new) — `mydeepagent serve [--host
127.0.0.1] [--port 8000]`. Loud stderr warning when host is not
loopback (the API is unauthenticated). Uses uvicorn factory form +
forces `workers=1`.
- `src/my_deepagent/cli/main.py` — `serve` command registered.
- `src/my_deepagent/engine.py` — `WorkflowEngine.run` gained
`pre_allocated_run_id: UUID | None = None` so the FastAPI runner can
return the run_id immediately. Default behavior unchanged.
- `static/` (new) — vanilla HTML/JS/CSS, no build system:
- `index.html` — 런 목록 + 예산 (data-page="index")
- `new.html` — 신규 run 폼 (workflow select, repo path, requirements,
per-role persona override) (data-page="new")
- `run.html` — run 상세 + SSE 이벤트 라이브 + resume/abort 버튼
(data-page="run")
- `app.js` — fetch + EventSource. **XSS policy hardcoded at the top
of the file**: `element.textContent` only, `innerHTML` /
`insertAdjacentHTML` / `outerHTML` forbidden.
- `style.css` — dark theme, single file.
- Tests (new):
- `tests/integration/test_api_read.py` — 5 cases (list empty, get 404,
personas seed count, workflows seed, budget empty).
- `tests/integration/test_api_write.py` — 5 cases (missing template
400, extra field 422, resume 404, abort 404, mock-runner happy path).
- `tests/integration/test_api_sse.py` — 1 case: seed terminal run +
events, drain stream, assert types present and stream closes.
- `tests/integration/test_api_static.py` — 5 cases: index/new/run
HTML 200, app.js content-type + XSS-policy substring, style.css
content-type.
All tests use `httpx.ASGITransport` + `app.router.lifespan_context`
(httpx does not auto-trigger FastAPI lifespan) and sqlite tmp_path.
- **v0.2 PR #2b — `mydeepagent runs resume <id>` real implementation**.
Closes the v0.1.0 KNOWN LIMIT where resume was an exit-2 stub. Reuses
v0.2 PR #2a's LangGraph wiring + sweep_orphan_runs's DB state machine,
no Temporal (DR-3).
- `src/my_deepagent/engine.py`:
- New `WorkflowEngine.resume(run_id)` async method. Loads `RunRow`,
rejects terminal states with `MyDeepAgentError.human_required("run_already_terminal")`,
reloads `worktree_root` + `WorkflowTemplate` (via `_reload_template`) +
bindings (via `_reload_bindings`) from DB. Does **not** call
`bind_personas` again — locks in the original binding so consent /
pool changes don't silently shift roles.
- New `_execute_run` helper (shared phase loop) extracted from `run()`.
Skips already-`completed` phases (emits `phase.skipped` event) and
re-executes the rest. Both `run` (new) and `resume` dispatch through
it.
- New helpers: `_get_run_or_raise`, `_reload_template`,
`_reload_bindings` (rebuilds `{role_id: Binding}` from
`run_bindings` ⨝ `agent_personas`; corrupt persona rows are logged
and skipped, surfacing as `run_metadata_missing` if no bindings remain),
`_get_completed_phase_keys`.
- New `RunEventType.RUN_RESUMED` and `RunEventType.PHASE_SKIPPED` are
now actually emitted (the enum members existed already from v0.1.0).
- `src/my_deepagent/cli/runs.py` `_runs_resume_async`: stub → real impl.
Validates run exists + non-terminal, loads seed personas + artifact
schemas (`docs/schemas/`), constructs `WorkflowEngine` with a
"abort-on-new-approval" callback, calls `engine.resume(UUID(id))`,
prints final state + report path. Catches `MyDeepAgentError` and prints
a red error with exit 1.
- `tests/integration/test_resume.py` (new, 5 scenarios):
1. 2-phase workflow: phase 1 succeeds, phase 2 fails → flip run row
back to executing → resume → phase 2 completes; assert phase 1 was
skipped (`phase.skipped` event present) and `run.resumed` event emitted.
2. Terminal run → `resume()` raises `MyDeepAgentError(code="run_already_terminal")`.
3. Unknown run id → raises `MyDeepAgentError(code="run_not_found")`.
4. RunBindingRow rows missing → raises `MyDeepAgentError(code="run_metadata_missing")`.
5. workflow_templates.definition is malformed → raises `MyDeepAgentError(code="template_load_failed")`.
- E2E real OpenRouter regression PASS 78.52 s (baseline 71122 s);
within DR-3 acceptance threshold (+20%).
- **v0.2 PR #2a — LangGraph `AsyncPostgresSaver` engine wiring** (foundation
for `runs resume`). v0.2 PR #1 added the dependency; this commit actually
uses it.
- `src/my_deepagent/engine.py`:
- `WorkflowEngine.__init__` accepts `checkpointer_url: str | None` (defaults
to `config.database_url`).
- New `_maybe_open_saver` async context: opens `get_checkpointer_ctx` for
`postgresql{,+asyncpg,+psycopg}://` URLs, yields `None` for `sqlite+aiosqlite://`
(test affordance — production always Postgres per DR-2 / DR-3).
- `WorkflowEngine.run()` opens the saver **once per run** and shares it
across all phases via `self._saver` — opening per-phase would re-connect
5+ times for no isolation gain (checkpoints are keyed by `thread_id`, not
saver instance).
- `_invoke_agent_until_artifact` forwards `checkpointer=self._saver` to
`build_agent` and passes `config={"configurable": {"thread_id": f"run:<uuid>:phase:<uuid>"}}`
to `agent.ainvoke`. Same `thread_id` format already used by
`LlmCallRow.thread_id` (cost ledger), so one key namespace covers both.
- `tests/integration/test_engine_checkpointer_wiring.py` (new):
1. **Contract 1 — engine wiring**: `build_agent` receives a non-None saver;
`agent.ainvoke` receives `config.configurable.thread_id` in the
expected `run:<uuid>:phase:<uuid>` format.
2. **Contract 2 — LangGraph thread isolation**: two distinct `thread_id`s
write independent rows in the auto-created `checkpoints` table; aput /
aget round-trip preserves per-thread identity (sanity check against
future deepagents wrap regressions).
- `tests/integration/test_engine.py` — 5 mock-agent tests: fake `_ainvoke`
signature widened with `**_kwargs` to accept the new `config=` arg.
- E2E real OpenRouter regression PASS 75.99 s (baseline 71122 s); within
DR-3 acceptance threshold (+20%).
- **v0.2 PR #1 — Postgres migration**: production backing store switched from
SQLite to PostgreSQL 16 ahead of M8-Py (FastAPI) per DR-2.
- `pyproject.toml`: `asyncpg>=0.30` + `psycopg[binary]>=3.2` +
`langgraph-checkpoint-postgres>=2.0.0` added to runtime; `aiosqlite>=0.20`
moved to `[dependency-groups].dev` (test-only); `langgraph-checkpoint-sqlite`
removed.
- `src/my_deepagent/persistence/db.py`: dialect-aware connect listener —
SQLite still gets `WAL` + `busy_timeout=5000` + `foreign_keys=ON`, Postgres
gets `SET TIME ZONE 'UTC'`. New `Database.dialect_name` property + `drop_schema`
method for tests.
- `src/my_deepagent/persistence/checkpointer.py`: `SqliteSaver` →
`AsyncPostgresSaver`. API is now async (`async with`) and takes a
connection string; SQLAlchemy URL prefixes (`postgresql+asyncpg://`,
`postgresql+psycopg://`) are auto-stripped to a plain libpq DSN. New
`_to_psycopg_dsn` helper covered by 4 unit tests.
- `src/my_deepagent/persistence/upsert.py` (new): `insert_for(session)` —
dialect-aware UPSERT helper. Picks `postgresql.insert` or `sqlite.insert`
based on the bound engine's dialect. Replaces 5 hardcoded `sqlite_insert`
call sites in `budget.py`, `recovery.py`, and `cli/doctor.py`.
- `src/my_deepagent/config.py`: `database_url` default switched from
`sqlite+aiosqlite:///<data_dir>/database.sqlite3` to
`postgresql+asyncpg://devflow:devflow@localhost:55432/mydeepagent`. The v3
`devflow` DB is preserved untouched; v4 lives in a fresh `mydeepagent` DB.
- `src/my_deepagent/persistence/models.py`: `RunRow.__table_args__` partial
unique index now declares **both** `postgresql_where=` and `sqlite_where=`
so the index is partial on both dialects.
- `src/my_deepagent/cli/doctor.py`: check 8 (`disk+db`) is now dialect-aware
— Postgres path runs `SELECT 1` (pg_isready equivalent: proves
reachability + auth + DB exists); SQLite path keeps
`PRAGMA integrity_check`. Doctor docstring updated.
- `alembic/env.py`: env-aware URL resolution — `MYDEEPAGENT_DATABASE_URL` >
`DATABASE_URL` > Postgres default. Async driver prefixes
(`+asyncpg`, `+aiosqlite`) are mapped to the sync equivalents alembic
needs (`+psycopg`, plain `sqlite`).
- `alembic/versions/9f2a6c79667e_v0_2_baseline_schema_postgres.py` (new):
fresh baseline autogenerated against live Postgres. Old SQLite baseline
`79945fdc2649` + partial-index migration `839f2233e346` deleted.
- `tests/conftest.py` (new): `pg_db_url` async fixture. Creates a fresh
Postgres database per test (against docker-compose `devflow-postgres`)
via the maintenance `postgres` DB; drops on teardown after terminating
any lingering backends. Used by the E2E suite and the new checkpointer
tests.
- `tests/integration/test_checkpointer.py`: rewritten for AsyncPostgresSaver
(4 pure DSN-converter tests + 3 async context-manager tests).
- `tests/integration/test_e2e_workflow.py`: switched from `sqlite+aiosqlite`
tmp_path to `pg_db_url`. Real OpenRouter E2E now exercises the production
Postgres path end-to-end (~122 s, ~$0.05/run).
### Migration trigger (per DR-2)
- The bound is *two concurrent writers* on `runs` / `run_phases` / `llm_calls`.
Today the CLI is the only writer — but M8-Py (FastAPI) introduces a second
one, and SQLite WAL allows only a single concurrent writer. Doing the move
*before* M8-Py lands gives the test surface time to harden.
- Recovery: previous SQLite database at
`~/Library/Application Support/my-deepagent/database.sqlite3` (macOS) /
`$XDG_DATA_HOME/my-deepagent/database.sqlite3` is **not migrated** —
v0.1.0 was the only release that wrote to it and v0.2 starts a fresh
history. Set `MYDEEPAGENT_DATABASE_URL=sqlite+aiosqlite:///<path>` to
read the legacy file if needed.
### Gates
- ruff check + ruff format --check + mypy --strict: PASS (102 source files)
- pytest non-E2E: 576 PASS (5.46 s) — bulk on sqlite tmp_path, new
checkpointer suite on Postgres `pg_db_url`
- pytest E2E real OpenRouter: 1 PASS (122.93 s) on Postgres backend
## [0.1.0] - 2026-05-16
First tagged milestone of the Python rewrite. The pre-Python-rewrite TypeScript
monorepo has been removed (commit `0e61b2d`); recovery is available via the
`pre-python-rewrite` tag at `c9fed71`.
### Added
- Step 15 — End-to-end real OpenRouter integration: `tests/integration/test_e2e_workflow.py`
runs `spec-and-review@1` workflow (spec → review → verify) end-to-end against real
OpenRouter DeepSeek in ~76s for ~$0.05 per run. `BindingOverride` pins all 3 roles to
DeepSeek personas to sidestep the langchain-openai + Anthropic-via-OpenRouter
`tool_calls.args` JSON-string ValidationError (known v0.1.0 limit). New seed personas:
`openrouter-deepseek-spec-writer@1` (capabilities: spec_write, phase_planning;
max_cost_per_call_usd=0.01) and `openrouter-deepseek-code-reviewer@1` (capabilities:
code_review, evidence_check; max_cost_per_call_usd=0.01). Persona count test updated
to 12. `WorkflowEngine._build_envelope` now inlines the artifact JSON Schema directly
in the prompt so the LLM sees exact required fields. `WorkflowEngine._record_llm_call`
fills every NOT NULL `LlmCallRow` column (thread_id, persona_version, role, turn_index,
cached_tokens, reasoning_tokens, cost_usd_input/output, etc.). `CostMiddleware` now
probes both `usage_metadata` and `response_metadata.token_usage` (prompt_tokens /
completion_tokens fallback) to capture OpenAI-compatible streamed responses forwarded
by OpenRouter.
- Step 12 — Doctor full 8-check + OpenRouter pricing fetch: `mydeepagent doctor`
now runs 8 checks (python / uv / git / workspace_root / config+governance /
openrouter_api_key / openrouter_ping + pricing upsert / disk+sqlite integrity).
`mydeepagent pricing` lists the cached OpenRouter pricing matrix from the
persisted `model_pricing` table. `mydeepagent run` preview now reads from the
persisted `model_pricing` table when populated, falling back to the static seed
otherwise. 26 new tests (23 unit + 3 integration).
- Step 11 — Audit log + secret scrubbing: append-only `{state_dir}/audit.jsonl`
recording every tool call (name/args/duration/error). `AuditToolMiddleware` now
ships with a built-in JSONL recorder (`file_recorder`), attached automatically in
`WorkflowEngine` and Interactive REPL. `structlog` configured project-wide via
`my_deepagent.logging.configure_logging`, with a `_scrub_processor` that redacts
OpenRouter / Anthropic / OpenAI / LangSmith / GitHub / GitLab API keys plus
generic Bearer tokens before they reach stderr or JSON sinks. `audit.py` provides
`append_audit_record` (O_APPEND, 0o600 permissions), `read_audit_records` (with
optional limit, corrupt-line skip), and `make_audit_recorder` async factory.
19 new tests (8 audit unit, 9 logging unit, 3 audit-middleware integration).
- Step 10 — Interactive REPL: `mydeepagent` (no subcommand) launches a prompt_toolkit
REPL with `--agent` / `--model` overrides, slash commands (`/help`, `/quit`, `/exit`,
`/agent`, `/model`, `/clear`, `/stats`, `/budget`, `/runs`), file refs
(`@path/to/file.py` expansion with repo-root containment check), and
`CostMiddleware`-wired agent calls so spending is metered per interactive session.
`slash.py` implements `parse_slash` + `SlashRegistry`. `CostMiddleware` gains
`interactive_session_id` parameter. 21 new tests (10 slash unit, 5 file-ref unit,
3 CLI integration, 3 updated CLI unit).
- Step 9 — Crash recovery + concurrency: `sweep_orphan_runs(db)` in
`my_deepagent.recovery` marks non-terminal runs/phases as failed at app startup so
active-run uniqueness slots (partial unique index `ux_active_run_repo_base`) are freed;
`mydeepagent runs list/show/resume` CLI in `my_deepagent.cli.runs` (list with optional
`--state` filter, show by full UUID or 6+ char prefix, resume stub with exit-2 hint);
SIGTERM/SIGINT graceful shutdown in `WorkflowEngine` (`install_signal_handlers`,
`_on_signal`, `_force_cancel_inflight`; 30s grace then cancel in-flight tasks);
auto-sweep on `mydeepagent run` before any new phase begins. 21 new tests.
- Step 8 — Budget guardrails: `BudgetTracker` (SQLite WAL ledger via `BudgetLedgerRow`,
on_hit policy block/warn_continue/prompt, per-run + per-day + per-persona-daily
scopes) in `my_deepagent.budget`; cost preview before `mydeepagent run` (rich table
with per-phase est.) via `my_deepagent.monitoring.cost_estimator`;
`CostMiddleware` integrated with `BudgetTracker` (pre-call assert + post-call record);
`WorkflowEngine` accepts optional `budget_tracker` and `pricing` kwargs (backward-
compatible); CLI: `mydeepagent budget` (ledger), `mydeepagent stats --by model|persona|day`,
`mydeepagent costs` (alias); `--no-preview` flag on `mydeepagent run`.
28 new tests.
- Step 7 — Workflow engine: `WorkflowEngine` in `my_deepagent.engine` orchestrates
phase loop, artifact watcher (write_file/edit_file detection), jsonschema validation
with one repair retry, approval gate, and final report compose (JSON + Markdown).
`ArtifactWatcherMiddleware` in `my_deepagent.middleware.artifact_watcher` intercepts
write_file/edit_file tool calls targeting the expected artifact path.
`RunEventType` + `run_idempotency_key` in `my_deepagent.run_event` (closed event set,
deterministic idempotency keys per plan v2.0 §13.1).
`cli/run.py` exposes `mydeepagent run <workflow.yaml>`.
`tui/approval.py` prompts the user for approve/reject/request_changes/abort.
FK-safe persistence: WorkflowTemplateRow and AgentPersonaRow upserted before RunRow
to satisfy SQLite FK ordering constraints.
18 new tests: 12 engine unit/integration tests + 6 artifact watcher tests.
- Step 6 — Distribution: `mydeepagent init/login/logout/keys/doctor` CLI commands;
platformdirs-based data dirs; OS keyring (macOS Keychain / Linux Secret Service /
Windows Credential Store) for API keys via `my_deepagent.keys`; first-run
governance consent in `governance.py`; secret resolution priority
(config → env → keyring → error) in `my_deepagent.secrets`; i18n catalog
(ko / en) under `my_deepagent.i18n` controlled by `MYDEEPAGENT_LANG`.
- persistence/models.py (P0-1): partial unique index `ux_active_run_repo_base` on `runs(repo_path, base_branch) WHERE state NOT IN ('completed','failed','aborted')` — prevents duplicate active runs per repo/branch
- persistence/models.py (P0-3): FK constraints added to `RunRow.template_id` (RESTRICT), `RunBindingRow.persona_id` (RESTRICT), `InteractiveSessionRow.persona_id` (RESTRICT), `RunEventRow.phase_id` (CASCADE), `ApprovalRequestRow.phase_id` (CASCADE), `ArtifactRow.phase_id` (CASCADE), `ToolCallRow.run_id/phase_id/interactive_session_id` (CASCADE), `LlmCallRow.run_id/phase_id/interactive_session_id` (CASCADE), `PhaseFeedbackRow.run_id/phase_id` (CASCADE)
- alembic/versions/839f2233e346: new migration adding partial unique index and all FK constraints above; uses SQLite table-rebuild pattern with PRAGMA foreign_keys=OFF/ON guard
- persistence/checkpointer.py (P0-4): removed `get_checkpointer` (leaking connection helper); only `get_checkpointer_ctx` context manager is now exported
- tests/integration/test_checkpointer.py: 5 tests for checkpointer ctx lifecycle (file creation, parent dir, connection cleanup, lock-free concurrent use)
- tests/integration/test_persistence.py: 7 new P0 verification tests (active-run partial index blocks/allows, cascade-delete of phase_feedback+run_phases, RESTRICT on template delete, index exists in sqlite_master)
- tests/unit/test_session.py: full rewrite to deepagents dataclass API — FilesystemPermission attribute access (.mode/.paths/.operations), build_backend type dispatch (5 cases), _map_operations deduplication (8 cases), _spec_to_permission mapping, updated _subagent_to_dict and _resolve_openrouter_api_key tests; 47 unit tests total
- tests/integration/test_openrouter_smoke.py: real OpenRouter/DeepSeek smoke test (3 tests, ~$0.001-$0.003/run, max_tokens=50); skipped automatically when no API key is configured; validates ChatOpenAI response, usage_metadata tokens, and deepagents CompiledStateGraph end-to-end
- pyproject.toml: registered `integration` pytest marker to silence --strict-markers error
- v0.1.0 scaffolding (Step 0): src/tests/docs trees, ruff/mypy/pre-commit/alembic config
- Seed assets copied to docs/schemas/ (personas/workflows/artifacts validated)
- Core module (Step 1): config, enums, errors, hash + unit tests
- Persona / Workflow / Binding module (Step 2): pydantic schemas, YAML loaders, deterministic auto-select, override, consent store with atomic write
- Step 1 review patches (P0/P1): exception chain context suppression, classmethod LSP fix, workspace_root realpath canonicalization, config_invalid error mapping
### Changed
- deepagents 0.6.1 LocalShellBackend + permissions conflict workaround: removed `permissions` block from all 10 seed personas; `SafetyShellMiddleware` now enforces destructive-command + secret-path policy at the tool layer for local_shell backend agents.
- `build_agent` automatically prepends `SafetyShellMiddleware` to every agent and skips `permissions` kwarg when `deepagents_backend == "local_shell"`.
- `SafetyShellMiddleware` extended with secret-path enforcement: `read_file`/`write_file`/`edit_file`/`ls` tool calls are blocked when `file_path`/`path` matches any `DENY_PATH_PATTERNS` glob (wcmatch GLOBSTAR|IGNORECASE|DOTGLOB).
- All env vars require `MYDEEPAGENT_` prefix (e.g. `MYDEEPAGENT_OPENROUTER_API_KEY`, `MYDEEPAGENT_BUDGET_DAILY_USD`). `.env.example` updated accordingly. This isolates my-deepagent's env namespace from other tools.
- Persona / Workflow / FilesystemPermission models now store list-valued fields as tuples (deep immutability — prevents post-construction mutation that would invalidate compute_hash()).
### Known limitations (v0.1.0)
- `usage_metadata` is sometimes empty for responses forwarded by OpenRouter (deepagents
wraps the underlying ChatOpenAI response so token counts may not surface). The
`CostMiddleware` recorder still fires and a `LlmCallRow` row is persisted, but
`input_tokens` / `output_tokens` may read as 0 — the E2E test treats this as a known
limit. v0.2 will probe more response shapes (raw chunks / callbacks).
- Anthropic models via OpenRouter currently fail with a `tool_calls.args` JSON-string
vs dict ValidationError inside `langchain-openai`. Workaround: pin DeepSeek personas
via `BindingOverride`. Tracking for v0.2.
- `mydeepagent runs resume <run_id>` is a stub (exit-2 hint only); workflow replay
from a half-run state is not yet implemented.