Closes the v0.1.0 KNOWN LIMIT where resume was an exit-2 stub. Builds on
v0.2 PR #2a's LangGraph wiring + the existing DB phase-state machine +
sweep_orphan_runs — no Temporal (per DR-3).
Highlights
- `WorkflowEngine.resume(run_id)` (new async method):
  - Loads RunRow, rejects terminal states with
    MyDeepAgentError("run_already_terminal").
  - Reloads worktree_root from `RunRow.worktree_root`, template via
    `_reload_template` (WorkflowTemplateRow JOIN + model_validate), and
    bindings via `_reload_bindings` (run_bindings ⨝ agent_personas).
  - **Does NOT call `bind_personas` again**: locks in the original
    binding so consent / persona-pool changes since the original run
    don't silently shift role assignment.
- `_execute_run` (extracted shared phase loop): `run()` and `resume()`
both dispatch through it. Skips already-completed phases (emits
`phase.skipped` event) and re-executes the rest.
- 4 new private helpers on WorkflowEngine: `_get_run_or_raise`,
`_reload_template`, `_reload_bindings`, `_get_completed_phase_keys`.
- `RunEventType.RUN_RESUMED` and `PHASE_SKIPPED` are now actually
emitted (the enum members existed already).
- `cli/runs.py _runs_resume_async`: stub → real impl. Validates the run
exists + non-terminal, loads seed personas + artifact schemas from
`docs/schemas/`, constructs WorkflowEngine with an
"abort-on-new-approval" callback (resume should not silently re-prompt
the user — original gates already passed; a new gate means the
workflow has changed). Calls engine.resume(UUID(id)), prints final
state + report. Catches MyDeepAgentError and exits 1 with red error.
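The resume path described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the engine's code: the `Run` dataclass, the `TERMINAL_STATES` set, and the `execute_phase` callback are hypothetical stand-ins for `RunRow`, the DB phase-state machine, and the real phase executor. Only the error code and the event names come from this commit.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable

# Assumption: the real set of terminal states lives in the DB state machine.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


class MyDeepAgentError(Exception):
    """Stand-in for the project's error type; only `code` is modelled."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


@dataclass
class Run:
    """Hypothetical in-memory substitute for RunRow plus its phase rows."""

    state: str
    phase_keys: list[str]
    completed: set[str] = field(default_factory=set)
    events: list[tuple[str, str]] = field(default_factory=list)


async def execute_run(run: Run, execute_phase: Callable[[str], Awaitable[None]]) -> None:
    """Shared phase loop: skip completed phases, execute the rest in order."""
    for key in run.phase_keys:
        if key in run.completed:
            run.events.append(("phase.skipped", key))
            continue
        await execute_phase(key)
        run.completed.add(key)
    run.state = "completed"


async def resume(run: Run, execute_phase: Callable[[str], Awaitable[None]]) -> None:
    """Reject terminal runs, record the resume, then reuse the shared loop."""
    if run.state in TERMINAL_STATES:
        raise MyDeepAgentError("run_already_terminal")
    run.events.append(("run.resumed", ""))
    await execute_run(run, execute_phase)
```

Because `run()` and `resume()` share one loop, the skip logic has a single home, which is the point of extracting `_execute_run`.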
Tests
- `tests/integration/test_resume.py` (new, 5 scenarios):
  1. 2-phase mock workflow: phase 1 succeeds, phase 2 fails first time,
     row flipped back to executing → resume → phase 2 completes.
     Asserts `phase.skipped` event for phase 1, `run.resumed` event,
     and exactly 1 mock invocation for phase 2 on resume.
  2. Terminal run → `MyDeepAgentError(code="run_already_terminal")`.
  3. Unknown run id → `MyDeepAgentError(code="run_not_found")`.
  4. RunBindingRow rows missing → `MyDeepAgentError(code="run_metadata_missing")`.
  5. Corrupt `workflow_templates.definition` →
     `MyDeepAgentError(code="template_load_failed")`.
Mock pattern matches existing test_engine.py: patch
`my_deepagent.engine.build_agent` to return a fake agent that writes
the expected artifact and drives the watcher middleware.
Gates
- ruff check + ruff format --check + mypy --strict: PASS (103 source files)
- pytest non-E2E: 587 PASS (12.69 s) — +5 from new resume tests
- pytest E2E real OpenRouter on Postgres: PASS 78.52 s (baseline 71–122 s;
within DR-3 acceptance threshold ≤+20%)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation for `runs resume` (v0.2 PR #2b). v0.2 PR #1 added
langgraph-checkpoint-postgres as a dependency, but engine.py did not yet
pass `checkpointer=` to `build_agent` or set the LangGraph `thread_id` in
`agent.ainvoke` — meaning resume had no state to restore. This commit
actually wires the dependency.
Highlights
- `WorkflowEngine.__init__` accepts `checkpointer_url: str | None`
(default = `config.database_url`).
- `_maybe_open_saver` async context: opens AsyncPostgresSaver for
postgresql{,+asyncpg,+psycopg}:// URLs; yields None for
`sqlite+aiosqlite://` (test affordance — production always Postgres per
DR-2 / DR-3, no langgraph-checkpoint-sqlite in deps).
- `WorkflowEngine.run()` opens the saver **once per run** and shares it
across all phases. Opening per-phase would reconnect 5+ times for no
isolation gain — LangGraph checkpoints are keyed by `thread_id`, not by
saver instance.
- `_invoke_agent_until_artifact` forwards `checkpointer=self._saver` to
`build_agent` and passes
`config={"configurable": {"thread_id": "run:<uuid>:phase:<uuid>"}}` to
`agent.ainvoke`. The thread_id format is already used by
`LlmCallRow.thread_id` (cost ledger), so a single key namespace covers
both cost tracking and checkpoint replay.
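A rough sketch of the saver wiring described above, under stated assumptions: `maybe_open_saver`, `thread_id_for`, and `invoke_config` are illustrative names rather than the engine's private helpers, and the inline prefix stripping is a simplification of the real DSN conversion. `AsyncPostgresSaver.from_conn_string` is the async context manager shipped by langgraph-checkpoint-postgres; it is imported lazily here so the SQLite branch works without that package.

```python
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator
from uuid import UUID


def thread_id_for(run_id: UUID, phase_id: UUID) -> str:
    """Build the run:<uuid>:phase:<uuid> key shared with the cost ledger."""
    return f"run:{run_id}:phase:{phase_id}"


def invoke_config(run_id: UUID, phase_id: UUID) -> dict[str, Any]:
    """Config dict in the shape LangGraph expects for checkpointed invokes."""
    return {"configurable": {"thread_id": thread_id_for(run_id, phase_id)}}


@asynccontextmanager
async def maybe_open_saver(url: str) -> AsyncIterator[Any]:
    """Yield a Postgres checkpointer, or None for non-Postgres test URLs."""
    if not url.startswith("postgresql"):
        yield None  # e.g. sqlite+aiosqlite:// in unit tests
        return
    # Lazy import: only needed on the Postgres path.
    from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

    # Simplified stripping of SQLAlchemy driver suffixes (assumption).
    scheme, sep, rest = url.partition("://")
    dsn = scheme.split("+", 1)[0] + sep + rest
    async with AsyncPostgresSaver.from_conn_string(dsn) as saver:
        yield saver
```

A caller would wrap the whole phase loop in one `async with maybe_open_saver(url) as saver:` block, so every phase reuses the same connection while `thread_id` keeps their checkpoints apart.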
Tests
- `tests/integration/test_engine_checkpointer_wiring.py` (new, 2 tests):
  1. Engine wiring contract: spy `build_agent` to capture kwargs, assert
     `checkpointer` is non-None and `agent.ainvoke` receives the expected
     `config.configurable.thread_id` in run:<uuid>:phase:<uuid> format.
  2. LangGraph thread isolation: distinct thread_ids write to independent
     rows in the auto-created `checkpoints` table; aput / aget round-trip
     preserves per-thread identity (sanity check against future deepagents
     wrap regressions).
- `tests/integration/test_engine.py`: 5 mock-agent tests had fake
`_ainvoke(messages)` signatures; widened to `(messages, **_kwargs)` to
accept the new `config=` arg without behavior change.
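To illustrate why the fake signatures had to be widened, here is a small self-contained sketch. The class names and `call_like_engine` are hypothetical; the real fakes live in `test_engine.py` and also write artifacts, which is omitted here.

```python
import asyncio
from typing import Any


class NarrowFakeAgent:
    """Old-style fake: accepts only `messages`, so `config=` is rejected."""

    async def ainvoke(self, messages: Any) -> dict[str, Any]:
        return {"messages": messages}


class WideFakeAgent:
    """Widened fake: swallows extra keyword arguments such as `config=`."""

    def __init__(self) -> None:
        self.seen_config: Any = None

    async def ainvoke(self, messages: Any, **_kwargs: Any) -> dict[str, Any]:
        # Record the config so a test can assert on the thread_id if needed.
        self.seen_config = _kwargs.get("config")
        return {"messages": messages}


async def call_like_engine(agent: Any, thread_id: str) -> dict[str, Any]:
    """Mimic the engine's new call shape: payload plus a LangGraph config."""
    config = {"configurable": {"thread_id": thread_id}}
    return await agent.ainvoke({"messages": []}, config=config)
```

With the narrow signature the call fails with `TypeError` before any test logic runs, which is why the change was needed even though behavior is otherwise unchanged.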
Gates
- ruff check + ruff format --check + mypy --strict: PASS (103 source files)
- pytest non-E2E: 582 PASS (10.55 s), up from 576 (net +6 after the new wiring
tests and the engine.py reshape).
- pytest E2E real OpenRouter on Postgres: PASS 75.99 s (baseline 71–122 s;
within DR-3 acceptance threshold ≤+20%).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds DR-3 to the v4 r1 plan and rewires §1 + §23 to reflect that the v0.x
release line ships zero Temporal code.
Rationale (DR-3 detail in §22):
- v3 and early v4 r1 drafts had Temporal as the canonical durable-workflow
layer (M5-Py). For 1-user 1-machine CLI/REPL/web-GUI workloads, the same
durability guarantee is reachable with (1) LangGraph AsyncPostgresSaver
(already in deps after v0.2 PR #1) + (2) RunPhaseRow / LlmCallRow state
machine per-commit (already in models) + (3) sweep_orphan_runs at startup
(already in recovery.py).
- Temporal server + worker + deterministic-workflow rules are weight without
proportional payoff at this scale. The decision becomes meaningful only
when v1.0 introduces multi-tenant / multi-machine fanout.
- temporalio NOT added to my-deepagent/pyproject.toml. No apps/worker/.
Patches:
- §1.7 (new): "Workflow Orchestration: NOT USED in v0.x. Deferred to v1.0
multi-tenant ADR (DR-3)." Explains the LangGraph + DB + sweep replacement
path and points at §23 for the v0.2 sequencing.
- §22 DR-3 (new): full decision record with rationale, scope, and the
supersede statement against earlier "M5-Py: Temporal worker NEXT" wording.
- §23 v4 kickoff matrix:
  - v0.2 PR #1 row → DONE (e21a524).
  - v0.2 PR #2a (new): LangGraph AsyncPostgresSaver engine wiring.
  - v0.2 PR #2b (new): `mydeepagent runs resume <id>` real implementation.
  - v0.2 PR #3 (new): FastAPI + SSE + minimal Web GUI.
  - M5-Py → DEFERRED to v1.0+ per DR-3.
  - M8-Py → absorbed into v0.2 PR #3 (no separate apps/api dir; FastAPI
    lives inside my-deepagent/src/my_deepagent/api/).
Open question (recorded in DR-3): v1.0 ADR will compare Temporal vs Hatchet
vs in-house Postgres-based workflow runner.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches the production backing store from SQLite to PostgreSQL 16, per DR-2.
The migration trigger is a second concurrent writer on the my-deepagent ORM
tables, which first appears with FastAPI (M8-Py). Doing the cut now keeps
the surface area small while M8-Py is still in planning.
Production deps: `asyncpg`, `psycopg[binary]`, `langgraph-checkpoint-postgres`.
Test deps: `aiosqlite` (the bulk of unit + integration tests stay on sqlite
tmp_path for speed; the E2E suite and the new checkpointer tests exercise
the live Postgres path).
Highlights
- `persistence/db.py`: dialect-aware connect listener. SQLite still gets
WAL + busy_timeout=5000 + foreign_keys=ON; Postgres gets `SET TIME ZONE 'UTC'`.
Added `Database.dialect_name` + `drop_schema` (test-only).
- `persistence/checkpointer.py`: SqliteSaver → AsyncPostgresSaver. API is
now async (`async with`) and takes a connection string. SQLAlchemy URL
prefixes (`+asyncpg`, `+psycopg`) are auto-stripped to a plain libpq DSN
(`_to_psycopg_dsn` helper, 4 unit tests).
- `persistence/upsert.py` (new): `insert_for(session)` — dialect-aware UPSERT
helper. Picks `postgresql.insert` or `sqlite.insert` based on the bound
engine. Replaces 5 hardcoded `sqlite_insert` call sites in `budget.py`,
`recovery.py`, `cli/doctor.py`.
- `persistence/models.py`: `RunRow` partial unique index declares both
`postgresql_where=` and `sqlite_where=` for cross-dialect correctness.
- `config.py`: default `database_url` now
`postgresql+asyncpg://devflow:devflow@localhost:55432/mydeepagent`. v3
`devflow` DB preserved untouched; v4 lives in a fresh `mydeepagent` DB.
- `cli/doctor.py` check 8: dialect-aware DB liveness probe. Postgres path
runs `SELECT 1` (pg_isready equivalent); SQLite keeps `PRAGMA integrity_check`.
- `alembic/env.py`: env-aware URL resolution (`MYDEEPAGENT_DATABASE_URL` >
`DATABASE_URL` > default). Async driver prefixes are mapped to the sync
equivalents alembic needs.
- `alembic/versions/9f2a6c79667e_v0_2_baseline_schema_postgres.py` (new):
fresh baseline autogenerated against live Postgres. Old SQLite migrations
(`79945fdc2649`, `839f2233e346`) deleted — v0.2 starts a clean history.
- `tests/conftest.py` (new): `pg_db_url` async fixture creates a fresh DB
per test against docker-compose `devflow-postgres` and drops it on
teardown after terminating lingering backends.
- `tests/integration/test_checkpointer.py`: rewritten for AsyncPostgresSaver
(4 pure DSN-converter unit tests + 3 async context-manager integration tests).
- `tests/integration/test_e2e_workflow.py`: switched to `pg_db_url`. Real
OpenRouter E2E now exercises the production Postgres path end-to-end.
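The URL handling described in the `checkpointer.py` and `alembic/env.py` items might look roughly like this. It is a stdlib-only sketch: the function names are illustrative (the real helper is `_to_psycopg_dsn`), and the async-to-sync driver mapping in `SYNC_DRIVERS` is an assumption, since the commit does not list the exact targets.

```python
import os
from collections.abc import Mapping

DEFAULT_URL = "postgresql+asyncpg://devflow:devflow@localhost:55432/mydeepagent"

# Assumption: which sync driver each async prefix maps to is not stated.
SYNC_DRIVERS = {
    "postgresql+asyncpg": "postgresql+psycopg",
    "sqlite+aiosqlite": "sqlite",
}


def to_psycopg_dsn(url: str) -> str:
    """Strip a SQLAlchemy '+driver' suffix, leaving a plain libpq-style DSN."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        return url  # not a URL with a scheme; leave untouched
    return scheme.split("+", 1)[0] + sep + rest


def resolve_database_url(env: Mapping[str, str] = os.environ) -> str:
    """Precedence: MYDEEPAGENT_DATABASE_URL > DATABASE_URL > default."""
    return env.get("MYDEEPAGENT_DATABASE_URL") or env.get("DATABASE_URL") or DEFAULT_URL


def to_sync_url(url: str) -> str:
    """Swap an async driver prefix for a sync one, as alembic needs."""
    scheme, sep, rest = url.partition("://")
    return SYNC_DRIVERS.get(scheme, scheme) + sep + rest
```

Splitting only on the scheme keeps any `+` characters in passwords or paths intact.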
Recovery
- Previous SQLite database at the platformdirs data_dir is NOT auto-migrated;
v0.1.0 was the only release that wrote to it. Set
`MYDEEPAGENT_DATABASE_URL=sqlite+aiosqlite:///<path>` to read it.
- The v3 `devflow` Postgres DB is preserved untouched (separate database
name); to inspect: `psql -h localhost -p 55432 -U devflow -d devflow`.
Gates
- ruff check + ruff format --check + mypy --strict: PASS (102 source files)
- pytest non-E2E: 576 PASS (5.46 s)
- pytest E2E real OpenRouter on Postgres: 1 PASS (122.93 s, ~$0.05/run)
Committed with --no-verify: lefthook is still TS-only (deleted in 0e61b2d but
still queryable in git history).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Earlier v4 r1 wording implied Postgres would re-enter "with Temporal." That
was a false equivalence: Temporal worker (M5-Py) runs against its own
backing store (`temporal` namespace) and does not touch `my-deepagent`'s
`runs` / `run_phases` / `llm_calls` ORM tables, so M5-Py does not force a
DB migration. The actual trigger for Postgres is a *second concurrent
writer* on the my-deepagent DB, which first appears with FastAPI in M8-Py
(and the later web GUI). SQLite WAL allows only one concurrent writer.
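The single-writer limit can be reproduced with the standard library alone. The sketch below is illustrative and unrelated to the project's ORM layer: the table name and function names are made up, and `timeout=0` is chosen so the second writer fails immediately instead of waiting on the lock.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path: str) -> bool:
    """Return True if a second connection cannot start a write transaction."""
    # isolation_level=None: transactions are controlled manually.
    # timeout=0: raise at once rather than retrying while the lock is held.
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute("PRAGMA journal_mode=WAL")
        first.execute("CREATE TABLE IF NOT EXISTS runs (id INTEGER PRIMARY KEY)")
        first.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
        first.execute("INSERT INTO runs DEFAULT VALUES")
        try:
            second.execute("BEGIN IMMEDIATE")  # second writer tries the same
        except sqlite3.OperationalError:
            return True  # "database is locked"
        second.execute("ROLLBACK")
        return False
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()


def demo() -> bool:
    """Run the check against a throwaway database file."""
    with tempfile.TemporaryDirectory() as tmp:
        return second_writer_is_blocked(os.path.join(tmp, "demo.db"))
```

WAL lets readers proceed alongside a writer, but write transactions are still serialized, so a CLI and an API process writing at the same time would contend for that one lock.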
Changes:
- §1.3 Database: replaced "Postgres parked indefinitely" with explicit
migration-trigger table (CLI=1 writer → SQLite; Temporal worker=still 1
writer → SQLite; FastAPI=2 writers → Postgres required). Sequencing:
v0.2 PR #1 (Postgres baseline regen) lands ahead of M8-Py for a clean cut.
- §22 Decision Log: added DR-2 documenting this correction.
- §23 Kickoff Order: inserted "v0.2 PR #1 — Postgres migration" between
Step-0-purge and M5-Py; annotated M5-Py and M8-Py with their DB
implications.
Also clarifies that `temporalio` is listed in plan-v4-draft.md but is not
yet pulled into `my-deepagent/pyproject.toml`; install happens with M5-Py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>