feat(my-deepagent): v0.2 PR #2b — mydeepagent runs resume <id> real implementation

Closes the v0.1.0 KNOWN LIMIT where resume was an exit-2 stub. Builds on v0.2 PR #2a's LangGraph wiring + the existing DB phase-state machine + sweep_orphan_runs — no Temporal (per DR-3). Highlights - `WorkflowEngine.resume(run_id)` (new async method): - Loads RunRow, rejects terminal states with MyDeepAgentError("run_already_terminal"). - Reloads worktree_root from `RunRow.worktree_root`, template via `_reload_template` (WorkflowTemplateRow JOIN + model_validate), and bindings via `_reload_bindings` (run_bindings ⨝ agent_personas). - **Does NOT call `bind_personas` again** — locks in the original binding so consent / persona-pool changes since the original run don't silently shift role assignment. - `_execute_run` (extracted shared phase loop): `run()` and `resume()` both dispatch through it. Skips already-completed phases (emits `phase.skipped` event) and re-executes the rest. - 4 new private helpers on WorkflowEngine: `_get_run_or_raise`, `_reload_template`, `_reload_bindings`, `_get_completed_phase_keys`. - `RunEventType.RUN_RESUMED` and `PHASE_SKIPPED` are now actually emitted (the enum members existed already). - `cli/runs.py _runs_resume_async`: stub → real impl. Validates the run exists + non-terminal, loads seed personas + artifact schemas from `docs/schemas/`, constructs WorkflowEngine with an "abort-on-new-approval" callback (resume should not silently re-prompt the user — original gates already passed; a new gate means the workflow has changed). Calls engine.resume(UUID(id)), prints final state + report. Catches MyDeepAgentError and exits 1 with red error. Tests - `tests/integration/test_resume.py` (new, 5 scenarios): 1. 2-phase mock workflow: phase 1 succeeds, phase 2 fails first time, row flipped back to executing → resume → phase 2 completes. Asserts `phase.skipped` event for phase 1, `run.resumed` event, and exactly 1 mock invocation for phase 2 on resume. 2. Terminal run → `MyDeepAgentError(code="run_already_terminal")`. 3. Unknown run id → `MyDeepAgentError(code="run_not_found")`. 4. RunBindingRow rows missing → `MyDeepAgentError(code="run_metadata_missing")`. 5. Corrupt `workflow_templates.definition` → `MyDeepAgentError(code="template_load_failed")`. Mock pattern matches existing test_engine.py: patch `my_deepagent.engine.build_agent` to return a fake agent that writes the expected artifact and drives the watcher middleware. Gates - ruff check + ruff format --check + mypy --strict: PASS (103 source files) - pytest non-E2E: 587 PASS (12.69 s) — +5 from new resume tests - pytest E2E real OpenRouter on Postgres: PASS 78.52 s (baseline 71–122 s; within DR-3 acceptance threshold ≤+20%) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:07:24 +09:00
parent 50aacd3382
commit 501292a5cd
4 changed files with 804 additions and 16 deletions
--- a/my-deepagent/CHANGELOG.md
+++ b/my-deepagent/CHANGELOG.md
@@ -3,6 +3,45 @@
 ## [Unreleased]
 ### Added
 - **v0.2 PR #2b — `mydeepagent runs resume <id>` real implementation**.
  Closes the v0.1.0 KNOWN LIMIT where resume was an exit-2 stub. Reuses
  v0.2 PR #2a's LangGraph wiring + sweep_orphan_runs's DB state machine,
  no Temporal (DR-3).
  - `src/my_deepagent/engine.py`:
    - New `WorkflowEngine.resume(run_id)` async method. Loads `RunRow`,
      rejects terminal states with `MyDeepAgentError.human_required("run_already_terminal")`,
      reloads `worktree_root` + `WorkflowTemplate` (via `_reload_template`) +
      bindings (via `_reload_bindings`) from DB. Does **not** call
      `bind_personas` again — locks in the original binding so consent /
      pool changes don't silently shift roles.
    - New `_execute_run` helper (shared phase loop) extracted from `run()`.
      Skips already-`completed` phases (emits `phase.skipped` event) and
      re-executes the rest. Both `run` (new) and `resume` dispatch through
      it.
    - New helpers: `_get_run_or_raise`, `_reload_template`,
      `_reload_bindings` (rebuilds `{role_id: Binding}` from
      `run_bindings` ⨝ `agent_personas`; corrupt persona rows are logged
      and skipped, surfacing as `run_metadata_missing` if no bindings remain),
      `_get_completed_phase_keys`.
    - New `RunEventType.RUN_RESUMED` and `RunEventType.PHASE_SKIPPED` are
      now actually emitted (the enum members existed already from v0.1.0).
  - `src/my_deepagent/cli/runs.py` `_runs_resume_async`: stub → real impl.
    Validates run exists + non-terminal, loads seed personas + artifact
    schemas (`docs/schemas/`), constructs `WorkflowEngine` with a
    "abort-on-new-approval" callback, calls `engine.resume(UUID(id))`,
    prints final state + report path. Catches `MyDeepAgentError` and prints
    a red error with exit 1.
  - `tests/integration/test_resume.py` (new, 5 scenarios):
    1. 2-phase workflow: phase 1 succeeds, phase 2 fails → flip run row
       back to executing → resume → phase 2 completes; assert phase 1 was
       skipped (`phase.skipped` event present) and `run.resumed` event emitted.
    2. Terminal run → `resume()` raises `MyDeepAgentError(code="run_already_terminal")`.
    3. Unknown run id → raises `MyDeepAgentError(code="run_not_found")`.
    4. RunBindingRow rows missing → raises `MyDeepAgentError(code="run_metadata_missing")`.
    5. workflow_templates.definition is malformed → raises `MyDeepAgentError(code="template_load_failed")`.
  - E2E real OpenRouter regression PASS 78.52 s (baseline 71–122 s);
    within DR-3 acceptance threshold (+20%).
 - **v0.2 PR #2a — LangGraph `AsyncPostgresSaver` engine wiring** (foundation
  for `runs resume`). v0.2 PR #1 added the dependency; this commit actually
  uses it.
--- a/my-deepagent/src/my_deepagent/cli/runs.py
+++ b/my-deepagent/src/my_deepagent/cli/runs.py
@@ -135,35 +135,91 @@ async def _runs_show_async(run_id: str) -> None:
 async def _runs_resume_async(run_id: str) -> None:
-    """v0.1.0: resume is not implemented.
+    """Resume a non-terminal run from its first non-completed phase.
-    Surfaces the run state and hints at next steps. Future v0.2 implementation:
+    v0.2 PR #2b: actually re-enters `WorkflowEngine.resume(run_id)`. Reloads
-    rehydrate the workflow template by template_hash, replay phase loop from the
+    bindings + template + worktree from DB; skips already-completed phases;
-    first non-completed phase using the existing checkpointer.
+    runs the remaining phases under the LangGraph saver wired in v0.2 PR #2a.
    """
    from uuid import UUID
    from ..artifact_schema import ArtifactSchemaRegistry
    from ..binding import BackendAvailability, PersonaConsentStore
    from ..budget import make_budget_tracker_from_config
    from ..engine import WorkflowEngine
    from ..enums import Backend
    from ..errors import MyDeepAgentError
    from ..persona import load_personas_from_dir
    full_id = await _resolve_run_id(run_id)
    config = load_config()
    db = Database(config.database_url)
    await db.init_schema()
    # Fail fast on missing / terminal runs before constructing the engine.
    try:
        async with db.session() as s:
            run = await s.get(RunRow, full_id)
            if run is None:
                _CONSOLE.print(f"[red]run not found:[/] {run_id}")
                raise typer.Exit(code=1)
-        if run.state in ("completed", "failed", "aborted"):
+            if run.state in ("completed", "failed", "aborted"):
-            _CONSOLE.print(
+                _CONSOLE.print(
-                f"[yellow]Run {run.id} is already terminal ({run.state}). "
+                    f"[yellow]Run {run.id} is already terminal ({run.state}). "
-                "Start a fresh run with `mydeepagent run <workflow.yaml>`.[/]"
+                    "Start a fresh run with `mydeepagent run <workflow.yaml>`.[/]"
-            )
+                )
-            raise typer.Exit(code=1)
+                raise typer.Exit(code=1)
-        _CONSOLE.print(
+    except typer.Exit:
            "[yellow]Resume is not implemented in v0.1.0. The crash-recovery sweep at startup "
            "marked this run as failed; relaunch the workflow with `mydeepagent run`.[/]"
        )
        raise typer.Exit(code=2)
    finally:
        await db.dispose()
        raise
    # Seed assets needed by WorkflowEngine. Personas / artifact schemas come
    # from the seed directory, not from DB — they're language-neutral immutable
    # fixtures versioned in `docs/schemas/`.
    seed_root = Path(__file__).resolve().parents[3] / "docs" / "schemas"
    personas = load_personas_from_dir(seed_root / "personas")
    registry = ArtifactSchemaRegistry(roots=[seed_root / "artifacts"])
    consent = PersonaConsentStore(config.data_dir / "consents.json")
    backends = BackendAvailability(available_backends=frozenset(Backend))
    async def _no_op_approval(_payload: dict[str, object], _gates: list[str]) -> object:
        # Resume in CLI mode does not currently re-prompt for approval — the
        # original run already passed any gates it had reached. Reaching this
        # callback means a phase we *didn't* pass before is now hitting an
        # approval gate; treat that as REQUEST_CHANGES so the user knows.
        from ..enums import ApprovalDecisionAction
        _CONSOLE.print("[yellow]A new phase needs human approval; aborting resume.[/]")
        return ApprovalDecisionAction.REQUEST_CHANGES
    budget = make_budget_tracker_from_config(db, config)
    await budget.init()
    engine = WorkflowEngine(
        db=db,
        config=config,
        persona_pool=personas,
        artifact_registry=registry,
        consent_store=consent,
        available_backends=backends,
        approval_callback=_no_op_approval,
        budget_tracker=budget,
    )
    try:
        result = await engine.resume(UUID(full_id))
    except MyDeepAgentError as e:
        _CONSOLE.print(f"[red]resume failed:[/] {e.code} — {e}")
        await db.dispose()
        raise typer.Exit(code=1) from e
    _CONSOLE.print(f"[green]Resume complete:[/] run={result.run_id} state={result.state.value}")
    if result.final_report_path is not None:
        _CONSOLE.print(f"  report: {result.final_report_path}")
    if result.error:
        _CONSOLE.print(f"  error:  {result.error}")
    await db.dispose()
 async def _resolve_run_id(prefix_or_full: str) -> str:
--- a/my-deepagent/src/my_deepagent/engine.py
+++ b/my-deepagent/src/my_deepagent/engine.py
@@ -4,6 +4,7 @@ from __future__ import annotations
 import asyncio
 import json
 import logging
 import signal
 from collections.abc import AsyncIterator
 from contextlib import asynccontextmanager, suppress
@@ -58,6 +59,8 @@ ApprovalCallback = Any  # Callable[[dict, list[str]], Awaitable[ApprovalDecision
 _DEFAULT_PHASE_TIMEOUT_SECONDS = 300  # 5 minutes
 _LOG_CORRUPT_PERSONA = logging.getLogger(__name__ + ".resume")
@dataclass(frozen=True)
 class RunResult:
@@ -165,6 +168,11 @@ class WorkflowEngine:
        requirements_md: str = "",
        override: BindingOverride | None = None,
    ) -> RunResult:
        """Start a brand-new run. Allocates a new `run_id`, binds personas, persists
        skeleton metadata, and dispatches to the shared `_execute_run` phase loop.
        For resuming an existing non-terminal run, use :meth:`resume` instead.
        """
        run_id = uuid4()
        worktree_root = self._config.workspace_root / str(run_id)
        worktree_root.mkdir(parents=True, exist_ok=True)
@@ -186,6 +194,59 @@ class WorkflowEngine:
        await self._append_event(run_id, None, RunEventType.RUN_CREATED, {})
        await self._append_event(run_id, None, RunEventType.RUN_STARTED, {})
        return await self._execute_run(run_id, template, worktree_root, bindings)
    async def resume(self, run_id: UUID) -> RunResult:
        """Resume a non-terminal run from its first non-completed phase.
        Reloads worktree_root, template, and bindings from the DB — does **not**
        re-run `bind_personas`, so consent/pool changes since the original run
        do not silently shift the binding. Phases whose `RunPhaseRow.state` is
        already ``completed`` are skipped; the rest re-execute and (when a
        LangGraph saver is wired) replay deepagents from the last checkpoint
        for that phase's thread_id.
        Raises:
            MyDeepAgentError: if the run is missing, terminal, or its bindings
                / template metadata cannot be reloaded.
        """
        run_row = await self._get_run_or_raise(run_id)
        if run_row.state in {
            RunState.COMPLETED.value,
            RunState.FAILED.value,
            RunState.ABORTED.value,
        }:
            raise MyDeepAgentError.human_required(
                "run_already_terminal",
                message=(
                    f"run {run_id} is already {run_row.state}; start a fresh run "
                    f"with `mydeepagent run`"
                ),
            )
        worktree_root = Path(run_row.worktree_root)
        template = await self._reload_template(run_row.template_id)
        bindings = await self._reload_bindings(run_id)
        if not bindings:
            raise MyDeepAgentError.human_required(
                "run_metadata_missing",
                message=(
                    f"run {run_id} has no binding rows; cannot resume — start a fresh run instead"
                ),
            )
        await self._append_event(run_id, None, RunEventType.RUN_RESUMED, {})
        return await self._execute_run(run_id, template, worktree_root, bindings)
    async def _execute_run(
        self,
        run_id: UUID,
        template: WorkflowTemplate,
        worktree_root: Path,
        bindings: dict[str, Binding],
    ) -> RunResult:
        """Shared phase loop used by both `run` (new) and `resume`."""
        await self._set_run_state(run_id, RunState.EXECUTING)
        # Open the LangGraph AsyncPostgresSaver once per run; all phases share it.
@@ -195,8 +256,17 @@ class WorkflowEngine:
        # checkpointer=None and runs without resume support.
        async with self._maybe_open_saver() as saver:
            self._saver = saver
            completed_keys = await self._get_completed_phase_keys(run_id)
            try:
                for phase_def in template.phases:
                    if phase_def.key in completed_keys:
                        await self._append_event(
                            run_id,
                            None,
                            RunEventType.PHASE_SKIPPED,
                            {"phase_key": phase_def.key, "reason": "already_completed"},
                        )
                        continue
                    role_binding = bindings[phase_def.role]
                    await self._run_phase(run_id, worktree_root, template, phase_def, role_binding)
                await self._set_run_state(run_id, RunState.COMPLETED)
@@ -933,6 +1003,90 @@ class WorkflowEngine:
            except Exception:
                await s.rollback()
    # ------------------------------------------------------------------
    # Resume helpers (used by `resume` to rehydrate state from DB)
    # ------------------------------------------------------------------
    async def _get_run_or_raise(self, run_id: UUID) -> RunRow:
        async with self._db.session() as s:
            row = await s.get(RunRow, str(run_id))
        if row is None:
            raise MyDeepAgentError.human_required(
                "run_not_found",
                message=f"run {run_id} not found in DB",
            )
        return row
    async def _reload_template(self, template_id: str) -> WorkflowTemplate:
        async with self._db.session() as s:
            row = await s.get(WorkflowTemplateRow, template_id)
        if row is None:
            raise MyDeepAgentError.fatal(
                "template_load_failed",
                message=f"workflow_templates row {template_id} not found",
            )
        try:
            return WorkflowTemplate.model_validate(row.definition)
        except Exception as e:
            raise MyDeepAgentError.fatal(
                "template_load_failed",
                message=f"workflow_templates.definition for {template_id} is malformed: {e}",
            ) from e
    async def _reload_bindings(self, run_id: UUID) -> dict[str, Binding]:
        """Rebuild the `{role_id: Binding}` dict from `run_bindings` + `agent_personas`.
        Empty result means the run was never fully persisted — caller raises
        `run_metadata_missing`. We do NOT re-run `bind_personas` here on purpose:
        consent / pool state could have shifted since the original run.
        """
        from .binding import Binding as _Binding  # local import to avoid cycle hint
        async with self._db.session() as s:
            binding_rows = (
                (await s.execute(select(RunBindingRow).where(RunBindingRow.run_id == str(run_id))))
                .scalars()
                .all()
            )
            persona_rows: dict[str, AgentPersonaRow] = {}
            for br in binding_rows:
                pr = await s.get(AgentPersonaRow, br.persona_id)
                if pr is not None:
                    persona_rows[br.persona_id] = pr
        out: dict[str, Binding] = {}
        for br in binding_rows:
            pr = persona_rows.get(br.persona_id)
            if pr is None:
                continue
            try:
                persona = Persona.model_validate(pr.definition)
            except Exception as e:
                # Corrupt persona JSON: skip the binding; an empty bindings dict
                # surfaces as `run_metadata_missing` in `resume`.
                _LOG_CORRUPT_PERSONA.warning("corrupt persona row %s during resume: %s", pr.id, e)
                continue
            out[br.role_id] = _Binding(
                role_id=br.role_id, persona=persona, binding_hash=br.binding_hash
            )
        return out
    async def _get_completed_phase_keys(self, run_id: UUID) -> set[str]:
        """Return the set of phase_keys that already reached `completed` state."""
        async with self._db.session() as s:
            rows = (
                (
                    await s.execute(
                        select(RunPhaseRow.phase_key)
                        .where(RunPhaseRow.run_id == str(run_id))
                        .where(RunPhaseRow.state == RunPhaseState.COMPLETED.value)
                    )
                )
                .scalars()
                .all()
            )
        return set(rows)
 # ------------------------------------------------------------------
 # Module-level helpers
--- a/my-deepagent/tests/integration/test_resume.py
+++ b/my-deepagent/tests/integration/test_resume.py
@@ -0,0 +1,539 @@
 """WorkflowEngine.resume — v0.2 PR #2b integration tests.
 Covers 5 scenarios from the plan:
  1. 2-phase workflow with phase 1 already completed → resume picks up phase 2.
  2. Terminal run (state=completed) → resume raises MyDeepAgentError.human_required.
  3. Unknown run_id → raises MyDeepAgentError.human_required("run_not_found").
  4. RunBindingRow rows missing → raises MyDeepAgentError.human_required.
  5. Workflow template definition is corrupt → raises MyDeepAgentError.fatal.
 All scenarios use sqlite tmp_path + mock build_agent (no real OpenRouter).
 """
 from __future__ import annotations
 import json
 from pathlib import Path
 from typing import Any
 from unittest.mock import AsyncMock, MagicMock, patch
 from uuid import UUID, uuid4
 import pytest
 from sqlalchemy import select
 from my_deepagent.artifact_schema import ArtifactSchemaRegistry
 from my_deepagent.binding import BackendAvailability, PersonaConsentStore
 from my_deepagent.config import load_config
 from my_deepagent.engine import WorkflowEngine
 from my_deepagent.enums import (
    ApprovalDecisionAction,
    Backend,
    RunPhaseState,
    RunState,
 )
 from my_deepagent.errors import MyDeepAgentError
 from my_deepagent.persistence.db import Database
 from my_deepagent.persistence.models import (
    AgentPersonaRow,
    RunBindingRow,
    RunEventRow,
    RunPhaseRow,
    RunRow,
    WorkflowTemplateRow,
 )
 from my_deepagent.persona import load_personas_from_dir
 from my_deepagent.workflow import WorkflowTemplate
 _DOCS = Path(__file__).resolve().parents[2] / "docs" / "schemas"
 _ARTIFACTS_ROOT = _DOCS / "artifacts"
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
 def _valid_spec(run_id: UUID) -> dict[str, Any]:
    return {
        "runId": str(run_id),
        "phaseKey": "spec",
        "requirements": "Implement feature X with full test coverage",
        "acceptance_criteria": ["All tests pass", "Coverage >= 90%"],
        "approach": "TDD: write tests first, then implement.",
        "risks": [],
    }
 def _two_phase_workflow() -> WorkflowTemplate:
    raw = {
        "name": "test-resume-wf",
        "version": 1,
        "description": "two-phase workflow for resume tests",
        "roles": [
            {
                "id": "spec_writer",
                "required_capabilities": ["spec_write", "phase_planning"],
                "preferred_backends": ["openrouter"],
            }
        ],
        "phases": [
            {
                "key": "spec",
                "title": "Write spec phase 1",
                "risk": "low",
                "role": "spec_writer",
                "instructions": "Write the spec with enough words.",
                "timeout_seconds": 10,
                "expected_artifact": {"path": "artifacts/spec.json", "schema": "dev/spec@1"},
            },
            {
                "key": "spec2",
                "title": "Write spec phase 2",
                "risk": "low",
                "role": "spec_writer",
                "instructions": "Write the second spec with enough words.",
                "timeout_seconds": 10,
                "expected_artifact": {"path": "artifacts/spec2.json", "schema": "dev/spec@1"},
            },
        ],
    }
    return WorkflowTemplate.model_validate(raw)
@pytest.fixture
 def personas() -> list[Any]:
    return load_personas_from_dir(_DOCS / "personas")
@pytest.fixture
 def registry() -> ArtifactSchemaRegistry:
    return ArtifactSchemaRegistry(roots=[_ARTIFACTS_ROOT])
@pytest.fixture
 def consent_store(tmp_path: Path) -> PersonaConsentStore:
    return PersonaConsentStore(tmp_path / "consents.json")
@pytest.fixture
 def available_backends() -> BackendAvailability:
    return BackendAvailability(available_backends=frozenset(Backend))
@pytest.fixture
 async def db(tmp_path: Path) -> Database:
    url = f"sqlite+aiosqlite:///{tmp_path / 'test.sqlite3'}"
    database = Database(url)
    await database.init_schema()
    return database
 def _make_engine(
    database: Database,
    tmp_path: Path,
    personas: list[Any],
    registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
 ) -> WorkflowEngine:
    cfg = load_config(
        workspace_root=tmp_path,
        data_dir=tmp_path / "data",
        database_url=f"sqlite+aiosqlite:///{tmp_path / 'test.sqlite3'}",
    )
    auto_approve = AsyncMock(return_value=ApprovalDecisionAction.APPROVE)
    return WorkflowEngine(
        db=database,
        config=cfg,
        persona_pool=personas,
        artifact_registry=registry,
        consent_store=consent_store,
        available_backends=available_backends,
        approval_callback=auto_approve,
    )
 def _fake_agent_writes(
    expected_artifact_path: Path, run_id: UUID, on_call: list[str] | None = None
 ) -> Any:
    """A build_agent stub: the returned agent writes the expected artifact then returns."""
    def _build(
        persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
    ) -> Any:
        async def _ainvoke(messages: Any, **_kwargs: Any) -> Any:
            if on_call is not None:
                on_call.append(str(expected_artifact_path))
            expected_artifact_path.parent.mkdir(parents=True, exist_ok=True)
            artifact = _valid_spec(run_id)
            content = json.dumps(artifact)
            expected_artifact_path.write_text(content, encoding="utf-8")
            # Drive watcher middleware so engine sees the artifact write.
            for mw in middleware:
                if hasattr(mw, "awrap_tool_call"):
                    req = MagicMock()
                    req.tool_call = {
                        "name": "write_file",
                        "args": {
                            "file_path": str(expected_artifact_path),
                            "content": content,
                        },
                        "id": "x",
                    }
                    await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
            return {"messages": []}
        agent = MagicMock()
        agent.ainvoke = _ainvoke
        return agent
    return _build
 # ---------------------------------------------------------------------------
 # Scenario 1: phase 1 completed, phase 2 pending → resume picks up phase 2
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_resume_completes_remaining_phase(  # noqa: C901 — 2-phase scenario inherently spans 2 build_agent stubs + DB seam
    tmp_path: Path,
    personas: list[Any],
    registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
    db: Database,
 ) -> None:
    """Run phase 1 to completion via engine.run, then truncate phase 2 by aborting
    the agent the first time around, then resume and verify phase 2 finishes."""
    template = _two_phase_workflow()
    engine = _make_engine(
        db, tmp_path, personas, registry, consent_store, available_backends
    )
    # First run: phase 1 succeeds, phase 2 deliberately fails (agent never writes).
    phase2_calls: list[int] = []
    def _build_first(
        persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
    ) -> Any:
        async def _ainvoke(messages: Any, **_kwargs: Any) -> Any:
            # Decide which phase by the envelope content (phaseKey "spec" vs "spec2").
            envelope = messages["messages"][0]["content"] if "messages" in messages else ""
            is_phase2 = "spec2" in envelope
            if is_phase2:
                phase2_calls.append(1)
                # Do NOT write spec2.json → phase will time out + fail.
                return {"messages": []}
            expected = root_dir / "artifacts" / "spec.json"
            expected.parent.mkdir(parents=True, exist_ok=True)
            # Use a stable placeholder run_id; the schema only checks UUID format
            content = json.dumps(_valid_spec(uuid4()))
            expected.write_text(content, encoding="utf-8")
            for mw in middleware:
                if hasattr(mw, "awrap_tool_call"):
                    req = MagicMock()
                    req.tool_call = {
                        "name": "write_file",
                        "args": {"file_path": str(expected), "content": content},
                        "id": "x",
                    }
                    await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
            return {"messages": []}
        agent = MagicMock()
        agent.ainvoke = _ainvoke
        return agent
    with patch("my_deepagent.engine.build_agent", side_effect=_build_first):
        first_result = await engine.run(
            template, repo_path=tmp_path, base_branch="main", requirements_md="test"
        )
    assert first_result.state == RunState.FAILED, "phase 2 should have failed first run"
    assert len(phase2_calls) >= 1
    run_id = first_result.run_id
    # The first run is now state=failed (terminal). Reopen it as non-terminal so
    # resume is willing to pick it up. (In a real crash, the run would have been
    # left mid-state by SIGKILL; sweep_orphan_runs marks it failed at next start.
    # Here we mimic the *pre-sweep* state by flipping it back to executing.)
    async with db.session() as s:
        run = await s.get(RunRow, str(run_id))
        assert run is not None
        run.state = RunState.EXECUTING.value
        # Reset phase 2 row so resume sees it as non-completed
        phases = (
            (await s.execute(select(RunPhaseRow).where(RunPhaseRow.run_id == str(run_id))))
            .scalars()
            .all()
        )
        for p in phases:
            if p.phase_key == "spec2":
                p.state = RunPhaseState.PENDING.value
                p.ended_at = None
        await s.commit()
    # Second run: phase 2 succeeds this time.
    phase2_calls.clear()
    def _build_second(
        persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
    ) -> Any:
        async def _ainvoke(messages: Any, **_kwargs: Any) -> Any:
            envelope = messages["messages"][0]["content"] if "messages" in messages else ""
            is_phase2 = "spec2" in envelope
            phase2_calls.append(1 if is_phase2 else 0)
            expected_name = "spec2.json" if is_phase2 else "spec.json"
            expected = root_dir / "artifacts" / expected_name
            expected.parent.mkdir(parents=True, exist_ok=True)
            content = json.dumps(_valid_spec(uuid4()))
            expected.write_text(content, encoding="utf-8")
            for mw in middleware:
                if hasattr(mw, "awrap_tool_call"):
                    req = MagicMock()
                    req.tool_call = {
                        "name": "write_file",
                        "args": {"file_path": str(expected), "content": content},
                        "id": "y",
                    }
                    await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
            return {"messages": []}
        agent = MagicMock()
        agent.ainvoke = _ainvoke
        return agent
    with patch("my_deepagent.engine.build_agent", side_effect=_build_second):
        resume_result = await engine.resume(run_id)
    assert resume_result.state == RunState.COMPLETED, (
        f"resume should complete: state={resume_result.state}, error={resume_result.error}"
    )
    # Phase 1 should be skipped on resume (only phase 2 invoked).
    assert phase2_calls == [1], f"phase 2 should be the only re-invoked phase, got {phase2_calls}"
    # RUN_RESUMED event must have been emitted.
    async with db.session() as s:
        events = (
            (
                await s.execute(
                    select(RunEventRow.type).where(RunEventRow.run_id == str(run_id))
                )
            )
            .scalars()
            .all()
        )
    assert "run.resumed" in events
    assert "phase.skipped" in events  # phase 1 skipped
 # ---------------------------------------------------------------------------
 # Scenario 2: terminal run rejected
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_resume_terminal_run_raises(
    tmp_path: Path,
    personas: list[Any],
    registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
    db: Database,
 ) -> None:
    """A run in a terminal state (e.g. completed) cannot be resumed."""
    template = _two_phase_workflow()
    engine = _make_engine(
        db, tmp_path, personas, registry, consent_store, available_backends
    )
    def _build(
        persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
    ) -> Any:
        async def _ainvoke(messages: Any, **_kwargs: Any) -> Any:
            envelope = messages["messages"][0]["content"] if "messages" in messages else ""
            phase_key = "spec2" if "spec2" in envelope else "spec"
            expected = root_dir / "artifacts" / f"{phase_key}.json"
            expected.parent.mkdir(parents=True, exist_ok=True)
            content = json.dumps(_valid_spec(uuid4()))
            expected.write_text(content, encoding="utf-8")
            for mw in middleware:
                if hasattr(mw, "awrap_tool_call"):
                    req = MagicMock()
                    req.tool_call = {
                        "name": "write_file",
                        "args": {"file_path": str(expected), "content": content},
                        "id": "x",
                    }
                    await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
            return {"messages": []}
        agent = MagicMock()
        agent.ainvoke = _ainvoke
        return agent
    with patch("my_deepagent.engine.build_agent", side_effect=_build):
        result = await engine.run(template, repo_path=tmp_path, base_branch="main")
    assert result.state == RunState.COMPLETED
    with pytest.raises(MyDeepAgentError) as exc:
        await engine.resume(result.run_id)
    assert exc.value.code == "run_already_terminal"
 # ---------------------------------------------------------------------------
 # Scenario 3: unknown run id
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_resume_unknown_run_raises(
    tmp_path: Path,
    personas: list[Any],
    registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
    db: Database,
 ) -> None:
    """resume(unknown_uuid) → MyDeepAgentError(code=run_not_found)."""
    engine = _make_engine(
        db, tmp_path, personas, registry, consent_store, available_backends
    )
    with pytest.raises(MyDeepAgentError) as exc:
        await engine.resume(uuid4())
    assert exc.value.code == "run_not_found"
 # ---------------------------------------------------------------------------
 # Scenario 4: RunBindingRow rows missing
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_resume_missing_bindings_raises(
    tmp_path: Path,
    personas: list[Any],
    registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
    db: Database,
 ) -> None:
    """A run whose RunBindingRow rows were never persisted cannot be resumed."""
    template = _two_phase_workflow()
    engine = _make_engine(
        db, tmp_path, personas, registry, consent_store, available_backends
    )
    # Seed a RunRow + WorkflowTemplateRow but NO RunBindingRow.
    run_id = uuid4()
    tpl_id = uuid4()
    async with db.session() as s:
        s.add(
            WorkflowTemplateRow(
                id=str(tpl_id),
                name=template.name,
                version=template.version,
                hash="sha:fake",
                definition=template.model_dump(by_alias=True),
                created_at="2026-05-16T00:00:00+00:00",
            )
        )
        await s.flush()
        s.add(
            RunRow(
                id=str(run_id),
                template_id=str(tpl_id),
                template_hash="sha:fake",
                state=RunState.EXECUTING.value,
                repo_path=str(tmp_path / "repo"),
                base_branch="main",
                worktree_root=str(tmp_path / "wt"),
                created_at="2026-05-16T00:00:00+00:00",
                updated_at="2026-05-16T00:00:00+00:00",
            )
        )
        await s.commit()
    with pytest.raises(MyDeepAgentError) as exc:
        await engine.resume(run_id)
    assert exc.value.code == "run_metadata_missing"
 # ---------------------------------------------------------------------------
 # Scenario 5: workflow_templates.definition is corrupt
 # ---------------------------------------------------------------------------
@pytest.mark.asyncio
 async def test_resume_corrupt_template_definition_raises(
    tmp_path: Path,
    personas: list[Any],
    registry: ArtifactSchemaRegistry,
    consent_store: PersonaConsentStore,
    available_backends: BackendAvailability,
    db: Database,
 ) -> None:
    """A run whose workflow_templates.definition is malformed → fatal."""
    engine = _make_engine(
        db, tmp_path, personas, registry, consent_store, available_backends
    )
    run_id = uuid4()
    tpl_id = uuid4()
    persona_id = uuid4()
    async with db.session() as s:
        # Corrupt definition (missing required `roles` / `phases`)
        s.add(
            WorkflowTemplateRow(
                id=str(tpl_id),
                name="corrupt",
                version=1,
                hash="sha:corrupt",
                definition={"name": "corrupt"},
                created_at="2026-05-16T00:00:00+00:00",
            )
        )
        await s.flush()
        s.add(
            RunRow(
                id=str(run_id),
                template_id=str(tpl_id),
                template_hash="sha:corrupt",
                state=RunState.EXECUTING.value,
                repo_path=str(tmp_path / "repo"),
                base_branch="main",
                worktree_root=str(tmp_path / "wt"),
                created_at="2026-05-16T00:00:00+00:00",
                updated_at="2026-05-16T00:00:00+00:00",
            )
        )
        s.add(
            AgentPersonaRow(
                id=str(persona_id),
                name="openrouter-deepseek-spec-writer",
                version=1,
                hash="sha:p",
                definition=personas[0].model_dump(by_alias=True),
                created_at="2026-05-16T00:00:00+00:00",
            )
        )
        # Flush RunRow + AgentPersonaRow before adding RunBindingRow so the FKs
        # are satisfied at INSERT time (SQLite with PRAGMA foreign_keys=ON).
        await s.flush()
        s.add(
            RunBindingRow(
                id=str(uuid4()),
                run_id=str(run_id),
                role_id="spec_writer",
                persona_id=str(persona_id),
                persona_hash="sha:p",
                backend="openrouter",
                binding_hash="sha:b",
            )
        )
        await s.commit()
    with pytest.raises(MyDeepAgentError) as exc:
        await engine.resume(run_id)
    assert exc.value.code == "template_load_failed"