feat(my-deepagent): v0.2 PR #2b — mydeepagent runs resume <id> real implementation
Closes the v0.1.0 KNOWN LIMIT where resume was an exit-2 stub. Builds on
v0.2 PR #2a's LangGraph wiring + the existing DB phase-state machine +
sweep_orphan_runs — no Temporal (per DR-3).
Highlights
- `WorkflowEngine.resume(run_id)` (new async method):
- Loads RunRow, rejects terminal states with
MyDeepAgentError("run_already_terminal").
- Reloads worktree_root from `RunRow.worktree_root`, template via
`_reload_template` (WorkflowTemplateRow JOIN + model_validate), and
bindings via `_reload_bindings` (run_bindings ⨝ agent_personas).
- **Does NOT call `bind_personas` again** — locks in the original
binding so consent / persona-pool changes since the original run
don't silently shift role assignment.
- `_execute_run` (extracted shared phase loop): `run()` and `resume()`
both dispatch through it. Skips already-completed phases (emits
`phase.skipped` event) and re-executes the rest.
- 4 new private helpers on WorkflowEngine: `_get_run_or_raise`,
`_reload_template`, `_reload_bindings`, `_get_completed_phase_keys`.
- `RunEventType.RUN_RESUMED` and `PHASE_SKIPPED` are now actually
emitted (the enum members existed already).
- `cli/runs.py _runs_resume_async`: stub → real impl. Validates the run
exists + non-terminal, loads seed personas + artifact schemas from
`docs/schemas/`, constructs WorkflowEngine with an
"abort-on-new-approval" callback (resume should not silently re-prompt
the user — original gates already passed; a new gate means the
workflow has changed). Calls engine.resume(UUID(id)), prints final
state + report. Catches MyDeepAgentError and exits 1 with red error.
Tests
- `tests/integration/test_resume.py` (new, 5 scenarios):
1. 2-phase mock workflow: phase 1 succeeds, phase 2 fails first time,
row flipped back to executing → resume → phase 2 completes.
Asserts `phase.skipped` event for phase 1, `run.resumed` event,
and exactly 1 mock invocation for phase 2 on resume.
2. Terminal run → `MyDeepAgentError(code="run_already_terminal")`.
3. Unknown run id → `MyDeepAgentError(code="run_not_found")`.
4. RunBindingRow rows missing → `MyDeepAgentError(code="run_metadata_missing")`.
5. Corrupt `workflow_templates.definition` →
`MyDeepAgentError(code="template_load_failed")`.
Mock pattern matches existing test_engine.py: patch
`my_deepagent.engine.build_agent` to return a fake agent that writes
the expected artifact and drives the watcher middleware.
Gates
- ruff check + ruff format --check + mypy --strict: PASS (103 source files)
- pytest non-E2E: 587 PASS (12.69 s) — +5 from new resume tests
- pytest E2E real OpenRouter on Postgres: PASS 78.52 s (baseline 71–122 s;
within DR-3 acceptance threshold ≤+20%)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -3,6 +3,45 @@
|
||||
## [Unreleased]
|
||||
|
||||
### Added
|
||||
- **v0.2 PR #2b — `mydeepagent runs resume <id>` real implementation**.
|
||||
Closes the v0.1.0 KNOWN LIMIT where resume was an exit-2 stub. Reuses
|
||||
v0.2 PR #2a's LangGraph wiring + sweep_orphan_runs's DB state machine,
|
||||
no Temporal (DR-3).
|
||||
- `src/my_deepagent/engine.py`:
|
||||
- New `WorkflowEngine.resume(run_id)` async method. Loads `RunRow`,
|
||||
rejects terminal states with `MyDeepAgentError.human_required("run_already_terminal")`,
|
||||
reloads `worktree_root` + `WorkflowTemplate` (via `_reload_template`) +
|
||||
bindings (via `_reload_bindings`) from DB. Does **not** call
|
||||
`bind_personas` again — locks in the original binding so consent /
|
||||
pool changes don't silently shift roles.
|
||||
- New `_execute_run` helper (shared phase loop) extracted from `run()`.
|
||||
Skips already-`completed` phases (emits `phase.skipped` event) and
|
||||
re-executes the rest. Both `run` (new) and `resume` dispatch through
|
||||
it.
|
||||
- New helpers: `_get_run_or_raise`, `_reload_template`,
|
||||
`_reload_bindings` (rebuilds `{role_id: Binding}` from
|
||||
`run_bindings` ⨝ `agent_personas`; corrupt persona rows are logged
|
||||
and skipped, surfacing as `run_metadata_missing` if no bindings remain),
|
||||
`_get_completed_phase_keys`.
|
||||
- New `RunEventType.RUN_RESUMED` and `RunEventType.PHASE_SKIPPED` are
|
||||
now actually emitted (the enum members existed already from v0.1.0).
|
||||
- `src/my_deepagent/cli/runs.py` `_runs_resume_async`: stub → real impl.
|
||||
Validates run exists + non-terminal, loads seed personas + artifact
|
||||
schemas (`docs/schemas/`), constructs `WorkflowEngine` with a
|
||||
"abort-on-new-approval" callback, calls `engine.resume(UUID(id))`,
|
||||
prints final state + report path. Catches `MyDeepAgentError` and prints
|
||||
a red error with exit 1.
|
||||
- `tests/integration/test_resume.py` (new, 5 scenarios):
|
||||
1. 2-phase workflow: phase 1 succeeds, phase 2 fails → flip run row
|
||||
back to executing → resume → phase 2 completes; assert phase 1 was
|
||||
skipped (`phase.skipped` event present) and `run.resumed` event emitted.
|
||||
2. Terminal run → `resume()` raises `MyDeepAgentError(code="run_already_terminal")`.
|
||||
3. Unknown run id → raises `MyDeepAgentError(code="run_not_found")`.
|
||||
4. RunBindingRow rows missing → raises `MyDeepAgentError(code="run_metadata_missing")`.
|
||||
5. workflow_templates.definition is malformed → raises `MyDeepAgentError(code="template_load_failed")`.
|
||||
- E2E real OpenRouter regression PASS 78.52 s (baseline 71–122 s);
|
||||
within DR-3 acceptance threshold (+20%).
|
||||
|
||||
- **v0.2 PR #2a — LangGraph `AsyncPostgresSaver` engine wiring** (foundation
|
||||
for `runs resume`). v0.2 PR #1 added the dependency; this commit actually
|
||||
uses it.
|
||||
|
||||
@@ -135,35 +135,91 @@ async def _runs_show_async(run_id: str) -> None:
|
||||
|
||||
|
||||
async def _runs_resume_async(run_id: str) -> None:
|
||||
"""v0.1.0: resume is not implemented.
|
||||
"""Resume a non-terminal run from its first non-completed phase.
|
||||
|
||||
Surfaces the run state and hints at next steps. Future v0.2 implementation:
|
||||
rehydrate the workflow template by template_hash, replay phase loop from the
|
||||
first non-completed phase using the existing checkpointer.
|
||||
v0.2 PR #2b: actually re-enters `WorkflowEngine.resume(run_id)`. Reloads
|
||||
bindings + template + worktree from DB; skips already-completed phases;
|
||||
runs the remaining phases under the LangGraph saver wired in v0.2 PR #2a.
|
||||
"""
|
||||
from uuid import UUID
|
||||
|
||||
from ..artifact_schema import ArtifactSchemaRegistry
|
||||
from ..binding import BackendAvailability, PersonaConsentStore
|
||||
from ..budget import make_budget_tracker_from_config
|
||||
from ..engine import WorkflowEngine
|
||||
from ..enums import Backend
|
||||
from ..errors import MyDeepAgentError
|
||||
from ..persona import load_personas_from_dir
|
||||
|
||||
full_id = await _resolve_run_id(run_id)
|
||||
config = load_config()
|
||||
|
||||
db = Database(config.database_url)
|
||||
await db.init_schema()
|
||||
|
||||
# Fail fast on missing / terminal runs before constructing the engine.
|
||||
try:
|
||||
async with db.session() as s:
|
||||
run = await s.get(RunRow, full_id)
|
||||
if run is None:
|
||||
_CONSOLE.print(f"[red]run not found:[/] {run_id}")
|
||||
raise typer.Exit(code=1)
|
||||
if run.state in ("completed", "failed", "aborted"):
|
||||
_CONSOLE.print(
|
||||
f"[yellow]Run {run.id} is already terminal ({run.state}). "
|
||||
"Start a fresh run with `mydeepagent run <workflow.yaml>`.[/]"
|
||||
)
|
||||
raise typer.Exit(code=1)
|
||||
_CONSOLE.print(
|
||||
"[yellow]Resume is not implemented in v0.1.0. The crash-recovery sweep at startup "
|
||||
"marked this run as failed; relaunch the workflow with `mydeepagent run`.[/]"
|
||||
)
|
||||
raise typer.Exit(code=2)
|
||||
finally:
|
||||
if run.state in ("completed", "failed", "aborted"):
|
||||
_CONSOLE.print(
|
||||
f"[yellow]Run {run.id} is already terminal ({run.state}). "
|
||||
"Start a fresh run with `mydeepagent run <workflow.yaml>`.[/]"
|
||||
)
|
||||
raise typer.Exit(code=1)
|
||||
except typer.Exit:
|
||||
await db.dispose()
|
||||
raise
|
||||
|
||||
# Seed assets needed by WorkflowEngine. Personas / artifact schemas come
|
||||
# from the seed directory, not from DB — they're language-neutral immutable
|
||||
# fixtures versioned in `docs/schemas/`.
|
||||
seed_root = Path(__file__).resolve().parents[3] / "docs" / "schemas"
|
||||
personas = load_personas_from_dir(seed_root / "personas")
|
||||
registry = ArtifactSchemaRegistry(roots=[seed_root / "artifacts"])
|
||||
consent = PersonaConsentStore(config.data_dir / "consents.json")
|
||||
backends = BackendAvailability(available_backends=frozenset(Backend))
|
||||
|
||||
async def _no_op_approval(_payload: dict[str, object], _gates: list[str]) -> object:
|
||||
# Resume in CLI mode does not currently re-prompt for approval — the
|
||||
# original run already passed any gates it had reached. Reaching this
|
||||
# callback means a phase we *didn't* pass before is now hitting an
|
||||
# approval gate; treat that as REQUEST_CHANGES so the user knows.
|
||||
from ..enums import ApprovalDecisionAction
|
||||
|
||||
_CONSOLE.print("[yellow]A new phase needs human approval; aborting resume.[/]")
|
||||
return ApprovalDecisionAction.REQUEST_CHANGES
|
||||
|
||||
budget = make_budget_tracker_from_config(db, config)
|
||||
await budget.init()
|
||||
|
||||
engine = WorkflowEngine(
|
||||
db=db,
|
||||
config=config,
|
||||
persona_pool=personas,
|
||||
artifact_registry=registry,
|
||||
consent_store=consent,
|
||||
available_backends=backends,
|
||||
approval_callback=_no_op_approval,
|
||||
budget_tracker=budget,
|
||||
)
|
||||
|
||||
try:
|
||||
result = await engine.resume(UUID(full_id))
|
||||
except MyDeepAgentError as e:
|
||||
_CONSOLE.print(f"[red]resume failed:[/] {e.code} — {e}")
|
||||
await db.dispose()
|
||||
raise typer.Exit(code=1) from e
|
||||
|
||||
_CONSOLE.print(f"[green]Resume complete:[/] run={result.run_id} state={result.state.value}")
|
||||
if result.final_report_path is not None:
|
||||
_CONSOLE.print(f" report: {result.final_report_path}")
|
||||
if result.error:
|
||||
_CONSOLE.print(f" error: {result.error}")
|
||||
await db.dispose()
|
||||
|
||||
|
||||
async def _resolve_run_id(prefix_or_full: str) -> str:
|
||||
|
||||
@@ -4,6 +4,7 @@ from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import signal
|
||||
from collections.abc import AsyncIterator
|
||||
from contextlib import asynccontextmanager, suppress
|
||||
@@ -58,6 +59,8 @@ ApprovalCallback = Any # Callable[[dict, list[str]], Awaitable[ApprovalDecision
|
||||
|
||||
_DEFAULT_PHASE_TIMEOUT_SECONDS = 300 # 5 minutes
|
||||
|
||||
_LOG_CORRUPT_PERSONA = logging.getLogger(__name__ + ".resume")
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class RunResult:
|
||||
@@ -165,6 +168,11 @@ class WorkflowEngine:
|
||||
requirements_md: str = "",
|
||||
override: BindingOverride | None = None,
|
||||
) -> RunResult:
|
||||
"""Start a brand-new run. Allocates a new `run_id`, binds personas, persists
|
||||
skeleton metadata, and dispatches to the shared `_execute_run` phase loop.
|
||||
|
||||
For resuming an existing non-terminal run, use :meth:`resume` instead.
|
||||
"""
|
||||
run_id = uuid4()
|
||||
worktree_root = self._config.workspace_root / str(run_id)
|
||||
worktree_root.mkdir(parents=True, exist_ok=True)
|
||||
@@ -186,6 +194,59 @@ class WorkflowEngine:
|
||||
|
||||
await self._append_event(run_id, None, RunEventType.RUN_CREATED, {})
|
||||
await self._append_event(run_id, None, RunEventType.RUN_STARTED, {})
|
||||
|
||||
return await self._execute_run(run_id, template, worktree_root, bindings)
|
||||
|
||||
async def resume(self, run_id: UUID) -> RunResult:
|
||||
"""Resume a non-terminal run from its first non-completed phase.
|
||||
|
||||
Reloads worktree_root, template, and bindings from the DB — does **not**
|
||||
re-run `bind_personas`, so consent/pool changes since the original run
|
||||
do not silently shift the binding. Phases whose `RunPhaseRow.state` is
|
||||
already ``completed`` are skipped; the rest re-execute and (when a
|
||||
LangGraph saver is wired) replay deepagents from the last checkpoint
|
||||
for that phase's thread_id.
|
||||
|
||||
Raises:
|
||||
MyDeepAgentError: if the run is missing, terminal, or its bindings
|
||||
/ template metadata cannot be reloaded.
|
||||
"""
|
||||
run_row = await self._get_run_or_raise(run_id)
|
||||
if run_row.state in {
|
||||
RunState.COMPLETED.value,
|
||||
RunState.FAILED.value,
|
||||
RunState.ABORTED.value,
|
||||
}:
|
||||
raise MyDeepAgentError.human_required(
|
||||
"run_already_terminal",
|
||||
message=(
|
||||
f"run {run_id} is already {run_row.state}; start a fresh run "
|
||||
f"with `mydeepagent run`"
|
||||
),
|
||||
)
|
||||
|
||||
worktree_root = Path(run_row.worktree_root)
|
||||
template = await self._reload_template(run_row.template_id)
|
||||
bindings = await self._reload_bindings(run_id)
|
||||
if not bindings:
|
||||
raise MyDeepAgentError.human_required(
|
||||
"run_metadata_missing",
|
||||
message=(
|
||||
f"run {run_id} has no binding rows; cannot resume — start a fresh run instead"
|
||||
),
|
||||
)
|
||||
|
||||
await self._append_event(run_id, None, RunEventType.RUN_RESUMED, {})
|
||||
return await self._execute_run(run_id, template, worktree_root, bindings)
|
||||
|
||||
async def _execute_run(
|
||||
self,
|
||||
run_id: UUID,
|
||||
template: WorkflowTemplate,
|
||||
worktree_root: Path,
|
||||
bindings: dict[str, Binding],
|
||||
) -> RunResult:
|
||||
"""Shared phase loop used by both `run` (new) and `resume`."""
|
||||
await self._set_run_state(run_id, RunState.EXECUTING)
|
||||
|
||||
# Open the LangGraph AsyncPostgresSaver once per run; all phases share it.
|
||||
@@ -195,8 +256,17 @@ class WorkflowEngine:
|
||||
# checkpointer=None and runs without resume support.
|
||||
async with self._maybe_open_saver() as saver:
|
||||
self._saver = saver
|
||||
completed_keys = await self._get_completed_phase_keys(run_id)
|
||||
try:
|
||||
for phase_def in template.phases:
|
||||
if phase_def.key in completed_keys:
|
||||
await self._append_event(
|
||||
run_id,
|
||||
None,
|
||||
RunEventType.PHASE_SKIPPED,
|
||||
{"phase_key": phase_def.key, "reason": "already_completed"},
|
||||
)
|
||||
continue
|
||||
role_binding = bindings[phase_def.role]
|
||||
await self._run_phase(run_id, worktree_root, template, phase_def, role_binding)
|
||||
await self._set_run_state(run_id, RunState.COMPLETED)
|
||||
@@ -933,6 +1003,90 @@ class WorkflowEngine:
|
||||
except Exception:
|
||||
await s.rollback()
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Resume helpers (used by `resume` to rehydrate state from DB)
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
async def _get_run_or_raise(self, run_id: UUID) -> RunRow:
|
||||
async with self._db.session() as s:
|
||||
row = await s.get(RunRow, str(run_id))
|
||||
if row is None:
|
||||
raise MyDeepAgentError.human_required(
|
||||
"run_not_found",
|
||||
message=f"run {run_id} not found in DB",
|
||||
)
|
||||
return row
|
||||
|
||||
async def _reload_template(self, template_id: str) -> WorkflowTemplate:
|
||||
async with self._db.session() as s:
|
||||
row = await s.get(WorkflowTemplateRow, template_id)
|
||||
if row is None:
|
||||
raise MyDeepAgentError.fatal(
|
||||
"template_load_failed",
|
||||
message=f"workflow_templates row {template_id} not found",
|
||||
)
|
||||
try:
|
||||
return WorkflowTemplate.model_validate(row.definition)
|
||||
except Exception as e:
|
||||
raise MyDeepAgentError.fatal(
|
||||
"template_load_failed",
|
||||
message=f"workflow_templates.definition for {template_id} is malformed: {e}",
|
||||
) from e
|
||||
|
||||
async def _reload_bindings(self, run_id: UUID) -> dict[str, Binding]:
|
||||
"""Rebuild the `{role_id: Binding}` dict from `run_bindings` + `agent_personas`.
|
||||
|
||||
Empty result means the run was never fully persisted — caller raises
|
||||
`run_metadata_missing`. We do NOT re-run `bind_personas` here on purpose:
|
||||
consent / pool state could have shifted since the original run.
|
||||
"""
|
||||
from .binding import Binding as _Binding # local import to avoid cycle hint
|
||||
|
||||
async with self._db.session() as s:
|
||||
binding_rows = (
|
||||
(await s.execute(select(RunBindingRow).where(RunBindingRow.run_id == str(run_id))))
|
||||
.scalars()
|
||||
.all()
|
||||
)
|
||||
persona_rows: dict[str, AgentPersonaRow] = {}
|
||||
for br in binding_rows:
|
||||
pr = await s.get(AgentPersonaRow, br.persona_id)
|
||||
if pr is not None:
|
||||
persona_rows[br.persona_id] = pr
|
||||
|
||||
out: dict[str, Binding] = {}
|
||||
for br in binding_rows:
|
||||
pr = persona_rows.get(br.persona_id)
|
||||
if pr is None:
|
||||
continue
|
||||
try:
|
||||
persona = Persona.model_validate(pr.definition)
|
||||
except Exception as e:
|
||||
# Corrupt persona JSON: skip the binding; an empty bindings dict
|
||||
# surfaces as `run_metadata_missing` in `resume`.
|
||||
_LOG_CORRUPT_PERSONA.warning("corrupt persona row %s during resume: %s", pr.id, e)
|
||||
continue
|
||||
out[br.role_id] = _Binding(
|
||||
role_id=br.role_id, persona=persona, binding_hash=br.binding_hash
|
||||
)
|
||||
return out
|
||||
|
||||
async def _get_completed_phase_keys(self, run_id: UUID) -> set[str]:
|
||||
"""Return the set of phase_keys that already reached `completed` state."""
|
||||
async with self._db.session() as s:
|
||||
rows = (
|
||||
(
|
||||
await s.execute(
|
||||
select(RunPhaseRow.phase_key)
|
||||
.where(RunPhaseRow.run_id == str(run_id))
|
||||
.where(RunPhaseRow.state == RunPhaseState.COMPLETED.value)
|
||||
)
|
||||
)
|
||||
.scalars()
|
||||
.all()
|
||||
)
|
||||
return set(rows)
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Module-level helpers
|
||||
|
||||
539
my-deepagent/tests/integration/test_resume.py
Normal file
539
my-deepagent/tests/integration/test_resume.py
Normal file
@@ -0,0 +1,539 @@
|
||||
"""WorkflowEngine.resume — v0.2 PR #2b integration tests.
|
||||
|
||||
Covers 5 scenarios from the plan:
|
||||
1. 2-phase workflow with phase 1 already completed → resume picks up phase 2.
|
||||
2. Terminal run (state=completed) → resume raises MyDeepAgentError.human_required.
|
||||
3. Unknown run_id → raises MyDeepAgentError.human_required("run_not_found").
|
||||
4. RunBindingRow rows missing → raises MyDeepAgentError.human_required.
|
||||
5. Workflow template definition is corrupt → raises MyDeepAgentError.fatal.
|
||||
|
||||
All scenarios use sqlite tmp_path + mock build_agent (no real OpenRouter).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
from uuid import UUID, uuid4
|
||||
|
||||
import pytest
|
||||
from sqlalchemy import select
|
||||
|
||||
from my_deepagent.artifact_schema import ArtifactSchemaRegistry
|
||||
from my_deepagent.binding import BackendAvailability, PersonaConsentStore
|
||||
from my_deepagent.config import load_config
|
||||
from my_deepagent.engine import WorkflowEngine
|
||||
from my_deepagent.enums import (
|
||||
ApprovalDecisionAction,
|
||||
Backend,
|
||||
RunPhaseState,
|
||||
RunState,
|
||||
)
|
||||
from my_deepagent.errors import MyDeepAgentError
|
||||
from my_deepagent.persistence.db import Database
|
||||
from my_deepagent.persistence.models import (
|
||||
AgentPersonaRow,
|
||||
RunBindingRow,
|
||||
RunEventRow,
|
||||
RunPhaseRow,
|
||||
RunRow,
|
||||
WorkflowTemplateRow,
|
||||
)
|
||||
from my_deepagent.persona import load_personas_from_dir
|
||||
from my_deepagent.workflow import WorkflowTemplate
|
||||
|
||||
_DOCS = Path(__file__).resolve().parents[2] / "docs" / "schemas"
|
||||
_ARTIFACTS_ROOT = _DOCS / "artifacts"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _valid_spec(run_id: UUID) -> dict[str, Any]:
|
||||
return {
|
||||
"runId": str(run_id),
|
||||
"phaseKey": "spec",
|
||||
"requirements": "Implement feature X with full test coverage",
|
||||
"acceptance_criteria": ["All tests pass", "Coverage >= 90%"],
|
||||
"approach": "TDD: write tests first, then implement.",
|
||||
"risks": [],
|
||||
}
|
||||
|
||||
|
||||
def _two_phase_workflow() -> WorkflowTemplate:
|
||||
raw = {
|
||||
"name": "test-resume-wf",
|
||||
"version": 1,
|
||||
"description": "two-phase workflow for resume tests",
|
||||
"roles": [
|
||||
{
|
||||
"id": "spec_writer",
|
||||
"required_capabilities": ["spec_write", "phase_planning"],
|
||||
"preferred_backends": ["openrouter"],
|
||||
}
|
||||
],
|
||||
"phases": [
|
||||
{
|
||||
"key": "spec",
|
||||
"title": "Write spec phase 1",
|
||||
"risk": "low",
|
||||
"role": "spec_writer",
|
||||
"instructions": "Write the spec with enough words.",
|
||||
"timeout_seconds": 10,
|
||||
"expected_artifact": {"path": "artifacts/spec.json", "schema": "dev/spec@1"},
|
||||
},
|
||||
{
|
||||
"key": "spec2",
|
||||
"title": "Write spec phase 2",
|
||||
"risk": "low",
|
||||
"role": "spec_writer",
|
||||
"instructions": "Write the second spec with enough words.",
|
||||
"timeout_seconds": 10,
|
||||
"expected_artifact": {"path": "artifacts/spec2.json", "schema": "dev/spec@1"},
|
||||
},
|
||||
],
|
||||
}
|
||||
return WorkflowTemplate.model_validate(raw)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def personas() -> list[Any]:
|
||||
return load_personas_from_dir(_DOCS / "personas")
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def registry() -> ArtifactSchemaRegistry:
|
||||
return ArtifactSchemaRegistry(roots=[_ARTIFACTS_ROOT])
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def consent_store(tmp_path: Path) -> PersonaConsentStore:
|
||||
return PersonaConsentStore(tmp_path / "consents.json")
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def available_backends() -> BackendAvailability:
|
||||
return BackendAvailability(available_backends=frozenset(Backend))
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
async def db(tmp_path: Path) -> Database:
|
||||
url = f"sqlite+aiosqlite:///{tmp_path / 'test.sqlite3'}"
|
||||
database = Database(url)
|
||||
await database.init_schema()
|
||||
return database
|
||||
|
||||
|
||||
def _make_engine(
|
||||
database: Database,
|
||||
tmp_path: Path,
|
||||
personas: list[Any],
|
||||
registry: ArtifactSchemaRegistry,
|
||||
consent_store: PersonaConsentStore,
|
||||
available_backends: BackendAvailability,
|
||||
) -> WorkflowEngine:
|
||||
cfg = load_config(
|
||||
workspace_root=tmp_path,
|
||||
data_dir=tmp_path / "data",
|
||||
database_url=f"sqlite+aiosqlite:///{tmp_path / 'test.sqlite3'}",
|
||||
)
|
||||
auto_approve = AsyncMock(return_value=ApprovalDecisionAction.APPROVE)
|
||||
return WorkflowEngine(
|
||||
db=database,
|
||||
config=cfg,
|
||||
persona_pool=personas,
|
||||
artifact_registry=registry,
|
||||
consent_store=consent_store,
|
||||
available_backends=available_backends,
|
||||
approval_callback=auto_approve,
|
||||
)
|
||||
|
||||
|
||||
def _fake_agent_writes(
|
||||
expected_artifact_path: Path, run_id: UUID, on_call: list[str] | None = None
|
||||
) -> Any:
|
||||
"""A build_agent stub: the returned agent writes the expected artifact then returns."""
|
||||
|
||||
def _build(
|
||||
persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
|
||||
) -> Any:
|
||||
async def _ainvoke(messages: Any, **_kwargs: Any) -> Any:
|
||||
if on_call is not None:
|
||||
on_call.append(str(expected_artifact_path))
|
||||
expected_artifact_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
artifact = _valid_spec(run_id)
|
||||
content = json.dumps(artifact)
|
||||
expected_artifact_path.write_text(content, encoding="utf-8")
|
||||
# Drive watcher middleware so engine sees the artifact write.
|
||||
for mw in middleware:
|
||||
if hasattr(mw, "awrap_tool_call"):
|
||||
req = MagicMock()
|
||||
req.tool_call = {
|
||||
"name": "write_file",
|
||||
"args": {
|
||||
"file_path": str(expected_artifact_path),
|
||||
"content": content,
|
||||
},
|
||||
"id": "x",
|
||||
}
|
||||
await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
|
||||
return {"messages": []}
|
||||
|
||||
agent = MagicMock()
|
||||
agent.ainvoke = _ainvoke
|
||||
return agent
|
||||
|
||||
return _build
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Scenario 1: phase 1 completed, phase 2 pending → resume picks up phase 2
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_resume_completes_remaining_phase( # noqa: C901 — 2-phase scenario inherently spans 2 build_agent stubs + DB seam
|
||||
tmp_path: Path,
|
||||
personas: list[Any],
|
||||
registry: ArtifactSchemaRegistry,
|
||||
consent_store: PersonaConsentStore,
|
||||
available_backends: BackendAvailability,
|
||||
db: Database,
|
||||
) -> None:
|
||||
"""Run phase 1 to completion via engine.run, then truncate phase 2 by aborting
|
||||
the agent the first time around, then resume and verify phase 2 finishes."""
|
||||
template = _two_phase_workflow()
|
||||
engine = _make_engine(
|
||||
db, tmp_path, personas, registry, consent_store, available_backends
|
||||
)
|
||||
|
||||
# First run: phase 1 succeeds, phase 2 deliberately fails (agent never writes).
|
||||
phase2_calls: list[int] = []
|
||||
|
||||
def _build_first(
|
||||
persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
|
||||
) -> Any:
|
||||
async def _ainvoke(messages: Any, **_kwargs: Any) -> Any:
|
||||
# Decide which phase by the envelope content (phaseKey "spec" vs "spec2").
|
||||
envelope = messages["messages"][0]["content"] if "messages" in messages else ""
|
||||
is_phase2 = "spec2" in envelope
|
||||
if is_phase2:
|
||||
phase2_calls.append(1)
|
||||
# Do NOT write spec2.json → phase will time out + fail.
|
||||
return {"messages": []}
|
||||
expected = root_dir / "artifacts" / "spec.json"
|
||||
expected.parent.mkdir(parents=True, exist_ok=True)
|
||||
# Use a stable placeholder run_id; the schema only checks UUID format
|
||||
content = json.dumps(_valid_spec(uuid4()))
|
||||
expected.write_text(content, encoding="utf-8")
|
||||
for mw in middleware:
|
||||
if hasattr(mw, "awrap_tool_call"):
|
||||
req = MagicMock()
|
||||
req.tool_call = {
|
||||
"name": "write_file",
|
||||
"args": {"file_path": str(expected), "content": content},
|
||||
"id": "x",
|
||||
}
|
||||
await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
|
||||
return {"messages": []}
|
||||
|
||||
agent = MagicMock()
|
||||
agent.ainvoke = _ainvoke
|
||||
return agent
|
||||
|
||||
with patch("my_deepagent.engine.build_agent", side_effect=_build_first):
|
||||
first_result = await engine.run(
|
||||
template, repo_path=tmp_path, base_branch="main", requirements_md="test"
|
||||
)
|
||||
assert first_result.state == RunState.FAILED, "phase 2 should have failed first run"
|
||||
assert len(phase2_calls) >= 1
|
||||
|
||||
run_id = first_result.run_id
|
||||
|
||||
# The first run is now state=failed (terminal). Reopen it as non-terminal so
|
||||
# resume is willing to pick it up. (In a real crash, the run would have been
|
||||
# left mid-state by SIGKILL; sweep_orphan_runs marks it failed at next start.
|
||||
# Here we mimic the *pre-sweep* state by flipping it back to executing.)
|
||||
async with db.session() as s:
|
||||
run = await s.get(RunRow, str(run_id))
|
||||
assert run is not None
|
||||
run.state = RunState.EXECUTING.value
|
||||
# Reset phase 2 row so resume sees it as non-completed
|
||||
phases = (
|
||||
(await s.execute(select(RunPhaseRow).where(RunPhaseRow.run_id == str(run_id))))
|
||||
.scalars()
|
||||
.all()
|
||||
)
|
||||
for p in phases:
|
||||
if p.phase_key == "spec2":
|
||||
p.state = RunPhaseState.PENDING.value
|
||||
p.ended_at = None
|
||||
await s.commit()
|
||||
|
||||
# Second run: phase 2 succeeds this time.
|
||||
phase2_calls.clear()
|
||||
|
||||
def _build_second(
|
||||
persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
|
||||
) -> Any:
|
||||
async def _ainvoke(messages: Any, **_kwargs: Any) -> Any:
|
||||
envelope = messages["messages"][0]["content"] if "messages" in messages else ""
|
||||
is_phase2 = "spec2" in envelope
|
||||
phase2_calls.append(1 if is_phase2 else 0)
|
||||
expected_name = "spec2.json" if is_phase2 else "spec.json"
|
||||
expected = root_dir / "artifacts" / expected_name
|
||||
expected.parent.mkdir(parents=True, exist_ok=True)
|
||||
content = json.dumps(_valid_spec(uuid4()))
|
||||
expected.write_text(content, encoding="utf-8")
|
||||
for mw in middleware:
|
||||
if hasattr(mw, "awrap_tool_call"):
|
||||
req = MagicMock()
|
||||
req.tool_call = {
|
||||
"name": "write_file",
|
||||
"args": {"file_path": str(expected), "content": content},
|
||||
"id": "y",
|
||||
}
|
||||
await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
|
||||
return {"messages": []}
|
||||
|
||||
agent = MagicMock()
|
||||
agent.ainvoke = _ainvoke
|
||||
return agent
|
||||
|
||||
with patch("my_deepagent.engine.build_agent", side_effect=_build_second):
|
||||
resume_result = await engine.resume(run_id)
|
||||
|
||||
assert resume_result.state == RunState.COMPLETED, (
|
||||
f"resume should complete: state={resume_result.state}, error={resume_result.error}"
|
||||
)
|
||||
# Phase 1 should be skipped on resume (only phase 2 invoked).
|
||||
assert phase2_calls == [1], f"phase 2 should be the only re-invoked phase, got {phase2_calls}"
|
||||
|
||||
# RUN_RESUMED event must have been emitted.
|
||||
async with db.session() as s:
|
||||
events = (
|
||||
(
|
||||
await s.execute(
|
||||
select(RunEventRow.type).where(RunEventRow.run_id == str(run_id))
|
||||
)
|
||||
)
|
||||
.scalars()
|
||||
.all()
|
||||
)
|
||||
assert "run.resumed" in events
|
||||
assert "phase.skipped" in events # phase 1 skipped
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Scenario 2: terminal run rejected
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_resume_terminal_run_raises(
|
||||
tmp_path: Path,
|
||||
personas: list[Any],
|
||||
registry: ArtifactSchemaRegistry,
|
||||
consent_store: PersonaConsentStore,
|
||||
available_backends: BackendAvailability,
|
||||
db: Database,
|
||||
) -> None:
|
||||
"""A run in a terminal state (e.g. completed) cannot be resumed."""
|
||||
template = _two_phase_workflow()
|
||||
engine = _make_engine(
|
||||
db, tmp_path, personas, registry, consent_store, available_backends
|
||||
)
|
||||
|
||||
def _build(
|
||||
persona: Any, config: Any, *, root_dir: Path, middleware: list[Any], **_kw: Any
|
||||
) -> Any:
|
||||
async def _ainvoke(messages: Any, **_kwargs: Any) -> Any:
|
||||
envelope = messages["messages"][0]["content"] if "messages" in messages else ""
|
||||
phase_key = "spec2" if "spec2" in envelope else "spec"
|
||||
expected = root_dir / "artifacts" / f"{phase_key}.json"
|
||||
expected.parent.mkdir(parents=True, exist_ok=True)
|
||||
content = json.dumps(_valid_spec(uuid4()))
|
||||
expected.write_text(content, encoding="utf-8")
|
||||
for mw in middleware:
|
||||
if hasattr(mw, "awrap_tool_call"):
|
||||
req = MagicMock()
|
||||
req.tool_call = {
|
||||
"name": "write_file",
|
||||
"args": {"file_path": str(expected), "content": content},
|
||||
"id": "x",
|
||||
}
|
||||
await mw.awrap_tool_call(req, AsyncMock(return_value=MagicMock()))
|
||||
return {"messages": []}
|
||||
|
||||
agent = MagicMock()
|
||||
agent.ainvoke = _ainvoke
|
||||
return agent
|
||||
|
||||
with patch("my_deepagent.engine.build_agent", side_effect=_build):
|
||||
result = await engine.run(template, repo_path=tmp_path, base_branch="main")
|
||||
assert result.state == RunState.COMPLETED
|
||||
|
||||
with pytest.raises(MyDeepAgentError) as exc:
|
||||
await engine.resume(result.run_id)
|
||||
assert exc.value.code == "run_already_terminal"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Scenario 3: unknown run id
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_resume_unknown_run_raises(
|
||||
tmp_path: Path,
|
||||
personas: list[Any],
|
||||
registry: ArtifactSchemaRegistry,
|
||||
consent_store: PersonaConsentStore,
|
||||
available_backends: BackendAvailability,
|
||||
db: Database,
|
||||
) -> None:
|
||||
"""resume(unknown_uuid) → MyDeepAgentError(code=run_not_found)."""
|
||||
engine = _make_engine(
|
||||
db, tmp_path, personas, registry, consent_store, available_backends
|
||||
)
|
||||
with pytest.raises(MyDeepAgentError) as exc:
|
||||
await engine.resume(uuid4())
|
||||
assert exc.value.code == "run_not_found"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Scenario 4: RunBindingRow rows missing
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_resume_missing_bindings_raises(
|
||||
tmp_path: Path,
|
||||
personas: list[Any],
|
||||
registry: ArtifactSchemaRegistry,
|
||||
consent_store: PersonaConsentStore,
|
||||
available_backends: BackendAvailability,
|
||||
db: Database,
|
||||
) -> None:
|
||||
"""A run whose RunBindingRow rows were never persisted cannot be resumed."""
|
||||
template = _two_phase_workflow()
|
||||
engine = _make_engine(
|
||||
db, tmp_path, personas, registry, consent_store, available_backends
|
||||
)
|
||||
|
||||
# Seed a RunRow + WorkflowTemplateRow but NO RunBindingRow.
|
||||
run_id = uuid4()
|
||||
tpl_id = uuid4()
|
||||
async with db.session() as s:
|
||||
s.add(
|
||||
WorkflowTemplateRow(
|
||||
id=str(tpl_id),
|
||||
name=template.name,
|
||||
version=template.version,
|
||||
hash="sha:fake",
|
||||
definition=template.model_dump(by_alias=True),
|
||||
created_at="2026-05-16T00:00:00+00:00",
|
||||
)
|
||||
)
|
||||
await s.flush()
|
||||
s.add(
|
||||
RunRow(
|
||||
id=str(run_id),
|
||||
template_id=str(tpl_id),
|
||||
template_hash="sha:fake",
|
||||
state=RunState.EXECUTING.value,
|
||||
repo_path=str(tmp_path / "repo"),
|
||||
base_branch="main",
|
||||
worktree_root=str(tmp_path / "wt"),
|
||||
created_at="2026-05-16T00:00:00+00:00",
|
||||
updated_at="2026-05-16T00:00:00+00:00",
|
||||
)
|
||||
)
|
||||
await s.commit()
|
||||
|
||||
with pytest.raises(MyDeepAgentError) as exc:
|
||||
await engine.resume(run_id)
|
||||
assert exc.value.code == "run_metadata_missing"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Scenario 5: workflow_templates.definition is corrupt
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_resume_corrupt_template_definition_raises(
|
||||
tmp_path: Path,
|
||||
personas: list[Any],
|
||||
registry: ArtifactSchemaRegistry,
|
||||
consent_store: PersonaConsentStore,
|
||||
available_backends: BackendAvailability,
|
||||
db: Database,
|
||||
) -> None:
|
||||
"""A run whose workflow_templates.definition is malformed → fatal."""
|
||||
engine = _make_engine(
|
||||
db, tmp_path, personas, registry, consent_store, available_backends
|
||||
)
|
||||
|
||||
run_id = uuid4()
|
||||
tpl_id = uuid4()
|
||||
persona_id = uuid4()
|
||||
async with db.session() as s:
|
||||
# Corrupt definition (missing required `roles` / `phases`)
|
||||
s.add(
|
||||
WorkflowTemplateRow(
|
||||
id=str(tpl_id),
|
||||
name="corrupt",
|
||||
version=1,
|
||||
hash="sha:corrupt",
|
||||
definition={"name": "corrupt"},
|
||||
created_at="2026-05-16T00:00:00+00:00",
|
||||
)
|
||||
)
|
||||
await s.flush()
|
||||
s.add(
|
||||
RunRow(
|
||||
id=str(run_id),
|
||||
template_id=str(tpl_id),
|
||||
template_hash="sha:corrupt",
|
||||
state=RunState.EXECUTING.value,
|
||||
repo_path=str(tmp_path / "repo"),
|
||||
base_branch="main",
|
||||
worktree_root=str(tmp_path / "wt"),
|
||||
created_at="2026-05-16T00:00:00+00:00",
|
||||
updated_at="2026-05-16T00:00:00+00:00",
|
||||
)
|
||||
)
|
||||
s.add(
|
||||
AgentPersonaRow(
|
||||
id=str(persona_id),
|
||||
name="openrouter-deepseek-spec-writer",
|
||||
version=1,
|
||||
hash="sha:p",
|
||||
definition=personas[0].model_dump(by_alias=True),
|
||||
created_at="2026-05-16T00:00:00+00:00",
|
||||
)
|
||||
)
|
||||
# Flush RunRow + AgentPersonaRow before adding RunBindingRow so the FKs
|
||||
# are satisfied at INSERT time (SQLite with PRAGMA foreign_keys=ON).
|
||||
await s.flush()
|
||||
s.add(
|
||||
RunBindingRow(
|
||||
id=str(uuid4()),
|
||||
run_id=str(run_id),
|
||||
role_id="spec_writer",
|
||||
persona_id=str(persona_id),
|
||||
persona_hash="sha:p",
|
||||
backend="openrouter",
|
||||
binding_hash="sha:b",
|
||||
)
|
||||
)
|
||||
await s.commit()
|
||||
|
||||
with pytest.raises(MyDeepAgentError) as exc:
|
||||
await engine.resume(run_id)
|
||||
assert exc.value.code == "template_load_failed"
|
||||
Reference in New Issue
Block a user