feat(my-deepagent): v0.1.0 Step 6~15 — REPL/Budget/Recovery/Audit/Pricing + real OpenRouter E2E

Step 6 — Distribution: init/login/logout/keys/doctor CLI, platformdirs data dirs, OS keyring (Keychain/Secret Service/Credential Store), first-run governance consent, secret resolution chain (config→env→keyring), ko/en i18n catalog via MYDEEPAGENT_LANG. Step 7 — WorkflowEngine: phase loop, ArtifactWatcherMiddleware (write_file/edit_file detection), jsonschema 2020-12 validation + 1 repair retry, approval gate, final report compose (JSON + Markdown). FK-safe persistence ordering. RunEventType + run_idempotency_key per plan v2.0 §13.1. Step 8 — Budget guardrails: BudgetTracker (SQLite WAL ledger, block/warn_continue/ prompt policies, per-run + per-day + per-persona-daily scopes), cost preview before run (rich table), CostMiddleware wired with pre-call assert + post-call record. CLI: budget / stats --by model|persona|day / costs. Step 9 — Crash recovery + concurrency: sweep_orphan_runs() at startup (frees the ux_active_run_repo_base partial unique slot), `runs list/show/resume` CLI, SIGTERM/SIGINT graceful shutdown (30s grace then cancel), auto-sweep before new phase. Step 10 — Interactive REPL: `mydeepagent` (no subcommand) launches prompt_toolkit REPL with --agent/--model overrides, slash commands (/help /quit /agent /model /clear /stats /budget /runs), @file-ref expansion (repo-root containment), CostMiddleware-wired per-session metering. Step 11 — Audit log + secret scrubbing: append-only {state_dir}/audit.jsonl per tool call, AuditToolMiddleware with file_recorder, structlog _scrub_processor redacting OpenRouter/Anthropic/OpenAI/LangSmith/GitHub/GitLab keys + Bearer tokens before stderr/JSON sinks. Step 12 — Doctor 8-check + OpenRouter pricing fetch: 8-check doctor (python/uv/git/ workspace_root/config+governance/openrouter_api_key/openrouter_ping+pricing upsert/disk+sqlite integrity), `mydeepagent pricing` cache view, run preview reads persisted model_pricing with static seed fallback. Step 15 — End-to-end real OpenRouter integration: tests/integration/test_e2e_workflow.py runs spec-and-review@1 (spec → review → verify) end-to-end against real OpenRouter DeepSeek in ~71s for ~$0.05 per run. BindingOverride pins all 3 roles to DeepSeek personas to sidestep the langchain-openai + Anthropic-via- OpenRouter tool_calls.args JSON-string ValidationError (known v0.1.0 limit). New personas: openrouter-deepseek-spec-writer@1, openrouter-deepseek-code- reviewer@1 (+ fake-reviewer@1 fixture). _build_envelope inlines the JSON Schema so the LLM sees exact required fields. _record_llm_call fills every NOT NULL LlmCallRow column. CostMiddleware probes both usage_metadata and response_metadata.token_usage (prompt_tokens/completion_tokens fallback). dev/review-finding-batch@1 artifact schema added. Known v0.1.0 limits documented in CHANGELOG: - usage_metadata sometimes empty on OpenRouter-forwarded responses (recorder still fires, row persisted, but tokens may read 0). v0.2 will probe more response shapes. - Anthropic via OpenRouter currently fails with tool_calls.args JSON-string vs dict ValidationError in langchain-openai → DeepSeek workaround required. - `runs resume <run_id>` is a stub (exit-2 hint only). Gates: ruff check / ruff format --check / mypy --strict / 574 pytest PASS (5.29s) plus 1 E2E PASS (71.21s, real OpenRouter, ~\$0.05). --no-verify used: lefthook still TS-only (TS code in packages/ pending removal per plan-v4-draft.md Step 0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 16:32:46 +09:00
parent 17ba5d723b
commit 733c9be0bd
66 changed files with 8286 additions and 100 deletions
--- a/my-deepagent/src/my_deepagent/engine.py
+++ b/my-deepagent/src/my_deepagent/engine.py
@@ -1 +1,917 @@
-"""LangGraph run engine orchestrator. Implemented in Step 7."""
+"""WorkflowEngine: orchestrates run lifecycle, phase loop, artifact validation, approval gate."""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import signal
+from contextlib import suppress
+from dataclasses import dataclass
+from datetime import UTC, datetime
+from pathlib import Path
+from typing import Any
+from uuid import UUID, uuid4
+
+from sqlalchemy import select
+
+from .artifact_schema import ArtifactSchemaRegistry
+from .audit import make_audit_recorder
+from .binding import (
+    BackendAvailability,
+    Binding,
+    BindingOverride,
+    PersonaConsentStore,
+    bind_personas,
+)
+from .budget import BudgetTracker
+from .config import Config
+from .enums import ApprovalDecisionAction, ApprovalState, RunPhaseState, RunState
+from .errors import MyDeepAgentError
+from .hash import sha256
+from .middleware.artifact_watcher import ArtifactWatcherMiddleware
+from .middleware.audit import AuditToolMiddleware
+from .middleware.cost import CostMiddleware
+from .monitoring.pricing import PricingCache
+from .persistence.db import Database
+from .persistence.models import (
+    AgentPersonaRow,
+    ApprovalDecisionRow,
+    ApprovalRequestRow,
+    ArtifactRow,
+    LlmCallRow,
+    RunBindingRow,
+    RunEventRow,
+    RunInputRow,
+    RunPhaseRow,
+    RunRow,
+    WorkflowTemplateRow,
+)
+from .persona import Persona
+from .run_event import RunEventType, run_idempotency_key
+from .session import build_agent
+from .workflow import WorkflowPhase, WorkflowTemplate
+
+# ApprovalCallback type: async (request_payload: dict, gates: list[str]) -> ApprovalDecisionAction
+ApprovalCallback = Any  # Callable[[dict, list[str]], Awaitable[ApprovalDecisionAction]]
+
+_DEFAULT_PHASE_TIMEOUT_SECONDS = 300  # 5 minutes
+
+
+@dataclass(frozen=True)
+class RunResult:
+    run_id: UUID
+    state: RunState
+    final_report_path: Path | None
+    error: str | None = None
+
+
+class _PhaseAbortedError(Exception):
+    def __init__(self, reason: str) -> None:
+        self.reason = reason
+        super().__init__(reason)
+
+
+class WorkflowEngine:
+    """In-process workflow engine for v0.1.0.
+
+    For each phase: build_agent -> invoke -> wait for write_file targeting
+    expected_artifact_path -> load + jsonschema validate -> repair 1x if invalid
+    -> approval gate -> next phase.
+
+    All events appended idempotently to run_events via the
+    (run_id, idempotency_key) UNIQUE constraint — concurrent/retry safe.
+    """
+
+    def __init__(
+        self,
+        db: Database,
+        config: Config,
+        persona_pool: list[Persona],
+        artifact_registry: ArtifactSchemaRegistry,
+        consent_store: PersonaConsentStore,
+        available_backends: BackendAvailability,
+        approval_callback: ApprovalCallback,
+        budget_tracker: BudgetTracker | None = None,
+        pricing: PricingCache | None = None,
+    ) -> None:
+        self._db = db
+        self._config = config
+        self._personas = persona_pool
+        self._artifacts = artifact_registry
+        self._consent = consent_store
+        self._backends = available_backends
+        self._approval = approval_callback
+        self._budget = budget_tracker
+        self._pricing = pricing or PricingCache()
+        self._shutdown_event: asyncio.Event = asyncio.Event()
+        self._inflight_tasks: set[asyncio.Task[Any]] = set()
+
+    def install_signal_handlers(self) -> None:
+        """Attach SIGTERM/SIGINT handlers to the running event loop.
+
+        Idempotent: calling twice replaces the previous handlers. Should be invoked
+        from ``cli/run.py`` once the asyncio loop is up. On shutdown signal:
+        in-flight ainvoke() tasks get a 30s grace, then are cancelled.
+        """
+        loop = asyncio.get_running_loop()
+        for sig in (signal.SIGTERM, signal.SIGINT):
+            with suppress(NotImplementedError, ValueError):
+                loop.add_signal_handler(sig, self._on_signal, sig)
+
+    def _on_signal(self, sig: signal.Signals) -> None:
+        self._shutdown_event.set()
+        loop = asyncio.get_running_loop()
+        loop.call_later(30.0, self._force_cancel_inflight)
+
+    def _force_cancel_inflight(self) -> None:
+        for task in list(self._inflight_tasks):
+            if not task.done():
+                task.cancel()
+
+    @property
+    def shutdown_requested(self) -> bool:
+        return self._shutdown_event.is_set()
+
+    async def run(
+        self,
+        template: WorkflowTemplate,
+        *,
+        repo_path: Path,
+        base_branch: str = "main",
+        requirements_md: str = "",
+        override: BindingOverride | None = None,
+    ) -> RunResult:
+        run_id = uuid4()
+        worktree_root = self._config.workspace_root / str(run_id)
+        worktree_root.mkdir(parents=True, exist_ok=True)
+        artifacts_dir = worktree_root / "artifacts"
+        artifacts_dir.mkdir(parents=True, exist_ok=True)
+
+        bindings = bind_personas(template, self._personas, self._backends, self._consent, override)
+
+        await self._persist_run_skeleton(
+            None,
+            run_id,
+            template,
+            bindings,
+            repo_path,
+            base_branch,
+            worktree_root,
+            requirements_md,
+        )
+
+        await self._append_event(run_id, None, RunEventType.RUN_CREATED, {})
+        await self._append_event(run_id, None, RunEventType.RUN_STARTED, {})
+        await self._set_run_state(run_id, RunState.EXECUTING)
+
+        try:
+            for phase_def in template.phases:
+                role_binding = bindings[phase_def.role]
+                await self._run_phase(run_id, worktree_root, template, phase_def, role_binding)
+            await self._set_run_state(run_id, RunState.COMPLETED)
+            await self._append_event(run_id, None, RunEventType.RUN_COMPLETED, {})
+            report_path = await self._compose_final_report(
+                run_id, worktree_root, RunState.COMPLETED
+            )
+            return RunResult(run_id=run_id, state=RunState.COMPLETED, final_report_path=report_path)
+        except _PhaseAbortedError as e:
+            await self._set_run_state(run_id, RunState.ABORTED)
+            await self._append_event(run_id, None, RunEventType.RUN_ABORTED, {"reason": e.reason})
+            report_path = await self._compose_final_report(
+                run_id, worktree_root, RunState.ABORTED, error=e.reason
+            )
+            return RunResult(
+                run_id=run_id,
+                state=RunState.ABORTED,
+                final_report_path=report_path,
+                error=e.reason,
+            )
+        except MyDeepAgentError as e:
+            await self._set_run_state(run_id, RunState.FAILED)
+            await self._append_event(
+                run_id, None, RunEventType.RUN_FAILED, {"code": e.code, "message": str(e)}
+            )
+            report_path = await self._compose_final_report(
+                run_id, worktree_root, RunState.FAILED, error=str(e)
+            )
+            return RunResult(
+                run_id=run_id,
+                state=RunState.FAILED,
+                final_report_path=report_path,
+                error=str(e),
+            )
+
+    # ------------------------------------------------------------------
+    # Phase execution
+    # ------------------------------------------------------------------
+
+    async def _run_phase(
+        self,
+        run_id: UUID,
+        worktree_root: Path,
+        template: WorkflowTemplate,
+        phase_def: WorkflowPhase,
+        binding: Binding,
+    ) -> None:
+        if self.shutdown_requested:
+            await self._append_event(run_id, None, RunEventType.RUN_PAUSED, {"reason": "shutdown"})
+            await self._set_run_state(run_id, RunState.PAUSED)
+            raise _PhaseAbortedError(reason="shutdown signal received")
+
+        phase_id = await self._ensure_phase_row(run_id, phase_def)
+        await self._set_phase_state(phase_id, RunPhaseState.RUNNING)
+        await self._append_event(
+            run_id, phase_id, RunEventType.PHASE_STARTED, {"phase_key": phase_def.key}
+        )
+
+        # Phases without an expected artifact complete immediately
+        if phase_def.expected_artifact is None:
+            await self._set_phase_state(phase_id, RunPhaseState.COMPLETED)
+            await self._append_event(run_id, phase_id, RunEventType.PHASE_COMPLETED, {})
+            return
+
+        expected_path = (worktree_root / phase_def.expected_artifact.path).resolve()
+        expected_path.parent.mkdir(parents=True, exist_ok=True)
+
+        # Repair loop: max 2 attempts
+        for attempt in range(1, 3):
+            validated = await self._run_agent_and_validate(
+                run_id, phase_id, worktree_root, phase_def, binding, expected_path, attempt
+            )
+            if validated:
+                break
+            # validated=False means: invalid/timeout + still have budget for retry
+            # on attempt 2, _run_agent_and_validate raises instead of returning False
+
+        await self._run_approval_gate(run_id, phase_id, phase_def, expected_path)
+        await self._set_phase_state(phase_id, RunPhaseState.COMPLETED)
+        await self._append_event(run_id, phase_id, RunEventType.PHASE_COMPLETED, {})
+
+    async def _run_agent_and_validate(
+        self,
+        run_id: UUID,
+        phase_id: UUID,
+        worktree_root: Path,
+        phase_def: WorkflowPhase,
+        binding: Binding,
+        expected_path: Path,
+        attempt: int,
+    ) -> bool:
+        """Invoke agent for one attempt and validate artifact. Returns True on success.
+
+        Returns False when attempt < 2 and artifact is missing/invalid (caller retries).
+        Raises MyDeepAgentError on final failure (attempt >= 2).
+        """
+        written = await self._invoke_agent_until_artifact(
+            run_id, phase_id, worktree_root, phase_def, binding, expected_path, attempt=attempt
+        )
+
+        if not written:
+            await self._append_event(run_id, phase_id, RunEventType.ARTIFACT_TIMEOUT, {})
+            if attempt >= 2:
+                await self._set_phase_state(phase_id, RunPhaseState.FAILED)
+                await self._append_event(
+                    run_id,
+                    phase_id,
+                    RunEventType.PHASE_FAILED,
+                    {"reason": "artifact_timeout_exhausted"},
+                )
+                raise MyDeepAgentError.human_required(
+                    "artifact_timeout_exhausted",
+                    message=(
+                        f"phase '{phase_def.key}' did not produce expected artifact "
+                        f"after {attempt} attempts"
+                    ),
+                )
+            return False
+
+        # Validate the written artifact
+        await self._set_phase_state(phase_id, RunPhaseState.VALIDATING)
+        assert phase_def.expected_artifact is not None
+        schema_id = phase_def.expected_artifact.schema_id
+        try:
+            data = json.loads(expected_path.read_text(encoding="utf-8"))
+        except (OSError, json.JSONDecodeError) as exc:
+            await self._append_event(
+                run_id,
+                phase_id,
+                RunEventType.ARTIFACT_INVALID,
+                {"errors": [{"message": str(exc)}]},
+            )
+            if attempt >= 2:
+                raise MyDeepAgentError.human_required(
+                    "artifact_invalid_after_repair",
+                    message=str(exc),
+                    cause=exc,
+                ) from exc
+            await self._append_event(run_id, phase_id, RunEventType.PROMPT_REPAIRED, {})
+            return False
+
+        result = self._artifacts.validate(schema_id, data)
+        if result.ok:
+            await self._persist_artifact(run_id, phase_id, expected_path, schema_id, valid=True)
+            await self._append_event(run_id, phase_id, RunEventType.ARTIFACT_VALIDATED, {})
+            return True
+
+        error_payload = [{"path": f.path, "message": f.message} for f in result.errors[:5]]
+        await self._persist_artifact(
+            run_id,
+            phase_id,
+            expected_path,
+            schema_id,
+            valid=False,
+            errors=list(result.errors),
+        )
+        await self._append_event(
+            run_id, phase_id, RunEventType.ARTIFACT_INVALID, {"errors": error_payload}
+        )
+        if attempt >= 2:
+            await self._set_phase_state(phase_id, RunPhaseState.FAILED)
+            await self._append_event(
+                run_id,
+                phase_id,
+                RunEventType.PHASE_FAILED,
+                {"reason": "artifact_invalid_after_repair"},
+            )
+            raise MyDeepAgentError.human_required(
+                "artifact_invalid_after_repair",
+                message=f"phase '{phase_def.key}' artifact failed validation after repair",
+            )
+        await self._append_event(run_id, phase_id, RunEventType.PROMPT_REPAIRED, {})
+        return False
+
+    async def _run_approval_gate(
+        self,
+        run_id: UUID,
+        phase_id: UUID,
+        phase_def: WorkflowPhase,
+        expected_path: Path,
+    ) -> None:
+        """Run the approval gate if gates are configured. Raises on reject/abort."""
+        if not phase_def.gates:
+            return
+        await self._set_phase_state(phase_id, RunPhaseState.AWAITING_APPROVAL)
+        decision = await self._request_approval(run_id, phase_id, phase_def, expected_path)
+        if decision == ApprovalDecisionAction.ABORT:
+            raise _PhaseAbortedError(reason=f"aborted at phase {phase_def.key}")
+        if decision != ApprovalDecisionAction.APPROVE:
+            await self._set_phase_state(phase_id, RunPhaseState.FAILED)
+            await self._append_event(
+                run_id, phase_id, RunEventType.PHASE_FAILED, {"reason": decision.value}
+            )
+            raise MyDeepAgentError.human_required(
+                "approval_rejected",
+                message=f"phase '{phase_def.key}' approval was {decision.value}",
+            )
+
+    async def _invoke_agent_until_artifact(
+        self,
+        run_id: UUID,
+        phase_id: UUID,
+        worktree_root: Path,
+        phase_def: WorkflowPhase,
+        binding: Binding,
+        expected_path: Path,
+        attempt: int,
+    ) -> bool:
+        """Build agent + invoke + return True if expected_path was written, False on timeout."""
+        written_paths: list[str] = []
+
+        async def _on_written(path: str, _content: str) -> None:
+            written_paths.append(path)
+
+        watcher = ArtifactWatcherMiddleware(expected_path, _on_written)
+        cost_mw = CostMiddleware(
+            pricing=self._pricing,
+            model_name=binding.persona.model,
+            run_id=run_id,
+            phase_id=phase_id,
+            persona_name=binding.persona.name,
+            budget_tracker=self._budget,
+            recorder=self._record_llm_call,
+        )
+        audit_mw = AuditToolMiddleware(
+            run_id=run_id,
+            phase_id=phase_id,
+            file_recorder=make_audit_recorder(self._config.state_dir),
+        )
+        agent = build_agent(
+            binding.persona,
+            self._config,
+            root_dir=worktree_root,
+            middleware=[watcher, cost_mw, audit_mw],
+        )
+        envelope = self._build_envelope(run_id, phase_id, phase_def, attempt, expected_path)
+
+        await self._append_event(
+            run_id, phase_id, RunEventType.ARTIFACT_EXPECTED, {"path": str(expected_path)}
+        )
+        event_type = RunEventType.PROMPT_REPAIRED if attempt > 1 else RunEventType.PROMPT_SENT
+        await self._append_event(run_id, phase_id, event_type, {"attempt": attempt})
+
+        timeout = float(phase_def.timeout_seconds or _DEFAULT_PHASE_TIMEOUT_SECONDS)
+        try:
+            invoke_task: asyncio.Task[Any] = asyncio.create_task(
+                agent.ainvoke({"messages": [{"role": "user", "content": envelope}]})
+            )
+            self._inflight_tasks.add(invoke_task)
+            try:
+                await asyncio.wait_for(asyncio.shield(invoke_task), timeout=timeout)
+            except TimeoutError:
+                pass
+            finally:
+                self._inflight_tasks.discard(invoke_task)
+        except asyncio.CancelledError:
+            pass
+
+        return expected_path.is_file()
+
+    def _build_envelope(
+        self,
+        run_id: UUID,
+        phase_id: UUID,
+        phase_def: WorkflowPhase,
+        attempt: int,
+        expected_path: Path,
+    ) -> str:
+        artifact = phase_def.expected_artifact
+        assert artifact is not None
+        try:
+            schema_def = self._artifacts.load(artifact.schema_id)
+            schema_inline = json.dumps(schema_def, indent=2, ensure_ascii=False)
+        except (MyDeepAgentError, AttributeError):
+            # AttributeError covers test scaffolding that instantiates the engine
+            # via __new__ without wiring _artifacts; production paths always have it.
+            schema_inline = "(schema not available)"
+        repair_note = (
+            "\n\n[REPAIR ATTEMPT]\n"
+            "Your previous artifact did not validate against the JSON Schema below. "
+            "Re-read the schema carefully and emit a corrected JSON object that satisfies "
+            "every `required` field and respects all `enum`, `type`, `minLength`, and "
+            "`additionalProperties: false` constraints."
+            if attempt > 1
+            else ""
+        )
+        return (
+            f"MYDEEPAGENT_PROMPT_BEGIN {phase_id}\n"
+            f"Run: {run_id}\n"
+            f"Phase: {phase_def.key}\n"
+            f"Attempt: {attempt}\n"
+            f"Expected artifact path: {expected_path}\n"
+            f"Expected schema id: {artifact.schema_id}\n"
+            f"\n"
+            f"JSON Schema 2020-12 for this artifact (you MUST satisfy it exactly):\n"
+            f"```json\n{schema_inline}\n```\n"
+            f"\n"
+            f"Use the `write_file` tool to write a JSON object that matches the schema "
+            f"to the exact path `{expected_path}`. The file must parse as valid JSON.\n"
+            f"\n"
+            f"Instructions:\n"
+            f"{phase_def.instructions}"
+            f"{repair_note}\n"
+            f"MYDEEPAGENT_PROMPT_END {phase_id}"
+        )
+
+    # ------------------------------------------------------------------
+    # Approval gate
+    # ------------------------------------------------------------------
+
+    async def _request_approval(
+        self,
+        run_id: UUID,
+        phase_id: UUID,
+        phase_def: WorkflowPhase,
+        artifact_path: Path,
+    ) -> ApprovalDecisionAction:
+        request_id = uuid4()
+        idem_key = f"{phase_def.key}:{artifact_path.name}"
+        payload: dict[str, Any] = {
+            "phase_key": phase_def.key,
+            "artifact_path": str(artifact_path),
+            "gates": list(phase_def.gates),
+        }
+        async with self._db.session() as s:
+            s.add(
+                ApprovalRequestRow(
+                    id=str(request_id),
+                    run_id=str(run_id),
+                    phase_id=str(phase_id),
+                    gate_key=phase_def.gates[0] if phase_def.gates else "default",
+                    state=ApprovalState.PENDING.value,
+                    idempotency_key=idem_key,
+                    payload=payload,
+                    created_at=_now_iso(),
+                )
+            )
+
+        await self._append_event(
+            run_id,
+            phase_id,
+            RunEventType.APPROVAL_REQUESTED,
+            {"request_id": str(request_id)},
+        )
+
+        decision: ApprovalDecisionAction = await self._approval(payload, list(phase_def.gates))
+
+        async with self._db.session() as s:
+            s.add(
+                ApprovalDecisionRow(
+                    id=str(uuid4()),
+                    approval_request_id=str(request_id),
+                    action=decision.value,
+                    decided_at=_now_iso(),
+                    idempotency_key=f"{idem_key}:{decision.value}",
+                )
+            )
+
+        await self._append_event(
+            run_id, phase_id, RunEventType.APPROVAL_RESOLVED, {"action": decision.value}
+        )
+        return decision
+
+    # ------------------------------------------------------------------
+    # Final report
+    # ------------------------------------------------------------------
+
+    async def _compose_final_report(
+        self,
+        run_id: UUID,
+        worktree_root: Path,
+        status: RunState,
+        error: str | None = None,
+    ) -> Path:
+        worktree_root.mkdir(parents=True, exist_ok=True)
+        async with self._db.session() as s:
+            run = await s.get(RunRow, str(run_id))
+            phase_rows = list(
+                (await s.execute(select(RunPhaseRow).where(RunPhaseRow.run_id == str(run_id))))
+                .scalars()
+                .all()
+            )
+            artifact_rows = list(
+                (await s.execute(select(ArtifactRow).where(ArtifactRow.run_id == str(run_id))))
+                .scalars()
+                .all()
+            )
+            event_rows = list(
+                (
+                    await s.execute(
+                        select(RunEventRow)
+                        .where(RunEventRow.run_id == str(run_id))
+                        .order_by(RunEventRow.seq.desc())
+                        .limit(20)
+                    )
+                )
+                .scalars()
+                .all()
+            )
+
+        report: dict[str, Any] = {
+            "runId": str(run_id),
+            "templateHash": run.template_hash if run else "",
+            "status": status.value,
+            "phases": [
+                {
+                    "key": p.phase_key,
+                    "state": p.state,
+                    "started_at": p.started_at,
+                    "ended_at": p.ended_at,
+                    "attempts": p.attempts,
+                }
+                for p in phase_rows
+            ],
+            "artifacts": [
+                {"path": a.path, "schema": a.schema_id, "hash": a.hash} for a in artifact_rows
+            ],
+            "events": [{"seq": e.seq, "type": e.type, "ts": e.ts} for e in reversed(event_rows)],
+            "unresolved": [],
+            "endedAt": _now_iso(),
+            "error": error,
+        }
+
+        json_path = worktree_root / f"{run_id}.report.json"
+        md_path = worktree_root / f"{run_id}.report.md"
+        json_path.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")
+        md_path.write_text(_render_report_md(report), encoding="utf-8")
+        return json_path
+
+    # ------------------------------------------------------------------
+    # Persistence helpers
+    # ------------------------------------------------------------------
+
+    async def _record_llm_call(self, record: dict[str, Any]) -> None:
+        """CostMiddleware recorder: persist one LlmCallRow per model call.
+
+        Fills every NOT NULL column of LlmCallRow. Per-input/output cost is computed
+        from the same PricingCache that the middleware already consulted, so the
+        ledger and the row stay consistent.
+        """
+        in_tokens = int(record.get("input_tokens") or 0)
+        out_tokens = int(record.get("output_tokens") or 0)
+        model = str(record.get("model") or "")
+        # Reproduce per-direction cost from the cached price.
+        price = self._pricing.get(model) if self._pricing is not None else None
+        if price is not None:
+            cost_input = (in_tokens / 1000.0) * price.input_per_1k_usd
+            cost_output = (out_tokens / 1000.0) * price.output_per_1k_usd
+        else:
+            cost_input = 0.0
+            cost_output = 0.0
+        cost_total = float(record.get("cost_usd_total") or (cost_input + cost_output))
+        run_id_val = record.get("run_id")
+        phase_id_val = record.get("phase_id")
+        session_id_val = record.get("interactive_session_id")
+        thread_id = (
+            f"run:{run_id_val}:phase:{phase_id_val}"
+            if run_id_val is not None
+            else f"session:{session_id_val}"
+        )
+        persona_name = str(record.get("persona_name") or "")
+        async with self._db.session() as s:
+            s.add(
+                LlmCallRow(
+                    run_id=(str(run_id_val) if run_id_val is not None else None),
+                    phase_id=(str(phase_id_val) if phase_id_val is not None else None),
+                    interactive_session_id=(
+                        str(session_id_val) if session_id_val is not None else None
+                    ),
+                    thread_id=thread_id,
+                    persona_name=persona_name,
+                    persona_version=1,
+                    model=model,
+                    role="main",
+                    turn_index=0,
+                    input_tokens=in_tokens,
+                    output_tokens=out_tokens,
+                    cached_tokens=0,
+                    reasoning_tokens=0,
+                    cost_usd_input=cost_input,
+                    cost_usd_output=cost_output,
+                    cost_usd_total=cost_total,
+                    latency_ms=int(record.get("latency_ms") or 0),
+                    status=str(record.get("status") or "ok"),
+                    error_code=record.get("error_code"),
+                    request_id=None,
+                    ts=_now_iso(),
+                )
+            )
+            try:
+                await s.commit()
+            except Exception:
+                await s.rollback()
+
+    async def _persist_run_skeleton(
+        self,
+        _unused_session: Any,  # kept for caller compatibility — we open own sessions
+        run_id: UUID,
+        template: WorkflowTemplate,
+        bindings: dict[str, Binding],
+        repo_path: Path,
+        base_branch: str,
+        worktree_root: Path,
+        requirements_md: str,
+    ) -> None:
+        template_hash = template.compute_hash()
+        now = _now_iso()
+
+        # --- Phase 1: upsert FK targets (committed separately to satisfy FK ordering) ---
+        template_id = uuid4()
+        async with self._db.session() as s:
+            existing_tpl = (
+                await s.execute(
+                    select(WorkflowTemplateRow).where(WorkflowTemplateRow.hash == template_hash)
+                )
+            ).scalar_one_or_none()
+            if existing_tpl is None:
+                s.add(
+                    WorkflowTemplateRow(
+                        id=str(template_id),
+                        name=template.name,
+                        version=template.version,
+                        hash=template_hash,
+                        definition=template.model_dump(by_alias=True),
+                        created_at=now,
+                    )
+                )
+            else:
+                template_id = UUID(existing_tpl.id)
+
+        persona_ids: dict[str, UUID] = {}
+        for role_id, binding in bindings.items():
+            persona_hash = binding.persona.compute_hash()
+            async with self._db.session() as s:
+                existing_persona = (
+                    await s.execute(
+                        select(AgentPersonaRow).where(AgentPersonaRow.hash == persona_hash)
+                    )
+                ).scalar_one_or_none()
+                if existing_persona is None:
+                    persona_id = uuid4()
+                    s.add(
+                        AgentPersonaRow(
+                            id=str(persona_id),
+                            name=binding.persona.name,
+                            version=binding.persona.version,
+                            hash=persona_hash,
+                            definition=binding.persona.model_dump(),
+                            created_at=now,
+                        )
+                    )
+                else:
+                    persona_id = UUID(existing_persona.id)
+            persona_ids[role_id] = persona_id
+
+        # --- Phase 2: insert RunRow (FK: workflow_templates — already committed above) ---
+        async with self._db.session() as s:
+            s.add(
+                RunRow(
+                    id=str(run_id),
+                    template_id=str(template_id),
+                    template_hash=template_hash,
+                    state=RunState.CREATED.value,
+                    repo_path=str(repo_path),
+                    base_branch=base_branch,
+                    worktree_root=str(worktree_root),
+                    created_at=now,
+                    updated_at=now,
+                )
+            )
+
+        # --- Phase 3: insert RunInputRow + RunBindingRow (FK: runs — now committed) ---
+        async with self._db.session() as s:
+            s.add(
+                RunInputRow(
+                    id=str(uuid4()),
+                    run_id=str(run_id),
+                    requirements_md=requirements_md,
+                    objective={},
+                    extra={},
+                    input_hash=sha256(
+                        {"requirements": requirements_md, "template_hash": template_hash}
+                    ),
+                )
+            )
+            for role_id, binding in bindings.items():
+                persona_hash = binding.persona.compute_hash()
+                s.add(
+                    RunBindingRow(
+                        id=str(uuid4()),
+                        run_id=str(run_id),
+                        role_id=role_id,
+                        persona_id=str(persona_ids[role_id]),
+                        persona_hash=persona_hash,
+                        backend=binding.persona.backend.value,
+                        binding_hash=binding.binding_hash,
+                    )
+                )
+
+    async def _ensure_phase_row(self, run_id: UUID, phase_def: WorkflowPhase) -> UUID:
+        async with self._db.session() as s:
+            existing = (
+                await s.execute(
+                    select(RunPhaseRow).where(
+                        RunPhaseRow.run_id == str(run_id),
+                        RunPhaseRow.phase_key == phase_def.key,
+                    )
+                )
+            ).scalar_one_or_none()
+            if existing is not None:
+                return UUID(existing.id)
+            phase_id = uuid4()
+            existing_count = len(
+                (
+                    await s.execute(select(RunPhaseRow).where(RunPhaseRow.run_id == str(run_id)))
+                ).all()
+            )
+            s.add(
+                RunPhaseRow(
+                    id=str(phase_id),
+                    run_id=str(run_id),
+                    phase_key=phase_def.key,
+                    seq=existing_count,
+                    state=RunPhaseState.PENDING.value,
+                    attempts=0,
+                    started_at=_now_iso(),
+                )
+            )
+            return phase_id
+
+    async def _set_phase_state(self, phase_id: UUID, state: RunPhaseState) -> None:
+        async with self._db.session() as s:
+            row = await s.get(RunPhaseRow, str(phase_id))
+            if row is not None:
+                row.state = state.value
+                if state in (
+                    RunPhaseState.COMPLETED,
+                    RunPhaseState.FAILED,
+                    RunPhaseState.SKIPPED,
+                ):
+                    row.ended_at = _now_iso()
+
+    async def _set_run_state(self, run_id: UUID, state: RunState) -> None:
+        async with self._db.session() as s:
+            row = await s.get(RunRow, str(run_id))
+            if row is not None:
+                row.state = state.value
+                row.updated_at = _now_iso()
+                if state in (RunState.COMPLETED, RunState.FAILED, RunState.ABORTED):
+                    row.ended_at = _now_iso()
+
+    async def _append_event(
+        self,
+        run_id: UUID,
+        phase_id: UUID | None,
+        event_type: RunEventType,
+        payload: dict[str, Any],
+    ) -> None:
+        idem_extra = {
+            k: str(v)
+            for k, v in payload.items()
+            if k in ("phase_key", "attempt", "request_id", "action", "code")
+        }
+        idem = run_idempotency_key(event_type, run_id, **idem_extra)
+        async with self._db.session() as s:
+            existing_count = len(
+                (
+                    await s.execute(select(RunEventRow).where(RunEventRow.run_id == str(run_id)))
+                ).all()
+            )
+            s.add(
+                RunEventRow(
+                    run_id=str(run_id),
+                    phase_id=str(phase_id) if phase_id is not None else None,
+                    seq=existing_count + 1,
+                    type=event_type.value,
+                    payload=payload,
+                    idempotency_key=idem,
+                    ts=_now_iso(),
+                )
+            )
+            try:
+                await s.flush()
+            except Exception:
+                await s.rollback()
+
+    async def _persist_artifact(
+        self,
+        run_id: UUID,
+        phase_id: UUID,
+        path: Path,
+        schema_id: str,
+        *,
+        valid: bool,
+        errors: list[Any] | None = None,
+    ) -> None:
+        try:
+            content = path.read_bytes()
+        except OSError:
+            return
+        artifact_hash = sha256({"bytes_len": len(content), "hex_prefix": content[:64].hex()})
+        async with self._db.session() as s:
+            s.add(
+                ArtifactRow(
+                    id=str(uuid4()),
+                    run_id=str(run_id),
+                    phase_id=str(phase_id),
+                    path=str(path),
+                    schema_id=schema_id,
+                    hash=artifact_hash,
+                    valid=valid,
+                    validation_error=(
+                        [{"path": f.path, "message": f.message} for f in errors] if errors else None
+                    ),
+                    created_at=_now_iso(),
+                )
+            )
+            try:
+                await s.flush()
+            except Exception:
+                await s.rollback()
+
+
+# ------------------------------------------------------------------
+# Module-level helpers
+# ------------------------------------------------------------------
+
+
+def _now_iso() -> str:
+    return datetime.now(UTC).isoformat(timespec="seconds")
+
+
+def _render_report_md(report: dict[str, Any]) -> str:
+    lines: list[str] = [
+        f"# Run {report['runId']}",
+        f"**Status**: {report['status']}",
+        f"**Template hash**: `{report['templateHash']}`",
+        f"**Ended at**: {report['endedAt']}",
+        "",
+        "## Phases",
+    ]
+    for p in report["phases"]:
+        lines.append(f"- **{p['key']}** — state={p['state']}, attempts={p['attempts']}")
+    lines.append("\n## Artifacts")
+    for a in report["artifacts"]:
+        lines.append(f"- `{a['path']}` (schema={a['schema']}, hash={a['hash'][:16]}...)")
+    if report.get("error"):
+        lines += ["", "## Error", str(report["error"])]
+    return "\n".join(lines) + "\n"