feat: add temporal run engine integration

2026-05-13 08:39:19 +09:00
parent 78ebd5ef78
commit aa3033771a
37 changed files with 7338 additions and 224 deletions
--- a/docs/plan.md
+++ b/docs/plan.md
@@ -1,4 +1,4 @@
-# Devflow Implementation Plan v3 r9
+# Devflow Implementation Plan v3 r12

 ## 0. Document Status

@@ -16,6 +16,9 @@
 - r7 applies CC-21 through CC-23.
 - r8 applies CC-24 through CC-26.
 - r9 applies CC-27 through CC-28.
+- r10 applies CC-29 through CC-31.
+- r11 applies CC-32.
+- r12 applies CC-33 through CC-35.

 ## 1. Stack Decisions

@@ -1206,6 +1209,9 @@ Replay rules:

 - `phase.started.payload.repair === true` marks that attempt as the single allowed repair attempt. Replaying that attempt MUST use repair instructions, `prompt.repaired`, and must not start a third attempt.
 - Repair replay from `running` may reuse an existing `READY` / bootstrapped session even if `last_prompt_hash` still contains the previous attempt's prompt hash; current-attempt prompt send has not happened yet.
+- If phase state is `running`, existing artifact files are never accepted unless the current prompt event (`prompt.sent` or `prompt.repaired`) for the current dedup key is already recorded. Replay without prompt proof treats existing files as stale.
+- If phase state is `running`, session state is `BUSY`, and `last_prompt_hash` matches the current prompt but the matching prompt event is missing, replay waits for the artifact with the current file signature as the baseline. This preserves idempotency without validating a stale pre-existing artifact.
+- Baseline-protected waits must not synthesize durable prompt proof before the wait finishes. If replay crashes or is cancelled before validation, the next replay must still treat the existing artifact as baseline/stale unless real prompt proof already exists.
 - If phase state is `validating` and no artifact row exists yet, replay re-reads and validates the current `expectedArtifactPath` instead of treating the state as corruption.
 - If phase state is `validating` and artifact rows already exist for the same phase/path/schema, replay may reuse only an artifact row created at or after the current session `last_prompt_at`; older rows are treated as stale previous-attempt outputs and the file is revalidated.
 - Session bootstrap DB row/state changes and `session.created` / `session.ready` events are written in one DB transaction after adapter start succeeds.
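
The `running` replay rules added in this hunk can be read as a small decision function. The sketch below is illustrative only: `RunningReplayInput`, `RunningReplayDecision`, and `decideRunningReplay` are hypothetical names that do not come from the plan, and the field shapes are assumptions. It is deliberately pure and records no events, mirroring the rule that a baseline-protected wait must not synthesize prompt proof.

```typescript
// Hypothetical types for illustration; the plan does not define these shapes.
type PromptEventType = "prompt.sent" | "prompt.repaired";

interface PromptEvent {
  type: PromptEventType;
  dedupKey: string;
}

interface RunningReplayInput {
  promptEvents: PromptEvent[];              // prompt events already recorded
  currentDedupKey: string;                  // dedup key of the current attempt's prompt
  sessionState: "READY" | "BUSY";
  lastPromptHash: string | null;            // session's last_prompt_hash
  currentPromptHash: string;
  existingArtifactSignature: string | null; // signature of the file on disk, if any
}

type RunningReplayDecision =
  | { kind: "validate-existing-artifact" }
  | { kind: "wait-with-baseline"; baselineSignature: string | null }
  | { kind: "treat-existing-as-stale" };

function decideRunningReplay(input: RunningReplayInput): RunningReplayDecision {
  // Real prompt proof: a prompt.sent / prompt.repaired event for the current dedup key.
  const hasPromptProof = input.promptEvents.some(
    (e) => e.dedupKey === input.currentDedupKey,
  );
  if (hasPromptProof) {
    return { kind: "validate-existing-artifact" };
  }

  // BUSY session whose hash matches the current prompt, but no matching event:
  // wait for a new artifact, using the current file signature as the baseline.
  if (
    input.sessionState === "BUSY" &&
    input.lastPromptHash === input.currentPromptHash
  ) {
    return {
      kind: "wait-with-baseline",
      baselineSignature: input.existingArtifactSignature,
    };
  }

  // No prompt proof at all: any existing file is a stale previous-attempt output.
  return { kind: "treat-existing-as-stale" };
}
```

Because the function returns a decision instead of writing to the DB, a crash or cancellation during the wait leaves no durable trace, so the next replay reaches the same branch again.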
@@ -1328,22 +1334,31 @@ interface RunEngine {

 Activities:

- `lockBindings(input)`
- `generatePhasePlan(runId, phaseKey, attempt)`
- `sendPromptToSession(sessionId, envelope)`
- `waitForArtifact(sessionId, expectedPath, expectedSchema, timeoutMs)`
- `validateArtifact(artifactPath, expectedSchema)`
- `recordEvent(runId, type, payload)`
- `requestApproval(runId, gateKey, phaseId, payload, idempotencyKey)`
- `runCommand(kind, argv, cwd, env)`
- `composeFinalReport(runId)`
+- M5 compatibility activity surface:
+  - `prepareRunActivity(input)`
+  - `lockBindingsActivity(runId)`
+  - `failRunActivity(runId, reason)`
+  - `advanceRunActivity(runId)`
+  - `signalApprovalActivity(runId, approvalRequestId, action, clientToken, comment?)`
+  - `pauseRunActivity(runId)`
+  - `resumeRunActivity(runId)`
+  - `abortRunActivity(runId, reason)`
+  - `getStatusActivity(runId)`
+  - `isRunTerminalActivity(runId)`
+  - `composeFinalReportActivity(runId)`
+- `advanceRunActivity` is the M5 parity wrapper over M4 phase advancement. It may internally perform prompt send, artifact wait/validation, event recording, and approval request creation through the same DB/idempotency contracts already locked in sections 8 through 14.
+- The granular activity split (`sendPromptToSession`, `waitForArtifact`, `validateArtifact`, `recordEvent`, `requestApproval`, `runCommand`) is deferred to a later hardening ADR. It is not an M5 acceptance gate.
+- Prompt/session mutation still occurs only inside worker-hosted activities through SessionManager. M5+ API code never mutates `SessionAdapter` directly.

 Retry policy:

 - Default: max attempts 3, exponential backoff start 1s, max 30s.
- `requestApproval`: max attempts 1.
- `composeFinalReport`: max attempts 1.
- `sendPromptToSession`: max attempts 2; further retry belongs to engine recovery.
+- `composeFinalReportActivity`: max attempts 1.
+- Activity-level failures serialize `DevflowError`; non-recoverable Devflow errors are rethrown as non-retryable Temporal failures.
+- `advanceRunActivity` is cancellation-aware and idempotent by DB state, event idempotency keys, prompt dedup keys, and artifact content keys.
+- Already-applied approval signal replay repairs missing final reports for every terminal run state: `completed`, `failed`, and `aborted`, regardless of whether the replayed approval action was `approve`, `request_changes`, `reject`, or `abort`.
+- API-side already-applied approval replay is report-repair only. It must not call `SessionAdapter` mutation methods; reject/abort session disposal belongs to the worker/session-manager path that originally applies the decision.
+- If a workflow closes before the API observes an approval signal result, closed-workflow settlement must first verify the requested decision was applied, then replay approval side effects, then wait for the terminal report.
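
As a rough illustration of the retry numbers above (default max attempts 3, backoff starting at 1s and capped at 30s, `composeFinalReportActivity` limited to 1 attempt), the sketch below computes per-activity attempt limits and retry delays. The names `DEFAULT_RETRY`, `MAX_ATTEMPTS_OVERRIDES`, `maxAttemptsFor`, and `backoffDelayMs` are hypothetical, and the backoff coefficient of 2 is an assumption; the plan says only "exponential".

```typescript
// Hypothetical constants for illustration; only the numeric limits come from the plan.
const DEFAULT_RETRY = {
  maxAttempts: 3,
  initialDelayMs: 1_000,
  maxDelayMs: 30_000,
  backoffCoefficient: 2, // assumption: the plan does not state the coefficient
};

// Per-activity overrides listed under "Retry policy".
const MAX_ATTEMPTS_OVERRIDES: Record<string, number> = {
  composeFinalReportActivity: 1,
};

// Maximum number of attempts (first try included) for a given activity name.
function maxAttemptsFor(activityName: string): number {
  return MAX_ATTEMPTS_OVERRIDES[activityName] ?? DEFAULT_RETRY.maxAttempts;
}

// Delay before the retry that follows failed attempt number `failedAttempt` (1-based).
function backoffDelayMs(failedAttempt: number): number {
  const raw =
    DEFAULT_RETRY.initialDelayMs *
    Math.pow(DEFAULT_RETRY.backoffCoefficient, failedAttempt - 1);
  return Math.min(raw, DEFAULT_RETRY.maxDelayMs);
}
```

Under these assumptions a default activity would wait 1s after the first failure and 2s after the second, then stop; the 30s cap only matters if a larger attempt limit is configured.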

 ### 15.3 Hard Constraints

@@ -1746,6 +1761,13 @@ M5+:
 | CC-26 | Session bootstrap state/events could diverge | session row/state and `session.created` / `session.ready` events are committed in one DB transaction |
 | CC-27 | `validating` replay could reuse stale previous-attempt artifact rows | artifact-row replay requires `artifact.created_at >= tui_sessions.last_prompt_at`; otherwise the file is revalidated |
 | CC-28 | repair `running` replay rejected existing READY sessions with previous attempt prompt hash | current-attempt repair prompt is considered unsent, so replay may reuse the session and send `prompt.repaired` |
+| CC-29 | API Temporal approval replay omitted M4 approval side-effect repair | API approval signal reader now wires `replayAppliedApprovalSideEffects`, so already-applied terminal approval replays can repair missing final reports |
+| CC-30 | `running` replay could validate stale artifacts without prompt proof | `running` replay requires matching prompt event proof; BUSY replay without prompt event uses current artifact signature as baseline and ignores stale files |
+| CC-31 | M5 activity list over-specified granular activities not implemented by the M4 parity adapter | M5 locks the compatibility activity wrapper surface; granular activity split is deferred to a later hardening ADR |
+| CC-32 | Already-applied `approve` / `request_changes` replay repaired missing reports for `completed` / `failed` but missed `aborted` | approval replay side-effect repair now composes missing final reports for all terminal states |
+| CC-33 | API-side already-applied `reject` / `abort` replay tried to dispose sessions through DB-only replay validation runtime | API replay side effects are report-repair only; worker-side decision application owns session disposal |
+| CC-34 | Closed-workflow approval settlement waited for reports but did not replay approval side effects | settlement now verifies the requested decision, replays side effects, then waits for the terminal report |
+| CC-35 | Baseline-protected BUSY replay recorded synthetic prompt proof before the baseline wait was durable | baseline replay no longer records synthetic prompt events; replay without real prompt proof keeps treating existing files as stale |

 ### Future Open Questions