feat: add fake phase harness

chungyeong
2026-05-10 16:48:52 +09:00
parent be0ddb6e4e
commit 64efeabd33
22 changed files with 5766 additions and 76 deletions

@@ -1,4 +1,4 @@
# Devflow Implementation Plan v3 r4
# Devflow Implementation Plan v3 r9
## 0. Document Status
@@ -11,6 +11,11 @@
- r2 applied CC-6 through CC-10.
- r3 applied CC-11 through CC-15.
- r4 applies CC-16 through CC-18.
- r5 applies CC-19.
- r6 applies CC-20.
- r7 applies CC-21 through CC-23.
- r8 applies CC-24 through CC-26.
- r9 applies CC-27 through CC-28.
## 1. Stack Decisions
@@ -40,7 +45,7 @@
- Pre-commit:
- `lefthook`.
- Runs `biome check --write` on staged files.
- Runs `tsc -b --noEmit` on changed packages.
- Runs `tsc -p tsconfig.typecheck.json --noEmit`.
- Runs related Vitest tests on changed packages.
### 1.3 Database
@@ -819,6 +824,7 @@ export interface TranscriptChunk {
### 8.3 Recovery Counters
- `sendPrompt` retry: 2.
- Means one initial send plus two adapter-level retries, three physical send attempts max.
- `resume` retry: 2.
- `rebootstrap` retry: 1.
- artifact repair retry: 1.
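
The `sendPrompt` budget above can be read as a simple loop bound. The following is a minimal sketch, not the plan's actual adapter code: `sendWithRetryBudget` and `SendResult` are hypothetical names, and the send is modelled as a synchronous callback for brevity.

```typescript
// Hypothetical result type; the plan does not define this shape.
type SendResult =
  | { ok: true; physicalAttempts: number }
  | { ok: false; physicalAttempts: number; error: unknown };

// `retries` is the budget from §8.3 (2 for sendPrompt).
// Physical attempts = 1 initial send + `retries` adapter-level retries.
function sendWithRetryBudget(send: () => void, retries: number): SendResult {
  const maxPhysicalAttempts = 1 + retries;
  let lastError: unknown;
  for (let physical = 1; physical <= maxPhysicalAttempts; physical++) {
    try {
      send();
      return { ok: true, physicalAttempts: physical };
    } catch (err) {
      // Adapter-level retry: the engine-level `attempt` is not touched here.
      lastError = err;
    }
  }
  return { ok: false, physicalAttempts: maxPhysicalAttempts, error: lastError };
}
```

With `retries = 2`, a send that always throws is called exactly three times before the helper gives up.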
@@ -882,7 +888,7 @@ const PromptEnvelope = z.object({
### 9.3 Rules
- Prompt identity is `dedupKey`.
- Adapter refuses duplicate `dedupKey` for the same session within a run lifetime.
- Adapter treats duplicate `dedupKey` for the same session within a run lifetime as idempotent success and does not reprocess the prompt.
- `attempt` increments only when the engine intentionally re-sends after timeout or repair.
- Adapter-level retry does not increment attempt.
- Completion is never inferred from transcript text.
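
As a rough illustration of the idempotent duplicate rule, the sketch below tracks `dedupKey` per session in memory. Everything except `dedupKey` and `attempt` is an assumption: the `Envelope` fields, the `DedupingAdapter` class, and the choice of one adapter instance per run lifetime are hypothetical and not taken from the plan.

```typescript
// Hypothetical envelope shape; only `dedupKey` and `attempt` are named in the plan.
interface Envelope {
  sessionId: string;
  dedupKey: string;
  attempt: number;
  body: string;
}

type SendOutcome = "sent" | "duplicate_ok";

// Assumption: one adapter instance lives for exactly one run lifetime.
class DedupingAdapter {
  private seen = new Set<string>();
  public delivered: string[] = [];

  sendPrompt(env: Envelope): SendOutcome {
    const key = `${env.sessionId}\u0000${env.dedupKey}`;
    if (this.seen.has(key)) {
      // Duplicate for the same session: report success, do not reprocess.
      return "duplicate_ok";
    }
    this.deliver(env);
    // Record the key only after delivery succeeded, so a failed send can be retried.
    this.seen.add(key);
    return "sent";
  }

  private deliver(env: Envelope): void {
    this.delivered.push(env.body);
  }
}
```

Returning success instead of refusing means an adapter-level retry that races with an already delivered prompt does not surface as an error to the engine.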
@@ -1152,7 +1158,7 @@ Transitions:
| `awaiting_approval` | request_changes | `planning` | increment phase attempts |
| `awaiting_approval` | timeout | `paused` | set `paused_from_state='awaiting_approval'` |
| `executing` | phase ok, more phases | `executing` | next phase |
| `executing` | phase needs gate | `awaiting_approval` | request gate |
| `executing` | normal workflow approval gate | `awaiting_approval` | request gate |
| `executing` | all phases done | `completed` | emit `run.completed`, write final report |
| `executing` | unrecoverable error | `failed` | emit `run.failed` |
| `executing` | manual `pauseRun` | `paused` | set `paused_from_state='executing'` |
@@ -1196,6 +1202,14 @@ Transitions:
| `awaiting_approval` | reject / abort | `failed` |
| `awaiting_approval` | request_changes | `running`, attempt + 1 |
Replay rules:
- `phase.started.payload.repair === true` marks that attempt as the single allowed repair attempt. Replaying that attempt MUST use repair instructions, `prompt.repaired`, and must not start a third attempt.
- Repair replay from `running` may reuse an existing `READY` / bootstrapped session even if `last_prompt_hash` still contains the previous attempt's prompt hash; current-attempt prompt send has not happened yet.
- If phase state is `validating` and no artifact row exists yet, replay re-reads and validates the current `expectedArtifactPath` instead of treating the state as corruption.
- If phase state is `validating` and artifact rows already exist for the same phase/path/schema, replay may reuse only an artifact row created at or after the current session `last_prompt_at`; older rows are treated as stale previous-attempt outputs and the file is revalidated.
- Session bootstrap DB row/state changes and `session.created` / `session.ready` events are written in one DB transaction after adapter start succeeds.
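
The two `validating` replay rules reduce to one freshness check. This is a minimal sketch under stated assumptions: `ArtifactRow`, `ReplayDecision`, and `decideValidatingReplay` are hypothetical names, the rows are assumed to be pre-filtered to the same phase/path/schema, and `lastPromptAt` is assumed to be non-null.

```typescript
// Hypothetical row shape; only the creation timestamp matters for this decision.
interface ArtifactRow {
  createdAt: Date;
}

type ReplayDecision = "reuse_row" | "revalidate_file";

// `rows` are artifact rows already matching the current phase/path/schema.
// `lastPromptAt` is the current session's last_prompt_at.
function decideValidatingReplay(rows: ArtifactRow[], lastPromptAt: Date): ReplayDecision {
  const cutoff = lastPromptAt.getTime();
  // A row is reusable only if created at or after the current prompt was sent.
  const hasFreshRow = rows.some((row) => row.createdAt.getTime() >= cutoff);
  // No row at all, or only stale previous-attempt rows: re-read and validate the file.
  return hasFreshRow ? "reuse_row" : "revalidate_file";
}
```

Both the "no row yet" case and the "only stale rows" case fall through to revalidating `expectedArtifactPath`, so neither is treated as corruption.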
## 14. Approval State
States:
@@ -1463,6 +1477,7 @@ Human required:
- `artifact_invalid_after_repair`
- `artifact_timeout_exhausted`
- `prompt_send_exhausted`
- `destructive_command_blocked`
- `secret_access_blocked`
- `backend_unavailable`
@@ -1486,7 +1501,7 @@ Fatal:
Mapping:
- recoverable -> retry; exhausted -> human_required.
- human_required -> run paused and gate created.
- human_required / recovery gate -> run paused and gate created. This is distinct from normal workflow approval gates in §13.1, which use `awaiting_approval`.
- fatal -> run failed, sessions disposed, final report best-effort.
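
The mapping above could be expressed as a small pure function. The sketch below is illustrative only: `ErrorClass`, `Outcome`, and `mapErrorClass` are hypothetical names, and the retry budget is passed in as a plain number rather than read from the recovery counters.

```typescript
type ErrorClass = "recoverable" | "human_required" | "fatal";

// Hypothetical outcome shape mirroring the three mapping bullets.
type Outcome =
  | { action: "retry" }
  | { action: "pause"; createRecoveryGate: true }
  | { action: "fail"; disposeSessions: true; finalReport: "best_effort" };

function mapErrorClass(cls: ErrorClass, retriesRemaining: number): Outcome {
  switch (cls) {
    case "recoverable":
      // Exhausted recoverable errors escalate to human_required.
      return retriesRemaining > 0
        ? { action: "retry" }
        : mapErrorClass("human_required", 0);
    case "human_required":
      // Recovery gate: the run is paused, not moved to awaiting_approval.
      return { action: "pause", createRecoveryGate: true };
    case "fatal":
      return { action: "fail", disposeSessions: true, finalReport: "best_effort" };
  }
}
```

Keeping the recovery-gate outcome separate from the normal approval transition reflects the distinction the mapping draws between `paused` and `awaiting_approval`.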
## 19. Concurrent Runs and Crash Recovery
@@ -1721,6 +1736,16 @@ M5+:
| CC-16 | Prompt hash used phaseId but envelope uses phaseKey | prompt hash uses phaseKey |
| CC-17 | abortRun transition too narrow | abort from any non-terminal run state |
| CC-18 | approval pending transition wording conflicted with pause epoch | pending can transition once per pending epoch; paused may unpause to pending |
| CC-19 | `tsc -b --noEmit` is brittle with TypeScript 5.6 project references on clean worktrees | build still uses `tsc -b`; no-emit verification uses root `tsconfig.typecheck.json` |
| CC-20 | `sendPrompt` retry count was ambiguous against Temporal activity attempts | §8.3 now states retry budget means initial attempt plus retries; §15.2 remains Temporal-level attempts only |
| CC-21 | Duplicate prompt dedup handling conflicted with adapter retry idempotency | duplicate `dedupKey` returns idempotent success without reprocessing |
| CC-22 | Normal workflow approval gates and human-required recovery gates were easy to conflate | §13.1 names normal workflow gates; §18 keeps human_required recovery gates paused |
| CC-23 | Phase start and event append could diverge under retry/error | phase start and `phase.started` append occur in one DB transaction |
| CC-24 | Repair attempt replay lost repair prompt identity and one-repair budget | repair attempts are derived from `phase.started.payload.repair`, replay uses repair instructions and `prompt.repaired`, and cannot start attempt 3 |
| CC-25 | `validating` replay failed if crash happened before artifact row insert | replay revalidates the expected artifact file when state is `validating` but no artifact row exists |
| CC-26 | Session bootstrap state/events could diverge | session row/state and `session.created` / `session.ready` events are committed in one DB transaction |
| CC-27 | `validating` replay could reuse stale previous-attempt artifact rows | artifact-row replay requires `artifact.created_at >= tui_sessions.last_prompt_at`; otherwise the file is revalidated |
| CC-28 | repair `running` replay rejected existing READY sessions with previous attempt prompt hash | current-attempt repair prompt is considered unsent, so replay may reuse the session and send `prompt.repaired` |
### Future Open Questions