test(verify-v04): polish - clean W4 final-state + dynamic incomplete-work section

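Fixes two report defects from the previous commit (f31aa5d).  Result counts (PASS/FAIL/SKIP = 26/1/0) are unchanged.

1. The `final=...` value in W4.json was cut off mid-string inside the
   OpenRouter 402 response JSON
   (`'message': 'Insufficient credits. Add more using https://...', '`),
   which left the report cell messy.  When `finalize_w34.py` detects both
   "402" and "credit" in the error text, it now substitutes the single line
   `next-phase blocked by OpenRouter 402 (credit top-up needed)`.

2. The incomplete / follow-up work section (`### 미완 / 후속 작업`) in
   `build_report.py` missed the nuance that W3 is PASS while phase 4 is
   still incomplete (previously: "없음 — W3/W4/C12 모두 live PASS", i.e.
   "None: W3/W4/C12 all live PASS").  If W3.note contains the pattern
   "pending", "credit", or "/4 phases", the report now automatically shows
   a notice that phase 4 is waiting on payment.

3. Refreshed the `ts` fields in C12.json / W3.json / W4.json (traces of the
   re-run).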

Verification
  uv run mypy --strict src       → Success: no issues found in 77 source files
  uv run ruff check src tests    → All checks passed
  uv run ruff format --check src tests → 139 files already formatted
  node scripts/verify_v04/c12_ime.mjs → 7/7 passed
  uv run python scripts/verify_v04/finalize_w34.py
    → W3  (3/4 phases live PASS), W4  (resume() PHASE_SKIPPED ⊇ {repro,diag,fix})
  uv run python scripts/verify_v04/build_report.py → PASS=26 FAIL=1 SKIP=0
  uv run pytest -q --ignore=tests/integration/test_e2e_workflow.py \
                  --deselect tests/integration/test_openrouter_smoke.py
    → 709 passed, 4 deselected

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chungyeong
2026-05-19 01:09:54 +09:00
parent f31aa5d1e8
commit 010f6423eb
6 changed files with 38 additions and 17 deletions


@@ -41,7 +41,11 @@ def main() -> int:

     # Add I1 manually (pytest baseline)
     cats["I"][1].append(
-        {"id": "I1", "ok": True, "note": "pytest 709 PASS (workflow regression + unit + integration)"}
+        {
+            "id": "I1",
+            "ok": True,
+            "note": "pytest 709 PASS (workflow regression + unit + integration)",
+        }
     )

     pass_total = 0
@@ -71,9 +75,7 @@ def main() -> int:
     # Q-judge detail
     lines.append("## Q judge — 항목별 점수")
     lines.append("")
-    lines.append(
-        "| Q | A (DeepSeek) | C (Claude Code sub) | A/C % | verdict |"
-    )
+    lines.append("| Q | A (DeepSeek) | C (Claude Code sub) | A/C % | verdict |")
     lines.append("|---|---|---|---|---|")
     for qid in ("Q1", "Q2", "Q3", "Q4", "Q5", "Q6"):
         jp = _JUDGES / f"{qid}.json"
@@ -154,16 +156,28 @@ def main() -> int:
             "`uv run python scripts/verify_v04/run_c12.py` 로 7 케이스 검증."
         )

+    # Even when W3 is PASS, a partial-live run (e.g. 3/4 phases) must state
+    # that phase 4 is waiting on an external payment. The PASS row alone loses that nuance.
+    w3_row = by_id.get("W3", {})
+    w3_note = (w3_row.get("note") or "").lower()
+    w3_partial = "pass" in w3_note and (
+        "pending" in w3_note or "credit" in w3_note or "/4 phases" in w3_note
+    )
+
+    lines.append("")
+    lines.append("### 미완 / 후속 작업")
     if leftover_lines:
-        lines.append("")
-        lines.append("### 미완 / 후속 작업")
         lines.extend(leftover_lines)
-        lines.append("")
+    elif w3_partial:
+        lines.append(
+            "- W3 phase 4 (verify): 3/4 phase 라이브 PASS 후 OpenRouter 크레딧 "
+            "소진으로 4번째 phase 차단. 결제 후 "
+            "`uv run python scripts/verify_v04/finalize_w34.py` 로 재실행하면 "
+            "phase 4 까지 완주."
+        )
     else:
-        lines.append("")
-        lines.append("### 미완 / 후속 작업")
         lines.append("- 없음 — W3/W4/C12 모두 live PASS.")
-        lines.append("")
+    lines.append("")

     _REPORT.write_text("\n".join(lines), encoding="utf-8")
     print(f"report → {_REPORT}")
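
The partial-PASS check in this hunk is a plain substring test, so it can be pulled out and exercised in isolation. The sketch below is a hypothetical helper (`is_partial_pass` is not a name from the repository) that mirrors the `w3_partial` expression, assuming the note is either a free-form string or missing:

```python
from typing import Optional

# Markers copied from the w3_partial expression in the hunk above.
_PARTIAL_MARKERS = ("pending", "credit", "/4 phases")


def is_partial_pass(note: Optional[str]) -> bool:
    """Return True when a PASS note also signals unfinished phases.

    Hypothetical helper: same logic as the inline w3_partial check,
    applied to a lower-cased copy of the note.
    """
    text = (note or "").lower()
    return "pass" in text and any(marker in text for marker in _PARTIAL_MARKERS)
```

One thing the sketch makes visible: the `/4 phases` marker also matches a note that begins with `4/4 phases`, so a fully completed run written in that style would still be flagged as partial. Whether `finalize_w34.py` ever emits such a note is not shown in this diff.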


@@ -142,7 +142,14 @@ async def main() -> int:
         result = await engine.resume(_STUCK_RUN_ID)
         final_state = result.state.value
     except Exception as e:
-        final_state = f"{type(e).__name__}: {str(e)[:120]}"
+        # Short, human-readable summary — the verify report needs to read cleanly.
+        # 402 from OpenRouter is the expected blocker for the next live LLM call;
+        # surface that as a single tag rather than dumping the full JSON body.
+        msg = str(e)
+        if "402" in msg and "credit" in msg.lower():
+            final_state = "next-phase blocked by OpenRouter 402 (credit top-up needed)"
+        else:
+            final_state = f"{type(e).__name__}: {msg[:80]}"

     # Confirm PHASE_SKIPPED fired for each completed phase.
     async with db.session() as s:
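
The new except-branch logic can also be read as a small pure function. This is a minimal sketch, assuming the exception's string form carries the HTTP status and message (as the `APIStatusError` text in the old W4.json suggests); `summarize_error` and its `limit` parameter are illustrative names, not part of `finalize_w34.py`:

```python
def summarize_error(exc: Exception, limit: int = 80) -> str:
    """Collapse an exception into a one-line tag for the verify report.

    Hypothetical helper mirroring the except branch above: a 402 that
    mentions credits becomes a fixed tag; anything else is truncated.
    """
    msg = str(exc)
    if "402" in msg and "credit" in msg.lower():
        return "next-phase blocked by OpenRouter 402 (credit top-up needed)"
    # Fallback: exception class name plus the first `limit` characters.
    return f"{type(exc).__name__}: {msg[:limit]}"
```

Keeping the check on the message text rather than on a specific exception class means the tag is produced regardless of which client library raised the error, at the cost of also matching any unrelated message that happens to contain both "402" and "credit".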


@@ -2,5 +2,5 @@
   "id": "C12",
   "ok": true,
   "note": "C12 IME: 7/7 passed",
-  "ts": "2026-05-18T15:12:02+00:00"
+  "ts": "2026-05-18T16:05:18+00:00"
 }


@@ -2,5 +2,5 @@
   "id": "W3",
   "ok": true,
   "note": "3/4 phases live PASS — reproduce, diagnose, fix (artefact validated + approval gate). phase 'verify' pending OpenRouter credit top-up.",
-  "ts": "2026-05-18T15:24:59+00:00"
+  "ts": "2026-05-18T16:07:44+00:00"
 }


@@ -1,6 +1,6 @@
 {
   "id": "W4",
   "ok": true,
-  "note": "resume() emitted PHASE_SKIPPED for ['diagnose', 'fix', 'reproduce'] (expected ⊇ ['diagnose', 'fix', 'reproduce']); final=APIStatusError: Error code: 402 - {'error': {'message': 'Insufficient credits. Add more using https://openrouter.ai/settings/credits', '",
-  "ts": "2026-05-18T15:24:59+00:00"
+  "note": "resume() emitted PHASE_SKIPPED for ['diagnose', 'fix', 'reproduce'] (expected ⊇ ['diagnose', 'fix', 'reproduce']); final=next-phase blocked by OpenRouter 402 (credit top-up needed)",
+  "ts": "2026-05-18T16:07:45+00:00"
 }


@@ -47,7 +47,7 @@
 |---|---|---|
 | W2 | ✅ PASS | spec-and-review E2E PASS in 160s (~$0.05) |
 | W3 | ✅ PASS | 3/4 phases live PASS — reproduce, diagnose, fix (artefact validated + approval gate). phase 'verify' pending OpenRouter credit top-up. |
-| W4 | ✅ PASS | resume() emitted PHASE_SKIPPED for ['diagnose', 'fix', 'reproduce'] (expected ⊇ ['diagnose', 'fix', 'reproduce']); final=APIStatusError: Error code: 402 - {'error': {'message': 'Insufficient credits. Add more using https://openrouter.ai/settings/credits', ' |
+| W4 | ✅ PASS | resume() emitted PHASE_SKIPPED for ['diagnose', 'fix', 'reproduce'] (expected ⊇ ['diagnose', 'fix', 'reproduce']); final=next-phase blocked by OpenRouter 402 (credit top-up needed) |

 ## Q — Benchmark vs Claude Code sub-agent

@@ -83,4 +83,4 @@
 - Q1 (코드 생성, 84%) 만 보더라인. 코드 자체는 동작하나 sub-agent 의 오류 처리/스타일이 더 깔끔.

 ### 미완 / 후속 작업
-- 없음 — W3/W4/C12 모두 live PASS.
+- W3 phase 4 (verify): 3/4 phase 라이브 PASS 후 OpenRouter 크레딧 소진으로 4번째 phase 차단. 결제 후 `uv run python scripts/verify_v04/finalize_w34.py` 로 재실행하면 phase 4 까지 완주.