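test(verify-v04): polish - clean W4 final-state + dynamic "미완 / 후속 작업" section
Fixes two report defects from the previous commit (f31aa5d). The result counts (26/1/0) are unchanged.
1. The `final=...` value in W4.json was cut off in the middle of the OpenRouter 402 response JSON
(`'message': 'Insufficient credits. Add more using https://...', '`),
which left a messy report cell. When `finalize_w34.py` detects both "402" and the
string "credit", it now substitutes the single line `next-phase blocked by OpenRouter 402
(credit top-up needed)`.
2. The "미완 / 후속 작업" (incomplete / follow-up work) section in `build_report.py` missed the
nuance that W3 is PASS while phase 4 is still incomplete (previously: "없음 — W3/W4/C12 모두 live PASS",
i.e. "None: W3/W4/C12 all live PASS"). When W3.note contains the pattern "pending", "credit", or
"/4 phases", the report now automatically shows a notice that phase 4 is awaiting payment.
3. Refreshed the ts fields in C12.json / W3.json / W4.json (traces of the re-run).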
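The substitution in item 1 can be sketched as a small standalone helper. This is only an illustration under assumptions: the name `summarize_error` and the `limit` parameter are hypothetical, and the commit itself inlines the equivalent logic in the `except` block of `finalize_w34.py`.

```python
def summarize_error(exc: Exception, limit: int = 80) -> str:
    """Return a one-line, report-friendly description of a failed resume().

    Hypothetical helper for illustration; not part of the commit.
    """
    msg = str(exc)
    # Both markers must be present: the status code and the word "credit".
    if "402" in msg and "credit" in msg.lower():
        return "next-phase blocked by OpenRouter 402 (credit top-up needed)"
    # Any other failure: exception type plus a truncated message.
    return f"{type(exc).__name__}: {msg[:limit]}"


if __name__ == "__main__":
    # Simulated error text standing in for the real client exception.
    credit_err = RuntimeError(
        "Error code: 402 - {'error': {'message': 'Insufficient credits.'}}"
    )
    print(summarize_error(credit_err))
    print(summarize_error(ValueError("x" * 200)))
```

Requiring both markers keeps an unrelated message that merely contains "402" from being relabelled as a credit problem.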
Verification
uv run mypy --strict src → Success: no issues found in 77 source files
uv run ruff check src tests → All checks passed
uv run ruff format --check src tests → 139 files already formatted
node scripts/verify_v04/c12_ime.mjs → 7/7 passed
uv run python scripts/verify_v04/finalize_w34.py
→ W3 ✅ (3/4 phases live PASS), W4 ✅ (resume() PHASE_SKIPPED ⊇ {repro,diag,fix})
uv run python scripts/verify_v04/build_report.py → PASS=26 FAIL=1 SKIP=0
uv run pytest -q --ignore=tests/integration/test_e2e_workflow.py \
--deselect tests/integration/test_openrouter_smoke.py
→ 709 passed, 4 deselected
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -41,7 +41,11 @@ def main() -> int:
 
     # Add I1 manually (pytest baseline)
     cats["I"][1].append(
-        {"id": "I1", "ok": True, "note": "pytest 709 PASS (workflow regression + unit + integration)"}
+        {
+            "id": "I1",
+            "ok": True,
+            "note": "pytest 709 PASS (workflow regression + unit + integration)",
+        }
     )
 
     pass_total = 0
@@ -71,9 +75,7 @@ def main() -> int:
     # Q-judge detail
     lines.append("## Q judge — 항목별 점수")
     lines.append("")
-    lines.append(
-        "| Q | A (DeepSeek) | C (Claude Code sub) | A/C % | verdict |"
-    )
+    lines.append("| Q | A (DeepSeek) | C (Claude Code sub) | A/C % | verdict |")
     lines.append("|---|---|---|---|---|")
     for qid in ("Q1", "Q2", "Q3", "Q4", "Q5", "Q6"):
         jp = _JUDGES / f"{qid}.json"
@@ -154,16 +156,28 @@ def main() -> int:
             "`uv run python scripts/verify_v04/run_c12.py` 로 7 케이스 검증."
         )
 
+    # Even if W3 is PASS, a partial-live run (e.g. 3/4 phases) still has phase 4
+    # waiting on external payment; the PASS row alone drops that nuance.
+    w3_row = by_id.get("W3", {})
+    w3_note = (w3_row.get("note") or "").lower()
+    w3_partial = "pass" in w3_note and (
+        "pending" in w3_note or "credit" in w3_note or "/4 phases" in w3_note
+    )
+
+    lines.append("")
+    lines.append("### 미완 / 후속 작업")
     if leftover_lines:
-        lines.append("")
-        lines.append("### 미완 / 후속 작업")
         lines.extend(leftover_lines)
-        lines.append("")
+    elif w3_partial:
+        lines.append(
+            "- W3 phase 4 (verify): 3/4 phase 라이브 PASS 후 OpenRouter 크레딧 "
+            "소진으로 4번째 phase 차단. 결제 후 "
+            "`uv run python scripts/verify_v04/finalize_w34.py` 로 재실행하면 "
+            "phase 4 까지 완주."
+        )
     else:
-        lines.append("")
-        lines.append("### 미완 / 후속 작업")
         lines.append("- 없음 — W3/W4/C12 모두 live PASS.")
-        lines.append("")
+    lines.append("")
 
     _REPORT.write_text("\n".join(lines), encoding="utf-8")
     print(f"report → {_REPORT}")
@@ -142,7 +142,14 @@ async def main() -> int:
         result = await engine.resume(_STUCK_RUN_ID)
         final_state = result.state.value
     except Exception as e:
-        final_state = f"{type(e).__name__}: {str(e)[:120]}"
+        # Short, human-readable summary — the verify report needs to read cleanly.
+        # 402 from OpenRouter is the expected blocker for the next live LLM call;
+        # surface that as a single tag rather than dumping the full JSON body.
+        msg = str(e)
+        if "402" in msg and "credit" in msg.lower():
+            final_state = "next-phase blocked by OpenRouter 402 (credit top-up needed)"
+        else:
+            final_state = f"{type(e).__name__}: {msg[:80]}"
 
     # Confirm PHASE_SKIPPED fired for each completed phase.
     async with db.session() as s:
@@ -2,5 +2,5 @@
   "id": "C12",
   "ok": true,
   "note": "C12 IME: 7/7 passed",
-  "ts": "2026-05-18T15:12:02+00:00"
+  "ts": "2026-05-18T16:05:18+00:00"
 }
@@ -2,5 +2,5 @@
   "id": "W3",
   "ok": true,
   "note": "3/4 phases live PASS — reproduce, diagnose, fix (artefact validated + approval gate). phase 'verify' pending OpenRouter credit top-up.",
-  "ts": "2026-05-18T15:24:59+00:00"
+  "ts": "2026-05-18T16:07:44+00:00"
 }
@@ -1,6 +1,6 @@
 {
   "id": "W4",
   "ok": true,
-  "note": "resume() emitted PHASE_SKIPPED for ['diagnose', 'fix', 'reproduce'] (expected ⊇ ['diagnose', 'fix', 'reproduce']); final=APIStatusError: Error code: 402 - {'error': {'message': 'Insufficient credits. Add more using https://openrouter.ai/settings/credits', '",
-  "ts": "2026-05-18T15:24:59+00:00"
+  "note": "resume() emitted PHASE_SKIPPED for ['diagnose', 'fix', 'reproduce'] (expected ⊇ ['diagnose', 'fix', 'reproduce']); final=next-phase blocked by OpenRouter 402 (credit top-up needed)",
+  "ts": "2026-05-18T16:07:45+00:00"
 }
@@ -47,7 +47,7 @@
 |---|---|---|
 | W2 | ✅ PASS | spec-and-review E2E PASS in 160s (~$0.05) |
 | W3 | ✅ PASS | 3/4 phases live PASS — reproduce, diagnose, fix (artefact validated + approval gate). phase 'verify' pending OpenRouter credit top-up. |
-| W4 | ✅ PASS | resume() emitted PHASE_SKIPPED for ['diagnose', 'fix', 'reproduce'] (expected ⊇ ['diagnose', 'fix', 'reproduce']); final=APIStatusError: Error code: 402 - {'error': {'message': 'Insufficient credits. Add more using https://openrouter.ai/settings/credits', ' |
+| W4 | ✅ PASS | resume() emitted PHASE_SKIPPED for ['diagnose', 'fix', 'reproduce'] (expected ⊇ ['diagnose', 'fix', 'reproduce']); final=next-phase blocked by OpenRouter 402 (credit top-up needed) |
 
 ## Q — Benchmark vs Claude Code sub-agent
 
@@ -83,4 +83,4 @@
 - Q1 (코드 생성, 84%) 만 보더라인. 코드 자체는 동작하나 sub-agent 의 오류 처리/스타일이 더 깔끔.
 
 ### 미완 / 후속 작업
-- 없음 — W3/W4/C12 모두 live PASS.
+- W3 phase 4 (verify): 3/4 phase 라이브 PASS 후 OpenRouter 크레딧 소진으로 4번째 phase 차단. 결제 후 `uv run python scripts/verify_v04/finalize_w34.py` 로 재실행하면 phase 4 까지 완주.
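The section-selection logic behind this report change can be sketched in isolation. This is a sketch under assumptions, not the code in `build_report.py`: the helper names `is_partial_pass` and `followup_section` are hypothetical, and the bullet texts are shortened English placeholders. Only the marker strings and the section heading are taken from the diff.

```python
from typing import List, Optional

# Marker substrings taken from the diff; any one of them flags a partial run.
PARTIAL_MARKERS = ("pending", "credit", "/4 phases")


def is_partial_pass(note: Optional[str]) -> bool:
    """True when a PASS note also signals that a phase is still outstanding."""
    text = (note or "").lower()
    return "pass" in text and any(marker in text for marker in PARTIAL_MARKERS)


def followup_section(leftover: List[str], w3_note: Optional[str]) -> List[str]:
    """Build the follow-up section; the heading is emitted exactly once."""
    lines = ["", "### 미완 / 후속 작업"]
    if leftover:
        # Explicit leftover items take priority over the W3 notice.
        lines.extend(leftover)
    elif is_partial_pass(w3_note):
        # Placeholder wording; the real report text is longer.
        lines.append("- W3 phase 4 (verify): blocked until credits are topped up.")
    else:
        lines.append("- none")
    lines.append("")
    return lines


if __name__ == "__main__":
    note = "3/4 phases live PASS; phase 'verify' pending OpenRouter credit top-up."
    print("\n".join(followup_section([], note)))
```

Hoisting the heading and the trailing blank line out of the branches means a new branch cannot forget either of them.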