Make plan-review a review-fix-verify loop

2026-03-15 00:01:26 +09:00
parent 60c7b07939
commit a85a490a9b
7 changed files with 289 additions and 73 deletions
--- a/cross_eval/prompts.py
+++ b/cross_eval/prompts.py
@@ -472,12 +472,58 @@ PLAN_REVIEW_TEMPLATE_KO = """\
 그렇지 않으면: VERDICT: FAIL
 """

+PLAN_FIX_TEMPLATE = """\
+You are tasked with revising planning documents based on adjudicated review feedback.
+
+## Artifact References
+{artifact_references}
+
+## Current Review Feedback
+{feedback}
+
+## Instructions
+1. Read the referenced plan/checklist/docs/review artifacts directly from disk.
+2. Update the planning package itself: the plan, checklist, and reference documents as needed.
+3. Do NOT write or modify production code. Only revise planning artifacts.
+4. Address ONLY the confirmed planning issues from the current review feedback.
+5. If feedback marks any item as DISMISSED or false positive, leave it unchanged.
+6. Make the smallest document changes that resolve ambiguity, omissions, scope creep, or repository compatibility issues.
+7. Keep the plan, checklist, and supporting docs internally consistent after your edits.
+8. After editing, briefly summarize what you changed and any blocker that still needs human input.
+"""
+
+PLAN_FIX_TEMPLATE_KO = """\
+당신은 판정이 완료된 리뷰 피드백을 바탕으로 기획 문서를 수정하는 담당자입니다.
+
+## 참조 아티팩트
+{artifact_references}
+
+## 현재 리뷰 피드백
+{feedback}
+
+## 지침
+1. 참조된 plan/checklist/docs/review markdown을 직접 읽으세요.
+2. 수정 대상은 기획 패키지 자체입니다. 필요에 따라 기획서, 체크리스트, 참고 문서를 수정하세요.
+3. 프로덕션 코드를 작성하거나 수정하지 마세요. 기획 문서만 고치세요.
+4. 현재 리뷰 피드백에서 확정된 기획 이슈만 해결하세요.
+5. DISMISSED 또는 오탐으로 정리된 항목은 건드리지 마세요.
+6. 모호성, 누락, 과도한 범위, 저장소 정합성 문제를 해소하는 최소한의 문서 수정만 하세요.
+7. 수정 후에도 기획서, 체크리스트, 참고 문서가 서로 모순되지 않게 유지하세요.
+8. 수정이 끝나면 무엇을 바꿨는지와 아직 사람 판단이 필요한 blocker가 있는지 짧게 정리하세요.
+"""
+
 AGGREGATE_REVIEW_TEMPLATE = """\
 You are adjudicating multiple review results and turning them into an actionable decision.

 ## Artifact References
 {artifact_references}

+## Candidate Artifact Under Review
+{candidate_outputs}
+
+## Reviewer Findings Bundle
+{reviews_bundle}
+
 ## Previous Issue Tracker
 {previous_senior_tracker}

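The new `PLAN_FIX_TEMPLATE` exposes only two placeholders, `{artifact_references}` and `{feedback}`. The renderer is not part of this diff. The sketch below assumes prompts are filled with Python's `str.format`, and it uses a hypothetical `render` helper that fails fast when a placeholder has no value. The template shown is an abbreviated stand-in, and the artifact path is made up.

```python
from string import Formatter

# Abbreviated stand-in for PLAN_FIX_TEMPLATE: only the placeholder layout is copied.
PLAN_FIX_TEMPLATE = """\
You are tasked with revising planning documents based on adjudicated review feedback.

## Artifact References
{artifact_references}

## Current Review Feedback
{feedback}
"""


def placeholders(template: str) -> set[str]:
    """Return the field names used by a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, context: dict[str, str]) -> str:
    """Fill a template, raising KeyError if any placeholder has no value in context."""
    missing = placeholders(template) - set(context)
    if missing:
        raise KeyError(f"missing template values: {sorted(missing)}")
    return template.format(**context)


prompt = render(
    PLAN_FIX_TEMPLATE,
    {
        "artifact_references": "- plan: docs/plan.md",  # hypothetical path
        "feedback": "VERDICT: FAIL\n1. Checklist omits the migration step.",
    },
)
```

If templates really are rendered with `str.format`, any literal `{` or `}` added to a template later would have to be doubled.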
@@ -486,19 +532,19 @@ You are adjudicating multiple review results and turning them into an actionable

 ## Instructions
 Read the referenced plan/checklist/docs/review artifacts directly from disk. \
-Explore the project directory and the referenced git commit/diff to confirm the \
-current codebase state. Use the execution evidence above to verify claims against \
+Inspect the repository and referenced artifacts only as needed to confirm the \
+current target state. Use the execution evidence above to verify claims against \
 actual command outputs, artifact paths, and exit codes. Then:
 1. Deduplicate overlapping issues across reviewers.
 2. Resolve disagreements explicitly.
-3. Keep only issues supported by the plan, checklist, code, or reviewer evidence.
+3. Keep only issues supported by the plan, checklist, reference docs, repository state, or reviewer evidence.
 4. When evidence is mixed, explain what was confirmed, what was dismissed, and what still needs follow-up.
-5. Produce a prioritized action list for the coder.
+5. Produce a prioritized action list for the implementer/editor.
 6. Maintain the Issue Tracker table across iterations (carry forward unresolved issues).
 7. If no confirmed issue remains, output VERDICT: PASS.
-8. If issues exist that the coder can fix, output VERDICT: FAIL.
+8. If issues exist that the implementer/editor can fix, output VERDICT: FAIL.
 9. If issues require human intervention (ambiguous requirements, architecture decisions, \
-external dependency problems, or the same issue persists after 2+ fix attempts), \
+external dependency problems, or the same issue persists after 2+ attempts), \
 output VERDICT: ESCALATE.

 ## Output Format
@@ -512,8 +558,8 @@ output VERDICT: ESCALATE.
 (Write "None" if nothing was dismissed.)

 ### Action Items
-1. Concrete fix the coder should make
-2. Concrete fix the coder should make
+1. Concrete fix the implementer/editor should make
+2. Concrete fix the implementer/editor should make

 ## Issue Tracker

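The aggregate template asks for exactly one of `VERDICT: PASS`, `VERDICT: FAIL`, or `VERDICT: ESCALATE`. This diff does not show how the pipeline extracts that line. The following minimal sketch assumes that the last `VERDICT:` line in an agent's output wins, and that any other value counts as no verdict. `parse_verdict` is a hypothetical name.

```python
import re
from typing import Optional

# Matches lines such as "VERDICT: PASS"; only the three verdicts named in the prompts are accepted.
_VERDICT_RE = re.compile(r"^\s*VERDICT:\s*(PASS|FAIL|ESCALATE)\b", re.MULTILINE)


def parse_verdict(output: str) -> Optional[str]:
    """Return the last verdict printed in an agent's output, or None if there is none."""
    matches = _VERDICT_RE.findall(output)
    return matches[-1] if matches else None
```

Taking the last match tolerates agents that quote an earlier verdict in their reasoning before committing to a final one.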
@@ -536,6 +582,12 @@ AGGREGATE_REVIEW_TEMPLATE_KO = """\
 ## 참조 아티팩트
 {artifact_references}

+## 현재 검토 대상
+{candidate_outputs}
+
+## 리뷰 결과 묶음
+{reviews_bundle}
+
 ## 이전 이슈 트래커
 {previous_senior_tracker}

@@ -543,17 +595,17 @@ AGGREGATE_REVIEW_TEMPLATE_KO = """\
 {execution_evidence}

 ## 지침
-참조된 plan/checklist/docs/review markdown와 git 상태를 직접 읽어 현재 코드베이스 상태를 확인한 뒤, \
+참조된 plan/checklist/docs/review markdown과 저장소 상태를 직접 읽어 현재 검토 대상의 상태를 확인한 뒤, \
 위 실행 증거를 활용하여 에이전트의 주장을 실제 명령어 출력, 아티팩트 경로, 종료 코드로 검증하세요. \
 그런 다음 아래를 수행하세요.
 1. 리뷰어들 사이에 중복되는 이슈를 합치세요.
 2. 의견 충돌은 명시적으로 정리하세요.
-3. 기획서, 체크리스트, 코드, 리뷰 근거로 뒷받침되는 이슈만 남기세요.
+3. 기획서, 체크리스트, 참고 문서, 저장소 상태, 리뷰 근거로 뒷받침되는 이슈만 남기세요.
 4. 근거가 엇갈리면 무엇이 확정이고 무엇이 기각 또는 추가확인 대상인지 분명히 적으세요.
-5. coder가 바로 수정할 수 있는 우선순위 액션 아이템을 만드세요.
+5. 수정 담당자가 바로 처리할 수 있는 우선순위 액션 아이템을 만드세요.
 6. 이슈 트래커 테이블을 반복 간에 유지하세요 (미해결 이슈를 이월).
 7. 확정된 이슈가 없으면 VERDICT: PASS 를 출력하세요.
-8. coder가 수정 가능한 이슈가 있으면 VERDICT: FAIL 을 출력하세요.
+8. 수정 담당자가 해결 가능한 이슈가 있으면 VERDICT: FAIL 을 출력하세요.
 9. 사람의 개입이 필요한 이슈(모호한 요구사항, 아키텍처 결정, 외부 의존성 문제, \
 동일 이슈가 2회 이상 해결 실패)가 있으면 VERDICT: ESCALATE 를 출력하세요.

@@ -568,8 +620,8 @@ AGGREGATE_REVIEW_TEMPLATE_KO = """\
 (기각된 항목이 없으면 "없음"이라고 작성하세요.)

 ### 액션 아이템
-1. coder가 수정해야 할 구체적인 작업
-2. coder가 수정해야 할 구체적인 작업
+1. 수정 담당자가 처리해야 할 구체적인 작업
+2. 수정 담당자가 처리해야 할 구체적인 작업

 ## 이슈 트래커

@@ -592,6 +644,7 @@ DEFAULT_TEMPLATES: dict[str, dict[str, str]] = {
        "coding": CODING_TEMPLATE,
        "review": REVIEW_TEMPLATE,
        "plan-review": PLAN_REVIEW_TEMPLATE,
+        "plan-fix": PLAN_FIX_TEMPLATE,
        "review-only": REVIEW_ONLY_TEMPLATE,
        "aggregate-review": AGGREGATE_REVIEW_TEMPLATE,
    },
@@ -599,6 +652,7 @@ DEFAULT_TEMPLATES: dict[str, dict[str, str]] = {
        "coding": CODING_TEMPLATE_KO,
        "review": REVIEW_TEMPLATE_KO,
        "plan-review": PLAN_REVIEW_TEMPLATE_KO,
+        "plan-fix": PLAN_FIX_TEMPLATE_KO,
        "review-only": REVIEW_ONLY_TEMPLATE_KO,
        "aggregate-review": AGGREGATE_REVIEW_TEMPLATE_KO,
    },
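Registering `"plan-fix"` in both language tables is what allows `prompt_template="default:plan-fix"` to resolve regardless of the prompt language. The outer keys of `DEFAULT_TEMPLATES` are not visible in this diff. The sketch below therefore assumes hypothetical `"en"`/`"ko"` keys and a hypothetical `resolve_template` helper, with placeholder strings standing in for the real templates.

```python
# Hypothetical stand-in for DEFAULT_TEMPLATES; the real outer keys are not shown in this diff.
DEFAULT_TEMPLATES: dict[str, dict[str, str]] = {
    "en": {"plan-review": "<PLAN_REVIEW_TEMPLATE>", "plan-fix": "<PLAN_FIX_TEMPLATE>"},
    "ko": {"plan-review": "<PLAN_REVIEW_TEMPLATE_KO>", "plan-fix": "<PLAN_FIX_TEMPLATE_KO>"},
}


def resolve_template(ref: str, lang: str = "en") -> str:
    """Resolve a "default:<name>" reference against the built-in template tables."""
    prefix, sep, name = ref.partition(":")
    if prefix != "default" or not sep:
        raise ValueError(f"not a built-in template reference: {ref!r}")
    try:
        return DEFAULT_TEMPLATES[lang][name]
    except KeyError as exc:
        raise KeyError(f"no built-in template {name!r} for language {lang!r}") from exc
```

If the key were registered in only one table, a preset using `default:plan-fix` would fail at lookup time in the other language.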
@@ -843,56 +897,75 @@ def _build_review_only_preset(
 def _build_plan_review_preset(
    coders: list[str], reviewers: list[str], seniors: list[str],
 ) -> list[StepConfig]:
-    """Plan-review: reviewers audit planning docs before implementation."""
+    """Plan-review: review planning docs, adjudicate findings, revise the docs, then verify in a loop."""
+    if not coders:
+        raise ValueError("'plan-review' preset requires at least 1 coder")
    if not reviewers:
        raise ValueError("'plan-review' preset requires at least 1 reviewer")

-    if len(reviewers) == 1 and not seniors:
-        return [
+    review_steps: list[StepConfig] = []
+    if len(reviewers) == 1:
+        review_steps.append(
            StepConfig(
                name="plan_review",
                agent=reviewers[0],
                role="review",
                prompt_template="default:plan-review",
                output_key="plan_review_result",
-                verdict=True,
            ),
-        ]
+        )
+        review_step_names = ["plan_review"]
+        review_output_keys = ["plan_review_result"]
+    else:
+        reviewer_keys = _unique_safe_keys(reviewers)
+        for reviewer, rk in zip(reviewers, reviewer_keys):
+            review_steps.append(
+                StepConfig(
+                    name=f"plan_review_{rk}",
+                    agent=reviewer,
+                    role="review",
+                    prompt_template="default:plan-review",
+                    output_key=f"plan_review_{rk}",
+                    parallel=True,
+                ),
+            )
+        review_step_names = [f"plan_review_{rk}" for rk in reviewer_keys]
+        review_output_keys = [f"plan_review_{rk}" for rk in reviewer_keys]

-    steps: list[StepConfig] = []
-    reviewer_keys = _unique_safe_keys(reviewers)
-    for reviewer, rk in zip(reviewers, reviewer_keys):
-        steps.append(
-            StepConfig(
-                name=f"plan_review_{rk}",
-                agent=reviewer,
-                role="review",
-                prompt_template="default:plan-review",
-                output_key=f"plan_review_{rk}",
-                verdict=not seniors,
-                parallel=True,
-            ),
-        )
-    if seniors:
-        step_names = [f"plan_review_{rk}" for rk in reviewer_keys]
-        output_keys = [f"plan_review_{rk}" for rk in reviewer_keys]
-        steps.append(
-            StepConfig(
-                name="senior_review",
-                agent=seniors[0],
-                role="review",
-                prompt_template="default:aggregate-review",
-                output_key="senior_review_result",
-                verdict=True,
-                context_override={
-                    "candidate_outputs": "Planning documents under review (plan/checklist/reference docs).",
-                    "reviews_bundle": _build_named_bundle(
-                        reviewers, step_names, output_keys, "Review",
-                    ),
-                },
-            ),
-        )
-    return steps
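+    fix_coder = coders[0]  # first coder edits the planning docs (no production code)
+    senior_agent = seniors[0] if seniors else reviewers[0]  # adjudicates and verifies; falls back to the first reviewer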
+
+    return review_steps + [
+        StepConfig(
+            name="aggregate_review",
+            agent=senior_agent,
+            role="review",
+            prompt_template="default:aggregate-review",
+            output_key="aggregate_review",
+            context_override={
+                "candidate_outputs": "Current planning package under review (plan/checklist/reference docs).",
+                "reviews_bundle": _build_named_bundle(
+                    reviewers, review_step_names, review_output_keys, "Review",
+                ),
+            },
+        ),
+        StepConfig(
+            name="plan_fix",
+            agent=fix_coder,
+            role="coding",
+            prompt_template="default:plan-fix",
+            output_key="plan_fix_output",
+            context_override={"feedback": "{aggregate_review}"},
+        ),
+        StepConfig(
+            name="verify",
+            agent=senior_agent,
+            role="review",
+            prompt_template="default:plan-review",
+            output_key="verify_result",
+            verdict=True,
+        ),
+    ]


 def _build_review_fix_preset(
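The preset now emits review, aggregate, fix, and verify steps, with only `verify` carrying `verdict=True`. The pipeline runner that repeats them is not shown in this diff. The sketch below assumes that the whole sequence reruns while the verdict step reports `FAIL`, up to a hypothetical `max_iterations` cap. It also assumes that overrides such as `{"feedback": "{aggregate_review}"}` are filled from earlier step outputs. `Step`, `run_loop`, and `fake_agent` are illustrative stand-ins, not the project's API.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

_VERDICT_RE = re.compile(r"^\s*VERDICT:\s*(PASS|FAIL|ESCALATE)\b", re.MULTILINE)


@dataclass
class Step:
    """Simplified, hypothetical stand-in for StepConfig."""
    name: str
    output_key: str
    verdict: bool = False
    context_override: dict[str, str] = field(default_factory=dict)


def run_loop(
    steps: list[Step],
    run_step: Callable[[Step, dict[str, str]], str],
    max_iterations: int = 3,
) -> tuple[Optional[str], dict[str, str]]:
    """Run the steps in order, repeating the sequence while the verdict step says FAIL."""
    outputs: dict[str, str] = {}
    verdict: Optional[str] = None
    for _ in range(max_iterations):
        for step in steps:
            # Fill overrides such as {"feedback": "{aggregate_review}"} from earlier outputs.
            overrides = {k: v.format(**outputs) for k, v in step.context_override.items()}
            outputs[step.output_key] = run_step(step, {**outputs, **overrides})
            if step.verdict:
                found = _VERDICT_RE.findall(outputs[step.output_key])
                verdict = found[-1] if found else None
        if verdict != "FAIL":  # PASS, ESCALATE, or a missing verdict ends the loop
            break
    return verdict, outputs


# Single-reviewer layout produced by the preset above.
PLAN_REVIEW_STEPS = [
    Step("plan_review", "plan_review_result"),
    Step("aggregate_review", "aggregate_review"),
    Step("plan_fix", "plan_fix_output", context_override={"feedback": "{aggregate_review}"}),
    Step("verify", "verify_result", verdict=True),
]

calls: list[str] = []


def fake_agent(step: Step, context: dict[str, str]) -> str:
    """Stand-in agent: verification fails once, then passes."""
    calls.append(step.name)
    if step.name == "plan_fix":
        return f"Revised docs using: {context['feedback']}"
    if step.name == "verify":
        return "VERDICT: PASS" if calls.count("verify") > 1 else "VERDICT: FAIL"
    return f"{step.name} findings"


verdict, outputs = run_loop(PLAN_REVIEW_STEPS, fake_agent)
```

Stopping on anything other than `FAIL` mirrors the aggregate prompt's intent. `ESCALATE` hands control back to a human instead of retrying.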