cross-eval

Author	SHA1	Message	Date
chungyeong	a85a490a9b	Make plan-review a review-fix-verify loop	2026-03-15 00:01:26 +09:00
chungyeong	60c7b07939	fix: capture_diff uses base commit to handle agent self-commits. Claude in agentic mode (interactive, no -p flag) commits its own changes, advancing HEAD. This made `git diff --cached HEAD` return empty, triggering false EMPTY_DIFF errors every time. Now capture_diff diffs against the base commit SHA recorded at worktree creation, so changes are captured regardless of whether the agent committed them. Also adds UX_IMPROVEMENT_PLAN.md for guided message improvements. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 23:59:53 +09:00
이충영 에이닷서비스개발	af05fc1ddb	fix: preserve agentic branch when intermediate commits exist. _finalize_worktree was returning None and deleting the branch when the final commit was empty, even though _commit_iteration had already committed changes during the pipeline. Now checks git log for any commits on the branch before deciding to clean up. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 20:48:25 +09:00
이충영 에이닷서비스개발	cc8d583914	fix: Claude reviewer empty output, worktree isolation false positives, and input file access. Add -p flag to _CLAUDE_REVIEW_ARGS so reviewer uses print mode (stdin→stdout) instead of interactive mode, which conflicts with plan permission mode. Copy input files (plan, checklist) into worktree .cross-eval-inputs/ so agents in plan mode can access them without escaping the sandbox. Simplify _snapshot_repo_state to use only git diff HEAD + untracked hashes, eliminating false positives from staging state changes (git diff --cached) and git status index drift during long-running pipelines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 16:19:57 +09:00
chungyeong	7b95233edf	feat: tighten agentic runtime handoffs and quality gates	2026-03-14 10:05:25 +09:00
chungyeong	87bc0ffbfb	feat: propagate execution evidence across iterations and enhance reports. Carry execution evidence forward so reviewer/senior prompts in subsequent iterations can inspect prior transcript and command data. Add {execution_evidence} to REVIEW_ONLY templates (en/ko). Add evidence summary table to iteration reports. Fix test_agentic to match stdin-based prompt delivery for Claude. Add expanded claim/no-change marker tests and cross-iteration evidence propagation tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 23:36:28 +09:00
chungyeong	b19d174c98	feat: isolate agentic worktrees and surface execution evidence	2026-03-13 22:50:46 +09:00
chungyeong	3fb19e90c0	feat: harden runtime evidence and claude agentic validation	2026-03-13 22:29:22 +09:00
chungyeong	28dd794f54	feat: add runtime discovery and execution traces	2026-03-13 21:52:13 +09:00
chungyeong	941304398d	release: cut 0.2.0 baseline	2026-03-13 21:47:54 +09:00
chungyeong	204e071b74	feat: ESCALATE verdict, issue tracker, onboarding commands. Add 3-verdict system (PASS/FAIL/ESCALATE) with priority handling across simple and phased pipelines. Senior reviewers can now escalate issues requiring human intervention, immediately breaking the review loop. ESCALATE verdict extraction with highest priority over PASS/FAIL; Issue Tracker tables (ISS-NNN) carried across iterations; auto-escalate heuristic using (file, keyword) composite fingerprints; report restructuring: executive view first (verdict → tracker → metrics); onboarding: `doctor`, `demo`, `init --guided` commands; exit codes: PASS=0, FAIL=1, ESCALATE=2; 87 tests passing (54 config + 25 onboarding + 8 integration). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 18:19:05 +09:00
이충영 에이닷서비스개발	ee4f1a07ef	initial commit	2026-03-11 21:53:14 +09:00

12 Commits
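
The three-verdict scheme described in commit 204e071b74 (ESCALATE taking priority over PASS/FAIL, with exit codes PASS=0, FAIL=1, ESCALATE=2) could be sketched roughly as follows. This is an illustration only, not the repository's code: the names `extract_verdict` and `EXIT_CODES` are hypothetical, and both the FAIL-over-PASS ordering and the fallback to FAIL when no verdict keyword is found are assumptions not stated in the log.

```python
import re

# Exit codes as listed in the commit message for 204e071b74.
EXIT_CODES = {"PASS": 0, "FAIL": 1, "ESCALATE": 2}

# ESCALATE has the highest priority (from the commit message).
# Ranking FAIL above PASS is an assumption made for this sketch.
_PRIORITY = ("ESCALATE", "FAIL", "PASS")


def extract_verdict(review_text: str) -> str:
    """Return the highest-priority verdict keyword found in review_text."""
    found = set(re.findall(r"\b(PASS|FAIL|ESCALATE)\b", review_text))
    for verdict in _PRIORITY:
        if verdict in found:
            return verdict
    # Assumption: a review with no recognizable verdict is treated as FAIL.
    return "FAIL"


if __name__ == "__main__":
    sample = "Iteration 2: PASS on style, but ESCALATE: needs human sign-off."
    verdict = extract_verdict(sample)
    print(verdict, EXIT_CODES[verdict])
```

A caller would typically pass `EXIT_CODES[verdict]` to `sys.exit` so that shell scripts or CI jobs can branch on the result.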