20 Commits

Author SHA1 Message Date
chungyeong
28efd5bb8f fix: use incremental diff per iteration instead of cumulative base diff
After each iteration's _commit_iteration, record the new HEAD SHA and use
it as the diff anchor for the next iteration. Previously capture_diff
always diffed against the initial base commit, causing every iteration to
return the same full cumulative diff — reviewers couldn't see what changed
between iterations, leading to repeated feedback and stuck FAIL loops.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:07:11 +09:00
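
The anchor handling described in commit 28efd5bb8f can be sketched roughly as follows. This is a minimal illustration, not the project's code: `DiffAnchor` and `diff_command` are hypothetical names, and the sketch assumes the diff is taken with a plain `git diff <commit>` against whichever SHA is currently the anchor.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiffAnchor:
    """Hypothetical helper: remembers which commit the next diff is taken against."""

    base_sha: str  # SHA recorded when the worktree was created
    iteration_heads: List[str] = field(default_factory=list)

    @property
    def current(self) -> str:
        # Before any iteration has been committed, fall back to the base commit.
        return self.iteration_heads[-1] if self.iteration_heads else self.base_sha

    def advance(self, new_head_sha: str) -> None:
        # Call after each iteration's commit so the next diff starts from here.
        self.iteration_heads.append(new_head_sha)


def diff_command(anchor: DiffAnchor) -> List[str]:
    # `git diff <commit>` compares the working tree against <commit>.
    return ["git", "diff", anchor.current]
```

With this shape, the first iteration diffs against the base commit and every later iteration diffs against the previous iteration's HEAD, so each reviewer sees only the latest round of changes.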
chungyeong
bf64d19123 Fix plan-review worktree document tracking 2026-03-15 00:35:42 +09:00
chungyeong
a85a490a9b Make plan-review a review-fix-verify loop 2026-03-15 00:01:26 +09:00
chungyeong
60c7b07939 fix: capture_diff uses base commit to handle agent self-commits
Claude in agentic mode (interactive, no -p flag) commits its own changes,
advancing HEAD. This made `git diff --cached HEAD` return empty, triggering
false EMPTY_DIFF errors every time. Now capture_diff diffs against the
base commit SHA recorded at worktree creation, so changes are captured
regardless of whether the agent committed them.

Also adds UX_IMPROVEMENT_PLAN.md for guided message improvements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 23:59:53 +09:00
이충영 에이닷서비스개발
af05fc1ddb fix: preserve agentic branch when intermediate commits exist
_finalize_worktree was returning None and deleting the branch when the
final commit was empty, even though _commit_iteration had already
committed changes during the pipeline. Now checks git log for any
commits on the branch before deciding to clean up.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 20:48:25 +09:00
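
The cleanup decision in commit af05fc1ddb might look something like the sketch below. The function names are hypothetical, and the commit count is taken with `git rev-list --count`, which is a substitute for the `git log` check the message mentions, chosen here only because its output is a single integer.

```python
import subprocess


def count_branch_commits(repo_dir: str, base_sha: str, branch: str) -> int:
    # `git rev-list --count A..B` prints the number of commits reachable
    # from B but not from A, i.e. the commits added on the branch.
    out = subprocess.run(
        ["git", "rev-list", "--count", f"{base_sha}..{branch}"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return int(out.strip())


def should_delete_branch(final_commit_created: bool, commits_on_branch: int) -> bool:
    # Delete only when nothing at all was committed: the final commit was
    # empty AND no intermediate iteration commits exist on the branch.
    return not final_commit_created and commits_on_branch == 0
```

Keeping the decision in a pure function separates it from the git call, so the rule can be checked without a repository.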
이충영 에이닷서비스개발
0858675076 fix: remove --permission-mode plan from reviewer args
Plan mode causes Claude to spend all time on tool calls (Read/Grep)
in -p mode, producing empty stdout. Reviewers receive full context
(diff, plan, checklist) via the prompt, so file access is not required.
Without --permission-mode, -p mode defaults to read-allowed, write-denied.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 18:29:53 +09:00
이충영 에이닷서비스개발
cc8d583914 fix: Claude reviewer empty output, worktree isolation false positives, and input file access
- Add -p flag to _CLAUDE_REVIEW_ARGS so reviewer uses print mode (stdin→stdout)
  instead of interactive mode which conflicts with plan permission mode
- Copy input files (plan, checklist) into worktree .cross-eval-inputs/ so
  agents in plan mode can access them without escaping the sandbox
- Simplify _snapshot_repo_state to use only git diff HEAD + untracked hashes,
  eliminating false positives from staging state changes (git diff --cached)
  and git status index drift during long-running pipelines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 16:19:57 +09:00
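
One way to picture the simplified snapshot from commit cc8d583914 is a fingerprint built from exactly two inputs: the text of `git diff HEAD` and a hash of each untracked file. The sketch below is an assumption about the shape of such a function, not the actual `_snapshot_repo_state`; `snapshot_fingerprint` and its parameters are hypothetical, and the caller is assumed to have collected both inputs already.

```python
import hashlib
from typing import Mapping


def snapshot_fingerprint(diff_head: str, untracked: Mapping[str, bytes]) -> str:
    """Combine `git diff HEAD` output and untracked file contents into one digest."""
    h = hashlib.sha256()
    h.update(diff_head.encode("utf-8"))
    # Sort paths so the result does not depend on directory listing order.
    for path in sorted(untracked):
        h.update(b"\0" + path.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(untracked[path]).digest())
    return h.hexdigest()
```

Because `git diff HEAD` covers staged and unstaged changes together, moving a change between the index and the working tree leaves this fingerprint unchanged, which is the kind of false positive the commit describes removing.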
chungyeong
7b95233edf feat: tighten agentic runtime handoffs and quality gates 2026-03-14 10:05:25 +09:00
chungyeong
87bc0ffbfb feat: propagate execution evidence across iterations and enhance reports
- Carry execution evidence forward so reviewer/senior prompts in
  subsequent iterations can inspect prior transcript and command data
- Add {execution_evidence} to REVIEW_ONLY templates (en/ko)
- Add evidence summary table to iteration reports
- Fix test_agentic to match stdin-based prompt delivery for Claude
- Add expanded claim/no-change marker tests and cross-iteration
  evidence propagation tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 23:36:28 +09:00
chungyeong
c467222a2a fix: instruct coder to use Edit/Write tools instead of describing changes
Claude -p mode tends to describe changes in text rather than actually
applying them via tools. Added explicit rule requiring tool-based edits
so that file modifications produce real git diffs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 23:19:22 +09:00
chungyeong
99cbf171aa fix: revert -p removal — Claude -p mode has full tool access
Claude -p (print mode) is non-interactive but retains full tool access
(Edit, Write, Bash, etc.) with --dangerously-skip-permissions. Removing
-p caused Claude to enter interactive mode which requires a TTY and
produces zero output when run as a subprocess with piped I/O.

Now delivers prompt via stdin for both Claude and Codex in agentic mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 23:13:12 +09:00
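
The stdin-based prompt delivery mentioned in commit 99cbf171aa can be illustrated with the standard library alone. In this sketch `run_with_stdin` is a hypothetical name and the child command is supplied by the caller; no particular Claude or Codex flags are assumed.

```python
import subprocess
from typing import Sequence


def run_with_stdin(cmd: Sequence[str], prompt: str, timeout: float = 30.0) -> str:
    # Passing `input=` makes subprocess.run write the prompt to the child's
    # stdin and then close the pipe, so the child sees EOF and does not
    # wait for further input.
    result = subprocess.run(
        list(cmd),
        input=prompt,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

For example, `run_with_stdin([sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"], "hello")` uses a Python child as a stand-in for the agent CLI and returns the uppercased prompt.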
chungyeong
d5fcc258b7 fix: unset CLAUDECODE env var to allow nested Claude subprocess calls
Claude Code refuses to launch inside another Claude Code session.
Strip the CLAUDECODE marker from the inherited environment so that
cross-eval can spawn Claude as a subprocess from within Claude Code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 23:05:16 +09:00
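
Stripping the marker described in commit d5fcc258b7 amounts to copying the environment and dropping one key before spawning the child. A minimal sketch follows; `child_env` is a hypothetical helper name, and only the `CLAUDECODE` variable name comes from the commit message.

```python
import os
from typing import Dict, Mapping, Optional


def child_env(parent: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    # Copy so the current process's own environment is left untouched.
    env = dict(os.environ if parent is None else parent)
    # Remove the marker if present; pop() with a default never raises.
    env.pop("CLAUDECODE", None)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run`, so only the spawned process sees the modified environment.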
chungyeong
290eace01b fix: send EOF via empty stdin so Claude exits after agentic prompt
Without -p, Claude enters interactive mode and waits for more input
indefinitely. Setting input="" closes the stdin pipe immediately,
causing Claude to process the positional prompt and then exit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 23:04:13 +09:00
chungyeong
ecf44b4c07 fix: strip -p/--print flags in agentic mode so Claude can actually modify files
The agentic invocation path inherited -p (print mode) from _CLAUDE_BASE_ARGS
but only stripped the stdin sentinel "-". Print mode makes Claude a one-shot
text completer that cannot use tools or write files, resulting in zero diffs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 23:00:40 +09:00
chungyeong
b19d174c98 feat: isolate agentic worktrees and surface execution evidence 2026-03-13 22:50:46 +09:00
chungyeong
3fb19e90c0 feat: harden runtime evidence and claude agentic validation 2026-03-13 22:29:22 +09:00
chungyeong
28dd794f54 feat: add runtime discovery and execution traces 2026-03-13 21:52:13 +09:00
chungyeong
941304398d release: cut 0.2.0 baseline 2026-03-13 21:47:54 +09:00
chungyeong
204e071b74 feat: ESCALATE verdict, issue tracker, onboarding commands
Add 3-verdict system (PASS/FAIL/ESCALATE) with priority handling across
simple and phased pipelines. Senior reviewers can now escalate issues
requiring human intervention, immediately breaking the review loop.

- ESCALATE verdict extraction with highest priority over PASS/FAIL
- Issue Tracker tables (ISS-NNN) carried across iterations
- Auto-escalate heuristic using (file, keyword) composite fingerprints
- Report restructuring: executive view first (verdict → tracker → metrics)
- Onboarding: `doctor`, `demo`, `init --guided` commands
- Exit codes: PASS=0, FAIL=1, ESCALATE=2
- 87 tests passing (54 config + 25 onboarding + 8 integration)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 18:19:05 +09:00
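
The verdict priority and exit codes listed in commit 204e071b74 could be sketched as below. The exit code values and ESCALATE taking priority come from the commit message; ranking FAIL above PASS, the regular expression, and the function names are assumptions made for illustration.

```python
import re
from typing import Optional

# Exit codes as listed in the commit message.
EXIT_CODES = {"PASS": 0, "FAIL": 1, "ESCALATE": 2}

# ESCALATE first per the commit; FAIL before PASS is an assumption.
_PRIORITY = ("ESCALATE", "FAIL", "PASS")

_VERDICT_RE = re.compile(r"\b(ESCALATE|FAIL|PASS)\b")


def extract_verdict(text: str) -> Optional[str]:
    # Collect every verdict keyword, then return the highest-priority one.
    found = set(_VERDICT_RE.findall(text))
    for verdict in _PRIORITY:
        if verdict in found:
            return verdict
    return None


def exit_code(verdict: str) -> int:
    return EXIT_CODES[verdict]
```

Under these assumptions, a review output containing both PASS and ESCALATE resolves to ESCALATE and the process would exit with code 2.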
이충영 에이닷서비스개발
ee4f1a07ef initial commit 2026-03-11 21:53:14 +09:00