- Carry execution evidence forward so reviewer/senior prompts in
  subsequent iterations can inspect prior transcript and command data
- Add {execution_evidence} to REVIEW_ONLY templates (en/ko)
- Add evidence summary table to iteration reports
- Fix test_agentic to match stdin-based prompt delivery for Claude
- Add expanded claim/no-change marker tests and cross-iteration
  evidence propagation tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In -p mode, Claude tends to describe changes in text rather than
applying them via tools. Added an explicit rule requiring tool-based
edits so that file modifications produce real git diffs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude -p (print mode) is non-interactive but retains full tool access
(Edit, Write, Bash, etc.) with --dangerously-skip-permissions. Removing
-p caused Claude to enter interactive mode, which requires a TTY and
produces zero output when run as a subprocess with piped I/O.
The prompt is now delivered via stdin for both Claude and Codex in
agentic mode.
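
A minimal sketch of stdin-based prompt delivery, assuming the runner uses
Python's subprocess module. The helper name run_with_stdin_prompt and the
echo stand-in are hypothetical; a real call would presumably pass an argv
such as ["claude", "-p", "--dangerously-skip-permissions"] instead.

```python
import subprocess
import sys


def run_with_stdin_prompt(cmd, prompt, timeout=60):
    """Run cmd, write prompt to its stdin, and capture its output.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    result = subprocess.run(
        cmd,
        input=prompt,         # written to the child's stdin, which is then closed
        capture_output=True,  # collect stdout/stderr through pipes (no TTY needed)
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


# Stand-in child process that echoes whatever arrives on stdin.
echo_cmd = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
code, out, err = run_with_stdin_prompt(echo_cmd, "apply the fix")
```

Because the prompt travels over the pipe, it is not subject to argv length
limits and the child sees end-of-input as soon as the prompt is written.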
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code refuses to launch inside another Claude Code session.
Strip the CLAUDECODE marker from the inherited environment so that
cross-eval can spawn Claude as a subprocess from within Claude Code.
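
One way to strip the marker, sketched under the assumption that the child
is launched from Python. The helper name env_without_claudecode is
hypothetical; only the CLAUDECODE variable name comes from this change.

```python
import os


def env_without_claudecode(base=None):
    """Return a copy of the environment with the CLAUDECODE marker removed.

    Hypothetical helper: copies the mapping so the parent's own
    environment is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env.pop("CLAUDECODE", None)  # no error if the marker is absent
    return env
```

The result can be passed as the env argument of subprocess.run so that
only the spawned process sees the cleaned environment.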
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without -p, Claude enters interactive mode and waits for more input
indefinitely. Setting input="" closes the stdin pipe immediately,
causing Claude to process the positional prompt and then exit.
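
The effect of input="" can be seen with a stand-in child process. This is
a sketch, not the project's code: run_with_closed_stdin and the probe
command are hypothetical, and the probe simply reports how many characters
it read from stdin before hitting end-of-input.

```python
import subprocess
import sys


def run_with_closed_stdin(cmd, timeout=30):
    """Run cmd with an empty stdin pipe that is closed right away."""
    return subprocess.run(
        cmd,
        input="",             # nothing is written; the pipe is closed immediately
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Stand-in child: reads stdin until EOF and prints the number of characters read.
probe = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
result = run_with_closed_stdin(probe)
```

A child that reads stdin returns from the read at once instead of blocking,
which is the behavior this change relies on.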
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agentic invocation path inherited -p (print mode) from _CLAUDE_BASE_ARGS
but only stripped the stdin sentinel "-". Print mode makes Claude a one-shot
text completer that cannot use tools or write files, resulting in zero diffs.
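
A sketch of the kind of filtering this implies, assuming the base argument
list is a flat list of strings that contains "-p" and the "-" sentinel.
The actual contents of _CLAUDE_BASE_ARGS are not shown here, so the example
list and the helper name agentic_args are hypothetical.

```python
def agentic_args(base_args):
    """Drop the print-mode flag and the stdin sentinel from a base argv.

    Hypothetical helper: assumes "-p" and "-" appear as standalone items.
    """
    dropped = {"-p", "-"}
    return [arg for arg in base_args if arg not in dropped]


# Assumed example argv; the real base arguments may differ.
example_base = ["claude", "-p", "--dangerously-skip-permissions", "-"]
filtered = agentic_args(example_base)
```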
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>