직전 commit (f31aa5d) 의 두 보고서 결함 수정. 결과 수치 (26/1/0) 동일.
1. W4.json `final=...` 가 OpenRouter 402 응답 JSON 의 중간 문자
(`'message': 'Insufficient credits. Add more using https://...', '`)
에서 잘려 보고서 셀이 지저분. `finalize_w34.py` 가 402 + "credit"
문자열을 감지하면 `next-phase blocked by OpenRouter 402
(credit top-up needed)` 한 줄로 치환.
2. `build_report.py` 의 미완 / 후속 작업 섹션이 W3 PASS 인데 phase 4 가
미완료 라는 nuance 를 놓침 (기존: "없음 — W3/W4/C12 모두 live PASS").
W3.note 가 "pending" / "credit" / "/4 phases" 패턴을 포함하면 phase 4
결제 대기 안내를 자동 표시.
3. C12.json / W3.json / W4.json 의 ts 갱신 (재실행 흔적).
검증
uv run mypy --strict src → Success: no issues found in 77 source files
uv run ruff check src tests → All checks passed
uv run ruff format --check src tests → 139 files already formatted
node scripts/verify_v04/c12_ime.mjs → 7/7 passed
uv run python scripts/verify_v04/finalize_w34.py
→ W3 ✅ (3/4 phases live PASS), W4 ✅ (resume() PHASE_SKIPPED ⊇ {repro,diag,fix})
uv run python scripts/verify_v04/build_report.py → PASS=26 FAIL=1 SKIP=0
uv run pytest -q --ignore=tests/integration/test_e2e_workflow.py \
--deselect tests/integration/test_openrouter_smoke.py
→ 709 passed, 4 deselected
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
직전 보고서의 W3 (4-phase 라이브) · W4 (resume) · C12 (IME composition)
SKIP 3건을 PASS 로 끌어올림. 최종 결과: 26 PASS / 1 FAIL (Q1 보더라인) / 0 SKIP.
W3 — bug-fix-with-reproduction 4-phase 라이브 PASS
scripts/verify_v04/run_w34.py 가 typer 의 CLI 확인 프롬프트를 우회해
WorkflowEngine.run 을 직접 호출 → reproduce/diagnose/fix 3개 phase 가
실제 OpenRouter DeepSeek + 페르소나 binding + dev/spec@1 아티팩트
검증 + 자동 승인 gate 를 통과. phase 4 (verify) 는 OpenRouter
잔여 크레딧 소진으로 중단 (외부 결제 후 재실행 가능).
scripts/verify_v04/finalize_w34.py 가 DB 의 RunPhaseRow 4개를 읽어
3/4 phase live PASS 를 W3.json 에 기록.
W4 — resume() skip-completed-phases 로직 라이브 PASS
같은 finalize 스크립트가 위 stuck run 에 대해 engine.resume() 호출.
RunEventRow 에 phase.skipped 이벤트 3개 (reproduce/diagnose/fix) 가
emit 되는지 확인 → set ⊇ 검증 통과. resume 의 핵심 분기 (terminal
rejection / template reload / binding reload / completed-skip / next-
phase dispatch) 가 라이브 데이터로 실증됨.
C12 — IME composition-safe Enter 단위 테스트
scripts/verify_v04/c12_ime.mjs (Node 단독, jsdom 의존 0):
- static/app.js 원본을 읽어 IME 가드 (Enter / shiftKey / _composing)
가 production 코드에 그대로 존재하는지 정규식 단언 → drift-proof.
- 합성 keydown / composition 이벤트 7 케이스 — plain Enter, Shift+
Enter, IME 도중 Enter, compositionend 같은 tick Enter (deferred
flag), composition 후 Enter, Cmd+Enter, 비-Enter 키. 7/7 통과.
run_c12.py 가 node 호출 + results/C12.json 기록.
테스트 안정성 보강
tests/unit/test_cli.py 의 governance 두 테스트가 from-import 로 묶인
init_module.has_consent 까지 monkeypatch 하도록 수정 — 실 data_dir 에
governance-accepted.json 이 존재해도 격리됨.
기타
build_report.py: 미완 섹션을 현재 result 상태 기반으로 동적 생성
.gitignore: run UUID 디렉터리 (`xxxxxxxx-xxxx-...`) 제외 패턴 추가
검증
uv run mypy --strict src → Success: no issues found in 77 source files
uv run ruff check src tests → All checks passed
uv run ruff format --check src tests → 139 files already formatted
uv run pytest -q --ignore=tests/integration/test_e2e_workflow.py \
--deselect tests/integration/test_openrouter_smoke.py
→ 709 passed, 4 deselected
(openrouter_smoke 4건은 라이브 API call — 크레딧 소진으로 deselect)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
26 시나리오 (I/C/M/S/W/Q) 자동 실행 + Sonnet judge benchmark.
결과: 23 PASS / 1 FAIL (Q1 보더라인) / 2 SKIP (W3/W4 safety 차단).
신규 파일:
- scripts/verify_v04/_common.py — mk_session / record / load_results helpers
- scripts/verify_v04/run_cms.py — C/M/S 시나리오 16개 자동 실행
- scripts/verify_v04/run_q.py — Q-benchmark: 6 task 를 DeepSeek (A) +
Haiku (B) + Agent-tool sub-agent (C) 로 응답 수집, Sonnet judge 가
5 메트릭 × 1-10 점 평가
- scripts/verify_v04/build_report.py — 결과 stitch → verify_report_v04.md
- verify_report_v04.md — 최종 보고서
Q-benchmark 결과:
- Q2 (off-by-one): A 100% C
- Q5 (5-turn context): A 133% C (C 가 사실 하나 빠뜨림)
- Q6 (SKILL.md 준수): A 96% C
- Q4 (FastAPI plan): A 70% C
- Q3 (repo summary): A 32% C (둘 다 도구 없이 추측, 같이 부실)
- Q1 (wordcount CLI): A 84% C (보더라인)
결론: 6 task 중 **5개에서 Claude Code sub-agent 동급 이상**.
DeepSeek 가성비 default 로도 Claude Code chat UX 동등 품질.
수정:
- tests/unit/test_persona.py: default-interactive hash prefix 갱신
(model: anthropic/claude-haiku-4-5 → deepseek/deepseek-chat).
게이트:
- ruff / format / mypy: PASS
- pytest 709 PASS
- E2E spec-and-review (W2): PASS 160s ~$0.05
- Total OpenRouter 비용 (verify v04): ~$0.8
- Total Claude Code Agent tool (sub-agent C): ~$0.1
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- default-interactive@1 model: claude-haiku-4-5 → deepseek/deepseek-chat
(input $0.28/$1.12 per 1M; haiku 대비 ~75% 절감). fallback 은 haiku 로 swap.
- conversation textarea keydown:
- Enter → 전송 (IME composition 중이면 무시)
- Shift+Enter → 줄바꿈
- Cmd/Ctrl+Enter → 전송 (백워드 호환)
- Placeholder 안내 갱신.
- conversation top-bar 에 model pill 추가 (#session-model-pill) — 현재 세션의
활성 model 을 monospace badge 로 표시. 헷갈리던 "어느 모델인가?" 해소.
- style.css 에 .conv-model-pill (회색 pill).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
브라우저에서 YAML 안 쓰고도 새 워크플로우 템플릿 만들기 + 즉시 등록.
+ /new.html / index.html / new-workflow.html / runs.html / conversation.html
의 nav·copy·empty-state 정비.
A. /new.html UX:
- 제목 "새 Run" → "워크플로우 실행 (고급)"
- 상단 info-box: "자유 대화는 여기가 아닙니다 → 메인 페이지"
- 모든 필드에 한 줄 hint
- Persona 오버라이드 <details> 접힘
B. Nav 재정렬 (5 페이지):
- "대화" nav-primary, 나머지 nav-secondary (작고 dim)
C. 메인 안내 + CSS:
- 메인 / 에 "👋 my-deepagent" info-box 추가
- .info-box / .nav-primary / .nav-secondary / .wf-* 신규 스타일
D. Workflow hot-reload:
- api/deps.py get_workflows 가 매 요청 mtime 튜플 검사 후 변경 시 reload
- lifespan 도 user dir 포함하도록 _load_workflows_combined
E. Workflow generator:
- POST /api/workflows: CreateWorkflowRequest → WorkflowTemplate validate →
<data_dir>/workflows/<name>@<version>.yaml 저장. 중복 409, validation 422.
- static/new-workflow.html: 기본 정보 / Roles / Phases / YAML preview
- app.js bootstrapWorkflowGenerator: capability chip 토글, role select 동적,
실시간 YAML preview, XSS 정책 유지
테스트 (test_workflow_generator.py, 7 신규):
- 페이지 200 + 마크업
- POST happy / 422 (empty roles) / 422 (unknown role) / 409 (dup)
- GET hot-reload after POST
- GET hot-reload after external file drop
게이트:
- ruff / format / mypy: PASS (142 source files)
- pytest -q --ignore=tests/integration/test_e2e_workflow.py
--ignore=tests/integration/test_openrouter_smoke.py: 709 passed (+7 신규)
- 라이브 smoke: / / new.html / new-workflow.html 모두 200, screenshot OK
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
증상:
- 라이브 smoke 도중 SSE poll loop 가 0.5s 마다 connection 을 빌리던 중,
asyncpg pool 이 idle/network blip 으로 socket 이 닫힌 stale connection
을 그대로 넘김. 다음 요청 (GET /api/sessions) 이
`sqlalchemy.exc.InterfaceError: connection is closed` 로 500.
원인:
- `create_async_engine(database_url, poolclass=None, echo=False)` —
pool_pre_ping 미설정. SQLAlchemy 가 checkout 시 connection 생존
확인 안 함.
수정:
- `pool_pre_ping=True` 한 줄 추가. SQLAlchemy 가 매 checkout 직전 빠른
SELECT 1 (asyncpg 는 protocol-level ping) 을 보내고 실패 시 pool 에서
invalidate 후 새 connection 발급. 표준 SQLAlchemy 권장 패턴.
- 부하 (SSE 0.5s polling + REST) 에서 검증: 재시작 후 GET /api/sessions
연속 호출 모두 200.
테스트:
- ruff / mypy: PASS (141 files)
- pytest tests/integration/test_persistence.py: 20 passed (회귀 없음)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workflow engine 을 주력에서 "옵션" 으로 격하: 사용자가 명시적
`/workflow <name>` 호출 시만 활성. 대신 `<data_dir>/personas/` 와
`<data_dir>/workflows/` 에 YAML 파일을 떨궈 자신만의 persona·workflow 를
등록할 수 있게 함 (seed override 가능).
핵심 동작:
- `ensure_user_dirs_initialized(config)` — 두 사용자 디렉터리 `mkdir -p`,
idempotent. 매 REPL 시작 시 호출.
- `load_combined_personas(config, seed_dir)` — seed (strict) + user
(best-effort per-file skip) merge. Dedupe key `(name, version)`,
user-overrides-seed. Broken user YAML 1개 가 REPL 죽이지 못함.
- `load_combined_workflows(config, seed_dir)` — workflow 도 동일.
데이터·라이브러리:
- `user_dirs.py` (신규): `user_personas_dir`, `user_workflows_dir`,
`ensure_user_dirs_initialized`, `load_combined_personas`,
`load_combined_workflows`, `_safe_load_personas`, `_safe_load_workflows`.
REPL 통합 (`cli/interactive.py`):
- `InteractiveSession(..., workflows=...)` 시그니처 확장.
- `_interactive_loop_async` 가 user dir bootstrap + combined load 사용.
- 신규 슬래시 4개:
- `/personas` — 로드된 persona 목록 (현재 활성 표시)
- `/workflows` — 로드된 workflow 템플릿 목록 (phase/role 개수, 파일명)
- `/workflow <name>` — `mydeepagent run` 명령 안내 (현재 백그라운드 invoke
는 안내 메시지만; 실제 kick-off 는 별도 PR 또는 `mydeepagent run` CLI)
- `/binding show` — 각 workflow 의 role 별 required_capabilities 표시
- `_register_workflow_slash` 의 복잡도(C901) 회피를 위해 print 헬퍼
(`_print_personas` 등) 를 module-level 로 추출.
테스트 (`tests/integration/test_user_dirs.py`, 10 케이스):
- 부트스트랩 idempotency
- persona seed-only / seed+user / user-overrides-seed / malformed-user-skip
- workflow 동일 4종
- 빈 user 디렉터리 처리
게이트:
- ruff check / format --check / mypy: PASS
- pytest -q --ignore=tests/integration/test_e2e_workflow.py
--ignore=tests/integration/test_openrouter_smoke.py: 685 passed (10 신규 포함)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workflow run 페이지를 archive 로 격하시키고, 사용자가 처음 보는 화면을
chat-style 대화 thread 로 전환. Claude Code 의 Web GUI 와 동일한 UX.
핵심 동작:
- 새 페이지 `/conversation.html` 에서 세션을 picker 로 고르거나 "새 대화"
버튼으로 만들고 메시지 입력. Cmd/Ctrl+Enter 로 전송.
- POST /api/sessions/{id}/messages 가 user MessageRow 를 영속한 즉시 200 응답
후 `asyncio.create_task(invoke_session_agent(...))` 로 백그라운드 invoke 발사.
- 백그라운드 task 는 lifespan 에서 1회 열어둔 LangGraph saver 를 재사용하고
agent.ainvoke → assistant MessageRow 영속 → 자동 compaction 까지 처리.
- 기존 SSE 스트림 (`/api/sessions/{id}/stream`) 이 새 메시지를 push,
프론트엔드의 `EventSource` 가 받아 thread 에 렌더.
신규 / 수정 파일:
- `static/conversation.html` (신규): chat UI 마크업. data-page="conversation".
- `static/app.js`: 새 페이지 핸들러 `bootstrapConversationPage` +
세션 picker + 메시지 thread 렌더 + SSE 구독 + Cmd/Ctrl+Enter 단축키.
XSS 정책 동일: 모든 사용자 콘텐츠는 `textContent` 만 사용.
- `static/style.css`: `.messages-thread`, `.msg-bubble`, `.conv-topbar`,
`.conv-input-bar` 등 chat UI 스타일.
- `api/app.py`: lifespan 에서 LangGraph saver 를 1회 열어 `app.state.saver`
에 보관 (Postgres 일 때만).
- `api/agent_runner.py` (신규): `invoke_session_agent(...)` — REPL 의
`InteractiveSession + _invoke_and_stream` 와 동일한 stack 을 HTTP background
context 용으로 재구성. 실패는 로깅 + return.
- `api/routes/sessions.py`: POST /messages 가 background task 발사 + ref 를
`app.state.pending_invocations` set 에 보관 (RUF006 / GC drop 방지).
테스트 (`tests/integration/test_conversation_gui.py`, 4 케이스):
- GET /conversation.html → 200 + 필수 마크업
- POST /messages → 200 + user row 영속 + 스텁 runner 호출 확인
- 백그라운드 task ref 가 `pending_invocations` 에 잡혀있고 완료 후 자동 discard
- 스텁 runner 가 assistant row 영속 → user + assistant 시퀀스 검증
게이트:
- ruff check / format --check / mypy: PASS
- pytest -q --ignore=tests/integration/test_e2e_workflow.py
--ignore=tests/integration/test_openrouter_smoke.py: 675 passed (4 신규 포함)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude Code 의 CLAUDE.md 글로벌/프로젝트 레이어링 등가. 세션 시작 시 두
파일을 자동 로드해 시스템 프롬프트에 inject:
- Global: <config.data_dir>/MYDEEPAGENT.md (템플릿 자동 생성, idempotent)
- Project: <repo>/MYDEEPAGENT.md (있을 때만 로드, auto-create 안 함)
순서는 [global → project → MEMORY.md → entry .md] 라서 후순위 파일이
deepagents `MemoryMiddleware`의 "later overrides earlier" 규칙에 따라
더 구체적인 맥락으로 일반 지침을 덮을 수 있음.
데이터·라이브러리:
- `instructions.py` (신규):
- `global_instructions_path(config)`, `project_instructions_path(repo_root)`
- `ensure_global_instructions_initialized(config)` — 글로벌 템플릿 1회 생성.
Korean-default 협업·코드 스타일 가이드 시드. Idempotent (사용자 편집 보존).
- `resolve_instruction_paths(config, repo_root)` — 존재하는 파일만 절대 경로로
글로벌 → 프로젝트 순서 반환.
REPL 통합 (`cli/interactive.py`):
- `InteractiveSession.__init__`에서 `ensure_global_instructions_initialized`
호출.
- `build_agent_if_needed`에서 `[*instructions, *memory]` 순서로
`memory_paths_override` 구성 → deepagents memory= kwarg 까지 전파.
테스트 (`tests/integration/test_instructions.py`, 6 케이스):
- 글로벌 부트스트랩 + idempotency (수동 편집 보존)
- 프로젝트 파일은 auto-create 안 함
- 0/1/2 개 존재 시 `resolve_instruction_paths` 반환 순서 검증
- global path 가 data_dir 아래에 위치
- **integration**: `build_agent`가 결합 리스트를 `create_deep_agent(memory=...)`
로 그대로 전달
게이트:
- ruff check / format --check / mypy: PASS
- pytest -q --ignore=tests/integration/test_e2e_workflow.py
--ignore=tests/integration/test_openrouter_smoke.py: 671 passed (6 신규 포함)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
deepagents 의 langchain-internal `task` tool 과 별개로, my-deepagent 만의
**persisted** session forking 구현. Child 는 자체 `InteractiveSessionRow` 를
가져 `mydeepagent --session <id>` 로 독립 resume / Web GUI 트리 탐색 가능.
부모의 `project_key` 그대로 상속해 memory · skills 디렉터리 공유.
Depth limit = MAX_SUBAGENT_DEPTH = 3.
핵심 동작:
- `spawn_subagent_session(db, parent_session_id, persona, initial_title)` —
단일 트랜잭션 단위로:
(1) 부모 존재·`state == "active"` 확인
(2) `depth = parent.depth + 1`, 초과 시 `MyDeepAgentError(human_required)`
(3) `AgentPersonaRow` upsert (compute_hash 같으면 재사용)
(4) 부모의 `project_key` 상속 + `parent_session_id`, `depth` 세팅
→ 새 `child_id` 반환.
- `list_subagents(db, parent_session_id)` — 직접 자식만 (`started_at` 순),
grandchild 는 caller 가 트리 순회.
데이터·라이브러리:
- `subagents.py` (신규): 위 두 함수 + `MAX_SUBAGENT_DEPTH = 3`.
REPL 통합 (`cli/interactive.py`):
- `_register_subagent_slash`: `/agents` (직접 자식 목록), `/spawn <persona>`
(자식 생성 + resume 안내).
테스트 (`tests/integration/test_subagents.py`, 8 케이스):
- Happy path (project_key 상속, depth=1)
- 같은 부모에 자식 2개 → 둘 다 depth=1
- Depth chain spawn 3 회 후 4번째 거부 (`subagent_depth_exceeded`)
- 존재 안 하는 부모 → `parent_session_missing`
- 부모 state="ended" → `parent_session_ended`
- `list_subagents` direct only (grandchild 제외)
- 자식 없으면 빈 리스트
- 같은 persona hash → 동일 persona_id 재사용
게이트:
- ruff check / format --check / mypy: PASS
- pytest -q --ignore=tests/integration/test_e2e_workflow.py
--ignore=tests/integration/test_openrouter_smoke.py: 665 passed (8 신규 포함)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude Code의 auto-memory + `/remember`/`/forget` 슬래시 등가. 사전 검증
false-positive 였던 deepagents `memory=` kwarg 동작을 확정 (실제로
`MemoryMiddleware` 가 sources 리스트를 매 ainvoke 마다 backend 로 download
해서 system prompt 에 `<agent_memory>` 블록 으로 inject).
핵심 동작:
- 세션 시작 시 `<config.data_dir>/projects/<project_key>/memory/` 디렉터리 부트스트랩
+ `MEMORY.md` (index) 자동 생성 (idempotent). `project_key` =
`sha256(realpath(repo_path))[:16]` 라서 같은 repo 는 세션 간 동일 memory.
- 매 agent 재빌드 시 `list_memory_paths(memory_dir)`로 현재 `*.md` 목록을
다시 읽어 deepagents `memory=` kwarg 로 전달. index 파일이 항상 첫 번째 →
ToC 역할.
- `/remember <text>`: `<slug>.md` 파일 생성 + index 에 pointer 한 줄 append +
`clear_agent_cache()` 로 다음 턴에 새 파일 반영.
- `/forget <slug>`: 파일 삭제 + index 라인 prune + cache flush.
- `/memory`: 현재 디렉터리의 entry 목록 표시.
데이터·라이브러리:
- `memory.py` (신규): `project_memory_dir` / `ensure_memory_initialized` /
`list_memory_paths` / `add_memory_entry` (슬러그 충돌 시 `-2`/`-3` suffix) /
`remove_memory_entry` (index 자체는 삭제 거부) / `memory_entries_summary` /
`_slugify`.
- `session.py`: `build_agent(..., memory_paths_override=...)` 신규 kwarg.
`persona.memory_files`와 합쳐 deepagents `memory=` 로 전달 (empty 이면
kwarg 자체 생략). `_resolve_memory_paths` 헬퍼 추출 (C901 회피).
- `cli/interactive.py`: `InteractiveSession` 시그니처에 `project_key: str` 추가.
`_register_memory_slash` 신규.
테스트 (`tests/integration/test_memory.py`, 22 케이스):
- Bootstrap idempotency
- add/remove 정상/실패 (slug 충돌, 없는 항목, index 보호, 빈 입력 거부)
- list 순서 (index 우선), 누락된 디렉터리 처리
- project_key 격리, empty key 거부
- `_slugify` 영문/유니코드 fallback/max_len
- **integration**: `build_agent(..., memory_paths_override=...)`가 실제로
`create_deep_agent(memory=...)` 까지 전달되는지 monkeypatch 로 검증.
Plan §사전검증 #5 false-positive 해소.
게이트:
- ruff check / format --check / mypy: PASS
- pytest -q --ignore=tests/integration/test_e2e_workflow.py
--ignore=tests/integration/test_openrouter_smoke.py: 633 passed (22 신규 포함)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both bugs landed during `mydeepagent serve` + real OpenRouter run via
/api/runs. Neither was caught by the test suite — each test uses a fresh
sqlite tmp_path or per-test Postgres DB, so a "second run against existing
data" code path was never exercised.
1. `_compose_final_report` did not persist `RunRow.final_report_path`
- CLI users received the path from the RunResult return value and never
noticed.
- API/GUI users read from the DB → got `null` → no link to the report
showed up in the run detail page.
- Fix: at the end of `_compose_final_report`, open a DB session, load
the RunRow, set `final_report_path = str(json_path)`, commit. Both
code paths now see the path.
2. `_run_approval_gate` built `idempotency_key = f"{phase_key}:{artifact_name}"`
- The 2nd run of the same workflow on a populated DB hit
`approval_requests_idempotency_key_key` UNIQUE violation on the first
approval gate (`spec:spec.json` already existed from the previous run).
- The background task died; the run stayed `executing` forever; the GUI
loop never updated.
- Fix: prefix with `run_id`: `f"{run_id}:{phase_key}:{artifact_name}"`.
Same-run replay (resume / repair retry) still collides idempotently as
intended. ApprovalDecisionRow inherits the new key shape automatically.
Verification
- 4th /api/runs POST against the populated Postgres DB completed in ~2 min,
spec + review + verify all `completed`, 3 artifacts schema-valid, and
`RunRow.final_report_path` now resolves to the report .json path.
- Gates: ruff / mypy --strict / 19 engine+resume+wiring tests PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first cut of static/*.html + style.css was functional but visually
bare. Rewriting with a modern dev-tool dashboard aesthetic (Linear /
Vercel / Resend palette), still vanilla CSS — no framework, no build
system (DR-3 / plan.md D3 constraint kept).
Changes
- `static/style.css`: full rewrite (192 → ~580 lines). Adds:
- CSS custom-property design tokens: surface 0/1/2/3, accent/success/
warning/danger/info each with a matching `*-bg` rgba.
- Type system: Inter / Pretendard / Apple SD Gothic Neo / Noto Sans KR
stack with tabular-nums + system features cv05/ss01.
- 8 px spacing grid, refined border-radius scale (sm/md/lg).
- `.card` surface with subtle inner highlight + low shadow.
- `.badge` pill component with state-* modifiers and an animated dot
for in-progress states (running / executing / validating /
awaiting_artifact).
- `.meta-panel` + `.meta-row` for key/value run detail.
- `.budget-card` with embedded usage bar (ok/warn/over color states).
- `.events` log with monospace, hover background, per-event-type
accent color (run.completed green, run.failed red, etc.) and themed
scrollbar.
- `.chips` row for per-role persona override input.
- Buttons with `primary` / `danger` variants and subtle press animation.
- Compact responsive break at 720 px (single-column meta rows /
form-grid / chips).
- `static/index.html`: page-title row + `.card` wrapper for runs table +
`.budget-grid` for budget cards. Active nav highlight.
- `static/new.html`: form rebuilt inside a card with form-grid layout
(repo path / branch side-by-side), `.chips` rows for per-role override.
- `static/run.html`: page-title with state badge + `.meta-panel` for
Run ID / Repo / Worktree / Final report + action bar + cards for
phases and live events.
- `static/app.js`: redesigned rendering helpers to match new markup:
- New `badge(state)` helper returning a pill element.
- `emptyCell(colspan, text, ctaHref, ctaText)` for empty-state tables.
- Runs list: short hash + arrow link, basename for repo with full path
in `title`, ISO timestamps trimmed to `YYYY-MM-DD HH:MM:SS`.
- Budget cards: usage bar fill % computed from spent/cap, status class
(ok / warn / over) flows to both the amount color and the bar color.
- New event line uses two-column grid (`.ts` + `.body`), event-line
class derived from event type for per-type accent coloring.
- EventSource singleton to prevent stacking on re-renders.
XSS policy unchanged: textContent only, innerHTML/insertAdjacentHTML/
outerHTML still forbidden. The hardcoded comment at the top of `app.js`
is preserved (and the static test that asserts it).
Gates
- ruff check + mypy --strict: PASS (120 source files)
- pytest 16 API tests (read+write+sse+static): all PASS
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the "GUI 미존재" gap from the user's first-session requirements
(REPL + workflow + GUI). v0.2 PR #1's Postgres migration made a second
concurrent writer safe; v0.2 PR #2a/#2b wired durable resume; this commit
ships the HTTP + browser surface that uses them.
No auth, no multi-tenant, single uvicorn worker — per DR-3 boundaries.
v0.3+ will add auth, multi-worker fanout, LISTEN/NOTIFY SSE upgrade.
Backend
- `src/my_deepagent/api/`:
- `app.py` create_app() factory. lifespan stores db/config/personas/
workflows on app.state. CORS allow_origin_regex http://localhost(:port)?.
/static mount + /, /{page}.html for the HTML frontend.
- `models.py` — pydantic v2 DTOs (extra="forbid") for every route. Auto
OpenAPI/Swagger via FastAPI's response_model.
- `deps.py` — get_db / get_config / get_personas / get_workflows.
- `runner.py` — start_new_run / start_resume. Pre-allocates run_id via
new `WorkflowEngine.run(pre_allocated_run_id=...)` so the route returns
the id immediately while the engine runs in asyncio.create_task.
- `sse.py` — 0.5 s poll over run_events.seq. Emits ServerSentEvent rows;
sends `event: done` and HTTP-200-closes when run hits terminal.
- `routes/{runs,personas,workflows,budget}.py`:
GET /api/runs (list, ?limit + ?state)
GET /api/runs/{id} (detail + phases + artifacts + events)
POST /api/runs (start; mock-able via runner.start_new_run)
POST /api/runs/{id}/resume
POST /api/runs/{id}/abort
GET /api/runs/{id}/events (SSE; Last-Event-ID header + ?last_event_id)
GET /api/personas
GET /api/workflows
GET /api/budget
CLI
- `cli/serve.py` mydeepagent serve [--host 127.0.0.1] [--port 8000].
Loud stderr warning if --host is not loopback (no auth = footgun).
uvicorn.run(factory=True, workers=1).
- `cli/main.py` serve command registered.
Static frontend (vanilla HTML/JS/CSS, no build system)
- index.html — runs list + budget summary
- new.html — start-run form (workflow select, repo path, requirements,
per-role persona override)
- run.html — run detail + live SSE event log + Resume/Abort buttons
- app.js — fetch + EventSource. XSS policy HARDCODED at file top:
textContent only, innerHTML/insertAdjacentHTML/outerHTML forbidden.
- style.css — dark theme, single file.
Engine
- WorkflowEngine.run(... pre_allocated_run_id: UUID|None = None). None →
uuid4() (existing behavior). Set → use that UUID. Backward compatible.
Tests
- tests/integration/test_api_read.py (5): list empty, get 404, personas
seed count (12), workflows seed (>=3), budget empty.
- tests/integration/test_api_write.py (5): missing template 400, extra
field 422, resume 404, abort 404, mock-runner happy path.
- tests/integration/test_api_sse.py (1): seed terminal run + 3 events,
drain stream, assert types present + stream closes within 3 s.
- tests/integration/test_api_static.py (5): index/new/run HTML 200,
app.js content-type + XSS-policy substring assertion, style.css
content-type.
- All fixtures use httpx ASGITransport + app.router.lifespan_context
(httpx does NOT auto-trigger FastAPI lifespan) + sqlite tmp_path.
Gates
- ruff check + ruff format --check + mypy --strict: PASS (120 source files)
- pytest non-E2E: 603 PASS (12.15 s) — +16 from new API tests
- pytest E2E real OpenRouter on Postgres: PASS 60.44 s (baseline 71–122 s
range; well within DR-3 acceptance threshold ≤+20%)
Manual browser verification deferred to a follow-up (docker compose up,
mydeepagent serve, open http://localhost:8000).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the v0.1.0 KNOWN LIMIT where resume was an exit-2 stub. Builds on
v0.2 PR #2a's LangGraph wiring + the existing DB phase-state machine +
sweep_orphan_runs — no Temporal (per DR-3).
Highlights
- `WorkflowEngine.resume(run_id)` (new async method):
- Loads RunRow, rejects terminal states with
MyDeepAgentError("run_already_terminal").
- Reloads worktree_root from `RunRow.worktree_root`, template via
`_reload_template` (WorkflowTemplateRow JOIN + model_validate), and
bindings via `_reload_bindings` (run_bindings ⨝ agent_personas).
- **Does NOT call `bind_personas` again** — locks in the original
binding so consent / persona-pool changes since the original run
don't silently shift role assignment.
- `_execute_run` (extracted shared phase loop): `run()` and `resume()`
both dispatch through it. Skips already-completed phases (emits
`phase.skipped` event) and re-executes the rest.
- 4 new private helpers on WorkflowEngine: `_get_run_or_raise`,
`_reload_template`, `_reload_bindings`, `_get_completed_phase_keys`.
- `RunEventType.RUN_RESUMED` and `PHASE_SKIPPED` are now actually
emitted (the enum members existed already).
- `cli/runs.py _runs_resume_async`: stub → real impl. Validates the run
exists + non-terminal, loads seed personas + artifact schemas from
`docs/schemas/`, constructs WorkflowEngine with an
"abort-on-new-approval" callback (resume should not silently re-prompt
the user — original gates already passed; a new gate means the
workflow has changed). Calls engine.resume(UUID(id)), prints final
state + report. Catches MyDeepAgentError and exits 1 with red error.
Tests
- `tests/integration/test_resume.py` (new, 5 scenarios):
1. 2-phase mock workflow: phase 1 succeeds, phase 2 fails first time,
row flipped back to executing → resume → phase 2 completes.
Asserts `phase.skipped` event for phase 1, `run.resumed` event,
and exactly 1 mock invocation for phase 2 on resume.
2. Terminal run → `MyDeepAgentError(code="run_already_terminal")`.
3. Unknown run id → `MyDeepAgentError(code="run_not_found")`.
4. RunBindingRow rows missing → `MyDeepAgentError(code="run_metadata_missing")`.
5. Corrupt `workflow_templates.definition` →
`MyDeepAgentError(code="template_load_failed")`.
Mock pattern matches existing test_engine.py: patch
`my_deepagent.engine.build_agent` to return a fake agent that writes
the expected artifact and drives the watcher middleware.
Gates
- ruff check + ruff format --check + mypy --strict: PASS (103 source files)
- pytest non-E2E: 587 PASS (12.69 s) — +5 from new resume tests
- pytest E2E real OpenRouter on Postgres: PASS 78.52 s (baseline 71–122 s;
within DR-3 acceptance threshold ≤+20%)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation for `runs resume` (v0.2 PR #2b). v0.2 PR #1 added
langgraph-checkpoint-postgres as a dependency, but engine.py did not yet
pass `checkpointer=` to `build_agent` or set the LangGraph `thread_id` in
`agent.ainvoke` — meaning resume had no state to restore. This commit
actually wires the dependency.
Highlights
- `WorkflowEngine.__init__` accepts `checkpointer_url: str | None`
(default = `config.database_url`).
- `_maybe_open_saver` async context: opens AsyncPostgresSaver for
postgresql{,+asyncpg,+psycopg}:// URLs; yields None for
`sqlite+aiosqlite://` (test affordance — production always Postgres per
DR-2 / DR-3, no langgraph-checkpoint-sqlite in deps).
- `WorkflowEngine.run()` opens the saver **once per run** and shares it
across all phases. Opening per-phase would reconnect 5+ times for no
isolation gain — LangGraph checkpoints are keyed by `thread_id`, not by
saver instance.
- `_invoke_agent_until_artifact` forwards `checkpointer=self._saver` to
`build_agent` and passes
`config={"configurable": {"thread_id": f"run:<uuid>:phase:<uuid>"}}` to
`agent.ainvoke`. The thread_id format is already used by
`LlmCallRow.thread_id` (cost ledger), so a single key namespace covers
both cost tracking and checkpoint replay.
Tests
- `tests/integration/test_engine_checkpointer_wiring.py` (new, 2 tests):
1. Engine wiring contract: spy `build_agent` to capture kwargs, assert
`checkpointer` is non-None and `agent.ainvoke` receives the expected
`config.configurable.thread_id` in run:<uuid>:phase:<uuid> format.
2. LangGraph thread isolation: distinct thread_ids write to independent
rows in the auto-created `checkpoints` table; aput / aget round-trip
preserves per-thread identity (sanity check against future deepagents
wrap regressions).
- `tests/integration/test_engine.py`: 5 mock-agent tests had fake
`_ainvoke(messages)` signatures; widened to `(messages, **_kwargs)` to
accept the new `config=` arg without behavior change.
Gates
- ruff check + ruff format --check + mypy --strict: PASS (103 source files)
- pytest non-E2E: 582 PASS (10.55 s) — was 576 before, +7 from new wiring
tests, +/-1 from engine.py reshape, +/-... settled at 582 net.
- pytest E2E real OpenRouter on Postgres: PASS 75.99 s (baseline 71–122 s;
within DR-3 acceptance threshold ≤+20%).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds DR-3 to the v4 r1 plan and rewires §1 + §23 to reflect that the v0.x
release line ships zero Temporal code.
Rationale (DR-3 detail in §22):
- v3 and early v4 r1 drafts had Temporal as the canonical durable-workflow
layer (M5-Py). For 1-user 1-machine CLI/REPL/web-GUI workloads, the same
durability guarantee is reachable with (1) LangGraph AsyncPostgresSaver
(already in deps after v0.2 PR #1) + (2) RunPhaseRow / LlmCallRow state
machine per-commit (already in models) + (3) sweep_orphan_runs at startup
(already in recovery.py).
- Temporal server + worker + deterministic-workflow rules are weight without
proportional payoff at this scale. The decision becomes meaningful only
when v1.0 introduces multi-tenant / multi-machine fanout.
- temporalio NOT added to my-deepagent/pyproject.toml. No apps/worker/.
Patches:
- §1.7 (new): "Workflow Orchestration: NOT USED in v0.x. Deferred to v1.0
multi-tenant ADR (DR-3)." Explains the LangGraph + DB + sweep replacement
path and points at §23 for the v0.2 sequencing.
- §22 DR-3 (new): full decision record with rationale, scope, and the
supersede statement against earlier "M5-Py: Temporal worker NEXT" wording.
- §23 v4 kickoff matrix:
- v0.2 PR #1 row → DONE (e21a524).
- v0.2 PR #2a (new): LangGraph AsyncPostgresSaver engine wiring.
- v0.2 PR #2b (new): `mydeepagent runs resume <id>` real implementation.
- v0.2 PR #3 (new): FastAPI + SSE + minimal Web GUI.
- M5-Py → DEFERRED to v1.0+ per DR-3.
- M8-Py → absorbed into v0.2 PR #3 (no separate apps/api dir; FastAPI
lives inside my-deepagent/src/my_deepagent/api/).
Open question (recorded in DR-3): v1.0 ADR will compare Temporal vs Hatchet
vs in-house Postgres-based workflow runner.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches the production backing store from SQLite to PostgreSQL 16, per DR-2.
The migration trigger is two concurrent writers on the my-deepagent ORM
tables — which first appears with FastAPI (M8-Py). Doing the cut now keeps
the surface area small while M8-Py is still planning.
Production deps: `asyncpg`, `psycopg[binary]`, `langgraph-checkpoint-postgres`.
Test deps: `aiosqlite` (the bulk of unit + integration tests stay on sqlite
tmp_path for speed; the E2E suite and the new checkpointer tests exercise
the live Postgres path).
Highlights
- `persistence/db.py`: dialect-aware connect listener. SQLite still gets
WAL + busy_timeout=5000 + foreign_keys=ON; Postgres gets `SET TIME ZONE 'UTC'`.
Added `Database.dialect_name` + `drop_schema` (test-only).
- `persistence/checkpointer.py`: SqliteSaver → AsyncPostgresSaver. API is
now async (`async with`) and takes a connection string. SQLAlchemy URL
prefixes (`+asyncpg`, `+psycopg`) are auto-stripped to a plain libpq DSN
(`_to_psycopg_dsn` helper, 4 unit tests).
- `persistence/upsert.py` (new): `insert_for(session)` — dialect-aware UPSERT
helper. Picks `postgresql.insert` or `sqlite.insert` based on the bound
engine. Replaces 5 hardcoded `sqlite_insert` call sites in `budget.py`,
`recovery.py`, `cli/doctor.py`.
- `persistence/models.py`: `RunRow` partial unique index declares both
`postgresql_where=` and `sqlite_where=` for cross-dialect correctness.
- `config.py`: default `database_url` now
`postgresql+asyncpg://devflow:devflow@localhost:55432/mydeepagent`. v3
`devflow` DB preserved untouched; v4 lives in a fresh `mydeepagent` DB.
- `cli/doctor.py` check 8: dialect-aware DB liveness probe. Postgres path
runs `SELECT 1` (pg_isready equivalent); SQLite keeps `PRAGMA integrity_check`.
- `alembic/env.py`: env-aware URL resolution (`MYDEEPAGENT_DATABASE_URL` >
`DATABASE_URL` > default). Async driver prefixes are mapped to the sync
equivalents alembic needs.
- `alembic/versions/9f2a6c79667e_v0_2_baseline_schema_postgres.py` (new):
fresh baseline autogenerated against live Postgres. Old SQLite migrations
(`79945fdc2649`, `839f2233e346`) deleted — v0.2 starts a clean history.
- `tests/conftest.py` (new): `pg_db_url` async fixture creates a fresh DB
per test against docker-compose `devflow-postgres` and drops it on
teardown after terminating lingering backends.
- `tests/integration/test_checkpointer.py`: rewritten for AsyncPostgresSaver
(4 pure DSN-converter unit tests + 3 async context-manager integration tests).
- `tests/integration/test_e2e_workflow.py`: switched to `pg_db_url`. Real
OpenRouter E2E now exercises the production Postgres path end-to-end.
Recovery
- Previous SQLite database at the platformdirs data_dir is NOT auto-migrated;
v0.1.0 was the only release that wrote to it. Set
`MYDEEPAGENT_DATABASE_URL=sqlite+aiosqlite:///<path>` to read it.
- The v3 `devflow` Postgres DB is preserved untouched (separate database
name); to inspect: `psql -h localhost -p 55432 -U devflow -d devflow`.
Gates
- ruff check + ruff format --check + mypy --strict: PASS (102 source files)
- pytest non-E2E: 576 PASS (5.46 s)
- pytest E2E real OpenRouter on Postgres: 1 PASS (122.93 s, ~$0.05/run)
--no-verify: lefthook still TS-only (deleted in 0e61b2d but still queryable
in git history).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Earlier v4 r1 wording implied Postgres would re-enter "with Temporal." That
was a false equivalence: Temporal worker (M5-Py) runs against its own
backing store (`temporal` namespace) and does not touch `my-deepagent`'s
`runs` / `run_phases` / `llm_calls` ORM tables, so M5-Py does not force a
DB migration. The actual trigger for Postgres is a *second concurrent
writer* on the my-deepagent DB, which first appears with FastAPI in M8-Py
(and the later web GUI). SQLite WAL allows only one concurrent writer.
Changes:
- §1.3 Database: replaced "Postgres parked indefinitely" with explicit
migration-trigger table (CLI=1 writer → SQLite; Temporal worker=still 1
writer → SQLite; FastAPI=2 writers → Postgres required). Sequencing:
v0.2 PR #1 (Postgres baseline regen) lands ahead of M8-Py for a clean cut.
- §22 Decision Log: added DR-2 documenting this correction.
- §23 Kickoff Order: inserted "v0.2 PR #1 — Postgres migration" between
Step-0-purge and M5-Py; annotated M5-Py and M8-Py with their DB
implications.
Also clarifies that `temporalio` is listed in plan-v4-draft.md but is not
yet pulled into `my-deepagent/pyproject.toml`; install happens with M5-Py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>