chungyeong 5e9656e8a3 feat(my-deepagent): v0.3 PR #6 — sub-agent session linkage (/agents, /spawn)
Separate from deepagents' langchain-internal `task` tool, this implements
my-deepagent's own **persisted** session forking. Each child gets its own
`InteractiveSessionRow`, so it can be resumed independently with
`mydeepagent --session <id>` and browsed in the Web GUI tree. The child
inherits the parent's `project_key` unchanged and so shares its memory and
skills directories. Depth limit: MAX_SUBAGENT_DEPTH = 3.

Core behavior:
- `spawn_subagent_session(db, parent_session_id, persona, initial_title)`,
  as a single transaction:
  (1) check that the parent exists and has `state == "active"`
  (2) `depth = parent.depth + 1`; raise `MyDeepAgentError(human_required)` if the limit is exceeded
  (3) upsert the `AgentPersonaRow` (reused when `compute_hash` is identical)
  (4) inherit the parent's `project_key` and set `parent_session_id` and `depth`
  → returns the new `child_id`.
- `list_subagents(db, parent_session_id)`: direct children only (ordered by
  `started_at`); the caller walks the tree to reach grandchildren.

Data and libraries:
- `subagents.py` (new): the two functions above plus `MAX_SUBAGENT_DEPTH = 3`.

REPL integration (`cli/interactive.py`):
- `_register_subagent_slash`: `/agents` (lists direct children), `/spawn <persona>`
  (creates a child and prints resume instructions).

Tests (`tests/integration/test_subagents.py`, 8 cases):
- Happy path (`project_key` inherited, depth=1)
- Two children under the same parent → both depth=1
- Depth chain: 3 spawns succeed, the 4th is rejected (`subagent_depth_exceeded`)
- Nonexistent parent → `parent_session_missing`
- Parent state="ended" → `parent_session_ended`
- `list_subagents` direct only (grandchildren excluded)
- No children → empty list
- Same persona hash → same persona_id reused

Gates:
- ruff check / format --check / mypy: PASS
- pytest -q --ignore=tests/integration/test_e2e_workflow.py
  --ignore=tests/integration/test_openrouter_smoke.py: 665 passed (including 8 new)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:52:00 +09:00

# Changelog
## [Unreleased]
### Added
- **v0.3 PR #6 — Sub-agent session linkage (`/agents` / `/spawn <persona>`)**.
Separate from Claude Code's sub-agents (task tool), this is my-deepagent's own
**persisted** session forking. Unlike the deepagents `task` tool, which spawns
langchain-internally inside the parent session's thread context, a child
created here has its own `InteractiveSessionRow` and can be resumed separately
with `mydeepagent --session <id>` or browsed in the Web GUI tree. It inherits
the parent's `project_key` unchanged, so the memory and skills directories are
shared. Depth limit: `MAX_SUBAGENT_DEPTH = 3`.
- `subagents.py` (new):
- `spawn_subagent_session(db, parent_session_id, persona, initial_title)`
runs as a single transaction:
(1) check that the parent exists and has `state == "active"`
(2) `depth = parent.depth + 1`; if the limit is exceeded, raise
`MyDeepAgentError(human_required, "subagent_depth_exceeded")`
(3) upsert the `AgentPersonaRow` (reused when `compute_hash` is identical)
(4) inherit the parent's `project_key` unchanged and set `parent_session_id` and `depth`
→ returns the new `child_id`.
- `list_subagents(db, parent_session_id)` returns direct children only
(ordered by `started_at`). Grandchildren are not included (the caller walks the tree).
- `cli/interactive.py`:
- `_register_subagent_slash`: registers `/agents` (lists direct children) and
`/spawn <persona>` (creates a child and prints a resume hint).
- `tests/integration/test_subagents.py` (new, 8 cases):
- Happy path: child row created; `parent_session_id` / `depth=1` / inherited
`project_key` verified
- Two children under the same parent → both depth=1
- Depth chain: 3 spawns succeed → the 4th is rejected (`subagent_depth_exceeded`)
- Nonexistent parent → `parent_session_missing`
- Parent state="ended" → `parent_session_ended`
- `list_subagents`: direct only, no grandchild
- Parent with no children → empty list
- Same persona hash → same `persona_id` reused
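
The parent and depth checks in this entry can be sketched in isolation as follows. This is a minimal in-memory illustration, not the actual `subagents.py`: `SessionStub`, `SpawnError`, `spawn_child`, and the dict-backed store are hypothetical stand-ins for `InteractiveSessionRow`, `MyDeepAgentError`, `spawn_subagent_session`, and the database. It also assumes that a root session has depth 0 and that any non-active parent is reported as `parent_session_ended`.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import uuid4

MAX_SUBAGENT_DEPTH = 3  # value stated in the changelog entry


class SpawnError(Exception):
    """Hypothetical stand-in for MyDeepAgentError; carries only an error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


@dataclass
class SessionStub:
    """Hypothetical stand-in for InteractiveSessionRow (only the fields used here)."""

    id: str
    project_key: str
    state: str = "active"
    parent_session_id: Optional[str] = None
    depth: int = 0  # assumption: root sessions start at depth 0


def spawn_child(store: dict, parent_id: str) -> str:
    """Create a child session under parent_id and return the new child id."""
    parent = store.get(parent_id)
    if parent is None:
        raise SpawnError("parent_session_missing")
    if parent.state != "active":
        # Assumption: every non-active state maps to this code.
        raise SpawnError("parent_session_ended")
    depth = parent.depth + 1
    if depth > MAX_SUBAGENT_DEPTH:
        raise SpawnError("subagent_depth_exceeded")
    child = SessionStub(
        id=uuid4().hex,
        project_key=parent.project_key,  # inherited unchanged
        parent_session_id=parent.id,
        depth=depth,
    )
    store[child.id] = child
    return child.id
```

With a root at depth 0, three chained calls succeed and the fourth raises `subagent_depth_exceeded`, which mirrors the depth-chain test case listed above.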
### Added
- **v0.3 PR #5: Plan mode (`/plan` / `/approve` / `/reject`)**. Equivalent to
Claude Code's plan mode. After `/plan`, the `write_file` / `edit_file` /
`execute` / `bash` / `task` (sub-agent) tools are blocked and only `read_file` /
`glob` / `grep` / `ls` / `write_todos` are allowed. When the LLM calls a blocked
tool it receives a `ToolMessage(status="error")`, which steers it toward
refining the plan only. `/approve` re-enables writes; `/reject` resets the
thread and re-enables writes.
- `middleware/plan_mode.py` (new):
- `PlanModeMiddleware(is_active: Callable[[], bool])`: in `awrap_tool_call` /
`wrap_tool_call`, when plan mode is active and the tool is blocked, it returns
a synthetic `ToolMessage(status="error", content=...)`. It does not raise
(so the LLM can switch to another tool without looping forever).
- `BLOCKED_TOOLS_IN_PLAN_MODE` constant: write_file / edit_file / bash /
execute / run_command / shell / task. Safe tools such as read_file and
write_todos are whitelisted.
- `cli/interactive.py`:
- `InteractiveSession._plan_mode: bool`. `set_plan_mode(enabled)` is async: it
toggles the flag, bumps `thread_suffix`, and persists
`InteractiveSessionRow.plan_mode` (the column was already added in PR #1).
On resume the flag is restored from `row.plan_mode`.
- `build_agent_if_needed` inserts `PlanModeMiddleware(is_active=lambda: ...)`
at the head of the middleware list. The closure reads `self._plan_mode`, so
no agent rebuild is needed after a slash toggle.
- `_register_plan_mode_slash`: registers `/plan`, `/approve`, `/reject`.
- `tests/integration/test_plan_mode.py` (new, 9 cases):
- inactive → every tool passes through
- active → write_file / execute / task blocked (status=error, tool_call_id
preserved, message contains the tool name and "Plan-mode")
- active → read_file / glob / grep / ls / write_todos allowed
- toggling through the closure changes behavior (without a rebuild)
- the synchronous `wrap_tool_call` behaves the same way
- `BLOCKED_TOOLS_IN_PLAN_MODE` constant sanity check
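
The "return an error result instead of raising" behavior and the closure-based toggle can be illustrated without any LangChain dependency. In this sketch `guard_tool_call` is a hypothetical helper, and plain dicts stand in for `ToolMessage`; only the blocked tool names and the `status="error"` / `tool_call_id` / "Plan-mode" details come from the entry above.

```python
from typing import Callable

# Tool names listed in the changelog for BLOCKED_TOOLS_IN_PLAN_MODE.
BLOCKED_TOOLS_IN_PLAN_MODE = frozenset(
    {"write_file", "edit_file", "bash", "execute", "run_command", "shell", "task"}
)


def guard_tool_call(
    tool_name: str,
    tool_call_id: str,
    is_active: Callable[[], bool],
    handler: Callable[[], dict],
) -> dict:
    """Run handler() unless plan mode is active and the tool is blocked.

    A blocked call yields a synthetic error result (a dict standing in for
    ToolMessage) rather than an exception, so the caller can keep going.
    """
    if is_active() and tool_name in BLOCKED_TOOLS_IN_PLAN_MODE:
        return {
            "status": "error",
            "tool_call_id": tool_call_id,
            "content": f"Plan-mode is active: '{tool_name}' is blocked.",
        }
    return handler()
```

Because `is_active` is evaluated on every call, flipping the underlying flag changes the outcome of the next tool call without rebuilding anything, which is the property the closure test case checks.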
### Added
- **v0.3 PR #4: Agent Skills (LLM-routing, no embeddings)**. A
progressive-disclosure pattern that follows the Anthropic Agent Skills
specification as written. The deepagents `SkillsMiddleware` scans the
directory and injects only a `(name, description)` index into the system
prompt; the LLM then picks the skill it needs and reads the body with
`read_file`. No embeddings or vector search, the same as Claude Code's actual
behavior.
- `skills.py` (new):
- `user_skills_dir(config)` → `<config.data_dir>/skills/`
- `ensure_skills_initialized(dir)`: creates the directory, idempotent. Does not
seed example skills (an empty directory is the normal fresh state).
- `list_installed_skills(dir)`: scans `<name>/SKILL.md` and parses the
frontmatter. Malformed entries (no frontmatter / broken YAML / name-dir
mismatch / larger than 10MB) are silently skipped. Returns a list of
`SkillInfo(name, description, path)`.
- `read_skill_body(dir, name)`: returns the body displayed by `/skill <name>`.
- `resolve_skill_sources(config)`: builds the source list passed to deepagents
(currently one user-scope entry; a follow-up PR can add project scope).
- `session.py`:
- `build_agent(..., skills_sources_override: list[str] | None = None)`:
new kwarg. Merged with `persona.skills` and passed as the deepagents `skills=`
kwarg (when empty, the kwarg is omitted → no `SkillsMiddleware` is created).
- `_resolve_skill_sources` helper extracted.
- `cli/interactive.py`:
- `InteractiveSession.__init__` bootstraps `user_skills_dir` and keeps it as
`self.skills_dir`.
- `build_agent_if_needed` passes the current directory state via
`resolve_skill_sources(config)` on every rebuild.
- `_register_skills_slash`: registers the `/skills` (lists installed skills) and
`/skill <name>` (shows the full SKILL.md body) slash commands.
- `tests/integration/test_skills.py` (new, 15 cases):
- Bootstrap idempotency, empty directory as the default state
- list: sorting, missing SKILL.md skipped, broken YAML skipped, name-dir
mismatch skipped, description truncated to 200 characters, missing directory
yields an empty list
- read_skill_body: normal / missing / empty name
- resolve_skill_sources: returns one user-scope entry
- **integration**: monkeypatch check that
`build_agent(..., skills_sources_override=[...])` actually reaches
`create_deep_agent(skills=...)`
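
The scan-and-skip behavior described for `list_installed_skills` can be sketched with the standard library only. This is an illustration under assumptions, not the project's code: `list_skills` and `_parse_frontmatter` are hypothetical names, the parser handles only flat `key: value` lines rather than full YAML, and the 10MB size check is left out.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class SkillInfo:
    name: str
    description: str
    path: Path


def _parse_frontmatter(text: str) -> Optional[dict]:
    """Parse a leading '---' block of flat 'key: value' lines (not real YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if not sep:
            return None  # treated as malformed in this simplified parser
        fields[key.strip()] = value.strip()
    return None  # closing delimiter missing


def list_skills(skills_dir: Path) -> list:
    """Return SkillInfo entries sorted by directory name; skip malformed ones."""
    if not skills_dir.is_dir():
        return []
    found = []
    for child in sorted(skills_dir.iterdir()):
        skill_file = child / "SKILL.md"
        if not skill_file.is_file():
            continue  # SKILL.md missing
        fields = _parse_frontmatter(skill_file.read_text(encoding="utf-8"))
        if not fields or fields.get("name") != child.name:
            continue  # broken frontmatter or name-dir mismatch
        description = fields.get("description", "")[:200]  # truncate to 200 chars
        found.append(SkillInfo(child.name, description, skill_file))
    return found
```

Only the `(name, description)` pairs would reach the system prompt; the body stays on disk until the model chooses to read it.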
- **v0.3 PR #3: auto-memory (project-scoped `MEMORY.md` + entry files)**.
Equivalent to Claude Code's auto-memory plus the `/remember` / `/forget` slash
commands. When a session starts, the
`<config.data_dir>/projects/<project_key>/memory/` directory is bootstrapped
(idempotently) and every `*.md` file in it is passed to the deepagents
`memory=` kwarg. Sessions on the same repo path (= same `project_key`) see the
same memory.
- `memory.py` (new):
- `project_memory_dir(config, project_key)` → `<data_dir>/projects/<key>/memory`
- `ensure_memory_initialized(dir)`: creates `MEMORY.md` (the index), idempotent
- `list_memory_paths(dir)`: all `*.md` files sorted, with the index file first
(deepagents concatenates in order, so the index acts as a ToC in the system prompt)
- `add_memory_entry(dir, content, name=...)`: writes `<slug>.md` and appends a
one-line pointer to the index. On slug collision it adds a `-2`, `-3` suffix.
Rejects empty content.
- `remove_memory_entry(dir, slug_or_filename)`: deletes the file and prunes its
index line. Refuses to delete `MEMORY.md` itself.
- `memory_entries_summary(dir)` → `[(name, char_count), ...]`, index excluded.
- `session.py`:
- `build_agent(..., memory_paths_override: list[str] | None = None)`: new
kwarg. Merged with `persona.memory_files` and passed as the deepagents
`memory=` kwarg (when there are none, the kwarg itself is omitted → no
`MemoryMiddleware` is created).
- `_resolve_memory_paths` helper extracted to keep complexity down (avoids C901).
- `cli/interactive.py`:
- `project_key: str` added to the `InteractiveSession(...)` signature;
`__init__` calls `project_memory_dir(...)` + `ensure_memory_initialized(...)`.
- `build_agent_if_needed` re-reads the current directory state with
`list_memory_paths(memory_dir)` on every rebuild and passes it to deepagents.
Once `/remember` / `/forget` call `clear_agent_cache()`, the new file list
takes effect on the next turn.
- `_register_memory_slash`: registers the `/remember <text>`, `/forget <slug>`,
and `/memory` slash commands. `/memory` lists the currently stored entries.
- `tests/integration/test_memory.py` (new, 22 cases):
- Bootstrap idempotency, automatic index creation
- add_memory_entry: file and index written together, collision handling, empty
input rejected, name override
- remove_memory_entry: slug/filename matching, missing entry, index itself protected
- list_memory_paths: index first, missing directory yields an empty list
- memory_entries_summary: index excluded, missing directory
- project_memory_dir: isolation per project_key, empty key rejected
- _slugify: English, Unicode fallback, max_len truncation
- **integration**: monkeypatch check that
`build_agent(..., memory_paths_override=[...])` actually reaches
`create_deep_agent(memory=...)`
(resolves plan §pre-verification #5, which had been a false positive)
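
The slug-collision rule (`-2`, `-3` suffixes) and the index append can be sketched as below. `add_entry` is a hypothetical simplification of `add_memory_entry`: the slug rule, the `entry` fallback, and the Markdown pointer-line format are assumptions made for the example.

```python
import re
from pathlib import Path

INDEX_NAME = "MEMORY.md"  # index file name from the changelog entry


def add_entry(memory_dir: Path, content: str, name: str) -> Path:
    """Write <slug>.md, resolving collisions with -2, -3, ..., and index it."""
    if not content.strip():
        raise ValueError("empty memory content")
    memory_dir.mkdir(parents=True, exist_ok=True)
    # Assumed slug rule: lowercase ASCII alphanumerics joined by hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-") or "entry"
    path = memory_dir / f"{slug}.md"
    suffix = 2
    while path.exists():
        path = memory_dir / f"{slug}-{suffix}.md"
        suffix += 1
    path.write_text(content, encoding="utf-8")
    # Assumed pointer format: one Markdown link per line in the index.
    with (memory_dir / INDEX_NAME).open("a", encoding="utf-8") as index:
        index.write(f"- [{path.stem}]({path.name})\n")
    return path
```

Saving two entries under the same name produces `style-note.md` and then `style-note-2.md`, each with its own line in the index.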
### Added
- **v0.3 PR #2: context compaction (auto + manual `/compact`)**.
Equivalent to Claude Code's auto-compact plus the `/compact` slash command.
When the session's cumulative tokens exceed 70% of the active model's window,
the oldest non-system, non-archived messages are summarized once by a cheap
model (`openrouter:deepseek/deepseek-chat` by default) → one
`MessageRow(is_summary=True, role=system)` row is inserted and the originals
are archived. The LangGraph thread starts a fresh context via a
`thread_suffix` bump (avoids re-ingestion cost).
- `monitoring/token_budget.py` (new): estimates with `tiktoken cl100k_base`.
`MODEL_CONTEXT_LIMITS` holds per-model windows (DeepSeek 64k, Claude
Sonnet/Haiku/Opus 200k, GPT-4o 128k). Unregistered models default to 32k, a
conservative value that triggers compaction early.
`COMPACTION_THRESHOLD = 0.7` constant. `count_tokens()` is safe for both empty
strings and exceptions (falls back to char/4 on failure).
- `compaction.py` (new): `should_compact()` / `compact_session()` /
`CompactionResult`. `_SESSION_LOCKS: dict[str, asyncio.Lock]` serializes work
per session: on concurrent compaction calls the second waits for the first to
finish. `KEEP_RECENT_K = 10`, `MIN_COMPACTABLE = 4`. The LLM call happens
outside the DB session (avoids holding an asyncpg connection). Archived rows
move to a negative seq band (`-(original.seq + 1)`), so the summary slots
naturally into `to_compact[0].seq` (avoids UNIQUE constraint conflicts).
- `cli/interactive.py`:
- `_approx_token_count` replaced with a tiktoken-based version (previously a
plain `len // 4`).
- After every ainvoke, `should_compact(session_row)` is checked → above the
threshold, `compact_session()` is called automatically → on success,
`clear_agent_cache()` bumps the thread. One-line stdout notice
(`context compacted — N messages archived, summary K tokens`).
- `/compact` slash command registered (`_register_compaction_slash`). Forces a
manual compaction. If there are not enough messages (`< MIN_COMPACTABLE`), it
prints the reason.
- `tests/integration/test_compaction.py` (new, 7 cases):
1. `should_compact` branches: below / above the 70% threshold / unregistered
model (3 cases)
2. Below `MIN_COMPACTABLE`, rejected without an LLM call (verified via stub call count)
3. Happy path: 14 messages → 4 archived (negative seq) + summary at seq=1 +
10 kept live + token counter arithmetic (1000 - 4*20 + summary_tokens) verified
4. Two concurrent calls on the same `session_id` → Lock serialization verified
(LLM call windows do not overlap, or the second short-circuits)
5. Unknown `session_id` → `session_not_found`
- `pyproject.toml`: `tiktoken>=0.7` declared explicitly (previously transitive
via langchain-openai; now marked as a direct dependency).
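
The threshold check and the negative seq band can be worked through with a small sketch. The constants come from the entry above; `exceeds_threshold` and `plan_compaction` are hypothetical helpers (the real `should_compact` takes a session row), and the model key and the exact 64,000 / 32,000 token figures are assumptions for "64k" and "32k".

```python
from typing import Optional

COMPACTION_THRESHOLD = 0.7
KEEP_RECENT_K = 10
MIN_COMPACTABLE = 4
DEFAULT_CONTEXT_LIMIT = 32_000  # assumed exact value for the "32k" default
MODEL_CONTEXT_LIMITS = {
    "example/deepseek": 64_000,  # hypothetical key; real key format not shown
}


def exceeds_threshold(total_tokens: int, model: str) -> bool:
    """True when cumulative tokens exceed 70% of the model's window."""
    limit = MODEL_CONTEXT_LIMITS.get(model, DEFAULT_CONTEXT_LIMIT)
    return total_tokens > limit * COMPACTION_THRESHOLD


def plan_compaction(live_seqs: list) -> Optional[dict]:
    """Pick the oldest messages beyond the recent K and map them to archive seqs.

    Returns None when fewer than MIN_COMPACTABLE messages would be compacted.
    """
    ordered = sorted(live_seqs)
    to_compact = ordered[:-KEEP_RECENT_K]  # everything except the newest K
    if len(to_compact) < MIN_COMPACTABLE:
        return None
    return {
        # The summary reuses the first compacted slot ...
        "summary_seq": to_compact[0],
        # ... which is free because originals move to -(seq + 1).
        "archived": {seq: -(seq + 1) for seq in to_compact},
    }
```

For 14 live messages with seq 1 to 14, seqs 1 to 4 move to -2 through -5 and the summary takes seq 1, matching the happy-path test case above.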
- **v0.3 PR #1: interactive session persistence + LangGraph saver wiring**.
The foundation of v0.3. Introduces the data model, CLI, and HTTP API together
so that both the REPL and the GUI can persist long-running conversations.
Equivalent to Claude Code's `claude --resume`.
- `persistence/models.py`:
- New `MessageRow` table: `(session_id, seq)` UNIQUE, role/content/
tool_calls/token_count/is_summary/archived/ts. The LangGraph checkpoint is the
source of truth and this table is a mirror for fast GUI/CLI lookups (no
rebuild-on-divergence is assumed).
- 8 columns added to `InteractiveSessionRow`: `total_input_tokens`,
`total_output_tokens`, `model`, `project_key`, `title`, `plan_mode`
(for PR #5), `parent_session_id` + `depth` (for PR #6, self FK CASCADE).
- `alembic/versions/684e70f4536a_v0_3_pr_1_session_messages_8_columns.py`
(new): uses `op.batch_alter_table` to work around SQLite's lack of ALTER
constraint support. The autogenerated drop lines for LangGraph's own tables
(`checkpoints` / `checkpoint_writes` / `checkpoint_blobs` /
`checkpoint_migrations`) were removed deliberately
(langgraph-checkpoint-postgres manages them itself). `server_default` is set
so existing rows are filled safely with NULL/0/false.
- `cli/interactive.py`:
- On REPL entry, opens the `get_checkpointer_ctx(config.database_url)` context
and keeps it for the REPL's whole lifetime.
`build_agent(..., checkpointer=saver)` wires the LangGraph saver into
deepagents. The `CostMiddleware` / `AuditToolMiddleware` added in v0.2 PR #10
are preserved.
- `_invoke_and_stream` explicitly inserts `MessageRow`s around ainvoke
(`role=user` → ainvoke → `role=assistant`). It also updates
`last_message_at`, accumulates `total_*_tokens`, and sets `title`
automatically from the first user message (truncated to 50 characters).
- `thread_suffix` introduced on the `InteractiveSession` class. `/model` /
`/agent` / `/clear` bump the suffix → LangGraph thread_id =
`{session_id}:{suffix}` starts a fresh deepagents context (same pattern as
compaction; to be reused in PR #2).
- New `--session <id|prefix>` option handling: loads an existing row (rejected
if `state == "ended"`) or inserts a new row (`AgentPersonaRow` upsert +
`project_key` = `sha256(realpath(repo_path))[:16]`).
- `/clear` slash command updated: sets `MessageRow.archived=True` on every
message in the current session and starts a new thread. The session itself
stays alive (viewable with `sessions show <id> --all`).
- `cli/sessions.py` (new): `mydeepagent sessions list/show/resume/end`.
`show <id> [--all]` also displays archived messages. 6+ char prefix matching,
with an explicit error on ambiguous matches.
- `cli/main.py`: `--session` option + `sessions` subcommand + extended
`interactive_command` signature.
- `api/models.py`: new `SessionSummary` / `MessageInfo` / `SessionDetail` /
`CreateSessionRequest` / `PostMessageRequest` / `SessionAck` DTOs
(all `extra="forbid"`).
- `api/routes/sessions.py` (new):
- `GET /api/sessions?limit=&state=`: list
- `GET /api/sessions/{id}?all=true`: detail + last 200 messages
- `POST /api/sessions`: creates a new session (persona_name / model_override /
repo_path)
- `POST /api/sessions/{id}/messages`: appends a user message (within v0.3 PR #1
scope this is a synchronous persist only; background ainvoke is planned for
the PR #7 Web GUI)
- `GET /api/sessions/{id}/stream`: SSE. 0.5s polling; supports both the
`last-event-id` header and `?last_seq=`. On termination it sends `event: done`
and closes.
- `POST /api/sessions/{id}/end`: marks the session as ended.
- `api/app.py`: sessions router mounted (`/api/sessions`).
- `tests/integration/test_session_persist.py` (new, 5 cases): create +
post + persist / list membership / prefix resolution + 404 / messages rejected
after end / archived messages surfaced with ?all=true.
- Regression: ruff/mypy --strict / pytest 608 PASS / E2E real OpenRouter on
Postgres 82.07s (within the 60-122s baseline range).
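
Two small formulas from this entry, the `project_key` derivation and the `{session_id}:{suffix}` thread id, can be written out as follows. The entry gives only `sha256(realpath(repo_path))[:16]`; hashing the UTF-8 bytes of the path and taking the hex digest are assumptions, and both function names are hypothetical.

```python
import hashlib
import os


def project_key_for(repo_path: str) -> str:
    """First 16 hex chars of sha256 over the resolved repo path (assumed encoding)."""
    resolved = os.path.realpath(repo_path)
    return hashlib.sha256(resolved.encode("utf-8")).hexdigest()[:16]


def thread_id_for(session_id: str, thread_suffix: int) -> str:
    """LangGraph thread id; bumping the suffix starts a fresh context."""
    return f"{session_id}:{thread_suffix}"
```

Because the path is resolved before hashing, a relative path and its absolute form map to the same key, which is what lets separate sessions on one repo share memory and skills.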
### Fixed
- **bugfix(engine): two production bugs surfaced by manual Web-GUI verification
(`mydeepagent serve` + real OpenRouter run via /api/runs)**.
1. `_compose_final_report` wrote the report files to disk and returned the
path but **did not update `RunRow.final_report_path`**. CLI users
received the path via the `RunResult` return value and never noticed;
API/GUI users read from the DB and got `null`. Fix: after writing the
JSON / MD report, update the RunRow column in a fresh session inside
`_compose_final_report` itself so both consumers see the path.
2. `_run_approval_gate` built `idempotency_key = f"{phase_key}:{artifact_name}"`
for `approval_requests`. The column has a UNIQUE constraint, so the
**second run** of the same workflow against a non-empty DB raised
`asyncpg.UniqueViolationError` on the first approval gate — the
background task died, the run stayed `executing` forever, the GUI
never updated. Confirmed by reading the server log after the GUI got
stuck on a 2nd run. Fix: prefix with `run_id`
(`f"{run_id}:{phase_key}:{artifact_name}"`) so each run has its own
key namespace; same-run replay still collides idempotently as intended.
E2E + integration tests did not catch this because each test uses a
fresh sqlite tmp_path or per-test Postgres DB — no second run ever
hit the same table.
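
The effect of the second fix can be reproduced with a set standing in for the UNIQUE column. This is a minimal sketch: `UniqueKeyStore` and the sample phase and artifact names are hypothetical, while the two key formats are the ones quoted above.

```python
from uuid import UUID, uuid4


def old_key(phase_key: str, artifact_name: str) -> str:
    """Key format before the fix: identical across runs of the same workflow."""
    return f"{phase_key}:{artifact_name}"


def new_key(run_id: UUID, phase_key: str, artifact_name: str) -> str:
    """Key format after the fix: namespaced by run_id."""
    return f"{run_id}:{phase_key}:{artifact_name}"


class UniqueKeyStore:
    """Hypothetical in-memory stand-in for the UNIQUE idempotency_key column."""

    def __init__(self) -> None:
        self._keys = set()

    def insert(self, key: str) -> bool:
        """Return True if the key was new, False on a uniqueness collision."""
        if key in self._keys:
            return False
        self._keys.add(key)
        return True
```

With `old_key`, a second run of the same workflow collides on its first approval gate. With `new_key`, two runs never collide, while a replay inside one run still maps to the same key.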
### Added
- **v0.2 PR #3 — FastAPI + SSE + minimal Web GUI (`mydeepagent serve`)**.
Localhost Web UI for run start / list / detail / resume / abort + live
event stream. Closes the v0.1.0 gap "no GUI" from the user's first
session requirements. No auth, no multi-tenant; single uvicorn worker
(per DR-3).
- `pyproject.toml`: runtime deps `fastapi>=0.115`,
`uvicorn[standard]>=0.30`, `sse-starlette>=2.1` (8 transitive deps).
- `src/my_deepagent/api/` (new tree):
- `app.py`: `create_app(config=None) -> FastAPI` factory. lifespan
stores `db`/`config`/`personas`/`workflows` on `app.state`.
`CORSMiddleware(allow_origin_regex=r"^http://localhost(:\d+)?$")`.
Static frontend mounted under `/static`, plus `/`, `/{page}.html`.
- `models.py` — pydantic v2 DTOs (`RunSummary`, `RunDetail`,
`PhaseInfo`, `ArtifactInfo`, `EventInfo`, `StartRunRequest`,
`StartRunResponse`, `PersonaSummary`, `WorkflowSummary`,
`BudgetSummary`, `BudgetScopeEntry`). All `extra="forbid"` so typos
surface at 422 deserialization time.
- `deps.py`: `get_db`, `get_config`, `get_personas`, `get_workflows`,
`seed_root`. Annotated[...] wrappers in each route module.
- `runner.py`: `start_new_run` / `start_resume` /
`is_running`. Pre-allocates a UUID and passes it to
`WorkflowEngine.run(pre_allocated_run_id=...)` so the route can
return the run_id before the phase loop starts. In-memory
`_tasks: dict[UUID, asyncio.Task]` prevents GC of in-flight tasks.
- `sse.py`: `run_events_stream(db, run_id, last_event_id)`.
0.5 s polling against `run_events.seq > last_event_id`; emits
`ServerSentEvent` per row; sends `event: done` and HTTP-200-closes
when run reaches terminal state.
- `routes/runs.py` — GET `/api/runs?limit=&state=`, GET `/api/runs/{id}`,
POST `/api/runs` (start), POST `/api/runs/{id}/resume`,
POST `/api/runs/{id}/abort`, GET `/api/runs/{id}/events` (SSE).
`Last-Event-ID` HTTP header honored alongside `?last_event_id=`.
- `routes/personas.py` — GET `/api/personas`.
- `routes/workflows.py` — GET `/api/workflows`.
- `routes/budget.py` — GET `/api/budget` (day / runs / personas
buckets with cap + warn thresholds from `Config`).
- `src/my_deepagent/cli/serve.py` (new) — `mydeepagent serve [--host
127.0.0.1] [--port 8000]`. Loud stderr warning when host is not
loopback (the API is unauthenticated). Uses uvicorn factory form +
forces `workers=1`.
- `src/my_deepagent/cli/main.py` — `serve` command registered.
- `src/my_deepagent/engine.py` — `WorkflowEngine.run` gained
`pre_allocated_run_id: UUID | None = None` so the FastAPI runner can
return the run_id immediately. Default behavior unchanged.
- `static/` (new) — vanilla HTML/JS/CSS, no build system:
- `index.html`: run list + budget (data-page="index")
- `new.html`: new-run form (workflow select, repo path, requirements,
per-role persona override) (data-page="new")
- `run.html`: run detail + live SSE events + resume/abort buttons
(data-page="run")
- `app.js` — fetch + EventSource. **XSS policy hardcoded at the top
of the file**: `element.textContent` only, `innerHTML` /
`insertAdjacentHTML` / `outerHTML` forbidden.
- `style.css` — dark theme, single file.
- Tests (new):
- `tests/integration/test_api_read.py` — 5 cases (list empty, get 404,
personas seed count, workflows seed, budget empty).
- `tests/integration/test_api_write.py` — 5 cases (missing template
400, extra field 422, resume 404, abort 404, mock-runner happy path).
- `tests/integration/test_api_sse.py` — 1 case: seed terminal run +
events, drain stream, assert types present and stream closes.
- `tests/integration/test_api_static.py` — 5 cases: index/new/run
HTML 200, app.js content-type + XSS-policy substring, style.css
content-type.
All tests use `httpx.ASGITransport` + `app.router.lifespan_context`
(httpx does not auto-trigger FastAPI lifespan) and sqlite tmp_path.
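
The polling loop behind the SSE endpoint can be sketched as an async generator over an in-memory event source. This is an assumption-laden illustration, not `sse.py`: `events_stream`, `fetch_after`, and `get_state` are hypothetical, plain dicts stand in for `ServerSentEvent`, and the terminal states are taken from the `ux_active_run_repo_base` index definition elsewhere in this changelog.

```python
import asyncio
from typing import AsyncIterator, Callable, Iterable, Tuple

TERMINAL_STATES = frozenset({"completed", "failed", "aborted"})


async def events_stream(
    fetch_after: Callable[[int], Iterable[Tuple[int, str]]],
    get_state: Callable[[], str],
    last_event_id: int = 0,
    poll_interval: float = 0.5,
) -> AsyncIterator[dict]:
    """Yield events with seq > cursor, then a final 'done' once the run is terminal."""
    cursor = last_event_id
    while True:
        # Read the state first so events written just before the terminal
        # transition are still drained in this iteration.
        state = get_state()
        for seq, payload in fetch_after(cursor):
            cursor = seq
            yield {"id": str(seq), "data": payload}
        if state in TERMINAL_STATES:
            yield {"event": "done", "data": ""}
            return
        await asyncio.sleep(poll_interval)
```

A client that reconnects with `Last-Event-ID: 1` would pass `last_event_id=1` and receive only events 2 and later, followed by `done` if the run has already finished.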
- **v0.2 PR #2b — `mydeepagent runs resume <id>` real implementation**.
Closes the v0.1.0 KNOWN LIMIT where resume was an exit-2 stub. Reuses
v0.2 PR #2a's LangGraph wiring + sweep_orphan_runs's DB state machine,
no Temporal (DR-3).
- `src/my_deepagent/engine.py`:
- New `WorkflowEngine.resume(run_id)` async method. Loads `RunRow`,
rejects terminal states with `MyDeepAgentError.human_required("run_already_terminal")`,
reloads `worktree_root` + `WorkflowTemplate` (via `_reload_template`) +
bindings (via `_reload_bindings`) from DB. Does **not** call
`bind_personas` again — locks in the original binding so consent /
pool changes don't silently shift roles.
- New `_execute_run` helper (shared phase loop) extracted from `run()`.
Skips already-`completed` phases (emits `phase.skipped` event) and
re-executes the rest. Both `run` (new) and `resume` dispatch through
it.
- New helpers: `_get_run_or_raise`, `_reload_template`,
`_reload_bindings` (rebuilds `{role_id: Binding}` from
`run_bindings` ⨝ `agent_personas`; corrupt persona rows are logged
and skipped, surfacing as `run_metadata_missing` if no bindings remain),
`_get_completed_phase_keys`.
- New `RunEventType.RUN_RESUMED` and `RunEventType.PHASE_SKIPPED` are
now actually emitted (the enum members existed already from v0.1.0).
- `src/my_deepagent/cli/runs.py` `_runs_resume_async`: stub → real impl.
Validates run exists + non-terminal, loads seed personas + artifact
schemas (`docs/schemas/`), constructs `WorkflowEngine` with a
"abort-on-new-approval" callback, calls `engine.resume(UUID(id))`,
prints final state + report path. Catches `MyDeepAgentError` and prints
a red error with exit 1.
- `tests/integration/test_resume.py` (new, 5 scenarios):
1. 2-phase workflow: phase 1 succeeds, phase 2 fails → flip run row
back to executing → resume → phase 2 completes; assert phase 1 was
skipped (`phase.skipped` event present) and `run.resumed` event emitted.
2. Terminal run → `resume()` raises `MyDeepAgentError(code="run_already_terminal")`.
3. Unknown run id → raises `MyDeepAgentError(code="run_not_found")`.
4. RunBindingRow rows missing → raises `MyDeepAgentError(code="run_metadata_missing")`.
5. workflow_templates.definition is malformed → raises `MyDeepAgentError(code="template_load_failed")`.
- E2E real OpenRouter regression PASS 78.52 s (baseline 71-122 s);
within DR-3 acceptance threshold (+20%).
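
The resume rule (reject terminal runs, skip completed phases, re-execute the rest) reduces to a small planning step. In this sketch `plan_resume` is a hypothetical helper, `RuntimeError` stands in for `MyDeepAgentError`, events are plain strings, and the terminal states are assumed from the active-run index definition elsewhere in this changelog.

```python
from typing import List, Set, Tuple

TERMINAL_STATES = frozenset({"completed", "failed", "aborted"})


def plan_resume(
    run_state: str, phase_keys: List[str], completed: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (phases to execute, events to emit) for a resumed run."""
    if run_state in TERMINAL_STATES:
        # Stand-in for MyDeepAgentError(code="run_already_terminal").
        raise RuntimeError("run_already_terminal")
    events = ["run.resumed"]
    to_execute = []
    for key in phase_keys:  # keep the template's phase order
        if key in completed:
            events.append(f"phase.skipped:{key}")
        else:
            to_execute.append(key)
    return to_execute, events
```

For a two-phase run whose first phase completed, only the second phase is executed and a `phase.skipped` event is recorded for the first.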
- **v0.2 PR #2a — LangGraph `AsyncPostgresSaver` engine wiring** (foundation
for `runs resume`). v0.2 PR #1 added the dependency; this commit actually
uses it.
- `src/my_deepagent/engine.py`:
- `WorkflowEngine.__init__` accepts `checkpointer_url: str | None` (defaults
to `config.database_url`).
- New `_maybe_open_saver` async context: opens `get_checkpointer_ctx` for
`postgresql{,+asyncpg,+psycopg}://` URLs, yields `None` for `sqlite+aiosqlite://`
(test affordance — production always Postgres per DR-2 / DR-3).
- `WorkflowEngine.run()` opens the saver **once per run** and shares it
across all phases via `self._saver` — opening per-phase would re-connect
5+ times for no isolation gain (checkpoints are keyed by `thread_id`, not
saver instance).
- `_invoke_agent_until_artifact` forwards `checkpointer=self._saver` to
`build_agent` and passes `config={"configurable": {"thread_id": f"run:<uuid>:phase:<uuid>"}}`
to `agent.ainvoke`. Same `thread_id` format already used by
`LlmCallRow.thread_id` (cost ledger), so one key namespace covers both.
- `tests/integration/test_engine_checkpointer_wiring.py` (new):
1. **Contract 1 — engine wiring**: `build_agent` receives a non-None saver;
`agent.ainvoke` receives `config.configurable.thread_id` in the
expected `run:<uuid>:phase:<uuid>` format.
2. **Contract 2 — LangGraph thread isolation**: two distinct `thread_id`s
write independent rows in the auto-created `checkpoints` table; aput /
aget round-trip preserves per-thread identity (sanity check against
future deepagents wrap regressions).
- `tests/integration/test_engine.py` — 5 mock-agent tests: fake `_ainvoke`
signature widened with `**_kwargs` to accept the new `config=` arg.
- E2E real OpenRouter regression PASS 75.99 s (baseline 71-122 s); within
DR-3 acceptance threshold (+20%).
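
The saver selection by URL scheme and the per-phase `thread_id` can be sketched as two pure functions. Both names are hypothetical; the accepted schemes and the `run:<uuid>:phase:<uuid>` format are taken from the entry above.

```python
from uuid import UUID, uuid4

# Schemes for which the entry says a saver is opened.
POSTGRES_SCHEMES = (
    "postgresql://",
    "postgresql+asyncpg://",
    "postgresql+psycopg://",
)


def wants_saver(database_url: str) -> bool:
    """True for Postgres URLs; sqlite+aiosqlite:// URLs get no saver."""
    return database_url.startswith(POSTGRES_SCHEMES)


def phase_invoke_config(run_id: UUID, phase_id: UUID) -> dict:
    """Config dict carrying the thread_id for one phase of one run."""
    return {"configurable": {"thread_id": f"run:{run_id}:phase:{phase_id}"}}
```

Since checkpoints are keyed by `thread_id`, one saver shared across phases still keeps each phase's state separate as long as the ids differ.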
- **v0.2 PR #1 — Postgres migration**: production backing store switched from
SQLite to PostgreSQL 16 ahead of M8-Py (FastAPI) per DR-2.
- `pyproject.toml`: `asyncpg>=0.30` + `psycopg[binary]>=3.2` +
`langgraph-checkpoint-postgres>=2.0.0` added to runtime; `aiosqlite>=0.20`
moved to `[dependency-groups].dev` (test-only); `langgraph-checkpoint-sqlite`
removed.
- `src/my_deepagent/persistence/db.py`: dialect-aware connect listener —
SQLite still gets `WAL` + `busy_timeout=5000` + `foreign_keys=ON`, Postgres
gets `SET TIME ZONE 'UTC'`. New `Database.dialect_name` property + `drop_schema`
method for tests.
- `src/my_deepagent/persistence/checkpointer.py`: `SqliteSaver` →
`AsyncPostgresSaver`. API is now async (`async with`) and takes a
connection string; SQLAlchemy URL prefixes (`postgresql+asyncpg://`,
`postgresql+psycopg://`) are auto-stripped to a plain libpq DSN. New
`_to_psycopg_dsn` helper covered by 4 unit tests.
- `src/my_deepagent/persistence/upsert.py` (new): `insert_for(session)` —
dialect-aware UPSERT helper. Picks `postgresql.insert` or `sqlite.insert`
based on the bound engine's dialect. Replaces 5 hardcoded `sqlite_insert`
call sites in `budget.py`, `recovery.py`, and `cli/doctor.py`.
- `src/my_deepagent/config.py`: `database_url` default switched from
`sqlite+aiosqlite:///<data_dir>/database.sqlite3` to
`postgresql+asyncpg://devflow:devflow@localhost:55432/mydeepagent`. The v3
`devflow` DB is preserved untouched; v4 lives in a fresh `mydeepagent` DB.
- `src/my_deepagent/persistence/models.py`: `RunRow.__table_args__` partial
unique index now declares **both** `postgresql_where=` and `sqlite_where=`
so the index is partial on both dialects.
- `src/my_deepagent/cli/doctor.py`: check 8 (`disk+db`) is now dialect-aware
— Postgres path runs `SELECT 1` (pg_isready equivalent: proves
reachability + auth + DB exists); SQLite path keeps
`PRAGMA integrity_check`. Doctor docstring updated.
- `alembic/env.py`: env-aware URL resolution — `MYDEEPAGENT_DATABASE_URL` >
`DATABASE_URL` > Postgres default. Async driver prefixes
(`+asyncpg`, `+aiosqlite`) are mapped to the sync equivalents alembic
needs (`+psycopg`, plain `sqlite`).
- `alembic/versions/9f2a6c79667e_v0_2_baseline_schema_postgres.py` (new):
fresh baseline autogenerated against live Postgres. Old SQLite baseline
`79945fdc2649` + partial-index migration `839f2233e346` deleted.
- `tests/conftest.py` (new): `pg_db_url` async fixture. Creates a fresh
Postgres database per test (against docker-compose `devflow-postgres`)
via the maintenance `postgres` DB; drops on teardown after terminating
any lingering backends. Used by the E2E suite and the new checkpointer
tests.
- `tests/integration/test_checkpointer.py`: rewritten for AsyncPostgresSaver
(4 pure DSN-converter tests + 3 async context-manager tests).
- `tests/integration/test_e2e_workflow.py`: switched from `sqlite+aiosqlite`
tmp_path to `pg_db_url`. Real OpenRouter E2E now exercises the production
Postgres path end-to-end (~122 s, ~$0.05/run).
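
The two URL rewrites in this entry (SQLAlchemy URL to plain libpq DSN for the checkpointer, and async driver to sync driver for alembic) can be sketched as below. These are hypothetical re-implementations for illustration, not the project's `_to_psycopg_dsn` or `alembic/env.py`.

```python
def strip_driver_prefix(url: str) -> str:
    """Turn postgresql+asyncpg:// or postgresql+psycopg:// into postgresql://."""
    scheme, sep, rest = url.partition("://")
    if sep and scheme in ("postgresql+asyncpg", "postgresql+psycopg"):
        return f"postgresql://{rest}"
    return url


# Async driver schemes mapped to the sync equivalents named in the entry.
_SYNC_SCHEMES = {
    "postgresql+asyncpg": "postgresql+psycopg",
    "sqlite+aiosqlite": "sqlite",
}


def to_sync_url(url: str) -> str:
    """Map an async SQLAlchemy URL to the sync form alembic needs."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        return url
    return f"{_SYNC_SCHEMES.get(scheme, scheme)}://{rest}"
```

Only the scheme part is rewritten, so credentials or database names that happen to contain `+asyncpg` are left alone.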
### Migration trigger (per DR-2)
- The bound is *two concurrent writers* on `runs` / `run_phases` / `llm_calls`.
Today the CLI is the only writer — but M8-Py (FastAPI) introduces a second
one, and SQLite WAL allows only a single concurrent writer. Doing the move
*before* M8-Py lands gives the test surface time to harden.
- Recovery: previous SQLite database at
`~/Library/Application Support/my-deepagent/database.sqlite3` (macOS) /
`$XDG_DATA_HOME/my-deepagent/database.sqlite3` is **not migrated** —
v0.1.0 was the only release that wrote to it and v0.2 starts a fresh
history. Set `MYDEEPAGENT_DATABASE_URL=sqlite+aiosqlite:///<path>` to
read the legacy file if needed.
### Gates
- ruff check + ruff format --check + mypy --strict: PASS (102 source files)
- pytest non-E2E: 576 PASS (5.46 s) — bulk on sqlite tmp_path, new
checkpointer suite on Postgres `pg_db_url`
- pytest E2E real OpenRouter: 1 PASS (122.93 s) on Postgres backend
## [0.1.0] - 2026-05-16
First tagged milestone of the Python rewrite. The pre-Python-rewrite TypeScript
monorepo has been removed (commit `0e61b2d`); recovery is available via the
`pre-python-rewrite` tag at `c9fed71`.
### Added
- Step 15 — End-to-end real OpenRouter integration: `tests/integration/test_e2e_workflow.py`
runs `spec-and-review@1` workflow (spec → review → verify) end-to-end against real
OpenRouter DeepSeek in ~76s for ~$0.05 per run. `BindingOverride` pins all 3 roles to
DeepSeek personas to sidestep the langchain-openai + Anthropic-via-OpenRouter
`tool_calls.args` JSON-string ValidationError (known v0.1.0 limit). New seed personas:
`openrouter-deepseek-spec-writer@1` (capabilities: spec_write, phase_planning;
max_cost_per_call_usd=0.01) and `openrouter-deepseek-code-reviewer@1` (capabilities:
code_review, evidence_check; max_cost_per_call_usd=0.01). Persona count test updated
to 12. `WorkflowEngine._build_envelope` now inlines the artifact JSON Schema directly
in the prompt so the LLM sees exact required fields. `WorkflowEngine._record_llm_call`
fills every NOT NULL `LlmCallRow` column (thread_id, persona_version, role, turn_index,
cached_tokens, reasoning_tokens, cost_usd_input/output, etc.). `CostMiddleware` now
probes both `usage_metadata` and `response_metadata.token_usage` (prompt_tokens /
completion_tokens fallback) to capture OpenAI-compatible streamed responses forwarded
by OpenRouter.
- Step 12 — Doctor full 8-check + OpenRouter pricing fetch: `mydeepagent doctor`
now runs 8 checks (python / uv / git / workspace_root / config+governance /
openrouter_api_key / openrouter_ping + pricing upsert / disk+sqlite integrity).
`mydeepagent pricing` lists the cached OpenRouter pricing matrix from the
persisted `model_pricing` table. `mydeepagent run` preview now reads from the
persisted `model_pricing` table when populated, falling back to the static seed
otherwise. 26 new tests (23 unit + 3 integration).
- Step 11 — Audit log + secret scrubbing: append-only `{state_dir}/audit.jsonl`
recording every tool call (name/args/duration/error). `AuditToolMiddleware` now
ships with a built-in JSONL recorder (`file_recorder`), attached automatically in
`WorkflowEngine` and Interactive REPL. `structlog` configured project-wide via
`my_deepagent.logging.configure_logging`, with a `_scrub_processor` that redacts
OpenRouter / Anthropic / OpenAI / LangSmith / GitHub / GitLab API keys plus
generic Bearer tokens before they reach stderr or JSON sinks. `audit.py` provides
`append_audit_record` (O_APPEND, 0o600 permissions), `read_audit_records` (with
optional limit, corrupt-line skip), and `make_audit_recorder` async factory.
19 new tests (8 audit unit, 9 logging unit, 3 audit-middleware integration).
- Step 10 — Interactive REPL: `mydeepagent` (no subcommand) launches a prompt_toolkit
REPL with `--agent` / `--model` overrides, slash commands (`/help`, `/quit`, `/exit`,
`/agent`, `/model`, `/clear`, `/stats`, `/budget`, `/runs`), file refs
(`@path/to/file.py` expansion with repo-root containment check), and
`CostMiddleware`-wired agent calls so spending is metered per interactive session.
`slash.py` implements `parse_slash` + `SlashRegistry`. `CostMiddleware` gains
`interactive_session_id` parameter. 21 new tests (10 slash unit, 5 file-ref unit,
3 CLI integration, 3 updated CLI unit).
- Step 9 — Crash recovery + concurrency: `sweep_orphan_runs(db)` in
`my_deepagent.recovery` marks non-terminal runs/phases as failed at app startup so
active-run uniqueness slots (partial unique index `ux_active_run_repo_base`) are freed;
`mydeepagent runs list/show/resume` CLI in `my_deepagent.cli.runs` (list with optional
`--state` filter, show by full UUID or 6+ char prefix, resume stub with exit-2 hint);
SIGTERM/SIGINT graceful shutdown in `WorkflowEngine` (`install_signal_handlers`,
`_on_signal`, `_force_cancel_inflight`; 30s grace then cancel in-flight tasks);
auto-sweep on `mydeepagent run` before any new phase begins. 21 new tests.
- Step 8 — Budget guardrails: `BudgetTracker` (SQLite WAL ledger via `BudgetLedgerRow`,
on_hit policy block/warn_continue/prompt, per-run + per-day + per-persona-daily
scopes) in `my_deepagent.budget`; cost preview before `mydeepagent run` (rich table
with per-phase est.) via `my_deepagent.monitoring.cost_estimator`;
`CostMiddleware` integrated with `BudgetTracker` (pre-call assert + post-call record);
`WorkflowEngine` accepts optional `budget_tracker` and `pricing` kwargs (backward-
compatible); CLI: `mydeepagent budget` (ledger), `mydeepagent stats --by model|persona|day`,
`mydeepagent costs` (alias); `--no-preview` flag on `mydeepagent run`.
28 new tests.
- Step 7 — Workflow engine: `WorkflowEngine` in `my_deepagent.engine` orchestrates
phase loop, artifact watcher (write_file/edit_file detection), jsonschema validation
with one repair retry, approval gate, and final report compose (JSON + Markdown).
`ArtifactWatcherMiddleware` in `my_deepagent.middleware.artifact_watcher` intercepts
write_file/edit_file tool calls targeting the expected artifact path.
`RunEventType` + `run_idempotency_key` in `my_deepagent.run_event` (closed event set,
deterministic idempotency keys per plan v2.0 §13.1).
`cli/run.py` exposes `mydeepagent run <workflow.yaml>`.
`tui/approval.py` prompts the user for approve/reject/request_changes/abort.
FK-safe persistence: WorkflowTemplateRow and AgentPersonaRow upserted before RunRow
to satisfy SQLite FK ordering constraints.
18 new tests: 12 engine unit/integration tests + 6 artifact watcher tests.
- Step 6 — Distribution: `mydeepagent init/login/logout/keys/doctor` CLI commands;
platformdirs-based data dirs; OS keyring (macOS Keychain / Linux Secret Service /
Windows Credential Store) for API keys via `my_deepagent.keys`; first-run
governance consent in `governance.py`; secret resolution priority
(config → env → keyring → error) in `my_deepagent.secrets`; i18n catalog
(ko / en) under `my_deepagent.i18n` controlled by `MYDEEPAGENT_LANG`.
- persistence/models.py (P0-1): partial unique index `ux_active_run_repo_base` on `runs(repo_path, base_branch) WHERE state NOT IN ('completed','failed','aborted')` — prevents duplicate active runs per repo/branch
- persistence/models.py (P0-3): FK constraints added to `RunRow.template_id` (RESTRICT), `RunBindingRow.persona_id` (RESTRICT), `InteractiveSessionRow.persona_id` (RESTRICT), `RunEventRow.phase_id` (CASCADE), `ApprovalRequestRow.phase_id` (CASCADE), `ArtifactRow.phase_id` (CASCADE), `ToolCallRow.run_id/phase_id/interactive_session_id` (CASCADE), `LlmCallRow.run_id/phase_id/interactive_session_id` (CASCADE), `PhaseFeedbackRow.run_id/phase_id` (CASCADE)
- alembic/versions/839f2233e346: new migration adding partial unique index and all FK constraints above; uses SQLite table-rebuild pattern with PRAGMA foreign_keys=OFF/ON guard
- persistence/checkpointer.py (P0-4): removed `get_checkpointer` (a helper that leaked connections); only the `get_checkpointer_ctx` context manager is now exported
- tests/integration/test_checkpointer.py: 5 tests for checkpointer ctx lifecycle (file creation, parent dir, connection cleanup, lock-free concurrent use)
- tests/integration/test_persistence.py: 7 new P0 verification tests (active-run partial index blocks/allows, cascade-delete of phase_feedback+run_phases, RESTRICT on template delete, index exists in sqlite_master)
- tests/unit/test_session.py: full rewrite to deepagents dataclass API — FilesystemPermission attribute access (.mode/.paths/.operations), build_backend type dispatch (5 cases), _map_operations deduplication (8 cases), _spec_to_permission mapping, updated _subagent_to_dict and _resolve_openrouter_api_key tests; 47 unit tests total
- tests/integration/test_openrouter_smoke.py: real OpenRouter/DeepSeek smoke test (3 tests, ~$0.001-$0.003/run, max_tokens=50); skipped automatically when no API key is configured; validates ChatOpenAI response, usage_metadata tokens, and deepagents CompiledStateGraph end-to-end
- pyproject.toml: registered the `integration` pytest marker to silence the `--strict-markers` error
- v0.1.0 scaffolding (Step 0): src/tests/docs trees, ruff/mypy/pre-commit/alembic config
- Seed assets copied to docs/schemas/ (personas/workflows/artifacts validated)
- Core module (Step 1): config, enums, errors, hash + unit tests
- Persona / Workflow / Binding module (Step 2): pydantic schemas, YAML loaders, deterministic auto-select, override, consent store with atomic write
- Step 1 review patches (P0/P1): exception chain context suppression, classmethod LSP fix, workspace_root realpath canonicalization, config_invalid error mapping
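
The interaction between the `ux_active_run_repo_base` partial unique index and the Step 9 orphan sweep can be sketched with the standard-library `sqlite3` module. This is a minimal illustration, not the project's schema or its `sweep_orphan_runs` implementation: the toy `runs` table, the `try_insert` helper, and the `running` state value are assumptions made for the example. Only the index definition is taken from the entry above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy table (assumption): the real RunRow has many more columns.
conn.execute(
    "CREATE TABLE runs ("
    "id INTEGER PRIMARY KEY, "
    "repo_path TEXT NOT NULL, "
    "base_branch TEXT NOT NULL, "
    "state TEXT NOT NULL)"
)

# Partial unique index as described in the changelog entry: uniqueness is
# enforced only for rows whose state is not terminal.
conn.execute(
    "CREATE UNIQUE INDEX ux_active_run_repo_base "
    "ON runs(repo_path, base_branch) "
    "WHERE state NOT IN ('completed', 'failed', 'aborted')"
)


def try_insert(repo_path: str, base_branch: str, state: str) -> bool:
    """Hypothetical helper: True if the row was inserted, False on conflict."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO runs (repo_path, base_branch, state) VALUES (?, ?, ?)",
                (repo_path, base_branch, state),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# 'running' is a hypothetical non-terminal state value.
first = try_insert("/repo", "main", "running")
duplicate = try_insert("/repo", "main", "running")

# Stand-in for an orphan sweep: mark every non-terminal run as failed,
# which removes it from the partial index and frees the slot.
with conn:
    conn.execute(
        "UPDATE runs SET state = 'failed' "
        "WHERE state NOT IN ('completed', 'failed', 'aborted')"
    )

after_sweep = try_insert("/repo", "main", "running")
```

In this sketch the second insert is rejected while the first run is still active, and the same insert succeeds once the sweep has moved the stale row to a terminal state.
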
### Changed
- Workaround for the deepagents 0.6.1 conflict between `LocalShellBackend` and `permissions`: removed the `permissions` block from all 10 seed personas; `SafetyShellMiddleware` now enforces the destructive-command and secret-path policy at the tool layer for agents on the `local_shell` backend.
- `build_agent` automatically prepends `SafetyShellMiddleware` to every agent and skips `permissions` kwarg when `deepagents_backend == "local_shell"`.
- `SafetyShellMiddleware` extended with secret-path enforcement: `read_file`/`write_file`/`edit_file`/`ls` tool calls are blocked when `file_path`/`path` matches any `DENY_PATH_PATTERNS` glob (wcmatch GLOBSTAR|IGNORECASE|DOTGLOB).
- All env vars require `MYDEEPAGENT_` prefix (e.g. `MYDEEPAGENT_OPENROUTER_API_KEY`, `MYDEEPAGENT_BUDGET_DAILY_USD`). `.env.example` updated accordingly. This isolates my-deepagent's env namespace from other tools.
- Persona / Workflow / FilesystemPermission models now store list-valued fields as tuples (deep immutability — prevents post-construction mutation that would invalidate compute_hash()).
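
Why tuples matter for `compute_hash()` can be shown with a small stand-in. The real models are pydantic schemas; the sketch below uses a frozen standard-library dataclass instead, and `PersonaSketch`, its fields, and the SHA-256-over-JSON hashing scheme are all assumptions made for illustration, not the project's actual definitions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PersonaSketch:
    """Hypothetical stand-in for a persona model with a list-valued field."""

    name: str
    tools: tuple[str, ...]  # tuple, not list: no in-place mutation

    def compute_hash(self) -> str:
        # Assumed scheme: SHA-256 over a key-sorted JSON dump of the fields.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


persona = PersonaSketch(name="reviewer", tools=("read_file", "ls"))
hash_before = persona.compute_hash()

# Tuples expose no append/extend, so the field cannot be mutated in place.
try:
    persona.tools.append("write_file")  # type: ignore[attr-defined]
    in_place_blocked = False
except AttributeError:
    in_place_blocked = True

# frozen=True also blocks rebinding the attribute after construction.
try:
    persona.tools = ("write_file",)  # type: ignore[misc]
    rebind_blocked = False
except AttributeError:  # FrozenInstanceError subclasses AttributeError
    rebind_blocked = True

hash_after = persona.compute_hash()
```

With a `list` field, the `append` call would succeed and silently change the value that a previously stored hash was computed from.
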
### Known limitations (v0.1.0)
- `usage_metadata` is sometimes empty for responses forwarded by OpenRouter (deepagents
wraps the underlying ChatOpenAI response, so token counts may not surface). The
`CostMiddleware` recorder still fires and an `LlmCallRow` is persisted, but
`input_tokens` / `output_tokens` may read as 0; the E2E test treats this as a known
limitation. v0.2 will probe more response shapes (raw chunks / callbacks).
- Anthropic models via OpenRouter currently fail inside `langchain-openai` with a
ValidationError on `tool_calls.args` (JSON string vs. dict mismatch). Workaround: pin
DeepSeek personas via `BindingOverride`. Tracked for v0.2.
- `mydeepagent runs resume <run_id>` is a stub (exit-2 hint only); replaying a workflow
from a partially completed run is not yet implemented.
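
The first limitation above (empty `usage_metadata`) implies that token extraction has to tolerate a missing or empty mapping. A minimal sketch of that defensive read follows. `extract_token_counts` is a hypothetical helper, not the `CostMiddleware` code; it assumes only that `usage_metadata`, when present, is a mapping with `input_tokens` and `output_tokens` keys.

```python
from typing import Any, Mapping, Optional


def extract_token_counts(
    usage_metadata: Optional[Mapping[str, Any]],
) -> tuple[int, int]:
    """Hypothetical helper: return (input_tokens, output_tokens), 0 if absent."""
    if not usage_metadata:
        # None or an empty mapping: nothing surfaced from the provider.
        return (0, 0)
    input_tokens = int(usage_metadata.get("input_tokens") or 0)
    output_tokens = int(usage_metadata.get("output_tokens") or 0)
    return (input_tokens, output_tokens)


populated = extract_token_counts(
    {"input_tokens": 12, "output_tokens": 34, "total_tokens": 46}
)
missing = extract_token_counts(None)
partial = extract_token_counts({"input_tokens": 7})
```

Falling back to 0 keeps the call record writable, at the cost of under-reporting usage for those calls until more response shapes are probed.
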