The first cut of static/*.html + style.css was functional but visually
bare. Rewriting with a modern dev-tool dashboard aesthetic (Linear /
Vercel / Resend palette), still vanilla CSS — no framework, no build
system (DR-3 / plan.md D3 constraint kept).
Changes
- `static/style.css`: full rewrite (192 → ~580 lines). Adds:
- CSS custom-property design tokens: surface 0/1/2/3, accent/success/
warning/danger/info each with a matching `*-bg` rgba.
- Type system: Inter / Pretendard / Apple SD Gothic Neo / Noto Sans KR
stack with tabular-nums + system features cv05/ss01.
- 8 px spacing grid, refined border-radius scale (sm/md/lg).
- `.card` surface with subtle inner highlight + low shadow.
- `.badge` pill component with state-* modifiers and an animated dot
for in-progress states (running / executing / validating /
awaiting_artifact).
- `.meta-panel` + `.meta-row` for key/value run detail.
- `.budget-card` with embedded usage bar (ok/warn/over color states).
- `.events` log with monospace, hover background, per-event-type
accent color (run.completed green, run.failed red, etc.) and themed
scrollbar.
- `.chips` row for per-role persona override input.
- Buttons with `primary` / `danger` variants and subtle press animation.
- Compact responsive break at 720 px (single-column meta rows /
form-grid / chips).
- `static/index.html`: page-title row + `.card` wrapper for runs table +
`.budget-grid` for budget cards. Active nav highlight.
- `static/new.html`: form rebuilt inside a card with form-grid layout
(repo path / branch side-by-side), `.chips` rows for per-role override.
- `static/run.html`: page-title with state badge + `.meta-panel` for
Run ID / Repo / Worktree / Final report + action bar + cards for
phases and live events.
- `static/app.js`: redesigned rendering helpers to match new markup:
- New `badge(state)` helper returning a pill element.
- `emptyCell(colspan, text, ctaHref, ctaText)` for empty-state tables.
- Runs list: short hash + arrow link, basename for repo with full path
in `title`, ISO timestamps trimmed to `YYYY-MM-DD HH:MM:SS`.
- Budget cards: usage bar fill % computed from spent/cap, status class
(ok / warn / over) flows to both the amount color and the bar color.
- New event line uses two-column grid (`.ts` + `.body`), event-line
class derived from event type for per-type accent coloring.
- EventSource singleton to prevent stacking on re-renders.
XSS policy unchanged: textContent only, innerHTML/insertAdjacentHTML/
outerHTML still forbidden. The hardcoded comment at the top of `app.js`
is preserved, as is the static test that asserts it.
Gates
- ruff check + mypy --strict: PASS (120 source files)
- pytest 16 API tests (read+write+sse+static): all PASS
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the "GUI missing" gap from the user's first-session requirements
(REPL + workflow + GUI). v0.2 PR #1's Postgres migration made a second
concurrent writer safe; v0.2 PR #2a/#2b wired durable resume; this commit
ships the HTTP + browser surface that uses them.
No auth, no multi-tenant, single uvicorn worker — per DR-3 boundaries.
v0.3+ will add auth, multi-worker fanout, LISTEN/NOTIFY SSE upgrade.
Backend
- `src/my_deepagent/api/`:
- `app.py` create_app() factory. lifespan stores db/config/personas/
workflows on app.state. CORS allow_origin_regex http://localhost(:port)?.
/static mount + /, /{page}.html for the HTML frontend.
- `models.py` — pydantic v2 DTOs (extra="forbid") for every route. Auto
OpenAPI/Swagger via FastAPI's response_model.
- `deps.py` — get_db / get_config / get_personas / get_workflows.
- `runner.py` — start_new_run / start_resume. Pre-allocates run_id via
new `WorkflowEngine.run(pre_allocated_run_id=...)` so the route returns
the id immediately while the engine runs in asyncio.create_task.
- `sse.py` — 0.5 s poll over run_events.seq. Emits ServerSentEvent rows;
sends `event: done` and closes the stream (HTTP 200) once the run reaches a terminal state.
- `routes/{runs,personas,workflows,budget}.py`:
GET /api/runs (list, ?limit + ?state)
GET /api/runs/{id} (detail + phases + artifacts + events)
POST /api/runs (start; mock-able via runner.start_new_run)
POST /api/runs/{id}/resume
POST /api/runs/{id}/abort
GET /api/runs/{id}/events (SSE; Last-Event-ID header + ?last_event_id)
GET /api/personas
GET /api/workflows
GET /api/budget
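A minimal sketch of the poll-and-emit loop described for `sse.py`, using only the standard library. The 0.5 s interval, the seq cursor and the closing `event: done` frame come from this commit; `fetch_events_after`, `is_terminal`, `format_sse` and the `(seq, type, payload)` tuple shape are hypothetical stand-ins for the real DB queries and the ServerSentEvent type.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable
from typing import Optional

POLL_INTERVAL_S = 0.5  # poll interval stated in the commit message

# Hypothetical event shape: (seq, event_type, payload)
Event = tuple[int, str, str]


def format_sse(event: str, data: str, event_id: Optional[int] = None) -> str:
    """Render one Server-Sent Events frame, terminated by a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # A multi-line payload becomes one "data:" field per line.
    lines.extend(f"data: {line}" for line in data.split("\n"))
    return "\n".join(lines) + "\n\n"


async def stream_run_events(
    fetch_events_after: Callable[[int], Awaitable[list[Event]]],
    is_terminal: Callable[[], Awaitable[bool]],
    last_seq: int = 0,
    poll_interval: float = POLL_INTERVAL_S,
) -> AsyncIterator[str]:
    """Yield SSE frames for events with seq > last_seq until the run is terminal."""
    while True:
        # Read the terminal flag first so events written just before the
        # transition are still drained in this iteration.
        terminal = await is_terminal()
        for seq, event_type, payload in await fetch_events_after(last_seq):
            last_seq = seq
            yield format_sse(event_type, payload, event_id=seq)
        if terminal:
            yield format_sse("done", "{}")
            return
        await asyncio.sleep(poll_interval)
```

Passing the client's Last-Event-ID value as `last_seq` would let a reconnecting browser pick up where it left off, which is presumably why the route accepts both the header and the query parameter.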
CLI
- `cli/serve.py` mydeepagent serve [--host 127.0.0.1] [--port 8000].
Loud stderr warning if --host is not loopback (no auth = footgun).
uvicorn.run(factory=True, workers=1).
- `cli/main.py` serve command registered.
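The loopback check behind the stderr warning could look roughly like this. It is a sketch only: `is_loopback_host` and `warn_if_exposed` are hypothetical names, and the actual check and message wording in `cli/serve.py` are not shown in this commit.

```python
import ipaddress
import sys


def is_loopback_host(host: str) -> bool:
    """Return True for 'localhost' or a loopback IP literal (127.0.0.0/8, ::1)."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat as non-loopback.
        return False


def warn_if_exposed(host: str) -> bool:
    """Warn on stderr when binding beyond loopback; return whether a warning was printed."""
    if is_loopback_host(host):
        return False
    print(
        f"WARNING: binding to {host!r} exposes an unauthenticated API beyond this machine.",
        file=sys.stderr,
    )
    return True
```

Treating unresolved hostnames as non-loopback errs on the side of warning too often rather than too rarely, which seems the safer default while there is no auth.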
Static frontend (vanilla HTML/JS/CSS, no build system)
- index.html — runs list + budget summary
- new.html — start-run form (workflow select, repo path, requirements,
per-role persona override)
- run.html — run detail + live SSE event log + Resume/Abort buttons
- app.js — fetch + EventSource. XSS policy HARDCODED at file top:
textContent only, innerHTML/insertAdjacentHTML/outerHTML forbidden.
- style.css — dark theme, single file.
Engine
- WorkflowEngine.run(... pre_allocated_run_id: UUID|None = None). None →
uuid4() (existing behavior). Set → use that UUID. Backward compatible.
Tests
- tests/integration/test_api_read.py (5): list empty, get 404, personas
seed count (12), workflows seed (>=3), budget empty.
- tests/integration/test_api_write.py (5): missing template 400, extra
field 422, resume 404, abort 404, mock-runner happy path.
- tests/integration/test_api_sse.py (1): seed terminal run + 3 events,
drain stream, assert types present + stream closes within 3 s.
- tests/integration/test_api_static.py (5): index/new/run HTML 200,
app.js content-type + XSS-policy substring assertion, style.css
content-type.
- All fixtures use httpx ASGITransport + app.router.lifespan_context
(httpx does NOT auto-trigger FastAPI lifespan) + sqlite tmp_path.
Gates
- ruff check + ruff format --check + mypy --strict: PASS (120 source files)
- pytest non-E2E: 603 PASS (12.15 s) — +16 from new API tests
- pytest E2E real OpenRouter on Postgres: PASS 60.44 s (baseline 71–122 s
range; well within DR-3 acceptance threshold ≤+20%)
Manual browser verification deferred to a follow-up (docker compose up,
mydeepagent serve, open http://localhost:8000).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the v0.1.0 KNOWN LIMIT where resume was an exit-2 stub. Builds on
v0.2 PR #2a's LangGraph wiring + the existing DB phase-state machine +
sweep_orphan_runs — no Temporal (per DR-3).
Highlights
- `WorkflowEngine.resume(run_id)` (new async method):
- Loads RunRow, rejects terminal states with
MyDeepAgentError("run_already_terminal").
- Reloads worktree_root from `RunRow.worktree_root`, template via
`_reload_template` (WorkflowTemplateRow JOIN + model_validate), and
bindings via `_reload_bindings` (run_bindings ⨝ agent_personas).
- **Does NOT call `bind_personas` again** — locks in the original
binding so consent / persona-pool changes since the original run
don't silently shift role assignment.
- `_execute_run` (extracted shared phase loop): `run()` and `resume()`
both dispatch through it. Skips already-completed phases (emits
`phase.skipped` event) and re-executes the rest.
- 4 new private helpers on WorkflowEngine: `_get_run_or_raise`,
`_reload_template`, `_reload_bindings`, `_get_completed_phase_keys`.
- `RunEventType.RUN_RESUMED` and `PHASE_SKIPPED` are now actually
emitted (the enum members existed already).
- `cli/runs.py _runs_resume_async`: stub → real impl. Validates the run
exists + non-terminal, loads seed personas + artifact schemas from
`docs/schemas/`, constructs WorkflowEngine with an
"abort-on-new-approval" callback (resume should not silently re-prompt
the user — original gates already passed; a new gate means the
workflow has changed). Calls engine.resume(UUID(id)), prints final
state + report. Catches MyDeepAgentError and exits 1 with red error.
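The skip-and-continue behaviour of the shared phase loop can be illustrated with a simplified, synchronous sketch. The event names `run.resumed` and `phase.skipped` and the `run_already_terminal` code are from this commit; `ResumeError`, `ResumeResult`, the terminal state names and the flat `resume` function are assumptions made for illustration, not the engine's real types.

```python
from dataclasses import dataclass, field

# Assumed terminal state names; the real set lives in the engine's state machine.
TERMINAL_STATES = {"completed", "failed", "aborted"}


class ResumeError(Exception):
    """Hypothetical stand-in for MyDeepAgentError carrying an error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


@dataclass
class ResumeResult:
    events: list = field(default_factory=list)    # emitted event names, in order
    executed: list = field(default_factory=list)  # phase keys that were re-run


def resume(run_state: str, phase_keys: list, completed_keys: set) -> ResumeResult:
    """Reject terminal runs, skip completed phases, and re-execute the rest."""
    if run_state in TERMINAL_STATES:
        raise ResumeError("run_already_terminal")
    result = ResumeResult(events=["run.resumed"])
    for key in phase_keys:
        if key in completed_keys:
            # Completed phases are not re-run; only a marker event is emitted.
            result.events.append(f"phase.skipped:{key}")
            continue
        result.executed.append(key)  # placeholder for the real phase execution
    return result
```

For a two-phase run whose first phase already completed, `resume("executing", ["plan", "build"], {"plan"})` records a skip for `plan` and executes only `build`.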
Tests
- `tests/integration/test_resume.py` (new, 5 scenarios):
1. 2-phase mock workflow: phase 1 succeeds, phase 2 fails first time,
row flipped back to executing → resume → phase 2 completes.
Asserts `phase.skipped` event for phase 1, `run.resumed` event,
and exactly 1 mock invocation for phase 2 on resume.
2. Terminal run → `MyDeepAgentError(code="run_already_terminal")`.
3. Unknown run id → `MyDeepAgentError(code="run_not_found")`.
4. RunBindingRow rows missing → `MyDeepAgentError(code="run_metadata_missing")`.
5. Corrupt `workflow_templates.definition` →
`MyDeepAgentError(code="template_load_failed")`.
Mock pattern matches existing test_engine.py: patch
`my_deepagent.engine.build_agent` to return a fake agent that writes
the expected artifact and drives the watcher middleware.
Gates
- ruff check + ruff format --check + mypy --strict: PASS (103 source files)
- pytest non-E2E: 587 PASS (12.69 s) — +5 from new resume tests
- pytest E2E real OpenRouter on Postgres: PASS 78.52 s (baseline 71–122 s;
within DR-3 acceptance threshold ≤+20%)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation for `runs resume` (v0.2 PR #2b). v0.2 PR #1 added
langgraph-checkpoint-postgres as a dependency, but engine.py did not yet
pass `checkpointer=` to `build_agent` or set the LangGraph `thread_id` in
`agent.ainvoke` — meaning resume had no state to restore. This commit
actually wires the dependency.
Highlights
- `WorkflowEngine.__init__` accepts `checkpointer_url: str | None`
(default = `config.database_url`).
- `_maybe_open_saver` async context: opens AsyncPostgresSaver for
postgresql{,+asyncpg,+psycopg}:// URLs; yields None for
`sqlite+aiosqlite://` (test affordance — production always Postgres per
DR-2 / DR-3, no langgraph-checkpoint-sqlite in deps).
- `WorkflowEngine.run()` opens the saver **once per run** and shares it
across all phases. Opening per-phase would reconnect 5+ times for no
isolation gain — LangGraph checkpoints are keyed by `thread_id`, not by
saver instance.
- `_invoke_agent_until_artifact` forwards `checkpointer=self._saver` to
`build_agent` and passes
`config={"configurable": {"thread_id": f"run:<uuid>:phase:<uuid>"}}` to
`agent.ainvoke`. The thread_id format is already used by
`LlmCallRow.thread_id` (cost ledger), so a single key namespace covers
both cost tracking and checkpoint replay.
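The URL dispatch and the thread_id format described above can be sketched as follows. This is a simplified model under stated assumptions: `wants_postgres_saver`, `make_thread_id` and this `maybe_open_saver` are hypothetical helpers, and a plain `object()` stands in for the real AsyncPostgresSaver, which this sketch does not open.

```python
import contextlib
from typing import AsyncIterator, Optional
from urllib.parse import urlsplit
from uuid import UUID

# The three URL schemes the commit lists as Postgres-backed.
POSTGRES_SCHEMES = {"postgresql", "postgresql+asyncpg", "postgresql+psycopg"}


def wants_postgres_saver(url: str) -> bool:
    """True when the database URL should get a Postgres checkpointer."""
    return urlsplit(url).scheme in POSTGRES_SCHEMES


def make_thread_id(run_id: UUID, phase_id: UUID) -> str:
    """Build the run:<uuid>:phase:<uuid> key shared with the cost ledger."""
    return f"run:{run_id}:phase:{phase_id}"


@contextlib.asynccontextmanager
async def maybe_open_saver(url: str) -> AsyncIterator[Optional[object]]:
    """Yield a saver for Postgres URLs and None otherwise (the sqlite test path)."""
    if not wants_postgres_saver(url):
        yield None
        return
    saver = object()  # placeholder: the real code opens AsyncPostgresSaver here
    yield saver
```

Entering this context once per run and passing the yielded saver to every phase matches the "once per run" rule in the commit; each phase would then differ only in the thread_id it passes in `config`.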
Tests
- `tests/integration/test_engine_checkpointer_wiring.py` (new, 2 tests):
1. Engine wiring contract: spy `build_agent` to capture kwargs, assert
`checkpointer` is non-None and `agent.ainvoke` receives the expected
`config.configurable.thread_id` in run:<uuid>:phase:<uuid> format.
2. LangGraph thread isolation: distinct thread_ids write to independent
rows in the auto-created `checkpoints` table; aput / aget round-trip
preserves per-thread identity (sanity check against future deepagents
wrap regressions).
- `tests/integration/test_engine.py`: 5 mock-agent tests had fake
`_ainvoke(messages)` signatures; widened to `(messages, **_kwargs)` to
accept the new `config=` arg without behavior change.
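A tiny illustration of why the fake signatures had to be widened. The function names here are hypothetical; only the `(messages)` to `(messages, **_kwargs)` change and the new `config=` argument are taken from the commit.

```python
import asyncio


async def narrow_ainvoke(messages):
    """Old fake: rejects any keyword argument."""
    return {"messages": messages}


async def wide_ainvoke(messages, **_kwargs):
    """Widened fake: accepts and ignores extras such as config=."""
    return {"messages": messages}


async def call_like_engine(ainvoke):
    """Call the fake the way the engine now does, with a config= keyword."""
    config = {"configurable": {"thread_id": "run:<uuid>:phase:<uuid>"}}
    return await ainvoke(["hi"], config=config)
```

Calling `call_like_engine(narrow_ainvoke)` raises `TypeError` for the unexpected `config` keyword, while the widened fake returns the same result as before.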
Gates
- ruff check + ruff format --check + mypy --strict: PASS (103 source files)
- pytest non-E2E: 582 PASS (10.55 s), up from 576: net +6 after the new wiring
tests and the engine.py reshape.
- pytest E2E real OpenRouter on Postgres: PASS 75.99 s (baseline 71–122 s;
within DR-3 acceptance threshold ≤+20%).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds DR-3 to the v4 r1 plan and rewires §1 + §23 to reflect that the v0.x
release line ships zero Temporal code.
Rationale (DR-3 detail in §22):
- v3 and early v4 r1 drafts had Temporal as the canonical durable-workflow
layer (M5-Py). For 1-user 1-machine CLI/REPL/web-GUI workloads, the same
durability guarantee is reachable with (1) LangGraph AsyncPostgresSaver
(already in deps after v0.2 PR #1) + (2) RunPhaseRow / LlmCallRow state
machine per-commit (already in models) + (3) sweep_orphan_runs at startup
(already in recovery.py).
- Temporal server + worker + deterministic-workflow rules are weight without
proportional payoff at this scale. The decision becomes meaningful only
when v1.0 introduces multi-tenant / multi-machine fanout.
- temporalio NOT added to my-deepagent/pyproject.toml. No apps/worker/.
Patches:
- §1.7 (new): "Workflow Orchestration: NOT USED in v0.x. Deferred to v1.0
multi-tenant ADR (DR-3)." Explains the LangGraph + DB + sweep replacement
path and points at §23 for the v0.2 sequencing.
- §22 DR-3 (new): full decision record with rationale, scope, and the
supersede statement against earlier "M5-Py: Temporal worker NEXT" wording.
- §23 v4 kickoff matrix:
- v0.2 PR #1 row → DONE (e21a524).
- v0.2 PR #2a (new): LangGraph AsyncPostgresSaver engine wiring.
- v0.2 PR #2b (new): `mydeepagent runs resume <id>` real implementation.
- v0.2 PR #3 (new): FastAPI + SSE + minimal Web GUI.
- M5-Py → DEFERRED to v1.0+ per DR-3.
- M8-Py → absorbed into v0.2 PR #3 (no separate apps/api dir; FastAPI
lives inside my-deepagent/src/my_deepagent/api/).
Open question (recorded in DR-3): v1.0 ADR will compare Temporal vs Hatchet
vs in-house Postgres-based workflow runner.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches the production backing store from SQLite to PostgreSQL 16, per DR-2.
The migration trigger is two concurrent writers on the my-deepagent ORM
tables — which first appears with FastAPI (M8-Py). Doing the cut now keeps
the surface area small while M8-Py is still in planning.
Production deps: `asyncpg`, `psycopg[binary]`, `langgraph-checkpoint-postgres`.
Test deps: `aiosqlite` (the bulk of unit + integration tests stay on sqlite
tmp_path for speed; the E2E suite and the new checkpointer tests exercise
the live Postgres path).
Highlights
- `persistence/db.py`: dialect-aware connect listener. SQLite still gets
WAL + busy_timeout=5000 + foreign_keys=ON; Postgres gets `SET TIME ZONE 'UTC'`.
Added `Database.dialect_name` + `drop_schema` (test-only).
- `persistence/checkpointer.py`: SqliteSaver → AsyncPostgresSaver. API is
now async (`async with`) and takes a connection string. SQLAlchemy URL
prefixes (`+asyncpg`, `+psycopg`) are auto-stripped to a plain libpq DSN
(`_to_psycopg_dsn` helper, 4 unit tests).
- `persistence/upsert.py` (new): `insert_for(session)` — dialect-aware UPSERT
helper. Picks `postgresql.insert` or `sqlite.insert` based on the bound
engine. Replaces 5 hardcoded `sqlite_insert` call sites in `budget.py`,
`recovery.py`, `cli/doctor.py`.
- `persistence/models.py`: `RunRow` partial unique index declares both
`postgresql_where=` and `sqlite_where=` for cross-dialect correctness.
- `config.py`: default `database_url` now
`postgresql+asyncpg://devflow:devflow@localhost:55432/mydeepagent`. v3
`devflow` DB preserved untouched; v4 lives in a fresh `mydeepagent` DB.
- `cli/doctor.py` check 8: dialect-aware DB liveness probe. Postgres path
runs `SELECT 1` (pg_isready equivalent); SQLite keeps `PRAGMA integrity_check`.
- `alembic/env.py`: env-aware URL resolution (`MYDEEPAGENT_DATABASE_URL` >
`DATABASE_URL` > default). Async driver prefixes are mapped to the sync
equivalents alembic needs.
- `alembic/versions/9f2a6c79667e_v0_2_baseline_schema_postgres.py` (new):
fresh baseline autogenerated against live Postgres. Old SQLite migrations
(`79945fdc2649`, `839f2233e346`) deleted — v0.2 starts a clean history.
- `tests/conftest.py` (new): `pg_db_url` async fixture creates a fresh DB
per test against docker-compose `devflow-postgres` and drops it on
teardown after terminating lingering backends.
- `tests/integration/test_checkpointer.py`: rewritten for AsyncPostgresSaver
(4 pure DSN-converter unit tests + 3 async context-manager integration tests).
- `tests/integration/test_e2e_workflow.py`: switched to `pg_db_url`. Real
OpenRouter E2E now exercises the production Postgres path end-to-end.
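The prefix stripping attributed to `_to_psycopg_dsn` can be sketched as below. This is a hypothetical reimplementation for illustration, not the helper's actual body; the function name without the underscore and the handling of non-Postgres URLs are assumptions.

```python
def to_psycopg_dsn(url: str) -> str:
    """Strip a SQLAlchemy driver suffix so the URL is a plain libpq-style DSN."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        return url  # not a URL; leave untouched
    base, plus, _driver = scheme.partition("+")
    if base == "postgresql" and plus:
        # postgresql+asyncpg://... and postgresql+psycopg://... -> postgresql://...
        return f"{base}://{rest}"
    return url
```

With the default URL from `config.py`, this turns `postgresql+asyncpg://devflow:devflow@localhost:55432/mydeepagent` into `postgresql://devflow:devflow@localhost:55432/mydeepagent` and leaves an already plain `postgresql://` URL unchanged.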
Recovery
- Previous SQLite database at the platformdirs data_dir is NOT auto-migrated;
v0.1.0 was the only release that wrote to it. Set
`MYDEEPAGENT_DATABASE_URL=sqlite+aiosqlite:///<path>` to read it.
- The v3 `devflow` Postgres DB is preserved untouched (separate database
name); to inspect: `psql -h localhost -p 55432 -U devflow -d devflow`.
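The override used above is resolved in the precedence order listed for `alembic/env.py`. A minimal sketch follows, assuming hypothetical helper names; the sync driver targets in `SYNC_DRIVERS` are a guess, since the commit only says async prefixes are mapped to "the sync equivalents alembic needs".

```python
import os
from typing import Mapping, Optional

DEFAULT_URL = "postgresql+asyncpg://devflow:devflow@localhost:55432/mydeepagent"

# Assumed async -> sync driver mapping; the real targets are not shown in the commit.
SYNC_DRIVERS = {
    "postgresql+asyncpg": "postgresql+psycopg",
    "sqlite+aiosqlite": "sqlite",
}


def resolve_database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """MYDEEPAGENT_DATABASE_URL > DATABASE_URL > built-in default."""
    env = os.environ if env is None else env
    return env.get("MYDEEPAGENT_DATABASE_URL") or env.get("DATABASE_URL") or DEFAULT_URL


def to_sync_url(url: str) -> str:
    """Swap an async driver prefix for its sync counterpart, if one is known."""
    scheme, sep, rest = url.partition("://")
    return f"{SYNC_DRIVERS.get(scheme, scheme)}{sep}{rest}"
```

Passing the environment as a mapping keeps the precedence rule easy to test without touching the real process environment.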
Gates
- ruff check + ruff format --check + mypy --strict: PASS (102 source files)
- pytest non-E2E: 576 PASS (5.46 s)
- pytest E2E real OpenRouter on Postgres: 1 PASS (122.93 s, ~$0.05/run)
--no-verify: the lefthook hooks still target only the TS tooling (deleted in
0e61b2d but still queryable in git history).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Earlier v4 r1 wording implied Postgres would re-enter "with Temporal." That
was a false equivalence: Temporal worker (M5-Py) runs against its own
backing store (`temporal` namespace) and does not touch `my-deepagent`'s
`runs` / `run_phases` / `llm_calls` ORM tables, so M5-Py does not force a
DB migration. The actual trigger for Postgres is a *second concurrent
writer* on the my-deepagent DB, which first appears with FastAPI in M8-Py
(and the later web GUI). SQLite, even in WAL mode, allows only one writer at a time.
Changes:
- §1.3 Database: replaced "Postgres parked indefinitely" with explicit
migration-trigger table (CLI=1 writer → SQLite; Temporal worker=still 1
writer → SQLite; FastAPI=2 writers → Postgres required). Sequencing:
v0.2 PR #1 (Postgres baseline regen) lands ahead of M8-Py for a clean cut.
- §22 Decision Log: added DR-2 documenting this correction.
- §23 Kickoff Order: inserted "v0.2 PR #1 — Postgres migration" between
Step-0-purge and M5-Py; annotated M5-Py and M8-Py with their DB
implications.
Also clarifies that `temporalio` is listed in plan-v4-draft.md but is not
yet pulled into `my-deepagent/pyproject.toml`; install happens with M5-Py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>