feat(my-deepagent): v0.1.0 Steps 0-5, scaffolding through deepagent + OpenRouter

Python rewrite of the agent harness on top of deepagents 0.6.1 + langchain 1.x,
replacing the abandoned TS attempt in packages/. 388 unit/integration tests pass.

Steps
-----
0. Scaffolding — uv workspace, ruff/mypy/pre-commit/alembic, src/tests/docs
   trees with docs/schemas/ seeded from my-deepagent-seed/.
1. Core — config (pydantic-settings with MYDEEPAGENT_ env prefix and TOML
   source), enums (Backend, Capability, RiskLevel, ApprovalDecisionAction,
   ApprovalState, RunState, RunPhaseState, SessionState, ErrorClass),
   errors (MyDeepAgentError + BudgetExhaustedError with PEP-3134 cause +
   context suppression), hash (canonical JSON + sha256).
2. Persona/Workflow/Binding — pydantic v2 schemas with tuple-based deep
   immutability (post-construction hash drift prevented), YAML loaders,
   deterministic auto-select (preferred_backends → version → name → hash),
   override resolution with ineligibility diagnostics, PersonaConsentStore
   with fcntl.flock + tmp+fsync+rename atomic write.
3. Artifact schema registry — Draft202012Validator, multi-root resolution,
   structured ValidationFinding output.
4. Persistence — 18 SQLAlchemy 2.0 async ORM models with FK CASCADE/RESTRICT,
   WAL + busy_timeout + foreign_keys PRAGMA, alembic baseline +
   ux_active_run_repo_base partial unique index, LangGraph SqliteSaver as
   context manager only (lifecycle safety).
5. DeepAgent session — build_agent wires Persona → create_deep_agent with
   LocalShellBackend / FilesystemBackend / StateBackend / CompositeBackend,
   ChatOpenAI(base_url=openrouter) for openrouter: model strings, and 4
   middleware classes (cost / audit-tool / safety-shell / fallback-model).
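
The "fcntl.flock + tmp+fsync+rename" write path from Step 2 can be sketched
with the standard library alone. This is a minimal illustration, not the
actual PersonaConsentStore code: the function name atomic_write_json, the
".lock" sidecar file, and the JSON payload are assumptions made for the
example.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write payload to path so that readers never observe a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical convention: a sidecar lock file next to the target.
    lock_path = path.parent / (path.name + ".lock")
    with open(lock_path, "w") as lock_file:
        # Serialize writers across processes; released when lock_file closes.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        # The temp file lives in the same directory so the rename stays on
        # one filesystem, which is what makes it atomic on POSIX.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                json.dump(payload, tmp, sort_keys=True)
                tmp.flush()
                os.fsync(tmp.fileno())  # push the bytes to disk before renaming
            os.replace(tmp_name, path)  # atomically swap the new file into place
        except BaseException:
            # Clean up the temp file if anything failed before the rename.
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise
```

A reader that opens the target concurrently sees either the old content or
the new content, never a half-written file. fcntl is POSIX-only, so this
sketch does not run on Windows.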

Critical workarounds
--------------------
- deepagents 0.6.1 rejects FilesystemPermission together with backends that
  implement SandboxBackendProtocol (LocalShellBackend). SafetyShellMiddleware
  enforces destructive-command and secret-path policy at the tool layer
  instead, and build_agent strips the permissions kwarg when the persona's
  deepagents_backend is local_shell.
- FilesystemOperation in deepagents is Literal['read', 'write'] only;
  _map_operations collapses our richer schema (read/write/edit/ls) safely.
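
A rough sketch of how such a collapse could look. The mapping below
(ls folds into read, edit folds into write) and the function name
map_operations are assumptions for illustration; the real _map_operations
may differ in both mapping and error handling.

```python
from typing import Literal

# Mirrors the two values named above; defined locally so the sketch is standalone.
FilesystemOperation = Literal["read", "write"]

# Assumed mapping: read-like operations fold into "read", mutating ones into "write".
_COLLAPSE: dict[str, FilesystemOperation] = {
    "read": "read",
    "ls": "read",
    "write": "write",
    "edit": "write",
}


def map_operations(operations: tuple[str, ...]) -> tuple[FilesystemOperation, ...]:
    """Collapse rich operations to read/write, de-duplicated in first-seen order."""
    seen: dict[FilesystemOperation, None] = {}  # dict keeps insertion order
    for operation in operations:
        if operation not in _COLLAPSE:
            raise ValueError(f"unknown filesystem operation: {operation!r}")
        seen.setdefault(_COLLAPSE[operation], None)
    return tuple(seen)
```

Rejecting unknown operations, rather than dropping them silently, keeps a
typo in a persona file from quietly widening or narrowing its permissions.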

Real OpenRouter smoke
---------------------
test_openrouter_deepagents_local_shell_smoke calls DeepSeek via deepagents +
LocalShellBackend + SafetyShellMiddleware end-to-end. PASS, ~$0.000001 cost,
input=9 / output=1 tokens with content "OK".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chungyeong
2026-05-15 19:40:02 +09:00
parent 1fe59d16ca
commit 17ba5d723b
100 changed files with 12408 additions and 0 deletions

@@ -0,0 +1,6 @@
MYDEEPAGENT_OPENROUTER_API_KEY=
# MYDEEPAGENT_LANGSMITH_TRACING=true
# MYDEEPAGENT_LANGSMITH_API_KEY=
# MYDEEPAGENT_LANGSMITH_PROJECT=my-deepagent
# MYDEEPAGENT_DATA_DIR=
# MYDEEPAGENT_LANG=ko

my-deepagent/.gitignore (new file, 17 lines)

@@ -0,0 +1,17 @@
__pycache__/
*.py[cod]
*.egg-info/
.venv/
.pytest_cache/
.mypy_cache/
.ruff_cache/
.env
.env.local
*.db
*.db-journal
*.db-wal
*.db-shm
.DS_Store

@@ -0,0 +1,19 @@
repos:
  - repo: local
    hooks:
      - id: ruff
        name: ruff check
        entry: uv run ruff check --fix
        language: system
        types: [python]
      - id: ruff-format
        name: ruff format
        entry: uv run ruff format
        language: system
        types: [python]
      - id: mypy
        name: mypy
        entry: uv run mypy --strict src
        language: system
        types: [python]
        pass_filenames: false

@@ -0,0 +1 @@
3.12

my-deepagent/CHANGELOG.md (new file, 26 lines)

@@ -0,0 +1,26 @@
# Changelog
## [Unreleased]
### Added
- persistence/models.py (P0-1): partial unique index `ux_active_run_repo_base` on `runs(repo_path, base_branch) WHERE state NOT IN ('completed','failed','aborted')` — prevents duplicate active runs per repo/branch
- persistence/models.py (P0-3): FK constraints added to `RunRow.template_id` (RESTRICT), `RunBindingRow.persona_id` (RESTRICT), `InteractiveSessionRow.persona_id` (RESTRICT), `RunEventRow.phase_id` (CASCADE), `ApprovalRequestRow.phase_id` (CASCADE), `ArtifactRow.phase_id` (CASCADE), `ToolCallRow.run_id/phase_id/interactive_session_id` (CASCADE), `LlmCallRow.run_id/phase_id/interactive_session_id` (CASCADE), `PhaseFeedbackRow.run_id/phase_id` (CASCADE)
- alembic/versions/839f2233e346: new migration adding partial unique index and all FK constraints above; uses SQLite table-rebuild pattern with PRAGMA foreign_keys=OFF/ON guard
- persistence/checkpointer.py (P0-4): removed `get_checkpointer` (leaking connection helper); only `get_checkpointer_ctx` context manager is now exported
- tests/integration/test_checkpointer.py: 5 tests for checkpointer ctx lifecycle (file creation, parent dir, connection cleanup, lock-free concurrent use)
- tests/integration/test_persistence.py: 7 new P0 verification tests (active-run partial index blocks/allows, cascade-delete of phase_feedback+run_phases, RESTRICT on template delete, index exists in sqlite_master)
- tests/unit/test_session.py: full rewrite to deepagents dataclass API — FilesystemPermission attribute access (.mode/.paths/.operations), build_backend type dispatch (5 cases), _map_operations deduplication (8 cases), _spec_to_permission mapping, updated _subagent_to_dict and _resolve_openrouter_api_key tests; 47 unit tests total
- tests/integration/test_openrouter_smoke.py: real OpenRouter/DeepSeek smoke test (3 tests, ~$0.001-$0.003/run, max_tokens=50); skipped automatically when no API key is configured; validates ChatOpenAI response, usage_metadata tokens, and deepagents CompiledStateGraph end-to-end
- pyproject.toml: registered `integration` pytest marker to silence --strict-markers error
- v0.1.0 scaffolding (Step 0): src/tests/docs trees, ruff/mypy/pre-commit/alembic config
- Seed assets copied to docs/schemas/ (personas/workflows/artifacts validated)
- Core module (Step 1): config, enums, errors, hash + unit tests
- Persona / Workflow / Binding module (Step 2): pydantic schemas, YAML loaders, deterministic auto-select, override, consent store with atomic write
- Step 1 review patches (P0/P1): exception chain context suppression, classmethod LSP fix, workspace_root realpath canonicalization, config_invalid error mapping
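
The effect of the `ux_active_run_repo_base` partial unique index can be reproduced with the standard-library `sqlite3` module. This is a minimal sketch: the `runs` table is trimmed to four columns, and the state names `running` and `queued` are assumed placeholders for non-terminal states, not confirmed `RunState` values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs ("
    "id TEXT PRIMARY KEY, repo_path TEXT NOT NULL, "
    "base_branch TEXT NOT NULL, state TEXT NOT NULL)"
)
# Same index definition as in the changelog entry above.
conn.execute(
    "CREATE UNIQUE INDEX ux_active_run_repo_base "
    "ON runs (repo_path, base_branch) "
    "WHERE state NOT IN ('completed', 'failed', 'aborted')"
)


def try_insert(run_id: str, state: str) -> bool:
    """Insert a run for the same repo/branch; return False if the index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO runs VALUES (?, '/repo', 'main', ?)", (run_id, state)
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("r1", "running"),    # first active run is accepted
    try_insert("r2", "queued"),     # second active run on the same repo/branch is rejected
    try_insert("r3", "completed"),  # terminal rows are outside the index
    try_insert("r4", "failed"),     # so any number of them may coexist
]
print(results)
```

Only rows whose `state` is non-terminal participate in the uniqueness check, so finished runs never block a new run on the same repo and branch.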
### Changed
- deepagents 0.6.1 LocalShellBackend + permissions conflict workaround: removed `permissions` block from all 10 seed personas; `SafetyShellMiddleware` now enforces destructive-command + secret-path policy at the tool layer for local_shell backend agents.
- `build_agent` automatically prepends `SafetyShellMiddleware` to every agent and skips `permissions` kwarg when `deepagents_backend == "local_shell"`.
- `SafetyShellMiddleware` extended with secret-path enforcement: `read_file`/`write_file`/`edit_file`/`ls` tool calls are blocked when `file_path`/`path` matches any `DENY_PATH_PATTERNS` glob (wcmatch GLOBSTAR|IGNORECASE|DOTGLOB).
- All env vars require `MYDEEPAGENT_` prefix (e.g. `MYDEEPAGENT_OPENROUTER_API_KEY`, `MYDEEPAGENT_BUDGET_DAILY_USD`). `.env.example` updated accordingly. This isolates my-deepagent's env namespace from other tools.
- Persona / Workflow / FilesystemPermission models now store list-valued fields as tuples (deep immutability — prevents post-construction mutation that would invalidate compute_hash()).
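
Why tuples matter for `compute_hash()` can be shown with a small stand-in. The project uses pydantic v2 models; the sketch below swaps in a frozen stdlib dataclass so it runs without dependencies, and the class name `PersonaSketch` and its fields are illustrative assumptions.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class PersonaSketch:
    """Illustrative stand-in for a persona model with a list-valued field."""

    name: str
    version: int
    preferred_backends: tuple[str, ...]  # tuple, not list: no in-place mutation

    def compute_hash(self) -> str:
        # Canonical JSON: sorted keys and fixed separators give a stable byte string.
        canonical = json.dumps(
            asdict(self), sort_keys=True, separators=(",", ":"), ensure_ascii=False
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


persona = PersonaSketch("reviewer", 1, ("local_shell", "state"))
digest_before = persona.compute_hash()

# Reassigning a field fails because the dataclass is frozen ...
try:
    persona.version = 2  # type: ignore[misc]
    reassigned = True
except FrozenInstanceError:
    reassigned = False

# ... and the tuple offers no append/extend, so the stored value cannot drift.
can_append = hasattr(persona.preferred_backends, "append")
digest_after = persona.compute_hash()
```

With a `list` field, `persona.preferred_backends.append(...)` would succeed even on a frozen model and silently change the hash of an object that had already been recorded.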

my-deepagent/alembic.ini (new file, 149 lines)

@@ -0,0 +1,149 @@
# A generic, single database configuration.
[alembic]
# path to migration scripts.
# this is typically a path given in POSIX (e.g. forward slashes)
# format, relative to the token %(here)s which refers to the location of this
# ini file
script_location = %(here)s/alembic
# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
# for all available tokens
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s
# Or organize into date-based subdirectories (requires recursive_version_locations = true)
# file_template = %%(year)d/%%(month).2d/%%(day).2d_%%(hour).2d%%(minute).2d_%%(second).2d_%%(rev)s_%%(slug)s
# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory. for multiple paths, the path separator
# is defined by "path_separator" below.
prepend_sys_path = .
# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the tzdata library which can be installed by adding
# `alembic[tz]` to the pip requirements.
# string value is passed to ZoneInfo()
# leave blank for localtime
# timezone =
# max length of characters to apply to the "slug" field
# truncate_slug_length = 40
# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false
# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false
# version location specification; This defaults
# to <script_location>/versions. When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "path_separator"
# below.
# version_locations = %(here)s/bar:%(here)s/bat:%(here)s/alembic/versions
# path_separator; This indicates what character is used to split lists of file
# paths, including version_locations and prepend_sys_path within configparser
# files such as alembic.ini.
# The default rendered in new alembic.ini files is "os", which uses os.pathsep
# to provide os-dependent path splitting.
#
# Note that in order to support legacy alembic.ini files, this default does NOT
# take place if path_separator is not present in alembic.ini. If this
# option is omitted entirely, fallback logic is as follows:
#
# 1. Parsing of the version_locations option falls back to using the legacy
# "version_path_separator" key, which if absent then falls back to the legacy
# behavior of splitting on spaces and/or commas.
# 2. Parsing of the prepend_sys_path option falls back to the legacy
# behavior of splitting on spaces, commas, or colons.
#
# Valid values for path_separator are:
#
# path_separator = :
# path_separator = ;
# path_separator = space
# path_separator = newline
#
# Use os.pathsep. Default configuration used for new projects.
path_separator = os
# set to 'true' to search source files recursively
# in each "version_locations" directory
# new in Alembic version 1.10
# recursive_version_locations = false
# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8
# database URL. This is consumed by the user-maintained env.py script only.
# other means of configuring database URLs may be customized within the env.py
# file.
sqlalchemy.url = driver://user:pass@localhost/dbname
[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples
# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME
# lint with attempts to fix using "ruff" - use the module runner, against the "ruff" module
# hooks = ruff
# ruff.type = module
# ruff.module = ruff
# ruff.options = check --fix REVISION_SCRIPT_FILENAME
# Alternatively, use the exec runner to execute a binary found on your PATH
# hooks = ruff
# ruff.type = exec
# ruff.executable = ruff
# ruff.options = check --fix REVISION_SCRIPT_FILENAME
# Logging configuration. This is also consumed by the user-maintained
# env.py script only.
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARNING
handlers = console
qualname =
[logger_sqlalchemy]
level = WARNING
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S

@@ -0,0 +1 @@
Generic single-database configuration.

@@ -0,0 +1,83 @@
import os
from logging.config import fileConfig

from sqlalchemy import engine_from_config, pool

from alembic import context

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

# Load DATABASE_URL from environment, falling back to a local SQLite file.
# Alembic uses synchronous SQLAlchemy, so strip the async driver prefix when
# present (sqlite+aiosqlite:// → sqlite://).
_raw_url: str = os.environ.get("DATABASE_URL", "sqlite:///./database.sqlite3")
_sync_url: str = _raw_url.replace("sqlite+aiosqlite://", "sqlite://")
config.set_main_option("sqlalchemy.url", _sync_url)

# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
    fileConfig(config.config_file_name)

# add your model's MetaData object here
# for 'autogenerate' support
from my_deepagent.persistence.models import Base  # noqa: E402

target_metadata = Base.metadata

# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.


def run_migrations_offline() -> None:
    """Run migrations in 'offline' mode.

    This configures the context with just a URL
    and not an Engine, though an Engine is acceptable
    here as well.  By skipping the Engine creation
    we don't even need a DBAPI to be available.

    Calls to context.execute() here emit the given string to the
    script output.
    """
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )

    with context.begin_transaction():
        context.run_migrations()


def run_migrations_online() -> None:
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.
    """
    connectable = engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )

    with connectable.connect() as connection:
        context.configure(connection=connection, target_metadata=target_metadata)

        with context.begin_transaction():
            context.run_migrations()


if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()

@@ -0,0 +1,28 @@
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa
${imports if imports else ""}

# revision identifiers, used by Alembic.
revision: str = ${repr(up_revision)}
down_revision: Union[str, Sequence[str], None] = ${repr(down_revision)}
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}


def upgrade() -> None:
    """Upgrade schema."""
    ${upgrades if upgrades else "pass"}


def downgrade() -> None:
    """Downgrade schema."""
    ${downgrades if downgrades else "pass"}

@@ -0,0 +1,303 @@
"""baseline schema for v0.1.0
Revision ID: 79945fdc2649
Revises:
Create Date: 2026-05-15 17:19:09.577439
"""
from collections.abc import Sequence

import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision: str = "79945fdc2649"
down_revision: str | Sequence[str] | None = None
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None


def upgrade() -> None:
    """Upgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table(
        "agent_personas",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("name", sa.Text(), nullable=False),
        sa.Column("version", sa.Integer(), nullable=False),
        sa.Column("hash", sa.Text(), nullable=False),
        sa.Column("definition", sa.JSON(), nullable=False),
        sa.Column("created_at", sa.Text(), nullable=False),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("hash"),
    )
    op.create_table(
        "budget_ledger",
        sa.Column("scope", sa.Text(), nullable=False),
        sa.Column("spent_usd", sa.Float(), nullable=False),
        sa.Column("cap_usd", sa.Float(), nullable=True),
        sa.Column("last_updated", sa.Text(), nullable=False),
        sa.PrimaryKeyConstraint("scope"),
    )
    op.create_table(
        "interactive_sessions",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("persona_id", sa.String(length=36), nullable=False),
        sa.Column("persona_hash", sa.Text(), nullable=False),
        sa.Column("started_at", sa.Text(), nullable=True),
        sa.Column("ended_at", sa.Text(), nullable=True),
        sa.Column("last_message_at", sa.Text(), nullable=True),
        sa.Column("state", sa.Text(), nullable=False),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_table(
        "llm_calls",
        sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
        sa.Column("run_id", sa.String(length=36), nullable=True),
        sa.Column("phase_id", sa.String(length=36), nullable=True),
        sa.Column("interactive_session_id", sa.String(length=36), nullable=True),
        sa.Column("thread_id", sa.Text(), nullable=False),
        sa.Column("persona_name", sa.Text(), nullable=False),
        sa.Column("persona_version", sa.Integer(), nullable=False),
        sa.Column("model", sa.Text(), nullable=False),
        sa.Column("role", sa.Text(), nullable=False),
        sa.Column("turn_index", sa.Integer(), nullable=False),
        sa.Column("input_tokens", sa.Integer(), nullable=False),
        sa.Column("output_tokens", sa.Integer(), nullable=False),
        sa.Column("cached_tokens", sa.Integer(), nullable=False),
        sa.Column("reasoning_tokens", sa.Integer(), nullable=False),
        sa.Column("cost_usd_input", sa.Float(), nullable=False),
        sa.Column("cost_usd_output", sa.Float(), nullable=False),
        sa.Column("cost_usd_total", sa.Float(), nullable=False),
        sa.Column("latency_ms", sa.Integer(), nullable=False),
        sa.Column("status", sa.Text(), nullable=False),
        sa.Column("error_code", sa.Text(), nullable=True),
        sa.Column("request_id", sa.Text(), nullable=True),
        sa.Column("ts", sa.Text(), nullable=False),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_index(
        "llm_calls_interactive_session_id_ts_idx",
        "llm_calls",
        ["interactive_session_id", "ts"],
        unique=False,
    )
    op.create_index("llm_calls_model_ts_idx", "llm_calls", ["model", "ts"], unique=False)
    op.create_index("llm_calls_run_id_ts_idx", "llm_calls", ["run_id", "ts"], unique=False)
    op.create_table(
        "model_pricing",
        sa.Column("model", sa.Text(), nullable=False),
        sa.Column("input_per_1k_usd", sa.Float(), nullable=False),
        sa.Column("output_per_1k_usd", sa.Float(), nullable=False),
        sa.Column("context_length", sa.Integer(), nullable=False),
        sa.Column("fetched_at", sa.Text(), nullable=False),
        sa.Column("raw_payload", sa.Text(), nullable=False),
        sa.PrimaryKeyConstraint("model"),
    )
    op.create_table(
        "persona_consents",
        sa.Column("persona_hash", sa.Text(), nullable=False),
        sa.Column("persona_name", sa.Text(), nullable=False),
        sa.Column("persona_version", sa.Integer(), nullable=False),
        sa.Column("decision", sa.Text(), nullable=False),
        sa.Column("decided_at", sa.Text(), nullable=False),
        sa.PrimaryKeyConstraint("persona_hash"),
    )
    op.create_table(
        "phase_feedback",
        sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
        sa.Column("run_id", sa.String(length=36), nullable=False),
        sa.Column("phase_id", sa.String(length=36), nullable=False),
        sa.Column("reaction", sa.Text(), nullable=True),
        sa.Column("comment", sa.Text(), nullable=True),
        sa.Column("created_at", sa.Text(), nullable=False),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_table(
        "runs",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("template_id", sa.String(length=36), nullable=False),
        sa.Column("template_hash", sa.Text(), nullable=False),
        sa.Column("state", sa.Text(), nullable=False),
        sa.Column("repo_path", sa.Text(), nullable=False),
        sa.Column("base_branch", sa.Text(), nullable=False),
        sa.Column("worktree_root", sa.Text(), nullable=False),
        sa.Column("current_phase_id", sa.String(length=36), nullable=True),
        sa.Column("started_at", sa.Text(), nullable=True),
        sa.Column("ended_at", sa.Text(), nullable=True),
        sa.Column("final_report_path", sa.Text(), nullable=True),
        sa.Column("paused_from_state", sa.Text(), nullable=True),
        sa.Column("created_at", sa.Text(), nullable=False),
        sa.Column("updated_at", sa.Text(), nullable=False),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_table(
        "tool_calls",
        sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
        sa.Column("run_id", sa.String(length=36), nullable=True),
        sa.Column("phase_id", sa.String(length=36), nullable=True),
        sa.Column("interactive_session_id", sa.String(length=36), nullable=True),
        sa.Column("tool_name", sa.Text(), nullable=False),
        sa.Column("args", sa.JSON(), nullable=False),
        sa.Column("result", sa.JSON(), nullable=True),
        sa.Column("error", sa.Text(), nullable=True),
        sa.Column("duration_ms", sa.Integer(), nullable=False),
        sa.Column("ts", sa.Text(), nullable=False),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_index("tool_calls_run_id_ts_idx", "tool_calls", ["run_id", "ts"], unique=False)
    op.create_table(
        "workflow_templates",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("name", sa.Text(), nullable=False),
        sa.Column("version", sa.Integer(), nullable=False),
        sa.Column("hash", sa.Text(), nullable=False),
        sa.Column("definition", sa.JSON(), nullable=False),
        sa.Column("created_at", sa.Text(), nullable=False),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("hash"),
    )
    op.create_table(
        "approval_requests",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("run_id", sa.String(length=36), nullable=False),
        sa.Column("phase_id", sa.String(length=36), nullable=True),
        sa.Column("gate_key", sa.Text(), nullable=False),
        sa.Column("state", sa.Text(), nullable=False),
        sa.Column("idempotency_key", sa.Text(), nullable=False),
        sa.Column("payload", sa.JSON(), nullable=False),
        sa.Column("created_at", sa.Text(), nullable=False),
        sa.Column("resolved_at", sa.Text(), nullable=True),
        sa.ForeignKeyConstraint(["run_id"], ["runs.id"], ondelete="CASCADE"),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("idempotency_key"),
    )
    op.create_table(
        "artifacts",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("run_id", sa.String(length=36), nullable=False),
        sa.Column("phase_id", sa.String(length=36), nullable=True),
        sa.Column("path", sa.Text(), nullable=False),
        sa.Column("schema_id", sa.Text(), nullable=False),
        sa.Column("hash", sa.Text(), nullable=False),
        sa.Column("valid", sa.Boolean(), nullable=False),
        sa.Column("validation_error", sa.JSON(), nullable=True),
        sa.Column("created_at", sa.Text(), nullable=False),
        sa.ForeignKeyConstraint(["run_id"], ["runs.id"], ondelete="CASCADE"),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("run_id", "path", "hash", name="uq_artifacts_run_path_hash"),
    )
    op.create_table(
        "run_bindings",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("run_id", sa.String(length=36), nullable=False),
        sa.Column("role_id", sa.Text(), nullable=False),
        sa.Column("persona_id", sa.String(length=36), nullable=False),
        sa.Column("persona_hash", sa.Text(), nullable=False),
        sa.Column("backend", sa.Text(), nullable=False),
        sa.Column("binding_hash", sa.Text(), nullable=False),
        sa.ForeignKeyConstraint(["run_id"], ["runs.id"], ondelete="CASCADE"),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("run_id", "role_id", name="uq_run_bindings_run_role"),
    )
    op.create_table(
        "run_commands",
        sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
        sa.Column("run_id", sa.String(length=36), nullable=False),
        sa.Column("command", sa.Text(), nullable=False),
        sa.Column("payload", sa.JSON(), nullable=False),
        sa.Column("idempotency_key", sa.Text(), nullable=False),
        sa.Column("created_at", sa.Text(), nullable=False),
        sa.Column("processed_at", sa.Text(), nullable=True),
        sa.ForeignKeyConstraint(["run_id"], ["runs.id"], ondelete="CASCADE"),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("idempotency_key"),
    )
    op.create_table(
        "run_events",
        sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
        sa.Column("run_id", sa.String(length=36), nullable=False),
        sa.Column("phase_id", sa.String(length=36), nullable=True),
        sa.Column("seq", sa.Integer(), nullable=False),
        sa.Column("type", sa.Text(), nullable=False),
        sa.Column("payload", sa.JSON(), nullable=False),
        sa.Column("idempotency_key", sa.Text(), nullable=False),
        sa.Column("ts", sa.Text(), nullable=False),
        sa.ForeignKeyConstraint(["run_id"], ["runs.id"], ondelete="CASCADE"),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("run_id", "idempotency_key", name="uq_run_events_run_idempotency"),
        sa.UniqueConstraint("run_id", "seq", name="uq_run_events_run_seq"),
    )
    op.create_index("run_events_run_id_ts_idx", "run_events", ["run_id", "ts"], unique=False)
    op.create_table(
        "run_inputs",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("run_id", sa.String(length=36), nullable=False),
        sa.Column("requirements_md", sa.Text(), nullable=False),
        sa.Column("objective", sa.JSON(), nullable=False),
        sa.Column("extra", sa.JSON(), nullable=False),
        sa.Column("input_hash", sa.Text(), nullable=False),
        sa.ForeignKeyConstraint(["run_id"], ["runs.id"], ondelete="CASCADE"),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("run_id"),
    )
    op.create_table(
        "run_phases",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("run_id", sa.String(length=36), nullable=False),
        sa.Column("phase_key", sa.Text(), nullable=False),
        sa.Column("seq", sa.Integer(), nullable=False),
        sa.Column("state", sa.Text(), nullable=False),
        sa.Column("attempts", sa.Integer(), nullable=False),
        sa.Column("started_at", sa.Text(), nullable=True),
        sa.Column("ended_at", sa.Text(), nullable=True),
        sa.ForeignKeyConstraint(["run_id"], ["runs.id"], ondelete="CASCADE"),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("run_id", "phase_key", name="uq_run_phases_run_phase"),
    )
    op.create_table(
        "approval_decisions",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("approval_request_id", sa.String(length=36), nullable=False),
        sa.Column("action", sa.Text(), nullable=False),
        sa.Column("comment", sa.Text(), nullable=True),
        sa.Column("decided_at", sa.Text(), nullable=False),
        sa.Column("idempotency_key", sa.Text(), nullable=False),
        sa.ForeignKeyConstraint(
            ["approval_request_id"], ["approval_requests.id"], ondelete="CASCADE"
        ),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("idempotency_key"),
    )
    # ### end Alembic commands ###


def downgrade() -> None:
    """Downgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table("approval_decisions")
    op.drop_table("run_phases")
    op.drop_table("run_inputs")
    op.drop_index("run_events_run_id_ts_idx", table_name="run_events")
    op.drop_table("run_events")
    op.drop_table("run_commands")
    op.drop_table("run_bindings")
    op.drop_table("artifacts")
    op.drop_table("approval_requests")
    op.drop_table("workflow_templates")
    op.drop_index("tool_calls_run_id_ts_idx", table_name="tool_calls")
    op.drop_table("tool_calls")
    op.drop_table("runs")
    op.drop_table("phase_feedback")
    op.drop_table("persona_consents")
    op.drop_table("model_pricing")
    op.drop_index("llm_calls_run_id_ts_idx", table_name="llm_calls")
    op.drop_index("llm_calls_model_ts_idx", table_name="llm_calls")
    op.drop_index("llm_calls_interactive_session_id_ts_idx", table_name="llm_calls")
    op.drop_table("llm_calls")
    op.drop_table("interactive_sessions")
    op.drop_table("budget_ledger")
    op.drop_table("agent_personas")
    # ### end Alembic commands ###
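
The follow-up revision 839f2233e346 adds foreign keys by rebuilding tables, because SQLite cannot add a constraint to an existing table. The pattern can be sketched on a toy schema with the standard-library sqlite3 module. The table names `parent` and `child` are hypothetical, and the explicit autocommit handling is an assumption of this sketch rather than something the migration is known to do: SQLite documents `PRAGMA foreign_keys` as a no-op while a transaction is open, so the sketch toggles it outside the BEGIN/COMMIT pair.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the PRAGMA
# statements below run outside any transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(
    """
    CREATE TABLE parent (id TEXT PRIMARY KEY);
    CREATE TABLE child (id TEXT PRIMARY KEY, parent_id TEXT NOT NULL);
    INSERT INTO parent VALUES ('p1');
    INSERT INTO child VALUES ('c1', 'p1');
    """
)

conn.execute("PRAGMA foreign_keys=OFF")  # 1. disable enforcement for the rebuild
conn.execute("BEGIN")
conn.execute(  # 2. new table carrying the FK that the old one lacked
    "CREATE TABLE child_new ("
    "id TEXT PRIMARY KEY, "
    "parent_id TEXT NOT NULL REFERENCES parent (id) ON DELETE RESTRICT)"
)
conn.execute("INSERT INTO child_new SELECT id, parent_id FROM child")  # 3. copy rows
conn.execute("DROP TABLE child")  # 4. drop the old table
conn.execute("ALTER TABLE child_new RENAME TO child")  # 5. take over the old name
conn.execute("COMMIT")
conn.execute("PRAGMA foreign_keys=ON")  # 6. re-enable enforcement

# Rows copied while enforcement was off are not validated automatically;
# foreign_key_check reports any that violate the new constraint.
violations = conn.execute("PRAGMA foreign_key_check").fetchall()

try:
    conn.execute("DELETE FROM parent WHERE id = 'p1'")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True  # RESTRICT is now enforced

copied = conn.execute("SELECT id, parent_id FROM child").fetchall()
```

Running `PRAGMA foreign_key_check` after a rebuild is a cheap way to confirm that rows copied with enforcement off still satisfy the new constraints.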

@@ -0,0 +1,638 @@
"""add active-run partial unique index and FK constraints
Revision ID: 839f2233e346
Revises: 79945fdc2649
Create Date: 2026-05-15 18:51:14.343577
Notes:
- P0-1: Adds partial unique index ux_active_run_repo_base on runs(repo_path, base_branch)
  WHERE state NOT IN ('completed', 'failed', 'aborted'). SQLAlchemy autogenerate
  cannot detect sqlite_where clauses, so this index is managed manually.
- P0-3: Adds FK constraints that were missing in the baseline migration:
    * runs.template_id -> workflow_templates.id RESTRICT
    * run_bindings.persona_id -> agent_personas.id RESTRICT
    * interactive_sessions.persona_id -> agent_personas.id RESTRICT
    * run_events.phase_id -> run_phases.id CASCADE
    * approval_requests.phase_id -> run_phases.id CASCADE
    * artifacts.phase_id -> run_phases.id CASCADE
    * tool_calls.run_id -> runs.id CASCADE
    * tool_calls.phase_id -> run_phases.id CASCADE
    * tool_calls.interactive_session_id -> interactive_sessions.id CASCADE
    * llm_calls.run_id -> runs.id CASCADE
    * llm_calls.phase_id -> run_phases.id CASCADE
    * llm_calls.interactive_session_id -> interactive_sessions.id CASCADE
    * phase_feedback.run_id -> runs.id CASCADE
    * phase_feedback.phase_id -> run_phases.id CASCADE
- runs.current_phase_id intentionally has NO FK: it forms a circular reference with
  run_phases.run_id. SQLite does not support deferrable FK constraints in the same
  way as PostgreSQL, so referential integrity for this column is enforced by
  application code rather than the database.
- SQLite does not support ADD CONSTRAINT via ALTER TABLE. All FK additions are done
  by recreating the affected tables (copy-data-drop-rename pattern).
"""
from __future__ import annotations

from collections.abc import Sequence

from alembic import op

# revision identifiers, used by Alembic.
revision: str = "839f2233e346"
down_revision: str | Sequence[str] | None = "79945fdc2649"
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None


def upgrade() -> None:
    """Upgrade schema.

    SQLite does not support ALTER TABLE ... ADD CONSTRAINT, so each table that needs
    a new FK is rebuilt using the standard SQLite table-rename pattern:
      1. Disable FK enforcement during rebuild (PRAGMA foreign_keys=OFF).
      2. Create new table with correct FK constraints.
      3. Copy data from old table.
      4. Drop old table.
      5. Rename new table to original name.
      6. Re-enable FK enforcement (PRAGMA foreign_keys=ON).

    Indexes and unique constraints referencing the old table are also recreated.
    """
    # Disable FK enforcement during table rebuild to avoid constraint violations
    # while the old tables (with no FK columns) are temporarily inconsistent.
    op.execute("PRAGMA foreign_keys=OFF")

    # ------------------------------------------------------------------
    # runs: add template_id FK (RESTRICT) + P0-1 partial unique index.
    # Rebuild because SQLite cannot ADD CONSTRAINT.
    # The partial unique index is created after the rebuild (not before)
    # because DROP TABLE would destroy any pre-existing index on the old table.
    # ------------------------------------------------------------------
    op.execute(
        """
        CREATE TABLE runs_new (
            id TEXT NOT NULL,
            template_id TEXT NOT NULL
                REFERENCES workflow_templates (id) ON DELETE RESTRICT,
            template_hash TEXT NOT NULL,
            state TEXT NOT NULL,
            repo_path TEXT NOT NULL,
            base_branch TEXT NOT NULL,
            worktree_root TEXT NOT NULL,
            current_phase_id TEXT,
            started_at TEXT,
            ended_at TEXT,
            final_report_path TEXT,
            paused_from_state TEXT,
            created_at TEXT NOT NULL,
            updated_at TEXT NOT NULL,
            PRIMARY KEY (id)
        )
        """
    )
    op.execute(
        "INSERT INTO runs_new SELECT id, template_id, template_hash, state, "
        "repo_path, base_branch, worktree_root, current_phase_id, "
        "started_at, ended_at, final_report_path, paused_from_state, "
        "created_at, updated_at FROM runs"
    )
    op.execute("DROP TABLE runs")
    op.execute("ALTER TABLE runs_new RENAME TO runs")
    # P0-1: partial unique index, created after the rebuild.
    op.execute(
        "CREATE UNIQUE INDEX ux_active_run_repo_base "
        "ON runs (repo_path, base_branch) "
        "WHERE state NOT IN ('completed', 'failed', 'aborted')"
    )

    # ------------------------------------------------------------------
    # run_bindings: add persona_id FK (RESTRICT)
    # ------------------------------------------------------------------
    op.execute(
        """
        CREATE TABLE run_bindings_new (
            id TEXT NOT NULL,
            run_id TEXT NOT NULL
                REFERENCES runs (id) ON DELETE CASCADE,
            role_id TEXT NOT NULL,
            persona_id TEXT NOT NULL
                REFERENCES agent_personas (id) ON DELETE RESTRICT,
            persona_hash TEXT NOT NULL,
            backend TEXT NOT NULL,
            binding_hash TEXT NOT NULL,
            PRIMARY KEY (id),
            UNIQUE (run_id, role_id)
        )
        """
    )
    op.execute(
        "INSERT INTO run_bindings_new SELECT id, run_id, role_id, persona_id, "
        "persona_hash, backend, binding_hash FROM run_bindings"
    )
    op.execute("DROP TABLE run_bindings")
    op.execute("ALTER TABLE run_bindings_new RENAME TO run_bindings")

    # ------------------------------------------------------------------
    # interactive_sessions: add persona_id FK (RESTRICT)
    # ------------------------------------------------------------------
    op.execute(
        """
        CREATE TABLE interactive_sessions_new (
            id TEXT NOT NULL,
            persona_id TEXT NOT NULL
                REFERENCES agent_personas (id) ON DELETE RESTRICT,
            persona_hash TEXT NOT NULL,
            started_at TEXT,
            ended_at TEXT,
            last_message_at TEXT,
            state TEXT NOT NULL,
            PRIMARY KEY (id)
        )
        """
    )
    op.execute(
        "INSERT INTO interactive_sessions_new SELECT id, persona_id, persona_hash, "
        "started_at, ended_at, last_message_at, state FROM interactive_sessions"
    )
    op.execute("DROP TABLE interactive_sessions")
    op.execute("ALTER TABLE interactive_sessions_new RENAME TO interactive_sessions")

    # ------------------------------------------------------------------
    # run_events: add phase_id FK (CASCADE)
    # ------------------------------------------------------------------
    op.execute(
        """
        CREATE TABLE run_events_new (
            id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
            run_id TEXT NOT NULL
                REFERENCES runs (id) ON DELETE CASCADE,
            phase_id TEXT
                REFERENCES run_phases (id) ON DELETE CASCADE,
            seq INTEGER NOT NULL,
            type TEXT NOT NULL,
            payload JSON NOT NULL,
            idempotency_key TEXT NOT NULL,
            ts TEXT NOT NULL,
            UNIQUE (run_id, seq),
UNIQUE (run_id, idempotency_key)
)
"""
)
op.execute(
"INSERT INTO run_events_new SELECT id, run_id, phase_id, seq, type, "
"payload, idempotency_key, ts FROM run_events"
)
op.execute("DROP INDEX IF EXISTS run_events_run_id_ts_idx")
op.execute("DROP TABLE run_events")
op.execute("ALTER TABLE run_events_new RENAME TO run_events")
op.execute("CREATE INDEX run_events_run_id_ts_idx ON run_events (run_id, ts)")
# ------------------------------------------------------------------
# approval_requests: add phase_id FK (CASCADE)
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE approval_requests_new (
id TEXT NOT NULL,
run_id TEXT NOT NULL
REFERENCES runs (id) ON DELETE CASCADE,
phase_id TEXT
REFERENCES run_phases (id) ON DELETE CASCADE,
gate_key TEXT NOT NULL,
state TEXT NOT NULL,
idempotency_key TEXT NOT NULL,
payload JSON NOT NULL,
created_at TEXT NOT NULL,
resolved_at TEXT,
PRIMARY KEY (id),
UNIQUE (idempotency_key)
)
"""
)
op.execute(
"INSERT INTO approval_requests_new SELECT id, run_id, phase_id, gate_key, "
"state, idempotency_key, payload, created_at, resolved_at FROM approval_requests"
)
op.execute("DROP TABLE approval_requests")
op.execute("ALTER TABLE approval_requests_new RENAME TO approval_requests")
# ------------------------------------------------------------------
# artifacts: add phase_id FK (CASCADE)
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE artifacts_new (
id TEXT NOT NULL,
run_id TEXT NOT NULL
REFERENCES runs (id) ON DELETE CASCADE,
phase_id TEXT
REFERENCES run_phases (id) ON DELETE CASCADE,
path TEXT NOT NULL,
schema_id TEXT NOT NULL,
hash TEXT NOT NULL,
valid INTEGER NOT NULL,
validation_error JSON,
created_at TEXT NOT NULL,
PRIMARY KEY (id),
UNIQUE (run_id, path, hash)
)
"""
)
op.execute(
"INSERT INTO artifacts_new SELECT id, run_id, phase_id, path, schema_id, "
"hash, valid, validation_error, created_at FROM artifacts"
)
op.execute("DROP TABLE artifacts")
op.execute("ALTER TABLE artifacts_new RENAME TO artifacts")
# ------------------------------------------------------------------
# tool_calls: add run_id / phase_id / interactive_session_id FKs (CASCADE)
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE tool_calls_new (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
run_id TEXT
REFERENCES runs (id) ON DELETE CASCADE,
phase_id TEXT
REFERENCES run_phases (id) ON DELETE CASCADE,
interactive_session_id TEXT
REFERENCES interactive_sessions (id) ON DELETE CASCADE,
tool_name TEXT NOT NULL,
args JSON NOT NULL,
result JSON,
error TEXT,
duration_ms INTEGER NOT NULL,
ts TEXT NOT NULL
)
"""
)
op.execute(
"INSERT INTO tool_calls_new SELECT id, run_id, phase_id, interactive_session_id, "
"tool_name, args, result, error, duration_ms, ts FROM tool_calls"
)
op.execute("DROP INDEX IF EXISTS tool_calls_run_id_ts_idx")
op.execute("DROP TABLE tool_calls")
op.execute("ALTER TABLE tool_calls_new RENAME TO tool_calls")
op.execute("CREATE INDEX tool_calls_run_id_ts_idx ON tool_calls (run_id, ts)")
# ------------------------------------------------------------------
# llm_calls: add run_id / phase_id / interactive_session_id FKs (CASCADE)
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE llm_calls_new (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
run_id TEXT
REFERENCES runs (id) ON DELETE CASCADE,
phase_id TEXT
REFERENCES run_phases (id) ON DELETE CASCADE,
interactive_session_id TEXT
REFERENCES interactive_sessions (id) ON DELETE CASCADE,
thread_id TEXT NOT NULL,
persona_name TEXT NOT NULL,
persona_version INTEGER NOT NULL,
model TEXT NOT NULL,
role TEXT NOT NULL,
turn_index INTEGER NOT NULL,
input_tokens INTEGER NOT NULL,
output_tokens INTEGER NOT NULL,
cached_tokens INTEGER NOT NULL,
reasoning_tokens INTEGER NOT NULL,
cost_usd_input REAL NOT NULL,
cost_usd_output REAL NOT NULL,
cost_usd_total REAL NOT NULL,
latency_ms INTEGER NOT NULL,
status TEXT NOT NULL,
error_code TEXT,
request_id TEXT,
ts TEXT NOT NULL
)
"""
)
op.execute(
"INSERT INTO llm_calls_new SELECT id, run_id, phase_id, interactive_session_id, "
"thread_id, persona_name, persona_version, model, role, turn_index, "
"input_tokens, output_tokens, cached_tokens, reasoning_tokens, "
"cost_usd_input, cost_usd_output, cost_usd_total, latency_ms, status, "
"error_code, request_id, ts FROM llm_calls"
)
op.execute("DROP INDEX IF EXISTS llm_calls_run_id_ts_idx")
op.execute("DROP INDEX IF EXISTS llm_calls_interactive_session_id_ts_idx")
op.execute("DROP INDEX IF EXISTS llm_calls_model_ts_idx")
op.execute("DROP TABLE llm_calls")
op.execute("ALTER TABLE llm_calls_new RENAME TO llm_calls")
op.execute("CREATE INDEX llm_calls_run_id_ts_idx ON llm_calls (run_id, ts)")
op.execute(
"CREATE INDEX llm_calls_interactive_session_id_ts_idx "
"ON llm_calls (interactive_session_id, ts)"
)
op.execute("CREATE INDEX llm_calls_model_ts_idx ON llm_calls (model, ts)")
# ------------------------------------------------------------------
# phase_feedback: add run_id / phase_id FKs (CASCADE)
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE phase_feedback_new (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
run_id TEXT NOT NULL
REFERENCES runs (id) ON DELETE CASCADE,
phase_id TEXT NOT NULL
REFERENCES run_phases (id) ON DELETE CASCADE,
reaction TEXT,
comment TEXT,
created_at TEXT NOT NULL
)
"""
)
op.execute(
"INSERT INTO phase_feedback_new SELECT id, run_id, phase_id, "
"reaction, comment, created_at FROM phase_feedback"
)
op.execute("DROP TABLE phase_feedback")
op.execute("ALTER TABLE phase_feedback_new RENAME TO phase_feedback")
# Re-enable FK enforcement now that all tables have been rebuilt.
op.execute("PRAGMA foreign_keys=ON")
def downgrade() -> None:
"""Downgrade schema.
Reverses all FK additions and drops the partial unique index.
Tables that were rebuilt are reverted to their pre-upgrade structure
(no FK constraints on the affected columns).
"""
op.execute("PRAGMA foreign_keys=OFF")
# ------------------------------------------------------------------
# Revert phase_feedback
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE phase_feedback_old (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
run_id TEXT NOT NULL,
phase_id TEXT NOT NULL,
reaction TEXT,
comment TEXT,
created_at TEXT NOT NULL
)
"""
)
op.execute(
"INSERT INTO phase_feedback_old SELECT id, run_id, phase_id, "
"reaction, comment, created_at FROM phase_feedback"
)
op.execute("DROP TABLE phase_feedback")
op.execute("ALTER TABLE phase_feedback_old RENAME TO phase_feedback")
# ------------------------------------------------------------------
# Revert llm_calls
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE llm_calls_old (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
run_id TEXT,
phase_id TEXT,
interactive_session_id TEXT,
thread_id TEXT NOT NULL,
persona_name TEXT NOT NULL,
persona_version INTEGER NOT NULL,
model TEXT NOT NULL,
role TEXT NOT NULL,
turn_index INTEGER NOT NULL,
input_tokens INTEGER NOT NULL,
output_tokens INTEGER NOT NULL,
cached_tokens INTEGER NOT NULL,
reasoning_tokens INTEGER NOT NULL,
cost_usd_input REAL NOT NULL,
cost_usd_output REAL NOT NULL,
cost_usd_total REAL NOT NULL,
latency_ms INTEGER NOT NULL,
status TEXT NOT NULL,
error_code TEXT,
request_id TEXT,
ts TEXT NOT NULL
)
"""
)
op.execute(
"INSERT INTO llm_calls_old SELECT id, run_id, phase_id, interactive_session_id, "
"thread_id, persona_name, persona_version, model, role, turn_index, "
"input_tokens, output_tokens, cached_tokens, reasoning_tokens, "
"cost_usd_input, cost_usd_output, cost_usd_total, latency_ms, status, "
"error_code, request_id, ts FROM llm_calls"
)
op.execute("DROP INDEX IF EXISTS llm_calls_run_id_ts_idx")
op.execute("DROP INDEX IF EXISTS llm_calls_interactive_session_id_ts_idx")
op.execute("DROP INDEX IF EXISTS llm_calls_model_ts_idx")
op.execute("DROP TABLE llm_calls")
op.execute("ALTER TABLE llm_calls_old RENAME TO llm_calls")
op.execute("CREATE INDEX llm_calls_run_id_ts_idx ON llm_calls (run_id, ts)")
op.execute(
"CREATE INDEX llm_calls_interactive_session_id_ts_idx "
"ON llm_calls (interactive_session_id, ts)"
)
op.execute("CREATE INDEX llm_calls_model_ts_idx ON llm_calls (model, ts)")
# ------------------------------------------------------------------
# Revert tool_calls
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE tool_calls_old (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
run_id TEXT,
phase_id TEXT,
interactive_session_id TEXT,
tool_name TEXT NOT NULL,
args JSON NOT NULL,
result JSON,
error TEXT,
duration_ms INTEGER NOT NULL,
ts TEXT NOT NULL
)
"""
)
op.execute(
"INSERT INTO tool_calls_old SELECT id, run_id, phase_id, interactive_session_id, "
"tool_name, args, result, error, duration_ms, ts FROM tool_calls"
)
op.execute("DROP INDEX IF EXISTS tool_calls_run_id_ts_idx")
op.execute("DROP TABLE tool_calls")
op.execute("ALTER TABLE tool_calls_old RENAME TO tool_calls")
op.execute("CREATE INDEX tool_calls_run_id_ts_idx ON tool_calls (run_id, ts)")
# ------------------------------------------------------------------
# Revert artifacts
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE artifacts_old (
id TEXT NOT NULL,
run_id TEXT NOT NULL
REFERENCES runs (id) ON DELETE CASCADE,
phase_id TEXT,
path TEXT NOT NULL,
schema_id TEXT NOT NULL,
hash TEXT NOT NULL,
valid INTEGER NOT NULL,
validation_error JSON,
created_at TEXT NOT NULL,
PRIMARY KEY (id),
UNIQUE (run_id, path, hash)
)
"""
)
op.execute(
"INSERT INTO artifacts_old SELECT id, run_id, phase_id, path, schema_id, "
"hash, valid, validation_error, created_at FROM artifacts"
)
op.execute("DROP TABLE artifacts")
op.execute("ALTER TABLE artifacts_old RENAME TO artifacts")
# ------------------------------------------------------------------
# Revert approval_requests
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE approval_requests_old (
id TEXT NOT NULL,
run_id TEXT NOT NULL
REFERENCES runs (id) ON DELETE CASCADE,
phase_id TEXT,
gate_key TEXT NOT NULL,
state TEXT NOT NULL,
idempotency_key TEXT NOT NULL,
payload JSON NOT NULL,
created_at TEXT NOT NULL,
resolved_at TEXT,
PRIMARY KEY (id),
UNIQUE (idempotency_key)
)
"""
)
op.execute(
"INSERT INTO approval_requests_old SELECT id, run_id, phase_id, gate_key, "
"state, idempotency_key, payload, created_at, resolved_at FROM approval_requests"
)
op.execute("DROP TABLE approval_requests")
op.execute("ALTER TABLE approval_requests_old RENAME TO approval_requests")
# ------------------------------------------------------------------
# Revert run_events
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE run_events_old (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
run_id TEXT NOT NULL
REFERENCES runs (id) ON DELETE CASCADE,
phase_id TEXT,
seq INTEGER NOT NULL,
type TEXT NOT NULL,
payload JSON NOT NULL,
idempotency_key TEXT NOT NULL,
ts TEXT NOT NULL,
UNIQUE (run_id, seq),
UNIQUE (run_id, idempotency_key)
)
"""
)
op.execute(
"INSERT INTO run_events_old SELECT id, run_id, phase_id, seq, type, "
"payload, idempotency_key, ts FROM run_events"
)
op.execute("DROP INDEX IF EXISTS run_events_run_id_ts_idx")
op.execute("DROP TABLE run_events")
op.execute("ALTER TABLE run_events_old RENAME TO run_events")
op.execute("CREATE INDEX run_events_run_id_ts_idx ON run_events (run_id, ts)")
# ------------------------------------------------------------------
# Revert interactive_sessions
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE interactive_sessions_old (
id TEXT NOT NULL,
persona_id TEXT NOT NULL,
persona_hash TEXT NOT NULL,
started_at TEXT,
ended_at TEXT,
last_message_at TEXT,
state TEXT NOT NULL,
PRIMARY KEY (id)
)
"""
)
op.execute(
"INSERT INTO interactive_sessions_old SELECT id, persona_id, persona_hash, "
"started_at, ended_at, last_message_at, state FROM interactive_sessions"
)
op.execute("DROP TABLE interactive_sessions")
op.execute("ALTER TABLE interactive_sessions_old RENAME TO interactive_sessions")
# ------------------------------------------------------------------
# Revert run_bindings
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE run_bindings_old (
id TEXT NOT NULL,
run_id TEXT NOT NULL
REFERENCES runs (id) ON DELETE CASCADE,
role_id TEXT NOT NULL,
persona_id TEXT NOT NULL,
persona_hash TEXT NOT NULL,
backend TEXT NOT NULL,
binding_hash TEXT NOT NULL,
PRIMARY KEY (id),
UNIQUE (run_id, role_id)
)
"""
)
op.execute(
"INSERT INTO run_bindings_old SELECT id, run_id, role_id, persona_id, "
"persona_hash, backend, binding_hash FROM run_bindings"
)
op.execute("DROP TABLE run_bindings")
op.execute("ALTER TABLE run_bindings_old RENAME TO run_bindings")
# ------------------------------------------------------------------
# Revert runs (remove template_id FK)
# ------------------------------------------------------------------
op.execute("DROP INDEX IF EXISTS ux_active_run_repo_base")
op.execute(
"""
CREATE TABLE runs_old (
id TEXT NOT NULL,
template_id TEXT NOT NULL,
template_hash TEXT NOT NULL,
state TEXT NOT NULL,
repo_path TEXT NOT NULL,
base_branch TEXT NOT NULL,
worktree_root TEXT NOT NULL,
current_phase_id TEXT,
started_at TEXT,
ended_at TEXT,
final_report_path TEXT,
paused_from_state TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
PRIMARY KEY (id)
)
"""
)
op.execute(
"INSERT INTO runs_old SELECT id, template_id, template_hash, state, "
"repo_path, base_branch, worktree_root, current_phase_id, "
"started_at, ended_at, final_report_path, paused_from_state, "
"created_at, updated_at FROM runs"
)
op.execute("DROP TABLE runs")
op.execute("ALTER TABLE runs_old RENAME TO runs")
op.execute("PRAGMA foreign_keys=ON")
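A minimal sketch for exercising the rebuild pattern outside Alembic, using only the standard-library `sqlite3` module. The tables are trimmed-down, hypothetical versions of `workflow_templates` and `runs`; the create/copy/drop/rename sequence and the partial unique index mirror the migration above, while the `PRAGMA foreign_key_check` step and the autocommit connection are additions assumed here for illustration, not something the migration itself does.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# PRAGMA statements below run outside any transaction and take effect.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys=ON")

# Hypothetical, trimmed-down versions of the real tables.
conn.execute("CREATE TABLE workflow_templates (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE runs (id TEXT PRIMARY KEY, template_id TEXT NOT NULL, "
    "state TEXT NOT NULL, repo_path TEXT NOT NULL, base_branch TEXT NOT NULL)"
)
conn.execute("INSERT INTO workflow_templates VALUES ('t1')")
conn.execute("INSERT INTO runs VALUES ('r1', 't1', 'completed', '/repo', 'main')")

# Rebuild: FK enforcement off, then create / copy / drop / rename in one transaction.
conn.execute("PRAGMA foreign_keys=OFF")
conn.execute("BEGIN")
conn.execute(
    "CREATE TABLE runs_new (id TEXT PRIMARY KEY, "
    "template_id TEXT NOT NULL REFERENCES workflow_templates (id) ON DELETE RESTRICT, "
    "state TEXT NOT NULL, repo_path TEXT NOT NULL, base_branch TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO runs_new (id, template_id, state, repo_path, base_branch) "
    "SELECT id, template_id, state, repo_path, base_branch FROM runs"
)
conn.execute("DROP TABLE runs")
conn.execute("ALTER TABLE runs_new RENAME TO runs")
conn.execute(
    "CREATE UNIQUE INDEX ux_active_run_repo_base ON runs (repo_path, base_branch) "
    "WHERE state NOT IN ('completed', 'failed', 'aborted')"
)
# Check that the copied rows satisfy the new FK before committing.
fk_violations = conn.execute("PRAGMA foreign_key_check").fetchall()
conn.execute("COMMIT")
conn.execute("PRAGMA foreign_keys=ON")

# A finished run does not block a new active run on the same repo/branch ...
conn.execute("INSERT INTO runs VALUES ('r2', 't1', 'running', '/repo', 'main')")
# ... but a second active run is rejected by the partial unique index.
try:
    conn.execute("INSERT INTO runs VALUES ('r3', 't1', 'running', '/repo', 'main')")
    second_active_rejected = False
except sqlite3.IntegrityError:
    second_active_rejected = True

# The new FK is enforced: deleting a referenced template is restricted.
try:
    conn.execute("DELETE FROM workflow_templates WHERE id = 't1'")
    delete_restricted = False
except sqlite3.IntegrityError:
    delete_restricted = True

row_count = conn.execute("SELECT COUNT(*) FROM runs").fetchone()[0]
```

Naming the columns explicitly in the `INSERT ... SELECT` keeps the copy correct even if the new table ever declares its columns in a different order.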

Binary file not shown.

@@ -0,0 +1,114 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "common/final-report@1",
  "title": "Common Final Report",
  "description": "Final report for a workflow run",
  "type": "object",
  "required": ["runId", "templateHash", "status", "phases", "endedAt"],
  "additionalProperties": false,
  "properties": {
    "runId": {
      "type": "string",
      "format": "uuid",
      "description": "Unique run identifier (UUID)"
    },
    "templateHash": {
      "type": "string",
      "pattern": "^[a-f0-9]{64}$",
      "description": "SHA-256 hash of the workflow template (hex)"
    },
    "status": {
      "type": "string",
      "enum": ["completed", "failed", "aborted"],
      "description": "Final run status"
    },
    "inputs": {
      "type": "object",
      "description": "Run inputs (optional)"
    },
    "phases": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["key", "state"],
        "additionalProperties": false,
        "properties": {
          "key": {
            "type": "string",
            "description": "Phase key"
          },
          "state": {
            "type": "string",
            "enum": ["pending", "running", "completed", "failed", "skipped"],
            "description": "Phase execution state"
          },
          "started_at": {
            "type": "string",
            "format": "date-time",
            "description": "Start time (optional)"
          },
          "ended_at": {
            "type": "string",
            "format": "date-time",
            "description": "End time (optional)"
          },
          "attempts": {
            "type": "integer",
            "minimum": 0,
            "description": "Number of attempts (optional)"
          }
        }
      },
      "description": "Execution record for each phase"
    },
    "approvals": {
      "type": "array",
      "items": {
        "type": "object"
      },
      "description": "List of approval records (optional)"
    },
    "findings": {
      "type": "array",
      "items": {
        "type": "object"
      },
      "description": "List of collected findings (optional)"
    },
    "artifacts": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["path", "schema"],
        "additionalProperties": false,
        "properties": {
          "path": {
            "type": "string",
            "description": "Artifact file path"
          },
          "schema": {
            "type": "string",
            "description": "JSON Schema ID of the artifact"
          },
          "hash": {
            "type": "string",
            "description": "Artifact file hash (optional)"
          }
        }
      },
      "description": "List of generated artifacts (optional)"
    },
    "unresolved": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of unresolved items (optional)"
    },
    "endedAt": {
      "type": "string",
      "format": "date-time",
      "description": "Run end time"
    }
  }
}
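As a rough illustration of how a document matching `common/final-report@1` might be assembled, the sketch below builds the five required keys and derives `templateHash` from a canonical JSON encoding hashed with SHA-256. The canonical form (sorted keys, compact separators) and the helper names `canonical_sha256` and `build_final_report` are assumptions made for this example; the project's actual hash routine is not part of this hunk.

```python
import hashlib
import json
import re
import uuid
from datetime import datetime, timezone

HASH_PATTERN = re.compile(r"^[a-f0-9]{64}$")  # same pattern as templateHash
REQUIRED = ("runId", "templateHash", "status", "phases", "endedAt")


def canonical_sha256(obj: object) -> str:
    """Hash a JSON-serialisable object via a key-sorted, compact encoding.

    Assumption: sort_keys plus compact separators stands in for the
    project's real canonical JSON form, which this diff does not show.
    """
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def build_final_report(template: dict, status: str, phases: list) -> dict:
    """Assemble a report carrying only the required top-level keys."""
    return {
        "runId": str(uuid.uuid4()),
        "templateHash": canonical_sha256(template),
        "status": status,
        "phases": phases,
        "endedAt": datetime.now(timezone.utc).isoformat(),
    }


report = build_final_report(
    template={"name": "demo", "phases": ["spec", "review"]},  # hypothetical template
    status="completed",
    phases=[{"key": "spec", "state": "completed", "attempts": 1}],
)
missing = [key for key in REQUIRED if key not in report]
```

Because the encoding sorts keys, two templates that differ only in key order hash to the same value, which is the property a template hash needs in order to be stable.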

@@ -0,0 +1,80 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "dev/phase-plan@1",
  "title": "Dev Phase Plan",
  "description": "Execution phase plan (spec-based phase breakdown)",
  "type": "object",
  "required": ["runId", "phaseKey", "phases"],
  "additionalProperties": false,
  "properties": {
    "runId": {
      "type": "string",
      "format": "uuid",
      "description": "Unique run identifier (same UUID as in spec.json)"
    },
    "phaseKey": {
      "type": "string",
      "minLength": 1,
      "description": "Current phase key (usually planning)"
    },
    "phases": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["key", "title", "role", "instructions"],
        "additionalProperties": false,
        "properties": {
          "key": {
            "type": "string",
            "pattern": "^[a-z][a-z0-9-]*$",
            "description": "Unique phase identifier (lowercase letters, digits, and hyphens; must start with a letter)"
          },
          "title": {
            "type": "string",
            "minLength": 1,
            "description": "Phase title"
          },
          "role": {
            "type": "string",
            "minLength": 1,
            "description": "ID of the responsible role"
          },
          "instructions": {
            "type": "string",
            "minLength": 10,
            "description": "Specific instructions for the assignee"
          },
          "expected_artifact": {
            "type": "object",
            "required": ["path", "schema"],
            "additionalProperties": false,
            "properties": {
              "path": {
                "type": "string",
                "description": "Artifact file path"
              },
              "schema": {
                "type": "string",
                "description": "JSON Schema ID of the artifact"
              }
            },
            "description": "Artifact to be produced by this phase (optional)"
          },
          "depends_on": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "Keys of prerequisite phases that must complete before this phase runs (optional)"
          }
        }
      },
      "description": "List of execution phases"
    },
    "estimated_duration_hours": {
      "type": "number",
      "minimum": 0,
      "description": "Total estimated duration in hours (optional)"
    }
  }
}
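The schema constrains the shape of `depends_on` but cannot express that the dependencies refer to existing phases or form an acyclic graph. One possible complementary check, sketched here with the standard-library `graphlib` module, derives an execution order and rejects unknown or cyclic dependencies. The function name `execution_order` is hypothetical, and nothing in this diff indicates the runner resolves order this way.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(phases: list) -> list:
    """Return phase keys in an order that respects depends_on.

    Raises ValueError for unknown or cyclic dependencies, which the
    JSON Schema alone cannot express.
    """
    keys = {phase["key"] for phase in phases}
    graph = {}
    for phase in phases:
        deps = set(phase.get("depends_on", []))
        unknown = deps - keys
        if unknown:
            raise ValueError(
                f"{phase['key']} depends on unknown phases: {sorted(unknown)}"
            )
        graph[phase["key"]] = deps
    try:
        # static_order() yields each node after all of its predecessors.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that make up the detected cycle.
        raise ValueError(f"cyclic depends_on: {exc.args[1]}") from exc


# Hypothetical plan: a simple chain spec -> implement -> review.
plan = [
    {"key": "review", "depends_on": ["implement"]},
    {"key": "spec"},
    {"key": "implement", "depends_on": ["spec"]},
]
order = execution_order(plan)
```

Phases without `depends_on` have no predecessors, so the sorter is free to schedule them first or alongside each other, which matches the planner guidance to leave `depends_on` empty for parallelizable phases.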

@@ -0,0 +1,76 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "dev/review-finding-batch@1",
  "title": "Dev Review Finding Batch",
  "description": "Batch of findings from a code review or verification",
  "type": "object",
  "required": ["runId", "phaseKey", "reviewerRole", "findings", "summary"],
  "additionalProperties": false,
  "properties": {
    "runId": {
      "type": "string",
      "format": "uuid",
      "description": "Unique run identifier (UUID)"
    },
    "phaseKey": {
      "type": "string",
      "minLength": 1,
      "description": "Current phase key (e.g. review, verify)"
    },
    "reviewerRole": {
      "type": "string",
      "minLength": 1,
      "description": "Reviewer role (e.g. code-reviewer, verifier, security-auditor)"
    },
    "findings": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["severity", "category", "summary"],
        "additionalProperties": false,
        "properties": {
          "severity": {
            "type": "string",
            "enum": ["info", "low", "medium", "high", "critical"],
            "description": "Severity"
          },
          "category": {
            "type": "string",
            "enum": ["correctness", "evidence", "style", "security", "performance", "other"],
            "description": "Finding category"
          },
          "summary": {
            "type": "string",
            "minLength": 1,
            "description": "Summary of the issue (an OWASP category prefix is recommended for security findings)"
          },
          "filePath": {
            "type": "string",
            "description": "Path of the affected file (optional)"
          },
          "line": {
            "type": "integer",
            "minimum": 1,
            "description": "Affected line number (optional)"
          },
          "evidence": {
            "type": "string",
            "description": "Evidence: code excerpt or explanation (optional)"
          },
          "verifierStatus": {
            "type": "string",
            "enum": ["unverified", "confirmed", "rejected"],
            "default": "unverified",
            "description": "Verification status assigned by the verifier"
          }
        }
      },
      "description": "List of findings"
    },
    "summary": {
      "type": "string",
      "minLength": 10,
      "description": "Overall review summary"
    }
  }
}
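A consumer of a `dev/review-finding-batch@1` document will likely want to order findings by severity and to check the recommended OWASP prefix on security findings. The sketch below shows one way to do both with the standard library. The severity ranking follows the enum order in the schema; the prefix regex is an assumed shape based on the example "[A01:Broken Access Control] ..." used in the personas, and both helper names are hypothetical.

```python
import re

# Enum order from the schema, least to most severe.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]
# Assumed prefix shape, e.g. "[A01:Broken Access Control] ...".
OWASP_PREFIX = re.compile(r"^\[A\d{2}:[^\]]+\] ")


def triage(findings: list) -> list:
    """Sort findings from most to least severe (stable for equal severities)."""
    return sorted(
        findings,
        key=lambda f: SEVERITY_ORDER.index(f["severity"]),
        reverse=True,
    )


def missing_owasp_prefix(findings: list) -> list:
    """Return security findings whose summary lacks the OWASP prefix."""
    return [
        f
        for f in findings
        if f["category"] == "security" and not OWASP_PREFIX.match(f["summary"])
    ]


# Hypothetical findings for illustration only.
findings = [
    {"severity": "low", "category": "style", "summary": "Unused import"},
    {
        "severity": "critical",
        "category": "security",
        "summary": "[A03:Injection] SQL built by string concatenation",
    },
    {"severity": "high", "category": "security", "summary": "Secrets logged in plain text"},
]
ordered = triage(findings)
unprefixed = missing_owasp_prefix(findings)
```

Since the schema only recommends the prefix, a check like `missing_owasp_prefix` fits better as a lint warning than as a hard validation failure.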

@@ -0,0 +1,46 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "dev/spec@1",
  "title": "Dev Spec",
  "description": "Specification of requirements analysis and implementation approach",
  "type": "object",
  "required": ["runId", "phaseKey", "requirements", "acceptance_criteria", "approach", "risks"],
  "additionalProperties": false,
  "properties": {
    "runId": {
      "type": "string",
      "format": "uuid",
      "description": "Unique run identifier (UUID)"
    },
    "phaseKey": {
      "type": "string",
      "minLength": 1,
      "description": "Current phase key (e.g. spec, diagnose, fix)"
    },
    "requirements": {
      "type": "string",
      "minLength": 10,
      "description": "Detailed description of the requirements"
    },
    "acceptance_criteria": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "minItems": 1,
      "description": "List of acceptance criteria (must be measurable and verifiable)"
    },
    "approach": {
      "type": "string",
      "minLength": 10,
      "description": "Description of the implementation or approach"
    },
    "risks": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of risks (empty array if none)"
    }
  }
}
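To make the constraints of `dev/spec@1` concrete, here is a small hand-rolled pre-check covering the required keys, `additionalProperties: false`, the two `minLength: 10` strings, and `minItems: 1`. It is only an illustrative sketch with a hypothetical function name and is not a substitute for a real JSON Schema validator, which would also check types and the `uuid` format.

```python
REQUIRED = (
    "runId",
    "phaseKey",
    "requirements",
    "acceptance_criteria",
    "approach",
    "risks",
)


def spec_problems(doc: dict) -> list:
    """Return human-readable problems for a candidate spec document."""
    problems = [f"missing: {key}" for key in REQUIRED if key not in doc]
    # additionalProperties is false, so any other top-level key is an error.
    problems += [f"unexpected: {key}" for key in sorted(set(doc) - set(REQUIRED))]
    for key in ("requirements", "approach"):
        value = doc.get(key)
        # len() counts code points, the same unit minLength uses.
        if isinstance(value, str) and len(value) < 10:
            problems.append(f"{key}: shorter than 10 characters")
    if doc.get("acceptance_criteria") == []:
        problems.append("acceptance_criteria: needs at least one item")
    return problems


# Hypothetical documents for illustration only.
good = {
    "runId": "00000000-0000-0000-0000-000000000001",
    "phaseKey": "spec",
    "requirements": "Add a retry policy to the HTTP client.",
    "acceptance_criteria": ["Transient 5xx responses are retried up to 3 times"],
    "approach": "Wrap the client call in a bounded retry loop.",
    "risks": [],
}
bad = {
    "runId": "00000000-0000-0000-0000-000000000001",
    "phaseKey": "spec",
    "requirements": "Add a retry policy to the HTTP client.",
    "acceptance_criteria": [],
    "approach": "short",
    "notes": "not allowed",
}
good_problems = spec_problems(good)
bad_problems = spec_problems(bad)
```

Note that `risks` is required even when empty, so an agent that omits it instead of writing `[]` fails validation.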

@@ -0,0 +1,54 @@
name: default-interactive
version: 1
description: "General-purpose assistant for interactive mode. Supports exploration, editing, and execution."
backend: openrouter
model: "openrouter:anthropic/claude-haiku-4-5"
provider_origin: "US/Anthropic"
capabilities:
- spec_write
- code_edit
- code_review
- evidence_check
- command_execute
max_risk_level: high
system_prompt: |
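  You are the default interactive assistant for my-deepagent. Converse in Korean.
  ## Role
  Handle user requests end to end: explore code, edit it, and guide command execution.
  ## How to use the deepagents tools
  - write_todos: Before starting any task, always write the plan as a numbered list with write_todos.
  - read_file: Read code files to understand the current state.
  - glob: Find relevant files by file pattern.
  - grep: Search the codebase for a specific pattern.
  - edit_file: Modify an existing file. Keep the scope of changes minimal.
  - write_file: Create a new file.
  - task: Delegate complex subtasks to a subagent.
  - execute: When a command needs to be run, guide the user through it.
  ## Principles
  - Always understand the existing code with read_file/glob/grep before modifying it.
  - For large changes, plan step by step with write_todos before proceeding.
  - Before finishing, confirm that every item in the plan has been implemented.
  - If you do not know something, say so honestly and decide the direction with the user.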
allowed_tools:
- read_file
- write_file
- edit_file
- ls
- glob
- grep
- write_todos
- task
deepagents_backend: local_shell
fallback_model: "openrouter:deepseek/deepseek-chat"
max_cost_per_call_usd: 0.05
model_params:
  max_tokens: 2048
  temperature: 0.3
  top_p: 1.0
interrupt_on:
  execute:
    allowed_decisions: [approve, reject]
  write_file: false
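The persona pairs `max_cost_per_call_usd` with `model_params.max_tokens`, which suggests a per-call budget check. The sketch below shows one way such a check could work: estimate the worst-case cost from the input token count and the output token limit, then compare it with the cap. The `Price` values are made up, and `CostCapExceeded`, `call_cost_usd`, and `check_cap` are hypothetical names; the project's cost middleware is not shown in this part of the diff.

```python
from dataclasses import dataclass


class CostCapExceeded(Exception):
    """Hypothetical error; the project's own exception type is not shown here."""


@dataclass(frozen=True)
class Price:
    """USD per one million tokens (illustrative numbers only)."""

    usd_per_mtok_input: float
    usd_per_mtok_output: float


def call_cost_usd(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call given token counts and per-million-token prices."""
    return (
        input_tokens * price.usd_per_mtok_input
        + output_tokens * price.usd_per_mtok_output
    ) / 1_000_000


def check_cap(price: Price, input_tokens: int, max_output_tokens: int, cap_usd: float) -> float:
    """Return the worst-case cost, or raise if it would exceed the cap."""
    worst_case = call_cost_usd(price, input_tokens, max_output_tokens)
    if worst_case > cap_usd:
        raise CostCapExceeded(
            f"worst-case cost {worst_case:.5f} USD exceeds cap {cap_usd:.2f} USD"
        )
    return worst_case


PRICE = Price(usd_per_mtok_input=1.0, usd_per_mtok_output=5.0)  # made-up prices
# max_tokens=2048 and the 0.05 USD cap come from the persona above.
small_call = check_cap(PRICE, input_tokens=10_000, max_output_tokens=2048, cap_usd=0.05)
```

Checking against `max_tokens` before the call is conservative: the actual cost can only be lower, so a call that passes the pre-check cannot exceed the cap on output length alone.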

@@ -0,0 +1,66 @@
name: openrouter-claude-architect
version: 1
description: "Senior architect. Stack selection, large refactorings, and data model changes. Always states trade-offs."
backend: openrouter
model: "openrouter:anthropic/claude-opus-4-1"
provider_origin: "US/Anthropic"
capabilities:
- spec_write
- phase_planning
- code_edit
max_risk_level: high
system_prompt: |
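  You are the senior Architect for my-deepagent. Converse in Korean.
  ## Role
  You own large, high-risk technical decisions:
  - Selecting and changing the technology stack
  - Planning large-scale refactorings
  - Designing and changing data models
  - Designing system boundaries and interfaces
  ## How to use the deepagents tools
  - write_todos: Always start by writing the scope of analysis and the decision criteria with write_todos.
  - read_file: Read the existing architecture, configuration, and code thoroughly.
  - glob: Get an overview of the whole project structure.
  - grep: Search for dependencies, patterns, and usages.
  - write_file: Save architecture decision records (ADRs) under artifacts/.
  - edit_file: Make architecture-level code changes.
  - task: Delegate concrete implementation to code-editor or another specialist subagent.
  ## Decision principles
  - State the trade-offs of every decision.
  - Always present two or three alternatives and explain the reason for the choice.
  - Do not make speculative decisions of the form "overkill right now, but we will need it later".
  - Gather sufficient evidence with read_file/grep before deciding.
  - Proceed with irreversible changes only after user approval.
  ## Report format
  Decision:
    Choice: [chosen approach]
    Rationale: [specific evidence]
    Alternative A: [approach] - trade-off: [pros and cons]
    Alternative B: [approach] - trade-off: [pros and cons]
    Risks: [known risks]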
allowed_tools:
- read_file
- write_file
- edit_file
- ls
- glob
- grep
- write_todos
- task
deepagents_backend: local_shell
fallback_model: "openrouter:anthropic/claude-sonnet-4-6"
max_cost_per_call_usd: 0.50
model_params:
  max_tokens: 4096
  temperature: 0.2
  top_p: 1.0
interrupt_on:
  execute:
    allowed_decisions: [approve, reject]
  write_file: false
  task:
    allowed_decisions: [approve, reject]
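The `interrupt_on` mapping mixes two value shapes: a dict with `allowed_decisions` and a bare `false`. A minimal sketch of how a harness might interpret it is given below. The function name `decisions_for` is hypothetical, and the behaviour for a bare `true` (all of approve, edit, reject) and for tools missing from the mapping (no interrupt) are assumptions, not something this diff confirms.

```python
from __future__ import annotations

from typing import Any

# Assumption: a bare `true` allows every decision type.
DEFAULT_DECISIONS = ["approve", "edit", "reject"]


def decisions_for(interrupt_on: dict[str, Any], tool_name: str) -> list[str] | None:
    """Return the decisions a human may take, or None if the tool runs unattended."""
    rule = interrupt_on.get(tool_name, False)  # assumption: unlisted means no interrupt
    if rule is False:
        return None
    if rule is True:
        return list(DEFAULT_DECISIONS)
    return list(rule["allowed_decisions"])


# The architect persona's interrupt_on block, as parsed YAML.
architect_interrupt_on = {
    "execute": {"allowed_decisions": ["approve", "reject"]},
    "write_file": False,
    "task": {"allowed_decisions": ["approve", "reject"]},
}
execute_rule = decisions_for(architect_interrupt_on, "execute")
write_rule = decisions_for(architect_interrupt_on, "write_file")
```

Under this reading, the architect pauses before delegating with `task`, whereas the other personas in this commit gate only `execute`.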

@@ -0,0 +1,54 @@
name: openrouter-claude-code-editor
version: 1
description: "Code editing specialist. Strictly follows the order read → plan → edit → verify."
backend: openrouter
model: "openrouter:anthropic/claude-sonnet-4-6"
provider_origin: "US/Anthropic"
capabilities:
- code_edit
- test_first_development
- command_execute
max_risk_level: medium
system_prompt: |
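  You are the Code Editor for my-deepagent. Converse in Korean.
  ## Role
  Modify code safely and accurately. Always follow the order: understand context → plan → edit → verify.
  ## How to use the deepagents tools
  - read_file: Always read the file to be modified and related files first.
  - glob: Find the files affected by the change.
  - grep: Search for usages of functions and variables to determine the impact.
  - write_todos: After understanding the context, always write the edit plan as a numbered list.
  - edit_file: Modify part of an existing file. Make only minimal changes.
  - write_file: Use when creating a new file or rewriting a file entirely.
  - task: Delegate complex subtasks to a subagent.
  - execute: Guide the user through the commands for running tests.
  ## Code editing principles
  - Always read the current code with read_file before modifying it.
  - Write the plan with write_todos, then edit step by step.
  - Do not make overly large changes at once. Proceed incrementally.
  - test_first_development: Write test cases before making the change.
  - After editing, use execute to guide the user through running the tests.
  - Do not declare the work done while TODOs, FIXMEs, or stub code remain.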
allowed_tools:
- read_file
- write_file
- edit_file
- ls
- glob
- grep
- write_todos
- task
deepagents_backend: local_shell
fallback_model: "openrouter:anthropic/claude-haiku-4-5"
max_cost_per_call_usd: 0.15
model_params:
  max_tokens: 4096
  temperature: 0.2
  top_p: 1.0
interrupt_on:
  execute:
    allowed_decisions: [approve, reject]
  write_file: false

@@ -0,0 +1,75 @@
name: openrouter-claude-code-reviewer
version: 1
description: "Senior code reviewer. Writes review.json in the dev/review-finding-batch@1 format."
backend: openrouter
model: "openrouter:anthropic/claude-sonnet-4-6"
provider_origin: "US/Anthropic"
capabilities:
- code_review
- evidence_check
max_risk_level: low
system_prompt: |
  You are the senior Code Reviewer for my-deepagent. Converse in Korean.
  ## Role
  Review code thoroughly and write a review.json that conforms to the dev/review-finding-batch@1 JSON Schema.
  Delegate security-related items to the security-auditor subagent via task.
  ## How to use the deepagents tools
  - write_todos: Before starting the review, always write the review plan as a numbered list.
  - read_file: Read the files under review.
  - glob: Find the list of files to review.
  - grep: Use pattern searches to find potentially problematic code.
  - write_file: Write the finished review.json to artifacts/review.json.
  - task: Delegate the security review to the security-auditor subagent.
  ## Rules for writing review.json
  - runId: UUID format
  - phaseKey: "review"
  - reviewerRole: "code-reviewer"
  - findings: list of issues found, each with the following fields
    - severity: info | low | medium | high | critical
    - category: correctness | evidence | style | security | performance | other
    - summary: summary of the issue (be specific)
    - filePath: path of the affected file (optional)
    - line: affected line number (optional)
    - evidence: code excerpt or explanation as evidence (optional)
    - verifierStatus: "unverified" (initial value)
  - summary: overall review summary (at least 10 characters)
  ## Review principles
  - Do not make subjective criticisms without evidence.
  - Each finding includes a specific file path and line number.
  - Delegate security issues to security-auditor via task.
  - Always save the finished review to artifacts/review.json with write_file.
allowed_tools:
- read_file
- ls
- glob
- grep
- write_todos
- write_file
deepagents_backend: local_shell
fallback_model: "openrouter:anthropic/claude-haiku-4-5"
max_cost_per_call_usd: 0.10
model_params:
  max_tokens: 4096
  temperature: 0.2
  top_p: 1.0
subagents:
  - name: security-auditor
    description: "Isolated review from a security perspective. Uses OWASP categories."
    system_prompt: |
      You are a subagent specializing in security review. Converse in Korean.
      Review the code from an OWASP perspective and report security issues as findings.
      Always prefix each finding's summary with the OWASP category.
      Example: "[A01:Broken Access Control] Admin endpoint has no authentication"
    allowed_tools:
      - read_file
      - glob
      - grep
    model: "openrouter:anthropic/claude-sonnet-4-6"
interrupt_on:
  execute:
    allowed_decisions: [approve, reject]
  write_file: false
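Several personas describe tools in their system prompt that do not appear in `allowed_tools`; the code reviewer above, for example, is told to use `task` but does not list it. The harness may well inject such tools on its own (for instance when `subagents` are declared), so this is not necessarily a bug. The sketch below is a small lint that surfaces such mismatches for review. `KNOWN_TOOLS` and the function name are assumptions assembled from the tool names visible in this diff.

```python
import re

# Tool names that appear in the personas of this commit (assumed complete here).
KNOWN_TOOLS = {
    "read_file", "write_file", "edit_file", "ls", "glob",
    "grep", "write_todos", "task", "execute",
}
# Matches prompt bullets of the form "- tool_name: ...", with optional indentation.
TOOL_BULLET = re.compile(r"^\s*- ([a-z_]+):", re.MULTILINE)


def tools_mentioned_but_not_allowed(system_prompt: str, allowed_tools: list) -> list:
    """Return known tools the prompt documents but allowed_tools omits."""
    mentioned = {name for name in TOOL_BULLET.findall(system_prompt) if name in KNOWN_TOOLS}
    return sorted(mentioned - set(allowed_tools))


# Excerpt of the code-reviewer prompt and its allowed_tools list.
prompt_excerpt = """\
- write_todos: plan the review
- read_file: read the files under review
- glob: find files to review
- grep: search for risky patterns
- write_file: write artifacts/review.json
- task: delegate the security review
- severity: info | low | medium | high | critical
"""
allowed = ["read_file", "ls", "glob", "grep", "write_todos", "write_file"]
mismatches = tools_mentioned_but_not_allowed(prompt_excerpt, allowed)
```

Filtering through `KNOWN_TOOLS` keeps schema field bullets such as `severity` from being reported as tools.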

@@ -0,0 +1,54 @@
name: openrouter-claude-debugger
version: 1
description: "Bug diagnosis specialist. Strictly follows the order reproduce → hypothesize → verify → fix."
backend: openrouter
model: "openrouter:anthropic/claude-sonnet-4-6"
provider_origin: "US/Anthropic"
capabilities:
- code_edit
- evidence_check
- command_execute
max_risk_level: medium
system_prompt: |
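  You are the Debugger for my-deepagent. Converse in Korean.
  ## Role
  Diagnose and fix bugs systematically.
  Always follow the order: reproduce → form a hypothesis → verify the hypothesis → fix.
  ## How to use the deepagents tools
  - write_todos: Before debugging, always write down the reproduction conditions, hypotheses, and verification plan.
  - read_file: Read the file where the bug occurs and related files.
  - glob: Find the range of affected files.
  - grep: Search for related code by error message, function name, or variable name.
  - execute: Guide the user through commands for running tests and checking logs.
  - edit_file: Fix the bug with minimal changes.
  - write_file: Save reproduction scripts or diagnosis results.
  - task: Delegate to the log-analyzer subagent when log analysis is needed.
  ## Debugging principles
  - Do not fix based on guesswork alone. Always verify the hypothesis.
  - When there are several hypotheses, verify the simplest one first.
  - Document the root cause in artifacts/diagnosis.json in the dev/spec@1 format.
  - After the fix, use execute to guide the user through running regression tests.
  - Claiming "the bug is fixed" requires verification by tests.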
allowed_tools:
- read_file
- write_file
- edit_file
- ls
- glob
- grep
- write_todos
- task
deepagents_backend: local_shell
fallback_model: "openrouter:anthropic/claude-haiku-4-5"
max_cost_per_call_usd: 0.15
model_params:
  max_tokens: 4096
  temperature: 0.2
  top_p: 1.0
interrupt_on:
  execute:
    allowed_decisions: [approve, reject]
  write_file: false

@@ -0,0 +1,58 @@
name: openrouter-claude-phase-planner
version: 1
description: "Reads the spec and writes an execution phase plan in the dev/phase-plan@1 format."
backend: openrouter
model: "openrouter:anthropic/claude-sonnet-4-6"
provider_origin: "US/Anthropic"
capabilities:
- phase_planning
- task_dag_planning
max_risk_level: low
system_prompt: |
  You are the Phase Planner for my-deepagent. Converse in Korean.
  ## Role
  Read artifacts/spec.json and write a phase-plan.json that conforms to the dev/phase-plan@1 JSON Schema.
  ## How to use the deepagents tools
  - write_todos: Before starting, always write the plan as a numbered list.
  - read_file: Read artifacts/spec.json and related documents.
  - glob: Find relevant files.
  - grep: Search the codebase for patterns.
  - write_file: Write the finished phase-plan.json to artifacts/phase-plan.json.
  ## Rules for writing phase-plan.json
  - runId: use the same UUID as spec.json
  - phaseKey: "planning"
  - phases: array of execution phases, each with the following fields
    - key: unique phase identifier (lowercase letters and hyphens)
    - title: phase title
    - role: responsible role (spec_writer | reviewer | verifier | debugger | fixer, etc.)
    - instructions: specific instructions for the phase
    - expected_artifact: optional (path, schema)
    - depends_on: optional (list of prerequisite phase keys)
  - estimated_duration_hours: total estimated duration (optional)
  ## Principles
  - Design the phases so that the spec's acceptance_criteria can be met phase by phase.
  - Place phases that can run in parallel without depends_on.
  - Write each phase's instructions concretely so the assignee can understand them clearly.
  - Always save the finished plan to artifacts/phase-plan.json with write_file.
allowed_tools:
- read_file
- write_file
- ls
- glob
- grep
- write_todos
deepagents_backend: local_shell
fallback_model: "openrouter:anthropic/claude-haiku-4-5"
max_cost_per_call_usd: 0.10
model_params:
  max_tokens: 4096
  temperature: 0.2
  top_p: 1.0
interrupt_on:
  execute:
    allowed_decisions: [approve, reject]
  write_file: false

@@ -0,0 +1,61 @@
name: openrouter-claude-security-auditor
version: 1
description: "Security review specialist. Focuses on authentication, authorization, input validation, and secret leakage per the OWASP Top 10."
backend: openrouter
model: "openrouter:anthropic/claude-sonnet-4-6"
provider_origin: "US/Anthropic"
capabilities:
- code_review
- evidence_check
max_risk_level: low
system_prompt: |
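  You are the Security Auditor for my-deepagent. Converse in Korean.
  ## Role
  Analyze code for security vulnerabilities against the OWASP Top 10 and write the findings to security-review.json.
  ## Focus areas
  - A01: Broken Access Control (insufficient authentication or authorization)
  - A02: Cryptographic Failures (encryption, secret leakage)
  - A03: Injection (SQL, command, LDAP, etc.)
  - A05: Security Misconfiguration (configuration errors)
  - A06: Vulnerable Components (supply-chain risk)
  - A07: Authentication Failures (authentication bypass)
  - A09: Security Logging Failures (missing audit logs)
  ## How to use the deepagents tools
  - write_todos: Before starting the audit, always write the audit plan, including its steps, as a numbered list.
  - read_file: Read the files under audit.
  - glob: Find configuration files and authentication-related files.
  - grep: Search for risky patterns (eval, exec, subprocess, os.system, sql, etc.).
  - write_file: Write the finished security-review.json to artifacts/security-review.json.
  ## Rules for writing findings
  - Always prefix the summary with the OWASP category: "[A0X:Category] summary"
  - Judge severity from a CVSS perspective (critical/high/medium/low/info)
  - Use "security" as the category
  - evidence: quote the vulnerable line of code or configuration value directly
  - Do not write speculative findings without evidence.
  ## Principles
  - Search for risky patterns with grep first, then confirm the context with read_file.
  - Focus on hard-coded secrets, leaked environment variables, and unauthorized path access.
  - Always save the finished result with write_file.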
allowed_tools:
- read_file
- glob
- grep
- write_file
- write_todos
deepagents_backend: local_shell
fallback_model: "openrouter:anthropic/claude-haiku-4-5"
max_cost_per_call_usd: 0.10
model_params:
max_tokens: 4096
temperature: 0.2
top_p: 1.0
interrupt_on:
execute:
allowed_decisions: [approve, reject]
write_file: false
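The finding rules in the prompt above (OWASP prefix on `summary`, a fixed severity vocabulary, `category` set to "security", quoted evidence) are mechanical enough to lint. A minimal sketch, assuming a finding is a plain dict with those four keys; `check_finding` and the sample findings are hypothetical and not taken from the dev/review-finding-batch@1 schema:

```python
import re

# "[A0X:Category] summary": A01..A10, a category label, then the summary text.
SUMMARY_PREFIX = re.compile(r"^\[A(0[1-9]|10):[^\]]+\] \S")
ALLOWED_SEVERITIES = {"critical", "high", "medium", "low", "info"}


def check_finding(finding: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the finding passes."""
    problems: list[str] = []
    if not SUMMARY_PREFIX.match(finding.get("summary", "")):
        problems.append("summary must start with '[A0X:Category] '")
    if finding.get("severity") not in ALLOWED_SEVERITIES:
        problems.append("severity must be one of critical/high/medium/low/info")
    if finding.get("category") != "security":
        problems.append("category must be 'security'")
    if not finding.get("evidence"):
        problems.append("evidence must quote the vulnerable line or setting")
    return problems


good = {
    "summary": "[A03:Injection] User input is concatenated into a SQL string",
    "severity": "high",
    "category": "security",
    "evidence": 'cursor.execute("SELECT * FROM users WHERE name = " + name)',
}
bad = {"summary": "SQL injection maybe", "severity": "severe", "category": "security"}
```

Here `check_finding(good)` returns an empty list, while `bad` fails the summary, severity, and evidence rules.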


@@ -0,0 +1,54 @@
name: openrouter-claude-spec-writer
version: 1
description: "Senior spec writer. Requirements analysis → JSON conforming to the dev/spec@1 schema."
backend: openrouter
model: "openrouter:anthropic/claude-sonnet-4-6"
provider_origin: "US/Anthropic"
capabilities:
  - spec_write
  - phase_planning
max_risk_level: low
system_prompt: |
  You are the senior Spec Writer for my-deepagent. Converse in Korean.
  ## Role
  Analyze the user's requirements and write a spec.json that conforms to the dev/spec@1 JSON Schema.
  ## Using the deepagents tools
  - write_todos: Before starting work, always write the plan as a numbered list.
  - read_file: Read existing code and documents to understand the context.
  - glob: List related files.
  - grep: Find specific patterns in the codebase.
  - write_file: Write the finished spec.json to artifacts/spec.json.
  ## Rules for spec.json
  - runId: UUID format (e.g. "00000000-0000-0000-0000-000000000001")
  - phaseKey: key string of the current phase
  - requirements: detailed description of the user's requirements (at least 10 characters)
  - acceptance_criteria: list of acceptance criteria (at least one, stated concretely)
  - approach: description of the implementation approach (at least 10 characters)
  - risks: list of risks (an empty array [] if there are none)
  ## Operating principles
  - Explore the existing codebase thoroughly with read_file/glob/grep before writing the spec.
  - Write acceptance_criteria so that they are measurable and verifiable.
  - For unclear requirements, make reasonable assumptions and state them in the assumptions section.
  - Always save the finished spec to artifacts/spec.json with write_file.
allowed_tools:
  - read_file
  - write_file
  - ls
  - glob
  - grep
  - write_todos
deepagents_backend: local_shell
fallback_model: "openrouter:anthropic/claude-haiku-4-5"
max_cost_per_call_usd: 0.10
model_params:
  max_tokens: 4096
  temperature: 0.2
  top_p: 1.0
interrupt_on:
  execute:
    allowed_decisions: [approve, reject]
  write_file: false
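The spec.json rules in the prompt above translate directly into a few structural checks. A minimal sketch, assuming the spec has already been parsed into a dict; `check_spec` is a hypothetical helper and does not replace validation against the actual dev/spec@1 schema, which is not shown here:

```python
import uuid


def check_spec(spec: dict) -> list[str]:
    """Return rule violations for a parsed spec.json; empty list means it passes."""
    problems: list[str] = []
    try:
        # uuid.UUID is lenient about hyphens and braces; good enough for a sketch.
        uuid.UUID(str(spec.get("runId", "")))
    except ValueError:
        problems.append("runId must be a UUID")
    if not isinstance(spec.get("phaseKey"), str) or not spec["phaseKey"]:
        problems.append("phaseKey must be a non-empty string")
    for name in ("requirements", "approach"):
        value = spec.get(name)
        if not isinstance(value, str) or len(value) < 10:
            problems.append(f"{name} must be at least 10 characters")
    criteria = spec.get("acceptance_criteria")
    if not isinstance(criteria, list) or len(criteria) < 1:
        problems.append("acceptance_criteria needs at least one entry")
    if not isinstance(spec.get("risks"), list):
        problems.append("risks must be a list (use [] when there are none)")
    return problems


# Hypothetical spec used only to exercise the checks.
valid_spec = {
    "runId": "00000000-0000-0000-0000-000000000001",
    "phaseKey": "spec",
    "requirements": "Add retry handling to the HTTP client.",
    "acceptance_criteria": ["Requests are retried at most three times"],
    "approach": "Wrap the client call in a bounded retry loop.",
    "risks": [],
}
```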


@@ -0,0 +1,53 @@
name: openrouter-deepseek-log-analyzer
version: 1
description: "Log file and stack trace analysis. Pattern identification, frequency counts, key-line extraction."
backend: openrouter
model: "openrouter:deepseek/deepseek-chat"
provider_origin: "China/DeepSeek"
capabilities:
  - evidence_check
  - metric_extract
max_risk_level: low
system_prompt: |
  You are the Log Analyzer for my-deepagent. Converse in Korean.
  ## Role
  Analyze log files and stack traces to identify patterns and extract the key information.
  ## Using the deepagents tools
  - write_todos: Before starting the analysis, always write the analysis plan as a numbered list.
  - read_file: Read the log files.
  - glob: List log files (*.log, *.txt, stderr, etc.).
  - grep: Search for error patterns, exception classes, and specific messages.
  - write_file: Write the analysis result to artifacts/log-analysis.json.
  ## What to analyze
  - Frequency counts per error type (most frequent errors first)
  - Stack trace patterns (group traces that share a root cause)
  - Timeline reconstruction (order of events)
  - Key-line extraction (only the lines that actually matter)
  - Related errors (whether one error triggers another)
  ## Output principles
  - Do not summarize the entire original log. Extract only the essentials.
  - Report the most frequent patterns first.
  - Mark guesses clearly with the "추정:" ("estimate:") prefix.
  - Save the finished analysis to artifacts/log-analysis.json with write_file.
allowed_tools:
  - read_file
  - ls
  - glob
  - grep
  - write_file
  - write_todos
deepagents_backend: local_shell
fallback_model: "openrouter:anthropic/claude-haiku-4-5"
max_cost_per_call_usd: 0.005
model_params:
  max_tokens: 4096
  temperature: 0.2
  top_p: 1.0
interrupt_on:
  execute:
    allowed_decisions: [approve, reject]
  write_file: false
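The "frequency counts per error type, most frequent first" item in the prompt above can be illustrated with a small counter. This is only a sketch: the regular expression, the `error_frequencies` helper, and the sample log lines are assumptions for illustration, and real logs would need a pattern tuned to their format:

```python
import re
from collections import Counter

# Matches Python-style exception names such as ValueError or requests.HTTPError.
ERROR_NAME = re.compile(r"\b([A-Za-z_][\w.]*(?:Error|Exception))\b")


def error_frequencies(lines: list[str]) -> list[tuple[str, int]]:
    """Count the first exception name on each line, most frequent first."""
    counts: Counter[str] = Counter()
    for line in lines:
        match = ERROR_NAME.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common()


# Hypothetical log excerpt.
sample_log = [
    "10:00:00 ERROR ValueError: bad input",
    "10:00:01 INFO request served",
    "10:00:02 ERROR KeyError: 'user'",
    "10:00:03 ERROR ValueError: bad input",
]
frequencies = error_frequencies(sample_log)
```

The pattern is case-sensitive, so the log level `ERROR` itself is not counted as an exception name.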


@@ -0,0 +1,54 @@
name: openrouter-deepseek-verifier
version: 1
description: "Independently verifies each finding in review.json and decides its verifierStatus."
backend: openrouter
model: "openrouter:deepseek/deepseek-chat"
provider_origin: "China/DeepSeek"
capabilities:
  - evidence_check
  - objective_eval
max_risk_level: low
system_prompt: |
  You are the Verifier for my-deepagent. Converse in Korean.
  ## Role
  Independently verify each finding in artifacts/review.json against code evidence and
  set its verifierStatus to confirmed or rejected.
  ## Using the deepagents tools
  - write_todos: Before starting verification, always write the list of findings and the verification plan.
  - read_file: Read review.json, then read each finding's filePath to check the evidence.
  - glob: Find related files.
  - grep: Confirm in the actual code the patterns that a finding mentions.
  - write_file: Write the verification result to artifacts/verification.json.
  ## Verification principles
  - Check each finding directly in the code, independently of the others.
  - confirmed: the problem was confirmed to exist in the code
  - rejected: on inspection the problem is absent or already handled
  - State the basis for the verdict in the evidence field (including the code lines checked).
  - Do not decide subjectively without evidence.
  - Save the finished verification result to artifacts/verification.json with write_file.
  ## verification.json format
  Same dev/review-finding-batch@1 format as review.json.
  Update each finding's verifierStatus to confirmed or rejected.
  Change reviewerRole to "verifier".
allowed_tools:
  - read_file
  - ls
  - glob
  - grep
  - write_file
  - write_todos
deepagents_backend: local_shell
fallback_model: "openrouter:anthropic/claude-haiku-4-5"
max_cost_per_call_usd: 0.005
model_params:
  max_tokens: 4096
  temperature: 0.2
  top_p: 1.0
interrupt_on:
  execute:
    allowed_decisions: [approve, reject]
  write_file: false


@@ -0,0 +1,108 @@
name: bug-fix-with-reproduction
version: 1
description: "Bug reproduction → diagnosis → fix → verification. Each phase produces an artifact."
roles:
  - id: reproducer
    required_capabilities:
      - evidence_check
    preferred_backends:
      - openrouter
    fallback_personas:
      - "openrouter-claude-debugger@1"
      - "openrouter-deepseek-log-analyzer@1"
  - id: debugger
    required_capabilities:
      - code_edit
      - evidence_check
      - command_execute
    preferred_backends:
      - openrouter
    fallback_personas:
      - "openrouter-claude-debugger@1"
  - id: fixer
    required_capabilities:
      - code_edit
      - test_first_development
    preferred_backends:
      - openrouter
    fallback_personas:
      - "openrouter-claude-code-editor@1"
  - id: verifier
    required_capabilities:
      - evidence_check
      - objective_eval
    preferred_backends:
      - openrouter
    fallback_personas:
      - "openrouter-deepseek-verifier@1"
phases:
  - key: reproduce
    title: "Reproduce the bug and document the reproduction conditions"
    risk: low
    role: reproducer
    expected_artifact:
      path: artifacts/reproduction.json
      schema: dev/spec@1
    gates:
      - reproduce_approved
    timeout_seconds: 300
    instructions: |
      Reproduce the reported bug and document the reproduction conditions.
      If log files exist, read them with read_file and analyze the patterns.
      Search for related code with glob/grep.
      Save the reproduction conditions, environment, inputs, actual output, and expected
      output in dev/spec@1 format to artifacts/reproduction.json with write_file.
    max_budget_usd: 0.20
  - key: diagnose
    title: "Diagnose the root cause"
    risk: low
    role: debugger
    expected_artifact:
      path: artifacts/diagnosis.json
      schema: dev/spec@1
    gates:
      - diagnose_approved
    timeout_seconds: 360
    instructions: |
      Read artifacts/reproduction.json with read_file and diagnose the root cause.
      Form hypotheses and verify them in the code with read_file/grep.
      Verify the simplest hypothesis first.
      Save the root cause, impact scope, and proposed fix in dev/spec@1 format
      to artifacts/diagnosis.json with write_file.
    max_budget_usd: 0.50
  - key: fix
    title: "Fix the bug"
    risk: medium
    role: fixer
    expected_artifact:
      path: artifacts/fix.json
      schema: dev/spec@1
    gates:
      - fix_approved
    timeout_seconds: 600
    instructions: |
      Read artifacts/diagnosis.json with read_file and fix the root cause.
      Write the test cases before making the fix (test_first_development).
      Apply only the minimal change with edit_file.
      Save the fix description, the list of changed files, and the test command in
      dev/spec@1 format to artifacts/fix.json with write_file.
    max_budget_usd: 1.00
  - key: verify
    title: "Verify the fix"
    risk: low
    role: verifier
    expected_artifact:
      path: artifacts/verification.json
      schema: dev/review-finding-batch@1
    gates:
      - verify_approved
    timeout_seconds: 300
    instructions: |
      Read artifacts/fix.json with read_file and inspect the modified code directly.
      Verify that the reproduction conditions are resolved and that there is no regression risk.
      Save the verification result in dev/review-finding-batch@1 format
      to artifacts/verification.json with write_file.
      verifierStatus: confirmed = fix verified, rejected = fix insufficient.
    max_budget_usd: 0.20
default_gates: []
max_total_budget_usd: 3.0
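Two invariants are implied by the workflow above: every phase names a declared role, and the per-phase budgets fit inside `max_total_budget_usd`. A minimal sketch of such a check, assuming the YAML has already been parsed into a dict; `check_workflow` is a hypothetical helper, and whether the harness enforces the budget sum in exactly this way is an assumption:

```python
def check_workflow(workflow: dict) -> list[str]:
    """Return consistency problems for a parsed workflow template."""
    problems: list[str] = []
    role_ids = {role["id"] for role in workflow["roles"]}
    for phase in workflow["phases"]:
        if phase["role"] not in role_ids:
            problems.append(
                f"phase {phase['key']!r} uses undeclared role {phase['role']!r}"
            )
    total = sum(phase["max_budget_usd"] for phase in workflow["phases"])
    if total > workflow["max_total_budget_usd"]:
        problems.append(
            f"phase budgets sum to {total:.2f} USD, above the "
            f"{workflow['max_total_budget_usd']:.2f} USD total"
        )
    return problems


# Values copied from the bug-fix-with-reproduction template; other keys omitted.
bug_fix = {
    "roles": [{"id": "reproducer"}, {"id": "debugger"}, {"id": "fixer"}, {"id": "verifier"}],
    "phases": [
        {"key": "reproduce", "role": "reproducer", "max_budget_usd": 0.20},
        {"key": "diagnose", "role": "debugger", "max_budget_usd": 0.50},
        {"key": "fix", "role": "fixer", "max_budget_usd": 1.00},
        {"key": "verify", "role": "verifier", "max_budget_usd": 0.20},
    ],
    "max_total_budget_usd": 3.0,
}
```

The four phase budgets add up to about 1.90 USD, which leaves headroom under the 3.0 USD total.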


@@ -0,0 +1,63 @@
name: code-investigation
version: 1
description: "Codebase exploration → summary report. Maps structure, analyzes dependencies, surfaces issues."
roles:
  - id: explorer
    required_capabilities:
      - evidence_check
      - code_review
    preferred_backends:
      - openrouter
    fallback_personas:
      - "openrouter-claude-code-reviewer@1"
      - "openrouter-deepseek-verifier@1"
  - id: summarizer
    required_capabilities:
      - evidence_check
      - final_report_compose
    preferred_backends:
      - openrouter
    fallback_personas:
      - "openrouter-claude-spec-writer@1"
phases:
  - key: explore
    title: "Explore the codebase and gather information"
    risk: low
    role: explorer
    expected_artifact:
      path: artifacts/exploration.json
      schema: dev/spec@1
    gates: []
    timeout_seconds: 600
    instructions: |
      Explore the codebase systematically.
      Map the overall file structure with glob and read the core files with read_file.
      Search for key patterns, dependencies, and entry points with grep.
      Save what you find (structure, main components, dependencies, potential issues)
      in dev/spec@1 format to artifacts/exploration.json with write_file.
      requirements field: purpose of the exploration
      approach field: files explored and the method used
      acceptance_criteria field: key facts discovered
      risks field: potential issues discovered
    max_budget_usd: 0.50
  - key: summarize
    title: "Write the final report on the exploration results"
    risk: low
    role: summarizer
    expected_artifact:
      path: artifacts/report.json
      schema: common/final-report@1
    gates:
      - report_approved
    timeout_seconds: 300
    instructions: |
      Read artifacts/exploration.json with read_file and write the final report
      in common/final-report@1 format.
      status: "completed"
      phases: information about the explore and summarize phases
      findings: convert the risks entries of exploration.json into findings
      artifacts: include the path to exploration.json
      Save the report to artifacts/report.json with write_file.
    max_budget_usd: 0.30
default_gates: []
max_total_budget_usd: 1.0


@@ -0,0 +1,76 @@
name: spec-and-review
version: 1
description: "Requirements → spec → review → verifier validation"
roles:
  - id: spec_writer
    required_capabilities:
      - spec_write
      - phase_planning
    preferred_backends:
      - openrouter
    fallback_personas:
      - "openrouter-claude-spec-writer@1"
  - id: reviewer
    required_capabilities:
      - code_review
      - evidence_check
    preferred_backends:
      - openrouter
    fallback_personas:
      - "openrouter-claude-code-reviewer@1"
  - id: verifier
    required_capabilities:
      - evidence_check
      - objective_eval
    preferred_backends:
      - openrouter
    fallback_personas:
      - "openrouter-deepseek-verifier@1"
phases:
  - key: spec
    title: "Analyze requirements and write the spec"
    risk: low
    role: spec_writer
    expected_artifact:
      path: artifacts/spec.json
      schema: dev/spec@1
    gates:
      - spec_approved
    timeout_seconds: 300
    instructions: |
      Analyze the user's requirements and write a spec.json that conforms to the dev/spec@1 schema.
      Explore the existing code with read_file/glob/grep.
      Save the finished spec.json to artifacts/spec.json with write_file.
    max_budget_usd: 0.50
  - key: review
    title: "Review the spec"
    risk: low
    role: reviewer
    expected_artifact:
      path: artifacts/review.json
      schema: dev/review-finding-batch@1
    gates:
      - review_approved
    timeout_seconds: 300
    instructions: |
      Read artifacts/spec.json with read_file and write review.json in dev/review-finding-batch@1 format.
      Every finding must include severity, category, and summary.
      Save the finished review.json to artifacts/review.json with write_file.
    max_budget_usd: 0.50
  - key: verify
    title: "Verify the review results"
    risk: low
    role: verifier
    expected_artifact:
      path: artifacts/verification.json
      schema: dev/review-finding-batch@1
    gates:
      - verify_approved
    timeout_seconds: 180
    instructions: |
      Read artifacts/review.json with read_file and check each finding directly in the code.
      Set verifierStatus to confirmed or rejected and record the basis in the evidence field.
      Save the result to artifacts/verification.json with write_file.
    max_budget_usd: 0.10
default_gates: []
max_total_budget_usd: 2.0

my-deepagent/mypy.ini Normal file

@@ -0,0 +1,15 @@
[mypy]
python_version = 3.12
strict = true
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
disallow_untyped_decorators = true
plugins = pydantic.mypy
[mypy-tests.*]
disallow_untyped_defs = false
[mypy-alembic.*]
ignore_errors = true


@@ -0,0 +1,58 @@
[project]
name = "my-deepagent"
version = "0.1.0"
description = "Workflow harness, persona library, and OpenRouter integration on top of deepagents"
requires-python = ">=3.12"
dependencies = [
"aiosqlite>=0.20",
"alembic>=1.14",
"greenlet>=3.0",
"sqlalchemy[asyncio]>=2.0",
"httpx>=0.28",
"jsonschema>=4.23",
"keyring>=25.7",
"langchain>=0.3.0,<2.0.0",
"langchain-core>=0.3.0,<2.0.0",
"langchain-openai>=0.3.0,<2.0.0",
"langgraph>=0.2.0",
"langgraph-checkpoint-sqlite>=2.0.0",
"openai>=1.0.0",
"platformdirs>=4.9",
"prompt-toolkit>=3.0",
"pydantic>=2.9",
"pydantic-settings>=2.6",
"pyyaml>=6.0",
"rich>=13.9",
"structlog>=24.4",
"typer>=0.14",
"zstandard>=0.23",
"deepagents>=0.6.1,<0.7.0",
]
[project.scripts]
mydeepagent = "my_deepagent.cli.main:app"
[build-system]
requires = ["uv_build>=0.9.28,<0.10.0"]
build-backend = "uv_build"
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
addopts = "-v --strict-markers"
markers = [
"integration: marks tests as integration tests that make real external API calls (deselect with -m \"not integration\")",
]
[dependency-groups]
dev = [
"mypy>=1.13",
"pre-commit>=4.0",
"pytest>=8.3",
"pytest-asyncio>=0.24",
"pytest-httpx>=0.34",
"respx>=0.21",
"ruff>=0.8",
"types-jsonschema>=4.26.0.20260508",
"types-pyyaml>=6.0.12.20260510",
]

my-deepagent/ruff.toml Normal file

@@ -0,0 +1,12 @@
target-version = "py312"
line-length = 100
[lint]
select = ["E", "W", "F", "I", "N", "B", "UP", "S", "C90", "RUF"]
ignore = ["S101", "S311"]
[lint.per-file-ignores]
"tests/**" = ["S", "B"]
[format]
quote-style = "double"


@@ -0,0 +1,3 @@
"""my-deepagent: workflow harness + persona library + OpenRouter on top of deepagents."""
__version__ = "0.1.0"


@@ -0,0 +1,150 @@
"""Artifact schema registry. Loads JSON Schema 2020-12 documents and validates artifacts.

Schemas live at:
    {data_dir}/artifacts/<schema_id>.json    (user)
    docs/schemas/artifacts/<schema_id>.json  (seed)

where <schema_id> is "<domain>/<name>@<version>" (e.g. "dev/spec@1").
"""

from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

from jsonschema import Draft202012Validator, ValidationError
from jsonschema.exceptions import SchemaError

from .enums import ErrorClass
from .errors import MyDeepAgentError


@dataclass(frozen=True)
class ValidationFinding:
    """One JSON Schema validation error in a structured form."""

    path: str  # JSON pointer-ish: "/findings/0/severity"
    message: str
    validator: str  # "enum", "required", "type", ...
    expected: Any | None


@dataclass(frozen=True)
class ValidationResult:
    ok: bool
    errors: tuple[ValidationFinding, ...] = field(default_factory=tuple)


class ArtifactSchemaRegistry:
    """Loads + caches JSON Schema 2020-12 documents from one or more roots.

    Roots are searched in order; first hit wins.
    """

    def __init__(self, roots: list[Path]) -> None:
        if not roots:
            raise MyDeepAgentError(
                ErrorClass.FATAL,
                "config_invalid",
                message="ArtifactSchemaRegistry requires at least one root",
            )
        self._roots = [Path(r) for r in roots]
        self._cache: dict[str, dict[str, Any]] = {}
        self._validator_cache: dict[str, Draft202012Validator] = {}

    def _resolve_path(self, schema_id: str) -> Path:
        """Try each root for <root>/<schema_id>.json; return first existing."""
        if not schema_id or "/" not in schema_id:
            raise MyDeepAgentError(
                ErrorClass.FATAL,
                "artifact_schema_unknown",
                message=(
                    f"invalid schema_id format: {schema_id!r}"
                    " (expected '<domain>/<name>@<version>')"
                ),
            )
        rel = Path(f"{schema_id}.json")
        for root in self._roots:
            candidate = root / rel
            if candidate.is_file():
                return candidate
        raise MyDeepAgentError(
            ErrorClass.FATAL,
            "artifact_schema_unknown",
            message=(f"schema not found: {schema_id} (searched: {[str(r) for r in self._roots]})"),
            recovery_hint=f"add {schema_id}.json to one of the registry roots",
        )

    def load(self, schema_id: str) -> dict[str, Any]:
        """Return the parsed schema document. Cached after first load."""
        if schema_id in self._cache:
            return self._cache[schema_id]
        path = self._resolve_path(schema_id)
        try:
            raw = path.read_text(encoding="utf-8")
            schema: Any = json.loads(raw)
        except (OSError, json.JSONDecodeError) as e:
            raise MyDeepAgentError(
                ErrorClass.FATAL,
                "artifact_schema_load_failed",
                message=f"failed to load schema {schema_id} from {path}: {e}",
                cause=e,
            ) from e
        if not isinstance(schema, dict):
            raise MyDeepAgentError(
                ErrorClass.FATAL,
                "artifact_schema_load_failed",
                message=f"schema {schema_id} must be a JSON object at {path}",
            )
        # Verify the schema document itself is a valid Draft 2020-12 schema.
        try:
            Draft202012Validator.check_schema(schema)
        except SchemaError as e:
            raise MyDeepAgentError(
                ErrorClass.FATAL,
                "artifact_schema_load_failed",
                message=(f"schema {schema_id} is not a valid Draft 2020-12 schema: {e.message}"),
                cause=e,
            ) from e
        self._cache[schema_id] = schema
        return schema

    def _validator(self, schema_id: str) -> Draft202012Validator:
        if schema_id not in self._validator_cache:
            self._validator_cache[schema_id] = Draft202012Validator(self.load(schema_id))
        return self._validator_cache[schema_id]

    def validate(self, schema_id: str, data: Any) -> ValidationResult:
        """Validate *data* against *schema_id*.

        Returns a structured :class:`ValidationResult`; never raises for
        invalid data. Raises :class:`~my_deepagent.errors.MyDeepAgentError`
        with code ``artifact_schema_unknown`` or ``artifact_schema_load_failed``
        if the schema itself cannot be loaded.
        """
        validator = self._validator(schema_id)
        raw_errors: list[ValidationError] = list(validator.iter_errors(data))
        if not raw_errors:
            return ValidationResult(ok=True)
        findings = tuple(
            ValidationFinding(
                path="/" + "/".join(str(p) for p in err.absolute_path),
                message=err.message,
                validator=str(err.validator),
                expected=err.validator_value,
            )
            for err in raw_errors
        )
        return ValidationResult(ok=False, errors=findings)

    def known_schema_ids(self) -> list[str]:
        """Enumerate all schemas found across all roots. Sorted, deduplicated."""
        seen: set[str] = set()
        for root in self._roots:
            if not root.is_dir():
                continue
            for path in sorted(root.rglob("*.json")):
                rel = path.relative_to(root).with_suffix("")
                # as_posix() keeps ids in "<domain>/<name>@<version>" form on every OS.
                seen.add(rel.as_posix())
        return sorted(seen)
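`validate` above builds a "JSON pointer-ish" path by joining the error path with slashes, which yields "/" for a root-level error and does not escape keys that contain "/" or "~". If strict RFC 6901 pointers were ever wanted, a variant could look like the sketch below; `to_json_pointer` is a hypothetical helper and not part of the registry:

```python
from collections import deque
from collections.abc import Iterable


def to_json_pointer(parts: Iterable[object]) -> str:
    """Build an RFC 6901 JSON Pointer: '~' becomes '~0', '/' becomes '~1'.

    The document root is the empty string, not '/'.
    """
    return "".join(
        "/" + str(part).replace("~", "~0").replace("/", "~1") for part in parts
    )


# jsonschema exposes error paths as a deque of keys and indexes.
nested = to_json_pointer(deque(["findings", 0, "severity"]))
root = to_json_pointer(deque())
escaped = to_json_pointer(["a/b", "c~d"])
```

The order of the two replacements matters: escaping "~" first avoids rewriting the "~" that the "/" escape introduces.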


@@ -0,0 +1,404 @@
"""Persona binding algorithm: auto-select, override, capability/risk validation, consent gate."""

from __future__ import annotations

import fcntl
import json
import os
from collections.abc import Iterator
from contextlib import contextmanager
from dataclasses import dataclass
from datetime import UTC, datetime
from pathlib import Path
from typing import Any, Literal, cast

from .enums import Backend, RiskLevel
from .errors import MyDeepAgentError
from .hash import sha256
from .persona import Persona
from .workflow import WorkflowRole, WorkflowTemplate

ConsentDecision = Literal["approve", "block", "once"]

_RISK_RANK: dict[RiskLevel, int] = {
    RiskLevel.LOW: 0,
    RiskLevel.MEDIUM: 1,
    RiskLevel.HIGH: 2,
}


@dataclass(frozen=True)
class BackendAvailability:
    """Which backends are reachable in the current environment.

    v0.1.0: openrouter availability is determined solely by API-key presence.
    Other backends follow the same pattern: callers populate available_backends.
    """

    available_backends: frozenset[Backend]

    def is_available(self, backend: Backend) -> bool:
        return backend in self.available_backends


@dataclass(frozen=True)
class BindingOverride:
    """Per-role persona override: role_id → "persona-name@version" spec string."""

    persona_pinned: dict[str, str]

    @classmethod
    def parse(cls, raw: dict[str, str] | None) -> BindingOverride:
        return cls(persona_pinned=dict(raw or {}))


@dataclass(frozen=True)
class Binding:
    """Resolved binding of a single workflow role to a concrete persona."""

    role_id: str
    persona: Persona
    binding_hash: str


def is_persona_eligible_for_role(
    persona: Persona,
    role: WorkflowRole,
    template: WorkflowTemplate,
) -> tuple[bool, str | None]:
    """Return (eligible, reason_if_not).

    Checks three conditions in order:
    1. The persona has all capabilities required by the role.
    2. The persona's allowed_roles (if set) includes this role.
    3. The persona's max_risk_level covers the highest phase risk for this role.
    """
    required = set(role.required_capabilities)
    have = set(persona.capabilities)
    if not required.issubset(have):
        missing = required - have
        return False, f"missing capabilities: {sorted(c.value for c in missing)}"
    if persona.allowed_roles is not None and role.id not in persona.allowed_roles:
        return False, f"role {role.id!r} not in persona.allowed_roles"
    max_phase_risk = max(
        (ph.risk for ph in template.phases if ph.role == role.id),
        default=RiskLevel.LOW,
    )
    if _RISK_RANK[max_phase_risk] > _RISK_RANK[persona.max_risk_level]:
        return (
            False,
            (
                f"phase risk {max_phase_risk.value} > "
                f"persona max_risk_level {persona.max_risk_level.value}"
            ),
        )
    return True, None


def _auto_select(candidates: list[Persona], role: WorkflowRole) -> Persona:
    """Deterministic selection from eligible candidates.

    Priority (ascending sort key):
    1. preferred_backends index (lower = more preferred; non-preferred → last)
    2. version descending (higher = newer)
    3. name ascending (alphabetical tiebreak)
    4. compute_hash ascending (hash tiebreak for identical name+version)
    """

    def _key(p: Persona) -> tuple[int, int, str, str]:
        try:
            pref_idx = role.preferred_backends.index(p.backend)
        except ValueError:
            pref_idx = len(role.preferred_backends) + 1
        return (pref_idx, -p.version, p.name, p.compute_hash())

    return sorted(candidates, key=_key)[0]


class PersonaConsentStore:
    """Crash-safe + multi-process-safe JSON file store for per-persona consent decisions.

    Storage: {path} -> {"<persona_hash>": {"decision": "approve|block|once", "decided_at": "..."}}

    Concurrency guarantees:
    * Writes are atomic via tmp-file + fsync + os.replace (POSIX rename is atomic).
    * Cross-process safety via advisory ``fcntl.flock`` on a lock-file at ``{path}.lock``.
      ``set()`` / ``revoke()`` hold an exclusive lock for the read-modify-write cycle;
      ``get()`` uses a shared lock for consistent reads. This prevents lost-update
      races between concurrent ``mydeepagent`` invocations on the same machine.
    """

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock_path = path.with_suffix(path.suffix + ".lock")

    @contextmanager
    def _flock(self, exclusive: bool) -> Iterator[None]:
        """Acquire a POSIX advisory lock for the duration of the block."""
        self._lock_path.parent.mkdir(parents=True, exist_ok=True)
        fd = os.open(self._lock_path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
            try:
                yield
            finally:
                fcntl.flock(fd, fcntl.LOCK_UN)
        finally:
            os.close(fd)

    def _load(self) -> dict[str, Any]:
        if not self._path.is_file():
            return {}
        try:
            raw = self._path.read_text(encoding="utf-8")
            data: object = json.loads(raw) if raw.strip() else {}
        except (OSError, json.JSONDecodeError) as e:
            raise MyDeepAgentError.fatal(
                "internal_state_corruption",
                message=f"failed to read consent store at {self._path}: {e}",
                recovery_hint=(
                    f"delete {self._path} and re-run; "
                    "previously granted consents will be re-prompted"
                ),
                cause=e,
            ) from e
        if not isinstance(data, dict):
            raise MyDeepAgentError.fatal(
                "internal_state_corruption",
                message=f"consent store must be a JSON object: {self._path}",
            )
        return data

    def _write(self, data: dict[str, Any]) -> None:
        """Atomic crash-safe write. Caller must already hold the exclusive flock."""
        self._path.parent.mkdir(parents=True, exist_ok=True)
        tmp = self._path.with_suffix(self._path.suffix + ".tmp")
        payload = json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False)
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, payload.encode("utf-8"))
            os.fsync(fd)
        finally:
            os.close(fd)
        os.replace(tmp, self._path)

    def get(self, persona_hash: str) -> ConsentDecision | None:
        """Return stored decision or None if absent / unrecognised."""
        with self._flock(exclusive=False):
            entry = self._load().get(persona_hash)
        if entry is None:
            return None
        decision = entry.get("decision") if isinstance(entry, dict) else None
        if decision not in ("approve", "block", "once"):
            return None
        return cast(ConsentDecision, decision)

    def set(self, persona_hash: str, decision: ConsentDecision) -> None:
        """Persist a consent decision. Exclusive lock + atomic write."""
        with self._flock(exclusive=True):
            data = self._load()
            data[persona_hash] = {
                "decision": decision,
                "decided_at": datetime.now(UTC).isoformat(timespec="seconds"),
            }
            self._write(data)

    def revoke(self, persona_hash: str) -> None:
        """Remove a previously stored consent decision. Exclusive lock. No-op if absent."""
        with self._flock(exclusive=True):
            data = self._load()
            data.pop(persona_hash, None)
            self._write(data)


def filter_consented_personas(
    personas: list[Persona],
    consent_store: PersonaConsentStore,
) -> list[Persona]:
    """Remove personas whose consent decision is 'block'.

    'approve', 'once', and absent (None) decisions all allow the persona through.
    """
    return [p for p in personas if consent_store.get(p.compute_hash()) != "block"]


def _parse_override_version(pinned_spec: str, version_str: str) -> int | None:
    """Parse the version component of an override spec. None if empty, raise otherwise."""
    if not version_str:
        return None
    try:
        return int(version_str)
    except ValueError as e:
        raise MyDeepAgentError.human_required(
            "no_eligible_persona",
            message=(f"override spec '{pinned_spec}' has non-integer version '{version_str}'"),
            recovery_hint="use the format '<persona-name>@<integer-version>'",
            cause=e,
        ) from e


def _resolve_override(
    role: WorkflowRole,
    template: WorkflowTemplate,
    pinned_spec: str,
    eligible: list[Persona],
    persona_pool: list[Persona],
    consent_store: PersonaConsentStore,
) -> Persona:
    """Resolve an override spec to a single eligible persona or raise human_required."""
    name, _, version_str = pinned_spec.partition("@")
    version = _parse_override_version(pinned_spec, version_str)
    matches = [p for p in eligible if p.name == name and (version is None or p.version == version)]
    if matches:
        return matches[0] if len(matches) == 1 else _auto_select(matches, role)
    # Distinguish: blocked vs. ineligible vs. simply absent.
    pool_matches = [
        p for p in persona_pool if p.name == name and (version is None or p.version == version)
    ]
    if any(consent_store.get(p.compute_hash()) == "block" for p in pool_matches):
        raise MyDeepAgentError.human_required(
            "persona_blocked_by_user",
            message=f"override persona '{pinned_spec}' is consent-blocked",
            recovery_hint="run `mydeepagent consents revoke <persona>` to clear the block",
        )
    if pool_matches:
        _, reason = is_persona_eligible_for_role(pool_matches[0], role, template)
        raise MyDeepAgentError.human_required(
            "no_eligible_persona",
            message=(
                f"override persona '{pinned_spec}' is ineligible for role '{role.id}': {reason}"
            ),
        )
    raise MyDeepAgentError.human_required(
        "no_eligible_persona",
        message=f"no eligible persona matches override '{pinned_spec}' for role '{role.id}'",
    )


def _resolve_auto(
    role: WorkflowRole,
    template: WorkflowTemplate,
    eligible: list[Persona],
    persona_pool: list[Persona],
    consent_store: PersonaConsentStore,
) -> Persona:
    """Auto-select from eligible or raise human_required with diagnostic context."""
    if eligible:
        return _auto_select(eligible, role)
    any_blocked = any(
        is_persona_eligible_for_role(p, role, template)[0]
        and consent_store.get(p.compute_hash()) == "block"
        for p in persona_pool
    )
    if any_blocked:
        raise MyDeepAgentError.human_required(
            "persona_blocked_by_user",
            message=(f"all eligible personas for role '{role.id}' are blocked by user consent"),
        )
    raise MyDeepAgentError.human_required(
        "no_eligible_persona",
        message=f"no eligible persona for role '{role.id}'",
        recovery_hint=(
            f"add a persona with capabilities "
            f"{sorted(c.value for c in role.required_capabilities)} "
            "to docs/schemas/personas/"
        ),
    )


def bind_personas(
    template: WorkflowTemplate,
    persona_pool: list[Persona],
    available_backends: BackendAvailability,
    consent_store: PersonaConsentStore,
    override: BindingOverride | None = None,
) -> dict[str, Binding]:
    """Bind each workflow role to a concrete persona.

    Resolution order per role:
    1. Apply consent filter (remove 'block' personas).
    2. Apply eligibility filter (capabilities, allowed_roles, risk level).
    3. If override is set for this role, pick the pinned persona from eligible.
    4. Otherwise, auto_select from eligible.
    5. Validate backend availability.
    6. Validate openrouter model non-empty.

    Raises:
        MyDeepAgentError (human_required, 'no_eligible_persona'): no match found.
        MyDeepAgentError (human_required, 'persona_blocked_by_user'): all candidates blocked.
        MyDeepAgentError (human_required, 'backend_unavailable'): backend not in environment.
        MyDeepAgentError (human_required, 'model_unavailable'): openrouter model is blank.
    """
    _override = override or BindingOverride.parse(None)
    consented_pool = filter_consented_personas(persona_pool, consent_store)
    bindings: dict[str, Binding] = {}
    for role in template.roles:
        eligible: list[Persona] = [
            p for p in consented_pool if is_persona_eligible_for_role(p, role, template)[0]
        ]
        if role.id in _override.persona_pinned:
            chosen = _resolve_override(
                role,
                template,
                _override.persona_pinned[role.id],
                eligible,
                persona_pool,
                consent_store,
            )
        else:
            chosen = _resolve_auto(role, template, eligible, persona_pool, consent_store)
        # Backend availability check
        if not available_backends.is_available(chosen.backend):
            raise MyDeepAgentError.human_required(
                "backend_unavailable",
                message=(
                    f"backend '{chosen.backend.value}' is not available "
                    f"for persona '{chosen.name}@{chosen.version}'"
                ),
                recovery_hint=_backend_recovery_hint(chosen.backend),
            )
        # Openrouter model non-empty check
        if chosen.backend == Backend.OPENROUTER and not chosen.model.strip():
            raise MyDeepAgentError.human_required(
                "model_unavailable",
                message=(
                    f"persona '{chosen.name}@{chosen.version}' "
                    "has empty model for openrouter backend"
                ),
                recovery_hint=(
                    "set `model:` field in the persona yaml "
                    "(e.g. 'openrouter:deepseek/deepseek-chat')"
                ),
            )
        binding_hash = sha256(
            {
                "role_id": role.id,
                "template_name": template.name,
                "template_version": template.version,
                "persona_hash": chosen.compute_hash(),
                "backend": chosen.backend.value,
            }
        )
        bindings[role.id] = Binding(role_id=role.id, persona=chosen, binding_hash=binding_hash)
    return bindings


def _backend_recovery_hint(backend: Backend) -> str:
    if backend == Backend.OPENROUTER:
        return "run `mydeepagent login openrouter` to register an API key"
    if backend in (Backend.ANTHROPIC, Backend.OPENAI, Backend.GOOGLE):
        return f"run `mydeepagent login {backend.value}` to register an API key"
    if backend == Backend.FAKE:
        return (
            "the 'fake' backend is for tests only; "
            "add Backend.FAKE to the BackendAvailability set in your test harness"
        )
    return f"enable backend '{backend.value}' in config and ensure prerequisites"
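The ordering used by `_auto_select` (preferred backend first, then newest version, then name, then hash) can be tried in isolation. The sketch below is a simplified stand-in under stated assumptions: `Candidate` is a hypothetical replacement for `Persona`, `digest` stands in for `compute_hash()`, and `min` is used where the module sorts and takes the first element, which selects the same item:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for Persona with only the fields the sort key needs."""

    name: str
    version: int
    backend: str
    digest: str


def auto_select(candidates: list[Candidate], preferred_backends: list[str]) -> Candidate:
    def key(c: Candidate) -> tuple[int, int, str, str]:
        try:
            idx = preferred_backends.index(c.backend)
        except ValueError:
            idx = len(preferred_backends) + 1  # non-preferred backends sort last
        # Negating the version makes higher versions sort first.
        return (idx, -c.version, c.name, c.digest)

    return min(candidates, key=key)


pool = [
    Candidate("b-reviewer", 1, "openrouter", "bb"),
    Candidate("a-reviewer", 1, "openrouter", "aa"),
    Candidate("a-reviewer", 2, "anthropic", "cc"),
    Candidate("z-reviewer", 2, "openrouter", "dd"),
]
chosen = auto_select(pool, ["openrouter"])
chosen_reversed = auto_select(list(reversed(pool)), ["openrouter"])
```

Because the key is a total order over distinct candidates, the result does not depend on the order of the input list, which is what makes the binding reproducible.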


@@ -0,0 +1 @@
"""CLI doctor command for environment diagnostics. Implemented in Step 12."""


@@ -0,0 +1 @@
"""CLI interactive subcommand. Implemented in Step 10."""


@@ -0,0 +1 @@
"""Typer CLI entry point. Filled in Step 6."""


@@ -0,0 +1 @@
"""CLI run command implementation. Implemented in Step 6."""


@@ -0,0 +1 @@
"""CLI seed command for importing persona/workflow YAML assets. Implemented in Step 6."""


@@ -0,0 +1 @@
"""CLI stats command for usage summary. Implemented in Step 12."""


@@ -0,0 +1,109 @@
"""Application configuration loaded from env, .env, and TOML file via pydantic-settings."""

from __future__ import annotations

from pathlib import Path
from typing import Literal

from platformdirs import PlatformDirs
from pydantic import Field, ValidationError, field_validator
from pydantic_settings import (
    BaseSettings,
    PydanticBaseSettingsSource,
    SettingsConfigDict,
    TomlConfigSettingsSource,
)

from .enums import ErrorClass
from .errors import MyDeepAgentError

_DIRS = PlatformDirs("my-deepagent", "user", roaming=False)


class Config(BaseSettings):
    """Frozen application config. Source priority (high -> low): CLI/env, .env, TOML, defaults."""

    model_config = SettingsConfigDict(
        env_prefix="MYDEEPAGENT_",
        env_file=".env",
        env_file_encoding="utf-8",
        toml_file=Path(_DIRS.user_config_dir) / "config.toml",
        frozen=True,
        extra="ignore",
    )

    # storage
    database_url: str = Field(
        default_factory=lambda: (
            f"sqlite+aiosqlite:///{Path(_DIRS.user_data_dir) / 'database.sqlite3'}"
        )
    )
    workspace_root: Path = Field(default_factory=Path.cwd)
    data_dir: Path = Field(default_factory=lambda: Path(_DIRS.user_data_dir))
    config_dir: Path = Field(default_factory=lambda: Path(_DIRS.user_config_dir))
    state_dir: Path = Field(default_factory=lambda: Path(_DIRS.user_state_dir))

    # logging / i18n
    log_level: Literal["trace", "debug", "info", "warn", "error"] = "info"
    lang: Literal["ko", "en"] = "ko"

    # providers
    openrouter_api_key: str | None = None
    openrouter_base_url: str = "https://openrouter.ai/api/v1"

    # observability
    langsmith_tracing: bool = False
    langsmith_api_key: str | None = None
    langsmith_project: str = "my-deepagent"

    # budget
    budget_daily_usd: float = Field(default=5.0, ge=0)
    budget_daily_warn_usd: float = Field(default=3.0, ge=0)
    budget_run_usd: float = Field(default=1.0, ge=0)
    budget_run_warn_usd: float = Field(default=0.5, ge=0)
    budget_on_hit: Literal["prompt", "block", "warn_continue"] = "prompt"

    # defaults
    default_persona: str = "default-interactive"

    @field_validator("workspace_root", "data_dir", "config_dir", "state_dir")
    @classmethod
    def _expand(cls, v: Path) -> Path:
        return Path(v).expanduser().resolve()

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls: type[BaseSettings],
        init_settings: PydanticBaseSettingsSource,
        env_settings: PydanticBaseSettingsSource,
        dotenv_settings: PydanticBaseSettingsSource,
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        # priority: init > env > dotenv > toml > file secrets > field defaults
        return (
            init_settings,
            env_settings,
            dotenv_settings,
            TomlConfigSettingsSource(settings_cls),
            file_secret_settings,
        )


def load_config(**overrides: object) -> Config:
    """Load Config with optional kwargs override.

    Wraps pydantic ValidationError in MyDeepAgentError(fatal, config_invalid) per plan §18.
    """
    try:
        return Config(**overrides)  # type: ignore[arg-type]
    except ValidationError as e:
        raise MyDeepAgentError(
            ErrorClass.FATAL,
            "config_invalid",
            message=f"config validation failed: {e}",
            recovery_hint=(
                "check .env, environment variables, and ~/.config/my-deepagent/config.toml"
            ),
            cause=e,
        ) from e
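
The source ordering returned by `settings_customise_sources` can be illustrated without pydantic-settings. The sketch below is a stdlib-only approximation built on `collections.ChainMap`; every dict name and value in it is hypothetical, and it does not reproduce pydantic's validation, type coercion, or the `MYDEEPAGENT_` prefix handling.

```python
from collections import ChainMap

# Hypothetical layers, highest priority first: init > env > dotenv > toml > defaults.
init_kwargs = {"lang": "en"}
env_vars = {"log_level": "debug", "lang": "ko"}
dotenv_vars = {"budget_run_usd": 2.0}
toml_values = {"log_level": "warn", "default_persona": "reviewer"}
defaults = {
    "lang": "ko",
    "log_level": "info",
    "budget_run_usd": 1.0,
    "default_persona": "default-interactive",
}

# ChainMap looks a key up in each mapping in order and returns the first hit,
# so a higher-priority layer shadows every layer after it.
resolved = dict(ChainMap(init_kwargs, env_vars, dotenv_vars, toml_values, defaults))
print(resolved)
```

Here `lang` comes from the init kwargs, `log_level` from the environment layer (shadowing the TOML value), `budget_run_usd` from the dotenv layer, and `default_persona` from TOML.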

@@ -0,0 +1 @@
"""LangGraph run engine orchestrator. Implemented in Step 7."""

@@ -0,0 +1,92 @@
"""All closed-set enums used across the codebase."""

from enum import StrEnum


class Backend(StrEnum):
    OPENROUTER = "openrouter"
    ANTHROPIC = "anthropic"
    OPENAI = "openai"
    GOOGLE = "google"
    FAKE = "fake"


class Capability(StrEnum):
    SPEC_WRITE = "spec_write"
    PHASE_PLANNING = "phase_planning"
    TASK_DAG_PLANNING = "task_dag_planning"
    CODE_EDIT = "code_edit"
    TEST_FIRST_DEVELOPMENT = "test_first_development"
    CODE_REVIEW = "code_review"
    EVIDENCE_CHECK = "evidence_check"
    COMMAND_EXECUTE = "command_execute"
    BACKTEST_RUN = "backtest_run"
    METRIC_EXTRACT = "metric_extract"
    FAILURE_MINING = "failure_mining"
    OBJECTIVE_EVAL = "objective_eval"
    FINAL_REPORT_COMPOSE = "final_report_compose"


class RiskLevel(StrEnum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class ApprovalDecisionAction(StrEnum):
    APPROVE = "approve"
    REJECT = "reject"
    REQUEST_CHANGES = "request_changes"
    ABORT = "abort"


class ApprovalState(StrEnum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    CHANGES_REQUESTED = "changes_requested"
    ABORTED = "aborted"
    PAUSED = "paused"


class RunState(StrEnum):
    CREATED = "created"
    BOUND = "bound"
    PLANNING = "planning"
    AWAITING_APPROVAL = "awaiting_approval"
    EXECUTING = "executing"
    PAUSED = "paused"
    COMPLETED = "completed"
    FAILED = "failed"
    ABORTED = "aborted"


class RunPhaseState(StrEnum):
    PENDING = "pending"
    RUNNING = "running"
    AWAITING_ARTIFACT = "awaiting_artifact"
    VALIDATING = "validating"
    AWAITING_APPROVAL = "awaiting_approval"
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"


class SessionState(StrEnum):
    CREATED = "CREATED"
    BOOTSTRAPPING = "BOOTSTRAPPING"
    READY = "READY"
    BUSY = "BUSY"
    WAITING_FOR_APPROVAL = "WAITING_FOR_APPROVAL"
    ARTIFACT_TIMEOUT = "ARTIFACT_TIMEOUT"
    HUNG = "HUNG"
    CRASHED = "CRASHED"
    RESUMING = "RESUMING"
    REBOOTSTRAPPED = "REBOOTSTRAPPED"
    FAILED_NEEDS_HUMAN = "FAILED_NEEDS_HUMAN"


class ErrorClass(StrEnum):
    RECOVERABLE = "recoverable"
    HUMAN_REQUIRED = "human_required"
    FATAL = "fatal"

@@ -0,0 +1,79 @@
"""Domain errors. All exceptions raised by my-deepagent inherit MyDeepAgentError."""

from __future__ import annotations

from uuid import UUID

from .enums import ErrorClass


class MyDeepAgentError(Exception):
    """Base error with structured fields for classification, recovery hint, and context."""

    def __init__(
        self,
        error_class: ErrorClass,
        code: str,
        *,
        message: str | None = None,
        run_id: UUID | None = None,
        phase_id: UUID | None = None,
        recovery_hint: str | None = None,
        cause: BaseException | None = None,
    ) -> None:
        super().__init__(message or code)
        self.error_class = error_class
        self.code = code
        self.run_id = run_id
        self.phase_id = phase_id
        self.recovery_hint = recovery_hint
        if cause is not None:
            self.__cause__ = cause
            self.__suppress_context__ = True

    def __repr__(self) -> str:
        parts = [f"class={self.error_class}", f"code={self.code}"]
        if self.run_id is not None:
            parts.append(f"run_id={self.run_id}")
        if self.phase_id is not None:
            parts.append(f"phase_id={self.phase_id}")
        if self.recovery_hint:
            parts.append(f"hint={self.recovery_hint!r}")
        return f"MyDeepAgentError({', '.join(parts)})"

    @classmethod
    def recoverable(cls, code: str, **kwargs: object) -> MyDeepAgentError:
        return MyDeepAgentError(ErrorClass.RECOVERABLE, code, **kwargs)  # type: ignore[arg-type]

    @classmethod
    def human_required(cls, code: str, **kwargs: object) -> MyDeepAgentError:
        return MyDeepAgentError(ErrorClass.HUMAN_REQUIRED, code, **kwargs)  # type: ignore[arg-type]

    @classmethod
    def fatal(cls, code: str, **kwargs: object) -> MyDeepAgentError:
        return MyDeepAgentError(ErrorClass.FATAL, code, **kwargs)  # type: ignore[arg-type]


class BudgetExhaustedError(MyDeepAgentError):
    """Budget cap hit. Raised by BudgetTracker.assert_can_call when on_hit='block'."""

    def __init__(
        self,
        scope: str,
        projected_usd: float,
        cap_usd: float,
        *,
        run_id: UUID | None = None,
        recovery_hint: str | None = None,
    ) -> None:
        super().__init__(
            ErrorClass.HUMAN_REQUIRED,
            "budget_exhausted",
            message=f"budget '{scope}' exhausted: projected={projected_usd:.4f} cap={cap_usd:.4f}",
            run_id=run_id,
            recovery_hint=recovery_hint
            or f"wait until the next period or extend the cap for scope '{scope}'",
        )
        self.scope = scope
        self.projected_usd = projected_usd
        self.cap_usd = cap_usd
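
The cause handling in `MyDeepAgentError.__init__` follows Python's exception-chaining rules. The sketch below isolates that behaviour; `DomainError` and `parse_port` are hypothetical stand-ins written for this illustration, not part of the package.

```python
from __future__ import annotations


class DomainError(Exception):
    """Hypothetical stand-in for MyDeepAgentError, reduced to the chaining logic."""

    def __init__(self, code: str, *, cause: BaseException | None = None) -> None:
        super().__init__(code)
        self.code = code
        if cause is not None:
            # Assigning __cause__ also sets __suppress_context__ to True,
            # so the traceback shows "direct cause" instead of "during handling".
            self.__cause__ = cause


def parse_port(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as e:
        # `from e` sets __cause__ as well; passing cause=e keeps it available
        # even when the error is constructed without being raised immediately.
        raise DomainError("config_invalid", cause=e) from e


try:
    parse_port("not-a-number")
except DomainError as err:
    caught = err

print(type(caught.__cause__).__name__, caught.__suppress_context__)
```

Because assigning `__cause__` already flips `__suppress_context__`, the explicit assignment in the class above is redundant but harmless.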

@@ -0,0 +1,28 @@
"""Canonical JSON serialization + sha256 hashing for content-addressed identity."""

from __future__ import annotations

import hashlib
import json
from typing import Any


def canonicalize(value: Any) -> str:
    """Return canonical JSON: keys sorted, no insignificant whitespace.

    ``json.dumps(sort_keys=True)`` sorts keys with Python's ``str`` comparison, which
    orders by Unicode code point. The target order is UTF-16 code unit order. The two
    agree whenever no key contains a character outside the BMP (above U+FFFF); such a
    character is encoded as a surrogate pair and can sort before BMP characters in the
    U+E000..U+FFFF range. For keys outside the BMP this is a documented approximation.
    """
    return json.dumps(
        value,
        sort_keys=True,
        ensure_ascii=False,
        separators=(",", ":"),
        allow_nan=False,
    )


def sha256(value: Any) -> str:
    """Return sha256 hex digest of canonical JSON of value."""
    return hashlib.sha256(canonicalize(value).encode("utf-8")).hexdigest()
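
A short stdlib-only check of the two properties the docstring relies on: insertion order does not affect the digest, and code point order diverges from UTF-16 code unit order only for keys outside the BMP. The sample dicts and the `utf-16-be` sort key are illustrative assumptions, not code from the package.

```python
import hashlib
import json


def canonicalize(value):
    # Same json.dumps arguments as the module above.
    return json.dumps(
        value, sort_keys=True, ensure_ascii=False, separators=(",", ":"), allow_nan=False
    )


a = {"b": 1, "a": [1, 2], "c": {"y": None, "x": True}}
b = {"c": {"x": True, "y": None}, "a": [1, 2], "b": 1}

digest_a = hashlib.sha256(canonicalize(a).encode("utf-8")).hexdigest()
digest_b = hashlib.sha256(canonicalize(b).encode("utf-8")).hexdigest()

# U+E000 is a BMP character; U+10000 is encoded in UTF-16 as the pair D800 DC00.
keys = ["\ue000", "\U00010000"]
by_code_point = sorted(keys)
# Big-endian UTF-16 bytes compare in the same order as the 16-bit code units.
by_utf16_unit = sorted(keys, key=lambda k: k.encode("utf-16-be"))

print(canonicalize(a))
print(digest_a == digest_b, by_code_point == by_utf16_unit)
```

If strict UTF-16 ordering ever matters for non-BMP keys, a key function like the one above would be one way to pre-sort before serialising.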

@@ -0,0 +1 @@
"""Interactive REPL loop for TUI sessions. Implemented in Step 10."""

@@ -0,0 +1,73 @@
"""AuditToolMiddleware: capture every tool call for audit log + DB.
Records: name, args, result/error, duration.
"""

from __future__ import annotations

import time
from typing import Any
from uuid import UUID

from langchain.agents.middleware import AgentMiddleware


class AuditToolMiddleware(AgentMiddleware):
    """Record every tool invocation for the audit log and DB sink (Step 8)."""

    def __init__(
        self,
        run_id: UUID | None = None,
        phase_id: UUID | None = None,
        interactive_session_id: UUID | None = None,
        recorder: Any | None = None,
    ) -> None:
        super().__init__()
        self.run_id = run_id
        self.phase_id = phase_id
        self.interactive_session_id = interactive_session_id
        self.recorder = recorder

    async def awrap_tool_call(self, request: Any, handler: Any) -> Any:
        started = time.perf_counter()
        # ToolCallRequest exposes tool_call dict with 'name' and 'args'
        tool_call = getattr(request, "tool_call", {}) or {}
        name: str = tool_call.get("name", "unknown") if isinstance(tool_call, dict) else "unknown"
        args: dict[str, Any] = (
            tool_call.get("args", {}) if isinstance(tool_call, dict) else {}
        ) or {}
        try:
            result = await handler(request)
        except Exception as e:
            await self._record(name, args, None, type(e).__name__, started)
            raise
        await self._record(name, args, result, None, started)
        return result

    async def _record(
        self,
        name: str,
        args: dict[str, Any],
        result: Any,
        error: str | None,
        started: float,
    ) -> None:
        if self.recorder is None:
            return
        serializable_result: str | int | float | bool | dict[str, Any] | list[Any] | None
        if isinstance(result, (str, int, float, bool, dict, list)) or result is None:
            serializable_result = result
        else:
            serializable_result = str(result)
        await self.recorder(
            {
                "tool_name": name,
                "args": args,
                "result": serializable_result,
                "error": error,
                "duration_ms": int((time.perf_counter() - started) * 1000),
                "run_id": self.run_id,
                "phase_id": self.phase_id,
                "interactive_session_id": self.interactive_session_id,
            }
        )

@@ -0,0 +1,87 @@
"""CostMiddleware: capture every LLM call's usage and accumulate cost into the SQLite ledger."""

from __future__ import annotations

import time
from typing import Any
from uuid import UUID

from langchain.agents.middleware import AgentMiddleware

from ..monitoring.pricing import PricingCache


class CostMiddleware(AgentMiddleware):
    """Wrap every model call. Compute cost from usage_metadata and persist.

    Step 8 wires the DB writer via the recorder callback.
    """

    def __init__(
        self,
        pricing: PricingCache,
        model_name: str,
        run_id: UUID | None = None,
        phase_id: UUID | None = None,
        persona_name: str | None = None,
        recorder: Any | None = None,  # callable(record) -> Awaitable[None] for DB sink (Step 8)
    ) -> None:
        super().__init__()
        self.pricing = pricing
        self.model_name = model_name
        self.run_id = run_id
        self.phase_id = phase_id
        self.persona_name = persona_name
        self.recorder = recorder

    async def awrap_model_call(self, request: Any, handler: Any) -> Any:
        started = time.perf_counter()
        try:
            response = await handler(request)
        except Exception as e:
            await self._record(
                input_tokens=0,
                output_tokens=0,
                latency_ms=int((time.perf_counter() - started) * 1000),
                status="error",
                error_code=type(e).__name__,
            )
            raise
        usage = getattr(response, "usage_metadata", None) or {}
        in_tokens = int(usage.get("input_tokens", 0) or 0)
        out_tokens = int(usage.get("output_tokens", 0) or 0)
        await self._record(
            input_tokens=in_tokens,
            output_tokens=out_tokens,
            latency_ms=int((time.perf_counter() - started) * 1000),
            status="ok",
            error_code=None,
        )
        return response

    async def _record(
        self,
        *,
        input_tokens: int,
        output_tokens: int,
        latency_ms: int,
        status: str,
        error_code: str | None,
    ) -> None:
        if self.recorder is None:
            return
        cost = self.pricing.compute_cost(self.model_name, input_tokens, output_tokens)
        await self.recorder(
            {
                "model": self.model_name,
                "run_id": self.run_id,
                "phase_id": self.phase_id,
                "persona_name": self.persona_name,
                "input_tokens": input_tokens,
                "output_tokens": output_tokens,
                "cost_usd_total": cost,
                "latency_ms": latency_ms,
                "status": status,
                "error_code": error_code,
            }
        )

@@ -0,0 +1,47 @@
"""FallbackModelMiddleware: retry the model call with a different model on transient HTTP errors."""

from __future__ import annotations

from typing import Any

import httpx
import openai
from langchain.agents.middleware import AgentMiddleware


class FallbackModelMiddleware(AgentMiddleware):
    """When the primary model raises a transient error, retry once with the fallback model.

    Transient = HTTP 429, 5xx, network errors. Auth (401/AuthenticationError) and bad request
    (400 model_not_found) are not retried; those need human intervention.
    """

    def __init__(self, primary: Any, fallback: Any | None) -> None:
        super().__init__()
        self.primary = primary
        self.fallback = fallback

    async def awrap_model_call(self, request: Any, handler: Any) -> Any:
        try:
            return await handler(request)
        except openai.AuthenticationError:
            # 401 is human_required, not retryable.
            raise
        except (
            httpx.HTTPError,
            openai.RateLimitError,  # 429
            openai.InternalServerError,  # 5xx, listed as transient in the class docstring
            openai.APIConnectionError,  # network errors
        ):
            if self.fallback is None:
                raise
            # Best-effort: swap the model bound to the request and retry once.
            patched = self._with_fallback_model(request)
            return await handler(patched)

    def _with_fallback_model(self, request: Any) -> Any:
        """Swap the bound model in the request for the fallback model.

        ModelRequest exposes a `model` attribute (BaseChatModel instance), which is
        replaced with the fallback. The original request object is mutated in place.
        This relies on ModelRequest being a plain dataclass that allows assignment;
        the DeprecationWarning raised from __setattr__ applies only to ToolCallRequest.
        """
        if hasattr(request, "model"):
            request.model = self.fallback
        return request
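
The retry-once policy can be exercised without langchain or a network. The following is a minimal asyncio sketch under stated assumptions: `TransientError`, `AuthError`, `call_with_fallback`, and `flaky_handler` are hypothetical names, and the request is a plain dict that is copied rather than mutated in place as the middleware does.

```python
from __future__ import annotations

import asyncio


class TransientError(Exception):
    """Hypothetical stand-in for rate-limit, 5xx, and connection errors."""


class AuthError(Exception):
    """Hypothetical stand-in for a 401; never retried."""


async def call_with_fallback(handler, request: dict, fallback_model: str | None):
    try:
        return await handler(request)
    except AuthError:
        # Needs human intervention, so it propagates unchanged.
        raise
    except TransientError:
        if fallback_model is None:
            raise
        # Retry exactly once with the fallback model swapped into a copy of the request.
        return await handler({**request, "model": fallback_model})


calls: list[str] = []


async def flaky_handler(request: dict) -> str:
    calls.append(request["model"])
    if request["model"] == "primary":
        raise TransientError("429")
    return f"ok from {request['model']}"


result = asyncio.run(call_with_fallback(flaky_handler, {"model": "primary"}, "backup"))
print(result, calls)
```

A failure of the fallback call is not caught, so the caller sees the second error; that matches the single-retry behaviour described in the class docstring.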

@@ -0,0 +1,126 @@
"""SafetyShellMiddleware: destructive command + secret-path enforcement at the tool layer.
Replaces deepagents.FilesystemPermission for personas using LocalShellBackend,
since deepagents 0.6.1 does not yet support permissions + execution-capable backends.
"""

from __future__ import annotations

import re
from pathlib import Path
from typing import Any

from langchain.agents.middleware import AgentMiddleware
from wcmatch import glob as wcglob

from ..errors import MyDeepAgentError

# All patterns are compiled with re.IGNORECASE, so e.g. the `-D` pattern also
# matches `git branch -d`.
DESTRUCTIVE_PATTERNS: tuple[re.Pattern[str], ...] = tuple(
    re.compile(p, re.IGNORECASE)
    for p in (
        r"\brm\s+-rf\b",
        r"\bgit\s+reset\s+--hard\b",
        r"\bgit\s+clean\b",
        r"\bgit\s+push\s+--force(-with-lease)?\b",
        r"\bgit\s+branch\s+-D\b",
        r"\bdocker\s+volume\s+rm\b",
        r"\bdocker\s+compose\s+down\s+-v\b",
        r"\bDROP\s+(DATABASE|SCHEMA|TABLE)\b",
    )
)

# Mirrors session.DEFAULT_DENY_PATHS but as relative glob patterns for wcmatch.
# Each sensitive directory is listed twice: once for the directory itself (no trailing
# slash, since Path normalises it away) and once for everything inside it (**).
DENY_PATH_PATTERNS: tuple[str, ...] = (
    "**/.env*",
    "**/*.env*",
    "**/*token*",
    "**/*secret*",
    "**/*credential*",
    "**/*.pem",
    "**/*.key",
    "**/.ssh",
    "**/.ssh/**",
    "**/.aws",
    "**/.aws/**",
    "**/.config/gcloud",
    "**/.config/gcloud/**",
    "**/.kube",
    "**/.kube/**",
    "**/.gnupg",
    "**/.gnupg/**",
)

# Tool names that carry a filesystem path argument.
_PATH_TOOLS: frozenset[str] = frozenset({"read_file", "write_file", "edit_file", "ls"})
# Tool names that carry shell commands.
_SHELL_TOOL_NAMES: frozenset[str] = frozenset({"shell", "execute", "run_command"})
_GLOB_FLAGS = wcglob.GLOBSTAR | wcglob.IGNORECASE | wcglob.DOTGLOB


def _is_denied_path(path: str) -> bool:
    """Return True iff the path matches any deny glob pattern."""
    normalized = str(Path(path)).replace("\\", "/").lstrip("/")
    for pat in DENY_PATH_PATTERNS:
        if wcglob.globmatch(normalized, pat, flags=_GLOB_FLAGS):
            return True
    return False


class SafetyShellMiddleware(AgentMiddleware):
    """Hard-block destructive shell commands and secret-path file ops at the tool layer."""

    async def awrap_tool_call(self, request: Any, handler: Any) -> Any:
        name = self._tool_name(request)
        args = self._tool_args(request)
        if name in _SHELL_TOOL_NAMES:
            self._check_shell(args)
        elif name in _PATH_TOOLS:
            self._check_path(name, args)
        return await handler(request)

    @staticmethod
    def _tool_name(request: Any) -> str:
        tool_call = getattr(request, "tool_call", None)
        if isinstance(tool_call, dict):
            return str(tool_call.get("name") or "")
        return str(getattr(request, "name", "") or "")

    @staticmethod
    def _tool_args(request: Any) -> dict[str, Any]:
        tool_call = getattr(request, "tool_call", None)
        if isinstance(tool_call, dict):
            return dict(tool_call.get("args") or {})
        args = getattr(request, "args", None)
        return dict(args) if isinstance(args, dict) else {}

    def _check_shell(self, args: dict[str, Any]) -> None:
        cmd = args.get("command") or args.get("argv") or ""
        if isinstance(cmd, list):
            cmd = " ".join(str(x) for x in cmd)
        cmd_str = str(cmd)
        for pat in DESTRUCTIVE_PATTERNS:
            if pat.search(cmd_str):
                raise MyDeepAgentError.human_required(
                    "destructive_command_blocked",
                    message=f"destructive shell command blocked: {cmd_str[:120]}",
                    recovery_hint=(
                        "this command is hard-blocked by my-deepagent's safety policy; "
                        "edit the persona system_prompt to avoid suggesting it"
                    ),
                )

    def _check_path(self, tool_name: str, args: dict[str, Any]) -> None:
        path = args.get("file_path") or args.get("path") or args.get("file") or ""
        if not isinstance(path, str) or not path:
            return
        if _is_denied_path(path):
            raise MyDeepAgentError.human_required(
                "secret_access_blocked",
                message=(f"access to secret-bearing path blocked: tool={tool_name} path={path!r}"),
                recovery_hint=(
                    "this path matches a hard-blocked deny pattern (e.g. .env, *.key, .ssh/, .aws/)"
                ),
            )
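
To see what the regex layer does and does not catch, a subset of `DESTRUCTIVE_PATTERNS` can be run against a few sample commands using only `re`. The helper `is_blocked` and the sample list are hypothetical; the patterns and the `re.IGNORECASE` flag are copied from the module above.

```python
import re

# Subset of DESTRUCTIVE_PATTERNS, compiled the same way as in the middleware.
PATTERNS = tuple(
    re.compile(p, re.IGNORECASE)
    for p in (
        r"\brm\s+-rf\b",
        r"\bgit\s+push\s+--force(-with-lease)?\b",
        r"\bgit\s+branch\s+-D\b",
    )
)


def is_blocked(command: str) -> bool:
    """Return True if any destructive pattern is found anywhere in the command."""
    return any(p.search(command) for p in PATTERNS)


samples = [
    "rm -rf build/",
    "rm -fr build/",
    "git push --force origin main",
    "git push -f origin main",
    "git branch -d feature",
    "ls -la",
]
verdicts = {cmd: is_blocked(cmd) for cmd in samples}
for cmd, blocked in verdicts.items():
    print(f"{blocked!s:5}  {cmd}")
```

Two consequences show up in the verdicts: reordered or short flags (`rm -fr`, `git push -f`) slip past these literal patterns, and because of `re.IGNORECASE` the `-D` pattern also blocks the non-forced `git branch -d`.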

@@ -0,0 +1 @@
"""LangSmith tracing integration helpers. Implemented in Step 12."""

@@ -0,0 +1,99 @@
"""OpenRouter model pricing cache + cost computation.
v0.1.0: in-process dict cache + optional DB refresh. Updates are triggered by the
doctor command and a background refresh (Step 12).
"""

from __future__ import annotations

from dataclasses import dataclass

import httpx

from ..errors import MyDeepAgentError


@dataclass(frozen=True)
class ModelPrice:
    model: str  # OpenRouter id, e.g. "deepseek/deepseek-chat"
    input_per_1k_usd: float
    output_per_1k_usd: float
    context_length: int


class PricingCache:
    """In-memory cache of OpenRouter pricing. Caller refreshes via fetch_openrouter_pricing()."""

    def __init__(self) -> None:
        self._cache: dict[str, ModelPrice] = {}

    def get(self, model: str) -> ModelPrice | None:
        key = model.removeprefix("openrouter:")
        return self._cache.get(key)

    def set(self, prices: list[ModelPrice]) -> None:
        for p in prices:
            self._cache[p.model] = p

    def compute_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Return USD cost. Returns 0.0 if model price is unknown (logged separately)."""
        price = self.get(model)
        if price is None:
            return 0.0
        return (input_tokens / 1000.0) * price.input_per_1k_usd + (
            output_tokens / 1000.0
        ) * price.output_per_1k_usd


async def fetch_openrouter_pricing(api_key: str, base_url: str) -> list[ModelPrice]:
    """Fetch the OpenRouter /models endpoint and parse pricing."""
    async with httpx.AsyncClient(timeout=10.0) as client:
        try:
            r = await client.get(
                f"{base_url}/models",
                headers={"Authorization": f"Bearer {api_key}"},
            )
            r.raise_for_status()
        except httpx.HTTPError as e:
            raise MyDeepAgentError.recoverable(
                "network_blip",
                message=f"failed to fetch openrouter pricing: {e}",
                cause=e,
            ) from e
        data: dict[str, object] = r.json()
        return _parse_pricing_payload(data)


def _parse_pricing_payload(data: dict[str, object]) -> list[ModelPrice]:
    """Parse OpenRouter response.

    Expected format::

        {"data": [{"id": "...", "pricing": {"prompt": "...", "completion": "..."}, ...}]}
    """
    models = data.get("data", [])
    if not isinstance(models, list):
        return []
    out: list[ModelPrice] = []
    for m in models:
        if not isinstance(m, dict):
            continue
        model_id = m.get("id")
        pricing = m.get("pricing") or {}
        if not isinstance(model_id, str) or not isinstance(pricing, dict):
            continue
        try:
            prompt_per_token = float(pricing.get("prompt", "0") or "0")
            completion_per_token = float(pricing.get("completion", "0") or "0")
            ctx_len = int(m.get("context_length", 0) or 0)
        except (TypeError, ValueError):
            continue
        out.append(
            ModelPrice(
                model=model_id,
                input_per_1k_usd=prompt_per_token * 1000.0,
                output_per_1k_usd=completion_per_token * 1000.0,
                context_length=ctx_len,
            )
        )
    return out
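
The unit conversion in `_parse_pricing_payload` and the formula in `compute_cost` can be checked by hand. The per-token price strings and token counts below are made-up values chosen for easy arithmetic, not real OpenRouter prices.

```python
# Hypothetical per-token prices as strings, the shape the parser above expects.
pricing = {"prompt": "0.00000027", "completion": "0.0000011"}

# Same conversion as _parse_pricing_payload: per-token -> per-1k-token.
input_per_1k_usd = float(pricing["prompt"]) * 1000.0
output_per_1k_usd = float(pricing["completion"]) * 1000.0

# Same formula as PricingCache.compute_cost.
input_tokens, output_tokens = 12_000, 3_000
cost_usd = (input_tokens / 1000.0) * input_per_1k_usd + (
    output_tokens / 1000.0
) * output_per_1k_usd

# 12 * 0.00027 + 3 * 0.0011 = 0.00324 + 0.0033 = 0.00654 (up to float rounding)
print(f"{cost_usd:.5f}")
```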

@@ -0,0 +1 @@
"""Run statistics aggregation and reporting. Implemented in Step 12."""

@@ -0,0 +1,6 @@
"""Persistence layer: SQLAlchemy async ORM + LangGraph checkpointer."""
from .checkpointer import get_checkpointer_ctx
from .db import Database
__all__ = ["Database", "get_checkpointer_ctx"]

@@ -0,0 +1,41 @@
"""LangGraph SqliteSaver wrapper. Use only as a context manager to ensure connection cleanup.

``SqliteSaver.from_conn_string`` is a ``@contextmanager`` classmethod that yields
a ``SqliteSaver`` instance and closes the underlying sqlite3 connection on exit.
Direct manual lifecycle management (entering context without ``with``) leaks connections
and is not supported by this module.

Usage::

    with get_checkpointer_ctx(path) as saver:
        graph = create_deep_agent(checkpointer=saver)
        ...
"""

from __future__ import annotations

from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path

from langgraph.checkpoint.sqlite import SqliteSaver


@contextmanager
def get_checkpointer_ctx(checkpoints_db_path: Path) -> Iterator[SqliteSaver]:
    """Yield a SqliteSaver bound to *checkpoints_db_path*.

    Creates the parent directory and the database file if they do not exist.
    The underlying sqlite3 connection is closed automatically on context exit.
    This is the only supported way to obtain a SqliteSaver in this project;
    direct manual lifecycle management is not provided.

    Args:
        checkpoints_db_path: Filesystem path for the SQLite checkpoint database.

    Yields:
        SqliteSaver: Ready-to-use LangGraph checkpoint saver.
    """
    checkpoints_db_path.parent.mkdir(parents=True, exist_ok=True)
    with SqliteSaver.from_conn_string(str(checkpoints_db_path)) as saver:
        yield saver
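
The same "context manager only" lifecycle can be sketched with the stdlib `sqlite3` module in place of `SqliteSaver`. `open_db` is a hypothetical helper written for this illustration; it shows the parent-directory creation and the guaranteed close on exit, not the LangGraph API.

```python
import sqlite3
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_db(path: Path) -> Iterator[sqlite3.Connection]:
    """Yield a connection bound to *path* and always close it on exit."""
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(path))
    try:
        yield conn
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "nested" / "checkpoints.sqlite3"
    with open_db(db_path) as conn:
        conn.execute("CREATE TABLE t (x INTEGER)")
        file_created = db_path.exists()
    # Outside the `with` block the connection is closed, so further use fails.
    try:
        conn.execute("SELECT 1")
        closed = False
    except sqlite3.ProgrammingError:
        closed = True

print(file_created, closed)
```

Exposing only the context manager means there is no code path where a caller can obtain the resource and forget to release it.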

@@ -0,0 +1,91 @@
"""Async SQLAlchemy engine + session factory with WAL mode and busy_timeout."""

from __future__ import annotations

from collections.abc import AsyncIterator
from contextlib import asynccontextmanager

from sqlalchemy import event
from sqlalchemy.ext.asyncio import (
    AsyncEngine,
    AsyncSession,
    async_sessionmaker,
    create_async_engine,
)

from .models import Base


def _attach_sqlite_pragmas(engine: AsyncEngine) -> None:
    """Attach a synchronous connect-event listener that enables WAL, busy_timeout, FK."""

    @event.listens_for(engine.sync_engine, "connect")
    def _set_sqlite_pragma(dbapi_connection: object, _conn_record: object) -> None:
        # dbapi_connection is the DBAPI connection delivered by SQLAlchemy's pool
        # event callback (with aiosqlite this is SQLAlchemy's sync-style adapter
        # rather than a raw sqlite3.Connection). The signature uses `object` to
        # match the generic listener protocol; the annotation below only gives
        # the type checker access to `.cursor()`.
        # PRAGMA statements are SQLite-specific, so this listener must only be
        # attached to SQLite engines.
        import sqlite3  # local import: needed only for the annotation below

        conn: sqlite3.Connection = dbapi_connection  # type: ignore[assignment]
        cursor = conn.cursor()
        cursor.execute("PRAGMA journal_mode=WAL")
        cursor.execute("PRAGMA busy_timeout=5000")
        cursor.execute("PRAGMA foreign_keys=ON")
        cursor.close()


class Database:
    """Façade over async engine + session maker.

    Usage::

        db = Database("sqlite+aiosqlite:///path/to/db.sqlite3")
        await db.init_schema()  # dev/test only: create all tables directly
        async with db.session() as s:
            result = await s.execute(...)
        await db.dispose()

    For production deployments, call ``alembic upgrade head`` instead of
    ``init_schema`` so that migration history is tracked.
    """

    def __init__(self, database_url: str) -> None:
        self._engine: AsyncEngine = create_async_engine(
            database_url,
            # poolclass=None keeps the dialect's default pool (NullPool is not enabled here).
            poolclass=None,
            echo=False,
        )
        _attach_sqlite_pragmas(self._engine)
        self._session_factory: async_sessionmaker[AsyncSession] = async_sessionmaker(
            bind=self._engine,
            expire_on_commit=False,
            autoflush=False,
        )

    async def init_schema(self) -> None:
        """Create all ORM-defined tables.

        For production, prefer ``alembic upgrade head``.
        For tests, this is the fastest way to get a clean schema.
        """
        async with self._engine.begin() as conn:
            await conn.run_sync(Base.metadata.create_all)

    @asynccontextmanager
    async def session(self) -> AsyncIterator[AsyncSession]:
        """Yield an async session; commit on success, rollback on exception."""
        async with self._session_factory() as session:
            try:
                yield session
                await session.commit()
            except Exception:
                await session.rollback()
                raise

    async def dispose(self) -> None:
        """Dispose the engine connection pool."""
        await self._engine.dispose()
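
What the three PRAGMAs do can be observed with the stdlib `sqlite3` module alone. This is a sketch outside SQLAlchemy: the two-table schema is a hypothetical reduction of `runs` / `run_phases`, used only to show that `ON DELETE CASCADE` takes effect once `foreign_keys` is switched on for the connection.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # WAL needs a file-backed database; an in-memory one would report "memory".
    conn = sqlite3.connect(str(Path(tmp) / "demo.sqlite3"))
    journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute("PRAGMA busy_timeout=5000")
    conn.execute("PRAGMA foreign_keys=ON")
    busy_timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
    foreign_keys = conn.execute("PRAGMA foreign_keys").fetchone()[0]

    conn.execute("CREATE TABLE runs (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE run_phases ("
        "id TEXT PRIMARY KEY, "
        "run_id TEXT NOT NULL REFERENCES runs(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO runs VALUES ('r1')")
    conn.execute("INSERT INTO run_phases VALUES ('p1', 'r1')")
    # Deleting the parent row removes the child row because foreign_keys is ON.
    conn.execute("DELETE FROM runs WHERE id = 'r1'")
    remaining_phases = conn.execute("SELECT COUNT(*) FROM run_phases").fetchone()[0]
    conn.close()

print(journal_mode, busy_timeout, foreign_keys, remaining_phases)
```

SQLite applies `foreign_keys` per connection and leaves it off by default, which is why the listener sets it on every new connection instead of once at schema creation.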

@@ -0,0 +1,578 @@
"""SQLAlchemy 2.0 async ORM models for my-deepagent persistence layer."""

from __future__ import annotations

import uuid
from typing import Any

from sqlalchemy import (
    JSON,
    Boolean,
    Float,
    ForeignKey,
    Index,
    Integer,
    String,
    Text,
    UniqueConstraint,
    text,
)
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    """SQLAlchemy declarative base for my-deepagent."""


# ---------------------------------------------------------------------------
# workflow_templates
# ---------------------------------------------------------------------------
class WorkflowTemplateRow(Base):
    """Content-addressed workflow template definitions."""

    __tablename__ = "workflow_templates"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
    name: Mapped[str] = mapped_column(Text, nullable=False)
    version: Mapped[int] = mapped_column(Integer, nullable=False)
    hash: Mapped[str] = mapped_column(Text, nullable=False, unique=True)
    definition: Mapped[dict[str, Any]] = mapped_column(JSON, nullable=False)
    created_at: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<WorkflowTemplateRow id={self.id!r} name={self.name!r} version={self.version!r}>"


# ---------------------------------------------------------------------------
# agent_personas
# ---------------------------------------------------------------------------
class AgentPersonaRow(Base):
    """Content-addressed agent persona definitions."""

    __tablename__ = "agent_personas"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
    name: Mapped[str] = mapped_column(Text, nullable=False)
    version: Mapped[int] = mapped_column(Integer, nullable=False)
    hash: Mapped[str] = mapped_column(Text, nullable=False, unique=True)
    definition: Mapped[dict[str, Any]] = mapped_column(JSON, nullable=False)
    created_at: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<AgentPersonaRow id={self.id!r} name={self.name!r} version={self.version!r}>"


# ---------------------------------------------------------------------------
# runs
# ---------------------------------------------------------------------------
class RunRow(Base):
    """Top-level run record: one row per deepagent run invocation."""

    __tablename__ = "runs"
    __table_args__ = (
        # Partial unique index: at most one active run per (repo_path, base_branch).
        # An "active" run is any run whose state is not 'completed', 'failed', or 'aborted'.
        # SQLite partial index uses a WHERE clause; autogenerate cannot detect this,
        # so it is managed via a manual alembic migration.
        Index(
            "ux_active_run_repo_base",
            "repo_path",
            "base_branch",
            unique=True,
            sqlite_where=text("state NOT IN ('completed', 'failed', 'aborted')"),
        ),
    )

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
    # FK to workflow_templates — RESTRICT prevents deleting a template that has runs.
    template_id: Mapped[str] = mapped_column(
        String(36),
        ForeignKey("workflow_templates.id", ondelete="RESTRICT"),
        nullable=False,
    )
    template_hash: Mapped[str] = mapped_column(Text, nullable=False)
    state: Mapped[str] = mapped_column(Text, nullable=False)
    repo_path: Mapped[str] = mapped_column(Text, nullable=False)
    base_branch: Mapped[str] = mapped_column(Text, nullable=False)
    worktree_root: Mapped[str] = mapped_column(Text, nullable=False)
    # current_phase_id references run_phases.id; however, runs.current_phase_id and
    # run_phases.run_id form a circular FK pair. SQLite does not support deferrable
    # constraints at the column level, and alembic cannot safely manage this circular
    # dependency. Therefore current_phase_id carries NO ForeignKey constraint in the ORM.
    # Callers must maintain referential integrity manually (i.e. always point to a valid
    # run_phases.id that belongs to this run, or NULL).
    current_phase_id: Mapped[str | None] = mapped_column(String(36), nullable=True)
    started_at: Mapped[str | None] = mapped_column(Text, nullable=True)
    ended_at: Mapped[str | None] = mapped_column(Text, nullable=True)
    final_report_path: Mapped[str | None] = mapped_column(Text, nullable=True)
    paused_from_state: Mapped[str | None] = mapped_column(Text, nullable=True)
    created_at: Mapped[str] = mapped_column(Text, nullable=False)
    updated_at: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<RunRow id={self.id!r} state={self.state!r}>"
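
The effect of the `ux_active_run_repo_base` partial unique index can be reproduced in plain SQLite. The sketch below uses the stdlib `sqlite3` module and a hypothetical four-column reduction of the `runs` table; the index definition mirrors the `sqlite_where` clause above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (id TEXT PRIMARY KEY, repo_path TEXT, base_branch TEXT, state TEXT)"
)
conn.execute(
    "CREATE UNIQUE INDEX ux_active_run_repo_base ON runs (repo_path, base_branch) "
    "WHERE state NOT IN ('completed', 'failed', 'aborted')"
)

# Finished runs are outside the index, so they may repeat for the same repo/branch.
conn.execute("INSERT INTO runs VALUES ('r1', '/repo', 'main', 'completed')")
conn.execute("INSERT INTO runs VALUES ('r2', '/repo', 'main', 'completed')")
# The first active run for (/repo, main) is accepted.
conn.execute("INSERT INTO runs VALUES ('r3', '/repo', 'main', 'executing')")

try:
    # A second active run for the same (repo_path, base_branch) violates the index.
    conn.execute("INSERT INTO runs VALUES ('r4', '/repo', 'main', 'planning')")
    second_active_rejected = False
except sqlite3.IntegrityError:
    second_active_rejected = True

# A different branch is a different index key, so it is accepted.
cur = conn.execute("INSERT INTO runs VALUES ('r5', '/repo', 'dev', 'planning')")
other_branch_ok = cur.rowcount == 1
row_count = conn.execute("SELECT COUNT(*) FROM runs").fetchone()[0]

print(second_active_rejected, other_branch_ok, row_count)
```

Once a run moves to `completed`, `failed`, or `aborted`, it drops out of the index and a new run for the same repository and branch can start.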
# ---------------------------------------------------------------------------
# run_inputs
# ---------------------------------------------------------------------------
class RunInputRow(Base):
"""Input snapshot for a run (one-to-one with runs)."""
__tablename__ = "run_inputs"
id: Mapped[str] = mapped_column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
run_id: Mapped[str] = mapped_column(
String(36),
ForeignKey("runs.id", ondelete="CASCADE"),
nullable=False,
unique=True,
)
requirements_md: Mapped[str] = mapped_column(Text, nullable=False)
objective: Mapped[dict[str, Any]] = mapped_column(JSON, nullable=False)
extra: Mapped[dict[str, Any]] = mapped_column(JSON, nullable=False)
input_hash: Mapped[str] = mapped_column(Text, nullable=False)
def __repr__(self) -> str:
return f"<RunInputRow id={self.id!r} run_id={self.run_id!r}>"
# ---------------------------------------------------------------------------
# run_bindings
# ---------------------------------------------------------------------------
class RunBindingRow(Base):
"""Per-role persona binding for a run."""
__tablename__ = "run_bindings"
__table_args__ = (UniqueConstraint("run_id", "role_id", name="uq_run_bindings_run_role"),)
id: Mapped[str] = mapped_column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
run_id: Mapped[str] = mapped_column(
String(36),
ForeignKey("runs.id", ondelete="CASCADE"),
nullable=False,
)
role_id: Mapped[str] = mapped_column(Text, nullable=False)
# FK to agent_personas — RESTRICT prevents deleting a persona that has bindings.
persona_id: Mapped[str] = mapped_column(
String(36),
ForeignKey("agent_personas.id", ondelete="RESTRICT"),
nullable=False,
)
persona_hash: Mapped[str] = mapped_column(Text, nullable=False)
backend: Mapped[str] = mapped_column(Text, nullable=False)
binding_hash: Mapped[str] = mapped_column(Text, nullable=False)
def __repr__(self) -> str:
return f"<RunBindingRow id={self.id!r} run_id={self.run_id!r} role_id={self.role_id!r}>"
# ---------------------------------------------------------------------------
# run_phases
# ---------------------------------------------------------------------------
class RunPhaseRow(Base):
"""Per-phase execution record for a run."""
__tablename__ = "run_phases"
__table_args__ = (UniqueConstraint("run_id", "phase_key", name="uq_run_phases_run_phase"),)
id: Mapped[str] = mapped_column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
run_id: Mapped[str] = mapped_column(
String(36),
ForeignKey("runs.id", ondelete="CASCADE"),
nullable=False,
)
phase_key: Mapped[str] = mapped_column(Text, nullable=False)
seq: Mapped[int] = mapped_column(Integer, nullable=False)
state: Mapped[str] = mapped_column(Text, nullable=False)
attempts: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
started_at: Mapped[str | None] = mapped_column(Text, nullable=True)
ended_at: Mapped[str | None] = mapped_column(Text, nullable=True)
def __repr__(self) -> str:
return f"<RunPhaseRow id={self.id!r} run_id={self.run_id!r} phase_key={self.phase_key!r}>"
# ---------------------------------------------------------------------------
# run_events
# ---------------------------------------------------------------------------


class RunEventRow(Base):
    """Ordered event stream for a run."""

    __tablename__ = "run_events"
    __table_args__ = (
        UniqueConstraint("run_id", "seq", name="uq_run_events_run_seq"),
        UniqueConstraint("run_id", "idempotency_key", name="uq_run_events_run_idempotency"),
        Index("run_events_run_id_ts_idx", "run_id", "ts"),
    )

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    run_id: Mapped[str] = mapped_column(
        String(36),
        ForeignKey("runs.id", ondelete="CASCADE"),
        nullable=False,
    )
    # phase_id references run_phases.id; CASCADE so events are deleted when a phase is deleted.
    phase_id: Mapped[str | None] = mapped_column(
        String(36),
        ForeignKey("run_phases.id", ondelete="CASCADE"),
        nullable=True,
    )
    seq: Mapped[int] = mapped_column(Integer, nullable=False)
    type: Mapped[str] = mapped_column(Text, nullable=False)
    payload: Mapped[dict[str, Any]] = mapped_column(JSON, nullable=False)
    idempotency_key: Mapped[str] = mapped_column(Text, nullable=False)
    ts: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<RunEventRow id={self.id!r} run_id={self.run_id!r} seq={self.seq!r}>"
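Review note: the two unique constraints on run_events, (run_id, seq) and (run_id, idempotency_key), are what would make a retried append safe. The sketch below illustrates that idea with the standard-library sqlite3 module instead of the SQLAlchemy models. The trimmed table and the `append_event` helper are hypothetical, introduced only for this illustration, and are not part of the PR.

```python
import sqlite3

# Trimmed, hypothetical copy of run_events: only the columns the demo needs.
DDL = """
CREATE TABLE run_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    seq INTEGER NOT NULL,
    type TEXT NOT NULL,
    idempotency_key TEXT NOT NULL,
    UNIQUE (run_id, seq),
    UNIQUE (run_id, idempotency_key)
);
"""


def append_event(
    conn: sqlite3.Connection, run_id: str, seq: int, event_type: str, idempotency_key: str
) -> bool:
    """Insert one event; return False when a uniqueness conflict suppressed it.

    Caveat: OR IGNORE also swallows a (run_id, seq) collision from a different
    event, so a real writer would need to tell the two cases apart.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO run_events (run_id, seq, type, idempotency_key) "
        "VALUES (?, ?, ?, ?)",
        (run_id, seq, event_type, idempotency_key),
    )
    return cur.rowcount == 1


def demo_idempotent_append() -> tuple[bool, bool, int]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    first = append_event(conn, "r1", 1, "phase_started", "key-1")
    # Same event delivered again (same seq and idempotency key): ignored.
    retry = append_event(conn, "r1", 1, "phase_started", "key-1")
    count = conn.execute("SELECT COUNT(*) FROM run_events").fetchone()[0]
    conn.close()
    return first, retry, count
```

The boolean return lets a caller distinguish "written" from "already present" without catching an exception.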
# ---------------------------------------------------------------------------
# approval_requests
# ---------------------------------------------------------------------------


class ApprovalRequestRow(Base):
    """Human approval gate requests."""

    __tablename__ = "approval_requests"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
    run_id: Mapped[str] = mapped_column(
        String(36),
        ForeignKey("runs.id", ondelete="CASCADE"),
        nullable=False,
    )
    # phase_id references run_phases.id; CASCADE so approval requests are deleted with the phase.
    phase_id: Mapped[str | None] = mapped_column(
        String(36),
        ForeignKey("run_phases.id", ondelete="CASCADE"),
        nullable=True,
    )
    gate_key: Mapped[str] = mapped_column(Text, nullable=False)
    state: Mapped[str] = mapped_column(Text, nullable=False)
    idempotency_key: Mapped[str] = mapped_column(Text, nullable=False, unique=True)
    payload: Mapped[dict[str, Any]] = mapped_column(JSON, nullable=False)
    created_at: Mapped[str] = mapped_column(Text, nullable=False)
    resolved_at: Mapped[str | None] = mapped_column(Text, nullable=True)

    def __repr__(self) -> str:
        return f"<ApprovalRequestRow id={self.id!r} gate_key={self.gate_key!r}>"


# ---------------------------------------------------------------------------
# approval_decisions
# ---------------------------------------------------------------------------


class ApprovalDecisionRow(Base):
    """Human decisions on approval requests."""

    __tablename__ = "approval_decisions"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
    approval_request_id: Mapped[str] = mapped_column(
        String(36),
        ForeignKey("approval_requests.id", ondelete="CASCADE"),
        nullable=False,
    )
    action: Mapped[str] = mapped_column(Text, nullable=False)
    comment: Mapped[str | None] = mapped_column(Text, nullable=True)
    decided_at: Mapped[str] = mapped_column(Text, nullable=False)
    idempotency_key: Mapped[str] = mapped_column(Text, nullable=False, unique=True)

    def __repr__(self) -> str:
        return f"<ApprovalDecisionRow id={self.id!r} action={self.action!r}>"
# ---------------------------------------------------------------------------
# artifacts
# ---------------------------------------------------------------------------


class ArtifactRow(Base):
    """Content-addressed output artifacts from phases."""

    __tablename__ = "artifacts"
    __table_args__ = (
        UniqueConstraint("run_id", "path", "hash", name="uq_artifacts_run_path_hash"),
    )

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
    run_id: Mapped[str] = mapped_column(
        String(36),
        ForeignKey("runs.id", ondelete="CASCADE"),
        nullable=False,
    )
    # phase_id references run_phases.id; CASCADE so artifacts are deleted with the phase.
    phase_id: Mapped[str | None] = mapped_column(
        String(36),
        ForeignKey("run_phases.id", ondelete="CASCADE"),
        nullable=True,
    )
    path: Mapped[str] = mapped_column(Text, nullable=False)
    schema_id: Mapped[str] = mapped_column(Text, nullable=False)
    hash: Mapped[str] = mapped_column(Text, nullable=False)
    valid: Mapped[bool] = mapped_column(Boolean, nullable=False)
    validation_error: Mapped[dict[str, Any] | None] = mapped_column(JSON, nullable=True)
    created_at: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<ArtifactRow id={self.id!r} path={self.path!r} valid={self.valid!r}>"
# ---------------------------------------------------------------------------
# interactive_sessions
# ---------------------------------------------------------------------------


class InteractiveSessionRow(Base):
    """Interactive (non-run) agent sessions."""

    __tablename__ = "interactive_sessions"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
    # FK to agent_personas; RESTRICT prevents deleting a persona that has interactive sessions.
    persona_id: Mapped[str] = mapped_column(
        String(36),
        ForeignKey("agent_personas.id", ondelete="RESTRICT"),
        nullable=False,
    )
    persona_hash: Mapped[str] = mapped_column(Text, nullable=False)
    started_at: Mapped[str | None] = mapped_column(Text, nullable=True)
    ended_at: Mapped[str | None] = mapped_column(Text, nullable=True)
    last_message_at: Mapped[str | None] = mapped_column(Text, nullable=True)
    state: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<InteractiveSessionRow id={self.id!r} state={self.state!r}>"
# ---------------------------------------------------------------------------
# tool_calls
# ---------------------------------------------------------------------------


class ToolCallRow(Base):
    """Audit log of every tool invocation (run or interactive)."""

    __tablename__ = "tool_calls"
    __table_args__ = (Index("tool_calls_run_id_ts_idx", "run_id", "ts"),)

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    # Each row belongs either to a run (run_id, optionally with phase_id) or to an
    # interactive session (interactive_session_id), so all three columns are nullable.
    # CASCADE ensures audit rows are removed when the parent run or session is deleted.
    run_id: Mapped[str | None] = mapped_column(
        String(36),
        ForeignKey("runs.id", ondelete="CASCADE"),
        nullable=True,
    )
    phase_id: Mapped[str | None] = mapped_column(
        String(36),
        ForeignKey("run_phases.id", ondelete="CASCADE"),
        nullable=True,
    )
    interactive_session_id: Mapped[str | None] = mapped_column(
        String(36),
        ForeignKey("interactive_sessions.id", ondelete="CASCADE"),
        nullable=True,
    )
    tool_name: Mapped[str] = mapped_column(Text, nullable=False)
    args: Mapped[dict[str, Any]] = mapped_column(JSON, nullable=False)
    result: Mapped[dict[str, Any] | None] = mapped_column(JSON, nullable=True)
    error: Mapped[str | None] = mapped_column(Text, nullable=True)
    duration_ms: Mapped[int] = mapped_column(Integer, nullable=False)
    ts: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<ToolCallRow id={self.id!r} tool_name={self.tool_name!r}>"
# ---------------------------------------------------------------------------
# llm_calls
# ---------------------------------------------------------------------------


class LlmCallRow(Base):
    """Full LLM call telemetry: tokens, cost, latency, model."""

    __tablename__ = "llm_calls"
    __table_args__ = (
        Index("llm_calls_run_id_ts_idx", "run_id", "ts"),
        Index("llm_calls_interactive_session_id_ts_idx", "interactive_session_id", "ts"),
        Index("llm_calls_model_ts_idx", "model", "ts"),
    )

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    # Each row belongs either to a run (run_id, optionally with phase_id) or to an
    # interactive session (interactive_session_id), so all three columns are nullable.
    # CASCADE ensures telemetry rows are removed when the parent run or session is deleted.
    run_id: Mapped[str | None] = mapped_column(
        String(36),
        ForeignKey("runs.id", ondelete="CASCADE"),
        nullable=True,
    )
    phase_id: Mapped[str | None] = mapped_column(
        String(36),
        ForeignKey("run_phases.id", ondelete="CASCADE"),
        nullable=True,
    )
    interactive_session_id: Mapped[str | None] = mapped_column(
        String(36),
        ForeignKey("interactive_sessions.id", ondelete="CASCADE"),
        nullable=True,
    )
    thread_id: Mapped[str] = mapped_column(Text, nullable=False)
    persona_name: Mapped[str] = mapped_column(Text, nullable=False)
    persona_version: Mapped[int] = mapped_column(Integer, nullable=False)
    model: Mapped[str] = mapped_column(Text, nullable=False)
    role: Mapped[str] = mapped_column(Text, nullable=False)
    turn_index: Mapped[int] = mapped_column(Integer, nullable=False)
    input_tokens: Mapped[int] = mapped_column(Integer, nullable=False)
    output_tokens: Mapped[int] = mapped_column(Integer, nullable=False)
    cached_tokens: Mapped[int] = mapped_column(Integer, nullable=False)
    reasoning_tokens: Mapped[int] = mapped_column(Integer, nullable=False)
    cost_usd_input: Mapped[float] = mapped_column(Float, nullable=False)
    cost_usd_output: Mapped[float] = mapped_column(Float, nullable=False)
    cost_usd_total: Mapped[float] = mapped_column(Float, nullable=False)
    latency_ms: Mapped[int] = mapped_column(Integer, nullable=False)
    status: Mapped[str] = mapped_column(Text, nullable=False)
    error_code: Mapped[str | None] = mapped_column(Text, nullable=True)
    request_id: Mapped[str | None] = mapped_column(Text, nullable=True)
    ts: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<LlmCallRow id={self.id!r} model={self.model!r} status={self.status!r}>"
# ---------------------------------------------------------------------------
# model_pricing
# ---------------------------------------------------------------------------


class ModelPricingRow(Base):
    """Cached model pricing data (fetched from provider APIs)."""

    __tablename__ = "model_pricing"

    model: Mapped[str] = mapped_column(Text, primary_key=True)
    input_per_1k_usd: Mapped[float] = mapped_column(Float, nullable=False)
    output_per_1k_usd: Mapped[float] = mapped_column(Float, nullable=False)
    context_length: Mapped[int] = mapped_column(Integer, nullable=False)
    fetched_at: Mapped[str] = mapped_column(Text, nullable=False)
    raw_payload: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<ModelPricingRow model={self.model!r}>"


# ---------------------------------------------------------------------------
# budget_ledger
# ---------------------------------------------------------------------------


class BudgetLedgerRow(Base):
    """Per-scope budget tracking (e.g. global, per-run, per-persona)."""

    __tablename__ = "budget_ledger"

    scope: Mapped[str] = mapped_column(Text, primary_key=True)
    spent_usd: Mapped[float] = mapped_column(Float, nullable=False, default=0.0)
    cap_usd: Mapped[float | None] = mapped_column(Float, nullable=True)
    last_updated: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<BudgetLedgerRow scope={self.scope!r} spent_usd={self.spent_usd!r}>"


# ---------------------------------------------------------------------------
# persona_consents
# ---------------------------------------------------------------------------


class PersonaConsentRow(Base):
    """Persisted persona consent decisions (approve/block)."""

    __tablename__ = "persona_consents"

    persona_hash: Mapped[str] = mapped_column(Text, primary_key=True)
    persona_name: Mapped[str] = mapped_column(Text, nullable=False)
    persona_version: Mapped[int] = mapped_column(Integer, nullable=False)
    decision: Mapped[str] = mapped_column(Text, nullable=False)
    decided_at: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<PersonaConsentRow persona_hash={self.persona_hash!r} decision={self.decision!r}>"
# ---------------------------------------------------------------------------
# phase_feedback
# ---------------------------------------------------------------------------


class PhaseFeedbackRow(Base):
    """User feedback on completed phases (reaction + optional comment)."""

    __tablename__ = "phase_feedback"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    # CASCADE: feedback is deleted when the run is deleted (audit data follows the run lifecycle).
    run_id: Mapped[str] = mapped_column(
        String(36),
        ForeignKey("runs.id", ondelete="CASCADE"),
        nullable=False,
    )
    # CASCADE: feedback is deleted when the phase is deleted.
    phase_id: Mapped[str] = mapped_column(
        String(36),
        ForeignKey("run_phases.id", ondelete="CASCADE"),
        nullable=False,
    )
    reaction: Mapped[str | None] = mapped_column(Text, nullable=True)
    comment: Mapped[str | None] = mapped_column(Text, nullable=True)
    created_at: Mapped[str] = mapped_column(Text, nullable=False)

    def __repr__(self) -> str:
        return f"<PhaseFeedbackRow id={self.id!r} run_id={self.run_id!r}>"
# ---------------------------------------------------------------------------
# run_commands (schema-only; used in future steps)
# ---------------------------------------------------------------------------


class RunCommandRow(Base):
    """Queued commands targeting a run (pause, resume, abort, etc.)."""

    __tablename__ = "run_commands"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    run_id: Mapped[str] = mapped_column(
        String(36),
        ForeignKey("runs.id", ondelete="CASCADE"),
        nullable=False,
    )
    command: Mapped[str] = mapped_column(Text, nullable=False)
    payload: Mapped[dict[str, Any]] = mapped_column(JSON, nullable=False)
    idempotency_key: Mapped[str] = mapped_column(Text, nullable=False, unique=True)
    created_at: Mapped[str] = mapped_column(Text, nullable=False)
    processed_at: Mapped[str | None] = mapped_column(Text, nullable=True)

    def __repr__(self) -> str:
        return f"<RunCommandRow id={self.id!r} run_id={self.run_id!r} command={self.command!r}>"
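Review note: the models above rely on two delete behaviors, RESTRICT for references to agent_personas and CASCADE for children of runs. The sketch below shows how the two differ in SQLite, using the standard-library sqlite3 module and a trimmed, hypothetical schema (table names follow the models; the column sets and the `demo_fk_actions` helper are assumptions made for the demo). It also shows that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3


def demo_fk_actions() -> dict[str, object]:
    """Exercise ON DELETE RESTRICT and ON DELETE CASCADE on a tiny schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite ignores foreign keys unless this PRAGMA is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE agent_personas (id TEXT PRIMARY KEY);
        CREATE TABLE runs (id TEXT PRIMARY KEY);
        CREATE TABLE run_bindings (
            id TEXT PRIMARY KEY,
            persona_id TEXT NOT NULL
                REFERENCES agent_personas(id) ON DELETE RESTRICT
        );
        CREATE TABLE run_phases (
            id TEXT PRIMARY KEY,
            run_id TEXT NOT NULL REFERENCES runs(id) ON DELETE CASCADE
        );
        """
    )
    conn.execute("INSERT INTO agent_personas VALUES ('p1')")
    conn.execute("INSERT INTO runs VALUES ('r1')")
    conn.execute("INSERT INTO run_bindings VALUES ('b1', 'p1')")
    conn.execute("INSERT INTO run_phases VALUES ('ph1', 'r1')")

    # RESTRICT: deleting a persona that still has a binding must fail.
    try:
        conn.execute("DELETE FROM agent_personas WHERE id = 'p1'")
        restrict_blocked = False
    except sqlite3.IntegrityError:
        restrict_blocked = True

    # CASCADE: deleting the run removes its phases automatically.
    conn.execute("DELETE FROM runs WHERE id = 'r1'")
    phases_left = conn.execute("SELECT COUNT(*) FROM run_phases").fetchone()[0]

    # Once the binding is gone, the persona can be deleted.
    conn.execute("DELETE FROM run_bindings WHERE id = 'b1'")
    conn.execute("DELETE FROM agent_personas WHERE id = 'p1'")
    personas_left = conn.execute("SELECT COUNT(*) FROM agent_personas").fetchone()[0]

    conn.close()
    return {
        "restrict_blocked": restrict_blocked,
        "phases_left": phases_left,
        "personas_left": personas_left,
    }
```

The practical consequence is that deleting a run cleans up its history in one statement, while a persona can only be removed after every binding and interactive session that references it is gone.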


@@ -0,0 +1,154 @@
"""Persona schema + YAML loader + content-addressed hash + consent helpers."""
from __future__ import annotations
from pathlib import Path
from typing import Any, Literal
import yaml
from pydantic import BaseModel, ConfigDict, Field, ValidationInfo, field_validator
from .enums import Backend, Capability, RiskLevel
from .hash import sha256
class FilesystemPermissionSpec(BaseModel):
    """Mirrors deepagents FilesystemPermission for persona yaml.

    The operation set here (read/write/edit/ls) is richer than the deepagents
    one; it is collapsed to read/write when the spec is converted.
    """

    model_config = ConfigDict(frozen=True, extra="forbid")

    operations: tuple[Literal["read", "write", "edit", "ls"], ...] = Field(min_length=1)
    paths: tuple[str, ...] = Field(min_length=1)
    mode: Literal["allow", "deny"] = "allow"

    @field_validator("paths")
    @classmethod
    def _validate_paths(cls, v: tuple[str, ...]) -> tuple[str, ...]:
        for p in v:
            if not p.startswith("/"):
                raise ValueError(f"path must start with '/': {p!r}")
            if "\x00" in p:
                raise ValueError(f"path must not contain null bytes: {p!r}")
            # Reject a literal ".." segment; glob paths like "/**" are OK.
            segments = p.split("/")
            if ".." in segments:
                raise ValueError(f"path must not contain '..': {p!r}")
            if "~" in p:
                raise ValueError(f"path must not contain '~': {p!r}")
        return v
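Review note: the path rules in `_validate_paths` are easy to misread, in particular that ".." is rejected only as a whole segment while "~" is rejected anywhere in the string. The standalone sketch below restates the same four checks without pydantic so the edge cases can be tried directly. The `validate_permission_path` and `is_valid_permission_path` names are hypothetical and exist only for this illustration.

```python
def validate_permission_path(path: str) -> str:
    """Apply the same four checks as the validator, for a single path."""
    if not path.startswith("/"):
        raise ValueError(f"path must start with '/': {path!r}")
    if "\x00" in path:
        raise ValueError(f"path must not contain null bytes: {path!r}")
    # Only a whole ".." segment is rejected; "a..b" inside a segment passes.
    if ".." in path.split("/"):
        raise ValueError(f"path must not contain '..': {path!r}")
    # A tilde is rejected anywhere, including editor backups such as "notes~".
    if "~" in path:
        raise ValueError(f"path must not contain '~': {path!r}")
    return path


def is_valid_permission_path(path: str) -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        validate_permission_path(path)
    except ValueError:
        return False
    return True
```

For example, "/**" and "/src/a..b/file" pass, while "src/file", "/src/../etc" and "/home/~user" are rejected.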
class PersonaSubagent(BaseModel):
    """Maps onto the deepagents SubAgent TypedDict (``allowed_tools`` is passed as ``tools``)."""

    model_config = ConfigDict(frozen=True, extra="forbid")

    name: str = Field(min_length=1)
    description: str = Field(min_length=10)
    system_prompt: str = Field(min_length=10)
    allowed_tools: tuple[str, ...] = Field(default_factory=tuple)
    model: str | None = None
    permissions: tuple[FilesystemPermissionSpec, ...] = Field(default_factory=tuple)
    # deepagents accepts dict[str, Any] for interrupt_on; Any is intentional.
    interrupt_on: dict[str, Any] = Field(default_factory=dict)
class Persona(BaseModel):
    """Persona definition from docs/schemas/personas/<name>@<version>.yaml.

    Immutability: list-valued fields are stored as tuples to prevent post-construction
    mutation that would invalidate compute_hash(). dict-valued fields (model_params,
    interrupt_on) remain dict because they are pass-through to deepagents which expects
    ``dict[str, Any]``; callers must not mutate them.
    """

    model_config = ConfigDict(frozen=True, extra="forbid")

    name: str = Field(min_length=1)
    version: int = Field(ge=1)
    description: str | None = None
    backend: Backend
    model: str = Field(min_length=1)
    provider_origin: str = Field(min_length=1)
    capabilities: tuple[Capability, ...] = Field(min_length=1)
    max_risk_level: RiskLevel
    allowed_roles: tuple[str, ...] | None = None
    system_prompt: str = Field(min_length=10)
    allowed_tools: tuple[str, ...] | None = None
    subagents: tuple[PersonaSubagent, ...] = Field(default_factory=tuple)
    permissions: tuple[FilesystemPermissionSpec, ...] = Field(default_factory=tuple)
    # deepagents accepts dict[str, Any] for interrupt_on; Any is intentional.
    interrupt_on: dict[str, Any] | None = None
    # deepagents accepts dict[str, Any] for model_params; Any is intentional.
    model_params: dict[str, Any] = Field(default_factory=dict)
    deepagents_backend: Literal["state", "local_shell", "filesystem", "composite", "langsmith"] = (
        "local_shell"
    )
    skills: tuple[str, ...] = Field(default_factory=tuple)
    memory_files: tuple[str, ...] = Field(default_factory=tuple)
    fallback_model: str | None = None
    max_cost_per_call_usd: float | None = Field(default=None, ge=0)

    @field_validator("model")
    @classmethod
    def _validate_openrouter_model(cls, v: str, info: ValidationInfo) -> str:
        backend = info.data.get("backend") if info.data else None
        if backend == Backend.OPENROUTER and not v.strip():
            raise ValueError("openrouter backend requires non-empty model")
        return v

    def compute_hash(self) -> str:
        """Content-addressed identity hash (canonical JSON of normalized fields)."""
        return sha256(
            {
                "name": self.name,
                "version": self.version,
                "backend": self.backend.value,
                "model": self.model,
                "provider_origin": self.provider_origin,
                "capabilities": sorted(c.value for c in self.capabilities),
                "max_risk_level": self.max_risk_level.value,
                "allowed_roles": (
                    sorted(self.allowed_roles) if self.allowed_roles is not None else None
                ),
                "system_prompt": self.system_prompt,
                "allowed_tools": (
                    sorted(self.allowed_tools) if self.allowed_tools is not None else None
                ),
                "subagents": [s.model_dump() for s in self.subagents],
                "permissions": [p.model_dump() for p in self.permissions],
                "interrupt_on": self.interrupt_on,
                "model_params": self.model_params,
                "deepagents_backend": self.deepagents_backend,
                "fallback_model": self.fallback_model,
                "max_cost_per_call_usd": self.max_cost_per_call_usd,
                "skills": self.skills,
                "memory_files": self.memory_files,
            }
        )
def load_persona_yaml(path: Path) -> Persona:
    """Load and validate a single persona yaml file."""
    if not path.is_file():
        raise FileNotFoundError(f"persona yaml not found: {path}")
    data = yaml.safe_load(path.read_text(encoding="utf-8"))
    return Persona.model_validate(data)


def load_personas_from_dir(directory: Path) -> list[Persona]:
    """Load all *.yaml files from a directory, sorted by filename for determinism.

    Raises ValueError if the same (name, version) pair appears more than once.
    Returns an empty list if the directory does not exist.
    """
    if not directory.is_dir():
        return []
    personas = [load_persona_yaml(p) for p in sorted(directory.glob("*.yaml"))]
    seen: dict[tuple[str, int], str] = {}
    for p in personas:
        key = (p.name, p.version)
        if key in seen:
            raise ValueError(f"duplicate persona name={p.name!r} version={p.version}")
        seen[key] = p.compute_hash()
    return personas
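Review note: `compute_hash` sorts some fields (capabilities, allowed_roles, allowed_tools) and keeps others in declaration order (skills, memory_files), which decides which reorderings change a persona's identity. The sketch below illustrates the effect with a stand-in hash function. `canonical_sha256` is a hypothetical approximation of the project's `hash.sha256` helper, which is not shown in this diff and may canonicalize differently; `persona_identity` is likewise an illustrative reduction to four fields.

```python
import hashlib
import json
from typing import Any


def canonical_sha256(payload: Any) -> str:
    """Hash a JSON-serializable payload via a canonical encoding (assumed form)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def persona_identity(
    name: str, version: int, capabilities: tuple[str, ...], skills: tuple[str, ...]
) -> str:
    """Reduced version of the identity payload: sorted vs order-preserving fields."""
    return canonical_sha256(
        {
            "name": name,
            "version": version,
            # Sorted, so reordering capabilities does not change the hash.
            "capabilities": sorted(capabilities),
            # Not sorted, so reordering skills does change the hash.
            "skills": skills,
        }
    )


base = persona_identity("reviewer", 1, ("read", "plan"), ("lint", "test"))
same = persona_identity("reviewer", 1, ("plan", "read"), ("lint", "test"))
reordered_skills = persona_identity("reviewer", 1, ("read", "plan"), ("test", "lint"))
bumped = persona_identity("reviewer", 2, ("read", "plan"), ("lint", "test"))
```

Here `base == same`, while `reordered_skills` and `bumped` both differ from `base`. Whether skill order should be identity-relevant is a design choice worth stating explicitly in the docstring.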


@@ -0,0 +1 @@
"""Prompt envelope builder for LangChain messages. Implemented in Step 5."""


@@ -0,0 +1 @@
"""Run event types for streaming progress. Implemented in Step 4."""


@@ -0,0 +1 @@
"""Safety gate for destructive command classification. Implemented in Step 11."""


@@ -0,0 +1,274 @@
"""Build a deepagents CompiledStateGraph from a Persona + run context.

Connects:
- Persona (config) -> deepagents.create_deep_agent(...)
- OpenRouter (model="openrouter:...") -> ChatOpenAI(base_url=openrouter)
- Workspace dir -> LocalShellBackend (filesystem + shell execution)
- Persona.permissions + DEFAULT_DENY_PATHS -> deepagents.FilesystemPermission list
  (skipped for the local_shell backend; see build_agent)
- Subagents -> deepagents.SubAgent TypedDict list
- Middleware list -> passed to create_deep_agent
"""
from __future__ import annotations
import os
from pathlib import Path
from typing import Any, Literal
from uuid import UUID
from deepagents import FilesystemPermission, SubAgent, create_deep_agent
from deepagents.backends import (
CompositeBackend,
FilesystemBackend,
LocalShellBackend,
StateBackend,
)
from langchain_openai import ChatOpenAI
from .config import Config
from .errors import MyDeepAgentError
from .persona import FilesystemPermissionSpec, Persona, PersonaSubagent
DEFAULT_DENY_PATHS: tuple[str, ...] = (
    "/.env*",
    "/**/*.env*",
    "/**/*token*",
    "/**/*secret*",
    "/**/*credential*",
    "/**/*.pem",
    "/**/*.key",
    "/.ssh/**",
    "/.aws/**",
    "/.config/gcloud/**",
    "/.kube/**",
    "/.gnupg/**",
)

# Mapping from our richer operation set (read/write/edit/ls) to the deepagents
# binary set (read/write). deepagents treats ls/grep/glob as read-side and
# write_file/edit_file as write-side internally, so this collapse is safe.
_OP_MAP: dict[str, Literal["read", "write"]] = {
    "read": "read",
    "write": "write",
    "edit": "write",
    "ls": "read",
}


def _map_operations(ops: tuple[str, ...] | list[str]) -> list[Literal["read", "write"]]:
    """Map our ops to deepagents ops, deduplicating while preserving order."""
    seen: set[str] = set()
    out: list[Literal["read", "write"]] = []
    for op in ops:
        mapped = _OP_MAP[op]
        if mapped not in seen:
            seen.add(mapped)
            out.append(mapped)
    return out
def default_safety_permissions() -> list[FilesystemPermission]:
    """Deny secret-bearing paths, then allow everything else.

    Returned permissions are evaluated in order; first match wins.
    The deny rule comes first so the secret patterns are blocked before the
    catch-all allow rule can match; every other read/write in the worktree
    still succeeds by default.
    """
    return [
        FilesystemPermission(
            operations=["read", "write"],
            paths=list(DEFAULT_DENY_PATHS),
            mode="deny",
        ),
        FilesystemPermission(
            operations=["read", "write"],
            paths=["/**"],
            mode="allow",
        ),
    ]
def _spec_to_permission(spec: FilesystemPermissionSpec) -> FilesystemPermission:
    """Convert pydantic FilesystemPermissionSpec to deepagents FilesystemPermission.

    Our schema accepts {read, write, edit, ls} for human-readable yaml. deepagents
    collapses these to {read, write} internally; we apply the same collapse here.
    """
    return FilesystemPermission(
        operations=_map_operations(spec.operations),
        paths=list(spec.paths),
        mode=spec.mode,
    )


def _subagent_to_dict(sub: PersonaSubagent) -> SubAgent:
    """Convert PersonaSubagent -> deepagents SubAgent TypedDict.

    Only includes optional keys when set; deepagents inherits defaults from the parent
    agent when a subagent omits ``tools`` / ``model`` / ``permissions`` / ``interrupt_on``.
    """
    out: dict[str, Any] = {
        "name": sub.name,
        "description": sub.description,
        "system_prompt": sub.system_prompt,
    }
    if sub.allowed_tools:
        out["tools"] = list(sub.allowed_tools)
    if sub.model is not None:
        out["model"] = sub.model
    if sub.permissions:
        out["permissions"] = [_spec_to_permission(p) for p in sub.permissions]
    if sub.interrupt_on:
        out["interrupt_on"] = sub.interrupt_on
    return out  # type: ignore[return-value]  # TypedDict construction from dict literal
def _resolve_openrouter_api_key(config: Config) -> str:
    """Pull the OpenRouter API key from config -> env -> error.

    Priority: config.openrouter_api_key -> MYDEEPAGENT_OPENROUTER_API_KEY -> OPENROUTER_API_KEY.
    """
    if config.openrouter_api_key:
        return config.openrouter_api_key
    env_key = os.environ.get("MYDEEPAGENT_OPENROUTER_API_KEY") or os.environ.get(
        "OPENROUTER_API_KEY"
    )
    if env_key:
        return env_key
    raise MyDeepAgentError.human_required(
        "backend_auth_failed",
        message="OpenRouter API key is not configured",
        recovery_hint=(
            "set MYDEEPAGENT_OPENROUTER_API_KEY in .env or run `mydeepagent login openrouter`"
        ),
    )
def resolve_model_instance(
    persona: Persona, config: Config, model_override: str | None = None
) -> Any:
    """Persona -> langchain BaseChatModel instance or 'provider:model' string.

    For ``openrouter:`` prefix, returns a ``ChatOpenAI`` with ``base_url=openrouter``.
    For other providers (``anthropic:``, ``openai:``, ``google:``), returns the string as-is
    so that deepagents' ``init_chat_model`` resolves it via the matching integration package.
    """
    model_spec = model_override or persona.model
    if model_spec.startswith("openrouter:"):
        params = persona.model_params
        return ChatOpenAI(
            model=model_spec.removeprefix("openrouter:"),
            api_key=_resolve_openrouter_api_key(config),
            base_url=config.openrouter_base_url,
            max_tokens=params.get("max_tokens", 4096),
            temperature=params.get("temperature", 0.2),
            top_p=params.get("top_p", 1.0),
        )
    return model_spec
def build_backend(persona: Persona, root_dir: Path) -> Any:
    """Persona.deepagents_backend -> concrete deepagents backend instance.

    Returns:
        LocalShellBackend for "local_shell" (filesystem + shell execute, the default).
        FilesystemBackend for "filesystem" (filesystem only, no shell).
        None for "state" (deepagents default StateBackend, in-process state only).
        CompositeBackend for "composite" (local_shell + state-backed /memories/ namespace).

    Raises:
        MyDeepAgentError(fatal, config_invalid) for unknown backend identifiers
        or "langsmith" which is reserved for a future milestone.
    """
    name = persona.deepagents_backend
    if name == "local_shell":
        return LocalShellBackend(
            root_dir=str(root_dir),
            virtual_mode=False,
            timeout=120,
            max_output_bytes=100_000,
            inherit_env=False,
        )
    if name == "filesystem":
        return FilesystemBackend(root_dir=str(root_dir), virtual_mode=False, max_file_size_mb=10)
    if name == "state":
        return None  # deepagents default StateBackend
    if name == "composite":
        return CompositeBackend(
            default=LocalShellBackend(root_dir=str(root_dir), virtual_mode=False),
            routes={"/memories/": StateBackend()},
        )
    raise MyDeepAgentError.fatal(
        "config_invalid",
        message=f"unsupported deepagents_backend: {name!r}",
        recovery_hint="use one of: local_shell, filesystem, state, composite",
    )
def build_agent(
    persona: Persona,
    config: Config,
    *,
    root_dir: Path,
    middleware: list[Any] | None = None,
    checkpointer: Any | None = None,
    run_id: UUID | None = None,
    phase_key: str | None = None,
    model_override: str | None = None,
) -> Any:
    """Construct a deepagents CompiledStateGraph for the given persona.

    Returns a CompiledStateGraph. Caller invokes via
    ``agent.invoke / ainvoke / astream / astream_events`` with ``{"messages": [...]}`` input.

    deepagents 0.6.1 limitation: FilesystemPermission is rejected when the backend
    implements SandboxBackendProtocol (e.g. LocalShellBackend). SafetyShellMiddleware
    enforces path + destructive-command safety in those cases instead.
    """
    from .middleware.safety import SafetyShellMiddleware

    model = resolve_model_instance(persona, config, model_override)
    backend = build_backend(persona, root_dir)

    # SafetyShellMiddleware is always first; caller-supplied middleware is appended after it.
    all_middleware: list[Any] = [SafetyShellMiddleware()]
    if middleware:
        all_middleware.extend(middleware)

    subagents: list[SubAgent] = [_subagent_to_dict(s) for s in persona.subagents]

    kwargs: dict[str, Any] = {
        "model": model,
        "system_prompt": persona.system_prompt,
        "middleware": all_middleware,
    }
    if backend is not None:
        kwargs["backend"] = backend

    # deepagents 0.6.1: FilesystemPermission + SandboxBackendProtocol backend raises
    # NotImplementedError. Skip permissions kwarg for local_shell; SafetyShellMiddleware
    # handles path enforcement instead. Other backends (state, filesystem, composite)
    # still use the deepagents permissions system.
    use_permissions = persona.deepagents_backend != "local_shell"
    if use_permissions:
        # Persona rules are listed before the defaults, so under first-match-wins
        # evaluation a persona rule takes precedence over the default rules.
        permissions: list[FilesystemPermission] = [
            *(_spec_to_permission(p) for p in persona.permissions),
            *default_safety_permissions(),
        ]
        kwargs["permissions"] = permissions

    if persona.allowed_tools:
        kwargs["tools"] = list(persona.allowed_tools)
    if subagents:
        kwargs["subagents"] = subagents
    if persona.interrupt_on:
        kwargs["interrupt_on"] = persona.interrupt_on
    if checkpointer is not None:
        kwargs["checkpointer"] = checkpointer
    if persona.skills:
        kwargs["skills"] = list(persona.skills)
    if persona.memory_files:
        kwargs["memory"] = list(persona.memory_files)

    return create_deep_agent(**kwargs)
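Review note: the docstring of `default_safety_permissions` states that permissions are evaluated in order and the first match wins. Under that rule, the relative position of a catch-all allow and the secret-path deny decides whether the deny can ever fire. The sketch below is a toy evaluator written for this comment, not deepagents' implementation: the `Rule` dataclass and `decide` function are hypothetical, and `fnmatchcase` is used as a simplified matcher in which `*` also crosses `/`, which real glob matching may not do.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a filesystem permission rule."""

    operations: tuple[str, ...]
    paths: tuple[str, ...]
    mode: str  # "allow" or "deny"


def decide(rules: list[Rule], operation: str, path: str, default: str = "allow") -> str:
    """Return the mode of the first rule matching (operation, path)."""
    for rule in rules:
        if operation in rule.operations and any(fnmatchcase(path, pat) for pat in rule.paths):
            return rule.mode
    return default


ALLOW_ALL = Rule(("read", "write"), ("/**",), "allow")
DENY_SECRETS = Rule(("read", "write"), ("/.env*", "/**/*.env*", "/**/*.pem"), "deny")

# Allow first: the catch-all matches before the deny rule is ever consulted.
allow_first = decide([ALLOW_ALL, DENY_SECRETS], "read", "/app/.env")
# Deny first: the secret path is blocked, other paths still fall through to allow.
deny_first = decide([DENY_SECRETS, ALLOW_ALL], "read", "/app/.env")
ordinary = decide([DENY_SECRETS, ALLOW_ALL], "read", "/src/main.py")
```

In this toy model `allow_first` is "allow", `deny_first` is "deny" and `ordinary` is "allow". The same reasoning applies to persona-supplied rules placed ahead of the defaults in `build_agent`: a broad persona allow would match before any default deny.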


@@ -0,0 +1 @@
"""Slash command registry and dispatcher. Implemented in Step 10."""


@@ -0,0 +1 @@
"""TUI approval dialog for human-in-the-loop actions. Implemented in Step 7."""


@@ -0,0 +1 @@
"""TUI Rich panel and table renderers. Implemented in Step 10."""


@@ -0,0 +1 @@
"""TUI streaming output renderer for run events. Implemented in Step 10."""


@@ -0,0 +1,127 @@
"""WorkflowTemplate schema + YAML loader."""
from __future__ import annotations
from collections import Counter
from pathlib import Path
import yaml
from pydantic import BaseModel, ConfigDict, Field, field_validator, model_validator
from .enums import Backend, Capability, RiskLevel
from .hash import sha256
class ExpectedArtifact(BaseModel):
    """Expected output artifact of a workflow phase."""

    model_config = ConfigDict(frozen=True, extra="forbid", populate_by_name=True)

    path: str = Field(min_length=1)
    # yaml uses 'schema' key; pydantic attribute is schema_id to avoid shadowing BaseModel.schema
    schema_id: str = Field(min_length=1, alias="schema")


class WorkflowPhase(BaseModel):
    """Single phase definition inside a workflow template."""

    model_config = ConfigDict(frozen=True, extra="forbid")

    key: str = Field(min_length=1, pattern=r"^[a-z][a-z0-9_]*$")
    title: str = Field(min_length=1)
    risk: RiskLevel
    role: str = Field(min_length=1)
    expected_artifact: ExpectedArtifact | None = None
    gates: tuple[str, ...] = Field(default_factory=tuple)
    timeout_seconds: int | None = Field(default=None, ge=1)
    instructions: str = Field(min_length=10)
    max_budget_usd: float | None = Field(default=None, ge=0)


class WorkflowRole(BaseModel):
    """Role definition: what capabilities a bound persona must have."""

    model_config = ConfigDict(frozen=True, extra="forbid")

    id: str = Field(min_length=1, pattern=r"^[a-z][a-z0-9_]*$")
    required_capabilities: tuple[Capability, ...] = Field(min_length=1)
    preferred_backends: tuple[Backend, ...] = Field(default_factory=tuple)
    fallback_personas: tuple[str, ...] = Field(default_factory=tuple)
class WorkflowTemplate(BaseModel):
    """Complete workflow template loaded from docs/schemas/workflows/<name>@<version>.yaml."""

    model_config = ConfigDict(frozen=True, extra="forbid")

    name: str = Field(min_length=1)
    version: int = Field(ge=1)
    description: str | None = None
    roles: tuple[WorkflowRole, ...] = Field(min_length=1)
    phases: tuple[WorkflowPhase, ...] = Field(min_length=1)
    default_gates: tuple[str, ...] = Field(default_factory=tuple)
    max_total_budget_usd: float | None = Field(default=None, ge=0)

    @model_validator(mode="after")
    def _validate_phase_roles(self) -> WorkflowTemplate:
        role_ids = {r.id for r in self.roles}
        for ph in self.phases:
            if ph.role not in role_ids:
                raise ValueError(f"phase '{ph.key}' references unknown role '{ph.role}'")
        return self

    @model_validator(mode="after")
    def _validate_unique_phase_keys(self) -> WorkflowTemplate:
        counts = Counter(ph.key for ph in self.phases)
        duplicates = sorted(k for k, c in counts.items() if c > 1)
        if duplicates:
            raise ValueError(f"duplicate phase keys: {duplicates}")
        return self

    @field_validator("roles")
    @classmethod
    def _validate_unique_role_ids(cls, v: tuple[WorkflowRole, ...]) -> tuple[WorkflowRole, ...]:
        counts = Counter(r.id for r in v)
        duplicates = sorted(k for k, c in counts.items() if c > 1)
        if duplicates:
            raise ValueError(f"duplicate role ids: {duplicates}")
        return v

    def compute_hash(self) -> str:
        """Content-addressed identity hash of this template."""
        return sha256(
            {
                "name": self.name,
                "version": self.version,
                "roles": [r.model_dump() for r in self.roles],
                "phases": [ph.model_dump(by_alias=True) for ph in self.phases],
                "default_gates": sorted(self.default_gates),
                "max_total_budget_usd": self.max_total_budget_usd,
            }
        )
def load_workflow_yaml(path: Path) -> WorkflowTemplate:
"""Load and validate a single workflow yaml file."""
if not path.is_file():
raise FileNotFoundError(f"workflow yaml not found: {path}")
data = yaml.safe_load(path.read_text(encoding="utf-8"))
return WorkflowTemplate.model_validate(data)
def load_workflows_from_dir(directory: Path) -> list[WorkflowTemplate]:
"""Load all *.yaml workflow files from a directory, sorted by filename.
Raises ValueError if the same (name, version) pair appears more than once.
Returns an empty list if the directory does not exist.
"""
if not directory.is_dir():
return []
workflows = [load_workflow_yaml(p) for p in sorted(directory.glob("*.yaml"))]
seen: set[tuple[str, int]] = set()
for w in workflows:
key = (w.name, w.version)
if key in seen:
raise ValueError(f"duplicate workflow name={w.name!r} version={w.version}")
seen.add(key)
return workflows
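
Review note: `compute_hash` above sorts `default_gates` before hashing, so gate order does not change the template identity, while `roles` and `phases` stay order-sensitive. The sketch below illustrates that property in isolation. It is only an approximation: `canonical_sha256` and `template_identity` are hypothetical names, and the project's own `sha256` helper (described in the commit message as canonical JSON + sha256) is not shown in this diff and may encode values differently.

```python
import hashlib
import json
from typing import Any


def canonical_sha256(payload: Any) -> str:
    """Hash a JSON-serializable payload via a canonical encoding (hypothetical helper)."""
    # sort_keys plus compact separators yield one byte sequence per logical value.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def template_identity(name: str, version: int, default_gates: tuple[str, ...]) -> str:
    """Illustrative subset of the fields that compute_hash feeds into the hash."""
    return canonical_sha256(
        {"name": name, "version": version, "default_gates": sorted(default_gates)}
    )


a = template_identity("dev", 1, ("lint", "tests"))
b = template_identity("dev", 1, ("tests", "lint"))  # same gates, different order
c = template_identity("dev", 2, ("lint", "tests"))  # different version
```

Here `a` and `b` are equal because the gates are sorted before encoding, while `c` differs because the version is part of the hashed payload.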


@@ -0,0 +1,78 @@
"""Integration tests for src/my_deepagent/persistence/checkpointer.py."""
from __future__ import annotations
import sqlite3
from pathlib import Path
from my_deepagent.persistence.checkpointer import get_checkpointer_ctx
class TestGetCheckpointerCtx:
"""Tests for the get_checkpointer_ctx context manager."""
def test_ctx_yields_saver_and_cleans_up(self, tmp_path: Path) -> None:
"""Entering the context yields a SqliteSaver; exiting releases the connection."""
db_path = tmp_path / "ck.db"
with get_checkpointer_ctx(db_path) as saver:
assert saver is not None
# The DB file must exist while inside the context.
assert db_path.exists()
# After context exit the file must still exist (not deleted).
assert db_path.exists()
def test_db_file_created_on_enter(self, tmp_path: Path) -> None:
"""The sqlite file is created when the context is entered."""
db_path = tmp_path / "nested" / "dir" / "ck.db"
assert not db_path.exists()
with get_checkpointer_ctx(db_path):
assert db_path.exists()
def test_parent_dir_created_if_missing(self, tmp_path: Path) -> None:
"""Parent directory is created automatically even if it does not exist."""
db_path = tmp_path / "a" / "b" / "c" / "ck.db"
assert not db_path.parent.exists()
with get_checkpointer_ctx(db_path):
assert db_path.parent.exists()
def test_connection_released_after_ctx_exit(self, tmp_path: Path) -> None:
"""After exiting the context manager, another process/connection can open the DB."""
db_path = tmp_path / "ck.db"
with get_checkpointer_ctx(db_path):
pass # enter and exit
# If the connection were leaked (not closed), WAL mode can still allow reads,
# but we verify by opening with a fresh sqlite3 connection — this must succeed.
with sqlite3.connect(str(db_path)) as conn:
cur = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
# LangGraph creates its checkpoint tables; result must be a list (not error).
tables = [row[0] for row in cur.fetchall()]
assert isinstance(tables, list)
def test_meta_and_checkpoint_db_no_lock_conflict(self, tmp_path: Path) -> None:
"""Using two separate DB files in the same directory causes no locking conflict."""
meta_db = tmp_path / "meta.db"
ck_db = tmp_path / "checkpoints.db"
# Simulate concurrent use: open both within the same scope.
with get_checkpointer_ctx(ck_db) as saver:
# Write something to the meta DB while the checkpointer holds its connection.
with sqlite3.connect(str(meta_db)) as conn:
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT OR REPLACE INTO kv VALUES ('key', 'value')")
conn.commit()
assert saver is not None
# Both files must exist and be independently readable.
assert meta_db.exists()
assert ck_db.exists()
with sqlite3.connect(str(meta_db)) as conn:
row = conn.execute("SELECT v FROM kv WHERE k='key'").fetchone()
assert row is not None
assert row[0] == "value"
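
Review note: the tests above pin down three behaviours of `get_checkpointer_ctx`: missing parent directories are created, the DB file survives context exit, and the connection is released on exit. The real implementation wraps LangGraph's `SqliteSaver` and is not part of this hunk, so the following is a minimal sketch of the same lifecycle pattern using plain `sqlite3`. The name `open_db_ctx` is hypothetical.

```python
import sqlite3
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_db_ctx(db_path: Path) -> Iterator[sqlite3.Connection]:
    """Hypothetical stand-in for get_checkpointer_ctx, using plain sqlite3."""
    # Create missing parent directories so a nested path works on first use.
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(db_path))
    try:
        yield conn
    finally:
        # Always close, even if the body raises, so no handle is leaked.
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "a" / "b" / "ck.db"
    with open_db_ctx(path) as conn:
        conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        conn.commit()
        existed_inside = path.exists()
    existed_after = path.exists()
    try:
        # Using a closed connection raises ProgrammingError.
        conn.execute("SELECT 1")
        closed_after_exit = False
    except sqlite3.ProgrammingError:
        closed_after_exit = True
```

Checking that the handle is unusable after exit is a stricter release check than reopening the file, since SQLite allows several connections to the same file at once.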


@@ -0,0 +1,143 @@
"""Real OpenRouter API smoke test. Costs ~$0.001-$0.003 per full run.
Skipped automatically when no API key is configured.
Uses deepseek/deepseek-chat (cheapest available) with max_tokens=50.
"""
from __future__ import annotations
import os
from pathlib import Path
import pytest
from my_deepagent.config import load_config
from my_deepagent.persona import Persona
from my_deepagent.session import resolve_model_instance
_HAS_KEY = (
bool(os.environ.get("MYDEEPAGENT_OPENROUTER_API_KEY") or os.environ.get("OPENROUTER_API_KEY"))
or Path(".env").is_file()
)
pytestmark = [
pytest.mark.integration,
pytest.mark.skipif(not _HAS_KEY, reason="no OpenRouter API key configured"),
]
def _smoke_persona() -> Persona:
return Persona.model_validate(
{
"name": "smoke-test",
"version": 1,
"backend": "openrouter",
"model": "openrouter:deepseek/deepseek-chat",
"provider_origin": "China/DeepSeek",
"capabilities": ["evidence_check"],
"max_risk_level": "low",
"system_prompt": (
"You are a smoke-test echo bot. Reply only with the literal token 'OK'."
),
"model_params": {"max_tokens": 50, "temperature": 0.0},
# deepagents 0.6.x: using the local_shell backend together with permissions
# raises NotImplementedError. The state backend has no such permissions restriction.
"deepagents_backend": "state",
}
)
def _smoke_persona_local_shell() -> Persona:
return Persona.model_validate(
{
"name": "smoke-test-local-shell",
"version": 1,
"backend": "openrouter",
"model": "openrouter:deepseek/deepseek-chat",
"provider_origin": "China/DeepSeek",
"capabilities": ["evidence_check"],
"max_risk_level": "low",
"system_prompt": (
"You are a smoke-test echo bot. Reply only with the literal token 'OK'."
),
"model_params": {"max_tokens": 50, "temperature": 0.0},
# local_shell backend: SafetyShellMiddleware enforces path + destructive-command
# policy; permissions kwarg is skipped to avoid deepagents 0.6.1 NotImplementedError.
"deepagents_backend": "local_shell",
}
)
def test_openrouter_chat_completion_returns_response() -> None:
"""Make one call through the ChatOpenAI instance to verify the OpenRouter base_url, auth, and response flow."""
config = load_config()
persona = _smoke_persona()
chat = resolve_model_instance(persona, config)
response = chat.invoke(
[
("system", persona.system_prompt),
("user", "Reply with the exact string 'OK' and nothing else."),
]
)
assert response is not None
content = response.content
# langchain BaseMessage.content is str | list[content_block_dict]; either form must be non-empty.
assert len(content) > 0
def test_openrouter_usage_metadata_present() -> None:
"""response.usage_metadata must populate input_tokens/output_tokens so that cost can be measured."""
config = load_config()
persona = _smoke_persona()
chat = resolve_model_instance(persona, config)
response = chat.invoke(
[
("system", persona.system_prompt),
("user", "Reply with 'OK'."),
]
)
usage = getattr(response, "usage_metadata", None)
assert usage is not None, "OpenRouter response must include usage_metadata"
assert usage.get("input_tokens", 0) > 0
assert usage.get("output_tokens", 0) > 0
def test_openrouter_deepagents_create_smoke() -> None:
"""deepagents create_deep_agent plus one real OpenRouter call. The most expensive check."""
config = load_config()
persona = _smoke_persona()
from my_deepagent.session import build_agent
agent = build_agent(persona, config, root_dir=Path.cwd())
result = agent.invoke({"messages": [{"role": "user", "content": "Reply with 'OK' only."}]})
messages = result.get("messages", [])
assert len(messages) > 0
last = messages[-1]
content = getattr(last, "content", "")
if isinstance(content, list):
content = " ".join(str(c) for c in content)
assert len(str(content)) > 0
def test_openrouter_deepagents_local_shell_smoke(tmp_path: Path) -> None:
"""Real OpenRouter call via deepagents + LocalShellBackend + SafetyShellMiddleware.
Verifies deepagents 0.6.1 workaround: local_shell backend with permissions kwarg
skipped, SafetyShellMiddleware automatically injected by build_agent.
"""
config = load_config()
persona = _smoke_persona_local_shell()
from my_deepagent.session import build_agent
agent = build_agent(persona, config, root_dir=tmp_path)
result = agent.invoke({"messages": [{"role": "user", "content": "Reply 'OK' only."}]})
messages = result.get("messages", [])
assert len(messages) > 0
last = messages[-1]
content = getattr(last, "content", "")
if isinstance(content, list):
content = " ".join(str(c) for c in content)
assert len(str(content)) > 0
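
Review note: the two deepagents smoke tests above flatten list-shaped message content with `str(c)`, which passes the non-empty check even when a block holds no text. A stricter normalizer could look like the sketch below. `message_text` is a hypothetical helper, not something defined in this PR, and it assumes text blocks follow the `{"type": "text", "text": ...}` shape.

```python
from __future__ import annotations

from typing import Any


def message_text(content: str | list[Any]) -> str:
    """Flatten message content to plain text (hypothetical helper)."""
    if isinstance(content, str):
        return content
    parts: list[str] = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            # Text blocks carry their payload under the "text" key.
            parts.append(str(block.get("text", "")))
        # Non-text blocks (images, tool calls, ...) are skipped on purpose.
    return " ".join(parts)


plain = message_text("OK")
mixed = message_text([{"type": "text", "text": "O"}, "K"])
no_text = message_text([{"type": "image_url", "image_url": "x"}])
```

With a helper like this, an assertion such as `assert message_text(last.content)` would fail for responses that contain only non-text blocks.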


@@ -0,0 +1,670 @@
"""Integration tests for src/my_deepagent/persistence/ (DB engine + ORM models)."""
from __future__ import annotations
import subprocess
import sys
import uuid
from pathlib import Path
from typing import Any
import pytest
import pytest_asyncio
from sqlalchemy import text
from sqlalchemy.exc import IntegrityError
from my_deepagent.persistence.db import Database
from my_deepagent.persistence.models import (
AgentPersonaRow,
RunEventRow,
RunInputRow,
RunPhaseRow,
RunRow,
WorkflowTemplateRow,
)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
_NOW = "2026-05-15T00:00:00+00:00"
def _make_id() -> str:
return str(uuid.uuid4())
def _workflow_template_row(template_id: str) -> WorkflowTemplateRow:
"""Return a WorkflowTemplateRow that satisfies the runs.template_id FK."""
return WorkflowTemplateRow(
id=template_id,
name="test-wf",
version=1,
hash=template_id, # unique per invocation
definition={},
created_at=_NOW,
)
def _run_row(run_id: str | None = None, template_id: str | None = None) -> RunRow:
rid = run_id or _make_id()
tid = template_id or _make_id()
return RunRow(
id=rid,
template_id=tid,
template_hash="a" * 64,
state="pending",
repo_path="/repo",
base_branch="main",
worktree_root="/wt",
created_at=_NOW,
updated_at=_NOW,
)
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture()
def db_url(tmp_path: Path) -> str:
return f"sqlite+aiosqlite:///{tmp_path}/test.db"
@pytest_asyncio.fixture()
async def db(db_url: str) -> Database: # type: ignore[misc]
database = Database(db_url)
await database.init_schema()
yield database # type: ignore[misc]
await database.dispose()
# ---------------------------------------------------------------------------
# A.1: All 18 tables exist after init_schema
# ---------------------------------------------------------------------------
EXPECTED_TABLES = {
"workflow_templates",
"agent_personas",
"runs",
"run_inputs",
"run_bindings",
"run_phases",
"run_events",
"approval_requests",
"approval_decisions",
"artifacts",
"interactive_sessions",
"tool_calls",
"llm_calls",
"model_pricing",
"budget_ledger",
"persona_consents",
"phase_feedback",
"run_commands",
}
@pytest.mark.asyncio
async def test_init_schema_creates_all_tables(db: Database) -> None:
"""All expected tables must exist in sqlite_master after init_schema."""
async with db.session() as session:
result = await session.execute(
text("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
)
table_names = {row[0] for row in result.fetchall()}
table_names.discard("alembic_version")
assert EXPECTED_TABLES <= table_names, f"Missing tables: {EXPECTED_TABLES - table_names}"
# ---------------------------------------------------------------------------
# A.2: WAL mode active
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_wal_mode_active(db: Database) -> None:
"""journal_mode PRAGMA must return 'wal' after connection."""
async with db.session() as session:
result = await session.execute(text("PRAGMA journal_mode"))
mode = result.scalar()
assert mode == "wal", f"Expected 'wal', got {mode!r}"
# ---------------------------------------------------------------------------
# A.3: busy_timeout active
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_busy_timeout_active(db: Database) -> None:
"""busy_timeout PRAGMA must return 5000."""
async with db.session() as session:
result = await session.execute(text("PRAGMA busy_timeout"))
timeout = result.scalar()
assert timeout == 5000, f"Expected 5000, got {timeout!r}"
# ---------------------------------------------------------------------------
# A.4: foreign_keys active
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_foreign_keys_active(db: Database) -> None:
"""foreign_keys PRAGMA must return 1."""
async with db.session() as session:
result = await session.execute(text("PRAGMA foreign_keys"))
fk = result.scalar()
assert fk == 1, f"Expected 1, got {fk!r}"
# ---------------------------------------------------------------------------
# A.5: basic insert + select round-trip
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_run_row_insert_and_select(db: Database) -> None:
"""RunRow insert then SELECT must return the same state."""
rid = _make_id()
tid = _make_id()
template = _workflow_template_row(tid)
run = _run_row(rid, template_id=tid)
async with db.session() as session:
session.add(template)
await session.flush()
session.add(run)
async with db.session() as session:
fetched = await session.get(RunRow, rid)
assert fetched is not None
assert fetched.id == rid
assert fetched.state == "pending"
@pytest.mark.asyncio
async def test_agent_persona_row_insert_and_select(db: Database) -> None:
"""AgentPersonaRow insert then SELECT must return the same record."""
persona_id = _make_id()
persona = AgentPersonaRow(
id=persona_id,
name="test-persona",
version=1,
hash="b" * 64,
definition={"model": "test"},
created_at=_NOW,
)
async with db.session() as session:
session.add(persona)
async with db.session() as session:
fetched = await session.get(AgentPersonaRow, persona_id)
assert fetched is not None
assert fetched.name == "test-persona"
assert fetched.version == 1
# ---------------------------------------------------------------------------
# A.6: UNIQUE constraint — workflow_templates.hash duplicate
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_workflow_template_hash_unique_constraint(db: Database) -> None:
"""Inserting two WorkflowTemplateRows with the same hash must raise IntegrityError."""
def make_template(tid: str) -> WorkflowTemplateRow:
return WorkflowTemplateRow(
id=tid,
name="my-wf",
version=1,
hash="c" * 64, # same hash for both
definition={},
created_at=_NOW,
)
t1 = make_template(_make_id())
async with db.session() as session:
session.add(t1)
t2 = make_template(_make_id())
with pytest.raises(IntegrityError):
async with db.session() as session:
session.add(t2)
# ---------------------------------------------------------------------------
# A.7: FK CASCADE — RunRow delete cascades to RunInputRow
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_fk_cascade_run_delete_cascades_run_input(db: Database) -> None:
"""Deleting a RunRow must cascade-delete its RunInputRow."""
rid = _make_id()
tid = _make_id()
template = _workflow_template_row(tid)
run = _run_row(rid, template_id=tid)
inp = RunInputRow(
id=_make_id(),
run_id=rid,
requirements_md="# Requirements",
objective={"goal": "test"},
extra={},
input_hash="d" * 64,
)
# Insert parent and child in the same transaction so FK is satisfied.
async with db.session() as session:
session.add(template)
await session.flush() # persist template before run references it
session.add(run)
await session.flush() # persist run before inp references it
session.add(inp)
async with db.session() as session:
fetched_run = await session.get(RunRow, rid)
assert fetched_run is not None
await session.delete(fetched_run)
async with db.session() as session:
result = await session.execute(
text("SELECT id FROM run_inputs WHERE run_id = :rid"),
{"rid": rid},
)
rows = result.fetchall()
assert rows == [], f"Expected cascade delete of run_inputs, got {rows}"
# ---------------------------------------------------------------------------
# A.8: JSON column round-trip
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_json_column_round_trip(db: Database) -> None:
"""RunEventRow.payload nested dict must survive DB round-trip intact."""
rid = _make_id()
tid = _make_id()
template = _workflow_template_row(tid)
run = _run_row(rid, template_id=tid)
payload: dict[str, Any] = {
"nested": {"list": [1, 2, 3], "flag": True},
"msg": "hello",
}
event = RunEventRow(
run_id=rid,
seq=1,
type="phase_started",
payload=payload,
idempotency_key="idem-1",
ts=_NOW,
)
async with db.session() as session:
session.add(template)
await session.flush() # persist template before run references it
session.add(run)
await session.flush() # persist run before event references it
session.add(event)
async with db.session() as session:
result = await session.execute(
text("SELECT payload FROM run_events WHERE run_id = :rid"), {"rid": rid}
)
raw = result.scalar()
import json as _json
restored = _json.loads(raw) if isinstance(raw, str) else raw
assert restored == payload
# ---------------------------------------------------------------------------
# A.9: UUID string column round-trip
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_uuid_column_round_trip(db: Database) -> None:
"""UUID primary key stored as string must compare equal after retrieval."""
expected_id = str(uuid.uuid4())
tid = _make_id()
template = _workflow_template_row(tid)
run = RunRow(
id=expected_id,
template_id=tid,
template_hash="e" * 64,
state="running",
repo_path="/r",
base_branch="main",
worktree_root="/w",
created_at=_NOW,
updated_at=_NOW,
)
async with db.session() as session:
session.add(template)
await session.flush()
session.add(run)
async with db.session() as session:
fetched = await session.get(RunRow, expected_id)
assert fetched is not None
assert fetched.id == expected_id
# ---------------------------------------------------------------------------
# A.10: UNIQUE(run_id, seq) on run_events
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_run_events_unique_run_seq(db: Database) -> None:
"""Two RunEventRows with the same (run_id, seq) must raise IntegrityError."""
rid = _make_id()
tid = _make_id()
template = _workflow_template_row(tid)
run = _run_row(rid, template_id=tid)
async with db.session() as session:
session.add(template)
await session.flush()
session.add(run)
await session.flush()
session.add(
RunEventRow(
run_id=rid,
seq=1,
type="x",
payload={},
idempotency_key="key-a",
ts=_NOW,
)
)
with pytest.raises(IntegrityError):
async with db.session() as session:
session.add(
RunEventRow(
run_id=rid,
seq=1, # same seq → collision on (run_id, seq)
type="x",
payload={},
idempotency_key="key-b",
ts=_NOW,
)
)
# ---------------------------------------------------------------------------
# A.11: UNIQUE(run_id, idempotency_key) on run_events
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_run_events_unique_idempotency_key(db: Database) -> None:
"""Two RunEventRows with the same (run_id, idempotency_key) must raise IntegrityError."""
rid = _make_id()
tid = _make_id()
template = _workflow_template_row(tid)
run = _run_row(rid, template_id=tid)
async with db.session() as session:
session.add(template)
await session.flush()
session.add(run)
await session.flush()
session.add(
RunEventRow(
run_id=rid,
seq=1,
type="x",
payload={},
idempotency_key="shared-key",
ts=_NOW,
)
)
with pytest.raises(IntegrityError):
async with db.session() as session:
session.add(
RunEventRow(
run_id=rid,
seq=2, # different seq
type="x",
payload={},
idempotency_key="shared-key", # same idem key → collision
ts=_NOW,
)
)
# ---------------------------------------------------------------------------
# A.12: Index existence on run_events
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_run_events_index_exists(db: Database) -> None:
"""The run_events_run_id_ts_idx index must exist in sqlite_master."""
async with db.session() as session:
result = await session.execute(
text(
"SELECT name FROM sqlite_master "
"WHERE type='index' AND name='run_events_run_id_ts_idx'"
)
)
names = [row[0] for row in result.fetchall()]
assert "run_events_run_id_ts_idx" in names
# ---------------------------------------------------------------------------
# A.13: dispose + new session works
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_dispose_and_reconnect(db_url: str) -> None:
"""After dispose(), creating a new Database and querying must succeed."""
db1 = Database(db_url)
await db1.init_schema()
await db1.dispose()
db2 = Database(db_url)
async with db2.session() as session:
result = await session.execute(
text("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
)
tables = [row[0] for row in result.fetchall()]
await db2.dispose()
assert "runs" in tables
# ---------------------------------------------------------------------------
# A.14: Alembic upgrade head produces valid schema
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_alembic_upgrade_head_produces_valid_schema(tmp_path: Path) -> None:
"""Running alembic upgrade head on a fresh DB must create the expected tables."""
db_path = tmp_path / "alembic_test.db"
db_url = f"sqlite:///{db_path}" # sync URL for alembic env.py
project_root = Path(__file__).parent.parent.parent
result = subprocess.run(
[
sys.executable,
"-m",
"alembic",
"upgrade",
"head",
],
cwd=str(project_root),
env={**__import__("os").environ, "DATABASE_URL": db_url},
capture_output=True,
text=True,
)
assert result.returncode == 0, (
f"alembic upgrade head failed:\nSTDOUT:\n{result.stdout}\nSTDERR:\n{result.stderr}"
)
import sqlite3
with sqlite3.connect(str(db_path)) as conn:
cur = conn.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
tables = {row[0] for row in cur.fetchall()}
tables.discard("alembic_version")
assert EXPECTED_TABLES <= tables, f"Missing after alembic upgrade: {EXPECTED_TABLES - tables}"
# ---------------------------------------------------------------------------
# P0-1: partial unique index ux_active_run_repo_base
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_active_run_unique_index_blocks_duplicate(db: Database) -> None:
"""Two active runs with the same (repo_path, base_branch) must raise IntegrityError."""
tid = _make_id()
template = _workflow_template_row(tid)
rid1 = _make_id()
run1 = _run_row(rid1, template_id=tid)
run1.state = "running"
rid2 = _make_id()
run2 = _run_row(rid2, template_id=tid)
run2.state = "pending"
# Same repo_path and base_branch — both active → must violate unique index.
async with db.session() as session:
session.add(template)
await session.flush()
session.add(run1)
with pytest.raises(IntegrityError):
async with db.session() as session:
session.add(run2)
@pytest.mark.asyncio
async def test_active_run_unique_index_allows_completed(db: Database) -> None:
"""A completed run allows a new active run with the same (repo_path, base_branch)."""
tid = _make_id()
template = _workflow_template_row(tid)
rid1 = _make_id()
run1 = _run_row(rid1, template_id=tid)
run1.state = "completed"
rid2 = _make_id()
run2 = _run_row(rid2, template_id=tid)
run2.state = "running"
# Same repo/branch; run1 is completed (excluded) → run2 must succeed.
async with db.session() as session:
session.add(template)
await session.flush()
session.add(run1)
async with db.session() as session:
session.add(run2)
async with db.session() as session:
fetched = await session.get(RunRow, rid2)
assert fetched is not None
assert fetched.state == "running"
# ---------------------------------------------------------------------------
# P0-3: FK CASCADE — RunRow delete cascades to all audit children
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_fk_cascade_run_delete_cascades_phase_feedback(db: Database) -> None:
"""Deleting a RunRow cascades to phase_feedback and run_phases rows."""
from my_deepagent.persistence.models import PhaseFeedbackRow
tid = _make_id()
rid = _make_id()
phase_id = _make_id()
template = _workflow_template_row(tid)
run = _run_row(rid, template_id=tid)
phase = RunPhaseRow(
id=phase_id,
run_id=rid,
phase_key="plan",
seq=1,
state="completed",
attempts=1,
)
feedback = PhaseFeedbackRow(
run_id=rid,
phase_id=phase_id,
reaction="thumbs_up",
created_at=_NOW,
)
async with db.session() as session:
session.add(template)
await session.flush()
session.add(run)
await session.flush()
session.add(phase)
await session.flush()
session.add(feedback)
async with db.session() as session:
fetched_run = await session.get(RunRow, rid)
assert fetched_run is not None
await session.delete(fetched_run)
async with db.session() as session:
fb_result = await session.execute(
text("SELECT id FROM phase_feedback WHERE run_id = :rid"), {"rid": rid}
)
ph_result = await session.execute(
text("SELECT id FROM run_phases WHERE run_id = :rid"), {"rid": rid}
)
assert fb_result.fetchall() == [], "phase_feedback must cascade-delete with run"
assert ph_result.fetchall() == [], "run_phases must cascade-delete with run"
# ---------------------------------------------------------------------------
# P0-3: FK RESTRICT — deleting WorkflowTemplateRow with runs is blocked
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_fk_restrict_template_delete_blocked_by_run(db: Database) -> None:
"""Deleting a WorkflowTemplateRow that has a referencing RunRow must raise IntegrityError."""
tid = _make_id()
rid = _make_id()
template = _workflow_template_row(tid)
run = _run_row(rid, template_id=tid)
async with db.session() as session:
session.add(template)
await session.flush()
session.add(run)
with pytest.raises(IntegrityError):
async with db.session() as session:
fetched = await session.get(WorkflowTemplateRow, tid)
assert fetched is not None
await session.delete(fetched)
# ---------------------------------------------------------------------------
# P0-1: partial unique index exists in sqlite_master after init_schema
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_active_run_partial_index_exists_in_schema(db: Database) -> None:
"""ux_active_run_repo_base partial unique index must exist after init_schema."""
async with db.session() as session:
result = await session.execute(
text(
"SELECT sql FROM sqlite_master "
"WHERE type='index' AND name='ux_active_run_repo_base'"
)
)
row = result.fetchone()
assert row is not None, "ux_active_run_repo_base index missing from sqlite_master"
assert "WHERE" in (row[0] or ""), f"Expected WHERE clause in index SQL, got: {row[0]}"
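
Review note: the P0-1 tests above rely on a partial unique index, so that only active runs compete for a given `(repo_path, base_branch)` pair. The alembic baseline that defines `ux_active_run_repo_base` is not part of this hunk, so the sketch below reproduces the mechanism with plain `sqlite3`. The reduced `runs` table and the predicate `state IN ('pending', 'running')` are assumptions chosen to match the test expectations; the real predicate may include other states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (id TEXT PRIMARY KEY, repo_path TEXT NOT NULL, "
    "base_branch TEXT NOT NULL, state TEXT NOT NULL)"
)
# Assumed predicate: only 'pending' and 'running' rows count as active.
conn.execute(
    "CREATE UNIQUE INDEX ux_active_run_repo_base ON runs (repo_path, base_branch) "
    "WHERE state IN ('pending', 'running')"
)

# A completed run is outside the index, so an active run may coexist with it.
conn.execute("INSERT INTO runs VALUES ('r1', '/repo', 'main', 'completed')")
conn.execute("INSERT INTO runs VALUES ('r2', '/repo', 'main', 'running')")

try:
    # A second active run for the same repo and branch violates the index.
    conn.execute("INSERT INTO runs VALUES ('r3', '/repo', 'main', 'pending')")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

row_count = conn.execute("SELECT COUNT(*) FROM runs").fetchone()[0]
index_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type='index' AND name='ux_active_run_repo_base'"
).fetchone()[0]
conn.close()
```

The stored `index_sql` keeps the `WHERE` clause, which is what `test_active_run_partial_index_exists_in_schema` checks for.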


@@ -0,0 +1,391 @@
"""Unit tests for src/my_deepagent/artifact_schema.py."""
from __future__ import annotations
import json
from pathlib import Path
from typing import Any
import pytest
from my_deepagent.artifact_schema import (
ArtifactSchemaRegistry,
ValidationFinding,
ValidationResult,
)
from my_deepagent.errors import MyDeepAgentError
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
REPO_ROOT = Path(__file__).parent.parent.parent
SEED_ROOT = REPO_ROOT / "docs" / "schemas" / "artifacts"
SEED_SCHEMA_IDS = [
"common/final-report@1",
"dev/phase-plan@1",
"dev/review-finding-batch@1",
"dev/spec@1",
]
@pytest.fixture
def seed_registry() -> ArtifactSchemaRegistry:
return ArtifactSchemaRegistry(roots=[SEED_ROOT])
@pytest.fixture
def valid_spec() -> dict[str, Any]:
return {
"runId": "00000000-0000-4000-8000-000000000000",
"phaseKey": "spec",
"requirements": "User wants a CLI tool that analyzes log files.",
"acceptance_criteria": ["parses .log files", "outputs JSON summary"],
"approach": "Build a typer-based CLI using regex and json output.",
"risks": ["log format variations may break parser"],
}
# ---------------------------------------------------------------------------
# 1. Seed schema load success (4 schemas)
# ---------------------------------------------------------------------------
@pytest.mark.parametrize("schema_id", SEED_SCHEMA_IDS)
def test_seed_schema_loads(seed_registry: ArtifactSchemaRegistry, schema_id: str) -> None:
schema = seed_registry.load(schema_id)
assert isinstance(schema, dict)
assert schema.get("$id") == schema_id
# ---------------------------------------------------------------------------
# 2. Load result caching — same dict object on second call
# ---------------------------------------------------------------------------
def test_load_caches_same_object(seed_registry: ArtifactSchemaRegistry) -> None:
first = seed_registry.load("dev/spec@1")
second = seed_registry.load("dev/spec@1")
assert first is second
# ---------------------------------------------------------------------------
# 3. Unknown schema_id → artifact_schema_unknown
# ---------------------------------------------------------------------------
def test_unknown_schema_id_raises(seed_registry: ArtifactSchemaRegistry) -> None:
with pytest.raises(MyDeepAgentError) as exc_info:
seed_registry.load("dev/nonexistent@99")
assert exc_info.value.code == "artifact_schema_unknown"
# ---------------------------------------------------------------------------
# 4. Invalid schema_id format (no slash) → artifact_schema_unknown
# ---------------------------------------------------------------------------
def test_invalid_schema_id_no_slash(seed_registry: ArtifactSchemaRegistry) -> None:
with pytest.raises(MyDeepAgentError) as exc_info:
seed_registry.load("foo")
assert exc_info.value.code == "artifact_schema_unknown"
# ---------------------------------------------------------------------------
# 5. schema_id starting with "/" → rejected (empty domain segment)
# ---------------------------------------------------------------------------
def test_invalid_schema_id_leading_slash(seed_registry: ArtifactSchemaRegistry) -> None:
# "/dev/spec@1" contains a slash, but splitting on "/" leaves an empty domain,
# so it is not a valid domain/name pair.
# The registry treats it as a path traversal risk: Path("/dev/spec@1.json")
# is absolute and will never exist under a root directory (is_file() → False).
with pytest.raises(MyDeepAgentError) as exc_info:
seed_registry.load("/dev/spec@1")
assert exc_info.value.code == "artifact_schema_unknown"
# ---------------------------------------------------------------------------
# 6. Empty schema_id → artifact_schema_unknown
# ---------------------------------------------------------------------------
def test_empty_schema_id_raises(seed_registry: ArtifactSchemaRegistry) -> None:
with pytest.raises(MyDeepAgentError) as exc_info:
seed_registry.load("")
assert exc_info.value.code == "artifact_schema_unknown"
# ---------------------------------------------------------------------------
# 7. Fallback: schema absent in first root, present in second
# ---------------------------------------------------------------------------
def test_fallback_to_second_root(tmp_path: Path) -> None:
first_root = tmp_path / "first"
first_root.mkdir()
second_root = tmp_path / "second"
(second_root / "dev").mkdir(parents=True)
schema: dict[str, Any] = {
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "dev/thing@1",
"type": "object",
}
(second_root / "dev" / "thing@1.json").write_text(json.dumps(schema), encoding="utf-8")
registry = ArtifactSchemaRegistry(roots=[first_root, second_root])
loaded = registry.load("dev/thing@1")
assert loaded["$id"] == "dev/thing@1"
# ---------------------------------------------------------------------------
# 8. validate with valid data → ok=True
# ---------------------------------------------------------------------------
def test_validate_valid_spec(
seed_registry: ArtifactSchemaRegistry, valid_spec: dict[str, Any]
) -> None:
result = seed_registry.validate("dev/spec@1", valid_spec)
assert result.ok is True
assert result.errors == ()
# ---------------------------------------------------------------------------
# 9. validate with invalid data → ok=False, findings non-empty
# ---------------------------------------------------------------------------
def test_validate_invalid_data_returns_findings(
seed_registry: ArtifactSchemaRegistry,
) -> None:
result = seed_registry.validate("dev/spec@1", {"wrong": "data"})
assert result.ok is False
assert len(result.errors) > 0
for finding in result.errors:
assert isinstance(finding, ValidationFinding)
# ---------------------------------------------------------------------------
# 10. Missing required field → validator="required", message names the field
# ---------------------------------------------------------------------------
def test_validate_missing_required_field(
seed_registry: ArtifactSchemaRegistry, valid_spec: dict[str, Any]
) -> None:
data = {k: v for k, v in valid_spec.items() if k != "requirements"}
result = seed_registry.validate("dev/spec@1", data)
assert result.ok is False
required_findings = [f for f in result.errors if f.validator == "required"]
assert any("requirements" in f.message for f in required_findings)
# ---------------------------------------------------------------------------
# 11. Invalid enum value → validator="enum", expected has enum list
# ---------------------------------------------------------------------------
def test_validate_invalid_enum_severity(seed_registry: ArtifactSchemaRegistry) -> None:
data = {
"runId": "00000000-0000-4000-8000-000000000000",
"phaseKey": "review",
"reviewerRole": "code-reviewer",
"findings": [
{
"severity": "bogus",
"category": "correctness",
"summary": "something is wrong here",
}
],
"summary": "Overall review summary with enough length.",
}
result = seed_registry.validate("dev/review-finding-batch@1", data)
assert result.ok is False
enum_findings = [f for f in result.errors if f.validator == "enum"]
assert len(enum_findings) > 0
finding = enum_findings[0]
assert isinstance(finding.expected, list)
assert "bogus" not in finding.expected
# ---------------------------------------------------------------------------
# 12. Wrong type → validator="type", expected has type name
# ---------------------------------------------------------------------------
def test_validate_wrong_type(
seed_registry: ArtifactSchemaRegistry, valid_spec: dict[str, Any]
) -> None:
data = dict(valid_spec)
data["acceptance_criteria"] = "should be a list, not a string"
result = seed_registry.validate("dev/spec@1", data)
assert result.ok is False
type_findings = [f for f in result.errors if f.validator == "type"]
assert len(type_findings) > 0
assert type_findings[0].expected == "array"
# ---------------------------------------------------------------------------
# 13. Nested error path — /findings/0/severity format
# ---------------------------------------------------------------------------
def test_validate_nested_error_path(seed_registry: ArtifactSchemaRegistry) -> None:
data = {
"runId": "00000000-0000-4000-8000-000000000000",
"phaseKey": "review",
"reviewerRole": "code-reviewer",
"findings": [
{
"severity": "not-valid",
"category": "correctness",
"summary": "a finding summary",
}
],
"summary": "Overall review summary with enough length.",
}
result = seed_registry.validate("dev/review-finding-batch@1", data)
assert result.ok is False
paths = [f.path for f in result.errors]
assert any(p.startswith("/findings/0/") for p in paths)
# ---------------------------------------------------------------------------
# 14. known_schema_ids() returns all 4 seed schemas, sorted
# ---------------------------------------------------------------------------
def test_known_schema_ids_returns_seeds(seed_registry: ArtifactSchemaRegistry) -> None:
ids = seed_registry.known_schema_ids()
for expected in SEED_SCHEMA_IDS:
assert expected in ids
assert ids == sorted(ids)
# ---------------------------------------------------------------------------
# 15. Empty roots list → config_invalid
# ---------------------------------------------------------------------------
def test_empty_roots_raises() -> None:
with pytest.raises(MyDeepAgentError) as exc_info:
ArtifactSchemaRegistry(roots=[])
assert exc_info.value.code == "config_invalid"
# ---------------------------------------------------------------------------
# 16. Corrupted JSON file → artifact_schema_load_failed
# ---------------------------------------------------------------------------
def test_corrupted_json_raises(tmp_path: Path) -> None:
(tmp_path / "dev").mkdir()
(tmp_path / "dev" / "broken@1.json").write_text("{", encoding="utf-8")
registry = ArtifactSchemaRegistry(roots=[tmp_path])
with pytest.raises(MyDeepAgentError) as exc_info:
registry.load("dev/broken@1")
assert exc_info.value.code == "artifact_schema_load_failed"
# ---------------------------------------------------------------------------
# 17. Valid JSON but not a dict → artifact_schema_load_failed
# ---------------------------------------------------------------------------
def test_non_dict_json_raises(tmp_path: Path) -> None:
(tmp_path / "dev").mkdir()
(tmp_path / "dev" / "array@1.json").write_text("[1, 2, 3]", encoding="utf-8")
registry = ArtifactSchemaRegistry(roots=[tmp_path])
with pytest.raises(MyDeepAgentError) as exc_info:
registry.load("dev/array@1")
assert exc_info.value.code == "artifact_schema_load_failed"
# ---------------------------------------------------------------------------
# 18. Schema itself is invalid Draft 2020-12 → artifact_schema_load_failed
# ---------------------------------------------------------------------------
def test_invalid_draft_schema_raises(tmp_path: Path) -> None:
(tmp_path / "dev").mkdir()
bad_schema = {"type": "not_a_type"}
(tmp_path / "dev" / "bad@1.json").write_text(json.dumps(bad_schema), encoding="utf-8")
registry = ArtifactSchemaRegistry(roots=[tmp_path])
with pytest.raises(MyDeepAgentError) as exc_info:
registry.load("dev/bad@1")
assert exc_info.value.code == "artifact_schema_load_failed"
# ---------------------------------------------------------------------------
# 19. Validator caching: _validator called twice returns same instance
# ---------------------------------------------------------------------------
def test_validator_instance_cached(seed_registry: ArtifactSchemaRegistry) -> None:
# Access internal cache to verify the same validator instance is reused.
v1 = seed_registry._validator("dev/spec@1")
v2 = seed_registry._validator("dev/spec@1")
assert v1 is v2
# ---------------------------------------------------------------------------
# 20. dev/spec@1 valid example produces ok=True (full fixture check)
# ---------------------------------------------------------------------------
def test_spec_valid_example_ok(seed_registry: ArtifactSchemaRegistry) -> None:
valid_spec: dict[str, Any] = {
"runId": "00000000-0000-4000-8000-000000000000",
"phaseKey": "spec",
"requirements": "User wants a CLI tool that analyzes log files.",
"acceptance_criteria": ["parses .log files", "outputs JSON summary"],
"approach": "Build a typer-based CLI using regex and json output.",
"risks": ["log format variations may break parser"],
}
result = seed_registry.validate("dev/spec@1", valid_spec)
assert result.ok is True
assert result.errors == ()
# ---------------------------------------------------------------------------
# Bonus: ValidationResult and ValidationFinding are frozen dataclasses
# ---------------------------------------------------------------------------
def test_validation_result_frozen() -> None:
result = ValidationResult(ok=True)
with pytest.raises((AttributeError, TypeError)):
result.ok = False # type: ignore[misc]
def test_validation_finding_frozen() -> None:
finding = ValidationFinding(path="/foo", message="err", validator="type", expected="string")
with pytest.raises((AttributeError, TypeError)):
finding.path = "/bar" # type: ignore[misc]
# ---------------------------------------------------------------------------
# Bonus: known_schema_ids with nonexistent root dir is silently skipped
# ---------------------------------------------------------------------------
def test_known_schema_ids_skips_nonexistent_root(tmp_path: Path) -> None:
missing = tmp_path / "does_not_exist"
registry = ArtifactSchemaRegistry(roots=[missing])
assert registry.known_schema_ids() == []
# ---------------------------------------------------------------------------
# Bonus: validate with non-dict top-level data
# ---------------------------------------------------------------------------
def test_validate_non_dict_data_returns_error(
seed_registry: ArtifactSchemaRegistry,
) -> None:
result = seed_registry.validate("dev/spec@1", [1, 2, 3])
assert result.ok is False
type_findings = [f for f in result.errors if f.validator == "type"]
assert len(type_findings) > 0
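
Note on tests 5 to 7 above: the lookup behaviour they exercise (empty or absolute ids never resolve, later roots act as fallbacks) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the ArtifactSchemaRegistry implementation: `find_schema_file` and `demo` are hypothetical names, and the `<root>/<schema_id>.json` layout is inferred from the fixtures in the tests.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


def find_schema_file(roots: list[Path], schema_id: str) -> Path | None:
    """Return the first `<root>/<schema_id>.json` that exists, else None.

    Hypothetical helper: the real registry raises MyDeepAgentError instead
    of returning None, and may apply further checks not shown here.
    """
    # Empty ids and absolute ids can never live under a root directory.
    if not schema_id or Path(schema_id).is_absolute():
        return None
    for root in roots:
        candidate = root / f"{schema_id}.json"
        if candidate.is_file():
            return candidate  # earlier roots win; later roots are fallbacks
    return None


def demo() -> dict[str, Any]:
    """Build two roots where only the second one holds the schema."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "first"
        second = Path(tmp) / "second"
        first.mkdir()
        (second / "dev").mkdir(parents=True)
        (second / "dev" / "thing@1.json").write_text(
            json.dumps({"$id": "dev/thing@1", "type": "object"}),
            encoding="utf-8",
        )
        roots = [first, second]
        found = find_schema_file(roots, "dev/thing@1")
        return {
            "found_name": found.name if found else None,
            "empty": find_schema_file(roots, ""),
            "absolute": find_schema_file(roots, "/dev/spec@1"),
        }
```

The sketch does not guard against relative traversal such as `../`; a production lookup would likely need to resolve the candidate and confirm it stays under the root.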

@@ -0,0 +1,644 @@
"""Unit tests for src/my_deepagent/binding.py."""
from __future__ import annotations
import fcntl
import json
import re
from pathlib import Path
import pytest
from my_deepagent.binding import (
BackendAvailability,
Binding,
BindingOverride,
PersonaConsentStore,
bind_personas,
filter_consented_personas,
is_persona_eligible_for_role,
)
from my_deepagent.enums import Backend, Capability
from my_deepagent.errors import MyDeepAgentError
from my_deepagent.persona import Persona, load_personas_from_dir
from my_deepagent.workflow import WorkflowTemplate, load_workflows_from_dir
# ---------------------------------------------------------------------------
# PersonaConsentStore file-lock (fcntl.flock) verification
# ---------------------------------------------------------------------------
def test_consent_store_set_acquires_exclusive_lock(
tmp_path: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
"""set() must take an exclusive flock and release it."""
ops: list[int] = []
orig_flock = fcntl.flock
def spy(fd: int, op: int) -> None:
ops.append(op)
orig_flock(fd, op)
monkeypatch.setattr(fcntl, "flock", spy)
store = PersonaConsentStore(tmp_path / "consents.json")
store.set("hash_abc", "approve")
assert fcntl.LOCK_EX in ops
assert fcntl.LOCK_UN in ops
def test_consent_store_revoke_acquires_exclusive_lock(
tmp_path: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
ops: list[int] = []
orig_flock = fcntl.flock
def spy(fd: int, op: int) -> None:
ops.append(op)
orig_flock(fd, op)
monkeypatch.setattr(fcntl, "flock", spy)
store = PersonaConsentStore(tmp_path / "consents.json")
store.set("h", "approve")
ops.clear()
store.revoke("h")
assert fcntl.LOCK_EX in ops
assert fcntl.LOCK_UN in ops
def test_consent_store_get_acquires_shared_lock(
tmp_path: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
"""get() takes a shared lock (LOCK_SH) so multiple readers don't serialise."""
ops: list[int] = []
orig_flock = fcntl.flock
def spy(fd: int, op: int) -> None:
ops.append(op)
orig_flock(fd, op)
monkeypatch.setattr(fcntl, "flock", spy)
store = PersonaConsentStore(tmp_path / "consents.json")
store.set("h", "approve")
ops.clear()
_ = store.get("h")
assert fcntl.LOCK_SH in ops
assert fcntl.LOCK_UN in ops
def test_consent_store_lock_file_created(tmp_path: Path) -> None:
"""A .lock sidecar file is created next to the consent store on first write."""
path = tmp_path / "consents.json"
store = PersonaConsentStore(path)
store.set("h", "approve")
assert (tmp_path / "consents.json.lock").is_file()
# ---------------------------------------------------------------------------
# Fixtures / helpers
# ---------------------------------------------------------------------------
PERSONAS_DIR = Path(__file__).parent.parent.parent / "docs" / "schemas" / "personas"
WORKFLOWS_DIR = Path(__file__).parent.parent.parent / "docs" / "schemas" / "workflows"
def _minimal_persona(**overrides: object) -> Persona:
base: dict[str, object] = {
"name": "test-persona",
"version": 1,
"backend": "openrouter",
"model": "openrouter:anthropic/claude-sonnet-4-6",
"provider_origin": "US/Anthropic",
"capabilities": ["spec_write", "phase_planning"],
"max_risk_level": "low",
"system_prompt": "You are a test persona for unit tests.",
}
base.update(overrides)
return Persona.model_validate(base)
def _all_available() -> BackendAvailability:
return BackendAvailability(available_backends=frozenset(Backend))
def _none_available() -> BackendAvailability:
return BackendAvailability(available_backends=frozenset())
@pytest.fixture()
def consent_store(tmp_path: Path) -> PersonaConsentStore:
return PersonaConsentStore(tmp_path / "consents.json")
@pytest.fixture()
def seed_personas() -> list[Persona]:
return load_personas_from_dir(PERSONAS_DIR)
@pytest.fixture()
def spec_and_review() -> WorkflowTemplate:
workflows = load_workflows_from_dir(WORKFLOWS_DIR)
return next(w for w in workflows if w.name == "spec-and-review")
# ---------------------------------------------------------------------------
# is_persona_eligible_for_role
# ---------------------------------------------------------------------------
def test_eligible_all_ok(spec_and_review: WorkflowTemplate) -> None:
spec_writer_role = next(r for r in spec_and_review.roles if r.id == "spec_writer")
p = _minimal_persona(capabilities=["spec_write", "phase_planning"], max_risk_level="low")
ok, reason = is_persona_eligible_for_role(p, spec_writer_role, spec_and_review)
assert ok is True
assert reason is None
def test_eligible_missing_capability(spec_and_review: WorkflowTemplate) -> None:
spec_writer_role = next(r for r in spec_and_review.roles if r.id == "spec_writer")
# only spec_write, missing phase_planning
p = _minimal_persona(capabilities=["spec_write"], max_risk_level="low")
ok, reason = is_persona_eligible_for_role(p, spec_writer_role, spec_and_review)
assert ok is False
assert reason is not None
assert "phase_planning" in reason
def test_eligible_allowed_roles_mismatch(spec_and_review: WorkflowTemplate) -> None:
spec_writer_role = next(r for r in spec_and_review.roles if r.id == "spec_writer")
p = _minimal_persona(
capabilities=["spec_write", "phase_planning"],
max_risk_level="low",
allowed_roles=["reviewer"], # does not include spec_writer
)
ok, reason = is_persona_eligible_for_role(p, spec_writer_role, spec_and_review)
assert ok is False
assert reason is not None
assert "allowed_roles" in reason
def test_eligible_allowed_roles_matches(spec_and_review: WorkflowTemplate) -> None:
spec_writer_role = next(r for r in spec_and_review.roles if r.id == "spec_writer")
p = _minimal_persona(
capabilities=["spec_write", "phase_planning"],
max_risk_level="low",
allowed_roles=["spec_writer"],
)
ok, reason = is_persona_eligible_for_role(p, spec_writer_role, spec_and_review)
assert ok is True
assert reason is None
def test_eligible_risk_too_high(spec_and_review: WorkflowTemplate) -> None:
"""bug-fix workflow has a 'medium' risk phase; a low-only persona is ineligible for it."""
workflows = load_workflows_from_dir(WORKFLOWS_DIR)
bug_fix_wf = next(w for w in workflows if w.name == "bug-fix-with-reproduction")
fixer_role = next(r for r in bug_fix_wf.roles if r.id == "fixer")
# fixer role has a 'medium' risk phase
p = _minimal_persona(
capabilities=["code_edit", "test_first_development"],
max_risk_level="low", # too low for medium phase
)
ok, reason = is_persona_eligible_for_role(p, fixer_role, bug_fix_wf)
assert ok is False
assert reason is not None
assert "medium" in reason
def test_eligible_risk_exact_match(spec_and_review: WorkflowTemplate) -> None:
spec_writer_role = next(r for r in spec_and_review.roles if r.id == "spec_writer")
p = _minimal_persona(capabilities=["spec_write", "phase_planning"], max_risk_level="low")
ok, _ = is_persona_eligible_for_role(p, spec_writer_role, spec_and_review)
assert ok is True
# ---------------------------------------------------------------------------
# bind_personas: end-to-end with seed data
# ---------------------------------------------------------------------------
def test_bind_personas_spec_and_review_success(
seed_personas: list[Persona],
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
bindings = bind_personas(spec_and_review, seed_personas, _all_available(), consent_store)
assert set(bindings.keys()) == {"spec_writer", "reviewer", "verifier"}
for role_id, binding in bindings.items():
assert isinstance(binding, Binding)
assert binding.role_id == role_id
assert re.fullmatch(r"[0-9a-f]{64}", binding.binding_hash)
def test_bind_personas_binding_hash_deterministic(
seed_personas: list[Persona],
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
b1 = bind_personas(spec_and_review, seed_personas, _all_available(), consent_store)
b2 = bind_personas(spec_and_review, seed_personas, _all_available(), consent_store)
for role_id in b1:
assert b1[role_id].binding_hash == b2[role_id].binding_hash
def test_bind_personas_spec_writer_is_spec_writer(
seed_personas: list[Persona],
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
bindings = bind_personas(spec_and_review, seed_personas, _all_available(), consent_store)
spec_persona = bindings["spec_writer"].persona
assert Capability.SPEC_WRITE in spec_persona.capabilities
assert Capability.PHASE_PLANNING in spec_persona.capabilities
# ---------------------------------------------------------------------------
# bind_personas: override
# ---------------------------------------------------------------------------
def test_bind_personas_override_picks_pinned(
seed_personas: list[Persona],
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
override = BindingOverride.parse({"spec_writer": "openrouter-claude-spec-writer@1"})
bindings = bind_personas(
spec_and_review, seed_personas, _all_available(), consent_store, override
)
assert bindings["spec_writer"].persona.name == "openrouter-claude-spec-writer"
def test_bind_personas_override_invalid_persona_raises(
seed_personas: list[Persona],
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
override = BindingOverride.parse({"spec_writer": "nonexistent-persona@1"})
with pytest.raises(MyDeepAgentError) as exc_info:
bind_personas(spec_and_review, seed_personas, _all_available(), consent_store, override)
assert exc_info.value.code == "no_eligible_persona"
# ---------------------------------------------------------------------------
# bind_personas: backend unavailable
# ---------------------------------------------------------------------------
def test_bind_personas_backend_unavailable_raises(
seed_personas: list[Persona],
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
with pytest.raises(MyDeepAgentError) as exc_info:
bind_personas(spec_and_review, seed_personas, _none_available(), consent_store)
assert exc_info.value.code == "backend_unavailable"
# ---------------------------------------------------------------------------
# bind_personas: FAKE backend happy path (model_unavailable is unreachable
# here because Persona validation rejects an empty openrouter model)
# ---------------------------------------------------------------------------
def test_bind_personas_model_unavailable_raises(
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
"""Verify FAKE backend binds successfully (positive path for non-openrouter backends).
We cannot construct an openrouter persona with empty model via model_validate because
the validator rejects it. Instead verify the happy path: FAKE backend + non-empty
model should bind without errors when the FAKE backend is available.
"""
from my_deepagent.workflow import WorkflowPhase, WorkflowRole
role = WorkflowRole.model_validate(
{
"id": "spec_writer",
"required_capabilities": ["spec_write", "phase_planning"],
"preferred_backends": ["fake"],
}
)
phase = WorkflowPhase.model_validate(
{
"key": "spec",
"title": "Write spec",
"risk": "low",
"role": "spec_writer",
"instructions": "Write the specification document.",
}
)
tmpl = WorkflowTemplate.model_validate(
{
"name": "fake-wf",
"version": 1,
"roles": [role.model_dump()],
"phases": [phase.model_dump()],
}
)
fake_persona = _minimal_persona(
backend="fake",
model="fake-model",
capabilities=["spec_write", "phase_planning"],
)
fake_avail = BackendAvailability(available_backends=frozenset({Backend.FAKE}))
# Should succeed with FAKE backend + non-empty model
bindings = bind_personas(tmpl, [fake_persona], fake_avail, consent_store)
assert "spec_writer" in bindings
# ---------------------------------------------------------------------------
# bind_personas: no eligible persona
# ---------------------------------------------------------------------------
def test_bind_personas_no_eligible_raises(
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
# Provide a persona with wrong capabilities
bad_persona = _minimal_persona(capabilities=["backtest_run"])
with pytest.raises(MyDeepAgentError) as exc_info:
bind_personas(spec_and_review, [bad_persona], _all_available(), consent_store)
assert exc_info.value.code == "no_eligible_persona"
# ---------------------------------------------------------------------------
# PersonaConsentStore: get / set / revoke
# ---------------------------------------------------------------------------
def test_consent_store_get_none_when_absent(consent_store: PersonaConsentStore) -> None:
assert consent_store.get("abc123") is None
def test_consent_store_set_and_get(consent_store: PersonaConsentStore) -> None:
consent_store.set("abc123", "approve")
assert consent_store.get("abc123") == "approve"
def test_consent_store_block(consent_store: PersonaConsentStore) -> None:
consent_store.set("abc123", "block")
assert consent_store.get("abc123") == "block"
def test_consent_store_once(consent_store: PersonaConsentStore) -> None:
consent_store.set("abc123", "once")
assert consent_store.get("abc123") == "once"
def test_consent_store_revoke(consent_store: PersonaConsentStore) -> None:
consent_store.set("abc123", "approve")
consent_store.revoke("abc123")
assert consent_store.get("abc123") is None
def test_consent_store_revoke_absent_is_noop(consent_store: PersonaConsentStore) -> None:
consent_store.revoke("not_present") # must not raise
def test_consent_store_overwrite(consent_store: PersonaConsentStore) -> None:
consent_store.set("abc123", "approve")
consent_store.set("abc123", "block")
assert consent_store.get("abc123") == "block"
def test_consent_store_unknown_decision_returns_none(
consent_store: PersonaConsentStore,
tmp_path: Path,
) -> None:
"""An unrecognised decision value (not approve/block/once) returns None instead of raising."""
path = tmp_path / "consents.json"
path.write_text(
json.dumps({"abc123": {"decision": "foobar", "decided_at": "2026-01-01T00:00:00+00:00"}}),
encoding="utf-8",
)
store = PersonaConsentStore(path)
assert store.get("abc123") is None
def test_consent_store_corrupted_json_raises_fatal(tmp_path: Path) -> None:
path = tmp_path / "consents.json"
path.write_text("{invalid json", encoding="utf-8")
store = PersonaConsentStore(path)
with pytest.raises(MyDeepAgentError) as exc_info:
store.get("abc123")
assert exc_info.value.code == "internal_state_corruption"
def test_consent_store_atomic_write(consent_store: PersonaConsentStore) -> None:
"""The .tmp file must not remain after a successful write."""
consent_store.set("abc", "approve")
tmp_file = consent_store._path.with_suffix(".json.tmp")
assert not tmp_file.exists(), ".tmp leftover after successful write"
def test_consent_store_json_format(consent_store: PersonaConsentStore) -> None:
"""Stored JSON must be valid and contain decision + decided_at."""
consent_store.set("myhash", "once")
raw = consent_store._path.read_text(encoding="utf-8")
data = json.loads(raw)
assert "myhash" in data
assert data["myhash"]["decision"] == "once"
assert "decided_at" in data["myhash"]
# ---------------------------------------------------------------------------
# filter_consented_personas
# ---------------------------------------------------------------------------
def test_filter_removes_blocked(consent_store: PersonaConsentStore) -> None:
p1 = _minimal_persona(name="p1")
p2 = _minimal_persona(name="p2")
consent_store.set(p2.compute_hash(), "block")
result = filter_consented_personas([p1, p2], consent_store)
assert len(result) == 1
assert result[0].name == "p1"
def test_filter_keeps_approved(consent_store: PersonaConsentStore) -> None:
p = _minimal_persona()
consent_store.set(p.compute_hash(), "approve")
result = filter_consented_personas([p], consent_store)
assert len(result) == 1
def test_filter_keeps_once(consent_store: PersonaConsentStore) -> None:
p = _minimal_persona()
consent_store.set(p.compute_hash(), "once")
result = filter_consented_personas([p], consent_store)
assert len(result) == 1
def test_filter_keeps_none_decision(consent_store: PersonaConsentStore) -> None:
"""Persona with no stored decision passes through."""
p = _minimal_persona()
result = filter_consented_personas([p], consent_store)
assert len(result) == 1
def test_filter_empty_list(consent_store: PersonaConsentStore) -> None:
result = filter_consented_personas([], consent_store)
assert result == []
# ---------------------------------------------------------------------------
# bind_personas: consent-blocked persona detection
# ---------------------------------------------------------------------------
def test_bind_personas_all_eligible_blocked_raises(
seed_personas: list[Persona],
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
# Block all spec_writer-eligible personas
for p in seed_personas:
if Capability.SPEC_WRITE in p.capabilities and Capability.PHASE_PLANNING in p.capabilities:
consent_store.set(p.compute_hash(), "block")
with pytest.raises(MyDeepAgentError) as exc_info:
bind_personas(spec_and_review, seed_personas, _all_available(), consent_store)
assert exc_info.value.code in ("persona_blocked_by_user", "no_eligible_persona")
def test_bind_personas_override_blocked_raises(
seed_personas: list[Persona],
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
spec_writer = next(p for p in seed_personas if p.name == "openrouter-claude-spec-writer")
consent_store.set(spec_writer.compute_hash(), "block")
override = BindingOverride.parse({"spec_writer": "openrouter-claude-spec-writer@1"})
with pytest.raises(MyDeepAgentError) as exc_info:
bind_personas(spec_and_review, seed_personas, _all_available(), consent_store, override)
assert exc_info.value.code == "persona_blocked_by_user"
# ---------------------------------------------------------------------------
# _auto_select: preferred_backends order
# ---------------------------------------------------------------------------
def test_auto_select_prefers_preferred_backend(spec_and_review: WorkflowTemplate) -> None:
"""Persona with preferred backend wins over non-preferred even if alphabetically later."""
from my_deepagent.binding import _auto_select
spec_writer_role = next(r for r in spec_and_review.roles if r.id == "spec_writer")
# preferred_backends = ["openrouter"]
p_openrouter = _minimal_persona(
name="z-openrouter-persona",
backend="openrouter",
capabilities=["spec_write", "phase_planning"],
)
p_fake = _minimal_persona(
name="a-fake-persona",
backend="fake",
capabilities=["spec_write", "phase_planning"],
)
chosen = _auto_select([p_openrouter, p_fake], spec_writer_role)
assert chosen.name == "z-openrouter-persona"
def test_auto_select_higher_version_wins(spec_and_review: WorkflowTemplate) -> None:
from my_deepagent.binding import _auto_select
spec_writer_role = next(r for r in spec_and_review.roles if r.id == "spec_writer")
p_v1 = _minimal_persona(version=1, capabilities=["spec_write", "phase_planning"])
p_v2 = _minimal_persona(version=2, capabilities=["spec_write", "phase_planning"])
chosen = _auto_select([p_v1, p_v2], spec_writer_role)
assert chosen.version == 2
def test_auto_select_name_asc_tiebreak(spec_and_review: WorkflowTemplate) -> None:
from my_deepagent.binding import _auto_select
spec_writer_role = next(r for r in spec_and_review.roles if r.id == "spec_writer")
caps = ["spec_write", "phase_planning"]
p_b = _minimal_persona(name="b-persona", version=1, capabilities=caps)
p_a = _minimal_persona(name="a-persona", version=1, capabilities=caps)
chosen = _auto_select([p_b, p_a], spec_writer_role)
assert chosen.name == "a-persona"
# ---------------------------------------------------------------------------
# Step 2 patch: FAKE backend recovery hint
# ---------------------------------------------------------------------------
def test_backend_recovery_hint_fake() -> None:
"""FAKE backend recovery hint must mention 'fake' and either 'tests only' or 'test harness'."""
from my_deepagent.binding import _backend_recovery_hint
hint = _backend_recovery_hint(Backend.FAKE)
assert "fake" in hint.lower()
assert "tests only" in hint.lower() or "test harness" in hint.lower()
# ---------------------------------------------------------------------------
# Step 2 patch: override with non-integer version raises with diagnostic
# ---------------------------------------------------------------------------
def test_bind_personas_override_non_integer_version_raises(
seed_personas: list[Persona],
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
"""An override spec with a non-integer version must raise with clear diagnostic."""
override = BindingOverride(persona_pinned={"spec_writer": "openrouter-claude-spec-writer@abc"})
with pytest.raises(MyDeepAgentError) as exc_info:
bind_personas(spec_and_review, seed_personas, _all_available(), consent_store, override)
assert exc_info.value.code == "no_eligible_persona"
assert "non-integer version" in str(exc_info.value)
# ---------------------------------------------------------------------------
# Step 2 patch: override with ineligible persona surfaces reason
# ---------------------------------------------------------------------------
def test_bind_personas_override_ineligible_persona_surfaces_reason(
seed_personas: list[Persona],
spec_and_review: WorkflowTemplate,
consent_store: PersonaConsentStore,
) -> None:
"""Override that names an ineligible persona must surface the ineligibility reason."""
# 'spec_writer' role needs spec_write + phase_planning.
# Find a persona in seed that does NOT have those caps so we can force it.
ineligible = next(
p for p in seed_personas if Capability.SPEC_WRITE not in p.capabilities
)
override = BindingOverride(
persona_pinned={"spec_writer": f"{ineligible.name}@{ineligible.version}"}
)
with pytest.raises(MyDeepAgentError) as exc_info:
bind_personas(spec_and_review, seed_personas, _all_available(), consent_store, override)
assert exc_info.value.code == "no_eligible_persona"
err_str = str(exc_info.value)
# The error message must say the persona is ineligible with a reason.
assert "ineligible" in err_str or "missing" in err_str
# ---------------------------------------------------------------------------
# Step 2 patch: PersonaConsentStore atomic write calls os.fsync
# ---------------------------------------------------------------------------
def test_consent_store_write_calls_fsync(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None:
"""PersonaConsentStore.set() must call os.fsync() for atomic durability."""
import os
called: list[int] = []
orig_fsync = os.fsync
def spy(fd: int) -> None:
called.append(fd)
orig_fsync(fd)
monkeypatch.setattr(os, "fsync", spy)
store = PersonaConsentStore(tmp_path / "consents.json")
store.set("hash_abc", "approve")
assert len(called) >= 1, "os.fsync must be called at least once during write"
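
Note on the consent-store tests above: the write sequence they check for (exclusive flock on a `.lock` sidecar, write to a `.tmp` file, fsync, then rename) can be sketched as follows. This is a minimal, POSIX-only illustration under stated assumptions, not the PersonaConsentStore implementation: `write_json_atomically` and `demo` are hypothetical names, and the sidecar and temp file names are inferred from the assertions in the tests.

```python
from __future__ import annotations

import fcntl
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_json_atomically(path: Path, data: dict[str, Any]) -> None:
    """Write `data` to `path` under an exclusive lock via tmp + fsync + rename."""
    lock_path = path.with_name(path.name + ".lock")  # sidecar lock file
    tmp_path = path.with_name(path.name + ".tmp")    # staging file
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)  # block other writers
        try:
            with open(tmp_path, "w", encoding="utf-8") as tmp_file:
                json.dump(data, tmp_file)
                tmp_file.flush()
                os.fsync(tmp_file.fileno())  # push bytes to disk before rename
            # os.replace is atomic on POSIX when both paths share a filesystem,
            # so readers see either the old file or the complete new one.
            os.replace(tmp_path, path)
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)


def demo() -> dict[str, Any]:
    """Write once into a temp directory and report what is left on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "consents.json"
        write_json_atomically(target, {"h": {"decision": "approve"}})
        return {
            "stored": json.loads(target.read_text(encoding="utf-8")),
            "lock_exists": (Path(tmp) / "consents.json.lock").is_file(),
            "tmp_left": (Path(tmp) / "consents.json.tmp").exists(),
        }
```

Locking a separate sidecar rather than the data file itself keeps the lock valid across the rename, since `os.replace` swaps the data file's inode.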

@@ -0,0 +1,238 @@
"""Unit tests for src/my_deepagent/config.py."""
from __future__ import annotations
from pathlib import Path
import pytest
from pydantic import ValidationError
from my_deepagent.config import Config, load_config
# ---------------------------------------------------------------------------
# Default values (no env, no file)
# ---------------------------------------------------------------------------
def test_default_log_level(monkeypatch: pytest.MonkeyPatch) -> None:
_clear_env(monkeypatch)
cfg = Config()
assert cfg.log_level == "info"
def test_default_lang(monkeypatch: pytest.MonkeyPatch) -> None:
_clear_env(monkeypatch)
cfg = Config()
assert cfg.lang == "ko"
def test_default_budget_daily_usd(monkeypatch: pytest.MonkeyPatch) -> None:
_clear_env(monkeypatch)
cfg = Config()
assert cfg.budget_daily_usd == pytest.approx(5.0)
def test_default_budget_run_usd(monkeypatch: pytest.MonkeyPatch) -> None:
_clear_env(monkeypatch)
cfg = Config()
assert cfg.budget_run_usd == pytest.approx(1.0)
def test_default_budget_on_hit(monkeypatch: pytest.MonkeyPatch) -> None:
_clear_env(monkeypatch)
cfg = Config()
assert cfg.budget_on_hit == "prompt"
def test_default_persona(monkeypatch: pytest.MonkeyPatch) -> None:
_clear_env(monkeypatch)
cfg = Config()
assert cfg.default_persona == "default-interactive"
def test_default_openrouter_api_key_is_none(monkeypatch: pytest.MonkeyPatch) -> None:
_clear_env(monkeypatch)
# _env_file=None bypasses any .env that may exist in the cwd (e.g. dev keys).
cfg = Config(_env_file=None) # type: ignore[call-arg]
assert cfg.openrouter_api_key is None


# ---------------------------------------------------------------------------
# Env var overrides
# ---------------------------------------------------------------------------


def test_env_budget_daily_usd(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    monkeypatch.setenv("MYDEEPAGENT_BUDGET_DAILY_USD", "10")
    cfg = Config()
    assert cfg.budget_daily_usd == pytest.approx(10.0)


def test_env_lang_en(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    monkeypatch.setenv("MYDEEPAGENT_LANG", "en")
    cfg = Config()
    assert cfg.lang == "en"


def test_env_log_level_debug(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    monkeypatch.setenv("MYDEEPAGENT_LOG_LEVEL", "debug")
    cfg = Config()
    assert cfg.log_level == "debug"


def test_env_openrouter_api_key(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    monkeypatch.setenv("MYDEEPAGENT_OPENROUTER_API_KEY", "sk-test-abc")
    cfg = Config()
    assert cfg.openrouter_api_key == "sk-test-abc"


def test_env_langsmith_tracing(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    monkeypatch.setenv("MYDEEPAGENT_LANGSMITH_TRACING", "true")
    cfg = Config()
    assert cfg.langsmith_tracing is True


# ---------------------------------------------------------------------------
# Validation errors for invalid values
# ---------------------------------------------------------------------------


def test_invalid_lang_raises(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    monkeypatch.setenv("MYDEEPAGENT_LANG", "fr")
    with pytest.raises(ValidationError):
        Config()


def test_invalid_log_level_raises(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    monkeypatch.setenv("MYDEEPAGENT_LOG_LEVEL", "verbose")
    with pytest.raises(ValidationError):
        Config()


def test_invalid_budget_on_hit_raises(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    monkeypatch.setenv("MYDEEPAGENT_BUDGET_ON_HIT", "explode")
    with pytest.raises(ValidationError):
        Config()


def test_negative_budget_raises(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    with pytest.raises(ValidationError):
        Config(budget_daily_usd=-1.0)


# ---------------------------------------------------------------------------
# Frozen check
# ---------------------------------------------------------------------------


def test_frozen_prevents_mutation(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    cfg = Config()
    with pytest.raises((ValidationError, TypeError)):
        cfg.budget_daily_usd = 99  # type: ignore[misc]


# ---------------------------------------------------------------------------
# Path expansion (~ → absolute path)
# ---------------------------------------------------------------------------


def test_tilde_expansion_workspace_root(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    monkeypatch.setenv("MYDEEPAGENT_WORKSPACE_ROOT", "~/foo/bar")
    cfg = Config()
    assert cfg.workspace_root.is_absolute()
    assert "~" not in str(cfg.workspace_root)


def test_tilde_expansion_data_dir(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    monkeypatch.setenv("MYDEEPAGENT_DATA_DIR", "~/mydata")
    cfg = Config()
    assert cfg.data_dir.is_absolute()


# ---------------------------------------------------------------------------
# TOML priority
# ---------------------------------------------------------------------------


def test_toml_overrides_default(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None:
    _clear_env(monkeypatch)
    toml_file = tmp_path / "config.toml"
    toml_file.write_text('lang = "en"\nbudget_daily_usd = 7.5\n')

    # Config reads its TOML path from model_config (SettingsConfigDict), so the
    # path cannot be passed as an init argument. Inject it through a temporary
    # subclass that carries its own copy of model_config; Config stays untouched.
    class PatchedConfig(Config):
        model_config = Config.model_config.copy()

    PatchedConfig.model_config["toml_file"] = str(toml_file)
    cfg = PatchedConfig()
    assert cfg.lang == "en"
    assert cfg.budget_daily_usd == pytest.approx(7.5)


# ---------------------------------------------------------------------------
# load_config helper
# ---------------------------------------------------------------------------


def test_load_config_with_overrides(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    cfg = load_config(budget_daily_usd=20.0, lang="en")
    assert cfg.budget_daily_usd == pytest.approx(20.0)
    assert cfg.lang == "en"


def test_load_config_default(monkeypatch: pytest.MonkeyPatch) -> None:
    _clear_env(monkeypatch)
    cfg = load_config()
    assert cfg.log_level == "info"


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

_ENV_KEYS = [
    "MYDEEPAGENT_BUDGET_DAILY_USD",
    "MYDEEPAGENT_BUDGET_DAILY_WARN_USD",
    "MYDEEPAGENT_BUDGET_RUN_USD",
    "MYDEEPAGENT_BUDGET_RUN_WARN_USD",
    "MYDEEPAGENT_BUDGET_ON_HIT",
    "MYDEEPAGENT_LANG",
    "MYDEEPAGENT_LOG_LEVEL",
    "MYDEEPAGENT_OPENROUTER_API_KEY",
    "MYDEEPAGENT_OPENROUTER_BASE_URL",
    "MYDEEPAGENT_LANGSMITH_TRACING",
    "MYDEEPAGENT_LANGSMITH_API_KEY",
    "MYDEEPAGENT_LANGSMITH_PROJECT",
    "MYDEEPAGENT_DATABASE_URL",
    "MYDEEPAGENT_WORKSPACE_ROOT",
    "MYDEEPAGENT_DATA_DIR",
    "MYDEEPAGENT_CONFIG_DIR",
    "MYDEEPAGENT_STATE_DIR",
    "MYDEEPAGENT_DEFAULT_PERSONA",
]


def _clear_env(monkeypatch: pytest.MonkeyPatch) -> None:
    """Remove all MYDEEPAGENT_ env vars to isolate tests from the real environment."""
    for key in _ENV_KEYS:
        monkeypatch.delenv(key, raising=False)
    # Also prevent dotenv file from being loaded
    monkeypatch.setenv("MYDEEPAGENT_ENV_FILE", "")

@@ -0,0 +1,235 @@
"""Unit tests for src/my_deepagent/enums.py."""
import pytest
from my_deepagent.enums import (
ApprovalDecisionAction,
ApprovalState,
Backend,
Capability,
ErrorClass,
RiskLevel,
RunPhaseState,
RunState,
SessionState,
)
# ---------------------------------------------------------------------------
# Backend
# ---------------------------------------------------------------------------
def test_backend_openrouter_value() -> None:
assert Backend.OPENROUTER == "openrouter"
def test_backend_anthropic_value() -> None:
assert Backend.ANTHROPIC == "anthropic"
def test_backend_openai_value() -> None:
assert Backend.OPENAI == "openai"
def test_backend_google_value() -> None:
assert Backend.GOOGLE == "google"
def test_backend_fake_value() -> None:
assert Backend.FAKE == "fake"
def test_backend_str_equality() -> None:
# StrEnum members compare equal to their string values
assert Backend.OPENROUTER == "openrouter"
assert str(Backend.OPENROUTER) == "openrouter"


# ---------------------------------------------------------------------------
# Capability
# ---------------------------------------------------------------------------


def test_capability_count() -> None:
    assert len(list(Capability)) == 13


def test_capability_spec_write() -> None:
    assert Capability.SPEC_WRITE == "spec_write"


def test_capability_code_edit() -> None:
    assert Capability.CODE_EDIT == "code_edit"


def test_capability_final_report_compose() -> None:
    assert Capability.FINAL_REPORT_COMPOSE == "final_report_compose"


def test_capability_all_are_str() -> None:
    for cap in Capability:
        assert isinstance(cap, str)


# ---------------------------------------------------------------------------
# RiskLevel
# ---------------------------------------------------------------------------


def test_risk_level_values() -> None:
    assert RiskLevel.LOW == "low"
    assert RiskLevel.MEDIUM == "medium"
    assert RiskLevel.HIGH == "high"


# ---------------------------------------------------------------------------
# ApprovalDecisionAction
# ---------------------------------------------------------------------------


def test_approval_decision_action_approve() -> None:
    assert ApprovalDecisionAction.APPROVE == "approve"


def test_approval_decision_action_reject() -> None:
    assert ApprovalDecisionAction.REJECT == "reject"


def test_approval_decision_action_request_changes() -> None:
    assert ApprovalDecisionAction.REQUEST_CHANGES == "request_changes"


def test_approval_decision_action_abort() -> None:
    assert ApprovalDecisionAction.ABORT == "abort"


# ---------------------------------------------------------------------------
# ApprovalState
# ---------------------------------------------------------------------------


def test_approval_state_all_values() -> None:
    expected = {"pending", "approved", "rejected", "changes_requested", "aborted", "paused"}
    actual = {s.value for s in ApprovalState}
    assert actual == expected


# ---------------------------------------------------------------------------
# RunState
# ---------------------------------------------------------------------------


def test_run_state_all_values() -> None:
    expected = {
        "created",
        "bound",
        "planning",
        "awaiting_approval",
        "executing",
        "paused",
        "completed",
        "failed",
        "aborted",
    }
    actual = {s.value for s in RunState}
    assert actual == expected


def test_run_state_count() -> None:
    assert len(list(RunState)) == 9


# ---------------------------------------------------------------------------
# RunPhaseState
# ---------------------------------------------------------------------------


def test_run_phase_state_all_values() -> None:
    expected = {
        "pending",
        "running",
        "awaiting_artifact",
        "validating",
        "awaiting_approval",
        "completed",
        "failed",
        "skipped",
    }
    actual = {s.value for s in RunPhaseState}
    assert actual == expected


def test_run_phase_state_count() -> None:
    assert len(list(RunPhaseState)) == 8


# ---------------------------------------------------------------------------
# SessionState
# ---------------------------------------------------------------------------


def test_session_state_all_values() -> None:
    expected = {
        "CREATED",
        "BOOTSTRAPPING",
        "READY",
        "BUSY",
        "WAITING_FOR_APPROVAL",
        "ARTIFACT_TIMEOUT",
        "HUNG",
        "CRASHED",
        "RESUMING",
        "REBOOTSTRAPPED",
        "FAILED_NEEDS_HUMAN",
    }
    actual = {s.value for s in SessionState}
    assert actual == expected


def test_session_state_count() -> None:
    assert len(list(SessionState)) == 11


# ---------------------------------------------------------------------------
# ErrorClass
# ---------------------------------------------------------------------------


def test_error_class_recoverable() -> None:
    assert ErrorClass.RECOVERABLE == "recoverable"


def test_error_class_human_required() -> None:
    assert ErrorClass.HUMAN_REQUIRED == "human_required"


def test_error_class_fatal() -> None:
    assert ErrorClass.FATAL == "fatal"


def test_error_class_count() -> None:
    assert len(list(ErrorClass)) == 3


# ---------------------------------------------------------------------------
# StrEnum serialization / deserialization
# ---------------------------------------------------------------------------


def test_str_enum_from_value() -> None:
    assert Backend("openrouter") is Backend.OPENROUTER


def test_str_enum_in_dict() -> None:
    # StrEnum should work as dict key and compare with string
    d = {Backend.OPENROUTER: "openrouter backend"}
    assert d["openrouter"] == "openrouter backend"


@pytest.mark.parametrize(
    "state",
    list(RunState),
)
def test_run_state_parametrize(state: RunState) -> None:
    assert isinstance(state, str)
    assert RunState(state.value) is state

@@ -0,0 +1,208 @@
"""Unit tests for src/my_deepagent/errors.py."""
from uuid import UUID, uuid4
import pytest
from my_deepagent.enums import ErrorClass
from my_deepagent.errors import BudgetExhaustedError, MyDeepAgentError
def test_cause_sets_suppress_context() -> None:
"""Wrapping a cause must suppress the implicit context per PEP 3134."""
original = ValueError("root cause")
err = MyDeepAgentError.recoverable("wrapped", cause=original)
assert err.__cause__ is original
assert err.__suppress_context__ is True
def test_no_cause_does_not_set_suppress_context() -> None:
err = MyDeepAgentError.recoverable("no_cause")
assert err.__cause__ is None
assert err.__suppress_context__ is False
def test_factory_returns_base_class_not_subclass() -> None:
"""LSP fix: factory methods always return MyDeepAgentError, never BudgetExhaustedError."""
err = BudgetExhaustedError.recoverable("foo")
assert type(err) is MyDeepAgentError


# ---------------------------------------------------------------------------
# MyDeepAgentError factory methods
# ---------------------------------------------------------------------------


def test_recoverable_class() -> None:
    err = MyDeepAgentError.recoverable("network_blip", recovery_hint="retry")
    assert err.error_class == ErrorClass.RECOVERABLE


def test_recoverable_code() -> None:
    err = MyDeepAgentError.recoverable("network_blip")
    assert err.code == "network_blip"


def test_recoverable_recovery_hint() -> None:
    err = MyDeepAgentError.recoverable("network_blip", recovery_hint="retry after 1s")
    assert err.recovery_hint == "retry after 1s"


def test_human_required_class() -> None:
    err = MyDeepAgentError.human_required("destructive_command_blocked")
    assert err.error_class == ErrorClass.HUMAN_REQUIRED


def test_human_required_code() -> None:
    err = MyDeepAgentError.human_required("destructive_command_blocked")
    assert err.code == "destructive_command_blocked"


def test_fatal_class() -> None:
    err = MyDeepAgentError.fatal("unrecoverable_state")
    assert err.error_class == ErrorClass.FATAL


def test_fatal_code() -> None:
    err = MyDeepAgentError.fatal("unrecoverable_state")
    assert err.code == "unrecoverable_state"


# ---------------------------------------------------------------------------
# run_id / phase_id context
# ---------------------------------------------------------------------------


def test_run_id_attached() -> None:
    run_id = uuid4()
    err = MyDeepAgentError.recoverable("timeout", run_id=run_id)
    assert err.run_id == run_id


def test_phase_id_attached() -> None:
    phase_id = uuid4()
    err = MyDeepAgentError.recoverable("artifact_missing", phase_id=phase_id)
    assert err.phase_id == phase_id


def test_run_id_none_by_default() -> None:
    err = MyDeepAgentError.recoverable("x")
    assert err.run_id is None


# ---------------------------------------------------------------------------
# __cause__ propagation
# ---------------------------------------------------------------------------


def test_cause_propagation() -> None:
    original = ValueError("root cause")
    err = MyDeepAgentError.recoverable("wrapped", cause=original)
    assert err.__cause__ is original


def test_cause_none_by_default() -> None:
    err = MyDeepAgentError.recoverable("no_cause")
    assert err.__cause__ is None


# ---------------------------------------------------------------------------
# __repr__ format
# ---------------------------------------------------------------------------


def test_repr_contains_class_and_code() -> None:
    err = MyDeepAgentError.recoverable("some_code")
    r = repr(err)
    assert "class=recoverable" in r
    assert "code=some_code" in r


def test_repr_contains_run_id_when_present() -> None:
    run_id = UUID("12345678-1234-5678-1234-567812345678")
    err = MyDeepAgentError.recoverable("x", run_id=run_id)
    assert str(run_id) in repr(err)


def test_repr_contains_hint_when_present() -> None:
    err = MyDeepAgentError.recoverable("x", recovery_hint="do something")
    assert "do something" in repr(err)


def test_repr_no_hint_when_absent() -> None:
    err = MyDeepAgentError.recoverable("x")
    assert "hint" not in repr(err)


# ---------------------------------------------------------------------------
# Exception hierarchy
# ---------------------------------------------------------------------------


def test_my_deepagent_error_is_exception() -> None:
    err = MyDeepAgentError.recoverable("x")
    assert isinstance(err, Exception)


def test_budget_exhausted_is_my_deepagent_error() -> None:
    err = BudgetExhaustedError("day:2026-05-15", 1.20, 1.00)
    assert isinstance(err, MyDeepAgentError)


# ---------------------------------------------------------------------------
# BudgetExhaustedError
# ---------------------------------------------------------------------------


def test_budget_exhausted_scope() -> None:
    err = BudgetExhaustedError("day:2026-05-15", 1.20, 1.00)
    assert err.scope == "day:2026-05-15"


def test_budget_exhausted_projected_usd() -> None:
    err = BudgetExhaustedError("day:2026-05-15", 1.20, 1.00)
    assert err.projected_usd == pytest.approx(1.20)


def test_budget_exhausted_cap_usd() -> None:
    err = BudgetExhaustedError("day:2026-05-15", 1.20, 1.00)
    assert err.cap_usd == pytest.approx(1.00)


def test_budget_exhausted_error_class() -> None:
    err = BudgetExhaustedError("day:2026-05-15", 1.20, 1.00)
    assert err.error_class == ErrorClass.HUMAN_REQUIRED


def test_budget_exhausted_code() -> None:
    err = BudgetExhaustedError("day:2026-05-15", 1.20, 1.00)
    assert err.code == "budget_exhausted"


def test_budget_exhausted_default_recovery_hint() -> None:
    err = BudgetExhaustedError("day:2026-05-15", 1.20, 1.00)
    assert err.recovery_hint is not None
    assert len(err.recovery_hint) > 0


def test_budget_exhausted_custom_recovery_hint() -> None:
    err = BudgetExhaustedError("day:2026-05-15", 1.20, 1.00, recovery_hint="call support")
    assert err.recovery_hint == "call support"


def test_budget_exhausted_run_id() -> None:
    run_id = uuid4()
    err = BudgetExhaustedError("run:abc", 0.5, 0.4, run_id=run_id)
    assert err.run_id == run_id


def test_budget_exhausted_message_contains_scope() -> None:
    err = BudgetExhaustedError("day:2026-05-15", 1.20, 1.00)
    assert "day:2026-05-15" in str(err)


def test_budget_exhausted_message_contains_values() -> None:
    err = BudgetExhaustedError("scope", 1.2345, 1.0000)
    msg = str(err)
    assert "1.2345" in msg
    assert "1.0000" in msg

@@ -0,0 +1,121 @@
"""Unit tests for src/my_deepagent/hash.py."""
import re
import pytest
from my_deepagent.hash import canonicalize, sha256
# ---------------------------------------------------------------------------
# canonicalize: key ordering
# ---------------------------------------------------------------------------
def test_canonicalize_sorts_keys() -> None:
assert canonicalize({"b": 1, "a": 2}) == '{"a":2,"b":1}'
def test_canonicalize_nested_sorts_keys() -> None:
result = canonicalize({"x": {"b": 2, "a": 1}})
assert result == '{"x":{"a":1,"b":2}}'
def test_canonicalize_empty_dict() -> None:
assert canonicalize({}) == "{}"
def test_canonicalize_empty_list() -> None:
assert canonicalize([]) == "[]"
def test_canonicalize_none() -> None:
assert canonicalize(None) == "null"
def test_canonicalize_integer() -> None:
assert canonicalize(42) == "42"
def test_canonicalize_float() -> None:
# 0.1 has a known floating-point representation
result = canonicalize(0.1)
assert result == "0.1"
def test_canonicalize_no_whitespace() -> None:
result = canonicalize({"a": 1, "b": 2})
assert " " not in result
def test_canonicalize_list_preserves_order() -> None:
# Lists should not be reordered
assert canonicalize([3, 1, 2]) == "[3,1,2]"
def test_canonicalize_string_value() -> None:
assert canonicalize("hello") == '"hello"'
def test_canonicalize_boolean() -> None:
assert canonicalize(True) == "true"
assert canonicalize(False) == "false"
def test_canonicalize_nan_raises() -> None:
import math
with pytest.raises(ValueError):
canonicalize(math.nan)


# ---------------------------------------------------------------------------
# sha256: determinism
# ---------------------------------------------------------------------------


def test_sha256_deterministic() -> None:
    value = {"a": 1, "b": [1, 2, 3]}
    results = [sha256(value) for _ in range(100)]
    assert len(set(results)) == 1


def test_sha256_returns_64_char_hex() -> None:
    result = sha256({"a": 1})
    assert re.fullmatch(r"[0-9a-f]{64}", result) is not None


def test_sha256_different_inputs_different_hash() -> None:
    h1 = sha256({"a": 1})
    h2 = sha256({"a": 2})
    assert h1 != h2


def test_sha256_key_order_irrelevant() -> None:
    # Same content, different insertion order → same hash
    h1 = sha256({"a": 1, "b": 2})
    h2 = sha256({"b": 2, "a": 1})
    assert h1 == h2


def test_sha256_empty_dict() -> None:
    result = sha256({})
    assert re.fullmatch(r"[0-9a-f]{64}", result) is not None


def test_sha256_none() -> None:
    result = sha256(None)
    assert re.fullmatch(r"[0-9a-f]{64}", result) is not None


def test_sha256_nested() -> None:
    h1 = sha256({"x": {"a": 1, "b": 2}})
    h2 = sha256({"x": {"b": 2, "a": 1}})
    assert h1 == h2


def test_sha256_known_value() -> None:
    # Pre-computed: sha256('{"a":1}') in UTF-8
    import hashlib

    expected = hashlib.sha256(b'{"a":1}').hexdigest()
    assert sha256({"a": 1}) == expected
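
The properties these tests pin down (sorted keys, no whitespace, NaN rejected, UTF-8 digest) can be met with the standard library alone. The sketch below is one possible shape, not the contents of `my_deepagent/hash.py`; the names `canonical_json` and `sha256_hex` are hypothetical.

```python
import hashlib
import json
from typing import Any


def canonical_json(value: Any) -> str:
    """Serialize to a compact, key-sorted JSON string (hypothetical helper)."""
    return json.dumps(
        value,
        sort_keys=True,          # key order must not affect the output
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII text as-is; hashed as UTF-8 below
        allow_nan=False,         # NaN/Infinity raise ValueError instead of serializing
    )


def sha256_hex(value: Any) -> str:
    """Hash the canonical form so equal content gives an equal digest."""
    return hashlib.sha256(canonical_json(value).encode("utf-8")).hexdigest()
```

Because the digest is taken over the canonical string rather than over the object, two dicts built in different insertion orders hash identically.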

@@ -0,0 +1,118 @@
"""Unit tests for src/my_deepagent/middleware/audit.py."""
from __future__ import annotations
from typing import Any
from unittest.mock import AsyncMock, MagicMock
from uuid import UUID
import pytest
from my_deepagent.middleware.audit import AuditToolMiddleware
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_request(name: str = "read_file", args: dict[str, Any] | None = None) -> MagicMock:
request = MagicMock()
request.tool_call = {"name": name, "args": args or {"path": "x.py"}}
return request
# ---------------------------------------------------------------------------
# awrap_tool_call — success path
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_audit_middleware_records_correct_fields_on_success() -> None:
recorder = AsyncMock()
mw = AuditToolMiddleware(
run_id=UUID("00000000-0000-0000-0000-000000000001"),
phase_id=UUID("00000000-0000-0000-0000-000000000002"),
interactive_session_id=UUID("00000000-0000-0000-0000-000000000003"),
recorder=recorder,
)
result_value = "file contents here"
handler = AsyncMock(return_value=result_value)
request = _make_request(name="read_file", args={"path": "src/main.py"})
result = await mw.awrap_tool_call(request, handler)
assert result == result_value
recorder.assert_awaited_once()
record: dict[str, Any] = recorder.call_args[0][0]
assert record["tool_name"] == "read_file"
assert record["args"] == {"path": "src/main.py"}
assert record["result"] == result_value
assert record["error"] is None
assert record["duration_ms"] >= 0
assert record["run_id"] == UUID("00000000-0000-0000-0000-000000000001")
@pytest.mark.asyncio
async def test_audit_middleware_no_recorder_is_noop() -> None:
mw = AuditToolMiddleware()
handler = AsyncMock(return_value="ok")
result = await mw.awrap_tool_call(_make_request(), handler)
assert result == "ok"


# ---------------------------------------------------------------------------
# awrap_tool_call: error path
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_audit_middleware_records_error_code_on_exception() -> None:
    recorder = AsyncMock()
    mw = AuditToolMiddleware(recorder=recorder)
    handler = AsyncMock(side_effect=PermissionError("access denied"))
    with pytest.raises(PermissionError):
        await mw.awrap_tool_call(_make_request(), handler)
    recorder.assert_awaited_once()
    record: dict[str, Any] = recorder.call_args[0][0]
    assert record["error"] == "PermissionError"
    assert record["result"] is None


@pytest.mark.asyncio
async def test_audit_middleware_reraises_exception() -> None:
    mw = AuditToolMiddleware(recorder=AsyncMock())
    handler = AsyncMock(side_effect=ValueError("bad args"))
    with pytest.raises(ValueError, match="bad args"):
        await mw.awrap_tool_call(_make_request(), handler)


# ---------------------------------------------------------------------------
# result serialization
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_audit_middleware_serializes_non_primitive_result_as_str() -> None:
    recorder = AsyncMock()
    mw = AuditToolMiddleware(recorder=recorder)

    class _CustomResult:
        def __str__(self) -> str:
            return "custom-result-str"

    handler = AsyncMock(return_value=_CustomResult())
    await mw.awrap_tool_call(_make_request(), handler)
    record = recorder.call_args[0][0]
    assert record["result"] == "custom-result-str"


@pytest.mark.asyncio
async def test_audit_middleware_passes_dict_result_as_is() -> None:
    recorder = AsyncMock()
    mw = AuditToolMiddleware(recorder=recorder)
    handler = AsyncMock(return_value={"key": "value"})
    await mw.awrap_tool_call(_make_request(), handler)
    record = recorder.call_args[0][0]
    assert record["result"] == {"key": "value"}

@@ -0,0 +1,143 @@
"""Unit tests for src/my_deepagent/middleware/cost.py."""
from __future__ import annotations
from typing import Any
from unittest.mock import AsyncMock, MagicMock
from uuid import UUID
import pytest
from my_deepagent.middleware.cost import CostMiddleware
from my_deepagent.monitoring.pricing import ModelPrice, PricingCache
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_pricing_cache(
model: str = "anthropic/claude-sonnet",
input_per_1k: float = 0.003,
output_per_1k: float = 0.015,
) -> PricingCache:
cache = PricingCache()
cache.set(
[
ModelPrice(
model=model,
input_per_1k_usd=input_per_1k,
output_per_1k_usd=output_per_1k,
context_length=200000,
)
]
)
return cache
def _make_response(input_tokens: int = 100, output_tokens: int = 50) -> MagicMock:
response = MagicMock()
response.usage_metadata = {"input_tokens": input_tokens, "output_tokens": output_tokens}
return response


# ---------------------------------------------------------------------------
# awrap_model_call: success path
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_cost_middleware_records_correct_fields_on_success() -> None:
    recorder = AsyncMock()
    cache = _make_pricing_cache()
    mw = CostMiddleware(
        pricing=cache,
        model_name="anthropic/claude-sonnet",
        run_id=UUID("00000000-0000-0000-0000-000000000001"),
        phase_id=UUID("00000000-0000-0000-0000-000000000002"),
        persona_name="test-persona",
        recorder=recorder,
    )
    response = _make_response(input_tokens=1000, output_tokens=500)
    handler = AsyncMock(return_value=response)
    request = MagicMock()
    result = await mw.awrap_model_call(request, handler)
    assert result is response
    recorder.assert_awaited_once()
    record: dict[str, Any] = recorder.call_args[0][0]
    assert record["model"] == "anthropic/claude-sonnet"
    assert record["input_tokens"] == 1000
    assert record["output_tokens"] == 500
    assert record["status"] == "ok"
    assert record["error_code"] is None
    assert record["latency_ms"] >= 0
    # cost: (1000/1000 * 0.003) + (500/1000 * 0.015)
    expected_cost = 0.003 * 1.0 + 0.015 * 0.5
    assert record["cost_usd_total"] == pytest.approx(expected_cost)


@pytest.mark.asyncio
async def test_cost_middleware_no_recorder_is_noop() -> None:
    cache = _make_pricing_cache()
    mw = CostMiddleware(pricing=cache, model_name="anthropic/claude-sonnet")
    response = _make_response()
    handler = AsyncMock(return_value=response)
    # Should not raise even with recorder=None
    result = await mw.awrap_model_call(MagicMock(), handler)
    assert result is response


# ---------------------------------------------------------------------------
# awrap_model_call: error path
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_cost_middleware_records_error_on_handler_exception() -> None:
    recorder = AsyncMock()
    cache = _make_pricing_cache()
    mw = CostMiddleware(
        pricing=cache,
        model_name="anthropic/claude-sonnet",
        recorder=recorder,
    )
    handler = AsyncMock(side_effect=RuntimeError("timeout"))
    with pytest.raises(RuntimeError, match="timeout"):
        await mw.awrap_model_call(MagicMock(), handler)
    recorder.assert_awaited_once()
    record: dict[str, Any] = recorder.call_args[0][0]
    assert record["status"] == "error"
    assert record["error_code"] == "RuntimeError"
    assert record["input_tokens"] == 0
    assert record["output_tokens"] == 0


@pytest.mark.asyncio
async def test_cost_middleware_reraises_exception() -> None:
    cache = _make_pricing_cache()
    mw = CostMiddleware(pricing=cache, model_name="m", recorder=AsyncMock())
    handler = AsyncMock(side_effect=ValueError("bad input"))
    with pytest.raises(ValueError, match="bad input"):
        await mw.awrap_model_call(MagicMock(), handler)


# ---------------------------------------------------------------------------
# cost computation via cache
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_cost_zero_when_model_not_in_cache() -> None:
    recorder = AsyncMock()
    cache = PricingCache()  # empty
    mw = CostMiddleware(pricing=cache, model_name="unknown/model", recorder=recorder)
    response = _make_response(input_tokens=1000, output_tokens=1000)
    handler = AsyncMock(return_value=response)
    await mw.awrap_model_call(MagicMock(), handler)
    record = recorder.call_args[0][0]
    assert record["cost_usd_total"] == 0.0

@@ -0,0 +1,168 @@
"""Unit tests for src/my_deepagent/middleware/fallback.py."""
from __future__ import annotations
from typing import Any
from unittest.mock import AsyncMock, MagicMock
import httpx
import openai
import pytest
from my_deepagent.middleware.fallback import FallbackModelMiddleware
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_request(has_model_attr: bool = True) -> MagicMock:
request = MagicMock()
if not has_model_attr:
del request.model
return request
# ---------------------------------------------------------------------------
# Fallback on RateLimitError
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_fallback_on_rate_limit_error_calls_handler_with_fallback() -> None:
primary = MagicMock(name="primary-model")
fallback = MagicMock(name="fallback-model")
mw = FallbackModelMiddleware(primary=primary, fallback=fallback)
call_count = 0
fallback_model_seen: Any = None
async def handler(request: Any) -> str:
nonlocal call_count, fallback_model_seen
call_count += 1
if call_count == 1:
raise openai.RateLimitError(
"rate limit",
response=MagicMock(status_code=429, headers={}),
body={},
)
fallback_model_seen = getattr(request, "model", None)
return "fallback-response"
request = _make_request()
result = await mw.awrap_model_call(request, handler)
assert result == "fallback-response"
assert call_count == 2
assert fallback_model_seen is fallback


@pytest.mark.asyncio
async def test_fallback_on_api_connection_error() -> None:
    primary = MagicMock()
    fallback = MagicMock()
    mw = FallbackModelMiddleware(primary=primary, fallback=fallback)
    call_count = 0

    async def handler(request: Any) -> str:
        nonlocal call_count
        call_count += 1
        if call_count == 1:
            raise openai.APIConnectionError(request=MagicMock())
        return "connection-fallback"

    result = await mw.awrap_model_call(_make_request(), handler)
    assert result == "connection-fallback"
    assert call_count == 2


@pytest.mark.asyncio
async def test_fallback_on_httpx_error() -> None:
    primary = MagicMock()
    fallback = MagicMock()
    mw = FallbackModelMiddleware(primary=primary, fallback=fallback)
    call_count = 0

    async def handler(request: Any) -> str:
        nonlocal call_count
        call_count += 1
        if call_count == 1:
            raise httpx.ConnectError("connect failed")
        return "httpx-fallback"

    result = await mw.awrap_model_call(_make_request(), handler)
    assert result == "httpx-fallback"
    assert call_count == 2
# ---------------------------------------------------------------------------
# No fallback — exception propagates
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_no_fallback_raises_original_error() -> None:
mw = FallbackModelMiddleware(primary=MagicMock(), fallback=None)
handler = AsyncMock(
side_effect=openai.RateLimitError(
"rate limit",
response=MagicMock(status_code=429, headers={}),
body={},
)
)
with pytest.raises(openai.RateLimitError):
await mw.awrap_model_call(_make_request(), handler)
# ---------------------------------------------------------------------------
# AuthenticationError — never retried
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_auth_error_is_not_retried() -> None:
    primary = MagicMock()
    fallback = MagicMock()
    mw = FallbackModelMiddleware(primary=primary, fallback=fallback)
    call_count = 0

    async def handler(request: Any) -> str:
        nonlocal call_count
        call_count += 1
        raise openai.AuthenticationError(
            "bad api key",
            response=MagicMock(status_code=401, headers={}),
            body={},
        )

    with pytest.raises(openai.AuthenticationError):
        await mw.awrap_model_call(_make_request(), handler)
    # Handler should only be called once (no retry for auth errors)
    assert call_count == 1
# ---------------------------------------------------------------------------
# _with_fallback_model
# ---------------------------------------------------------------------------
def test_with_fallback_model_swaps_model_attribute() -> None:
primary = MagicMock(name="primary")
fallback = MagicMock(name="fallback")
mw = FallbackModelMiddleware(primary=primary, fallback=fallback)
request = MagicMock()
request.model = primary
patched = mw._with_fallback_model(request)
assert patched.model is fallback
def test_with_fallback_model_no_model_attr_does_not_crash() -> None:
mw = FallbackModelMiddleware(primary=MagicMock(), fallback=MagicMock())
request = MagicMock(spec=[]) # no attributes
# Should not raise
patched = mw._with_fallback_model(request)
assert patched is request
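
The tests in this file pin down a retry-once policy: a transient failure (rate limit, connection error) triggers exactly one retry against the fallback model, an authentication failure is never retried, and with no fallback configured the original error propagates. A minimal sketch of that policy follows. It is not the actual `FallbackModelMiddleware` implementation: `TransientError`, `AuthError`, `Request`, and `call_with_fallback` are hypothetical stand-ins used so that the example runs without `openai` or `httpx`.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Any, Awaitable, Callable, List, Optional, Tuple


class TransientError(Exception):
    """Hypothetical stand-in for rate-limit / connection errors."""


class AuthError(Exception):
    """Hypothetical stand-in for authentication errors (never retried)."""


@dataclass(frozen=True)
class Request:
    model: str


async def call_with_fallback(
    request: Request,
    handler: Callable[[Request], Awaitable[Any]],
    fallback_model: Optional[str],
) -> Any:
    try:
        return await handler(request)
    except TransientError:
        if fallback_model is None:
            raise  # no fallback configured: propagate the original error
        # Retry exactly once with the model swapped for the fallback.
        return await handler(replace(request, model=fallback_model))
    # AuthError is deliberately not caught, so it propagates after one call.


async def _demo() -> Tuple[str, List[str]]:
    seen: List[str] = []

    async def handler(req: Request) -> str:
        seen.append(req.model)
        if req.model == "primary":
            raise TransientError("rate limit")
        return "fallback-response"

    result = await call_with_fallback(Request("primary"), handler, "fallback")
    return result, seen


demo_result, demo_models = asyncio.run(_demo())
```

Swapping the model on a copy of the request, rather than mutating it, keeps the primary request intact if the caller inspects it afterwards; whether the real middleware copies or mutates is not visible in this diff.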


@@ -0,0 +1,258 @@
"""Unit tests for src/my_deepagent/middleware/safety.py."""
from __future__ import annotations
from typing import Any
from unittest.mock import AsyncMock, MagicMock
import pytest
from my_deepagent.errors import MyDeepAgentError
from my_deepagent.middleware.safety import SafetyShellMiddleware, _is_denied_path
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_shell_request(cmd: str | list[str], tool_name: str = "shell") -> MagicMock:
    request = MagicMock()
    if isinstance(cmd, list):
        request.tool_call = {"name": tool_name, "args": {"argv": cmd}}
    else:
        request.tool_call = {"name": tool_name, "args": {"command": cmd}}
    return request

def _make_other_tool_request(
    name: str = "read_file", args: dict[str, Any] | None = None
) -> MagicMock:
    request = MagicMock()
    request.tool_call = {"name": name, "args": args or {}}
    return request
# ---------------------------------------------------------------------------
# Destructive commands — should raise
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_rm_rf_slash_is_blocked() -> None:
mw = SafetyShellMiddleware()
with pytest.raises(MyDeepAgentError) as exc_info:
await mw.awrap_tool_call(_make_shell_request("rm -rf /"), AsyncMock())
assert exc_info.value.code == "destructive_command_blocked"
@pytest.mark.asyncio
async def test_rm_rf_with_path_is_blocked() -> None:
mw = SafetyShellMiddleware()
with pytest.raises(MyDeepAgentError) as exc_info:
await mw.awrap_tool_call(_make_shell_request("rm -rf ./build"), AsyncMock())
assert exc_info.value.code == "destructive_command_blocked"
@pytest.mark.asyncio
async def test_git_push_force_is_blocked() -> None:
mw = SafetyShellMiddleware()
with pytest.raises(MyDeepAgentError):
await mw.awrap_tool_call(_make_shell_request("git push --force origin main"), AsyncMock())
@pytest.mark.asyncio
async def test_git_push_force_with_lease_is_blocked() -> None:
mw = SafetyShellMiddleware()
with pytest.raises(MyDeepAgentError):
await mw.awrap_tool_call(
_make_shell_request("git push --force-with-lease origin main"), AsyncMock()
)
@pytest.mark.asyncio
async def test_git_reset_hard_is_blocked() -> None:
mw = SafetyShellMiddleware()
with pytest.raises(MyDeepAgentError):
await mw.awrap_tool_call(_make_shell_request("git reset --hard HEAD"), AsyncMock())
@pytest.mark.asyncio
async def test_git_clean_is_blocked() -> None:
mw = SafetyShellMiddleware()
with pytest.raises(MyDeepAgentError):
await mw.awrap_tool_call(_make_shell_request("git clean -fd"), AsyncMock())
@pytest.mark.asyncio
async def test_drop_table_sql_is_blocked() -> None:
mw = SafetyShellMiddleware()
with pytest.raises(MyDeepAgentError):
await mw.awrap_tool_call(_make_shell_request("psql -c 'DROP TABLE users'"), AsyncMock())
@pytest.mark.asyncio
async def test_execute_tool_name_also_blocked() -> None:
"""The 'execute' tool name is also checked for destructive patterns."""
mw = SafetyShellMiddleware()
with pytest.raises(MyDeepAgentError):
await mw.awrap_tool_call(
_make_shell_request("rm -rf /tmp/data", tool_name="execute"), AsyncMock()
)
# ---------------------------------------------------------------------------
# argv (list) form — should also be blocked
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_rm_rf_as_list_argv_is_blocked() -> None:
mw = SafetyShellMiddleware()
with pytest.raises(MyDeepAgentError):
await mw.awrap_tool_call(
_make_shell_request(["rm", "-rf", "/tmp"], tool_name="shell"), AsyncMock()
)
# ---------------------------------------------------------------------------
# Safe commands — should pass through
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_ls_la_passes_through() -> None:
mw = SafetyShellMiddleware()
handler = AsyncMock(return_value="total 42")
result = await mw.awrap_tool_call(_make_shell_request("ls -la"), handler)
assert result == "total 42"
handler.assert_awaited_once()
@pytest.mark.asyncio
async def test_git_status_passes_through() -> None:
mw = SafetyShellMiddleware()
handler = AsyncMock(return_value="On branch main")
result = await mw.awrap_tool_call(_make_shell_request("git status"), handler)
assert result == "On branch main"
@pytest.mark.asyncio
async def test_git_push_without_force_passes_through() -> None:
mw = SafetyShellMiddleware()
handler = AsyncMock(return_value="ok")
result = await mw.awrap_tool_call(_make_shell_request("git push origin main"), handler)
assert result == "ok"
# ---------------------------------------------------------------------------
# Non-shell tools — should NOT be inspected
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_read_file_tool_with_destructive_content_passes() -> None:
"""read_file is not a shell tool; its content should not be blocked."""
mw = SafetyShellMiddleware()
handler = AsyncMock(return_value="file content")
request = _make_other_tool_request("read_file", {"path": "/some/file.py"})
result = await mw.awrap_tool_call(request, handler)
assert result == "file content"
@pytest.mark.asyncio
async def test_unknown_tool_not_checked() -> None:
mw = SafetyShellMiddleware()
handler = AsyncMock(return_value="ok")
result = await mw.awrap_tool_call(_make_other_tool_request("arbitrary_tool"), handler)
assert result == "ok"
# ---------------------------------------------------------------------------
# _is_denied_path unit tests
# ---------------------------------------------------------------------------
def test_is_denied_path_env_file() -> None:
assert _is_denied_path(".env") is True
def test_is_denied_path_env_local_in_subdir() -> None:
assert _is_denied_path("config/.env.local") is True
def test_is_denied_path_ssh_key() -> None:
assert _is_denied_path(".ssh/id_rsa") is True
def test_is_denied_path_safe_source_file() -> None:
assert _is_denied_path("src/main.py") is False
def test_is_denied_path_token_file() -> None:
assert _is_denied_path("api_token.json") is True
def test_is_denied_path_aws_credentials() -> None:
assert _is_denied_path(".aws/credentials") is True
def test_is_denied_path_pem_file() -> None:
assert _is_denied_path("key.pem") is True
def test_is_denied_path_absolute_env() -> None:
# absolute path normalised by lstrip('/')
assert _is_denied_path("/.env") is True
# ---------------------------------------------------------------------------
# Secret-path tool blocking via awrap_tool_call
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_read_file_env_path_is_blocked() -> None:
mw = SafetyShellMiddleware()
request = _make_other_tool_request("read_file", {"file_path": ".env"})
with pytest.raises(MyDeepAgentError) as exc_info:
await mw.awrap_tool_call(request, AsyncMock())
assert exc_info.value.code == "secret_access_blocked"
@pytest.mark.asyncio
async def test_write_file_pem_path_is_blocked() -> None:
mw = SafetyShellMiddleware()
request = _make_other_tool_request("write_file", {"file_path": "key.pem"})
with pytest.raises(MyDeepAgentError) as exc_info:
await mw.awrap_tool_call(request, AsyncMock())
assert exc_info.value.code == "secret_access_blocked"
@pytest.mark.asyncio
async def test_ls_ssh_dir_is_blocked() -> None:
mw = SafetyShellMiddleware()
request = _make_other_tool_request("ls", {"path": ".ssh/"})
with pytest.raises(MyDeepAgentError) as exc_info:
await mw.awrap_tool_call(request, AsyncMock())
assert exc_info.value.code == "secret_access_blocked"
@pytest.mark.asyncio
async def test_read_file_safe_path_passes() -> None:
mw = SafetyShellMiddleware()
handler = AsyncMock(return_value="content")
request = _make_other_tool_request("read_file", {"file_path": "src/foo.py"})
result = await mw.awrap_tool_call(request, handler)
assert result == "content"
handler.assert_awaited_once()
@pytest.mark.asyncio
async def test_execute_tool_path_arg_not_path_checked() -> None:
"""execute tool goes through shell-check only, not path-check."""
mw = SafetyShellMiddleware()
handler = AsyncMock(return_value="ok")
# safe shell command with a path arg — should not be blocked via path logic
request = _make_shell_request("ls /some/safe/dir", tool_name="execute")
result = await mw.awrap_tool_call(request, handler)
assert result == "ok"
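
Taken together, the tests in this file describe two independent checks: shell-style tools (`shell`, `execute`) have their command string or argv list matched against destructive patterns, while other tools have their path arguments matched against secret-path patterns. The sketch below illustrates that split under stated assumptions. The pattern lists, the argument keys inspected, and the names `is_destructive`, `is_denied_path`, and `check_tool_call` are hypothetical and chosen only to reproduce the cases exercised above; the real `SafetyShellMiddleware` rules are not shown in this diff.

```python
import fnmatch
import re
from typing import Any, Dict, List, Optional, Union

# Hypothetical patterns covering only the commands the tests exercise.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"),  # rm with both -r and -f
    re.compile(r"\bgit\s+push\b.*--force"),      # also matches --force-with-lease
    re.compile(r"\bgit\s+reset\s+--hard\b"),
    re.compile(r"\bgit\s+clean\b"),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
]
DENIED_FILE_GLOBS = [".env", ".env.*", "*.pem", "*token*"]
DENIED_DIRS = {".ssh", ".aws"}
SHELL_TOOLS = {"shell", "execute"}


def is_destructive(command: Union[str, List[str]]) -> bool:
    # argv lists are joined so both call shapes hit the same patterns.
    text = " ".join(command) if isinstance(command, list) else command
    return any(p.search(text) for p in DESTRUCTIVE_PATTERNS)


def is_denied_path(path: str) -> bool:
    parts = [p for p in path.lstrip("/").split("/") if p]
    if any(part in DENIED_DIRS for part in parts):
        return True
    return bool(parts) and any(
        fnmatch.fnmatchcase(parts[-1], glob) for glob in DENIED_FILE_GLOBS
    )


def check_tool_call(tool_call: Dict[str, Any]) -> Optional[str]:
    """Return an error code if the call should be blocked, else None."""
    args = tool_call.get("args", {})
    if tool_call["name"] in SHELL_TOOLS:
        # Shell tools get the command check only, never the path check.
        command = args.get("command") or args.get("argv") or ""
        return "destructive_command_blocked" if is_destructive(command) else None
    for key in ("file_path", "path"):
        value = args.get(key)
        if isinstance(value, str) and is_denied_path(value):
            return "secret_access_blocked"
    return None
```

Pattern matching of this kind is a deny-list and can be bypassed by quoting or indirection, so it is best read as a guard against accidents rather than a sandbox.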


@@ -0,0 +1,332 @@
"""Unit tests for src/my_deepagent/persona.py."""
from __future__ import annotations
import re
from pathlib import Path
import pytest
from pydantic import ValidationError
from my_deepagent.enums import Backend
from my_deepagent.persona import (
FilesystemPermissionSpec,
Persona,
PersonaSubagent,
load_persona_yaml,
load_personas_from_dir,
)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
PERSONAS_DIR = Path(__file__).parent.parent.parent / "docs" / "schemas" / "personas"
def _minimal_persona_dict(**overrides: object) -> dict[str, object]:
    """Return a minimal valid persona dict, overridable per-test."""
    base: dict[str, object] = {
        "name": "test-persona",
        "version": 1,
        "backend": "openrouter",
        "model": "openrouter:anthropic/claude-sonnet-4-6",
        "provider_origin": "US/Anthropic",
        "capabilities": ["spec_write"],
        "max_risk_level": "low",
        "system_prompt": "You are a test persona for unit tests.",
    }
    base.update(overrides)
    return base
# ---------------------------------------------------------------------------
# Seed yaml: all 10 load successfully
# ---------------------------------------------------------------------------
def test_all_seed_personas_load() -> None:
personas = load_personas_from_dir(PERSONAS_DIR)
assert len(personas) == 10
def test_seed_persona_names_unique() -> None:
personas = load_personas_from_dir(PERSONAS_DIR)
keys = [(p.name, p.version) for p in personas]
assert len(keys) == len(set(keys))
def test_seed_personas_backends_are_openrouter() -> None:
personas = load_personas_from_dir(PERSONAS_DIR)
for p in personas:
assert p.backend == Backend.OPENROUTER
def test_seed_persona_capabilities_non_empty() -> None:
personas = load_personas_from_dir(PERSONAS_DIR)
for p in personas:
assert len(p.capabilities) >= 1
def test_seed_persona_hash_is_64_char_hex() -> None:
personas = load_personas_from_dir(PERSONAS_DIR)
for p in personas:
h = p.compute_hash()
assert re.fullmatch(r"[0-9a-f]{64}", h), f"{p.name}: bad hash {h!r}"
def test_seed_persona_frozen() -> None:
"""Frozen model: attribute assignment must raise."""
personas = load_personas_from_dir(PERSONAS_DIR)
p = personas[0]
with pytest.raises((TypeError, ValidationError)):
p.name = "mutated" # type: ignore[misc]
# ---------------------------------------------------------------------------
# extra="forbid": unknown fields rejected
# ---------------------------------------------------------------------------
def test_persona_extra_field_raises() -> None:
data = _minimal_persona_dict(unknown_field="surprise")
with pytest.raises(ValidationError, match="extra"):
Persona.model_validate(data)
# ---------------------------------------------------------------------------
# FilesystemPermissionSpec validators
# ---------------------------------------------------------------------------
def test_permission_path_no_leading_slash_raises() -> None:
with pytest.raises(ValidationError, match="must start with '/'"):
FilesystemPermissionSpec(operations=["read"], paths=["relative/path"])
def test_permission_path_dotdot_raises() -> None:
with pytest.raises(ValidationError, match=r"must not contain '\.\.'"):
FilesystemPermissionSpec(operations=["read"], paths=["/foo/../bar"])
def test_permission_path_tilde_raises() -> None:
with pytest.raises(ValidationError, match="must not contain '~'"):
FilesystemPermissionSpec(operations=["read"], paths=["/path/~expansion/secret"])
def test_permission_path_glob_ok() -> None:
"""Glob patterns like /** should not trigger the path validator."""
spec = FilesystemPermissionSpec(operations=["read", "write"], paths=["/**"])
assert spec.paths == ("/**",)
def test_permission_mode_default_allow() -> None:
spec = FilesystemPermissionSpec(operations=["read"], paths=["/tmp"])
assert spec.mode == "allow"
def test_permission_deny_mode() -> None:
spec = FilesystemPermissionSpec(operations=["write"], paths=["/.env"], mode="deny")
assert spec.mode == "deny"
def test_permission_extra_field_raises() -> None:
with pytest.raises(ValidationError):
FilesystemPermissionSpec(operations=["read"], paths=["/tmp"], unknown=True) # type: ignore[call-arg]
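
The validator tests above, together with the null-byte test later in this file, imply four rules for a permission path: it must start with `/`, and it must not contain `..`, `~`, or a null byte, while glob characters such as `**` are allowed. A minimal sketch of those rules as a plain function is shown below. `validate_permission_path` and `is_valid_permission_path` are hypothetical names, and treating `..` as a whole path segment (rather than any substring) is an assumption, since the real pydantic validator is not part of this diff.

```python
def validate_permission_path(path: str) -> str:
    """Return the path unchanged, or raise ValueError naming the violated rule."""
    if "\x00" in path:
        raise ValueError("path must not contain null bytes")
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    # Assumption: '..' is rejected as a segment, so '/a..b' would still pass.
    if ".." in path.split("/"):
        raise ValueError("path must not contain '..'")
    if "~" in path:
        raise ValueError("path must not contain '~'")
    return path


def is_valid_permission_path(path: str) -> bool:
    try:
        validate_permission_path(path)
    except ValueError:
        return False
    return True
```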
# ---------------------------------------------------------------------------
# Persona.compute_hash: determinism
# ---------------------------------------------------------------------------
def test_compute_hash_deterministic() -> None:
p = Persona.model_validate(_minimal_persona_dict())
hashes = [p.compute_hash() for _ in range(20)]
assert len(set(hashes)) == 1
def test_compute_hash_different_personas_differ() -> None:
p1 = Persona.model_validate(_minimal_persona_dict(name="p1"))
p2 = Persona.model_validate(_minimal_persona_dict(name="p2"))
assert p1.compute_hash() != p2.compute_hash()
def test_compute_hash_version_affects_hash() -> None:
p1 = Persona.model_validate(_minimal_persona_dict(version=1))
p2 = Persona.model_validate(_minimal_persona_dict(version=2))
assert p1.compute_hash() != p2.compute_hash()
# ---------------------------------------------------------------------------
# Persona: min_length, ge validators
# ---------------------------------------------------------------------------
def test_persona_empty_capabilities_raises() -> None:
data = _minimal_persona_dict(capabilities=[])
with pytest.raises(ValidationError):
Persona.model_validate(data)
def test_persona_version_zero_raises() -> None:
data = _minimal_persona_dict(version=0)
with pytest.raises(ValidationError):
Persona.model_validate(data)
def test_persona_negative_max_cost_raises() -> None:
data = _minimal_persona_dict(max_cost_per_call_usd=-0.01)
with pytest.raises(ValidationError):
Persona.model_validate(data)
def test_persona_system_prompt_too_short_raises() -> None:
data = _minimal_persona_dict(system_prompt="short")
with pytest.raises(ValidationError):
Persona.model_validate(data)
# ---------------------------------------------------------------------------
# load_persona_yaml: file not found
# ---------------------------------------------------------------------------
def test_load_persona_yaml_missing_file(tmp_path: Path) -> None:
with pytest.raises(FileNotFoundError):
load_persona_yaml(tmp_path / "nonexistent.yaml")
# ---------------------------------------------------------------------------
# load_personas_from_dir: duplicate detection
# ---------------------------------------------------------------------------
def test_load_personas_from_dir_duplicate_raises(tmp_path: Path) -> None:
import yaml
data = _minimal_persona_dict()
for fname in ("persona-a@1.yaml", "persona-b@1.yaml"):
(tmp_path / fname).write_text(yaml.dump(data), encoding="utf-8")
with pytest.raises(ValueError, match="duplicate persona"):
load_personas_from_dir(tmp_path)
def test_load_personas_from_dir_missing_dir() -> None:
result = load_personas_from_dir(Path("/nonexistent_directory_xyz"))
assert result == []
def test_load_personas_from_dir_sorted_by_filename(tmp_path: Path) -> None:
    """Files are loaded in filename order for determinism."""
    import yaml

    # Written in reverse alphabetical order so that sorting is actually exercised.
    for name in ["zz-persona", "aa-persona"]:
        data = _minimal_persona_dict(name=name, version=1)
        (tmp_path / f"{name}@1.yaml").write_text(yaml.dump(data), encoding="utf-8")
    personas = load_personas_from_dir(tmp_path)
    assert personas[0].name == "aa-persona"
    assert personas[1].name == "zz-persona"
# ---------------------------------------------------------------------------
# PersonaSubagent: extra="forbid", min_length
# ---------------------------------------------------------------------------
def test_subagent_extra_field_raises() -> None:
with pytest.raises(ValidationError):
PersonaSubagent(
name="x",
description="at least ten chars here",
system_prompt="at least ten chars here",
unknown_field=True, # type: ignore[call-arg]
)
def test_subagent_short_description_raises() -> None:
with pytest.raises(ValidationError):
PersonaSubagent(name="x", description="short", system_prompt="at least ten chars here")
# ---------------------------------------------------------------------------
# Snapshot: specific persona hashes are stable
# ---------------------------------------------------------------------------
def test_default_interactive_hash_prefix() -> None:
"""Hash of default-interactive@1 must start with 8193103c.
Hash updated: permissions block removed from yaml (deepagents 0.6.1 workaround).
"""
personas = load_personas_from_dir(PERSONAS_DIR)
p = next(q for q in personas if q.name == "default-interactive")
assert p.compute_hash().startswith("8193103c")
def test_spec_writer_hash_prefix() -> None:
    """Hash of openrouter-claude-spec-writer@1 must be a 64-char lowercase hex digest.

    Unlike default-interactive@1, no prefix is pinned here; only the format is checked.
    """
    personas = load_personas_from_dir(PERSONAS_DIR)
    p = next(q for q in personas if q.name == "openrouter-claude-spec-writer")
    h = p.compute_hash()
    assert len(h) == 64
    assert re.fullmatch(r"[0-9a-f]{64}", h)
# ---------------------------------------------------------------------------
# Step 2 patch: null byte path rejection
# ---------------------------------------------------------------------------
def test_filesystem_permission_null_byte_rejected() -> None:
"""Null bytes in a filesystem permission path must be rejected."""
with pytest.raises(ValidationError, match="null bytes"):
FilesystemPermissionSpec.model_validate(
{
"operations": ["read"],
"paths": ["/foo\x00/bar"],
"mode": "deny",
}
)
# ---------------------------------------------------------------------------
# Deep immutability: nested list-valued fields are tuples (cannot be mutated)
# ---------------------------------------------------------------------------
def test_persona_capabilities_immutable() -> None:
"""capabilities is a tuple — .append() must raise AttributeError."""
p = Persona.model_validate(_minimal_persona_dict())
with pytest.raises((AttributeError, TypeError)):
p.capabilities.append(None) # type: ignore[attr-defined]
def test_persona_subagents_immutable() -> None:
"""subagents is a tuple — .append() must raise AttributeError."""
p = Persona.model_validate(_minimal_persona_dict())
with pytest.raises((AttributeError, TypeError)):
p.subagents.append(None) # type: ignore[attr-defined]
def test_persona_skills_immutable() -> None:
"""skills is a tuple — .append() must raise AttributeError."""
p = Persona.model_validate(_minimal_persona_dict())
with pytest.raises((AttributeError, TypeError)):
p.skills.append("new_skill") # type: ignore[attr-defined]
def test_filesystem_permission_paths_immutable() -> None:
"""paths is a tuple — .append() must raise AttributeError."""
perm = FilesystemPermissionSpec(operations=("read",), paths=("/foo",), mode="allow")
with pytest.raises((AttributeError, TypeError)):
perm.paths.append("/bar") # type: ignore[attr-defined]
def test_filesystem_permission_operations_immutable() -> None:
"""operations is a tuple — .append() must raise AttributeError."""
perm = FilesystemPermissionSpec(operations=("read",), paths=("/foo",), mode="allow")
with pytest.raises((AttributeError, TypeError)):
perm.operations.append("write") # type: ignore[attr-defined]
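
The hash and immutability tests in this file rest on two ideas named in the commit message: a hash computed as SHA-256 over canonical JSON, and tuple-valued fields so that a validated object cannot drift away from its hash. The sketch below illustrates both with a stdlib frozen dataclass instead of the pydantic v2 models used in the project. `MiniPersona` and `canonical_hash` are hypothetical, and the canonicalisation shown (sorted keys, compact separators, UTF-8) is an assumption about what "canonical JSON" means here.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class MiniPersona:
    name: str
    version: int
    capabilities: Tuple[str, ...]  # a tuple has no .append(), unlike a list


def canonical_hash(persona: MiniPersona) -> str:
    # Sorted keys and fixed separators make the serialisation independent of
    # field order and whitespace; tuples serialise as JSON arrays.
    payload = json.dumps(asdict(persona), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


p1 = MiniPersona(name="p1", version=1, capabilities=("spec_write",))
p2 = MiniPersona(name="p1", version=2, capabilities=("spec_write",))
```

Because every field is either a scalar or a tuple, the value hashed at load time is the value the object still holds later, which is what the snapshot-style prefix test depends on.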


@@ -0,0 +1,229 @@
"""Unit tests for src/my_deepagent/monitoring/pricing.py."""
from __future__ import annotations
import httpx
import pytest
import respx
from my_deepagent.errors import MyDeepAgentError
from my_deepagent.monitoring.pricing import (
ModelPrice,
PricingCache,
_parse_pricing_payload,
fetch_openrouter_pricing,
)
# ---------------------------------------------------------------------------
# _parse_pricing_payload
# ---------------------------------------------------------------------------
def test_parse_valid_payload_returns_model_prices() -> None:
data = {
"data": [
{
"id": "deepseek/deepseek-chat",
"pricing": {"prompt": "0.000001", "completion": "0.000002"},
"context_length": 32768,
},
{
"id": "anthropic/claude-sonnet",
"pricing": {"prompt": "0.000003", "completion": "0.000015"},
"context_length": 200000,
},
]
}
result = _parse_pricing_payload(data)
assert len(result) == 2
assert result[0].model == "deepseek/deepseek-chat"
assert result[0].input_per_1k_usd == pytest.approx(0.001)
assert result[0].output_per_1k_usd == pytest.approx(0.002)
assert result[0].context_length == 32768
assert result[1].model == "anthropic/claude-sonnet"
def test_parse_empty_data_list_returns_empty() -> None:
result = _parse_pricing_payload({"data": []})
assert result == []
def test_parse_data_is_not_list_returns_empty() -> None:
# data is a dict instead of list — malformed response
result = _parse_pricing_payload({"data": {"id": "bad"}})
assert result == []
def test_parse_missing_data_key_returns_empty() -> None:
result = _parse_pricing_payload({})
assert result == []
def test_parse_skips_entries_without_id() -> None:
data = {
"data": [
{"pricing": {"prompt": "0.000001", "completion": "0.000002"}, "context_length": 1000},
]
}
result = _parse_pricing_payload(data)
assert result == []
def test_parse_skips_entries_with_invalid_pricing_values() -> None:
data = {
"data": [
{
"id": "model/x",
"pricing": {"prompt": "not-a-number", "completion": "also-bad"},
"context_length": 1000,
}
]
}
result = _parse_pricing_payload(data)
assert result == []
def test_parse_handles_null_pricing_gracefully() -> None:
data = {
"data": [
{"id": "model/y", "pricing": None, "context_length": 0},
]
}
result = _parse_pricing_payload(data)
# pricing=None -> {} -> prompt/completion default to "0"
assert len(result) == 1
assert result[0].input_per_1k_usd == 0.0
assert result[0].output_per_1k_usd == 0.0
def test_parse_handles_missing_context_length() -> None:
data = {
"data": [
{"id": "model/z", "pricing": {"prompt": "0.000001", "completion": "0.000002"}},
]
}
result = _parse_pricing_payload(data)
assert len(result) == 1
assert result[0].context_length == 0
def test_parse_non_dict_entry_is_skipped() -> None:
data = {"data": ["not-a-dict", None]}
result = _parse_pricing_payload(data)
assert result == []
# ---------------------------------------------------------------------------
# PricingCache.compute_cost
# ---------------------------------------------------------------------------
def test_compute_cost_known_model() -> None:
cache = PricingCache()
cache.set(
[
ModelPrice(
model="deepseek/deepseek-chat",
input_per_1k_usd=0.001,
output_per_1k_usd=0.002,
context_length=32768,
)
]
)
cost = cache.compute_cost("deepseek/deepseek-chat", input_tokens=1000, output_tokens=500)
assert cost == pytest.approx(0.001 * 1.0 + 0.002 * 0.5)
def test_compute_cost_openrouter_prefix_stripped() -> None:
cache = PricingCache()
cache.set(
[
ModelPrice(
model="deepseek/deepseek-chat",
input_per_1k_usd=0.001,
output_per_1k_usd=0.002,
context_length=32768,
)
]
)
# Should strip "openrouter:" prefix when looking up
cost = cache.compute_cost(
"openrouter:deepseek/deepseek-chat", input_tokens=1000, output_tokens=0
)
assert cost == pytest.approx(0.001)
def test_compute_cost_unknown_model_returns_zero() -> None:
cache = PricingCache()
cost = cache.compute_cost("unknown/model", input_tokens=1000, output_tokens=1000)
assert cost == 0.0
def test_compute_cost_zero_tokens_returns_zero() -> None:
cache = PricingCache()
cache.set(
[ModelPrice(model="m/x", input_per_1k_usd=1.0, output_per_1k_usd=2.0, context_length=1000)]
)
assert cache.compute_cost("m/x", input_tokens=0, output_tokens=0) == 0.0
def test_pricing_cache_get_strips_openrouter_prefix() -> None:
cache = PricingCache()
cache.set(
[ModelPrice(model="a/b", input_per_1k_usd=0.5, output_per_1k_usd=1.0, context_length=0)]
)
assert cache.get("openrouter:a/b") is not None
assert cache.get("a/b") is not None
# ---------------------------------------------------------------------------
# fetch_openrouter_pricing (respx mock)
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_fetch_openrouter_pricing_success() -> None:
payload = {
"data": [
{
"id": "deepseek/deepseek-chat",
"pricing": {"prompt": "0.000001", "completion": "0.000002"},
"context_length": 64000,
}
]
}
with respx.mock:
respx.get("https://openrouter.ai/api/v1/models").mock(
return_value=httpx.Response(200, json=payload)
)
result = await fetch_openrouter_pricing(
api_key="sk-or-test", base_url="https://openrouter.ai/api/v1"
)
assert len(result) == 1
assert result[0].model == "deepseek/deepseek-chat"
@pytest.mark.asyncio
async def test_fetch_openrouter_pricing_http_error_raises_recoverable() -> None:
with respx.mock:
respx.get("https://openrouter.ai/api/v1/models").mock(
return_value=httpx.Response(401, json={"error": "unauthorized"})
)
with pytest.raises(MyDeepAgentError) as exc_info:
await fetch_openrouter_pricing(
api_key="bad-key", base_url="https://openrouter.ai/api/v1"
)
assert exc_info.value.code == "network_blip"
@pytest.mark.asyncio
async def test_fetch_openrouter_pricing_connect_error_raises_recoverable() -> None:
with respx.mock:
respx.get("https://openrouter.ai/api/v1/models").mock(
side_effect=httpx.ConnectError("connection refused")
)
with pytest.raises(MyDeepAgentError) as exc_info:
await fetch_openrouter_pricing(
api_key="sk-or-test", base_url="https://openrouter.ai/api/v1"
)
assert exc_info.value.code == "network_blip"
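
The expectations in this file imply a simple pricing model: the payload carries per-token USD prices as strings, the parser converts them to USD per 1k tokens while skipping malformed entries, and cost is the token counts scaled by those rates, with an `openrouter:` prefix stripped before lookup. A minimal sketch of that arithmetic follows. `parse_prices` and `compute_cost` are hypothetical helpers using plain dicts and tuples; they are not the `ModelPrice` / `PricingCache` API from the project.

```python
from typing import Any, Dict, Tuple

Prices = Dict[str, Tuple[float, float]]  # model id -> (input, output) USD per 1k


def parse_prices(payload: Dict[str, Any]) -> Prices:
    entries = payload.get("data")
    if not isinstance(entries, list):
        return {}  # missing or malformed "data" yields no prices
    prices: Prices = {}
    for entry in entries:
        if not isinstance(entry, dict) or not entry.get("id"):
            continue
        pricing = entry.get("pricing") or {}  # null pricing is treated as free
        try:
            prompt = float(pricing.get("prompt", "0"))
            completion = float(pricing.get("completion", "0"))
        except (TypeError, ValueError):
            continue  # non-numeric price strings: skip the entry
        # Per-token price times 1000 gives the per-1k-token price.
        prices[entry["id"]] = (prompt * 1000, completion * 1000)
    return prices


def compute_cost(prices: Prices, model: str, input_tokens: int, output_tokens: int) -> float:
    key = model[len("openrouter:"):] if model.startswith("openrouter:") else model
    if key not in prices:
        return 0.0  # unknown models cost nothing rather than raising
    per_in, per_out = prices[key]
    return per_in * input_tokens / 1000 + per_out * output_tokens / 1000
```

Returning zero for an unknown model mirrors the test above, but it also means a missing price silently under-reports spend, which matters if a budget check relies on this figure.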


@@ -0,0 +1,454 @@
"""Unit tests for src/my_deepagent/session.py.
Tests verify the dataclass-based deepagents API (FilesystemPermission attributes,
build_backend backend type dispatch, _map_operations deduplication, etc.).
No real API calls are made.
"""
from __future__ import annotations
from pathlib import Path
from typing import Any
import pytest
from deepagents import FilesystemPermission
from deepagents.backends import (
CompositeBackend,
FilesystemBackend,
LocalShellBackend,
)
from langchain_openai import ChatOpenAI
from langgraph.graph.state import CompiledStateGraph
from my_deepagent.config import load_config
from my_deepagent.errors import MyDeepAgentError
from my_deepagent.persona import FilesystemPermissionSpec, Persona, PersonaSubagent
from my_deepagent.session import (
_map_operations,
_resolve_openrouter_api_key,
_spec_to_permission,
_subagent_to_dict,
build_agent,
build_backend,
default_safety_permissions,
resolve_model_instance,
)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _minimal_persona(**overrides: Any) -> Persona:
    base: dict[str, Any] = {
        "name": "test-persona",
        "version": 1,
        "backend": "openrouter",
        "model": "openrouter:anthropic/claude-sonnet-4-6",
        "provider_origin": "US/Anthropic",
        "capabilities": ["spec_write"],
        "max_risk_level": "low",
        "system_prompt": "You are a test assistant for unit tests.",
    }
    base.update(overrides)
    return Persona.model_validate(base)

def _minimal_permission_spec(
    operations: list[str] | None = None,
    paths: list[str] | None = None,
    mode: str = "allow",
) -> FilesystemPermissionSpec:
    return FilesystemPermissionSpec(
        operations=tuple(operations or ["read"]),
        paths=tuple(paths or ["/**"]),
        mode=mode,  # type: ignore[arg-type]
    )

def _minimal_subagent(**overrides: Any) -> PersonaSubagent:
    base: dict[str, Any] = {
        "name": "test-sub",
        "description": "A test subagent description.",
        "system_prompt": "You are a subagent for unit tests.",
    }
    base.update(overrides)
    return PersonaSubagent.model_validate(base)
# ---------------------------------------------------------------------------
# default_safety_permissions — dataclass attribute access
# ---------------------------------------------------------------------------
def test_default_safety_permissions_returns_two_entries() -> None:
perms = default_safety_permissions()
assert len(perms) == 2
def test_default_safety_permissions_returns_filesystem_permission_instances() -> None:
perms = default_safety_permissions()
for p in perms:
assert isinstance(p, FilesystemPermission)
def test_default_safety_permissions_allow_is_first() -> None:
perms = default_safety_permissions()
assert perms[0].mode == "allow"
assert "/**" in perms[0].paths
def test_default_safety_permissions_allow_has_both_operations() -> None:
perms = default_safety_permissions()
assert "read" in perms[0].operations
assert "write" in perms[0].operations
def test_default_safety_permissions_deny_is_second() -> None:
perms = default_safety_permissions()
assert perms[1].mode == "deny"
deny_paths = perms[1].paths
assert any("env" in p for p in deny_paths)
assert any("ssh" in p for p in deny_paths)
def test_default_safety_permissions_deny_covers_secrets() -> None:
perms = default_safety_permissions()
deny_paths = perms[1].paths
assert any("secret" in p for p in deny_paths)
assert any("token" in p for p in deny_paths)
assert any("pem" in p for p in deny_paths)
# ---------------------------------------------------------------------------
# _map_operations: 8 cases
# ---------------------------------------------------------------------------
def test_map_operations_read() -> None:
    assert _map_operations(("read",)) == ["read"]


def test_map_operations_write() -> None:
    assert _map_operations(("write",)) == ["write"]


def test_map_operations_edit_maps_to_write() -> None:
    assert _map_operations(("edit",)) == ["write"]


def test_map_operations_ls_maps_to_read() -> None:
    assert _map_operations(("ls",)) == ["read"]


def test_map_operations_deduplicates_all_four() -> None:
    result = _map_operations(("read", "write", "edit", "ls"))
    assert result == ["read", "write"]


def test_map_operations_ls_and_edit() -> None:
    assert _map_operations(("ls", "edit")) == ["read", "write"]


def test_map_operations_preserves_order_write_then_read() -> None:
    result = _map_operations(("write", "read"))
    assert result == ["write", "read"]


def test_map_operations_empty_returns_empty() -> None:
    assert _map_operations(()) == []
# ---------------------------------------------------------------------------
# _spec_to_permission — dataclass attribute + mapping
# ---------------------------------------------------------------------------
def test_spec_to_permission_returns_filesystem_permission() -> None:
    spec = _minimal_permission_spec(operations=["read"], paths=["/**"], mode="allow")
    result = _spec_to_permission(spec)
    assert isinstance(result, FilesystemPermission)


def test_spec_to_permission_maps_read_write_correctly() -> None:
    spec = _minimal_permission_spec(operations=["read", "write"], paths=["/**"], mode="allow")
    result = _spec_to_permission(spec)
    assert result.operations == ["read", "write"]
    assert result.paths == ["/**"]
    assert result.mode == "allow"


def test_spec_to_permission_maps_edit_to_write() -> None:
    spec = _minimal_permission_spec(operations=["edit"], paths=["/src/**"], mode="allow")
    result = _spec_to_permission(spec)
    assert result.operations == ["write"]


def test_spec_to_permission_maps_ls_to_read() -> None:
    spec = _minimal_permission_spec(operations=["ls"], paths=["/data/**"], mode="allow")
    result = _spec_to_permission(spec)
    assert result.operations == ["read"]


def test_spec_to_permission_deduplicates_read_edit_ls() -> None:
    spec = _minimal_permission_spec(
        operations=["read", "edit", "ls"], paths=["/workspace/**"], mode="allow"
    )
    result = _spec_to_permission(spec)
    # read=read, edit=write, ls=read → ["read", "write"]
    assert result.operations == ["read", "write"]


def test_spec_to_permission_deny_mode_passthrough() -> None:
    spec = _minimal_permission_spec(operations=["read"], paths=["/.env*"], mode="deny")
    result = _spec_to_permission(spec)
    assert result.mode == "deny"
    assert "/.env*" in result.paths
# ---------------------------------------------------------------------------
# _subagent_to_dict
# ---------------------------------------------------------------------------
def test_subagent_to_dict_required_fields() -> None:
    sub = _minimal_subagent()
    d = _subagent_to_dict(sub)
    assert d["name"] == "test-sub"
    assert d["description"] == "A test subagent description."
    assert d["system_prompt"] == "You are a subagent for unit tests."


def test_subagent_to_dict_optional_tools_included_when_set() -> None:
    sub = _minimal_subagent(allowed_tools=["read_file", "write_file"])
    d = _subagent_to_dict(sub)
    assert "tools" in d
    assert d["tools"] == ["read_file", "write_file"]


def test_subagent_to_dict_no_tools_key_when_empty() -> None:
    sub = _minimal_subagent()
    d = _subagent_to_dict(sub)
    assert "tools" not in d


def test_subagent_to_dict_optional_model_included_when_set() -> None:
    sub = _minimal_subagent(model="openrouter:deepseek/deepseek-chat")
    d = _subagent_to_dict(sub)
    assert "model" in d
    assert d["model"] == "openrouter:deepseek/deepseek-chat"


def test_subagent_to_dict_no_model_key_when_none() -> None:
    sub = _minimal_subagent()
    d = _subagent_to_dict(sub)
    assert "model" not in d


def test_subagent_to_dict_permissions_included_when_set() -> None:
    sub = _minimal_subagent(
        permissions=[{"operations": ["read"], "paths": ["/**"], "mode": "allow"}]
    )
    d = _subagent_to_dict(sub)
    assert "permissions" in d
    assert len(d["permissions"]) == 1
    # Entries inside permissions are FilesystemPermission instances too
    assert isinstance(d["permissions"][0], FilesystemPermission)


def test_subagent_to_dict_permissions_empty_not_included() -> None:
    sub = _minimal_subagent()
    d = _subagent_to_dict(sub)
    assert "permissions" not in d


def test_subagent_to_dict_interrupt_on_included_when_set() -> None:
    sub = _minimal_subagent(interrupt_on={"write_file": {"allowed_decisions": ["approve"]}})
    d = _subagent_to_dict(sub)
    assert "interrupt_on" in d


def test_subagent_to_dict_no_interrupt_on_when_empty() -> None:
    sub = _minimal_subagent()
    d = _subagent_to_dict(sub)
    assert "interrupt_on" not in d
# ---------------------------------------------------------------------------
# _resolve_openrouter_api_key
# ---------------------------------------------------------------------------
def test_resolve_api_key_from_config() -> None:
    config = load_config(openrouter_api_key="sk-or-from-config")
    key = _resolve_openrouter_api_key(config)
    assert key == "sk-or-from-config"


def test_resolve_api_key_from_mydeepagent_env(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.delenv("MYDEEPAGENT_OPENROUTER_API_KEY", raising=False)
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
    monkeypatch.setenv("MYDEEPAGENT_OPENROUTER_API_KEY", "sk-or-env-mydeepagent")
    config = load_config(openrouter_api_key=None)
    key = _resolve_openrouter_api_key(config)
    assert key == "sk-or-env-mydeepagent"


def test_resolve_api_key_fallback_to_openrouter_env(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.delenv("MYDEEPAGENT_OPENROUTER_API_KEY", raising=False)
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
    monkeypatch.setenv("OPENROUTER_API_KEY", "sk-or-env-fallback")
    config = load_config(openrouter_api_key=None)
    key = _resolve_openrouter_api_key(config)
    assert key == "sk-or-env-fallback"


def test_resolve_api_key_raises_when_missing(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.delenv("MYDEEPAGENT_OPENROUTER_API_KEY", raising=False)
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
    config = load_config(openrouter_api_key=None)
    with pytest.raises(MyDeepAgentError) as exc_info:
        _resolve_openrouter_api_key(config)
    assert exc_info.value.code == "backend_auth_failed"


def test_resolve_api_key_config_takes_priority_over_env(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("MYDEEPAGENT_OPENROUTER_API_KEY", "sk-or-env")
    config = load_config(openrouter_api_key="sk-or-config-wins")
    key = _resolve_openrouter_api_key(config)
    assert key == "sk-or-config-wins"
# ---------------------------------------------------------------------------
# resolve_model_instance
# ---------------------------------------------------------------------------
def test_resolve_model_openrouter_returns_chat_openai() -> None:
    config = load_config(openrouter_api_key="sk-or-test")
    persona = _minimal_persona(model="openrouter:anthropic/claude-sonnet-4-6")
    instance = resolve_model_instance(persona, config)
    assert isinstance(instance, ChatOpenAI)
    assert instance.openai_api_base == config.openrouter_base_url


def test_resolve_model_openrouter_uses_model_params() -> None:
    config = load_config(openrouter_api_key="sk-or-test")
    persona = _minimal_persona(
        model="openrouter:anthropic/claude-sonnet-4-6",
        model_params={"max_tokens": 1024, "temperature": 0.5},
    )
    instance = resolve_model_instance(persona, config)
    assert isinstance(instance, ChatOpenAI)
    assert instance.max_tokens == 1024


def test_resolve_model_non_openrouter_returns_string() -> None:
    config = load_config()
    persona = _minimal_persona(
        backend="anthropic",
        model="anthropic:claude-3-5-sonnet-20241022",
    )
    result = resolve_model_instance(persona, config)
    assert isinstance(result, str)
    assert result == "anthropic:claude-3-5-sonnet-20241022"


def test_resolve_model_with_override_openrouter() -> None:
    config = load_config(openrouter_api_key="sk-or-test")
    persona = _minimal_persona(model="openrouter:anthropic/claude-sonnet-4-6")
    instance = resolve_model_instance(
        persona, config, model_override="openrouter:deepseek/deepseek-chat"
    )
    assert isinstance(instance, ChatOpenAI)
    assert "deepseek-chat" in instance.model_name
# ---------------------------------------------------------------------------
# build_backend: 5 cases
# ---------------------------------------------------------------------------
def test_build_backend_local_shell(tmp_path: Path) -> None:
    persona = _minimal_persona(deepagents_backend="local_shell")
    result = build_backend(persona, tmp_path)
    assert isinstance(result, LocalShellBackend)


def test_build_backend_filesystem(tmp_path: Path) -> None:
    persona = _minimal_persona(deepagents_backend="filesystem")
    result = build_backend(persona, tmp_path)
    assert isinstance(result, FilesystemBackend)


def test_build_backend_state_returns_none(tmp_path: Path) -> None:
    persona = _minimal_persona(deepagents_backend="state")
    result = build_backend(persona, tmp_path)
    assert result is None


def test_build_backend_composite(tmp_path: Path) -> None:
    persona = _minimal_persona(deepagents_backend="composite")
    result = build_backend(persona, tmp_path)
    assert isinstance(result, CompositeBackend)


def test_build_backend_langsmith_raises_config_invalid(tmp_path: Path) -> None:
    persona = _minimal_persona(deepagents_backend="langsmith")
    with pytest.raises(MyDeepAgentError) as exc_info:
        build_backend(persona, tmp_path)
    assert exc_info.value.code == "config_invalid"
# ---------------------------------------------------------------------------
# build_agent
# ---------------------------------------------------------------------------
def test_build_agent_returns_compiled_state_graph(tmp_path: Path) -> None:
    """build_agent should construct a CompiledStateGraph without calling the LLM API."""
    config = load_config(openrouter_api_key="sk-or-test")
    persona = _minimal_persona(deepagents_backend="state")
    graph = build_agent(persona, config, root_dir=tmp_path)
    assert isinstance(graph, CompiledStateGraph)
    assert hasattr(graph, "invoke")
    assert hasattr(graph, "ainvoke")


def test_build_agent_with_middleware_list(tmp_path: Path) -> None:
    """Extra middleware is accepted without error.

    build_agent automatically prepends SafetyShellMiddleware. Callers should pass
    *other* middleware here; passing a second SafetyShellMiddleware would hit
    deepagents' duplicate-name guard.
    """
    from my_deepagent.middleware.audit import AuditToolMiddleware

    config = load_config(openrouter_api_key="sk-or-test")
    persona = _minimal_persona(deepagents_backend="state")
    graph = build_agent(
        persona,
        config,
        root_dir=tmp_path,
        middleware=[AuditToolMiddleware()],
    )
    assert isinstance(graph, CompiledStateGraph)


def test_build_agent_filesystem_backend(tmp_path: Path) -> None:
    """build_agent works with filesystem backend."""
    config = load_config(openrouter_api_key="sk-or-test")
    persona = _minimal_persona(deepagents_backend="filesystem")
    graph = build_agent(persona, config, root_dir=tmp_path)
    assert isinstance(graph, CompiledStateGraph)


def test_build_agent_with_persona_permissions(tmp_path: Path) -> None:
    """build_agent merges persona permissions with default safety permissions."""
    config = load_config(openrouter_api_key="sk-or-test")
    persona = _minimal_persona(
        deepagents_backend="state",
        permissions=[{"operations": ["read"], "paths": ["/workspace/**"], "mode": "allow"}],
    )
    graph = build_agent(persona, config, root_dir=tmp_path)
    assert isinstance(graph, CompiledStateGraph)
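
The `_map_operations` tests earlier in this file pin down a small contract: `edit` collapses to `write`, `ls` collapses to `read`, duplicates are dropped, and first-seen order is kept. The sketch below is one way to satisfy those assertions. It is not the implementation from `session.py`, which is not part of this hunk; `map_operations_sketch` and `_OP_MAP` are hypothetical names chosen for illustration.

```python
from collections.abc import Iterable

# Hypothetical mapping from persona-level operations to the two
# operations the tests expect on the resulting permission object.
_OP_MAP: dict[str, str] = {
    "read": "read",
    "write": "write",
    "edit": "write",
    "ls": "read",
}


def map_operations_sketch(operations: Iterable[str]) -> list[str]:
    """Map operations and drop duplicates, keeping first-seen order."""
    # A dict preserves insertion order, so it doubles as an ordered set.
    seen: dict[str, None] = {}
    for op in operations:
        seen.setdefault(_OP_MAP[op], None)
    return list(seen)
```

Using an insertion-ordered dict rather than a `set` is what keeps `("write", "read")` from being reordered, which is the behavior `test_map_operations_preserves_order_write_then_read` checks.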

@@ -0,0 +1,55 @@
"""Seed persona integration tests for session.py model resolution."""
from __future__ import annotations

from pathlib import Path

import pytest
from langchain_openai import ChatOpenAI

from my_deepagent.config import load_config
from my_deepagent.enums import Backend
from my_deepagent.persona import load_personas_from_dir
from my_deepagent.session import resolve_model_instance

PERSONAS_DIR = Path(__file__).parent.parent.parent / "docs" / "schemas" / "personas"


@pytest.fixture
def seed_personas() -> list:  # type: ignore[type-arg]
    return load_personas_from_dir(PERSONAS_DIR)


def test_resolve_model_instance_seed_personas(seed_personas: list) -> None:  # type: ignore[type-arg]
    """resolve_model_instance should return ChatOpenAI for openrouter personas, str otherwise."""
    config = load_config(openrouter_api_key="sk-or-dummy")
    for persona in seed_personas:
        instance = resolve_model_instance(persona, config)
        if persona.backend == Backend.OPENROUTER:
            assert isinstance(instance, ChatOpenAI), (
                f"persona {persona.name!r} with backend=openrouter should return ChatOpenAI, "
                f"got {type(instance)}"
            )
            # base_url should point to openrouter
            assert instance.openai_api_base is not None
            base = instance.openai_api_base
            assert "openrouter" in base or base == config.openrouter_base_url
        else:
            assert isinstance(instance, str), (
                f"persona {persona.name!r} with backend={persona.backend} should return str, "
                f"got {type(instance)}"
            )


def test_all_seed_personas_have_non_empty_model(seed_personas: list) -> None:  # type: ignore[type-arg]
    for persona in seed_personas:
        assert persona.model, f"persona {persona.name!r} has empty model"


def test_all_openrouter_seed_personas_have_openrouter_prefix(seed_personas: list) -> None:  # type: ignore[type-arg]
    for persona in seed_personas:
        if persona.backend == Backend.OPENROUTER:
            assert persona.model.startswith("openrouter:"), (
                f"persona {persona.name!r} has backend=openrouter but model={persona.model!r} "
                "does not start with 'openrouter:'"
            )
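
These tests describe a prefix dispatch: a model string starting with `openrouter:` yields a client object pointed at the OpenRouter base URL, and any other string is returned unchanged. A minimal sketch of that dispatch follows. To keep it runnable without `langchain_openai` or network access, a stand-in dataclass (`OpenRouterClientSpec`, a hypothetical name) replaces `ChatOpenAI`. Stripping the prefix before passing the model id on is an assumption; the tests only require that the id still contains the model name.

```python
from __future__ import annotations

from dataclasses import dataclass

OPENROUTER_PREFIX = "openrouter:"


@dataclass(frozen=True)
class OpenRouterClientSpec:
    """Stand-in for the ChatOpenAI instance the real code returns."""

    model_name: str
    base_url: str


def resolve_model_sketch(model: str, base_url: str) -> OpenRouterClientSpec | str:
    """Return a client spec for openrouter: strings, else the string unchanged."""
    if model.startswith(OPENROUTER_PREFIX):
        # Assumption: the provider prefix is stripped and the remainder
        # is used as the OpenRouter model id.
        return OpenRouterClientSpec(model[len(OPENROUTER_PREFIX):], base_url)
    # Non-OpenRouter strings pass through untouched for downstream resolution.
    return model
```

For example, `resolve_model_sketch("openrouter:deepseek/deepseek-chat", base_url)` gives a spec whose `model_name` is `deepseek/deepseek-chat`, while an `anthropic:` string comes back as the same `str`.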

@@ -0,0 +1,335 @@
"""Unit tests for src/my_deepagent/workflow.py."""
from __future__ import annotations

import re
from pathlib import Path

import pytest
from pydantic import ValidationError

from my_deepagent.workflow import (
    ExpectedArtifact,
    WorkflowTemplate,
    load_workflow_yaml,
    load_workflows_from_dir,
)

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
WORKFLOWS_DIR = Path(__file__).parent.parent.parent / "docs" / "schemas" / "workflows"


def _minimal_role(**overrides: object) -> dict[str, object]:
    base: dict[str, object] = {
        "id": "spec_writer",
        "required_capabilities": ["spec_write"],
    }
    base.update(overrides)
    return base


def _minimal_phase(**overrides: object) -> dict[str, object]:
    base: dict[str, object] = {
        "key": "spec",
        "title": "Write spec",
        "risk": "low",
        "role": "spec_writer",
        "instructions": "Write the specification document for the feature.",
    }
    base.update(overrides)
    return base


def _minimal_template(**overrides: object) -> dict[str, object]:
    base: dict[str, object] = {
        "name": "test-workflow",
        "version": 1,
        "roles": [_minimal_role()],
        "phases": [_minimal_phase()],
    }
    base.update(overrides)
    return base
# ---------------------------------------------------------------------------
# Seed yaml: all 3 load successfully
# ---------------------------------------------------------------------------
def test_all_seed_workflows_load() -> None:
    workflows = load_workflows_from_dir(WORKFLOWS_DIR)
    assert len(workflows) == 3


def test_seed_workflow_names() -> None:
    workflows = load_workflows_from_dir(WORKFLOWS_DIR)
    names = {w.name for w in workflows}
    assert names == {"spec-and-review", "bug-fix-with-reproduction", "code-investigation"}


def test_seed_workflow_roles_non_empty() -> None:
    workflows = load_workflows_from_dir(WORKFLOWS_DIR)
    for w in workflows:
        assert len(w.roles) >= 1


def test_seed_workflow_phases_non_empty() -> None:
    workflows = load_workflows_from_dir(WORKFLOWS_DIR)
    for w in workflows:
        assert len(w.phases) >= 1


def test_seed_workflow_phase_keys_unique() -> None:
    workflows = load_workflows_from_dir(WORKFLOWS_DIR)
    for w in workflows:
        keys = [ph.key for ph in w.phases]
        assert len(keys) == len(set(keys)), f"{w.name}: duplicate phase keys"
# ---------------------------------------------------------------------------
# WorkflowTemplate validators
# ---------------------------------------------------------------------------
def test_phase_references_undefined_role_raises() -> None:
    data = _minimal_template(
        roles=[_minimal_role(id="spec_writer")],
        phases=[_minimal_phase(role="nonexistent_role")],
    )
    with pytest.raises(ValidationError, match="unknown role"):
        WorkflowTemplate.model_validate(data)


def test_duplicate_phase_keys_raises() -> None:
    data = _minimal_template(
        roles=[_minimal_role(id="spec_writer")],
        phases=[
            _minimal_phase(key="spec"),
            _minimal_phase(key="spec"),
        ],
    )
    with pytest.raises(ValidationError, match="duplicate phase keys"):
        WorkflowTemplate.model_validate(data)


def test_duplicate_role_ids_raises() -> None:
    data = _minimal_template(
        roles=[_minimal_role(id="spec_writer"), _minimal_role(id="spec_writer")],
        phases=[_minimal_phase(role="spec_writer")],
    )
    with pytest.raises(ValidationError, match="duplicate role ids"):
        WorkflowTemplate.model_validate(data)


def test_phase_key_uppercase_raises() -> None:
    data = _minimal_template(phases=[_minimal_phase(key="SPEC")])
    with pytest.raises(ValidationError):
        WorkflowTemplate.model_validate(data)


def test_phase_key_with_hyphen_raises() -> None:
    """Hyphens are not allowed in phase keys (only a-z, 0-9, _)."""
    data = _minimal_template(phases=[_minimal_phase(key="spec-one")])
    with pytest.raises(ValidationError):
        WorkflowTemplate.model_validate(data)


def test_phase_key_leading_digit_raises() -> None:
    data = _minimal_template(phases=[_minimal_phase(key="1spec")])
    with pytest.raises(ValidationError):
        WorkflowTemplate.model_validate(data)


def test_phase_key_snake_case_ok() -> None:
    data = _minimal_template(phases=[_minimal_phase(key="spec_write_phase")])
    wt = WorkflowTemplate.model_validate(data)
    assert wt.phases[0].key == "spec_write_phase"


def test_role_id_pattern_invalid_raises() -> None:
    data = _minimal_template(
        roles=[_minimal_role(id="Spec-Writer")],
        phases=[_minimal_phase(role="spec_writer")],
    )
    with pytest.raises(ValidationError):
        WorkflowTemplate.model_validate(data)
# ---------------------------------------------------------------------------
# ExpectedArtifact: alias mapping
# ---------------------------------------------------------------------------
def test_expected_artifact_schema_alias() -> None:
    """The YAML uses the 'schema' key; the Python attribute is schema_id."""
    art = ExpectedArtifact.model_validate({"path": "artifacts/spec.json", "schema": "dev/spec@1"})
    assert art.schema_id == "dev/spec@1"
    assert art.path == "artifacts/spec.json"


def test_expected_artifact_extra_field_raises() -> None:
    with pytest.raises(ValidationError):
        ExpectedArtifact.model_validate({"path": "x.json", "schema": "dev/spec@1", "unknown": True})


def test_expected_artifact_missing_schema_raises() -> None:
    with pytest.raises(ValidationError):
        ExpectedArtifact.model_validate({"path": "x.json"})
# ---------------------------------------------------------------------------
# WorkflowTemplate frozen + extra="forbid"
# ---------------------------------------------------------------------------
def test_template_frozen() -> None:
    wt = WorkflowTemplate.model_validate(_minimal_template())
    with pytest.raises((TypeError, ValidationError)):
        wt.name = "mutated"  # type: ignore[misc]


def test_template_extra_field_raises() -> None:
    data = _minimal_template(extra_unknown_field="oops")
    with pytest.raises(ValidationError):
        WorkflowTemplate.model_validate(data)
# ---------------------------------------------------------------------------
# compute_hash: determinism
# ---------------------------------------------------------------------------
def test_compute_hash_deterministic() -> None:
    wt = WorkflowTemplate.model_validate(_minimal_template())
    hashes = [wt.compute_hash() for _ in range(20)]
    assert len(set(hashes)) == 1


def test_compute_hash_returns_64_char_hex() -> None:
    wt = WorkflowTemplate.model_validate(_minimal_template())
    h = wt.compute_hash()
    assert re.fullmatch(r"[0-9a-f]{64}", h)


def test_compute_hash_different_templates_differ() -> None:
    wt1 = WorkflowTemplate.model_validate(_minimal_template(name="wf1"))
    wt2 = WorkflowTemplate.model_validate(_minimal_template(name="wf2"))
    assert wt1.compute_hash() != wt2.compute_hash()
# ---------------------------------------------------------------------------
# load_workflow_yaml: file not found
# ---------------------------------------------------------------------------
def test_load_workflow_yaml_missing_file(tmp_path: Path) -> None:
    with pytest.raises(FileNotFoundError):
        load_workflow_yaml(tmp_path / "no.yaml")
# ---------------------------------------------------------------------------
# load_workflows_from_dir: duplicate detection + missing dir
# ---------------------------------------------------------------------------
def test_load_workflows_from_dir_duplicate_raises(tmp_path: Path) -> None:
    import yaml

    data = _minimal_template()
    for fname in ("wf-a@1.yaml", "wf-b@1.yaml"):
        (tmp_path / fname).write_text(yaml.dump(data), encoding="utf-8")
    with pytest.raises(ValueError, match="duplicate workflow"):
        load_workflows_from_dir(tmp_path)


def test_load_workflows_from_dir_missing_dir() -> None:
    result = load_workflows_from_dir(Path("/nonexistent_wf_dir_xyz"))
    assert result == []
# ---------------------------------------------------------------------------
# Snapshot: seed hashes are stable
# ---------------------------------------------------------------------------
def test_spec_and_review_hash_prefix() -> None:
    workflows = load_workflows_from_dir(WORKFLOWS_DIR)
    w = next(x for x in workflows if x.name == "spec-and-review")
    assert w.compute_hash().startswith("1c94587647b16f0d")


def test_bug_fix_hash_prefix() -> None:
    workflows = load_workflows_from_dir(WORKFLOWS_DIR)
    w = next(x for x in workflows if x.name == "bug-fix-with-reproduction")
    assert w.compute_hash().startswith("a137c9656f10e88a")
# ---------------------------------------------------------------------------
# Step 2 patch: Counter-based duplicate role ids report is sorted
# ---------------------------------------------------------------------------
def test_workflow_duplicate_role_ids_reported_sorted() -> None:
    """Multiple duplicated role ids must be reported in sorted order."""
    with pytest.raises(ValidationError, match=r"duplicate role ids: \['a', 'b'\]"):
        WorkflowTemplate.model_validate(
            {
                "name": "x",
                "version": 1,
                "roles": [
                    {"id": "b", "required_capabilities": ["spec_write"]},
                    {"id": "a", "required_capabilities": ["spec_write"]},
                    {"id": "a", "required_capabilities": ["spec_write"]},
                    {"id": "b", "required_capabilities": ["spec_write"]},
                ],
                "phases": [
                    {
                        "key": "x",
                        "title": "x",
                        "risk": "low",
                        "role": "a",
                        "instructions": "x" * 20,
                    }
                ],
            }
        )


def test_code_investigation_hash_prefix() -> None:
    workflows = load_workflows_from_dir(WORKFLOWS_DIR)
    w = next(x for x in workflows if x.name == "code-investigation")
    assert w.compute_hash().startswith("5b80ea2e248d5232")
# ---------------------------------------------------------------------------
# Deep immutability: nested list-valued fields are tuples (cannot be mutated)
# ---------------------------------------------------------------------------
def test_workflow_phases_immutable() -> None:
    """phases is a tuple, so .append() must raise AttributeError."""
    wt = WorkflowTemplate.model_validate(_minimal_template())
    with pytest.raises((AttributeError, TypeError)):
        wt.phases.append(None)  # type: ignore[attr-defined]


def test_workflow_roles_immutable() -> None:
    """roles is a tuple, so .append() must raise AttributeError."""
    wt = WorkflowTemplate.model_validate(_minimal_template())
    with pytest.raises((AttributeError, TypeError)):
        wt.roles.append(None)  # type: ignore[attr-defined]


def test_workflow_role_required_capabilities_immutable() -> None:
    """required_capabilities is a tuple, so .append() must raise AttributeError."""
    from my_deepagent.workflow import WorkflowRole

    role = WorkflowRole.model_validate(
        {"id": "spec_writer", "required_capabilities": ["spec_write"]}
    )
    with pytest.raises((AttributeError, TypeError)):
        role.required_capabilities.append(None)  # type: ignore[attr-defined]
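
The `compute_hash` tests above require a deterministic 64-character lowercase hex digest that changes when the template changes, and the commit message describes the hash module as "canonical JSON + sha256". The sketch below shows one common way to get that property with the standard library. The function name `canonical_hash_sketch` and the exact canonicalization choices (sorted keys, compact separators, UTF-8) are assumptions, not the project's actual `hash` module.

```python
import hashlib
import json
from typing import Any


def canonical_hash_sketch(payload: Any) -> str:
    """Hash a JSON-serializable payload independently of dict key order."""
    # sort_keys removes dependence on insertion order; compact separators
    # remove dependence on whitespace. Both are assumed conventions here.
    canonical = json.dumps(
        payload,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this scheme, `{"name": "wf1", "version": 1}` and `{"version": 1, "name": "wf1"}` hash identically, while changing `name` to `"wf2"` produces a different digest, mirroring what `test_compute_hash_deterministic` and `test_compute_hash_different_templates_differ` check.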

2501
my-deepagent/uv.lock generated Normal file

File diff suppressed because it is too large