feat(my-deepagent): v0.3 PR #2 — context compaction (auto + manual /compact)

Equivalent of Claude Code's auto-compact and `/compact` slash command.

Core behavior:
- When the session's cumulative tokens (`total_input_tokens + total_output_tokens`) exceed 70% of the active model's context window, the oldest non-system / non-archived messages are summarized once by a cheap model (default `openrouter:deepseek/deepseek-chat`) → one `MessageRow(is_summary=True, role=system)` row is inserted, and the originals are marked `archived=True` and moved to a negative seq band (-(original.seq + 1)).
- The LangGraph thread starts a fresh context via a `thread_suffix` bump (avoids the cost of re-ingesting the old history). The session itself stays alive; archived messages remain viewable with `sessions show <id> --all`.
- The manual `/compact` slash calls the same function. If there are too few messages (`< MIN_COMPACTABLE`), it prints the reason and is a no-op.

Data and libraries:
- `monitoring/token_budget.py` (new): estimates with `tiktoken cl100k_base` (no exact tokenizer is available for DeepSeek/Anthropic models, so this is a conservative over-count). `MODEL_CONTEXT_LIMITS` (DeepSeek 64k, Claude Sonnet/Haiku/Opus 200k, GPT-4o 128k); unregistered models default to 32k. `COMPACTION_THRESHOLD = 0.7`.
- `compaction.py` (new): `should_compact()` / `compact_session()` / `CompactionResult`. `_SESSION_LOCKS: dict[str, asyncio.Lock]` serializes per session; with concurrent compactions the second waits for the first. `KEEP_RECENT_K = 10`, `MIN_COMPACTABLE = 4`. The LLM call runs outside the DB session (avoids holding an asyncpg connection).
- `pyproject.toml`: declares `tiktoken>=0.7` explicitly (previously transitive via langchain-openai).

REPL integration (`cli/interactive.py`):
- `_approx_token_count` is now tiktoken-based.
- After every ainvoke, `should_compact(session_row)` → if over the threshold, automatic `compact_session()` → on success, `clear_agent_cache()` bumps the thread and a one-line notice is printed.
- `/compact` slash registered (`_register_compaction_slash`).

Tests (`tests/integration/test_compaction.py`, 7 cases):
1. `should_compact`: below the 70% threshold, above it, and an unregistered model (3 cases)
2. Fewer than `MIN_COMPACTABLE` messages → rejected without an LLM call
3. Happy path: 14 messages → 4 archived (negative seq) + summary at seq=1 + 10 kept live + token counter arithmetic verified
4. Two concurrent calls for the same session_id → Lock serialization verified
5. Nonexistent session_id → `session_not_found`

Gates:
- ruff check / format --check / mypy: PASS
- pytest -q --ignore=tests/integration/test_e2e_workflow.py --ignore=tests/integration/test_openrouter_smoke.py: 611 passed (including 7 new)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
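
The threshold check and the negative seq band described in the commit message can be sketched as follows. This is a minimal illustration, not the code in `compaction.py` or `monitoring/token_budget.py`: the `SessionTotals` dataclass, the `gpt-4o` key string, `DEFAULT_CONTEXT_LIMIT`, and the `archived_seq` helper are assumptions made for the example. Only the limits, the 0.7 threshold, and the `-(original.seq + 1)` formula come from the message.

```python
from dataclasses import dataclass

# Limits quoted in the commit message; the exact key strings are assumptions.
MODEL_CONTEXT_LIMITS = {
    "openrouter:deepseek/deepseek-chat": 64_000,
    "gpt-4o": 128_000,
}
DEFAULT_CONTEXT_LIMIT = 32_000  # assumed name for the unregistered-model fallback
COMPACTION_THRESHOLD = 0.7


@dataclass
class SessionTotals:
    """Hypothetical stand-in for the session row's token counters."""

    model: str
    total_input_tokens: int
    total_output_tokens: int


def should_compact(row: SessionTotals) -> bool:
    """True once cumulative tokens exceed 70% of the model's context window."""
    limit = MODEL_CONTEXT_LIMITS.get(row.model, DEFAULT_CONTEXT_LIMIT)
    used = row.total_input_tokens + row.total_output_tokens
    return used > limit * COMPACTION_THRESHOLD


def archived_seq(original_seq: int) -> int:
    """Map a live seq onto the negative band so it cannot collide with live rows."""
    return -(original_seq + 1)
```

The `+ 1` in `archived_seq` keeps seq 0 out of the live range as well: it maps to -1 rather than staying at 0.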
@@ -33,11 +33,13 @@ from sqlalchemy import desc, select
 
 from ..audit import make_audit_recorder
 from ..budget import make_budget_tracker_from_config
+from ..compaction import compact_session, should_compact
 from ..config import Config, load_config
 from ..governance import require_consent
 from ..middleware.audit import AuditToolMiddleware
 from ..middleware.cost import CostMiddleware
 from ..monitoring.pricing import ModelPrice, PricingCache
+from ..monitoring.token_budget import count_tokens
 from ..persistence.checkpointer import get_checkpointer_ctx
 from ..persistence.db import Database
 from ..persistence.models import InteractiveSessionRow, MessageRow
@@ -462,10 +464,29 @@ def _register_telemetry_slash(reg: SlashRegistry) -> None:
     reg.register("sessions", _sessions, help="list recent interactive sessions")
 
 
+def _register_compaction_slash(reg: SlashRegistry, sess: InteractiveSession) -> None:
+    """Register /compact slash handler (v0.3 PR #2)."""
+
+    async def _compact(_: SlashParsed) -> bool:
+        result = await compact_session(sess.db, sess.config, str(sess.session_id))
+        if result.compacted:
+            sess.clear_agent_cache()
+            _CONSOLE.print(
+                f"[green]compacted[/] — {result.archived} messages archived, "
+                f"summary {result.summary_tokens} tokens (new thread started)"
+            )
+        else:
+            _CONSOLE.print(f"[yellow]compaction skipped:[/] {result.reason}")
+        return False
+
+    reg.register("compact", _compact, help="manually compact the conversation history")
+
+
 def _register_slash(reg: SlashRegistry, sess: InteractiveSession) -> None:
     _register_navigation_slash(reg, sess)
     _register_persona_slash(reg, sess)
     _register_telemetry_slash(reg)
+    _register_compaction_slash(reg, sess)
 
 
 def _completer(personas: list[Persona], slash_names: list[str]) -> WordCompleter:
@@ -474,14 +495,14 @@ def _completer(personas: list[Persona], slash_names: list[str]) -> WordCompleter
     return WordCompleter(words, ignore_case=True, sentence=True)
 
 
-def _approx_token_count(text: str) -> int:
-    """Conservative char-based token estimate (PR #1 placeholder).
+def _approx_token_count(text: str, model: str = "") -> int:
+    """Token count via tiktoken (PR #2).
 
-    PR #2 swaps this for tiktoken with model-aware tokenizer selection.
-    1 token ≈ 4 chars is the cl100k_base rule of thumb for English; mixed
-    Korean text trends higher tokens/char, so we round up.
+    Falls back to a char-based heuristic inside `count_tokens` on tiktoken
+    failure. Caller passes the active model so future model-specific
+    tokenizers slot in without changing the call site.
     """
-    return max(0, (len(text) + 3) // 4)
+    return count_tokens(text, model)
 
 
 async def _invoke_and_stream(
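
For context, the fallback behavior that the new docstring attributes to `count_tokens` could look roughly like the sketch below. It is an assumption about `monitoring/token_budget.py`, which this diff does not show. It uses `cl100k_base` as the commit message says, and reuses the removed `(len(text) + 3) // 4` heuristic as the fallback.

```python
def count_tokens(text: str, model: str = "") -> int:
    """Estimate tokens with tiktoken's cl100k_base, else ~4 chars per token.

    `model` is accepted but unused here (assumption: it exists so that
    model-specific tokenizers can be added later without touching callers).
    """
    if not text:
        return 0
    try:
        import tiktoken  # imported lazily so a missing package hits the fallback

        enc = tiktoken.get_encoding("cl100k_base")
        # disallowed_special=() treats strings such as "<|endoftext|>" as plain
        # text instead of raising on user-supplied input.
        return len(enc.encode(text, disallowed_special=()))
    except Exception:
        # Char-based heuristic from the PR #1 placeholder, rounded up.
        return (len(text) + 3) // 4
```

Either path returns a non-negative integer, so call sites such as `_approx_token_count(user_text, sess.active_model)` do not need to know which one ran.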
@@ -496,7 +517,7 @@ async def _invoke_and_stream(
         sess.session_id,
         "user",
         user_text,
-        token_count=_approx_token_count(user_text),
+        token_count=_approx_token_count(user_text, sess.active_model),
     )
 
     # 2. Invoke the agent. LangGraph thread_id includes the suffix so /model
@@ -528,9 +549,23 @@ async def _invoke_and_stream(
         sess.session_id,
         "assistant",
         content_str,
-        token_count=_approx_token_count(content_str),
+        token_count=_approx_token_count(content_str, sess.active_model),
     )
+
+    # 4. Auto-compaction check. Triggered when total used tokens cross 70%
+    # of the active model's context window. Holds a per-session lock so
+    # concurrent turns serialise; failure is non-fatal (next turn retries).
+    async with sess.db.session() as s:
+        session_row = await s.get(InteractiveSessionRow, str(sess.session_id))
+    if session_row is not None and should_compact(session_row):
+        result = await compact_session(sess.db, sess.config, str(sess.session_id))
+        if result.compacted:
+            sess.clear_agent_cache()  # bumps thread_suffix → fresh deepagents thread
+            _CONSOLE.print(
+                f"[dim]context compacted — {result.archived} messages archived, "
+                f"summary {result.summary_tokens} tokens, new thread[/]"
+            )
 
 
 async def _repl_loop(
     sess: InteractiveSession,