can-you-assist/workplans/CYA-WP-0001-console-native-mvp.md
tegwick f93b766e12 docs(memory): add MemoryVision.md + gap analysis and related doc updates
- New MemoryVision.md outlining long-term vision for phase-memory integration in cya (profiles, phases, lifecycle, ports)
- Persisted full Intent-vs-Scope gap analysis in history/
- Updated SCOPE.md to reflect post-MVP reality and MemoryVision direction
- Minor cross-references in AGENTS.md and the CYA-WP-0001 workplan

This lays the foundation for the next workplan (CYA-WP-0002) focused on realizing the MemoryVision.

Refs: MemoryVision.md, history/2026-05-26-CYA-Intent-Scope-Gap-Analysis.md, CYA-WP-0001 T05/T08
2026-05-26 02:42:54 +02:00


id: CYA-WP-0001
type: workplan
title: "Console-Native MVP: CLI Skeleton, Safe Assistance Flow, and Integration Boundaries"
domain: capabilities
repo: can-you-assist
status: finished
owner: grok
topic_slug: foerster-capabilities
created: 2026-05-25
updated: 2026-05-26
state_hub_decision_id: a644364b-11c4-49a9-bf17-99063382e27b
state_hub_workstream_id: a3ef5a46-17be-41a9-8df8-f457d86462da

CYA-WP-0001: Console-Native MVP — CLI Skeleton, Safe Assistance Flow, and Integration Boundaries

Status Update — 2026-05-26 (Activated)

This workplan is moved from ready to active immediately following resolution of State Hub Decision a644364b-11c4-49a9-bf17-99063382e27b.

Stack technology choices (accepted per agent recommendations):

  • Primary language/runtime: Python
  • CLI framework: Typer
  • Terminal presentation: rich
  • Configuration format: TOML (alignment with sibling patterns)
  • Packaging/layout: Full modern Python package (pyproject.toml + src/ layout) from day 1

Additional operator direction recorded in the decision:

  • Phase-memory integration (T05): Keep strictly minimal. Pure explicit ports / no-op implementations with clear "replace with real phase-memory integration" markers. No placeholder local JSON file or store in this slice.
  • Safety / risk classification (T03): Implement genuine rule-based assessment as the primary mechanism. Surface the classification results to the LLM as structured context where relevant. The LLM is allowed to propose actions or refine classifications, but any architecture-level, policy, or significant design decisions that arise must be captured as ADRs in this repository.

The narrow MVP slice is now authorized to proceed. Implementation of T01 (scaffolding + Typer CLI entrypoint) can begin.

Status Update — 2026-05-26 (T01–T07 core implementation complete)

Commit: git commit of the T01–T07 slice (see below for SHA).

Delivered and verified by running the installed cya binary + pytest:

  • T01: Modern Python package (pyproject.toml + src/ layout), Typer + rich CLI, cya --help, --version, cya "<request>" one-shot mode, editable install works.
  • T02: Bounded, transparent, non-recursive context collector (cwd top-level + git + env, name-based ignores, provenance on every item) + fully working --explain-context.
  • T03: Genuine rule-based risk classifier as primary mechanism (destructive, mass-edit, privileged, network, safe, etc.). Mandatory explicit terminal confirmation for anything above "safe". No auto-execution. Matches the exact workplan acceptance example.
  • T04: Stable LLMAdapter Protocol + deterministic FakeLLMAdapter. 100% of LLM interaction flows through this seam (ready for real llm-connect).
  • T05: Strictly minimal phase-memory ports (pure no-ops with loud "phase-memory not yet connected" markers, no hidden store or singletons, per operator direction).
  • T06: Orchestrator that coordinates collector → risk/confirmation → adapter → render. CLI surface is now thin delegation.
  • T07: pytest harness + 9+ safety-focused tests (risk classifier on destructive cases, collector invariants, serializability). All green, no live LLM required.

State Hub: Progress event logged against workstream 0a1233fd-75ab-4726-8857-6c97de939069. Operator should run cd ~/state-hub && make fix-consistency REPO=can-you-assist to import the updated tasks and regenerate .custodian-brief.md.

Next: Finish T07 (more orchestrator/adapter tests) or move to T08 (README, USAGE, handoff, AGENTS.md command updates).

Goal

Deliver the first narrow, usable slice of cya (the can-you-assist console assistant) that proves the core loop:

User expresses intent in natural language from a terminal → cya safely gathers minimal relevant local context → routes through a clean adapter boundary for llm-connect → returns an explainable suggestion, command, or answer → enforces explicit confirmation for any potentially destructive or broad action.

This workplan establishes the CLI surface, context collector, safety layer, and the integration seams for the two sibling projects (llm-connect and phase-memory) without taking ownership of their implementations. It produces a minimal but real tool that can be invoked today for practical console work.

Current Evidence

  • High-quality intent, scope, and agent instruction documents exist and are consistent (INTENT.md, SCOPE.md, AGENTS.md, README.md, wiki/CyaSpeechModeExtension.md).
  • State Hub bootstrap workstream repo-integration-can-you-assist (id 0a1233fd-75ab-4726-8857-6c97de939069) — both bootstrap tasks are now complete. Decision D1 (a644364b-11c4-49a9-bf17-99063382e27b) has been resolved with the choices above; this workplan is now active.
  • Sibling projects exist and are further along:
    • llm-connect (real Python package with multi-provider adapters, config, tests).
    • phase-memory (foundational workplans complete; local runtime, ports, and contracts exist).
  • Implementation progress (as of this commit): Full working cya CLI + package (T01), bounded context collector (T02), genuine rule-based risk + mandatory confirmation (T03), llm-connect adapter Protocol + Fake (T04), strictly minimal phase-memory no-op ports (T05), orchestrator (T06), pytest harness + safety tests (T07). The tool can be installed (pip install -e .) and used today.
  • grok inspect successfully discovers AGENTS.md and the project context.

Non-Goals (for this MVP slice)

  • No implementation of durable, user-controlled memory or complex preference adaptation (define explicit thin ports only; real behavior comes from phase-memory later).
  • No voice, speech, or phone-bridge mode (the detailed design in wiki/CyaSpeechModeExtension.md is explicitly future work).
  • No autonomous or background execution of generated commands, scripts, or file modifications.
  • Do not vendor, copy, or fork code from llm-connect or phase-memory.
  • No deep repository indexing, embeddings, vector search, or large-scale content analysis in the first slice.
  • No distribution, homebrew/pip packaging polish, or multi-platform installers.
  • No long-lived multi-turn conversational REPL or session state machine (one-shot + very lightweight session possible if it falls out naturally).
  • No assumption of any specific LLM provider or hosted service.

Success Criteria for This Slice

  • cya "..." (and cya --help) can be run from a fresh checkout after a simple editable install and produces useful, context-aware output for the canonical examples in INTENT.md.
  • Context collection is transparent and bounded: the user can always see exactly what local information would be sent to the model.
  • Any suggestion that the risk classifier marks above "safe" (destructive filesystem operations, force pushes, mass edits, network-affecting commands, etc.) triggers a clear preview + explicit terminal confirmation flow. Nothing executes without the user typing confirmation.
  • 100% of LLM interaction (even in tests) flows through a small, well-documented adapter boundary whose shape is compatible with the direction of llm-connect.
  • A focused test suite exists with strong coverage of the safety invariants and context-selection rules.
  • The workplan file is the source of truth. After the operator runs make fix-consistency, State Hub reflects the tasks and status.
  • A new contributor or the sibling teams can read the updated README + this workplan and immediately understand the integration points and non-goals.

Task Breakdown

T01 — Project scaffolding, language choice, and CLI entrypoint

id: CYA-WP-0001-T01
status: done
priority: high
state_hub_task_id: "716a8679-39b1-4b99-a2d4-44b1f5076f9e"

Bootstrap the minimal runnable package and the primary user-facing command.

  • Confirm primary implementation language (strong current signal: Python, driven by .gitignore patterns and the concrete state of the two sibling projects). Record the decision and rationale in this workplan or a short ADR note.
  • Create the initial package layout and build configuration (pyproject.toml or equivalent) with:
    • Proper package name (can-you-assist), version, and console script entry point cya.
    • Clean separation: cya/cli/, cya/context/, cya/safety/, cya/llm/ (boundary), cya/memory/ (ports).
  • Implement the absolute minimum CLI surface:
    • cya --version
    • cya --help (rich, with examples)
    • cya "free-form natural language request here" (one-shot mode)
    • Basic flags for context control (--file, --no-git, --explain-context, --dry-run).
  • Support both one-shot invocation and a very lightweight "session" mode if it falls out naturally from the REPL-less design (future voice bridge will need session tokens).

Acceptance criteria:

  • After pip install -e . (or the equivalent for the chosen stack), cya --help and cya --version work with no external services.
  • The command accepts a positional request string and prints a structured, human-readable response even before any LLM integration exists (graceful fallback or explicit "no backend configured" message is acceptable in early T01).
  • Layout and naming make the future integration seams obvious.
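
The CLI surface listed above could look roughly like the following. This is a dependency-free sketch that uses the standard library's argparse as a stand-in for Typer, so it runs without installing anything. The `build_parser` and `main` names, the `0.1.0` version string, and the fallback output are illustrative assumptions, not the repository's actual code.

```python
import argparse
from typing import Optional, Sequence

VERSION = "0.1.0"  # assumption: placeholder, not the real package version


def build_parser() -> argparse.ArgumentParser:
    """Build the one-shot CLI surface from T01 (argparse stand-in for Typer)."""
    parser = argparse.ArgumentParser(
        prog="cya",
        description="Ask for console help in natural language.",
    )
    parser.add_argument("--version", action="version", version=f"cya {VERSION}")
    parser.add_argument("request", nargs="?", help="free-form natural language request")
    parser.add_argument("--file", action="append", default=[],
                        help="file or glob to include as context (repeatable)")
    parser.add_argument("--no-git", action="store_true", help="skip git state collection")
    parser.add_argument("--explain-context", action="store_true",
                        help="print the context that would be sent")
    parser.add_argument("--dry-run", action="store_true", help="never offer to run anything")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    if args.request is None:
        print('No request given. Try: cya "summarize the git state"')
        return 2
    # Early-T01 behavior: a structured reply even with no LLM backend wired in.
    print(f"request: {args.request}")
    print(f"files:   {args.file or '(none)'}")
    print(f"git:     {'skipped' if args.no_git else 'included'}")
    print("backend: no backend configured")
    return 0
```

Calling `main(["--no-git", "list the largest files here"])` prints the structured fallback and returns 0, which is the "no backend configured" behavior the acceptance criteria allow for early T01.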

T02 — Safe, transparent, and intentionally bounded local context collector

id: CYA-WP-0001-T02
status: done
priority: high
state_hub_task_id: "349dc524-56ca-4f4f-a7a9-7afbca35c166"

Implement the "Context Collector" responsibility described in INTENT.md and SCOPE.md.

Collect only what is necessary for the current request and make the collection completely inspectable.

Minimum sources for MVP (all opt-in or narrowly scoped):

  • Current working directory (top-level entries only; respect common ignore patterns such as .git, node_modules, .venv, __pycache__, etc.).
  • Git state summary (current branch, dirty status, recent commit subject, list of modified files — obtained via read-only subprocess calls).
  • Explicitly provided files or globs via --file / positional arguments.
  • Data from stdin when the tool is used in a pipeline.
  • Tiny, high-signal environment facts only when clearly relevant (e.g., $SHELL, $EDITOR/$VISUAL).

Hard constraints for this slice:

  • No unbounded recursive directory walks.
  • No secret scanning or credential harvesting.
  • No automatic scraping of shell history, ~/.config, or other user data without an explicit future opt-in + memory layer.
  • Collection must produce a stable, serializable ContextEnvelope (or equivalent) that can be pretty-printed for the user and hashed/token-counted for the model.

Acceptance criteria:

  • A --show-context / --explain-context (or debug) flag prints exactly the data that would be sent to the LLM, with clear provenance for each piece.
  • Tests prove that the collector refuses to traverse known dangerous or expensive locations.
  • The collector is a pure module with no side effects beyond read-only inspection.
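
A bounded collector along these lines might be sketched as follows. Everything here is an assumption made for illustration: the `ContextItem` fields, the `collect_cwd` / `collect_env` helpers, the ignore list, and the `MAX_ENTRIES` cap are hypothetical and are not taken from the repository. The sketch covers only the cwd and environment sources; git, stdin, and `--file` inputs are left out.

```python
import json
import os
from dataclasses import asdict, dataclass, field

# Assumption: an illustrative ignore list and cap, not the repo's real policy.
IGNORED_NAMES = frozenset({".git", "node_modules", ".venv", "__pycache__"})
MAX_ENTRIES = 200


@dataclass(frozen=True)
class ContextItem:
    source: str  # provenance label, e.g. "cwd" or "env"
    key: str
    value: str


@dataclass
class ContextEnvelope:
    items: list = field(default_factory=list)

    def to_json(self) -> str:
        """Stable serialization, usable for pretty-printing or hashing."""
        return json.dumps([asdict(i) for i in self.items], sort_keys=True, indent=2)


def collect_cwd(path: str) -> list:
    """List top-level entries only: no recursion, name-based ignores, capped."""
    with os.scandir(path) as entries:
        names = sorted(
            e.name + ("/" if e.is_dir() else "")
            for e in entries
            if e.name not in IGNORED_NAMES
        )
    return [ContextItem("cwd", "entry", n) for n in names[:MAX_ENTRIES]]


def collect_env(environ) -> list:
    """Read a tiny allow-list of variables; never dump the whole environment."""
    return [ContextItem("env", k, environ[k])
            for k in ("SHELL", "EDITOR", "VISUAL") if k in environ]


def collect(path: str, environ) -> ContextEnvelope:
    return ContextEnvelope(items=collect_cwd(path) + collect_env(environ))
```

Printing `collect(os.getcwd(), os.environ).to_json()` is one way an `--explain-context` flag could show exactly what would be sent, with the `source` field carrying the provenance of each item.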

T03 — Risk classification and mandatory confirmation layer

id: CYA-WP-0001-T03
status: done
priority: high
state_hub_task_id: "28306063-cfb5-4049-ab45-365526bd3e28"

This is a core product behavior, not an afterthought.

Operator direction (2026-05-26): Implement genuine rule-based assessment as the primary mechanism. Provide the classification results to the LLM as structured context where relevant. The LLM may propose or refine, but any architecture-level or policy decisions that surface must be raised as ADRs.

Build a risk classifier (simple rules + optional LLM assistance for edge cases) that labels suggestions as one of:

  • safe
  • review (needs a second look)
  • destructive / mass-edit / privileged
  • network-affecting
  • other (with rationale)

For anything above "safe":

  • Always emit a clear, copy-pasteable preview of the exact command(s) or action(s) being suggested.
  • Show a concise "what will be affected" summary derived from the context envelope.
  • Require an explicit confirmation step in the controlling terminal (e.g., Run this? [y/N] or Type 'yes' to proceed).
  • Record the confirmation decision in any future audit trail (even if the memory layer is not yet present).

Never auto-execute anything in this slice, even "safe" suggestions, unless the user has explicitly asked for a "run" sub-mode (and even then only after preview).

Acceptance criteria:

  • cya "delete every log file older than 30 days in this tree" produces a preview, classifies the action as destructive, and blocks until the user confirms in the terminal.
  • The confirmation channel is always the terminal that launched cya (important for the future voice bridge design).
  • The classifier and confirmation logic have dedicated tests that are part of the default test run (no live LLM required).
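
As a rough illustration of rule-first classification followed by a terminal confirmation, consider the sketch below. The rule table, the `classify` and `confirm` names, and the specific regular expressions are assumptions chosen for brevity; the repository's own `_RULES` table may look quite different. The prompt function is injectable so the flow can be tested without a real terminal.

```python
import re
from typing import Callable

# Assumption: a tiny illustrative rule table; the first matching rule wins.
RULES = [
    ("destructive", re.compile(r"\brm\b|\bfind\b.*-delete\b|\bgit\s+push\b.*--force")),
    ("privileged", re.compile(r"\bsudo\b")),
    ("mass-edit", re.compile(r"\bsed\b.*\s-i\b|\bxargs\b")),
    ("network-affecting", re.compile(r"\bcurl\b|\bwget\b|\bssh\b")),
]


def classify(command: str) -> str:
    """Return the first matching risk label, or "safe" when no rule matches."""
    for label, pattern in RULES:
        if pattern.search(command):
            return label
    return "safe"


def confirm(command: str, ask: Callable[[str], str] = input) -> bool:
    """Preview anything above "safe" and require an explicit yes.

    Returns whether the suggestion is approved. Nothing is executed here;
    running a command stays a separate, explicit step.
    """
    label = classify(command)
    if label == "safe":
        return True
    print(f"[{label}] suggested command:\n  {command}")
    return ask("Run this? [y/N] ").strip().lower() in {"y", "yes"}
```

With these rules, `classify("find . -name '*.log' -mtime +30 -delete")` yields `destructive`, and an empty answer at the prompt counts as a refusal, matching the `[y/N]` default.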

T04 — llm-connect adapter boundary (the integration seam)

id: CYA-WP-0001-T04
status: done
priority: high
state_hub_task_id: "54b32952-6a92-4ee6-8688-3ca7de026d8a"

Per SCOPE.md and INTENT.md, can-you-assist owns orchestration and the CLI experience; llm-connect owns provider access.

Define a small, stable interface (protocol / abstract base / typed call) in this repository that all model interaction must go through:

  • Something like:
    from typing import Protocol

    class LLMAdapter(Protocol):
        def complete(
            self,
            request: AssistanceRequest,   # contains framed intent + packed context + config hints
        ) -> AssistanceResponse:          # contains suggestions, explanation, rationale, risks, raw model output summary
            ...
    
  • Provide a deterministic fake / mock implementation used by all unit and safety tests.
  • Provide a thin concrete adapter (or direct import) that will eventually delegate to the real llm-connect package.
  • Configuration surface (which backend, model, temperature, token budget, etc.) must be explicit and delegated to the llm-connect configuration model once the boundary is stable.

Acceptance criteria:

  • There is zero production code path that talks to an LLM (or a mock) bypassing this boundary.
  • The interface is documented with a short "Integration Guide for llm-connect" section or companion note.
  • Switching from the fake to a real (or stubbed) llm-connect client is a small, localized change.
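
To make the seam concrete, here is one way the Protocol and a deterministic fake could fit together. The field names on `AssistanceRequest` and `AssistanceResponse`, the `canned` lookup table, and the `ask` helper are all hypothetical; only the `LLMAdapter.complete` shape comes from the task description above.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class AssistanceRequest:
    intent: str                                # framed user intent
    context: str = ""                          # packed, already-bounded context
    hints: dict = field(default_factory=dict)  # config hints (model, budget, ...)


@dataclass(frozen=True)
class AssistanceResponse:
    suggestion: str
    explanation: str
    risks: tuple = ()


@runtime_checkable
class LLMAdapter(Protocol):
    def complete(self, request: AssistanceRequest) -> AssistanceResponse: ...


class FakeLLMAdapter:
    """Deterministic stand-in: same request in, same response out, no network."""

    def __init__(self, canned: Optional[dict] = None) -> None:
        self.canned = dict(canned or {})
        self.calls = []  # recorded so tests can assert on what was sent

    def complete(self, request: AssistanceRequest) -> AssistanceResponse:
        self.calls.append(request)
        suggestion = self.canned.get(
            request.intent, f"echo 'no canned answer for: {request.intent}'"
        )
        return AssistanceResponse(suggestion=suggestion, explanation="fake adapter response")


def ask(adapter: LLMAdapter, intent: str) -> AssistanceResponse:
    """Call sites depend only on the Protocol, so swapping adapters stays local."""
    return adapter.complete(AssistanceRequest(intent=intent))
```

Because `ask` is typed against the Protocol rather than the fake, a future adapter that delegates to llm-connect would only need to provide a compatible `complete` method.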

T05 — Thin, explicit phase-memory ports and future hooks

id: CYA-WP-0001-T05
status: done
priority: medium
state_hub_task_id: "134065b5-3421-4353-80b4-9a55b5a2015e"

Prepare the ground for phase-memory without pulling a dependency or inventing hidden state.

Operator direction (2026-05-26): Keep strictly minimal in this slice. Pure explicit ports with no-op implementations and clear "to be replaced by real phase-memory integration" markers. No local JSON placeholder or file-backed store yet.

Define clear ports / extension points for the memory capabilities that INTENT.md says must remain under user control:

  • Remember a preference or workflow pattern.
  • Recall relevant history / preferences for the current cwd + task class.
  • Forget / reset (scoped).
  • Inspect / export current memory for this project or user.

In the MVP these ports can be:

  • Pure no-ops that clearly log "phase-memory not yet connected".
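  • Or a trivial local JSON file store under an opt-in, user-visible location (~/.config/cya/ or per-project .cya/memory.json), explicitly labeled as a placeholder. The operator direction above rules this option out for the current slice, so only the no-op variant applies.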

All memory interactions must be behind these ports. No global singletons, no implicit ~/.cache magic, no opaque vendor memory.

Acceptance criteria:

  • Code review can point to the exact files/functions that will be replaced or implemented by phase-memory integration.
  • The MVP still functions (gracefully) with memory completely disabled.
  • README and help text explain the intended memory story and how users will stay in control.
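
The four capabilities above could be expressed as a single port with a stateless no-op implementation, roughly as sketched here. The `MemoryPort` and `NoOpMemory` names, the method signatures, and the `scope` parameter are assumptions for illustration; the repository may split these into separate ports or name them differently.

```python
import logging
from typing import Optional, Protocol

log = logging.getLogger("cya.memory")
NOT_CONNECTED = "phase-memory not yet connected"


class MemoryPort(Protocol):
    def remember(self, scope: str, key: str, value: str) -> None: ...
    def recall(self, scope: str, key: str) -> Optional[str]: ...
    def forget(self, scope: str) -> None: ...
    def inspect(self, scope: str) -> dict: ...


class NoOpMemory:
    """Replace with real phase-memory integration. Holds no state at all."""

    def remember(self, scope: str, key: str, value: str) -> None:
        log.warning("%s: remember(%s, %s) ignored", NOT_CONNECTED, scope, key)

    def recall(self, scope: str, key: str) -> Optional[str]:
        log.warning("%s: recall(%s, %s) returns nothing", NOT_CONNECTED, scope, key)
        return None

    def forget(self, scope: str) -> None:
        log.warning("%s: forget(%s) ignored", NOT_CONNECTED, scope)

    def inspect(self, scope: str) -> dict:
        log.warning("%s: inspect(%s) returns empty", NOT_CONNECTED, scope)
        return {}
```

Since `NoOpMemory` has no attributes and writes nothing to disk, a caller that remembers a value and then recalls it gets `None` back, which makes the missing integration visible rather than silently faked.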

T06 — Assistance orchestrator and prompt/response handling

id: CYA-WP-0001-T06
status: done
priority: high
state_hub_task_id: "26146d92-e1c9-462c-9721-f50c5c37f5a4"

The piece that turns raw user intent + collected context into a well-formed request to the LLM adapter and then turns the adapter response into terminal output the user can act on.

Responsibilities:

  • Lightweight intent framing (command suggestion vs. explanation vs. summarization vs. plan vs. "help me understand this error").
  • Context packing with rough token awareness (so we don't blindly overflow models).
  • Prompt construction (or structured request) that includes the safety charter, output schema expectations, and the collected context.
  • Post-processing: parse structured output, attach local provenance explanations, produce the final user-facing artifact (suggestion block, rationale, next-step hints).

The orchestrator must be testable in isolation with a fake LLM adapter.

Acceptance criteria:

  • The end-to-end flow (context → orchestrator → fake LLM → rendered output) works for at least the four primary use-case families listed in INTENT.md.
  • Every response the user sees includes a short "why this answer" section that references the context pieces actually used.
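
The packing and rendering responsibilities can be illustrated with a small sketch. The four-characters-per-token estimate is a crude stand-in for real token counting, and the `Piece` type, the `pack` helper, and this particular `handle_request` signature are hypothetical; the adapter is reduced to a plain callable to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, List

CHARS_PER_TOKEN = 4  # assumption: crude heuristic, not a real tokenizer


@dataclass(frozen=True)
class Piece:
    source: str  # provenance label, e.g. "git" or "cwd"
    text: str


def pack(pieces: List[Piece], token_budget: int) -> List[Piece]:
    """Keep pieces in order until the rough estimate would exceed the budget."""
    kept, used = [], 0
    for piece in pieces:
        cost = len(piece.text) // CHARS_PER_TOKEN + 1
        if used + cost > token_budget:
            break
        kept.append(piece)
        used += cost
    return kept


def handle_request(intent: str, pieces: List[Piece],
                   complete: Callable[[str], str], token_budget: int = 1000) -> str:
    """Frame the request, call the adapter, and render output with provenance."""
    used = pack(pieces, token_budget)
    prompt = "\n".join([f"intent: {intent}"] + [f"[{p.source}] {p.text}" for p in used])
    answer = complete(prompt)
    # Only the pieces that were actually packed are cited in the rationale.
    sources = ", ".join(sorted({p.source for p in used})) or "no local context"
    return f"suggestion:\n  {answer}\n\nwhy this answer:\n  based on {sources}"
```

Passing a lambda as `complete` is enough to exercise the full flow in a test, and the "why this answer" line lists only the sources that survived packing.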

T07 — Test strategy, harness, and safety-focused suite

id: CYA-WP-0001-T07
status: done
priority: high
state_hub_task_id: "71bd300a-0049-4e4d-9ea3-c75546a2b5c6"

Choose and bootstrap a test framework appropriate for a console tool (pytest is the obvious default given the Python signal).

Required test categories for the slice:

  • Context collector safety and bounding properties.
  • Risk classifier + confirmation flow (the "never auto-execute destructive actions" invariant must be tested).
  • Orchestrator behavior with deterministic fake LLM responses (including error and refusal paths).
  • CLI surface (help, version, argument parsing, stdin handling).
  • Adapter boundary contract tests (the fake satisfies the protocol; swapping implementations is safe).

Prefer fast, hermetic, no-network tests as the default. Any test that touches a real model or external service must be explicitly opt-in (markers, env vars) and skipped in normal CI.

Acceptance criteria:

  • The test suite runs cleanly from a fresh checkout via pytest (or the chosen equivalent) or make test.
  • Safety invariants have explicit, readable test names and assertions.
  • Coverage or a manual checklist shows the high-risk paths are exercised.
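
Safety invariants with "explicit, readable test names" might look like the sketch below. The tests are plain functions with bare asserts, which pytest collects by name without any imports. `run_with_confirmation` and `ExecutorSpy` are hypothetical stand-ins for the unit under test and are not the repository's code.

```python
from typing import Callable


def run_with_confirmation(command: str, ask: Callable[[str], str],
                          execute: Callable[[str], None]) -> bool:
    """Hypothetical unit under test: call `execute` only after an explicit yes."""
    if ask(f"Run `{command}`? [y/N] ").strip().lower() not in {"y", "yes"}:
        return False
    execute(command)
    return True


class ExecutorSpy:
    """Records commands instead of running them, keeping tests hermetic."""

    def __init__(self) -> None:
        self.ran = []

    def __call__(self, command: str) -> None:
        self.ran.append(command)


def test_destructive_command_is_not_executed_when_user_declines():
    spy = ExecutorSpy()
    assert run_with_confirmation("rm -rf build", ask=lambda _: "n", execute=spy) is False
    assert spy.ran == []


def test_empty_answer_defaults_to_no():
    spy = ExecutorSpy()
    assert run_with_confirmation("rm -rf build", ask=lambda _: "", execute=spy) is False
    assert spy.ran == []


def test_command_runs_only_after_explicit_yes():
    spy = ExecutorSpy()
    assert run_with_confirmation("rm -rf build", ask=lambda _: "yes", execute=spy) is True
    assert spy.ran == ["rm -rf build"]
```

Injecting both the prompt and the executor keeps these tests fast and free of network or filesystem effects, in line with the hermetic default described above.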

T08 — Documentation, examples, and handoff to siblings / operator

id: CYA-WP-0001-T08
status: done
priority: medium
state_hub_task_id: "d51801b1-2184-4ec5-91c1-b5b535dcef6a"

Make the MVP usable and the integration points obvious.

  • Heavily update README.md with real invocation examples, architecture overview (text diagram is fine), and "How to integrate with llm-connect / phase-memory" guidance.
  • Add or expand USAGE.md / docs/ with the canonical workflows.
  • In the workplan or a short companion note, explicitly list the extension points and any technical debt discovered during the slice (for later registration in State Hub).
  • Ensure the bootstrap workstream tasks that depend on this (SBOM, extension point registration) have clear handoff notes.

Acceptance criteria:

  • A person who has never seen the repo before can clone it, install it, and successfully run 2–3 non-trivial cya requests after reading only the updated README.
  • The sibling project owners can read this workplan + the boundary documentation and know exactly where their packages will plug in.

Dependencies & Cross-Repo Coordination

  • llm-connect: Supplies the real multi-provider client, config chain, token counting, and structured response helpers. We define the consumption contract here.
  • phase-memory: Supplies the actual memory implementation behind the ports we define. We own the call sites and the "user stays in control" contract.
  • State Hub (custodian domain): Owns work tracking, progress, decisions, and the fix-consistency sync that will import this workplan's tasks.
  • No other hard runtime dependencies for the core MVP loop.

Stack & Technology Choices (Decision Required Before active)

This workplan is intentionally left in ready status until the following are explicitly reviewed and recorded:

  • Primary language / runtime (Python is the current leading candidate).
  • CLI framework (typer, click, or stdlib argparse + rich for presentation).
  • Terminal presentation library (rich, textual, or plain stdlib + colors).
  • Configuration format (TOML strongly preferred for alignment with llm-connect patterns).
  • Whether an initial pyproject.toml + src layout or a lighter single-file bootstrap is preferred for T01.

Once the stack review is complete, change the workplan status to active and begin implementation (or spawn follow-on workplans if the slice needs to be split).

  • Parent bootstrap: State Hub workstream repo-integration-can-you-assist (tasks T01–T04)
  • This repo: INTENT.md, SCOPE.md, AGENTS.md, README.md, wiki/CyaSpeechModeExtension.md
  • Sibling repos (for patterns and interfaces): llm-connect, phase-memory
  • State Hub consistency command (run after any workplan or task file change): cd ~/state-hub && make fix-consistency REPO=can-you-assist

Extension Points & Technical Debt (T08 note)

Obvious extension points registered during the slice:

  • cya/llm/adapter.py — the LLMAdapter Protocol (primary seam for llm-connect).
  • cya/memory/__init__.py — the four explicit no-op ports (primary seam for phase-memory).
  • cya/safety/risk.py — the _RULES table (easy to extend with new patterns).
  • cya/context/collector.py — the ignore policy and collect_* functions.
  • cya/orchestrator.py — the single handle_request entry point.

Technical debt / follow-on notes (for State Hub registration):

  • No real lint/CI enforcement yet (ruff configured but not wired).
  • No make targets or formal test/lint scripts beyond raw pytest.
  • README mentions USAGE but a dedicated short USAGE.md could be added later.
  • The fake adapter responses are intentionally simplistic; richer canned scenarios can be added in T07 follow-ups.
  • Confirmation flow is terminal-only (correct per spec); future voice bridge will need the same contract.

These items are captured here per T08 acceptance criteria and can be turned into State Hub extension-point or debt records after fix-consistency.


Status note: Created as the direct output of bootstrap Task T02. This file is the authoritative plan. Do not implement the tasks until the status is moved to active after stack review.