# CYA Speech Mode Extension

## Working Name

**cya speech mode**

Possible short names:

- `cya talk`
- `cya voice`
- `cya listen`
- `cya pair`
- `cya phone`

---
## Core Idea

`cya` should support a speech interaction mode where the user can use a phone as the microphone, speech-recognition device, and conversational front-end for a `cya` helper session running on a terminal.

This allows users to speak to `cya` even when the active terminal environment has no usable microphone, no audio stack, no GUI, or no convenient speech-recognition capability.

The terminal remains the operational context.

The phone becomes the voice interface.

The LLM interaction remains bound to the currently activated `cya` session.

---
## User Scenario

A user is working in a terminal on a server, VM, SSH session, minimal Linux installation, or development machine.

They want to ask:

> “What does this error mean?”
>
> “Generate a command to find all Python files importing requests.”
>
> “Explain the current git status.”
>
> “Create a commit message for these changes.”
>
> “Summarize the files in this folder.”

Instead of typing the request, they activate a `cya` helper session in the terminal and connect the phone app.

The phone enters conversation mode and uses its microphone and speech recognition.

The recognized text is sent to the active `cya` session.

The terminal-side `cya` helper answers in the terminal, and optionally also sends a spoken or textual response back to the phone.

---
## Basic Interaction Flow

### 1. User activates CYA in terminal

Example:

```bash
cya talk
```

or:

```bash
cya --voice
```

The CLI starts a local or remote helper session and displays pairing instructions.

Example:

```text
CYA voice session started.

Pair with phone:
Open the CYA mobile app and scan this QR code.

Session:
repo: /home/bernd/projects/example
mode: voice bridge
expires: 5 minutes
```

The terminal may show a QR code containing a short-lived pairing token.

---
### 2. User opens the CYA phone app

The user scans the QR code with the app or types in the pairing code.

The phone is now connected to the active `cya` session.

The app shows:

```text
Connected to:
example repo on tinker-base

Listening...
```

---
### 3. User speaks into the phone

The phone captures audio and performs speech recognition.

The recognized text is sent to the active terminal-side `cya` helper.

Example recognized utterance:

```text
Please explain the current git status and suggest a commit message.
```

---
### 4. Terminal-side CYA processes the request

The terminal-side `cya` helper has access to the local context:

* current working directory;
* selected files;
* git status;
* shell environment;
* project memory;
* user preferences;
* configured LLM backend through `llm-connect`;
* memory through `phase-memory`.

The phone does **not** need direct access to the filesystem.

This boundary is important.

The phone provides speech input.

The CLI session owns the local operational context.

---
### 5. Response is returned

The response appears in the terminal.

Optionally, a concise response is also sent back to the phone.

Example terminal response:

```text
Current git status summary:

- 3 modified files
- 1 new file
- no staged changes

Suggested commit message:

feat(cli): add initial voice session pairing flow

Suggested next command:

git add src/voice-session.ts README.md
git commit -m "feat(cli): add initial voice session pairing flow"
```

The phone may show:

```text
I found three modified files and one new file. Suggested commit message:
feat(cli): add initial voice session pairing flow
```

Optionally, the phone can read that aloud.

---
## Key Design Principle

The phone should be a **voice bridge**, not the primary authority.

The terminal session remains the source of truth for:

* current working directory;
* filesystem access;
* repository context;
* execution permissions;
* local configuration;
* project memory;
* safety confirmations.

The phone handles:

* microphone input;
* speech recognition;
* optional text-to-speech;
* session pairing;
* conversational convenience.

This avoids turning the phone app into a remote filesystem client or full IDE.

---
## Conceptual Architecture

```text
+------------------+             +----------------------+             +------------------+
|                  |   speech    |                      |    text     |                  |
|    Phone App     +------------>|  CYA Session Broker  +------------>|  CYA CLI Session |
|                  |    text     |                      |   result    |                  |
|                  |<------------+                      |<------------+                  |
+--------+---------+             +-----------+----------+             +--------+---------+
         |                                   |                                 |
         v                                   v                                 v
  Microphone / STT                  Pairing / Routing                   Local Context
  Text-to-Speech                    Session Registry                    Filesystem
  Conversation UI                   Authentication                      Git / Shell
                                                                        llm-connect
                                                                        phase-memory
```

---
## Main Components

### 1. CYA CLI Voice Session

The CLI needs a mode that creates an active voice-addressable session.

Responsibilities:

* create session ID;
* generate short-lived pairing token;
* register itself with a broker;
* expose current terminal context;
* receive transcribed user messages;
* process messages through normal `cya` assistance flow;
* return responses;
* enforce safety and confirmation rules.

Possible command:

```bash
cya talk
```

or:

```bash
cya session start --voice
```

---
### 2. CYA Phone App

The phone app provides the user-facing speech interface.

Responsibilities:

* scan QR code or enter pairing code;
* establish secure connection to active session;
* capture microphone input;
* run speech recognition;
* show transcript before sending, depending on mode;
* send text requests to the active `cya` session;
* display responses;
* optionally read responses aloud;
* allow pause, mute, reconnect, and disconnect.

The app does not need to know the details of the user’s filesystem.

---
### 3. Session Broker

A broker is needed to connect the phone to the CLI session.

This could be implemented in different ways.

#### Option A: Local Network Broker

The CLI starts a small local WebSocket server.

The phone connects directly over the LAN.

Good for:

* home networks;
* office networks;
* local development;
* no cloud dependency.

Challenges:

* NAT/firewall issues;
* phone and terminal must be on the same network;
* HTTPS/TLS handling (for a browser-based client, a page served over HTTPS cannot open a plain `ws://` connection, and microphone access requires a secure context);
* service discovery.

Example:

```bash
cya talk --listen 192.168.1.42:47391
```

QR code contains:

```text
cya://pair?host=192.168.1.42&port=47391&token=...
```

---
#### Option B: Relay Broker

A small relay service connects phone and CLI.

Good for:

* SSH sessions;
* cloud servers;
* NAT traversal;
* mobile networks;
* remote development.

The relay never needs filesystem access. If messages are end-to-end encrypted between phone and CLI, it only routes opaque session traffic and cannot read their contents; without end-to-end encryption, it must be trusted with every transcript and response.

QR code contains:

```text
cya://pair?relay=https://relay.cya.example&session=...&token=...
```
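To make the two QR payloads concrete, the TypeScript sketch below parses either form into a typed target. It assumes the query parameters shown in the examples (`host`, `port`, `token` for the local broker; `relay`, `session`, `token` for the relay) are the complete payload; `PairingTarget` and `parsePairingUri` are hypothetical names, not an existing `cya` API. Rejecting a relay URL that is not `https:` is an added assumption in line with the HTTPS/TLS concern listed for Option A.

```typescript
// Hypothetical parser for the two cya:// pairing payloads; not an existing cya API.
type PairingTarget =
  | { kind: "local"; host: string; port: number; token: string }
  | { kind: "relay"; relay: string; session: string; token: string };

function parsePairingUri(uri: string): PairingTarget {
  const url = new URL(uri);
  // For a non-special scheme such as cya://, "pair" is parsed as the host.
  if (url.protocol !== "cya:" || url.hostname !== "pair") {
    throw new Error(`not a cya pairing URI: ${uri}`);
  }
  const params = url.searchParams;
  const token = params.get("token");
  if (!token) throw new Error("missing pairing token");

  const relay = params.get("relay");
  if (relay !== null) {
    const session = params.get("session");
    if (!session) throw new Error("relay pairing requires a session");
    // Assumption: only TLS-protected relays are accepted.
    if (new URL(relay).protocol !== "https:") throw new Error("relay must use https");
    return { kind: "relay", relay, session, token };
  }

  const host = params.get("host");
  const port = Number(params.get("port"));
  if (!host || !Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("local pairing requires a host and a valid port");
  }
  return { kind: "local", host, port, token };
}
```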
Challenges:

* requires hosted infrastructure;
* introduces trust and privacy questions;
* should be optional, not mandatory.

---
#### Option C: User-Controlled Broker

Advanced users can self-host the broker.

This fits the philosophy of `cya`.

Possible command:

```bash
cya-broker serve
```

Then:

```bash
cya talk --broker https://my-broker.example
```
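Whether it is the hosted relay of Option B or a self-hosted `cya-broker`, the routing core can stay small. The sketch below (hypothetical names `SessionRelay`, `attach`, `forward`, `detach`) pairs one CLI and one phone per session ID and forwards payloads as opaque bytes without parsing them, which is what lets end-to-end encrypted traffic pass through unread. Token verification on `attach`, the network transport, and message queuing are deliberately left out.

```typescript
// Hypothetical relay core: one CLI and one phone per session, payloads forwarded
// as opaque bytes. With end-to-end encryption, the relay never sees plaintext.
type PeerRole = "cli" | "phone";
type Deliver = (payload: Uint8Array) => void;

class SessionRelay {
  private sessions = new Map<string, Partial<Record<PeerRole, Deliver>>>();

  // A real broker would verify the pairing token here before attaching.
  attach(sessionId: string, role: PeerRole, deliver: Deliver): void {
    const peers: Partial<Record<PeerRole, Deliver>> = this.sessions.get(sessionId) ?? {};
    if (peers[role]) throw new Error(`${role} already attached to ${sessionId}`);
    peers[role] = deliver;
    this.sessions.set(sessionId, peers);
  }

  // Returns false when the other side is not connected; nothing is queued.
  forward(sessionId: string, from: PeerRole, payload: Uint8Array): boolean {
    const to: PeerRole = from === "cli" ? "phone" : "cli";
    const deliver = this.sessions.get(sessionId)?.[to];
    if (!deliver) return false;
    deliver(payload);
    return true;
  }

  // Drops one side; the session entry disappears once both sides are gone.
  detach(sessionId: string, role: PeerRole): void {
    const peers = this.sessions.get(sessionId);
    if (!peers) return;
    delete peers[role];
    if (!peers.cli && !peers.phone) this.sessions.delete(sessionId);
  }
}
```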
This preserves the user-controlled infrastructure principle.

---

## Recommended Implementation Path

A practical implementation path could have four stages.

---
## Stage 1: Text Bridge Prototype

Do not start with speech.

Start with a phone-to-terminal text bridge.

Goal:

* CLI starts a session.
* Phone connects.
* User types text on phone.
* Text appears in the active `cya` session.
* CLI processes request and returns response.

This proves:

* pairing;
* routing;
* session ownership;
* broker model;
* authentication;
* terminal-side context handling.

Example:

```bash
cya talk
```

Phone app:

```text
Connected. Type your request.
```

This avoids early complexity around speech APIs.

---
## Stage 2: Speech-to-Text on Phone

Add microphone and speech recognition.

The phone converts speech to text locally or through platform speech recognition.

Options:

* iOS speech recognition;
* Android speech recognition;
* browser-based Web Speech API for a PWA, where the browser supports it;
* local speech model on device where feasible;
* cloud speech-to-text if user permits.

The important architectural point:

> `cya` receives text, not raw audio, unless explicitly configured otherwise.

This keeps the CLI helper simple and avoids requiring audio handling on the terminal machine.

---
## Stage 3: Conversation Mode

Add continuous or semi-continuous voice interaction.

Modes:

### Push-to-Talk Mode

Safest and easiest.

User presses and holds a button, speaks, and releases to send.

Good default.

### Confirm-Before-Send Mode

The app shows recognized text first.

User taps send.

Useful when commands may be risky.

### Continuous Conversation Mode

The app listens continuously and sends utterances automatically.

Useful but riskier.

Should probably be opt-in.

---
## Stage 4: Bidirectional Voice

Add optional text-to-speech responses.

The phone can speak concise answers aloud.

This should be configurable:

```bash
cya talk --phone-speak concise
cya talk --phone-speak off
cya talk --phone-speak full
```
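One way the phone could honor these three modes is sketched below. `textToSpeak` is a hypothetical helper; it assumes the CLI sends both a full answer and an optional concise one (the `terminal_response` and `phone_response` fields in the protocol sketch later in this document). When no concise version arrives, it falls back to the first sentence of the full answer, and the 200-character cap is an arbitrary assumption.

```typescript
// Hypothetical phone-side helper for --phone-speak; not an existing cya API.
type PhoneSpeakMode = "off" | "concise" | "full";

const CONCISE_MAX_CHARS = 200; // arbitrary cap so spoken answers stay short

// Returns the text to read aloud, or null when nothing should be spoken.
function textToSpeak(
  mode: PhoneSpeakMode,
  terminalResponse: string,
  phoneResponse?: string,
): string | null {
  if (mode === "off") return null;
  if (mode === "full") return terminalResponse;

  // "concise": prefer the CLI's own short answer when it sent one.
  if (phoneResponse) return phoneResponse;

  // Fallback: first sentence of the full answer, truncated if still too long.
  const match = terminalResponse.match(/^[\s\S]*?[.!?](?=\s|$)/);
  const firstSentence = (match ? match[0] : terminalResponse).trim();
  return firstSentence.length > CONCISE_MAX_CHARS
    ? firstSentence.slice(0, CONCISE_MAX_CHARS - 3) + "..."
    : firstSentence;
}
```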
The terminal should still show the full response.

The phone should preferably receive a summarized voice response to avoid reading long command explanations aloud.

---
## Safety Model

Speech mode needs stricter safety defaults than typed CLI use.

Speech recognition can mishear commands.

Therefore:

### Never execute destructive commands directly from speech

For example, if the user says:

> Delete all generated files.

`cya` should produce a preview and require explicit terminal confirmation.

### Require confirmation for execution

Possible pattern:

```text
Suggested command:

rm -rf build/

This may delete files.
Run it? [y/N]
```
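The pattern above could be produced by something like the following sketch. `riskWarnings` and `renderConfirmationPrompt` are hypothetical, and the regular expressions are an illustrative starting list, not a complete safety check: a command the patterns miss still gets the `[y/N]` prompt, it just lacks the extra warning line.

```typescript
// Hypothetical heuristic: flags suggested shell commands that deserve an extra
// warning before the terminal [y/N] prompt. Illustrative patterns only.
const RISK_PATTERNS: Array<[RegExp, string]> = [
  [/(^|[\s;&|(])rm\s/, "may delete files"],
  [/\bgit\s+(reset\s+--hard|clean\s+-\w*f)/, "may discard uncommitted work"],
  [/\bgit\s+push\b.*(--force\b|\s-f\b)/, "may overwrite remote history"],
  [/\b(mkfs(\.\w+)?|dd)\s/, "may overwrite a disk or device"],
];

function riskWarnings(command: string): string[] {
  return RISK_PATTERNS.filter(([pattern]) => pattern.test(command)).map(([, reason]) => reason);
}

// Every suggested command ends with a confirmation prompt; risky ones also get warnings.
function renderConfirmationPrompt(command: string): string {
  const warnings = riskWarnings(command).map((reason) => `This ${reason}.`);
  return ["Suggested command:", "", command, "", ...warnings, "Run it? [y/N]"].join("\n");
}
```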
The confirmation should happen in the terminal by default, not only on the phone.

### Separate “ask”, “suggest”, and “run”

Useful command modes:

```bash
cya talk --suggest-only
cya talk --allow-run
cya talk --no-exec
```

Default should be:

```bash
cya talk --suggest-only
```
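A minimal sketch of how `cya talk` might resolve these flags. It assumes the three flags are mutually exclusive and that `suggest-only` applies when none is given; how `--no-exec` differs from `--suggest-only` is left open, so the sketch only distinguishes them as values. `parseExecutionPolicy` is a hypothetical name. Rejecting conflicting flags, rather than letting the last one win, keeps a speech session from silently ending up more permissive than intended.

```typescript
// Hypothetical resolver for the execution-mode flags of `cya talk`.
type ExecutionPolicy = "suggest-only" | "allow-run" | "no-exec";

const POLICY_FLAGS = new Map<string, ExecutionPolicy>([
  ["--suggest-only", "suggest-only"],
  ["--allow-run", "allow-run"],
  ["--no-exec", "no-exec"],
]);

function parseExecutionPolicy(args: string[]): ExecutionPolicy {
  const chosen: ExecutionPolicy[] = [];
  for (const arg of args) {
    const policy = POLICY_FLAGS.get(arg);
    if (policy !== undefined && !chosen.includes(policy)) chosen.push(policy);
  }
  if (chosen.length > 1) {
    throw new Error(`conflicting execution flags: ${chosen.join(", ")}`);
  }
  // Safe default for speech input when no flag is given.
  return chosen.length === 1 ? chosen[0] : "suggest-only";
}
```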
### Show recognized transcript

The phone should always make clear what it heard.

For risky requests, it should require confirmation before sending.

---
## Privacy Model

Speech mode touches sensitive areas:

* spoken input;
* local filesystem context;
* repository contents;
* personal memory;
* command history;
* notes.

So the design should make privacy visible and controllable.

The user should be able to see:

* which phone is connected;
* which terminal session it is connected to;
* what context is being sent to the LLM;
* which LLM backend is used;
* what is stored in memory;
* whether speech recognition is local or cloud-based;
* whether a relay broker is involved.

Example session banner:

```text
CYA voice session active

Phone:
Bernd's iPhone

Speech recognition:
on-device if available, platform fallback allowed

LLM backend:
llm-connect: local-openrouter-default

Memory:
phase-memory: user + project preferences

Broker:
local websocket

Execution:
suggest-only
```

---
## Pairing and Authentication

Pairing should be short-lived and explicit.

Possible pairing methods:

### QR Code

Best default.

```bash
cya talk
```

Terminal displays QR code.

Phone scans it.

### Numeric Code

Fallback for terminals that cannot show QR codes.

```text
Pairing code: 482-119
Expires in 5 minutes.
```
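A short-lived numeric code is easy to get subtly wrong, so here is a minimal sketch of generation and redemption. `createPairing` and `redeemPairing` are hypothetical; the six-digit `NNN-NNN` format and five-minute lifetime follow the example above, while single use and a limit of five attempts are assumptions meant to keep the million possible codes from being guessed. The digit source is injected so the sketch stays testable; in real use it must be a cryptographically secure generator.

```typescript
// Hypothetical pairing-code helpers; not an existing cya API.
interface PendingPairing {
  code: string;         // shown to the user, e.g. "482-119"
  expiresAt: number;    // epoch milliseconds
  attemptsLeft: number; // assumed limit against guessing
  redeemed: boolean;    // assumed single use
}

const PAIRING_TTL_MS = 5 * 60 * 1000; // 5 minutes, as in the example above
const MAX_ATTEMPTS = 5;

// `randomDigit` must return a uniformly random integer from 0 to 9 and, in real
// use, be backed by a CSPRNG such as crypto.getRandomValues.
function createPairing(now: number, randomDigit: () => number): PendingPairing {
  let digits = "";
  for (let i = 0; i < 6; i++) digits += String(randomDigit());
  return {
    code: `${digits.slice(0, 3)}-${digits.slice(3)}`,
    expiresAt: now + PAIRING_TTL_MS,
    attemptsLeft: MAX_ATTEMPTS,
    redeemed: false,
  };
}

// Accepts the code at most once, before expiry, within the attempt budget.
function redeemPairing(pairing: PendingPairing, entered: string, now: number): boolean {
  if (pairing.redeemed || pairing.attemptsLeft <= 0 || now >= pairing.expiresAt) return false;
  pairing.attemptsLeft -= 1;
  // Ignore separators so "482119" and "482-119" are treated the same.
  if (entered.replace(/\D/g, "") !== pairing.code.replace(/\D/g, "")) return false;
  pairing.redeemed = true;
  return true;
}
```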
### Known Device Trust

After initial pairing, the user may mark a phone as trusted.

Even then, each terminal voice session should be explicitly activated.

A trusted device should not mean always-on access.

---

## Session Ownership

Each voice interaction should be bound to a specific active `cya` session.

That session has:

* session ID;
* current working directory;
* user identity;
* host identity;
* start time;
* allowed capabilities;
* selected LLM backend;
* selected memory scope;
* execution policy.

This prevents accidental cross-talk where the phone sends a request to the wrong terminal or wrong repository.

Useful phone display:

```text
Connected to:
tinker-base
/home/bernd/repos/can-you-assist

Mode:
suggest-only
```
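A minimal sketch of the session record and its binding check. The fields mirror the list above; `VoiceSession`, `acceptMessage`, the `device_id` message field, and the memory-scope values are hypothetical additions for illustration. The check runs before a transcript reaches the assistant pipeline, so a message addressed to another session or sent from an unpaired device never touches local context.

```typescript
// Hypothetical session record; field names mirror the Session Ownership list.
interface VoiceSession {
  sessionId: string;
  cwd: string;
  user: string;
  host: string;
  startedAt: number;                        // epoch milliseconds
  capabilities: string[];
  llmBackend: string;
  memoryScope: "none" | "user" | "project"; // assumed values
  executionPolicy: "suggest-only" | "allow-run" | "no-exec";
  pairedDeviceId: string | null;            // assumption: one paired phone per session
}

type Verdict = { ok: true } | { ok: false; reason: string };

// Rejects messages for other sessions and from devices that did not pair with this one.
function acceptMessage(
  session: VoiceSession,
  msg: { session_id: string; device_id: string },
): Verdict {
  if (session.pairedDeviceId === null) return { ok: false, reason: "no phone paired" };
  if (msg.session_id !== session.sessionId) return { ok: false, reason: "wrong session" };
  if (msg.device_id !== session.pairedDeviceId) return { ok: false, reason: "unknown device" };
  return { ok: true };
}
```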
---

## Possible CLI Commands

### Start voice session

```bash
cya talk
```

### Start voice session for current repository

```bash
cya talk --repo
```

### Start with local broker

```bash
cya talk --broker local
```

### Start with relay broker

```bash
cya talk --broker relay
```

### Start with strict safety

```bash
cya talk --suggest-only
```

### Start with no memory

```bash
cya talk --no-memory
```

### Start with project memory

```bash
cya talk --memory project
```

### List active sessions

```bash
cya sessions
```

### End current session

```bash
cya session stop
```

---
## Example User Experience

Terminal:

```bash
cd ~/repos/can-you-assist
cya talk
```

Terminal output:

```text
CYA voice session started for:

~/repos/can-you-assist

Mode:
suggest-only

Pair your phone:
scan QR code or enter code 913-442

Waiting for phone...
```

Phone:

```text
Connected to can-you-assist on tinker-base.

Tap and speak.
```

User speaks:

> What would be a good initial module structure for this project?

Terminal:

```text
You asked:
"What would be a good initial module structure for this project?"

Suggested initial structure:

src/
  cli/
    main.ts
    commands/
  session/
    voice-session.ts
    pairing.ts
  context/
    collector.ts
    git.ts
    filesystem.ts
  llm/
    llm-connect-client.ts
  memory/
    phase-memory-client.ts
  safety/
    classifier.ts
    confirmation.ts

Rationale:
...
```

Phone:

```text
I suggested a module structure with CLI, session, context, LLM, memory, and safety layers.
```

---
## Integration With `llm-connect`

Speech mode should not bypass `llm-connect`.

The flow should be:

```text
Phone speech
→ transcript
→ cya CLI session
→ llm-connect
→ selected LLM backend
→ cya response
→ terminal and/or phone
```
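A sketch of the middle of that flow, under stated assumptions: `LlmBackend` stands in for whatever interface `llm-connect` really provides (not specified here), and `answerTranscript` and `LocalContext` are hypothetical. It shows the split the text describes: the phone contributes only a transcript, the CLI adds local context and calls the backend, and the answer comes back in a full form for the terminal and a short form for the phone. Using the first line as the phone version is a placeholder for a proper summarization step.

```typescript
// Hypothetical CLI-side pipeline. `LlmBackend` is a stand-in for whatever client
// llm-connect actually exposes; its real API is not specified in this document.
interface LlmBackend {
  name: string;
  complete(promptText: string): Promise<string>;
}

interface LocalContext {
  cwd: string;
  gitStatus: string;
}

// The phone contributes only the transcript; all local context is gathered here.
async function answerTranscript(
  transcript: string,
  context: LocalContext,
  backend: LlmBackend,
): Promise<{ terminal: string; phone: string }> {
  const promptText = [
    `Working directory: ${context.cwd}`,
    `git status:\n${context.gitStatus}`,
    // Telling the model the input is a transcript is an assumption, not a cya rule.
    `User request (speech transcript, may contain recognition errors): ${transcript}`,
  ].join("\n\n");
  const full = await backend.complete(promptText);
  // Terminal gets the full answer; the phone gets only the first line.
  return { terminal: full, phone: full.split("\n")[0] };
}
```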
This keeps backend selection consistent with normal `cya` behavior.

The phone app should not directly call the LLM unless specifically configured for a future lightweight mode.

---

## Integration With `phase-memory`

Speech mode should use `phase-memory` for:

* preferred speech interaction style;
* trusted phones;
* known devices;
* user voice mode preferences;
* project-specific command conventions;
* common workflows;
* interaction summaries.

Examples of remembered preferences:

```text
User prefers speech responses to be concise.
User wants destructive commands previewed but never executed automatically.
User usually works with git commit messages in conventional commit style.
User prefers explanations before suggested shell commands.
```

Memory should remain inspectable and editable.

Possible commands:

```bash
cya memory show
cya memory edit
cya memory forget voice.devices
```
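To illustrate what “inspectable and editable” could mean for the `forget` command, here is a small in-memory model. The dotted-key layout and the rule that forgetting `voice.devices` also removes keys beneath it (such as a hypothetical `voice.devices.bernd-iphone`) are assumptions; the actual storage model of `phase-memory` is not specified here, and `InspectableMemory` is a hypothetical name.

```typescript
// Hypothetical model of inspectable memory keyed by dotted paths.
class InspectableMemory {
  private entries = new Map<string, string>();

  remember(key: string, value: string): void {
    this.entries.set(key, value);
  }

  // What `cya memory show` might print: one sorted "key: value" line per entry.
  show(): string[] {
    return Array.from(this.entries, ([key, value]) => `${key}: ${value}`).sort();
  }

  // Removes the key and everything beneath it; returns how many entries were removed.
  forget(prefix: string): number {
    let removed = 0;
    for (const key of Array.from(this.entries.keys())) {
      if (key === prefix || key.startsWith(prefix + ".")) {
        this.entries.delete(key);
        removed++;
      }
    }
    return removed;
  }
}
```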
---

## Minimal Technical Design

A minimal first implementation could use:

### CLI Side

* `cya talk` starts a WebSocket server or connects to a broker.
* It creates a session token.
* It renders a QR code.
* It listens for incoming text messages.
* It routes messages through the normal `cya` assistant pipeline.

### Phone Side

A Progressive Web App may be enough initially.

The PWA can:

* scan QR codes;
* use browser speech recognition where available;
* send text over WebSocket;
* show responses.

This avoids building native iOS and Android apps immediately.

Later, native apps can provide better:

* speech recognition;
* background handling;
* push-to-talk UX;
* device trust;
* text-to-speech;
* secure local storage.

### Broker Side

For the first version, choose one:

#### Local-only prototype

Simpler, private, no cloud.

Good for proof of concept.

#### Minimal relay prototype

More useful for SSH and remote development.

Better real-world fit.

A good architectural compromise:

* implement a broker interface;
* start with local broker;
* allow relay broker later;
* keep protocol stable.

---
## Protocol Sketch

Phone sends:

```json
{
  "type": "user_message",
  "session_id": "cya_sess_123",
  "message_id": "msg_001",
  "input_mode": "speech",
  "transcript": "Explain the current git status and suggest a commit message.",
  "confidence": 0.91
}
```

CLI responds:

```json
{
  "type": "assistant_response",
  "session_id": "cya_sess_123",
  "message_id": "msg_001",
  "terminal_response": "...full response...",
  "phone_response": "I found three modified files and one new file. Suggested commit message: ...",
  "requires_confirmation": false
}
```

For risky actions:

```json
{
  "type": "assistant_response",
  "session_id": "cya_sess_123",
  "message_id": "msg_002",
  "terminal_response": "Suggested command: rm -rf build/",
  "phone_response": "This may delete files. Please confirm in the terminal.",
  "requires_confirmation": true,
  "confirmation_channel": "terminal"
}
```
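The messages above map naturally onto TypeScript types. The sketch below is hypothetical: field names come from the JSON examples, while the `"text"` input mode (for the Stage 1 text bridge), the optional fields, and the rule that `confidence` lies between 0 and 1 are assumptions. `parseUserMessage` validates incoming phone messages instead of trusting their shape, since the phone, and possibly a relay, sits outside the terminal's trust boundary; `buildResponse` echoes the request IDs so the phone can match replies.

```typescript
// Hypothetical types for the protocol sketch; not a published cya schema.
interface UserMessage {
  type: "user_message";
  session_id: string;
  message_id: string;
  input_mode: "speech" | "text"; // "text" assumed for the Stage 1 text bridge
  transcript: string;
  confidence?: number;           // recognizer confidence, assumed in [0, 1]
}

interface AssistantResponse {
  type: "assistant_response";
  session_id: string;
  message_id: string;
  terminal_response: string;
  phone_response: string;
  requires_confirmation: boolean;
  confirmation_channel?: "terminal";
}

// Parses an incoming phone message and rejects anything malformed.
function parseUserMessage(raw: string): UserMessage {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("not an object");
  const m = data as Record<string, unknown>;
  const isText = (v: unknown): v is string => typeof v === "string" && v.length > 0;
  if (m.type !== "user_message") throw new Error("unexpected message type");
  if (!isText(m.session_id) || !isText(m.message_id) || !isText(m.transcript)) {
    throw new Error("missing required field");
  }
  if (m.input_mode !== "speech" && m.input_mode !== "text") throw new Error("bad input_mode");
  const confidence = m.confidence;
  if (confidence !== undefined &&
      (typeof confidence !== "number" || confidence < 0 || confidence > 1)) {
    throw new Error("confidence must be between 0 and 1");
  }
  return m as unknown as UserMessage;
}

// Builds a response that echoes the request IDs; risky answers name the terminal as confirmation channel.
function buildResponse(
  req: UserMessage,
  terminalText: string,
  phoneText: string,
  needsConfirmation: boolean,
): AssistantResponse {
  return {
    type: "assistant_response",
    session_id: req.session_id,
    message_id: req.message_id,
    terminal_response: terminalText,
    phone_response: phoneText,
    requires_confirmation: needsConfirmation,
    ...(needsConfirmation ? { confirmation_channel: "terminal" as const } : {}),
  };
}
```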
---

## Important Product Distinction

This extension should not be framed as:

> “CYA becomes a mobile assistant.”

It should be framed as:

> “The phone becomes a microphone and conversation surface for the active terminal helper.”

That distinction protects the project scope.

The center remains the console.

---
## Updated INTENT Addition

This could be added to `INTENT.md` under long-term direction or primary use cases:

```markdown
### Speech-Assisted Console Interaction

`cya` should eventually support a speech interaction mode where a phone or other capable device can act as a microphone, speech-recognition frontend, and lightweight conversation surface for an active `cya` CLI session.

This enables voice interaction even when the terminal environment itself has no microphone, audio stack, graphical interface, or speech-recognition capability.

In this mode, the phone does not become the primary execution environment. Instead, it connects to a currently activated `cya` helper session. The CLI session remains responsible for local filesystem context, repository context, memory scope, LLM backend selection, and safety confirmation.

The phone provides convenient speech input and optional spoken output, while `cya` preserves its console-native, user-controlled architecture.
```

---
## My Recommended Direction

I would treat this as a distinct but natural extension:

```text
cya-core
  console assistant

cya-voice
  speech bridge protocol and session mode

cya-mobile
  phone app or PWA

cya-broker
  optional pairing and relay service
```

The first implementation should probably be:

```text
cya talk
  + local WebSocket session
  + QR pairing
  + phone PWA
  + push-to-talk speech recognition
  + suggest-only mode
```

That gives you the core magic without overbuilding the system.

The deeper architectural insight is:

> Speech mode should not make the terminal speak.
> It should let a speech-capable companion device speak *to the terminal’s active assistant context*.