# TeleMcp Project

*Telemetry for autonomous control*

## What is TeleMcp?

TeleMcp is **mission control for Kubernetes hosts**. It collects health, performance, and alert signals from a Linux Kubernetes cluster and exposes them through a single **Model Context Protocol (MCP)** interface, so intelligent assistants can understand what's happening, triage problems, and help keep systems running smoothly without constant human supervision.

The project name reflects its two halves:

- **Tele**: telemetry (metrics, logs, events, and cluster inventory)
- **MCP**: the standardized bridge between observability backends and LLM agents

## Who is it for?

- **Operators** who want repeatable, one-command observability on a k3s or kubeadm host
- **LLM agent builders** who need a safe, read-only API for cluster situational awareness
- **Developers** running local or edge Kubernetes who want agent-assisted monitoring without wiring up bespoke integrations

## What problem does it solve?

Running a Kubernetes host means tracking signals across many systems. Humans reach for Grafana, `kubectl`, and ad-hoc PromQL. Agents need the same information through a **standardized, safe contract**, not raw shell access or scattered API credentials.

TeleMcp solves this in three steps:

1. **Collect**: deploy Prometheus, Loki, and supporting exporters via Helm
2. **Deploy**: bootstrap everything with a single Ansible playbook
3. **Bridge**: expose resources, tools, and prompts through `mcp-telemetry-bridge`

## What can an agent do today?

With the current scaffold, an agent connected to the bridge can:

- Query Prometheus with `promql.query`
- Search logs with `loki.query`
- Inspect Kubernetes objects with `k8s.get` and `k8s.events`
- Pull a cluster inventory snapshot with `inventory.snapshot`
- Use pre-built PromQL/LogQL resources for common triage queries

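
As a rough illustration of how an agent-side client might address the tools listed above, the sketch below builds a JSON request body for one tool call. The tool names come from this document; the request shape (`tool` plus `arguments`), the `build_tool_call` helper, and the example PromQL expression are assumptions for illustration, not the bridge's confirmed API.

```python
import json

# Tool names as listed in this document. The request shape below is an
# assumption for illustration, not the bridge's confirmed contract.
KNOWN_TOOLS = frozenset(
    {"promql.query", "loki.query", "k8s.get", "k8s.events", "inventory.snapshot"}
)


def build_tool_call(tool: str, arguments: dict) -> str:
    """Serialize a hypothetical tool call as JSON, rejecting unlisted tools."""
    if tool not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    return json.dumps({"tool": tool, "arguments": arguments}, sort_keys=True)


if __name__ == "__main__":
    # Ask for scrape targets that are currently down.
    print(build_tool_call("promql.query", {"query": "up == 0"}))
```

Rejecting unlisted tool names on the client side keeps a misbehaving agent from probing for operations the bridge never advertised, which fits the read-only intent of the project.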
## What is planned?

Stretch goals, explicitly deferred in v1, include host-level signals (systemd status, cert expiry, firewall summary), Alertmanager integration, additional prompts (`Capacity-Check`, `CrashLoop-Playbook`), and full MCP protocol transport. See [INTENT.md](../INTENT.md) for the authoritative scope list.

## Design principles

| Principle | Summary |
|-----------|---------|
| Read-only by default | No cluster mutations through the bridge |
| Standard stack | CNCF/Grafana components, not custom collectors |
| MCP as the interface | One bridge, one contract for agents |
| Deployable in one shot | Ansible + Helm, no manual assembly |
| Least privilege | Scoped RBAC and NetworkPolicy |

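
One way to spot-check the read-only and least-privilege principles is to scan the RBAC rules granted to the bridge for verbs that can change cluster state. The sketch below assumes rules shaped like the `rules` list of a Kubernetes Role (each entry carrying a `verbs` list); the `mutating_verbs` helper and the sample rules are hypothetical and not taken from the TeleMcp chart.

```python
# Kubernetes RBAC verbs that only read state. Anything else, including the
# "*" wildcard, is treated as potentially mutating.
READ_ONLY_VERBS = frozenset({"get", "list", "watch"})


def mutating_verbs(rules: list) -> list:
    """Return the sorted verbs in RBAC-style rules that are not read-only."""
    found = set()
    for rule in rules:
        found.update(v for v in rule.get("verbs", []) if v not in READ_ONLY_VERBS)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical rules for illustration only.
    sample_rules = [
        {"resources": ["pods", "events"], "verbs": ["get", "list", "watch"]},
        {"resources": ["configmaps"], "verbs": ["get", "patch"]},
    ]
    print(mutating_verbs(sample_rules))
```

An empty result suggests the role is consistent with the read-only principle; any returned verb is a prompt for closer review rather than proof of a problem.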
## Repository map

| Path | Contents |
|------|----------|
| [INTENT.md](../INTENT.md) | North star: goals, scope, current state |
| [README.md](../README.md) | Quick start and operational guide |
| [TeleMcpBlueprint.md](TeleMcpBlueprint.md) | Architecture and component rationale |
| `ansible/` | Bootstrap playbook |
| `helm/` | Chart values and bridge chart |
| `mcp-telemetry-bridge/` | FastAPI bridge source |

## Success criteria

TeleMcp is working when:

1. `ansible-playbook` brings up healthy pods in `monitoring`, `logging`, and `mcp` namespaces
2. `/mcp/schema` returns resources, tools, and prompts
3. An agent can query metrics, logs, and cluster state without direct API credentials
4. Default alert rules fire on induced failures and the agent can triage them
5. The stack redeploys cleanly on a fresh Ubuntu 24.04 + k3s/kubeadm host
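
The second criterion could be checked mechanically with something like the sketch below. It assumes the `/mcp/schema` response is a JSON object with top-level `resources`, `tools`, and `prompts` keys; that layout, the `missing_sections` helper, and the sample entries are assumptions for illustration, since the actual response format is not specified here.

```python
import json

# Sections the schema endpoint is expected to return, per the criteria above.
REQUIRED_SECTIONS = ("resources", "tools", "prompts")


def missing_sections(schema_json: str) -> list:
    """Return the required top-level sections that are absent or empty."""
    schema = json.loads(schema_json)
    return [name for name in REQUIRED_SECTIONS if not schema.get(name)]


if __name__ == "__main__":
    # Hypothetical response body; a real check would fetch it from the bridge.
    sample = json.dumps(
        {"resources": ["example-resource"], "tools": ["promql.query"], "prompts": []}
    )
    print(missing_sections(sample))
```

Treating an empty section the same as a missing one is a deliberate choice in this sketch: a bridge that advertises no tools would satisfy the letter of the criterion but not its purpose.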