# TeleMcp
**Mission control for Kubernetes hosts, exposed to LLM agents through MCP.**
TeleMcp deploys a standard observability stack onto a Linux Kubernetes host via **Ansible + Helm**, then surfaces metrics, logs, and cluster state through a read-only **MCP bridge** so an LLM agent can bootstrap, monitor, triage, and operate the box.
> For project goals, scope, and design principles, see **[INTENT.md](INTENT.md)**.
## Components
| Component | Namespace | Role |
|-----------|-----------|------|
| **kube-prometheus-stack** | `monitoring` | Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics |
| **Loki + Promtail** | `logging` | Log aggregation and shipping |
| **OpenTelemetry Collector** | `observability` | Optional OTLP fan-out to Prometheus and Loki |
| **mcp-telemetry-bridge** | `mcp` | FastAPI service exposing MCP resources, tools, and prompts |
## Quick Start
### 0) Prereqs
- Ubuntu 24.04 host running Kubernetes (k3s or kubeadm), reachable from your control machine, with a `kubectl` context configured
- Ansible 2.15+ on your control machine
- Helm 3 on the host (the Ansible role installs it if missing)
### 1) Run Ansible
```bash
cd ansible
ansible-playbook -i inventories/local.ini playbook.yml
```
### 2) Smoke tests
From any machine with a `kubectl` context:
```bash
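# Pods in each namespace should reach Running
kubectl get pods -n monitoring
kubectl get pods -n logging
kubectl get pods -n mcp
# port-forward blocks: run it in a separate terminal, then curl from this one
kubectl port-forward -n mcp svc/mcp-telemetry-bridge 8080:80
curl http://localhost:8080/mcp/schema | jq .
curl http://localhost:8080/healthz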
```
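
If you would rather script the pod check than read the `kubectl` table by eye, the sketch below parses the JSON that `kubectl get pods -n <namespace> -o json` prints and lists pods that are not in a healthy phase. It is a minimal illustration, not part of the repo: the function name and the sample pod names are hypothetical, and treating `Running` and `Succeeded` as healthy is an assumption (a `Running` pod can still fail its readiness probe).

```python
import json


def unhealthy_pods(kubectl_json: str) -> list[str]:
    """Return names of pods whose phase is neither Running nor Succeeded.

    Expects the output of `kubectl get pods -n <namespace> -o json`,
    which is a List object carrying the pods under "items".
    """
    pods = json.loads(kubectl_json).get("items", [])
    healthy = {"Running", "Succeeded"}  # assumption: these phases count as OK
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("status", {}).get("phase") not in healthy
    ]


# Hypothetical sample standing in for real kubectl output.
sample = json.dumps(
    {
        "items": [
            {"metadata": {"name": "prometheus-0"}, "status": {"phase": "Running"}},
            {"metadata": {"name": "grafana-abc"}, "status": {"phase": "Pending"}},
        ]
    }
)

print(unhealthy_pods(sample))  # → ['grafana-abc']
```

An empty list for each of `monitoring`, `logging`, and `mcp` is a reasonable signal that the stack came up.
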
### 3) Point your LLM agent
Point your agent's MCP client at the bridge endpoint (ClusterIP, Ingress, or port-forward).
**Implemented tools:**
| Tool | Description |
|------|-------------|
| `promql.query` | Run a PromQL expression against Prometheus |
| `loki.query` | Run a LogQL query against Loki |
| `k8s.get` | Fetch Kubernetes objects (pods, nodes, deployments, etc.) |
| `k8s.events` | List cluster or namespace events |
| `inventory.snapshot` | JSON snapshot of nodes, namespaces, and workloads |
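
One quick sanity check after deployment is to compare the tool names the bridge reports against the table above. The sketch below does only the comparison. It assumes you have already extracted the tool names from the `/mcp/schema` response, whose exact shape this README does not specify. `EXPECTED_TOOLS` and `missing_tools` are hypothetical names used for illustration.

```python
# Tool names copied from the "Implemented tools" table.
EXPECTED_TOOLS = frozenset(
    {
        "promql.query",
        "loki.query",
        "k8s.get",
        "k8s.events",
        "inventory.snapshot",
    }
)


def missing_tools(reported_names) -> list[str]:
    """Return the expected tools absent from the reported names, sorted."""
    return sorted(EXPECTED_TOOLS - set(reported_names))


# Example: a bridge that reports only three of the five tools.
print(missing_tools(["promql.query", "k8s.get", "k8s.events"]))
# → ['inventory.snapshot', 'loki.query']
```
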
**Saved resources** (via `/mcp/resource?uri=...`):
- `res://dashboards/top-pods-by-cpu.promql`
- `res://dashboards/pod-restarts.promql`
- `res://dashboards/warn-events.logql`
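
Because a `res://` URI contains `:` and `/`, it is safest to percent-encode it when passing it as the `uri` query parameter. The sketch below builds such a URL with the standard library. The helper name `resource_url` is hypothetical, and the base URL assumes the port-forward from the smoke tests.

```python
from urllib.parse import urlencode


def resource_url(base_url: str, resource_uri: str) -> str:
    """Build the /mcp/resource URL for a saved resource URI."""
    query = urlencode({"uri": resource_uri})  # percent-encodes ':' and '/'
    return f"{base_url.rstrip('/')}/mcp/resource?{query}"


print(resource_url("http://localhost:8080", "res://dashboards/pod-restarts.promql"))
# → http://localhost:8080/mcp/resource?uri=res%3A%2F%2Fdashboards%2Fpod-restarts.promql
```

The resulting string can be passed to `curl` or any HTTP client.
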
> The bridge currently exposes an HTTP schema approximation (`/mcp/schema`, `/tools/...`). Full MCP transport (stdio/SSE) is planned — see [INTENT.md](INTENT.md).
## Repo layout
```
tele-mcp/
  INTENT.md                 # Project north star: goals, scope, current state
  ansible/                  # Bootstrap playbook and roles
  helm/
    values/                 # Chart values for monitoring, logging, OTel
    mcp-telemetry-bridge/   # Bridge Helm chart
  mcp-telemetry-bridge/     # FastAPI bridge application
  environments/             # Per-environment overrides
  wiki/                     # Extended project and design docs
```
## Documentation
| Document | Purpose |
|----------|---------|
| [INTENT.md](INTENT.md) | Goals, principles, scope, success criteria |
| [wiki/TeleMcpProject.md](wiki/TeleMcpProject.md) | Project overview and audience |
| [wiki/TeleMcpBlueprint.md](wiki/TeleMcpBlueprint.md) | Component rationale and bridge design |
| [environments/dev/README.md](environments/dev/README.md) | Dev environment notes |
## Security
- MCP bridge ServiceAccount is read-only (`get` / `list` / `watch` only)
- NetworkPolicy limits bridge egress to Prometheus and Loki
- Consider mTLS or OIDC if exposing the bridge outside the cluster
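
The read-only claim in the first bullet can be spot-checked against the RBAC rules bound to the bridge's ServiceAccount. The sketch below takes a list of rules in the shape Kubernetes uses for the `rules` field of a Role or ClusterRole (each with a `verbs` list) and returns any verb outside `get` / `list` / `watch`. It is an illustrative check under those assumptions: the function name and the sample rules are hypothetical, and this README does not name the actual Role.

```python
READ_ONLY_VERBS = {"get", "list", "watch"}


def write_verbs(rules: list[dict]) -> set[str]:
    """Return every granted verb that is not read-only.

    A wildcard verb ("*") is reported too, since it grants write access.
    """
    granted = {verb for rule in rules for verb in rule.get("verbs", [])}
    return granted - READ_ONLY_VERBS


# Hypothetical rules, shaped like the `rules` field of a ClusterRole.
read_only_rules = [
    {"apiGroups": [""], "resources": ["pods", "nodes"], "verbs": ["get", "list", "watch"]},
]
too_permissive_rules = read_only_rules + [
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "delete"]},
]

print(write_verbs(read_only_rules))       # → set()
print(write_verbs(too_permissive_rules))  # → {'delete'}
```

An empty set means every granted verb is read-only.
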
## Current limitations
See [INTENT.md — Current State](INTENT.md#current-state-as-of-initial-scaffold) for the full list. Notable gaps:
- Bridge container image is a placeholder (`ghcr.io/example/telemcp-bridge`)
- No Alertmanager integration in the bridge yet
- Host-level signals (systemd, certs, firewall) are deferred to a future DaemonSet sidecar