TeleMcp
Mission control for Kubernetes hosts, exposed to LLM agents through MCP.
TeleMcp deploys a standard observability stack onto a Linux Kubernetes host via Ansible + Helm, then surfaces metrics, logs, and cluster state through a read-only MCP bridge so an LLM agent can bootstrap, monitor, triage, and operate the box.
For project goals, scope, and design principles, see INTENT.md.
Components
| Component | Namespace | Role |
|---|---|---|
| kube-prometheus-stack | monitoring | Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics |
| Loki + Promtail | logging | Log aggregation and shipping |
| OpenTelemetry Collector | observability | Optional OTLP fan-out to Prometheus and Loki |
| mcp-telemetry-bridge | mcp | FastAPI service exposing MCP resources, tools, and prompts |
Local development (no cluster)
Work on the MCP bridge without deploying the full observability stack.
Install and verify
make bridge-install # venv + deps (once)
make bridge-test # pytest smoke: /healthz, /mcp/schema, /mcp/resource
Or from mcp-telemetry-bridge/:
./scripts/verify-local.sh
Run locally
make bridge-run
# or: cd mcp-telemetry-bridge && ./scripts/run-local.sh
With the server up, optional live HTTP checks:
make bridge-smoke
# or: RUN_LIVE=1 ./mcp-telemetry-bridge/scripts/verify-local.sh
Manual curls:
curl http://127.0.0.1:8080/healthz
curl http://127.0.0.1:8080/mcp/schema | jq .
curl "http://127.0.0.1:8080/mcp/resource?uri=res://dashboards/top-pods-by-cpu.promql"
Tool calls use POST /tools/<name> with a JSON body (Prometheus/Loki/K8s backends are only reachable in-cluster).
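A rough Python equivalent of a tool call, as a minimal sketch using only the standard library. The helper name build_tool_request is hypothetical, and {"expr": "up"} for promql.query is the only payload shape this README documents; other tools may expect different fields.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # local bridge address used in the curl examples


def build_tool_request(name: str, body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /tools/<name> request with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/tools/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: the promql.query call mentioned in this README.
request = build_tool_request("promql.query", {"expr": "up"})

# Sending it only succeeds where the backends are reachable (in-cluster):
# with urllib.request.urlopen(request, timeout=5) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it keeps the sketch usable outside the cluster, where the call itself is expected to fail.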
Agent quickstart
When changing the bridge, agents should:
- Run make bridge-test after edits (fast, no cluster needed).
- Introspect GET /mcp/schema for the current tools, resources, and prompts.
- Call tools via POST /tools/<tool-name> (e.g. POST /tools/promql.query with {"expr":"up"}).
- Fetch saved queries via GET /mcp/resource?uri=<uri>.
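For the last step, a minimal sketch of building the resource URL in Python. The helper name resource_url is hypothetical. Percent-encoding the uri value is an assumption about good client hygiene; the curl example in this README passes it unencoded.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://127.0.0.1:8080"  # local bridge address used in the curl examples


def resource_url(uri: str, base_url: str = BASE_URL) -> str:
    """Return the GET /mcp/resource URL for a saved query, percent-encoding the uri."""
    return f"{base_url}/mcp/resource?{urlencode({'uri': uri})}"


url = resource_url("res://dashboards/top-pods-by-cpu.promql")

# Decoding the query string recovers the original resource URI.
decoded_uri = parse_qs(urlsplit(url).query)["uri"][0]
```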
Expected smoke-test surface:
| Endpoint | Method | Purpose |
|---|---|---|
| /healthz | GET | Liveness |
| /mcp/schema | GET | MCP catalog (tools, resources, prompts) |
| /mcp/resource | GET | Saved PromQL/LogQL query by URI |
| /tools/* | POST | Execute a tool (needs in-cluster backends) |
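The GET rows of the table above can be checked with a small table-driven script. This is a sketch, not the repository's pytest suite: the names SMOKE_ENDPOINTS, run_smoke, and http_fetch are hypothetical, and treating HTTP 200 as success is an assumption.

```python
import urllib.error
import urllib.request
from typing import Callable

# (path, method) pairs taken from the table above; /tools/* is left out
# because it needs in-cluster backends.
SMOKE_ENDPOINTS = [
    ("/healthz", "GET"),
    ("/mcp/schema", "GET"),
    ("/mcp/resource?uri=res://dashboards/top-pods-by-cpu.promql", "GET"),
]


def run_smoke(fetch: Callable[[str, str], int]) -> dict[str, bool]:
    """Call fetch(method, path) per endpoint; True means it returned HTTP 200."""
    return {path: fetch(method, path) == 200 for path, method in SMOKE_ENDPOINTS}


def http_fetch(method: str, path: str, base_url: str = "http://127.0.0.1:8080") -> int:
    """Fetch over HTTP with urllib; return the status code, or 0 if unreachable."""
    req = urllib.request.Request(base_url + path, method=method)
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # non-2xx response from the server
        return exc.code
    except (urllib.error.URLError, OSError):  # connection refused, timeout, DNS
        return 0
```

With the server running, run_smoke(http_fetch) returns one pass/fail entry per endpoint. Passing the fetcher in as an argument also lets the check run against a stub.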
Quick Start (full cluster deploy)
0) Prereqs
- Ubuntu 24.04 host with k8s (k3s or kubeadm) reachable and kubectl context configured
- Ansible 2.15+ on your control machine
- Helm 3 on the host (the Ansible role installs it if missing)
1) Run Ansible
cd ansible
ansible-playbook -i inventories/local.ini playbook.yml
2) Smoke tests
From any machine with a kubectl context:
kubectl get pods -n monitoring
kubectl get pods -n logging
kubectl get pods -n mcp
kubectl port-forward -n mcp svc/mcp-telemetry-bridge 8080:80
curl http://localhost:8080/mcp/schema | jq .
curl http://localhost:8080/healthz
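To script the pod checks instead of eyeballing them, one option is to parse kubectl's JSON output. The sketch below assumes kubectl is on PATH with a working context; the helper names unhealthy_pods and check_namespace are hypothetical.

```python
import json
import subprocess

NAMESPACES = ["monitoring", "logging", "mcp"]  # namespaces checked in the commands above


def unhealthy_pods(pod_list: dict) -> list[str]:
    """Return names of pods whose phase is neither Running nor Succeeded."""
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") not in ("Running", "Succeeded")
    ]


def check_namespace(namespace: str) -> list[str]:
    """Run kubectl for one namespace and list unhealthy pods (needs a kubectl context)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return unhealthy_pods(json.loads(out))


# Usage, on a machine with a kubectl context:
# for ns in NAMESPACES:
#     print(ns, check_namespace(ns) or "OK")
```

A Running phase does not guarantee that every container is ready, so this is a coarse first pass rather than a full readiness check.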
3) Point your LLM agent
Configure your agent's MCP client to the bridge endpoint (ClusterIP, Ingress, or port-forward).
Implemented tools:
| Tool | Description |
|---|---|
| promql.query | Run a PromQL expression against Prometheus |
| loki.query | Run a LogQL query against Loki |
| k8s.get | Fetch Kubernetes objects (pods, nodes, deployments, etc.) |
| k8s.events | List cluster or namespace events |
| inventory.snapshot | JSON snapshot of nodes, namespaces, and workloads |
Saved resources (via /mcp/resource?uri=...):
- res://dashboards/top-pods-by-cpu.promql
- res://dashboards/pod-restarts.promql
- res://dashboards/warn-events.logql
The bridge currently exposes an HTTP schema approximation (/mcp/schema, /tools/...). Full MCP transport (stdio/SSE) is planned; see INTENT.md.
Repo layout
tele-mcp/
  INTENT.md                 # Project north star: goals, scope, current state
  ansible/                  # Bootstrap playbook and roles
  helm/
    values/                 # Chart values for monitoring, logging, OTel
    mcp-telemetry-bridge/   # Bridge Helm chart
  mcp-telemetry-bridge/     # FastAPI bridge application
    scripts/                # run-local.sh, verify-local.sh
    tests/                  # pytest smoke tests
  environments/             # Per-environment overrides
  wiki/                     # Extended project and design docs
Documentation
| Document | Purpose |
|---|---|
| INTENT.md | Goals, principles, scope, success criteria |
| wiki/TeleMcpProject.md | Project overview and audience |
| wiki/TeleMcpBlueprint.md | Component rationale and bridge design |
| environments/dev/README.md | Dev environment notes |
Security
- MCP bridge ServiceAccount is read-only (get/list/watch only)
- NetworkPolicy limits bridge egress to Prometheus and Loki
- Consider mTLS or OIDC if exposing the bridge outside the cluster
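The read-only claim can be spot-checked by scanning RBAC rules for verbs outside get/list/watch. The sketch below assumes rules shaped like the rules list of a Kubernetes Role or ClusterRole (each entry carrying a verbs list). The helper name non_read_only_verbs and the sample rules are hypothetical, not taken from this repository's chart.

```python
READ_ONLY_VERBS = {"get", "list", "watch"}


def non_read_only_verbs(rules: list[dict]) -> set[str]:
    """Return every verb in the RBAC rules that is not get, list, or watch."""
    found: set[str] = set()
    for rule in rules:
        found.update(v for v in rule.get("verbs", []) if v not in READ_ONLY_VERBS)
    return found


# Hypothetical rules for illustration only.
sample_rules = [
    {"apiGroups": [""], "resources": ["pods", "nodes"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "patch"]},
]
offending = non_read_only_verbs(sample_rules)  # contains "patch"
```

A wildcard verb ("*") is also reported, since it grants write access.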
Current limitations
See INTENT.md — Current State for the full list. Notable gaps:
- Bridge container image is a placeholder (ghcr.io/example/telemcp-bridge)
- No Alertmanager integration in the bridge yet
- Host-level signals (systemd, certs, firewall) are deferred to a future DaemonSet sidecar