tele-mcp/wiki/TeleMcpProject.md
2026-06-22 19:09:24 +02:00

TeleMcp Project

Telemetry for autonomous control

What is TeleMcp?

TeleMcp is mission control for Kubernetes hosts. It collects health, performance, and alert signals from a Linux k8s cluster and exposes them through a single Model Context Protocol (MCP) interface so intelligent assistants can understand what's happening, triage problems, and help keep systems running smoothly — without constant human supervision.

The project name reflects its two halves:

  • Tele — telemetry: metrics, logs, events, and cluster inventory
  • MCP — the standardized bridge between observability backends and LLM agents

Who is it for?

  • Operators who want repeatable, one-command observability on a k3s or kubeadm host
  • LLM agent builders who need a safe, read-only API for cluster situational awareness
  • Developers running local or edge Kubernetes who want agent-assisted monitoring without wiring up bespoke integrations

What problem does it solve?

Running a Kubernetes host means tracking signals across many systems. Humans reach for Grafana, kubectl, and ad-hoc PromQL. Agents need the same information through a standardized, safe contract — not raw shell access or scattered API credentials.

TeleMcp solves this in three steps:

  1. Collect — deploy Prometheus, Loki, and supporting exporters via Helm
  2. Deploy — bootstrap everything with a single Ansible playbook
  3. Bridge — expose resources, tools, and prompts through mcp-telemetry-bridge

What can an agent do today?

With the current scaffold, an agent connected to the bridge can:

  • Query Prometheus with promql.query
  • Search logs with loki.query
  • Inspect Kubernetes objects with k8s.get and k8s.events
  • Pull a cluster inventory snapshot with inventory.snapshot
  • Use pre-built PromQL/LogQL resources for common triage queries
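
As a rough illustration of how an agent might address these tools, the sketch below serializes a tool invocation into JSON. The tool names come from the list above; the `tool`/`arguments` envelope and the `build_tool_call` helper are hypothetical and are not taken from the bridge's actual schema, so treat this as a shape to adapt rather than the real wire format.

```python
import json
from typing import Optional

# Tool names as listed on this page.
KNOWN_TOOLS = frozenset({
    "promql.query",
    "loki.query",
    "k8s.get",
    "k8s.events",
    "inventory.snapshot",
})


def build_tool_call(tool: str, arguments: Optional[dict] = None) -> str:
    """Serialize a tool invocation into a hypothetical JSON envelope."""
    if tool not in KNOWN_TOOLS:
        # Reject anything outside the documented tool set.
        raise ValueError(f"unknown tool: {tool!r}")
    # The "tool" and "arguments" key names are illustrative assumptions.
    envelope = {"tool": tool, "arguments": arguments or {}}
    return json.dumps(envelope, sort_keys=True)


if __name__ == "__main__":
    # "up" is a standard Prometheus metric reporting scrape target health.
    print(build_tool_call("promql.query", {"query": "up"}))
```

Rejecting unknown tool names on the client side keeps a misbehaving agent from probing for operations the bridge does not advertise.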

What is planned?

Stretch goals, explicitly deferred in v1, include host-level signals (systemd status, cert expiry, firewall summary), Alertmanager integration, additional prompts (Capacity-Check, CrashLoop-Playbook), and full MCP transport. See INTENT.md for the authoritative scope list.

Design principles

| Principle | Summary |
| --- | --- |
| Read-only by default | No cluster mutations through the bridge |
| Standard stack | CNCF/Grafana components, not custom collectors |
| MCP as the interface | One bridge, one contract for agents |
| Deployable in one shot | Ansible + Helm, no manual assembly |
| Least privilege | Scoped RBAC and NetworkPolicy |
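
One way to check the read-only and least-privilege principles is to audit RBAC rules for any verb that is not `get`, `list`, or `watch`, which are the standard read verbs in Kubernetes RBAC. The sketch below assumes rules are already loaded as plain dictionaries shaped like the `rules` entries of a Role; the `mutating_verbs` helper and the sample rule are hypothetical and do not come from the project's charts.

```python
# Kubernetes RBAC verbs that only read state.
READ_ONLY_VERBS = frozenset({"get", "list", "watch"})


def mutating_verbs(rules: list) -> set:
    """Return every verb in the given RBAC-style rules that is not read-only."""
    found = set()
    for rule in rules:
        for verb in rule.get("verbs", []):
            # The wildcard "*" is flagged too, since it grants write verbs.
            if verb not in READ_ONLY_VERBS:
                found.add(verb)
    return found


if __name__ == "__main__":
    # Hypothetical rule; the real bridge chart may scope things differently.
    sample_rules = [
        {"apiGroups": [""], "resources": ["pods", "events"],
         "verbs": ["get", "list", "watch"]},
    ]
    offenders = mutating_verbs(sample_rules)
    print("read-only" if not offenders else f"mutating verbs: {sorted(offenders)}")
```

A check like this could run in CI against rendered chart output so that a write verb never reaches the bridge's service account unnoticed.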

Repository map

| Path | Contents |
| --- | --- |
| INTENT.md | North star: goals, scope, current state |
| README.md | Quick start and operational guide |
| TeleMcpBlueprint.md | Architecture and component rationale |
| ansible/ | Bootstrap playbook |
| helm/ | Chart values and bridge chart |
| mcp-telemetry-bridge/ | FastAPI bridge source |

Success criteria

TeleMcp is working when:

  1. ansible-playbook brings up healthy pods in the monitoring, logging, and mcp namespaces
  2. /mcp/schema returns resources, tools, and prompts
  3. An agent can query metrics, logs, and cluster state without direct API credentials
  4. Default alert rules fire on induced failures and the agent can triage them
  5. The stack redeploys cleanly on a fresh Ubuntu 24.04 + k3s/kubeadm host
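
The second criterion can be turned into a small automated check. The sketch below assumes the `/mcp/schema` response is a JSON object with top-level `resources`, `tools`, and `prompts` keys; that layout is an assumption, and `missing_schema_sections` is a hypothetical helper rather than part of the bridge.

```python
REQUIRED_SECTIONS = ("resources", "tools", "prompts")


def missing_schema_sections(schema: dict) -> list:
    """List the required sections that are absent or empty in a schema document."""
    # A section counts as missing if the key is absent or its value is empty.
    return [name for name in REQUIRED_SECTIONS if not schema.get(name)]


if __name__ == "__main__":
    # Hand-written sample; a real check would parse the bridge's response body.
    sample = {
        "resources": ["example-resource"],
        "tools": ["promql.query", "loki.query"],
        "prompts": [],
    }
    print(missing_schema_sections(sample))
```

Returning the names of the missing sections, rather than a bare boolean, makes a failed smoke test easier to diagnose.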