
TeleMcp

Mission control for Kubernetes hosts, exposed to LLM agents through MCP.

TeleMcp deploys a standard observability stack onto a Linux Kubernetes host via Ansible + Helm, then surfaces metrics, logs, and cluster state through a read-only MCP bridge so an LLM agent can bootstrap, monitor, triage, and operate the box.

For project goals, scope, and design principles, see INTENT.md.

Components

Component                Namespace      Role
kube-prometheus-stack    monitoring     Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics
Loki + Promtail          logging        Log aggregation and shipping
OpenTelemetry Collector  observability  Optional OTLP fan-out to Prometheus and Loki
mcp-telemetry-bridge     mcp            FastAPI service exposing MCP resources, tools, and prompts

Local development (no cluster)

Work on the MCP bridge without deploying the full observability stack.

Install and verify

make bridge-install   # venv + deps (once)
make bridge-test      # pytest smoke: /healthz, /mcp/schema, /mcp/resource

Or from mcp-telemetry-bridge/:

./scripts/verify-local.sh

Run locally

make bridge-run
# or: cd mcp-telemetry-bridge && ./scripts/run-local.sh

With the server up, optional live HTTP checks:

make bridge-smoke
# or: RUN_LIVE=1 ./mcp-telemetry-bridge/scripts/verify-local.sh

Manual curls:

curl http://127.0.0.1:8080/healthz
curl http://127.0.0.1:8080/mcp/schema | jq .
curl "http://127.0.0.1:8080/mcp/resource?uri=res://dashboards/top-pods-by-cpu.promql"

Tool calls use POST /tools/<name> with a JSON body. The Prometheus, Loki, and Kubernetes backends are reachable only in-cluster, so tool calls need a deployed cluster to return data.
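
As a rough illustration of the POST /tools/<name> convention, the sketch below builds such a request with the Python standard library. The helper names (build_tool_request, call_tool) are hypothetical and not part of the bridge, the base URL is taken from the curl examples, and it assumes the bridge replies with a JSON body.

```python
import json
import urllib.request

# Assumption: the local bridge address used in the curl examples.
BASE_URL = "http://127.0.0.1:8080"


def build_tool_request(tool: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /tools/<tool> request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/tools/{tool}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_tool(tool: str, payload: dict, base_url: str = BASE_URL, timeout: float = 10.0):
    """Send the request and decode the reply.

    Assumption: the bridge answers with JSON. The exact response shape
    is not documented here, so the decoded value is returned as-is.
    """
    request = build_tool_request(tool, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the URL and body easy to inspect without a running bridge. With a bridge that can reach its backends, call_tool("promql.query", {"expr": "up"}) would issue the documented example call.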

Agent quickstart

When changing the bridge, agents should:

  1. Run make bridge-test after edits — fast, no cluster needed.
  2. Introspect GET /mcp/schema for the current tools, resources, and prompts.
  3. Call tools via POST /tools/<tool-name> (e.g. POST /tools/promql.query with {"expr":"up"}).
  4. Fetch saved queries via GET /mcp/resource?uri=<uri>.
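
The two GET steps above (schema introspection and saved-query lookup) could be scripted roughly as follows. This is a minimal sketch: the function names are hypothetical, the base URL is the local address from the curl examples, and percent-encoding the uri parameter is a defensive choice rather than something the bridge is known to require.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: the local bridge address used in the curl examples.
BASE_URL = "http://127.0.0.1:8080"


def schema_url(base_url: str = BASE_URL) -> str:
    """URL of the MCP catalog (tools, resources, prompts)."""
    return f"{base_url}/mcp/schema"


def resource_url(uri: str, base_url: str = BASE_URL) -> str:
    """URL for a saved query, with the resource URI percent-encoded."""
    return f"{base_url}/mcp/resource?{urlencode({'uri': uri})}"


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """GET a URL and return the body as text (requires a running bridge)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, fetch_text(resource_url("res://dashboards/top-pods-by-cpu.promql")) would request the same saved query as the manual curl shown earlier.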

Expected smoke-test surface:

Endpoint       Method  Purpose
/healthz       GET     Liveness
/mcp/schema    GET     MCP catalog (tools, resources, prompts)
/mcp/resource  GET     Saved PromQL/LogQL query by URI
/tools/*       POST    Execute a tool (needs in-cluster backends)

Quick Start (full cluster deploy)

0) Prereqs

  • Ubuntu 24.04 host running Kubernetes (k3s or kubeadm), reachable, with a kubectl context configured
  • Ansible 2.15+ on your control machine
  • Helm 3 on the host (the Ansible role installs it if missing)

1) Run Ansible

cd ansible
ansible-playbook -i inventories/local.ini playbook.yml

2) Smoke tests

From any machine with a kubectl context:

kubectl get pods -n monitoring
kubectl get pods -n logging
kubectl get pods -n mcp
kubectl port-forward -n mcp svc/mcp-telemetry-bridge 8080:80
curl http://localhost:8080/mcp/schema | jq .
curl http://localhost:8080/healthz

3) Point your LLM agent

Point your agent's MCP client at the bridge endpoint (ClusterIP, Ingress, or port-forward).

Implemented tools:

Tool                Description
promql.query        Run a PromQL expression against Prometheus
loki.query          Run a LogQL query against Loki
k8s.get             Fetch Kubernetes objects (pods, nodes, deployments, etc.)
k8s.events          List cluster or namespace events
inventory.snapshot  JSON snapshot of nodes, namespaces, and workloads

Saved resources (via /mcp/resource?uri=...):

  • res://dashboards/top-pods-by-cpu.promql
  • res://dashboards/pod-restarts.promql
  • res://dashboards/warn-events.logql
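
An agent that fetches one of these saved queries then has to decide which tool should run it. The sketch below shows one possible approach, assuming the file extension of the resource URI names the query language (.promql for promql.query, .logql for loki.query). Both the mapping and the helper name tool_for_resource are illustrative assumptions, not part of the bridge.

```python
# Assumption: the extension of a saved resource names its query language.
TOOL_BY_EXTENSION = {
    "promql": "promql.query",
    "logql": "loki.query",
}


def tool_for_resource(uri: str) -> str:
    """Pick the tool name for a saved resource URI based on its extension."""
    _, dot, extension = uri.rpartition(".")
    if not dot or extension not in TOOL_BY_EXTENSION:
        raise ValueError(f"no known tool for resource: {uri}")
    return TOOL_BY_EXTENSION[extension]
```

Raising on an unknown extension, rather than guessing a default, keeps an agent from sending a query to the wrong backend.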

The bridge currently exposes an HTTP approximation of MCP (/mcp/schema, /tools/...) rather than a full MCP transport. Stdio/SSE transport is planned; see INTENT.md.

Repo layout

tele-mcp/
  INTENT.md                 # Project north star — goals, scope, current state
  ansible/                  # Bootstrap playbook and roles
  helm/
    values/                 # Chart values for monitoring, logging, OTel
    mcp-telemetry-bridge/   # Bridge Helm chart
  mcp-telemetry-bridge/       # FastAPI bridge application
    scripts/                  # run-local.sh, verify-local.sh
    tests/                    # pytest smoke tests
  environments/             # Per-environment overrides
  wiki/                     # Extended project and design docs

Documentation

Document                    Purpose
INTENT.md                   Goals, principles, scope, success criteria
wiki/TeleMcpProject.md      Project overview and audience
wiki/TeleMcpBlueprint.md    Component rationale and bridge design
environments/dev/README.md  Dev environment notes

Security

  • MCP bridge ServiceAccount is read-only (get / list / watch only)
  • NetworkPolicy limits bridge egress to Prometheus and Loki
  • Consider mTLS or OIDC if exposing the bridge outside the cluster

Current limitations

See INTENT.md — Current State for the full list. Notable gaps:

  • Bridge container image is a placeholder (ghcr.io/example/telemcp-bridge)
  • No Alertmanager integration in the bridge yet
  • Host-level signals (systemd, certs, firewall) are deferred to a future DaemonSet sidecar