
TeleMcp

Mission control for Kubernetes hosts, exposed to LLM agents through MCP.

TeleMcp deploys a standard observability stack onto a Linux Kubernetes host via Ansible + Helm, then surfaces metrics, logs, and cluster state through a read-only MCP bridge so an LLM agent can bootstrap, monitor, triage, and operate the box.

For project goals, scope, and design principles, see INTENT.md.

Components

| Component | Namespace | Role |
| --- | --- | --- |
| kube-prometheus-stack | monitoring | Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics |
| Loki + Promtail | logging | Log aggregation and shipping |
| OpenTelemetry Collector | observability | Optional OTLP fan-out to Prometheus and Loki |
| mcp-telemetry-bridge | mcp | FastAPI service exposing MCP resources, tools, and prompts |

Quick Start

0) Prereqs

  • Ubuntu 24.04 host running Kubernetes (k3s or kubeadm), reachable from your control machine, with a kubectl context configured
  • Ansible 2.15+ on your control machine
  • Helm 3 on the host (the Ansible role installs it if missing)
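
The prerequisites above can be checked before running the playbook. The following is a minimal sketch, not part of the repository: it only looks for the executables on PATH and does not verify versions or the kubectl context. The function name `missing_tools` and the split into required and optional tools are assumptions made for illustration.

```python
import shutil

# Executables the quick start relies on (names assumed to match the standard
# CLI binaries). Helm is treated as optional because the Ansible role installs
# it on the host if it is missing.
REQUIRED = ("ansible-playbook", "kubectl")
OPTIONAL = ("helm",)


def missing_tools(names, which=shutil.which):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for name in missing_tools(REQUIRED):
        print(f"missing required tool: {name}")
    for name in missing_tools(OPTIONAL):
        print(f"missing optional tool: {name}")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.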

1) Run Ansible

cd ansible
ansible-playbook -i inventories/local.ini playbook.yml

2) Smoke tests

From any machine with a kubectl context:

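# List the pods in each namespace the playbook deploys into
kubectl get pods -n monitoring
kubectl get pods -n logging
kubectl get pods -n mcp

# Forward the bridge service to localhost:8080.
# This command blocks, so run the curl checks from a second terminal.
kubectl port-forward -n mcp svc/mcp-telemetry-bridge 8080:80
curl http://localhost:8080/mcp/schema | jq .
curl http://localhost:8080/healthz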
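
The two curl checks can also be scripted. The sketch below assumes the port-forward to localhost:8080 is active, that /healthz answers with HTTP 200 when healthy, and that /mcp/schema returns JSON (suggested by the jq pipe, but not otherwise documented here). The helper names `fetch` and `smoke_check` are illustrative and do not exist in the repository.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumes the port-forward is running


def fetch(path, base_url=BASE_URL, opener=urllib.request.urlopen, timeout=5):
    """GET base_url + path and return (status_code, body_bytes)."""
    with opener(base_url.rstrip("/") + path, timeout=timeout) as resp:
        return resp.status, resp.read()


def smoke_check(base_url=BASE_URL, opener=urllib.request.urlopen):
    """Return a dict mapping each smoke-test path to True or False."""
    results = {}

    # /healthz: only the status code is checked, because the body format
    # is not documented in this README.
    status, _ = fetch("/healthz", base_url, opener)
    results["/healthz"] = status == 200

    # /mcp/schema: expected to parse as JSON (assumption based on the jq pipe).
    status, body = fetch("/mcp/schema", base_url, opener)
    try:
        json.loads(body)
        results["/mcp/schema"] = status == 200
    except ValueError:
        results["/mcp/schema"] = False

    return results
```

Calling `smoke_check()` with the defaults performs real HTTP requests. With the default opener, `urllib.request.urlopen` raises an exception on connection failures and on HTTP error statuses, so a caller may want to wrap the call in a try/except.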

3) Point your LLM agent

Point your agent's MCP client at the bridge endpoint, reached through its ClusterIP, an Ingress, or a port-forward.

Implemented tools:

| Tool | Description |
| --- | --- |
| promql.query | Run a PromQL expression against Prometheus |
| loki.query | Run a LogQL query against Loki |
| k8s.get | Fetch Kubernetes objects (pods, nodes, deployments, etc.) |
| k8s.events | List cluster or namespace events |
| inventory.snapshot | JSON snapshot of nodes, namespaces, and workloads |

Saved resources (via /mcp/resource?uri=...):

  • res://dashboards/top-pods-by-cpu.promql
  • res://dashboards/pod-restarts.promql
  • res://dashboards/warn-events.logql
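
Because a res:// URI contains ":" and "/", it is safest to percent-encode it when passing it as the uri query parameter. The sketch below builds the request URL for a saved resource; the base URL assumes the port-forward from the smoke tests, and the helper name `resource_url` is hypothetical. Whether the bridge also accepts an unencoded URI is not stated in this README.

```python
from urllib.parse import urlencode

# Saved resource URIs listed in this README.
SAVED_RESOURCES = (
    "res://dashboards/top-pods-by-cpu.promql",
    "res://dashboards/pod-restarts.promql",
    "res://dashboards/warn-events.logql",
)


def resource_url(uri, base_url="http://localhost:8080"):
    """Build the /mcp/resource URL for a saved resource URI."""
    # urlencode percent-encodes ':' and '/' so the whole res:// URI is sent
    # as a single query parameter value.
    return f"{base_url.rstrip('/')}/mcp/resource?{urlencode({'uri': uri})}"


if __name__ == "__main__":
    for uri in SAVED_RESOURCES:
        print(resource_url(uri))
```

The first URI, for example, becomes `http://localhost:8080/mcp/resource?uri=res%3A%2F%2Fdashboards%2Ftop-pods-by-cpu.promql`.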

The bridge currently exposes an HTTP approximation of the MCP schema (/mcp/schema, /tools/...) rather than a full MCP transport. Full MCP transport (stdio/SSE) is planned; see INTENT.md.

Repo layout

tele-mcp/
  INTENT.md                 # Project north star — goals, scope, current state
  ansible/                  # Bootstrap playbook and roles
  helm/
    values/                 # Chart values for monitoring, logging, OTel
    mcp-telemetry-bridge/   # Bridge Helm chart
  mcp-telemetry-bridge/     # FastAPI bridge application
  environments/             # Per-environment overrides
  wiki/                     # Extended project and design docs

Documentation

| Document | Purpose |
| --- | --- |
| INTENT.md | Goals, principles, scope, success criteria |
| wiki/TeleMcpProject.md | Project overview and audience |
| wiki/TeleMcpBlueprint.md | Component rationale and bridge design |
| environments/dev/README.md | Dev environment notes |

Security

  • MCP bridge ServiceAccount is read-only (get / list / watch only)
  • NetworkPolicy limits bridge egress to Prometheus and Loki
  • Consider mTLS or OIDC if exposing the bridge outside the cluster
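
The read-only claim can be spot-checked against the RBAC rules bound to the bridge ServiceAccount. The sketch below is an illustration under assumptions: it takes rules in the standard Kubernetes RBAC shape (a list of dicts, each with a `verbs` list), for example as loaded from the chart's rendered Role or ClusterRole, which this README does not show. The function name `non_read_only_verbs` is hypothetical.

```python
# Verbs this README describes as allowed for the bridge ServiceAccount.
READ_ONLY_VERBS = frozenset({"get", "list", "watch"})


def non_read_only_verbs(rules):
    """Return the sorted verbs in RBAC `rules` that go beyond get/list/watch."""
    found = set()
    for rule in rules:
        found.update(rule.get("verbs", []))
    # The wildcard "*" is not in READ_ONLY_VERBS, so it is reported as well.
    return sorted(found - READ_ONLY_VERBS)


if __name__ == "__main__":
    # Hypothetical rules for illustration only, not copied from the chart.
    example_rules = [
        {"apiGroups": [""], "resources": ["pods", "nodes"], "verbs": ["get", "list", "watch"]},
        {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "list"]},
    ]
    print(non_read_only_verbs(example_rules))
```

An empty result means every rule stays within get, list, and watch; any other verb, including `*`, shows up in the returned list.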

Current limitations

See INTENT.md — Current State for the full list. Notable gaps:

  • Bridge container image is a placeholder (ghcr.io/example/telemcp-bridge)
  • No Alertmanager integration in the bridge yet
  • Host-level signals (systemd, certs, firewall) are deferred to a future DaemonSet sidecar