tegwick f03e808c37 chore(consistency): sync task status from DB [auto]

Updated by fix-consistency on 2026-06-24:
  - update .custodian-brief.md for tele-mcp

2026-06-24 18:19:28 +02:00

TeleMcp

Mission control for Kubernetes hosts, exposed to LLM agents through MCP.

TeleMcp deploys a standard observability stack onto a Linux Kubernetes host via Ansible + Helm, then surfaces metrics, logs, and cluster state through a read-only MCP bridge so an LLM agent can bootstrap, monitor, triage, and operate the box.

For project goals, scope, and design principles, see INTENT.md.

Components

Component	Namespace	Role
kube-prometheus-stack	`monitoring`	Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics
Loki + Promtail	`logging`	Log aggregation and shipping
OpenTelemetry Collector	`observability`	Optional OTLP fan-out to Prometheus and Loki
mcp-telemetry-bridge	`mcp`	FastAPI service exposing MCP resources, tools, and prompts

Local development (no cluster)

Work on the MCP bridge without deploying the full observability stack.

Install and verify

make bridge-install   # venv + deps (once)
make bridge-test      # pytest smoke: /healthz, /mcp/schema, /mcp/resource

Or from mcp-telemetry-bridge/:

./scripts/verify-local.sh

Run locally

make bridge-run
# or: cd mcp-telemetry-bridge && ./scripts/run-local.sh

With the server up, optional live HTTP checks:

make bridge-smoke
# or: RUN_LIVE=1 ./mcp-telemetry-bridge/scripts/verify-local.sh
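The live checks can also be approximated from Python. This is a minimal sketch, not the contents of verify-local.sh: it assumes each of the three smoke-test endpoints answers a GET with HTTP 200 when healthy, and the names `run_smoke` and `http_status` are hypothetical.

```python
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # local address used by the manual curls in this README

# Paths taken from the smoke-test surface described in this README.
SMOKE_PATHS = [
    "/healthz",
    "/mcp/schema",
    "/mcp/resource?uri=res://dashboards/top-pods-by-cpu.promql",
]


def http_status(url: str) -> int:
    """Return the HTTP status code of a GET request.

    HTTP error responses are returned as codes; connection failures
    (for example, server not running) still raise urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def run_smoke(base_url: str = BASE_URL, fetch=http_status) -> dict[str, bool]:
    """Map each smoke path to True when it answers with HTTP 200 (assumed healthy)."""
    return {path: fetch(base_url + path) == 200 for path in SMOKE_PATHS}
```

With the server started by `make bridge-run`, calling `run_smoke()` returns one boolean per path. The `fetch` parameter exists only so the check can be exercised without a running server.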

Manual curls:

curl http://127.0.0.1:8080/healthz
curl http://127.0.0.1:8080/mcp/schema | jq .
curl "http://127.0.0.1:8080/mcp/resource?uri=res://dashboards/top-pods-by-cpu.promql"

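Tool calls use POST /tools/<name> with a JSON body. The Prometheus, Loki, and Kubernetes backends are reachable only in-cluster, so expect tool calls against a local server to fail unless those backends are reachable from your machine.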
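When building the resource URL programmatically rather than by hand, it is safer to percent-encode the `uri` query parameter. The sketch below assumes the bridge decodes query strings in the standard way, which is typical for a FastAPI service; `resource_url` is a hypothetical helper, not part of the bridge.

```python
from urllib.parse import urlencode


def resource_url(base_url: str, uri: str) -> str:
    """Build a GET /mcp/resource URL with the saved-query URI percent-encoded."""
    query = urlencode({"uri": uri})  # encodes ':' and '/' inside the res:// URI
    return f"{base_url.rstrip('/')}/mcp/resource?{query}"


url = resource_url(
    "http://127.0.0.1:8080",
    "res://dashboards/top-pods-by-cpu.promql",
)
print(url)
# http://127.0.0.1:8080/mcp/resource?uri=res%3A%2F%2Fdashboards%2Ftop-pods-by-cpu.promql
```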

Agent quickstart

When changing the bridge, agents should:

Run make bridge-test after edits; it is fast and needs no cluster.
Introspect GET /mcp/schema for the current tools, resources, and prompts.
Call tools via POST /tools/<tool-name> (e.g. POST /tools/promql.query with {"expr":"up"}).
Fetch saved queries via GET /mcp/resource?uri=<uri>.

Expected smoke-test surface:

Endpoint	Method	Purpose
`/healthz`	GET	Liveness
`/mcp/schema`	GET	MCP catalog (tools, resources, prompts)
`/mcp/resource`	GET	Saved PromQL/LogQL query by URI
`/tools/*`	POST	Execute a tool (needs in-cluster backends)
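The tool-call convention above (POST /tools/<tool-name> with a JSON body) could be sketched as follows. Only the path shape and the `{"expr":"up"}` body come from this README; the helper names are hypothetical, and the assumption that the bridge replies with JSON is not confirmed here.

```python
import json
import urllib.request


def tool_request(base_url: str, tool: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /tools/<tool> request with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tools/{tool}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_tool(base_url: str, tool: str, body: dict, timeout: float = 10.0):
    """Send the request and decode the reply, assuming the bridge returns JSON."""
    req = tool_request(base_url, tool, body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example from this README: POST /tools/promql.query with {"expr": "up"}.
req = tool_request("http://127.0.0.1:8080", "promql.query", {"expr": "up"})
print(req.get_method(), req.full_url)
```

Splitting request construction from sending keeps the first half testable without a cluster; `call_tool` only returns useful data where the backends are reachable.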

Quick Start (full cluster deploy)

0) Prerequisites

Ubuntu 24.04 host running Kubernetes (k3s or kubeadm), reachable from your control machine, with a kubectl context configured
Ansible 2.15+ on your control machine
Helm 3 on the host (the Ansible role installs it if missing)

1) Run Ansible

cd ansible
ansible-playbook -i inventories/local.ini playbook.yml

2) Smoke tests

From any machine with a kubectl context:

kubectl get pods -n monitoring
kubectl get pods -n logging
kubectl get pods -n mcp
kubectl port-forward -n mcp svc/mcp-telemetry-bridge 8080:80
curl http://localhost:8080/mcp/schema | jq .
curl http://localhost:8080/healthz
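The pod checks above are read by eye. A scripted variant might parse `kubectl get pods -o json` instead. This is a sketch under assumptions: it treats a pod as healthy only when its phase is Running and every container reports ready, so completed Job pods would be flagged too, and the function names are hypothetical.

```python
import json
import subprocess

NAMESPACES = ["monitoring", "logging", "mcp"]  # namespaces checked in this README


def not_ready_pods(pod_list: dict) -> list[str]:
    """Return names of pods that are not Running with all containers ready."""
    bad = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        ready = (
            status.get("phase") == "Running"
            and bool(containers)
            and all(c.get("ready") for c in containers)
        )
        if not ready:
            bad.append(pod["metadata"]["name"])
    return bad


def check_namespace(namespace: str) -> list[str]:
    """Run kubectl for one namespace and return the pods that look unhealthy."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return not_ready_pods(json.loads(out))
```

Looping `check_namespace` over `NAMESPACES` gives a per-namespace list of pods to inspect; an empty list for each suggests the stack came up.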

3) Point your LLM agent

Point your agent's MCP client at the bridge endpoint (ClusterIP, Ingress, or port-forward).

Implemented tools:

Tool	Description
`promql.query`	Run a PromQL expression against Prometheus
`loki.query`	Run a LogQL query against Loki
`k8s.get`	Fetch Kubernetes objects (pods, nodes, deployments, etc.)
`k8s.events`	List cluster or namespace events
`inventory.snapshot`	JSON snapshot of nodes, namespaces, and workloads

Saved resources (via /mcp/resource?uri=...):

res://dashboards/top-pods-by-cpu.promql
res://dashboards/pod-restarts.promql
res://dashboards/warn-events.logql
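An agent may want to know which query language a saved resource uses before passing it to a tool. The sketch below infers that from the file suffix. Reading the first URI segment as a group name and mapping suffixes to languages is an interpretation of the three URIs listed here, not documented bridge behavior, and `describe_resource` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Suffix-to-language mapping inferred from the URIs listed in this README.
QUERY_LANGUAGES = {".promql": "PromQL", ".logql": "LogQL"}


def describe_resource(uri: str) -> dict:
    """Split a res:// URI into group, name, and inferred query language."""
    parts = urlsplit(uri)
    if parts.scheme != "res":
        raise ValueError(f"not a res:// URI: {uri}")
    path = PurePosixPath(parts.path)
    return {
        "group": parts.netloc,  # e.g. "dashboards" (assumed to be a grouping)
        "name": path.stem,
        "language": QUERY_LANGUAGES.get(path.suffix, "unknown"),
    }


print(describe_resource("res://dashboards/warn-events.logql"))
```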

The bridge currently exposes an HTTP approximation of MCP (/mcp/schema, /tools/...) rather than a native MCP transport. Full MCP transport (stdio/SSE) is planned; see INTENT.md.

Repo layout

tele-mcp/
  INTENT.md                 # Project north star — goals, scope, current state
  ansible/                  # Bootstrap playbook and roles
  helm/
    values/                 # Chart values for monitoring, logging, OTel
    mcp-telemetry-bridge/   # Bridge Helm chart
  mcp-telemetry-bridge/       # FastAPI bridge application
    scripts/                  # run-local.sh, verify-local.sh
    tests/                    # pytest smoke tests
  environments/             # Per-environment overrides
  wiki/                     # Extended project and design docs

Documentation

Document	Purpose
INTENT.md	Goals, principles, scope, success criteria
wiki/TeleMcpProject.md	Project overview and audience
wiki/TeleMcpBlueprint.md	Component rationale and bridge design
environments/dev/README.md	Dev environment notes

Security

MCP bridge ServiceAccount is read-only (get / list / watch only)
NetworkPolicy limits bridge egress to Prometheus and Loki
Consider mTLS or OIDC if exposing the bridge outside the cluster
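The read-only claim can be spot-checked against the RBAC object bound to the ServiceAccount. A minimal sketch, assuming you have exported the Role or ClusterRole as JSON (for example with `kubectl get clusterrole <name> -o json`, where the name depends on your deployment); `non_read_only_verbs` is a hypothetical helper.

```python
READ_ONLY_VERBS = {"get", "list", "watch"}  # verbs this README lists as allowed


def non_read_only_verbs(role: dict) -> set[str]:
    """Return every verb in the role's rules that is not get, list, or watch."""
    found: set[str] = set()
    for rule in role.get("rules") or []:
        found.update(set(rule.get("verbs", [])) - READ_ONLY_VERBS)
    return found


# Hand-written example role; not the role shipped by the Helm chart.
example_role = {
    "kind": "ClusterRole",
    "rules": [
        {"apiGroups": [""], "resources": ["pods", "nodes"], "verbs": ["get", "list", "watch"]},
    ],
}
print(non_read_only_verbs(example_role))  # empty set means read-only
```

A wildcard verb (`"*"`) is reported as a violation, since it grants write access as well.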

Current limitations

See the Current State section of INTENT.md for the full list. Notable gaps:

Bridge container image is a placeholder (ghcr.io/example/telemcp-bridge)
No Alertmanager integration in the bridge yet
Host-level signals (systemd, certs, firewall) are deferred to a future DaemonSet sidecar