# TeleMcp

**Mission control for Kubernetes hosts, exposed to LLM agents through MCP.**

TeleMcp deploys a standard observability stack onto a Linux Kubernetes host via **Ansible + Helm**, then surfaces metrics, logs, and cluster state through a read-only **MCP bridge** so an LLM agent can bootstrap, monitor, triage, and operate the box.

> For project goals, scope, and design principles, see **[INTENT.md](INTENT.md)**.

## Components

| Component | Namespace | Role |
|-----------|-----------|------|
| **kube-prometheus-stack** | `monitoring` | Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics |
| **Loki + Promtail** | `logging` | Log aggregation and shipping |
| **OpenTelemetry Collector** | `observability` | Optional OTLP fan-out to Prometheus and Loki |
| **mcp-telemetry-bridge** | `mcp` | FastAPI service exposing MCP resources, tools, and prompts |

## Local development (no cluster)

Work on the MCP bridge without deploying the full observability stack.

### Install and verify

```bash
make bridge-install   # venv + deps (once)
make bridge-test      # pytest smoke: /healthz, /mcp/schema, /mcp/resource
```

Or from `mcp-telemetry-bridge/`:

```bash
./scripts/verify-local.sh
```

### Run locally

```bash
make bridge-run
# or: cd mcp-telemetry-bridge && ./scripts/run-local.sh
```

With the server up, run the optional live HTTP checks:

```bash
make bridge-smoke
# or: RUN_LIVE=1 ./mcp-telemetry-bridge/scripts/verify-local.sh
```

Manual curls:

```bash
curl http://127.0.0.1:8080/healthz
curl http://127.0.0.1:8080/mcp/schema | jq .
curl "http://127.0.0.1:8080/mcp/resource?uri=res://dashboards/top-pods-by-cpu.promql"
```

Tool calls use `POST /tools/<name>` with a JSON body. They need the Prometheus, Loki, and Kubernetes backends, which are only reachable in-cluster.
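As a rough illustration of the `POST /tools/<name>` shape, the sketch below builds such a request with the Python standard library without sending it. The base URL assumes a bridge started with `make bridge-run`, the `{"expr": "up"}` body is the one example this README gives for `promql.query`, and the `build_tool_request` helper is hypothetical, not part of the bridge.

```python
import json
import urllib.request

# Assumption: the bridge is listening locally, as in the curl examples.
BASE_URL = "http://127.0.0.1:8080"


def build_tool_request(name: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /tools/<name> request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/tools/{name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_tool_request("promql.query", {"expr": "up"})
print(req.get_method(), req.full_url)

# To actually send it (needs a running bridge with reachable backends):
#   with urllib.request.urlopen(req, timeout=5) as resp:
#       print(resp.status, resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the shape easy to inspect locally, where the backends are not available.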

### Agent quickstart

When changing the bridge, agents should:

1. Run `make bridge-test` after edits; it is fast and needs no cluster.
2. Introspect `GET /mcp/schema` for the current tools, resources, and prompts.
3. Call tools via `POST /tools/<tool-name>` (e.g. `POST /tools/promql.query` with `{"expr":"up"}`).
4. Fetch saved queries via `GET /mcp/resource?uri=<uri>`.

Expected smoke-test surface:

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/healthz` | GET | Liveness |
| `/mcp/schema` | GET | MCP catalog (tools, resources, prompts) |
| `/mcp/resource` | GET | Saved PromQL/LogQL query by URI |
| `/tools/*` | POST | Execute a tool (needs in-cluster backends) |
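The GET rows of the table above could be exercised with a small checker like the sketch below. It is not the repository's pytest suite or `verify-local.sh`; the `smoke` and `http_status` helpers are hypothetical, and treating HTTP 200 as success is an assumption. `/tools/*` is left out because it needs in-cluster backends.

```python
from typing import Callable, Dict
import urllib.error
import urllib.request

# GET endpoints from the smoke-test table; the resource URI is one of the saved queries.
GET_ENDPOINTS = [
    "/healthz",
    "/mcp/schema",
    "/mcp/resource?uri=res://dashboards/top-pods-by-cpu.promql",
]


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # must precede URLError (its parent class)
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def smoke(base_url: str, fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each GET endpoint to True when it answers 200 (assumed success code)."""
    return {path: fetch(base_url + path) == 200 for path in GET_ENDPOINTS}


# Usage against a locally running bridge:
#   for path, ok in smoke("http://127.0.0.1:8080").items():
#       print("ok  " if ok else "FAIL", path)
```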

---

## Quick Start (full cluster deploy)

### 0) Prereqs

- Ubuntu 24.04 host with Kubernetes (k3s or kubeadm) reachable and a `kubectl` context configured
- Ansible 2.15+ on your control machine
- Helm 3 on the host (the Ansible role installs it if missing)

### 1) Run Ansible

```bash
cd ansible
ansible-playbook -i inventories/local.ini playbook.yml
```

### 2) Smoke tests

From any machine with a `kubectl` context:

```bash
kubectl get pods -n monitoring
kubectl get pods -n logging
kubectl get pods -n mcp

# port-forward stays in the foreground; run the curls from a second terminal
kubectl port-forward -n mcp svc/mcp-telemetry-bridge 8080:80
curl http://localhost:8080/mcp/schema | jq .
curl http://localhost:8080/healthz
```

### 3) Point your LLM agent

Point your agent's MCP client at the bridge endpoint (ClusterIP, Ingress, or port-forward).

**Implemented tools:**

| Tool | Description |
|------|-------------|
| `promql.query` | Run a PromQL expression against Prometheus |
| `loki.query` | Run a LogQL query against Loki |
| `k8s.get` | Fetch Kubernetes objects (pods, nodes, deployments, etc.) |
| `k8s.events` | List cluster or namespace events |
| `inventory.snapshot` | JSON snapshot of nodes, namespaces, and workloads |

**Saved resources** (via `/mcp/resource?uri=...`):

- `res://dashboards/top-pods-by-cpu.promql`
- `res://dashboards/pod-restarts.promql`
- `res://dashboards/warn-events.logql`
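A client that builds these URLs programmatically may prefer to percent-encode the `uri` parameter. The sketch below does that with `urllib.parse`; the `resource_url` helper and the local base URL are assumptions for illustration, and the unencoded form used in the curl examples is taken to work as documented.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Saved resource URIs listed in this README.
SAVED_RESOURCES = [
    "res://dashboards/top-pods-by-cpu.promql",
    "res://dashboards/pod-restarts.promql",
    "res://dashboards/warn-events.logql",
]


def resource_url(uri: str, base_url: str = "http://127.0.0.1:8080") -> str:
    """Return the GET URL for a saved resource, percent-encoding the uri parameter."""
    return f"{base_url}/mcp/resource?{urlencode({'uri': uri})}"


for uri in SAVED_RESOURCES:
    url = resource_url(uri)
    # Round trip: a standard query-string parser recovers the original URI.
    assert parse_qs(urlsplit(url).query)["uri"] == [uri]
    print(url)
```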

> The bridge currently exposes an HTTP approximation of MCP (`/mcp/schema`, `/tools/...`). Full MCP transport (stdio/SSE) is planned; see [INTENT.md](INTENT.md).

## Repo layout

```
tele-mcp/
  INTENT.md                  # Project north star: goals, scope, current state
  ansible/                   # Bootstrap playbook and roles
  helm/
    values/                  # Chart values for monitoring, logging, OTel
    mcp-telemetry-bridge/    # Bridge Helm chart
  mcp-telemetry-bridge/      # FastAPI bridge application
    scripts/                 # run-local.sh, verify-local.sh
    tests/                   # pytest smoke tests
  environments/              # Per-environment overrides
  wiki/                      # Extended project and design docs
```

## Documentation

| Document | Purpose |
|----------|---------|
| [INTENT.md](INTENT.md) | Goals, principles, scope, success criteria |
| [wiki/TeleMcpProject.md](wiki/TeleMcpProject.md) | Project overview and audience |
| [wiki/TeleMcpBlueprint.md](wiki/TeleMcpBlueprint.md) | Component rationale and bridge design |
| [environments/dev/README.md](environments/dev/README.md) | Dev environment notes |

## Security

- MCP bridge ServiceAccount is read-only (`get` / `list` / `watch` only)
- NetworkPolicy limits bridge egress to Prometheus and Loki
- Consider mTLS or OIDC if exposing the bridge outside the cluster
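One way to sanity-check the read-only claim is to scan a Role's rules for verbs outside `get`, `list`, and `watch`. The sketch below assumes the rules have already been loaded into Python dictionaries shaped like the `rules` list of a Kubernetes Role or ClusterRole; `example_rules` is hypothetical and does not reproduce the chart's actual RBAC manifest.

```python
from typing import Dict, List, Set

READ_ONLY_VERBS = {"get", "list", "watch"}


def write_verbs(rules: List[Dict]) -> Set[str]:
    """Return every verb in the rules that falls outside get/list/watch."""
    found: Set[str] = set()
    for rule in rules:
        found.update(set(rule.get("verbs", [])) - READ_ONLY_VERBS)
    return found


# Hypothetical rules for illustration only.
example_rules = [
    {"apiGroups": [""], "resources": ["pods", "nodes", "events"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "list"]},
]

# An empty result means no rule grants more than read access.
print(write_verbs(example_rules))
```

Note that a wildcard verb (`*`) is reported as non-read-only, which is the conservative reading.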

## Current limitations

See [INTENT.md: Current State](INTENT.md#current-state-as-of-initial-scaffold) for the full list. Notable gaps:

- Bridge container image is a placeholder (`ghcr.io/example/telemcp-bridge`)
- No Alertmanager integration in the bridge yet
- Host-level signals (systemd, certs, firewall) are deferred to a future DaemonSet sidecar