Document semantic attractors concept

2026-06-06 00:52:21 +02:00
parent fe2e98e2a4
commit e23ed0c06b
4 changed files with 403 additions and 2 deletions
--- a/docs/graph-explorer-contract.md
+++ b/docs/graph-explorer-contract.md
@@ -104,6 +104,32 @@ edges are intentionally shortest and most elastic; deployment-to-repo edges are
 longer and looser so infrastructure placement does not collapse into the repo
 node.

+## Semantic Attractor Modes
+
+Semantic attractors are view-only topic poles that can pull graph entities
+toward conceptual neighborhoods in spring-based layouts. For repository maps,
+an operator might choose attractors such as `security`, `development`, and
+`operations`; Fabric can then score each repository's semantic closeness to
+those attractors from repo-owned `SCOPE.md` evidence and map the score to
+layout strength.
+
+Attractors are not domain edges and do not change Fabric graph data. They may
+be materialized as synthetic display-only nodes and `semantic_attraction`
+edges, or carried as top-level view metadata that the renderer turns into
+layout forces. Attraction scores should remain inspectable, with source
+references and confidence, so the operator can understand why a repository was
+pulled toward a topic.
+
+Unlike zones, attractors may overlap. A repository can be close to both
+`development` and `operations`, and the layout should place it between those
+poles. Zone resolvers, boundary diagnostics, dependency queries, blast-radius
+queries, and collapsed-zone boundary edges should ignore semantic attraction
+edges unless a host explicitly promotes an attractor relation into canonical
+graph data.
+
+See `docs/semantic-attractors.md` for the concept model, scoring semantics,
+payload direction, and implementation path.
+
 ## Display State Ownership

 The contract allows either the host service or the engine to evaluate display
--- a/docs/semantic-attractors.md
+++ b/docs/semantic-attractors.md
@@ -0,0 +1,340 @@
+# Semantic Attractors
+
+## Intent
+
+Semantic attractors are view entities that help an operator orient inside a
+medium or large graph. An attractor represents a topic, concern, capability
+area, operating mode, or other conceptual pole such as `security`,
+`development`, `operations`, `identity`, `data`, or `delivery`.
+
+The graph explorer can place attractors on the canvas and connect graph
+entities to them with view-only relationship strength. The stronger an
+entity's semantic closeness to an attractor, the more that attractor should
+pull the entity in force-directed or spring-based layouts.
+
+The first motivating use case is repository orientation. Given a set of
+repositories, the operator defines attractors such as `security`,
+`development`, and `operations`. Railiance Fabric reads each repository's
+`SCOPE.md`, estimates semantic closeness to those attractors, and maps that
+score to layout force. The resulting map becomes a navigational surface: repos
+with similar purpose drift toward the same conceptual pole without replacing
+the underlying dependency or responsibility graph.
+
+## What Attractors Are
+
+An attractor is not a Fabric node in the source graph. It is a graph-view
+artifact with these responsibilities:
+
+- name a topic or concern that is useful for orientation;
+- define how closeness to that topic is measured;
+- expose a score for each eligible entity;
+- translate that score into layout hints and optional visual edges;
+- keep the scoring evidence inspectable so the map does not become mysterious.
+
+Attractors should be saved as view/profile configuration, operator presets, or
+host-provided explorer configuration. They should not mutate repo-owned Fabric
+declarations, and they should not imply that a repository provides or consumes
+a capability.
+
+## Why This Helps
+
+Dependency edges answer "what depends on what?" Ownership and deployment
+metadata answer "who owns this?" and "where does this run?" Those questions are
+necessary, but they can still leave a large repo collection hard to scan.
+
+Attractors answer a softer question: "what is this near, conceptually?"
+
+This gives operators a fast way to discover clusters such as:
+
+- repos that are security-heavy but not obvious from their names;
+- operations tooling that depends on development systems;
+- application repos that are unexpectedly close to platform/runtime concerns;
+- thin adapter repos that sit between two conceptual poles;
+- orphaned or ambiguous repos that have weak attraction to every known topic.
+
+## Core Model
+
+An attractor definition should be serializable and stable:
+
+```yaml
+id: security
+label: Security
+description: Identity, authorization, secrets, MFA, audit, policy, and trust boundaries.
+applies_to:
+  layers: [repository]
+evidence:
+  sources:
+    - type: scope_markdown
+      path: SCOPE.md
+scoring:
+  method: lexical_semantic_profile
+  anchors:
+    - security
+    - identity
+    - authorization
+    - secrets
+    - audit
+    - policy
+    - mfa
+  negative_anchors:
+    - unrelated
+normalization:
+  mode: per_entity_softmax
+layout:
+  min_score: 0.15
+  max_score: 1.0
+  strength_scale: 0.8
+  ideal_length:
+    min: 80
+    max: 420
+presentation:
+  color: "#be123c"
+  edge_style: dashed
+```
+
+The exact schema can evolve, but the responsibilities should remain separate:
+
+- `applies_to` chooses which graph elements can be scored.
+- `evidence` declares which text or metadata is used.
+- `scoring` defines the semantic metric.
+- `normalization` turns raw scores into comparable view weights.
+- `layout` maps weights to graph layout hints.
+- `presentation` controls the optional visual attractor node and edges.
+
+## Scoring From SCOPE.md
+
+`SCOPE.md` is a useful first evidence source because it is intentionally short,
+repo-owned, and written to explain when a repository is relevant. For repository
+attraction, the scorer should use sections such as:
+
+- `One-liner`
+- `Core Idea`
+- `In Scope`
+- `Relevant When`
+- `Provided Capabilities`
+- `Related / Overlapping Repositories`
+- `Terminology`
+
+Sections such as `Out of Scope` and `Not Relevant When` should be used
+carefully. They can reduce false positives, but they should not erase a topic
+just because the repo mentions a boundary. For example, a repo can say it is
+not an authorization engine while still being semantically near security
+because it models secrets, policy, or trust boundaries.
+
+The first implementation can use a transparent lexical profile:
+
+1. Parse `SCOPE.md` into sections.
+2. Tokenize section text and provided capability keywords.
+3. Weight section matches, with `One-liner`, `Core Idea`, `In Scope`, and
+   capability keywords carrying more weight than incidental notes.
+4. Score each attractor by matching configured anchors and related terms.
+5. Normalize scores per entity so one verbose `SCOPE.md` does not dominate.
+6. Store the score, confidence, and top evidence snippets in the view payload.
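The steps above can be sketched roughly as follows. This is a minimal illustration, not the Fabric scorer: the function names, the `SECTION_WEIGHTS` values, and the length-based normalization are all assumptions, and capability keywords and confidence are left out for brevity.

```python
import re
from collections import Counter

# Hypothetical weights. The text only says that One-liner, Core Idea, and
# In Scope should count for more than incidental notes.
SECTION_WEIGHTS = {"one-liner": 3.0, "core idea": 3.0, "in scope": 2.0}
DEFAULT_WEIGHT = 1.0


def parse_sections(markdown: str) -> dict[str, str]:
    """Split markdown into {lowercased heading: body text}."""
    sections: dict[str, list[str]] = {}
    current = "preamble"
    for line in markdown.splitlines():
        heading = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if heading:
            current = heading.group(1).lower()
        else:
            sections.setdefault(current, []).append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_attractors(markdown: str, anchors: dict[str, set[str]]) -> dict[str, dict]:
    """Return {attractor_id: {"score": float, "evidence": [...]}}.

    The score is the weighted count of anchor hits divided by the weighted
    count of all tokens, so a long SCOPE.md does not win by sheer volume.
    """
    hits = {attractor: 0.0 for attractor in anchors}
    evidence: dict[str, list[dict]] = {attractor: [] for attractor in anchors}
    total = 0.0
    for section, body in parse_sections(markdown).items():
        weight = SECTION_WEIGHTS.get(section, DEFAULT_WEIGHT)
        counts = Counter(tokenize(body))
        total += weight * sum(counts.values())
        for attractor, terms in anchors.items():
            matched = sorted(term for term in terms if counts[term])
            if matched:
                hits[attractor] += weight * sum(counts[term] for term in matched)
                evidence[attractor].append({"section": section, "terms": matched})
    return {
        attractor: {
            "score": hits[attractor] / total if total else 0.0,
            "evidence": evidence[attractor],
        }
        for attractor in anchors
    }
```

Raw ratios from a scorer like this tend to be small, so a real implementation would still need a calibration step before the values line up with the score bands described under Score Semantics.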
+
+Later implementations can replace or augment lexical scoring with embeddings,
+LLM-assisted classification, or operator-reviewed labels. The contract should
+not depend on a particular scorer.
+
+## Score Semantics
+
+Attractor scores should be continuous values in `[0, 1]`.
+
+Suggested interpretation (a score on a shared boundary belongs to the higher
+band):
+
+| Score | Meaning |
+|-------|---------|
+| `0.00` to `0.10` | no useful evidence of semantic closeness |
+| `0.10` to `0.30` | weak signal; useful only as a faint layout hint |
+| `0.30` to `0.60` | moderate closeness; entity should visibly lean toward the attractor |
+| `0.60` to `0.85` | strong closeness; entity likely belongs near the attractor cluster |
+| `0.85` to `1.00` | primary semantic identity or explicit operator label |
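As an illustration, the bands in the table could be turned into a label with a small lookup. Only `strong` appears as a `strength` value elsewhere in this document; the other label names, the function name, and the choice to treat anything under `0.10` as `none` and each lower bound as inclusive are assumptions.

```python
# Lower bounds of each band, highest first. Label names other than
# "strong" are hypothetical.
BANDS = [
    (0.85, "primary"),
    (0.60, "strong"),
    (0.30, "moderate"),
    (0.10, "weak"),
]


def strength_label(score: float) -> str:
    """Map a score in [0, 1] to the name of its band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "none"
```

Under these assumptions a score of `0.82` falls into the `strong` band, which agrees with the `"strength": "strong"` value on the example edge in the Layout Mapping section.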
+
+Every score should carry a confidence separate from closeness. A repo with a
+thin or missing `SCOPE.md` may have low confidence even if a few terms match.
+
+Attractors should also support multi-attraction. A repository can be close to
+both `development` and `operations`; the layout should then place it between
+those poles instead of forcing a single category. This is the main difference
+from zones: zones preserve a single-surface invariant, while attractors are
+allowed to overlap because they are layout forces, not containers.
+
+## Layout Mapping
+
+Attraction scores become layout hints. They should not become domain edges.
+
+A graph explorer can map scores to synthetic view edges:
+
+```json
+{
+  "data": {
+    "id": "attractor:security->repo:flex-auth",
+    "source": "attractor:security",
+    "target": "repo:flex-auth",
+    "edgeType": "semantic_attraction",
+    "displayOnly": true,
+    "score": 0.82,
+    "confidence": 0.74,
+    "strength": "strong",
+    "layoutAffinity": 0.82,
+    "layoutIdealLength": 110,
+    "layoutElasticity": 0.9,
+    "sourceReferences": [
+      {
+        "type": "scope_markdown",
+        "path": "SCOPE.md",
+        "section": "In Scope"
+      }
+    ]
+  },
+  "classes": "semantic-attraction"
+}
+```
+
+For force-directed layouts:
+
+- stronger scores should increase spring strength or edge weight;
+- stronger scores should shorten ideal length;
+- weak scores above the threshold may be hidden visually while still applying
+  a small force;
+- scores below the configured threshold (`layout.min_score` in the definition
+  above) should not affect layout at all;
+- display-only attraction edges should be excluded from dependency, boundary,
+  blast-radius, and zone-connectivity diagnostics.
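One possible way to apply these rules is a linear mapping driven by the `layout` block of the attractor definition. The sketch below is an assumption about how `min_score`, `max_score`, `strength_scale`, and `ideal_length` might combine. It does not claim to reproduce the numbers on the example edge above, and `LayoutConfig` and `layout_hints` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayoutConfig:
    # Defaults copied from the example attractor definition.
    min_score: float = 0.15
    max_score: float = 1.0
    strength_scale: float = 0.8
    ideal_length_min: float = 80.0
    ideal_length_max: float = 420.0


def layout_hints(score: float, cfg: LayoutConfig = LayoutConfig()) -> Optional[dict]:
    """Return layout hints for a score, or None when it is below the threshold."""
    if score < cfg.min_score:
        return None  # below threshold: the edge must not affect layout
    clamped = min(score, cfg.max_score)
    # t runs from 0 at min_score to 1 at max_score.
    t = (clamped - cfg.min_score) / (cfg.max_score - cfg.min_score)
    length_range = cfg.ideal_length_max - cfg.ideal_length_min
    return {
        # Stronger score -> stronger spring.
        "layoutAffinity": round(cfg.strength_scale * t, 3),
        # Stronger score -> shorter ideal length.
        "layoutIdealLength": round(cfg.ideal_length_max - t * length_range),
    }
```

A linear ramp is the simplest choice. An eased curve may separate mid-range scores better, but that is a tuning question for the renderer.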
+
+Attractor nodes can be pinned, arranged on a ring, placed by the operator, or
+computed from the current profile. For first use, a stable radial placement is
+usually enough: place three to eight attractors around the graph, then let
+repositories find their balance.
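A stable radial placement can be as simple as spacing attractors evenly on a circle in a fixed order. The function name, the default radius, and the choice to sort by id are assumptions made for this sketch.

```python
import math


def ring_positions(attractor_ids, radius=600.0, center=(0.0, 0.0)):
    """Place attractors evenly on a circle, starting at the top.

    Ids are sorted first so the same set of attractors always lands in the
    same positions, regardless of input order.
    """
    ordered = sorted(attractor_ids)
    count = len(ordered)
    positions = {}
    for index, attractor_id in enumerate(ordered):
        # Screen coordinates: -pi/2 points up, angles then advance clockwise.
        angle = -math.pi / 2 + 2 * math.pi * index / count
        positions[attractor_id] = (
            center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle),
        )
    return positions
```

The resulting coordinates can be used as pinned positions for the attractor nodes while repositories remain free to move.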
+
+## View Payload Shape
+
+The graph explorer payload should be able to carry attractor metadata without
+changing the canonical Fabric graph.
+
+Recommended top-level view extension:
+
+```json
+{
+  "view": {
+    "attractors": {
+      "enabled": true,
+      "definitionSet": "repo-concerns-v1",
+      "definitions": [
+        {
+          "id": "security",
+          "label": "Security",
+          "description": "Identity, authorization, secrets, audit, and policy.",
+          "color": "#be123c"
+        }
+      ],
+      "scores": [
+        {
+          "attractor_id": "security",
+          "element_id": "repo:flex-auth",
+          "score": 0.82,
+          "confidence": 0.74,
+          "method": "lexical_semantic_profile",
+          "evidence": [
+            {
+              "source": "SCOPE.md",
+              "section": "Core Idea",
+              "terms": ["authorization", "policy"]
+            }
+          ]
+        }
+      ]
+    }
+  }
+}
+```
+
+The renderer may choose to materialize these into synthetic nodes and edges at
+runtime. A host may also emit synthetic display-only elements directly if that
+is easier for the current engine.
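Materialization at runtime might look like the sketch below, which turns the `view.attractors` extension into synthetic nodes and `semantic_attraction` edges shaped like the example edge in Layout Mapping. The function name, the `semantic-attractor` node class, and the default threshold are assumptions.

```python
def materialize_attractors(view: dict, min_score: float = 0.15) -> list[dict]:
    """Build display-only attractor nodes and edges from a view payload."""
    attractors = view.get("attractors", {})
    if not attractors.get("enabled"):
        return []

    definitions = attractors.get("definitions", [])
    elements = [
        {
            "data": {
                "id": f"attractor:{definition['id']}",
                "label": definition["label"],
                "displayOnly": True,
            },
            "classes": "semantic-attractor",  # hypothetical class name
        }
        for definition in definitions
    ]

    known_ids = {definition["id"] for definition in definitions}
    for entry in attractors.get("scores", []):
        # Skip scores for unknown attractors or below the layout threshold.
        if entry["attractor_id"] not in known_ids or entry["score"] < min_score:
            continue
        source = f"attractor:{entry['attractor_id']}"
        elements.append(
            {
                "data": {
                    "id": f"{source}->{entry['element_id']}",
                    "source": source,
                    "target": entry["element_id"],
                    "edgeType": "semantic_attraction",
                    "displayOnly": True,
                    "score": entry["score"],
                    "confidence": entry["confidence"],
                },
                "classes": "semantic-attraction",
            }
        )
    return elements
```

Because every generated element carries `displayOnly: true`, dependency and zone diagnostics can filter these elements out with a single check.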
+
+## Operator Workflow
+
+A useful attractor workflow should feel like mapmaking:
+
+1. Choose a preset such as `Security / Development / Operations`.
+2. Review the generated scores and evidence for a few known repos.
+3. Hide or pin attractors that are not useful for the current question.
+4. Save the attractor definition set in the graph profile.
+5. Use the resulting layout to discover ambiguous, central, or misplaced repos.
+
+The UI should expose:
+
+- a toggle for semantic attractors;
+- a definition-set selector;
+- score threshold controls;
+- optional visual attraction edges;
+- pinned/unpinned attractor placement;
+- detail panels explaining why a repo is close to an attractor;
+- diagnostics for missing evidence, low confidence, and overly broad
+  attractors.
+
+## Relationship To Zones
+
+Zones and attractors solve different orientation problems.
+
+Zones are bounded drawing surfaces. A visible node belongs to zero or one zone
+in a given view. They are useful for deployment environments, access zones,
+ownership surfaces, and other container-like questions.
+
+Attractors are semantic force points. A visible node can be pulled by multiple
+attractors at once. They are useful for topical orientation, concern mapping,
+and discovering conceptual neighborhoods.
+
+The two concepts can combine cleanly:
+
+- zones can show where entities run;
+- attractors can pull repos inside or outside those zones based on semantic
+  concern;
+- zone diagnostics should ignore semantic attraction edges unless explicitly
+  configured otherwise;
+- attractor scores can be summarized inside zone details.
+
+## Initial Presets
+
+A first repository-orientation preset should keep the set small:
+
+| Attractor | Topic Signal |
+|-----------|--------------|
+| `security` | identity, secrets, authorization, policy, audit, MFA, trust boundaries |
+| `development` | source code, build, CI/CD, package publishing, scaffolding, developer workflows |
+| `operations` | deployment, runtime, monitoring, backups, incidents, infrastructure lifecycle |
+
+Useful follow-up presets:
+
+- `data`, `identity`, `delivery`, `governance`
+- `platform`, `application`, `tooling`
+- `financial`, `runtime`, `coordination`
+
+Attractors should start as operator-chosen presets rather than global truth.
+The same repository can be viewed through different conceptual lenses.
+
+## Implementation Path
+
+The concept can be implemented incrementally:
+
+1. Add an attractor definition format for graph explorer profiles.
+2. Parse repo `SCOPE.md` files during registry sync or graph export.
+3. Compute transparent lexical scores for repositories.
+4. Include attractor scores and evidence in the graph explorer payload.
+5. Add synthetic attractor nodes and display-only attraction edges in the UI.
+6. Map attraction scores to layout hints for the force-directed layout.
+7. Add detail-panel evidence and low-confidence diagnostics.
+8. Support saved attractor presets and operator score overrides.
+
+This keeps attractors as a view concern until the scoring model proves useful.
+If a semantic relation becomes durable domain knowledge, it can later be
+promoted into a proper Fabric declaration with separate evidence and review.
+
+## Open Questions
+
+- Should attractor definitions live in graph profiles, repo config, or a shared
+  registry preset file?
+- Should scoring run during registry sync, export, or entirely in the browser?
+- How much operator override should be allowed before scores become maintained
+  labels rather than computed evidence?
+- What is the right default for missing or stale `SCOPE.md` evidence?
+- Should the first implementation use only lexical scoring, or should it also
+  prepare a pluggable embedding scorer interface?