Commit Graph

9 Commits

Author SHA1 Message Date
81a4c8796a feat(infospace): add L2 entity classification with type × VSM matrix (S2.9)
Implements the L2 typed-entities layer — each entity is assigned an
Entity Type (Element, Process, Relation, Principle, Institution) and a
VSM System (S1–S5) by an LLM, with one-sentence rationales for each.

New modules:
- markitect/infospace/classification.py — EntityClassification dataclass
  + ENTITY_TYPES / VSM_SYSTEMS controlled vocabularies
- markitect/infospace/classification_io.py — write/read classification
  files (YAML frontmatter + markdown body, mirrors evaluation_io)
- markitect/infospace/classifier.py — build_classification_prompt(),
  parse_classification_response(), run_entity_classification(); batch
  runner writes files incrementally (same resumable pattern as evaluate)

CLI: markitect infospace classify [--entity SLUG] [--provider P] [--model M]
  - Incremental skip: checks output/classifications/ for existing files
  - Defaults to openrouter provider; 2000 max_tokens (Gemini 2.5 Flash
    uses ~787 thinking tokens, so 800 was too low)

CLI: markitect infospace classify-summary [--update-metrics]
  - Entity type counts + VSM system counts with percentages
  - 5 × 6 type × VSM matrix (spots structural blind spots at a glance)
  - --update-metrics writes type_distribution, type_entropy,
    vsm_type_matrix_cells to metrics.yaml

Config: InfospaceConfig gains classifications_dir (default output/classifications)
Schema: schemas/typed-entity-schema-v1.0.md — type/VSM vocabulary tables,
  rationale format rules, validation rules, metrics enabled at L2
infospace.yaml: schemas.typed_entity references typed-entity-schema-v1.0.md

Seed classifications (3): division_of_labour (Process/S1),
  natural_price_as_central_price (Principle/S2),
  invisible_hand_mechanism (Principle/S4)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:35:58 +01:00
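
Note on 81a4c8796a: the type × VSM matrix and the type_entropy metric can be sketched as below. This is a minimal illustration, not the code in classification.py. The field names on EntityClassification, the use of Shannon entropy in bits, and the 5 × 5 shape (no totals or "unclassified" column) are all assumptions; only the vocabularies and the three seed classifications come from the commit message.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Controlled vocabularies as listed in the commit message.
ENTITY_TYPES = ("Element", "Process", "Relation", "Principle", "Institution")
VSM_SYSTEMS = ("S1", "S2", "S3", "S4", "S5")


@dataclass(frozen=True)
class EntityClassification:
    # Field names are hypothetical; the real dataclass may differ.
    slug: str
    entity_type: str
    vsm_system: str

    def __post_init__(self):
        # Reject values outside the controlled vocabularies.
        if self.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.entity_type}")
        if self.vsm_system not in VSM_SYSTEMS:
            raise ValueError(f"unknown VSM system: {self.vsm_system}")


def type_entropy(items):
    """Shannon entropy (bits) of the entity-type distribution (assumed definition)."""
    counts = Counter(c.entity_type for c in items)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def type_vsm_matrix(items):
    """Count entities per (type, VSM system) cell; empty cells stay at 0."""
    matrix = {t: {s: 0 for s in VSM_SYSTEMS} for t in ENTITY_TYPES}
    for c in items:
        matrix[c.entity_type][c.vsm_system] += 1
    return matrix


# The three seed classifications named in the commit message.
seeds = [
    EntityClassification("division_of_labour", "Process", "S1"),
    EntityClassification("natural_price_as_central_price", "Principle", "S2"),
    EntityClassification("invisible_hand_mechanism", "Principle", "S4"),
]

if __name__ == "__main__":
    for entity_type, row in type_vsm_matrix(seeds).items():
        print(f"{entity_type:<12}", " ".join(str(row[s]) for s in VSM_SYSTEMS))
    print(f"type_entropy = {type_entropy(seeds):.3f} bits")
```

Cells that stay at zero are the "structural blind spots" the summary command is meant to surface.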
2d45425b25 feat(infospace): add L3 relation graph with VSM-aware triplets (S2.8)
Implements the L3 relation graph layer — a directed graph of (Subject,
Predicate, Object) triplets annotated with VSM channel codes and feedback
roles. Triplets are authored as markdown files under output/relations/,
parsed into RelationMeta dataclasses, and analysed with networkx.

New modules:
- markitect/infospace/relation_models.py — RelationMeta dataclass +
  RELATION_TYPES controlled vocabulary (15 relation classes → VSM codes)
- markitect/infospace/relation_parser.py — parse_relation_file() and
  parse_relations_directory()

New schema: examples/infospace-with-history/schemas/relation-schema-v1.0.md
  — file naming convention, required sections, controlled vocabulary table

15 seed relation files covering the three core WoN feedback loops:
  - Capital Accumulation loop (positive reinforcement, S1/S3)
  - Market Price Balancing loop (negative feedback, S2/S3)
  - Market Extent mutual dependency (S1/S2)
  Plus structural relations: wages regulation, rent residual, price
  decomposition, invisible hand coordination

CLI: markitect infospace relations [--entity SLUG] [--vsm FILTER]
     [--loops] [--stats]
  - Builds directed graph from parsed files
  - Detects feedback loops via nx.simple_cycles()
  - 6 loops found from 15 seed relations (3 intended + 3 emergent)
  - --stats aggregates by VSM system code (strips parentheticals)

Config: InfospaceConfig gains relations_dir (default output/relations)
infospace.yaml: schemas.relation references relation-schema-v1.0.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 06:04:28 +01:00
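
Note on 2d45425b25: the commit detects feedback loops with nx.simple_cycles(). The sketch below shows the same idea without the networkx dependency, using a small depth-first enumeration that is only suitable for graphs of this size. It is a stand-in, not the project's code. The triplets, predicate names, and function names are hypothetical examples and are not taken from the seed relation files.

```python
def build_graph(triplets):
    """Build an adjacency map from (subject, predicate, object) triplets."""
    graph = {}
    for subject, _predicate, obj in triplets:
        graph.setdefault(subject, set()).add(obj)
        graph.setdefault(obj, set())  # make sure sink nodes are present
    return graph


def simple_cycles(graph):
    """Enumerate each simple cycle once.

    A cycle is reported only from its smallest node, and the search from
    that node never visits smaller nodes, so rotations are not duplicated.
    """
    cycles = []
    for start in sorted(graph):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in sorted(graph[node]):
                if nxt == start:
                    cycles.append(path)
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))
    return cycles


# Hypothetical triplets: one reinforcing loop, one balancing loop,
# and one structural relation that closes no loop.
triplets = [
    ("capital_stock", "enables", "division_of_labour"),
    ("division_of_labour", "raises", "productivity"),
    ("productivity", "increases", "capital_stock"),
    ("market_price", "signals", "supply"),
    ("supply", "corrects", "market_price"),
    ("rent", "residual_of", "market_price"),
]

loops = simple_cycles(build_graph(triplets))

if __name__ == "__main__":
    for loop in loops:
        print(" -> ".join(loop + [loop[0]]))
```

Because every cycle in the graph is reported, relations authored for different loops can combine into cycles nobody wrote down explicitly, which is one plausible reading of the "emergent" loops mentioned in the commit.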
7f1eecbdb2 feat(infospace): add eval-summary command and improve evaluate pipeline (S3.3)
- Fix evaluate dimensions to match template file:
  definition_precision, source_grounding, domain_placement,
  vsm_relevance, explanatory_value (was domain_relevance,
  discipline_alignment, conceptual_clarity)
- Add VSM background context to evaluation prompt so LLM can
  score vsm_relevance without macro injection
- Fix model_name bug: was sending literal "default" to API (HTTP 400)
- Refactor run_entity_evaluation to write files incrementally via
  callback rather than all at once after the batch — long runs are
  now resumable if interrupted
- Add incremental skip in CLI: entities with existing eval files
  are skipped automatically on re-run (acts as resume)
- Add eval-summary command: reads all eval files, shows per-dimension
  means, optionally writes per_entity_mean to metrics.yaml
- Fix record_check_results to merge rather than overwrite metrics.yaml
  so per_entity_mean survives subsequent check runs
- Add per_entity_mean viability threshold (min: 3.5) to infospace.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 01:26:45 +01:00
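
Note on 7f1eecbdb2: the resumable behaviour described in this commit comes from two things working together: each result is written as soon as it is produced, and entities that already have an output file are skipped on the next run. The sketch below illustrates that pattern only. The function names, the .md file naming, and the evaluator signature are assumptions, and the skip check is folded into the runner here although the commit places it in the CLI.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable, Iterable


def run_batch(slugs: Iterable[str],
              evaluate: Callable[[str], str],
              out_dir: Path) -> list[str]:
    """Evaluate each slug, writing its file immediately; skip existing files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[str] = []

    def on_result(slug: str, text: str) -> None:
        # Called once per entity, so a crash loses at most the entity in flight.
        (out_dir / f"{slug}.md").write_text(text, encoding="utf-8")
        written.append(slug)

    for slug in slugs:
        if (out_dir / f"{slug}.md").exists():
            continue  # incremental skip: acts as resume on re-run
        on_result(slug, evaluate(slug))
    return written


def flaky(slug: str) -> str:
    """Stand-in evaluator that fails on the third entity."""
    if slug == "c":
        raise RuntimeError("simulated interruption")
    return f"# eval {slug}\n"


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "evaluations"
    try:
        run_batch(["a", "b", "c"], flaky, out)
    except RuntimeError:
        pass  # the first run dies part-way through
    done_after_crash = sorted(p.stem for p in out.glob("*.md"))
    # Second run only processes what is still missing.
    resumed = run_batch(["a", "b", "c"], lambda s: f"# eval {s}\n", out)
```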
9c32ad1837 fix(infospace): exclude raw LLM output from entity parsing; lower coverage threshold
- Add `.*-raw\.md$` to `_DEFAULT_EXCLUDE_PATTERNS` in entity_parser.py to
  prevent per-chapter raw LLM output files from being parsed as entities.
  This eliminates 33 malformed domain values where delimiter text was
  bleeding into the Economic Domain field.
- Lower coverage_ratio threshold from 0.50 → 0.40 in infospace.yaml to
  reflect realistic multi-book corpus expectations (documented rationale
  in METRICS-METHODOLOGY.md).

Post-fix metrics: 988 entities, 0 malformed, coverage_ratio=0.619 (pass).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 09:28:20 +01:00
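
Note on 9c32ad1837: the effect of the new exclude pattern can be checked in isolation. Only the regex itself comes from the commit; the helper name, the use of re.search, and the sample filenames are assumptions made for this sketch, and the real _DEFAULT_EXCLUDE_PATTERNS list may contain other entries.

```python
import re

# Only this pattern is known from the commit message.
_DEFAULT_EXCLUDE_PATTERNS = [r".*-raw\.md$"]


def is_excluded(filename, patterns=_DEFAULT_EXCLUDE_PATTERNS):
    """Return True if the filename matches any exclude pattern."""
    return any(re.search(p, filename) for p in patterns)


# Hypothetical directory listing.
files = [
    "division_of_labour.md",   # ordinary entity file: kept
    "book1-ch01-raw.md",       # per-chapter raw LLM output: excluded
    "raw-materials.md",        # "raw" is not a "-raw.md" suffix: kept
    "notes-raw.md.bak",        # pattern is anchored at the end: kept
]

kept = [f for f in files if not is_excluded(f)]

if __name__ == "__main__":
    print(kept)
```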
679f482e49 config(example): increase extract-entities max_tokens to 8000
Chapters with many pre-existing entities were still truncating at 6000 tokens
because the LLM needs space to output the full list of candidates even when
most are skipped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 18:48:33 +01:00
90ca14dd85 config(example): increase max_tokens for map-to-vsm (10k) and synthesize (4k)
map-to-vsm was consistently truncating at 6000 tokens; synthesize-analysis
sometimes truncated at 3000 for chapters with many entities.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 15:21:04 +01:00
df1fdf1842 feat(pipeline): per-stage max_tokens, LLM provenance, processing log
- PipelineStage now supports max_tokens to override the 4096 default
- SourcePipeline records provider/model on each entity file as HTML comment
- output/processing-log.yaml tracks tokens, cost, duration, retries, errors
- _call_llm returns (content, metadata) for downstream traceability
- _http.py wraps JSON parse errors with body preview for debugging
- infospace.yaml stages: extract/map=6000 tokens, synthesize=3000 tokens

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 14:50:49 +01:00
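
Note on df1fdf1842: a per-stage max_tokens override with a 4096 fallback, plus a call that returns (content, metadata), might look roughly like this. It is a sketch under assumptions: the real PipelineStage has more fields, call_llm here is a network-free stub standing in for _call_llm, and the metadata keys and the exact HTML comment format are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MAX_TOKENS = 4096  # default named in the commit message


@dataclass
class PipelineStage:
    name: str
    max_tokens: Optional[int] = None  # None means "use the default"

    def effective_max_tokens(self) -> int:
        if self.max_tokens is not None:
            return self.max_tokens
        return DEFAULT_MAX_TOKENS


def call_llm(stage, prompt, provider, model):
    """Stub standing in for the real call; returns (content, metadata)."""
    content = f"[stub response to a {len(prompt)}-character prompt]"
    metadata = {
        "provider": provider,
        "model": model,
        "max_tokens": stage.effective_max_tokens(),
    }
    return content, metadata


def provenance_comment(metadata):
    """Render provider/model as an HTML comment (format is an assumption)."""
    return f"<!-- provider: {metadata['provider']}, model: {metadata['model']} -->"


stage = PipelineStage("map-to-vsm", max_tokens=6000)
content, meta = call_llm(stage, "Map these entities.", "openrouter", "example-model")

if __name__ == "__main__":
    print(provenance_comment(meta))
    print(meta["max_tokens"])
```

Returning the metadata alongside the content keeps the provenance decision with the caller, which can then stamp entity files and append to the processing log from the same dictionary.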
72d9904485 feat(infospace): add process command for batch source file processing
- Extend PipelineStage with name, output_dir, output_macro,
  split_entities, and macros fields for declarative pipeline config
- Add SourcePipeline class (pipeline.py) using simple @{macro}
  substitution — no SQLite dependency, skip-if-exists per stage,
  LLM retry on rate limits, git commit per source
- Add `markitect infospace process [GLOB_PATTERN]` CLI command with
  --all, --provider, --model, --check-after-each, --no-commit flags
- Update infospace.yaml with output_dir, output_macro, split_entities,
  and macros for each pipeline stage in the WoN example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 13:29:50 +01:00
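
Note on 72d9904485: the "simple @{macro} substitution" can be approximated with a single regular expression. Only the @{name} syntax comes from the commit; the identifier grammar, the choice to raise on an undefined macro, and the macro names in the example are assumptions.

```python
import re

# Matches @{name}, where name is an identifier (assumed grammar).
_MACRO = re.compile(r"@\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_macros(template, values):
    """Replace every @{name} in template with values[name]."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            # Failing loudly avoids sending a half-filled prompt to the LLM.
            raise KeyError(f"undefined macro: {name}")
        return values[name]

    return _MACRO.sub(replace, template)


# Hypothetical prompt template and macro values.
prompt = expand_macros(
    "Extract entities from:\n@{source_text}\n\nAlready known: @{known_entities}",
    {"source_text": "Chapter 1 text", "known_entities": "division_of_labour"},
)

if __name__ == "__main__":
    print(prompt)
```

Replacement values are inserted verbatim and are not rescanned, so source text that happens to contain @{...} is not expanded a second time.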
94cb2063af feat(example): migrate to infospace config with tooling integration (S3.1)
Add infospace.yaml declaring topic, disciplines, schemas, viability
thresholds. Integrate infospace tooling into process_chapters.py with
--infospace-status, --infospace-check, and --infospace-viability flags.

Initial check: 85 entities, 4/5 viable (coverage 0.36 < 0.50 — only
7/35 chapters processed so far).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 02:29:53 +01:00
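
Note on 94cb2063af: a viability check of the kind reported here compares each recorded metric against its threshold. The sketch below is illustrative only. The function name and the dictionary layout are assumptions; the "min" key follows the "(min: 3.5)" notation used in a later commit, and the coverage figures are the ones quoted in the message above.

```python
def check_viability(metrics, thresholds):
    """Return {metric_name: passed} for every metric that has a threshold."""
    results = {}
    for name, rule in thresholds.items():
        value = metrics.get(name)
        # A metric that was never recorded counts as failing.
        passed = value is not None
        if passed and "min" in rule:
            passed = value >= rule["min"]
        results[name] = passed
    return results


# Figures quoted in the commit message: coverage 0.36 against a 0.50 minimum.
thresholds = {"coverage_ratio": {"min": 0.50}}
metrics = {"coverage_ratio": 0.36}

results = check_viability(metrics, thresholds)

if __name__ == "__main__":
    for name, passed in results.items():
        print(f"{name}: {'pass' if passed else 'fail'}")
```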