markitect-main

Author	SHA1	Message	Date
tegwick	da9c5fce80	infospace: process book-4-chapter-02 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 21:19:39 +01:00
tegwick	cd87ebfdc0	infospace: process book-4-chapter-01 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 21:13:08 +01:00
tegwick	666f78d1ba	infospace: process book-4-introduction Extract entities, map to VSM, and synthesize analysis.	2026-02-19 21:02:00 +01:00
tegwick	579e02989b	infospace: process book-3-chapter-04 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 20:46:20 +01:00
tegwick	8401c69ff2	infospace: process book-3-chapter-03 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 20:40:35 +01:00
tegwick	1b9a31665c	fix(pipeline): retry on all LLM errors (not just rate limits) Free-tier APIs intermittently return invalid JSON or empty responses. Now any exception in _call_llm retries up to 3 times with a 5s back-off, rather than failing immediately on non-rate-limit errors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 20:32:23 +01:00
tegwick	06e904ccf5	infospace: process book-3-chapter-02 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 20:30:22 +01:00
tegwick	59d42b1665	infospace: process book-3-chapter-01 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 20:18:15 +01:00
tegwick	8c11e13fef	infospace: process book-2-chapter-05 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 20:03:11 +01:00
tegwick	ac4e508aff	infospace: process book-2-chapter-04 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 19:57:59 +01:00
tegwick	8e1943afdb	infospace: process book-2-chapter-03 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 19:50:53 +01:00
tegwick	05711e541d	infospace: process book-2-chapter-02 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 19:43:19 +01:00
tegwick	8cb9ee6f6e	infospace: process book-2-chapter-01 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 19:26:57 +01:00
tegwick	db129fde6b	infospace: process book-1-chapter-11 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 19:19:20 +01:00
tegwick	6d9ec4e34b	infospace: process book-1-chapter-10 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 18:59:36 +01:00
tegwick	679f482e49	config(example): increase extract-entities max_tokens to 8000 Chapters with many pre-existing entities were still truncating at 6000 tokens because the LLM needs space to output the full list of candidates even when most are skipped. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 18:48:33 +01:00
tegwick	368571905a	infospace: process book-1-chapter-09 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:58:08 +01:00
tegwick	9c95912d68	infospace: process book-1-chapter-08 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:47:12 +01:00
tegwick	0828581269	infospace: process book-1-chapter-07 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:40:24 +01:00
tegwick	283abac378	infospace: process book-1-chapter-06 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:29:59 +01:00
tegwick	90ca14dd85	config(example): increase max_tokens for map-to-vsm (10k) and synthesize (4k) map-to-vsm was consistently truncating at 6000 tokens; synthesize-analysis sometimes truncated at 3000 for chapters with many entities. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 15:21:04 +01:00
tegwick	098b781f92	infospace: process book-1-chapter-05 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:20:35 +01:00
tegwick	eea397a380	infospace: process book-1-chapter-04 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:12:54 +01:00
tegwick	7615beb139	chore(example): update metrics after chapter-03 collection check Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 15:06:03 +01:00
tegwick	c2e06c15d7	infospace: process book-1-chapter-03 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:04:57 +01:00
tegwick	df1fdf1842	feat(pipeline): per-stage max_tokens, LLM provenance, processing log - PipelineStage now supports max_tokens to override the 4096 default - SourcePipeline records provider/model on each entity file as HTML comment - output/processing-log.yaml tracks tokens, cost, duration, retries, errors - _call_llm returns (content, metadata) for downstream traceability - _http.py wraps JSON parse errors with body preview for debugging - infospace.yaml stages: extract/map=6000 tokens, synthesize=3000 tokens Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 14:50:49 +01:00
tegwick	5ede1de4b8	fix(pipeline): retry on 0-entity response, save raw debug, improve template - SourcePipeline: retry split_entities stage once when 0 entity delimiters are found (free-tier models intermittently return short non-formatted responses); save raw LLM response to <stage>-raw.md alongside prompts - Return None (pause pipeline) rather than writing empty view file when no entities found after max retries - _http.py: wrap json.JSONDecodeError in LLMAPIError with body preview - extract-entities.md: add explicit H2-heading format example to Output Format section to prevent models from using inline "Section:" format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 14:26:28 +01:00
tegwick	72d9904485	feat(infospace): add process command for batch source file processing - Extend PipelineStage with name, output_dir, output_macro, split_entities, and macros fields for declarative pipeline config - Add SourcePipeline class (pipeline.py) using simple @{macro} substitution — no SQLite dependency, skip-if-exists per stage, LLM retry on rate limits, git commit per source - Add `markitect infospace process [GLOB_PATTERN]` CLI command with --all, --provider, --model, --check-after-each, --no-commit flags - Update infospace.yaml with output_dir, output_macro, split_entities, and macros for each pipeline stage in the WoN example Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 13:29:50 +01:00
tegwick	77dd3fee6d	fix(example): standardise domain enum and source chapter format in schema/rules Two root causes of metric fragmentation observed in collection checks: 1. Schema's Economic Domain used free-form examples ("labour economics, trade theory") which overrode the enum in extraction-rules.md, causing the LLM to produce multi-domain strings and non-canonical values. Fix: schema now specifies the exact 7-value enum with descriptions. 2. Source Chapter had no format constraint, producing 9 different formats for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks). Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly. These fixes are prerequisites for clean reprocessing (S3.2 continuation). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 13:02:05 +01:00
tegwick	715ef19d1c	infospace: remove example output — will replay chapter by chapter This commit clears the tangled example output so each chapter can be re-committed cleanly via S3.2.	2026-02-19 09:22:55 +01:00
tegwick	3ac8447c10	feat(example): add baseline metrics snapshot from collection checks run Initial metrics from S2.4 checks on 85 entities (7 of 35 chapters): coverage_ratio=0.361, redundancy=0.0, coherence_components=0.0, consistency_cycles=0.0, granularity_entropy=2.69 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 07:44:01 +01:00
tegwick	94cb2063af	feat(example): migrate to infospace config with tooling integration (S3.1) Add infospace.yaml declaring topic, disciplines, schemas, viability thresholds. Integrate infospace tooling into process_chapters.py with --infospace-status, --infospace-check, and --infospace-viability flags. Initial check: 85 entities, 4/5 viable (coverage 0.36 < 0.50 — only 7/35 chapters processed so far). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 02:29:53 +01:00
tegwick	d1c6e53754	docs: add infospace primitives reference (S2.7) Reference document covering all infospace tooling primitives: config, entity metadata, schema validation, per-entity evaluation, collection checks, metrics history, viability, composition, and CLI commands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 02:05:09 +01:00
tegwick	b76d6d38c1	feat(infospace): add composition model for discipline binding (S2.6) Discipline resolution, viability checking, entity access, stale mapping detection, and binding management. CLI commands: bind-discipline, disciplines, stale-mappings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 02:03:54 +01:00
tegwick	ce7f78d57d	feat(infospace): add metrics history and viability tracking (S2.5) History module with snapshot creation from check results, metrics file I/O, auto-append to history after checks, date-based snapshot lookup, and metric trend extraction. CLI commands: history, history-diff. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 02:01:00 +01:00
tegwick	11585e6968	feat(infospace): add collection-level quality checks C1–C5 (S2.4) Five concern checks: Redundancy (embedding/word overlap), Coverage (FCA gap analysis), Coherence (graph connectivity), Consistency (cycle detection), Granularity (Shannon entropy). Orchestrator runs all or selected checks, CLI `markitect infospace check` command added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:54:22 +01:00
tegwick	3461d2f354	feat(infospace): add per-entity evaluation pipeline and CLI command (S2.3) Evaluation pipeline builds prompts from entity metadata, delegates to BatchEvaluator, parses structured LLM responses into ScoreEntry objects, and writes evaluation files. CLI: 'markitect infospace evaluate' with --provider, --entity, --chapter filters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:48:34 +01:00
tegwick	3726503adb	feat(infospace): add lifecycle CLI commands — init, status, entities, viability (S2.2) Adds 'markitect infospace' command group with init (create config), status (entity count/domains/disciplines), entities (list with sort), and viability (threshold dashboard with pass/fail). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:46:54 +01:00
tegwick	b20fe4db68	feat(infospace): add infospace configuration model and state (S2.1) InfospaceConfig (topic, disciplines, schemas, competency questions, viability thresholds, pipeline) with YAML load/save and directory discovery. InfospaceState aggregates entities, evaluations, and viability checks for status reporting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:44:14 +01:00
tegwick	144a88c0c2	feat(prompts): add batch LLM evaluation orchestrator (S1.6) BatchEvaluator runs evaluation prompts across item batches with incremental evaluation (skip unchanged via content digest), per-item error isolation, progress callbacks, and aggregate token usage tracking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:40:13 +01:00
tegwick	dc22017b7c	feat(analysis): add Formal Concept Analysis for coverage gap detection (S1.7) Pure-Python FCA implementation: FormalContext (entity × attribute binary relation with extent/intent/closure), ConceptLattice via NextClosure algorithm, find_gap_concepts() for structural coverage gaps, and find_empty_cells() for cross-tabulation analysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:38:35 +01:00
tegwick	f8c9ab33f0	feat(infospace): add structured evaluation output with history and diffing (S1.5) Add data models (ScoreEntry, EntityEvaluation, EvaluationSnapshot, SnapshotDiff) and I/O utilities for YAML frontmatter evaluation files, snapshot persistence, history append, and snapshot diffing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:35:22 +01:00
tegwick	bad01e32bd	feat(analysis): add graph analysis utilities with networkx (S1.4) Add connected components, betweenness centrality, Louvain community detection, modularity scoring, degree distribution, and cohesion/coupling computation. Wraps DependencyGraph via networkx (optional dependency) for downstream collection-level coherence metrics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:34:53 +01:00
tegwick	267368eb60	feat(llm): add embedding adapter with cache and similarity utils (S1.3) Add OpenAI-compatible embedding support (works with both OpenAI and OpenRouter), file-based embedding cache with content-digest invalidation, and pure-Python cosine similarity utilities for downstream redundancy detection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 01:22:21 +01:00
tegwick	9031e1162c	feat(infospace): add schema compliance validator (S1.2) Deterministic validation of EntityMeta against declarative schemas: section presence/word counts, heading format, domain enum values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 00:48:57 +01:00
tegwick	03c6c5e8de	feat(infospace): add entity metadata parser (S1.1) Extract section-tree algorithm from SchemaGenerator into standalone core/section_tree.py and build markitect/infospace/ package with EntityMeta dataclass and parse_entity_file/parse_entity_directory. Foundation for schema compliance, coverage, and granularity metrics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 00:27:45 +01:00
tegwick	b5e994b014	docs: preliminary introduction to Viable Information Spaces Conceptual overview of infospaces as structured, evaluable, composable knowledge collections. Establishes the vocabulary (topic, discipline, entity, viability), the build cycle (extract, map, evaluate, refine), the five collection quality concerns, and the composition model (hierarchical, networked, swarm). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 23:54:53 +01:00
tegwick	4ce856d4d0	docs: metrics methodology, collection-level tasks, and infospace tooling roadmap Add METRICS-METHODOLOGY.md documenting the theoretical frameworks (SEQUAL, OntoClean, OOPS!, OntoQA, FCA, DSL principles) adapted for two-layer evaluation (LLM-Eval + deterministic aggregation) across five collection concerns: redundancy, coverage, coherence, consistency, and granularity balance. Extend INFRA-TASKS.md with assignment assessment (tasks 4-7), per-concept metrics (tasks 8-12), and collection-level metrics (tasks 13-19). Add roadmap/infospace-tooling/PLAN.md defining terminology (infospace, topic, discipline, entity, evaluation, viability) and a three-stage implementation plan: Stage 1 platform additions, Stage 2 infospace tooling layer, Stage 3 example revision. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 23:53:21 +01:00
tegwick	2f0989f9bf	docs(infospace): document infospace.db and add to .gitignore The SQLite artifact database is a derived cache regenerable from committed files — no LLM calls needed. Added tutorial section explaining why it is excluded and how to rebuild it after a fresh clone. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 22:27:08 +01:00
tegwick	60f33443ae	feat(schema): add semantic schema generation as default mode schema-generate now builds content-aware schemas from the document's section hierarchy instead of counting markdown syntax elements. Detects key-value tables, data tables, link lists, and mixed content patterns to produce schemas that reflect the actual document outline. Old behavior preserved via --mode syntactic. Validator and visualization tools pinned to syntactic mode for compatibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 18:49:50 +01:00

590 Commits