markitect-main

Author	SHA1	Message	Date
tegwick	5245dbbfc8	infospace: process book-4-chapter-08 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 22:25:52 +01:00
tegwick	4319d2a32b	infospace: process book-4-chapter-07 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 22:14:18 +01:00
tegwick	efdaa884c8	infospace: process book-4-chapter-06 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 22:01:44 +01:00
tegwick	2804de3d24	infospace: process book-4-chapter-05 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 21:47:52 +01:00
tegwick	3e96ac7b8d	infospace: process book-4-chapter-04 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 21:36:17 +01:00
tegwick	a687e508f3	infospace: process book-4-chapter-03 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 21:31:40 +01:00
tegwick	da9c5fce80	infospace: process book-4-chapter-02 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 21:19:39 +01:00
tegwick	cd87ebfdc0	infospace: process book-4-chapter-01 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 21:13:08 +01:00
tegwick	666f78d1ba	infospace: process book-4-introduction Extract entities, map to VSM, and synthesize analysis.	2026-02-19 21:02:00 +01:00
tegwick	579e02989b	infospace: process book-3-chapter-04 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 20:46:20 +01:00
tegwick	8401c69ff2	infospace: process book-3-chapter-03 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 20:40:35 +01:00
tegwick	06e904ccf5	infospace: process book-3-chapter-02 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 20:30:22 +01:00
tegwick	59d42b1665	infospace: process book-3-chapter-01 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 20:18:15 +01:00
tegwick	8c11e13fef	infospace: process book-2-chapter-05 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 20:03:11 +01:00
tegwick	ac4e508aff	infospace: process book-2-chapter-04 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 19:57:59 +01:00
tegwick	8e1943afdb	infospace: process book-2-chapter-03 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 19:50:53 +01:00
tegwick	05711e541d	infospace: process book-2-chapter-02 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 19:43:19 +01:00
tegwick	8cb9ee6f6e	infospace: process book-2-chapter-01 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 19:26:57 +01:00
tegwick	db129fde6b	infospace: process book-1-chapter-11 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 19:19:20 +01:00
tegwick	6d9ec4e34b	infospace: process book-1-chapter-10 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 18:59:36 +01:00
tegwick	679f482e49	config(example): increase extract-entities max_tokens to 8000 Chapters with many pre-existing entities were still truncating at 6000 tokens because the LLM needs space to output the full list of candidates even when most are skipped. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 18:48:33 +01:00
tegwick	368571905a	infospace: process book-1-chapter-09 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:58:08 +01:00
tegwick	9c95912d68	infospace: process book-1-chapter-08 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:47:12 +01:00
tegwick	0828581269	infospace: process book-1-chapter-07 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:40:24 +01:00
tegwick	283abac378	infospace: process book-1-chapter-06 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:29:59 +01:00
tegwick	90ca14dd85	config(example): increase max_tokens for map-to-vsm (10k) and synthesize (4k) map-to-vsm was consistently truncating at 6000 tokens; synthesize-analysis sometimes truncated at 3000 for chapters with many entities. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 15:21:04 +01:00
tegwick	098b781f92	infospace: process book-1-chapter-05 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:20:35 +01:00
tegwick	eea397a380	infospace: process book-1-chapter-04 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:12:54 +01:00
tegwick	7615beb139	chore(example): update metrics after chapter-03 collection check Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 15:06:03 +01:00
tegwick	c2e06c15d7	infospace: process book-1-chapter-03 Extract entities, map to VSM, and synthesize analysis.	2026-02-19 15:04:57 +01:00
tegwick	df1fdf1842	feat(pipeline): per-stage max_tokens, LLM provenance, processing log - PipelineStage now supports max_tokens to override the 4096 default - SourcePipeline records provider/model on each entity file as HTML comment - output/processing-log.yaml tracks tokens, cost, duration, retries, errors - _call_llm returns (content, metadata) for downstream traceability - _http.py wraps JSON parse errors with body preview for debugging - infospace.yaml stages: extract/map=6000 tokens, synthesize=3000 tokens Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 14:50:49 +01:00
tegwick	5ede1de4b8	fix(pipeline): retry on 0-entity response, save raw debug, improve template - SourcePipeline: retry split_entities stage once when 0 entity delimiters are found (free-tier models intermittently return short non-formatted responses); save raw LLM response to <stage>-raw.md alongside prompts - Return None (pause pipeline) rather than writing empty view file when no entities found after max retries - _http.py: wrap json.JSONDecodeError in LLMAPIError with body preview - extract-entities.md: add explicit H2-heading format example to Output Format section to prevent models from using inline "Section:" format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 14:26:28 +01:00
tegwick	72d9904485	feat(infospace): add process command for batch source file processing - Extend PipelineStage with name, output_dir, output_macro, split_entities, and macros fields for declarative pipeline config - Add SourcePipeline class (pipeline.py) using simple @{macro} substitution — no SQLite dependency, skip-if-exists per stage, LLM retry on rate limits, git commit per source - Add `markitect infospace process [GLOB_PATTERN]` CLI command with --all, --provider, --model, --check-after-each, --no-commit flags - Update infospace.yaml with output_dir, output_macro, split_entities, and macros for each pipeline stage in the WoN example Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 13:29:50 +01:00
tegwick	77dd3fee6d	fix(example): standardise domain enum and source chapter format in schema/rules Two root causes of metric fragmentation observed in collection checks: 1. Schema's Economic Domain used free-form examples ("labour economics, trade theory") which overrode the enum in extraction-rules.md, causing the LLM to produce multi-domain strings and non-canonical values. Fix: schema now specifies the exact 7-value enum with descriptions. 2. Source Chapter had no format constraint, producing 9 different formats for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks). Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly. These fixes are prerequisites for clean reprocessing (S3.2 continuation). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 13:02:05 +01:00
tegwick	715ef19d1c	infospace: remove example output — will replay chapter by chapter This commit clears the tangled example output so each chapter can be re-committed cleanly via S3.2.	2026-02-19 09:22:55 +01:00
tegwick	3ac8447c10	feat(example): add baseline metrics snapshot from collection checks run Initial metrics from S2.4 checks on 85 entities (7 of 35 chapters): coverage_ratio=0.361, redundancy=0.0, coherence_components=0.0, consistency_cycles=0.0, granularity_entropy=2.69 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 07:44:01 +01:00
tegwick	94cb2063af	feat(example): migrate to infospace config with tooling integration (S3.1) Add infospace.yaml declaring topic, disciplines, schemas, viability thresholds. Integrate infospace tooling into process_chapters.py with --infospace-status, --infospace-check, and --infospace-viability flags. Initial check: 85 entities, 4/5 viable (coverage 0.36 < 0.50 — only 7/35 chapters processed so far). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 02:29:53 +01:00
tegwick	4ce856d4d0	docs: metrics methodology, collection-level tasks, and infospace tooling roadmap Add METRICS-METHODOLOGY.md documenting the theoretical frameworks (SEQUAL, OntoClean, OOPS!, OntoQA, FCA, DSL principles) adapted for two-layer evaluation (LLM-Eval + deterministic aggregation) across five collection concerns: redundancy, coverage, coherence, consistency, and granularity balance. Extend INFRA-TASKS.md with assignment assessment (tasks 4-7), per-concept metrics (tasks 8-12), and collection-level metrics (tasks 13-19). Add roadmap/infospace-tooling/PLAN.md defining terminology (infospace, topic, discipline, entity, evaluation, viability) and a three-stage implementation plan: Stage 1 platform additions, Stage 2 infospace tooling layer, Stage 3 example revision. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 23:53:21 +01:00
tegwick	2f0989f9bf	docs(infospace): document infospace.db and add to .gitignore The SQLite artifact database is a derived cache regenerable from committed files — no LLM calls needed. Added tutorial section explaining why it is excluded and how to rebuild it after a fresh clone. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 22:27:08 +01:00
tegwick	41773f1320	feat(llm): add OpenAI adapter, entity archive policy, process chapters 5-7 Add OpenAIAdapter for the OpenAI chat completions API (apikey-chatgpt.txt or OPENAI_API_KEY). Set default model to arcee-ai/trinity-large-preview:free for the infospace pipeline and increase max_tokens from 4096 to 8192. Reprocess chapter 05 with Trinity Large (was Gemini: 1 truncated entity, now 19 complete entities). Process chapters 06 (Aurora Alpha, 10 entities) and 07 (Trinity Large, 15 entities including regenerated violent-policy.md). Canonical set now at 85 unique entities. Add entity archive policy: entities are never silently deleted. Retired entities move to output/entities/archive/ with a dated reason header. New CLI option: --archive-entity <slug> --reason "...". The --list output shows the archive count alongside the canonical set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 23:39:44 +01:00
tegwick	880c1d1374	feat(llm): add Gemini adapter and process book-1-chapter-05 Add GeminiAdapter calling Google's Generative Language REST API (default model: gemini-2.5-flash). Register "gemini" as third provider in the factory and CLI. Add rate-limit retry with exponential backoff to the pipeline's _call_llm helper. Increase default max_tokens from 2000 to 4096. Process book-1-chapter-05 via Gemini free tier — 1 new entity extracted (necessaries-conveniencies-and-amusements-of-life), 41 existing entities correctly skipped by dedup. Canonical set now at 42 unique entities. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 22:54:37 +01:00
tegwick	2d1282a61e	feat(infospace): flat canonical entity set with cross-chapter deduplication Restructure entity storage from per-chapter subdirectories to a flat canonical set in output/entities/. Each entity exists as a single file; duplicates across chapters are detected by slug collision and skipped (first occurrence wins). Chapter views use {{ include }} transclusion to reference shared entity files. Add @{existing_entities} macro to extract-entities template so the LLM knows which entities already exist and focuses on genuinely new ones. Refactor _call_llm() from _execute_llm() for callers that handle their own file I/O. 41 unique entities from 4 chapters (2 duplicates removed). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 22:24:20 +01:00
tegwick	01b9596ce6	docs(examples): add infospace-with-history tutorial Comprehensive walkthrough covering schema design, prompt templates, artifact population, pipeline usage, LLM integration, git history tracking, metrics, and how to complete the remaining 31 chapters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 01:50:49 +01:00
tegwick	ad84dd3a41	infospace: process book-1-chapter-04 via OpenRouter All 3 stages (entities, mappings, analysis) auto-generated. 1m53s wall time, 9,478 tokens (real), ~$0.07 est. cost. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 01:42:05 +01:00
tegwick	e806a701ca	infospace: process book-1-chapter-03 with LLM integration Auto-generated mappings and analysis via Claude Code CLI adapter. Entities were already present from a previous session. Stats: 5m04s wall time, ~51K estimated tokens, ~$0.35 estimated cost. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 01:32:24 +01:00
tegwick	fecc2fd4fa	feat(llm): add LLM integration module with OpenRouter and Claude Code adapters Implements markitect/llm/ package with concrete LLMAdapter implementations: - OpenRouterAdapter: HTTP via urllib with retry/backoff on 429/5xx - ClaudeCodeAdapter: subprocess-based Claude CLI with stdin piping - Factory pattern: create_adapter("openrouter") or create_adapter("claude-code") - API key resolution chain: constructor > env var > project-root key file - 42 unit tests, 2 integration tests (gated on API key / CLI availability) Also adds the infospace-with-history example with Wealth of Nations VSM analysis pipeline, templates, schemas, source chapters, and processed output for chapters 1-2. process_chapters.py now supports --provider and --model flags for automatic LLM-driven processing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 01:17:58 +01:00
tegwick	360c3b1de2	feat(examples): add content-generator example demonstrating Prompt Dependency Resolution This example demonstrates the full workflow of generating InfoTech primers using MarkiTect's Prompt Dependency Resolution infrastructure. Features demonstrated: - Artifact creation and storage with content-based addressing - PromptTemplate with @{macro} resolution across multiple spaces - Automatic dependency tracking and graph construction - Provenance tracing from outputs back to inputs - Visualization export (Mermaid format) - Incremental execution with change detection Files added: - generate_primers.py: Complete working example - README.md: Quick start guide and architecture overview - TUTORIAL.md: Comprehensive 500+ line tutorial - templates/generate-primer.md: Template with macros - artifacts/topics/: ETL and Microservices topic definitions - artifacts/guidelines/: Authoring rules and research protocol - prepdr/: Original manual system (preserved for reference) Example output: - Generates 2 primers (ETL, Microservices) - Creates 8 artifacts across 4 information spaces - Records 8 dependency edges in SQLite database - Exports dependency graph visualization Run with: cd examples/content-generator && python generate_primers.py Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-09 23:50:07 +01:00
tegwick	5e3646fdff	feat: complete schema-evolution topic with ADR schema and markdown support This commit closes the schema-evolution topic (260105) by adding the final deliverable (ADR schema) and fixing markdown schema support across commands. ADR Schema Created: - Comprehensive Architecture Decision Record validation schema - 12 section classifications (7 required, 2 recommended, 2 optional, 3 improper/discouraged) - Content pattern validation for ADR formatting rules (status dates, decision statements, rationale structure) - Quality metrics for completeness (word counts, sentence counts) - Follows title case naming convention (Status, Context, Decision, etc.) Markdown Schema Support Fixed: - Fixed `markitect validate` command to support .md schemas - Added load_schema_from_path() for both .json and .md files - Updated structural and semantic validation to use schema dict - Fixed `markitect generate-stub` command to support .md schemas - Uses load_schema_from_path() instead of direct JSON loading - Created DocumentWrapper class in semantic_validator.py - Extracts headings from AST tokens (heading_open, inline) - Provides get_headings_by_level() interface expected by validators - Enables section validation to work with real documents Topic Closure: - Updated SCHEMA_EVOLUTION_WORKPLAN.md with completion summary - Phases 1-3: 100% complete (via Schema-of-Schemas and Semantic Validation) - Phase 4: Deferred as future enhancement (15-20 sessions) - Phase 5: 70% complete (docs done, CI/CD templates deferred) - Created DONE.md with comprehensive task checklist - Generated ADR template stub (examples/templates/adr-template.md) - Moved topic from roadmap/ to history/260105-schema-evolution/ Files Changed: - markitect/cli.py: Added markdown schema support to validate and generate-stub - markitect/semantic_validator.py: Added DocumentWrapper class for AST parsing - markitect/schemas/adr-schema-v1.0.md: New ADR validation schema (560 lines) - 
examples/templates/adr-template.md: Generated ADR template stub - history/260105-schema-evolution/: Moved completed topic to history Status: Schema evolution topic successfully closed with ADR schema as final deliverable. All schema commands now support markdown schemas. Section validation working correctly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-06 12:32:38 +01:00
tegwick	d32dc41315	docs: update manpage and terminology examples to schema-of-schemas standard Updated example documentation to use the new schema-of-schemas standard with markdown schema format and multi-schema validation commands. Manpage Example Updates: - Changed schema reference from markdown-manpage-schema.json to manpage-schema-v1.0.md - Updated all commands to use new multi-schema validation syntax - Added examples of number-based validation (markitect schema-validate 2) - Added examples of batch validation (--all, ranges, lists) - Updated integration examples (CI/CD, pre-commit hooks, Makefile) - Documented schema registry workflow Terminology Example Updates: - Changed schema reference from terminology-schema.json to terminology-schema-v1.0.md - Updated all validation commands to use new CLI syntax - Added examples of schema-list and numbered selection - Added batch validation examples - Updated GitHub Actions and pre-commit hook examples - Documented schema registry access methods Key Changes: - All schema filenames now follow {domain}-schema-v{major}.{minor}.md convention - Commands use schema registry with numbered or filename selection - Batch validation examples added throughout - Integration examples updated to new standard - Documentation reflects markdown-first schema format All schemas validated successfully against metaschema. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-05 13:13:24 +01:00
tegwick	b6f95066a3	chore: establish schema-of-schemas workplan and reorganize roadmap This commit sets up the comprehensive workplan for implementing a markdown-first schema management system with naming conventions, versioning, and self-validation capabilities. ## Directory Reorganization - Renamed `todo/` → `roadmap/` for better organization - Created `roadmap/schema-of-schemas/` subdirectory - Moved schema management planning artifacts to dedicated directory ## Planning Artifacts Created ### Workplan & Documentation - WORKPLAN.md (19KB) - Comprehensive 6-phase implementation plan - SCHEMA_MANAGEMENT_PROPOSAL.md - Full analysis with 4 options - SCHEMA_MANAGEMENT_SUMMARY.md - Executive summary - README.md - Quick reference guide ### Example Schema - examples/schemas/manpage-schema-v1.md - Demonstrates markdown format ## Schema Management System Design ### Naming Convention Format: `{domain}-schema-v{major}.{minor}.md` Examples: - `manpage-schema-v1.0.md` - `terminology-schema-v1.0.md` - `api-documentation-schema-v1.0.md` ### Markdown-First Format Schemas will be markdown files with: - YAML frontmatter for metadata - Rich documentation sections - Embedded JSON schema in code block - Version history and examples ### Implementation Phases (8-10 days) Phase 0: Planning & Setup ✅ (0.5 days) - COMPLETE Phase 1: Filename Convention (1 day) - NEXT Phase 2: Markdown Loader (2-3 days) Phase 3: Schema-for-Schemas (2 days) Phase 4: Schema Migration (1-2 days) Phase 5: CLI & Documentation (1 day) Phase 6: Testing & Validation (1 day) ### Goals 1. ✅ Establish naming convention 2. ⏳ Implement filename validation 3. ⏳ Create markdown schema loader 4. ⏳ Build schema-for-schemas metaschema 5. ⏳ Migrate 5 existing schemas (remove 2 duplicates) 6. 
⏳ Update CLI and documentation ## Updated Tracking ### TODO.md - Added Schema-of-Schemas as active work item - Documented Phase 1 tasks and timeline - Paused capability extraction work ### CHANGELOG.md - Added schema management system to [Unreleased] - Documented directory reorganization - Added "In Progress" section for current work ## Next Steps Begin Phase 1: 1. Implement schema_naming.py with validation 2. Add unit tests 3. Update CLI schema-ingest command 4. Create naming specification document ## Files Changed - CHANGELOG.md - Added unreleased schema management features - TODO.md - Updated active work tracking - roadmap/ - Reorganized from todo/ - roadmap/schema-of-schemas/ - New planning directory - examples/schemas/ - Example markdown schema 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-04 23:47:02 +01:00


67 Commits