feat(pipeline): per-stage max_tokens, LLM provenance, processing log

- PipelineStage now supports max_tokens to override the 4096 default
- SourcePipeline records provider/model in each entity file as an HTML comment
- output/processing-log.yaml tracks tokens, cost, duration, retries, errors
- _call_llm returns (content, metadata) for downstream traceability
- _http.py wraps JSON parse errors with a body preview for debugging
- infospace.yaml stages: extract/map=6000 tokens, synthesize=3000 tokens

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 14:50:49 +01:00
parent 5ede1de4b8
commit df1fdf1842
4 changed files with 191 additions and 32 deletions
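The commit message bullets describe several small behaviors. Below is a minimal sketch of how they might fit together. It is not taken from the changed files. Only `PipelineStage`, `max_tokens`, and the 4096 default come from the message. `CallMetadata`, `call_llm`, `provenance_comment`, and `parse_json_body` are hypothetical stand-ins. The comment format is assumed, and no real provider is called.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional

DEFAULT_MAX_TOKENS = 4096  # default named in the commit message


@dataclass
class PipelineStage:
    # Hypothetical shape: only `max_tokens` and its 4096 fallback come from the commit.
    name: str
    max_tokens: Optional[int] = None

    def effective_max_tokens(self) -> int:
        # A per-stage value wins; otherwise fall back to the global default.
        return self.max_tokens if self.max_tokens is not None else DEFAULT_MAX_TOKENS


@dataclass
class CallMetadata:
    # Hypothetical fields, loosely mirroring what processing-log.yaml is said to track.
    provider: str
    model: str
    max_tokens: int
    retries: int = 0


def call_llm(prompt: str, stage: PipelineStage, provider: str, model: str) -> tuple[str, CallMetadata]:
    # Hypothetical stand-in for _call_llm: no request is made, but the
    # return shape is (content, metadata) as the commit describes.
    content = f"[{stage.name}] {prompt}"
    meta = CallMetadata(provider, model, stage.effective_max_tokens())
    return content, meta


def provenance_comment(meta: CallMetadata) -> str:
    # Assumed format; the commit only says provider/model go in an HTML comment.
    return f"<!-- llm: provider={meta.provider} model={meta.model} -->"


def parse_json_body(body: str, preview_chars: int = 200) -> object:
    # Hypothetical helper in the spirit of the _http.py change: re-raise JSON
    # errors with the start of the response body so failures are debuggable.
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        preview = body[:preview_chars]
        raise ValueError(f"invalid JSON ({exc.msg}); body starts with {preview!r}") from exc
```

For example, `PipelineStage("extract", max_tokens=6000)` matches the infospace.yaml setting. A stage declared without `max_tokens` falls back to 4096. Returning metadata alongside the content lets the caller write the provenance comment and the log entry, which keeps the LLM wrapper itself free of file I/O.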
--- a/markitect/infospace/cli.py
+++ b/markitect/infospace/cli.py
@@ -575,7 +575,13 @@ def process(
    # Run pipeline
    from markitect.infospace.pipeline import SourcePipeline

-    pipeline = SourcePipeline(cfg, root, adapter=adapter, no_commit=no_commit)
+    pipeline = SourcePipeline(
+        cfg, root,
+        adapter=adapter,
+        provider=provider or "",
+        model=(model or _PROVIDER_DEFAULTS.get(provider or "", "")) if provider else "",
+        no_commit=no_commit,
+    )

    total = len(source_files)
    completed = 0
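The `provider` and `model` keyword arguments in this hunk encode a small resolution rule. The function below restates that rule as a hypothetical standalone helper, which is not part of the commit. The `defaults` parameter stands in for `_PROVIDER_DEFAULTS`, whose real contents are not shown here.

```python
from __future__ import annotations

from typing import Mapping, Optional


def resolve_provider_model(
    provider: Optional[str],
    model: Optional[str],
    defaults: Mapping[str, str],
) -> tuple[str, str]:
    # Hypothetical restatement of the two keyword arguments in the hunk:
    #   provider=provider or ""
    #   model=(model or defaults.get(provider or "", "")) if provider else ""
    if not provider:
        # Without a provider, no model is recorded even if one was passed.
        return "", ""
    # An explicit model wins; otherwise use the provider's default, or "".
    return provider, model or defaults.get(provider, "")
```

With `defaults = {"p": "x"}`, `resolve_provider_model("p", None, defaults)` gives `("p", "x")`. `resolve_provider_model(None, "m", defaults)` gives `("", "")`. Inside the `if provider` branch, `provider` is already truthy, so the inner `provider or ""` in the original expression has no effect.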