generic source-to-infospace generator

2026-05-14 19:33:22 +02:00
parent 065e17f42e
commit 46aad3cce8
20 changed files with 1629 additions and 8 deletions
--- a/README.md
+++ b/README.md
@@ -31,6 +31,7 @@ Start with:
 - `docs/legacy-infospace-migration-guide.md`
 - `docs/replacement-readiness-decision.md`
 - `docs/wealth-vsm-generation-pipeline.md`
+- `docs/generic-source-generator.md`
 - `infospaces/bootstrap-pilot/`
 - `infospaces/wealth-vsm-legacy-slice/`
 - `infospaces/wealth-vsm-generation-pilot/`
--- a/docs/generic-source-generator.md
+++ b/docs/generic-source-generator.md
@@ -0,0 +1,94 @@
+# Generic Source Generator
+
+Date: 2026-05-14
+
+## Purpose
+
+`infospace-bench generate` turns a local article, ebook-like file, or folder of
+knowledge sources into a manifest-backed infospace. It generalizes the
+Wealth/VSM pilot into an explicit workflow path with deterministic fixture
+support and an optional OpenRouter provider.
+
+## Deterministic Run
+
+Use fixture responses for repeatable tests and demos:
+
+```bash
+infospace-bench generate from-source ./examples/article.md \
+  --workspace . \
+  --slug article-space \
+  --name "Article Space" \
+  --profile general-knowledge \
+  --fixture-responses ./examples/responses.yaml \
+  --apply
+```
+
+With `--apply`, the command creates normalized source chunks, installs the
+selected profile, runs the declared workflows, writes entities, relations,
+evaluations, metrics, history, and a generation report, and registers artifacts
+in `artifacts/index.yaml`. Without `--apply`, it only initializes and plans.
+
+## Stepwise Workflow
+
+```bash
+infospace-bench generate init ./book.epub \
+  --workspace . \
+  --slug book-space \
+  --name "Book Space" \
+  --profile general-knowledge \
+  --max-chunks 3
+
+infospace-bench generate plan ./infospaces/book-space --stage all
+infospace-bench generate run ./infospaces/book-space \
+  --fixture-responses ./responses.yaml
+infospace-bench generate status ./infospaces/book-space
+```
+
+`--max-chunks` limits the number of source chunks, which bounds the size and
+provider cost of early experiments. `generate status` shows chunk counts, generated
+artifact counts, evaluations, metrics, history, and stale source/profile inputs.
+
+## OpenRouter
+
+Live model calls are explicit:
+
+```bash
+export OPENROUTER_API_KEY=...
+
+infospace-bench generate run ./infospaces/book-space \
+  --provider openrouter \
+  --model openai/gpt-4o-mini \
+  --stage all
+```
+
+`--model` is required and must be an OpenRouter model ID. The API key is read
+from `OPENROUTER_API_KEY`; it is not written to `infospace.yaml`. Default tests
+never make live provider calls.
+
+## Resume
+
+Use resume for interrupted or reviewed runs:
+
+```bash
+infospace-bench generate resume ./infospaces/book-space \
+  --provider openrouter \
+  --model openai/gpt-4o-mini
+```
+
+Unchanged completed runs are skipped. Use `--force` when you intentionally want
+to rerun completed work. Stale status is reported when source artifact digests
+or installed profile/template files change.
+
+## Review Path
+
+After generation:
+
+- inspect `artifacts/sources/` for normalized input chunks
+- inspect `artifacts/entities/` and `artifacts/relations/` for generated claims
+- inspect `output/evaluations/` for rubric output
+- run `infospace-bench validate <root>` and `infospace-bench graph <root>`
+- review `reports/generation-summary.md`
+
+Move from the generic profile to a specialized profile when the source domain
+needs stricter terminology, narrower extraction granularity, or a discipline
+lens such as VSM.
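
The Resume section of `docs/generic-source-generator.md` above describes two checks: whether a source chunk is stale, and whether a completed run can be skipped. The following is a minimal sketch of both, modeled on `_stale_source_ids` and the early return in `run_generation` later in this commit. The names `digest_text`, `source_is_stale`, and `should_skip` are illustrative and are not part of `infospace_bench`; file reading is replaced by a plain string so the example stays self-contained.

```python
from __future__ import annotations

import hashlib


def digest_text(text: str) -> str:
    """SHA-256 hex digest of UTF-8 text, as recorded in chunk provenance."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def source_is_stale(recorded_digest: str, current_text: str | None) -> bool:
    """A chunk is stale when its file is missing or its digest has changed.

    current_text is None when the chunk file no longer exists (assumption:
    the caller reads the file and passes None on a missing path).
    """
    if current_text is None:
        return True
    # An empty recorded digest cannot be compared, so it is not treated as stale.
    return bool(recorded_digest) and digest_text(current_text) != recorded_digest


def should_skip(*, resume: bool, force: bool, completed: bool, stale: bool) -> bool:
    """Skip only a resumed, unforced, completed, and unchanged run."""
    return resume and not force and completed and not stale


recorded = digest_text("# Chapter 1\n")
fresh = source_is_stale(recorded, "# Chapter 1\n")
edited = source_is_stale(recorded, "# Chapter 1 (revised)\n")
missing = source_is_stale(recorded, None)

skip_unchanged = should_skip(resume=True, force=False, completed=True, stale=fresh)
skip_edited = should_skip(resume=True, force=False, completed=True, stale=edited)
skip_forced = should_skip(resume=True, force=True, completed=True, stale=fresh)
```

Under these assumptions only the unchanged, unforced resume is skipped. An edited or missing chunk, or `--force`, leads to a rerun.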
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -11,6 +11,9 @@ dependencies = [
 [project.scripts]
 infospace-bench = "infospace_bench.cli:main"

+[tool.setuptools.package-data]
+infospace_bench = ["profiles/**/*"]
+
 [tool.pytest.ini_options]
 pythonpath = ["src", "../markitect-tool/src"]
 testpaths = ["tests"]
--- a/src/infospace_bench/cli.py
+++ b/src/infospace_bench/cli.py
@@ -10,6 +10,12 @@ from .checks import run_collection_checks
 from .engine import engine_capability_contract, plan_asset_sync, sync_assets
 from .errors import InfospaceError
 from .evaluation_io import read_entity_evaluations
+from .generator import (
+    init_generation_infospace,
+    plan_generation,
+    run_generation,
+    status_generation,
+)
 from .history import (
    build_viability_report,
    find_snapshot,
@@ -123,6 +129,72 @@ def build_parser() -> argparse.ArgumentParser:
        help="Run assisted stages with deterministic fixture responses",
    )

+    generate = sub.add_parser("generate", help="Generate infospaces from sources")
+    generate_sub = generate.add_subparsers(dest="generate_command", required=True)
+
+    generate_init = generate_sub.add_parser(
+        "init",
+        help="Create a generation infospace from a local source",
+    )
+    generate_init.add_argument("source")
+    generate_init.add_argument("--workspace", default=".")
+    generate_init.add_argument("--slug", required=True)
+    generate_init.add_argument("--name", required=True)
+    generate_init.add_argument("--profile", default="general-knowledge")
+    generate_init.add_argument("--max-chunks", type=int, default=0)
+
+    generate_plan = generate_sub.add_parser(
+        "plan",
+        help="Plan generator work without provider calls",
+    )
+    generate_plan.add_argument("root")
+    generate_plan.add_argument("--stage", default="all")
+
+    generate_run = generate_sub.add_parser(
+        "run",
+        help="Run generator workflows for an infospace",
+    )
+    generate_run.add_argument("root")
+    generate_run.add_argument("--stage", default="all")
+    generate_run.add_argument("--provider", choices=["fixture", "openrouter"], default="fixture")
+    generate_run.add_argument("--model", default="")
+    generate_run.add_argument("--fixture-responses", default="")
+    generate_run.add_argument("--resume", action="store_true")
+    generate_run.add_argument("--force", action="store_true")
+
+    generate_resume = generate_sub.add_parser(
+        "resume",
+        help="Resume generator workflows for an infospace",
+    )
+    generate_resume.add_argument("root")
+    generate_resume.add_argument("--stage", default="all")
+    generate_resume.add_argument("--provider", choices=["fixture", "openrouter"], default="fixture")
+    generate_resume.add_argument("--model", default="")
+    generate_resume.add_argument("--fixture-responses", default="")
+    generate_resume.add_argument("--force", action="store_true")
+
+    generate_status = generate_sub.add_parser(
+        "status",
+        help="Inspect generator status for an infospace",
+    )
+    generate_status.add_argument("root")
+
+    generate_from_source = generate_sub.add_parser(
+        "from-source",
+        help="Initialize and optionally run generation from a local source",
+    )
+    generate_from_source.add_argument("source")
+    generate_from_source.add_argument("--workspace", default=".")
+    generate_from_source.add_argument("--slug", required=True)
+    generate_from_source.add_argument("--name", required=True)
+    generate_from_source.add_argument("--profile", default="general-knowledge")
+    generate_from_source.add_argument("--stage", default="all")
+    generate_from_source.add_argument("--provider", choices=["fixture", "openrouter"], default="fixture")
+    generate_from_source.add_argument("--model", default="")
+    generate_from_source.add_argument("--fixture-responses", default="")
+    generate_from_source.add_argument("--max-chunks", type=int, default=0)
+    generate_from_source.add_argument("--apply", action="store_true")
+
    engine = sub.add_parser("engine", help="Inspect and sync engine boundary state")
    engine_sub = engine.add_subparsers(dest="engine_command", required=True)

@@ -284,6 +356,73 @@ def main(argv: list[str] | None = None) -> int:
                )
            else:
                parser.error(f"Unhandled workflow command: {args.workflow_command}")
+        elif args.command == "generate":
+            if args.generate_command == "init":
+                infospace = init_generation_infospace(
+                    Path(args.workspace),
+                    Path(args.source),
+                    args.slug,
+                    name=args.name,
+                    profile=args.profile,
+                    max_chunks=_optional_positive(args.max_chunks),
+                )
+                _write_json(
+                    {
+                        "slug": infospace.config.slug,
+                        "root": str(infospace.root),
+                        "status": "initialized",
+                    }
+                )
+            elif args.generate_command == "plan":
+                _write_json(plan_generation(Path(args.root), stage=args.stage))
+            elif args.generate_command == "run":
+                _write_json(
+                    run_generation(
+                        Path(args.root),
+                        stage=args.stage,
+                        provider=args.provider,
+                        model=args.model,
+                        fixture_responses=args.fixture_responses or None,
+                        resume=args.resume,
+                        force=args.force,
+                    ).to_dict()
+                )
+            elif args.generate_command == "resume":
+                _write_json(
+                    run_generation(
+                        Path(args.root),
+                        stage=args.stage,
+                        provider=args.provider,
+                        model=args.model,
+                        fixture_responses=args.fixture_responses or None,
+                        resume=True,
+                        force=args.force,
+                    ).to_dict()
+                )
+            elif args.generate_command == "status":
+                _write_json(status_generation(Path(args.root)))
+            elif args.generate_command == "from-source":
+                infospace = init_generation_infospace(
+                    Path(args.workspace),
+                    Path(args.source),
+                    args.slug,
+                    name=args.name,
+                    profile=args.profile,
+                    max_chunks=_optional_positive(args.max_chunks),
+                )
+                if args.apply:
+                    result = run_generation(
+                        infospace.root,
+                        stage=args.stage,
+                        provider=args.provider,
+                        model=args.model,
+                        fixture_responses=args.fixture_responses or None,
+                    )
+                    _write_json(result.to_dict())
+                else:
+                    _write_json(plan_generation(infospace.root, stage=args.stage))
+            else:
+                parser.error(f"Unhandled generate command: {args.generate_command}")
        elif args.command == "engine":
            if args.engine_command == "inspect":
                _write_json(
@@ -377,3 +516,7 @@ def _relationship_summary_payload(summary) -> dict:

 def _write_json(payload: dict) -> None:
    print(json.dumps(payload, indent=2))
+
+
+def _optional_positive(value: int) -> int | None:
+    return value if value > 0 else None
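
The nested `generate` subcommands and the `--max-chunks` sentinel handling can be tried in isolation. The sketch below uses only the standard library and reproduces a reduced version of the `generate init` parser. `build_demo_parser` and `optional_positive` are illustrative names; the helper in this commit is `_optional_positive`, and the real parser defines more subcommands and options than shown here.

```python
from __future__ import annotations

import argparse


def build_demo_parser() -> argparse.ArgumentParser:
    """Reduced parser: only `generate init` with a subset of its options."""
    parser = argparse.ArgumentParser(prog="infospace-bench")
    sub = parser.add_subparsers(dest="command", required=True)

    generate = sub.add_parser("generate", help="Generate infospaces from sources")
    generate_sub = generate.add_subparsers(dest="generate_command", required=True)

    init = generate_sub.add_parser("init")
    init.add_argument("source")
    init.add_argument("--slug", required=True)
    init.add_argument("--name", required=True)
    init.add_argument("--profile", default="general-knowledge")
    # 0 is the CLI sentinel for "flag not given".
    init.add_argument("--max-chunks", type=int, default=0)
    return parser


def optional_positive(value: int) -> int | None:
    """Map the CLI sentinel (0 or any non-positive value) to None."""
    return value if value > 0 else None


parser = build_demo_parser()
base = ["generate", "init", "./book.epub", "--slug", "book-space", "--name", "Book Space"]

capped_args = parser.parse_args([*base, "--max-chunks", "3"])
default_args = parser.parse_args(base)

capped = optional_positive(capped_args.max_chunks)
uncapped = optional_positive(default_args.max_chunks)
```

Keeping `0` as the argparse default and converting it to `None` at the call site lets the library function use `int | None` while the CLI option stays a plain integer.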
--- a/src/infospace_bench/generator.py
+++ b/src/infospace_bench/generator.py
@@ -0,0 +1,525 @@
+from __future__ import annotations
+
+import hashlib
+import shutil
+from dataclasses import asdict, dataclass, field
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from .checks import run_collection_checks
+from .errors import InfospaceError
+from .evaluation_io import read_entity_evaluations
+from .history import get_history, read_metrics_file, record_check_results
+from .lifecycle import create_infospace, load_infospace, register_artifact
+from .openrouter import OpenRouterAssistedGenerationAdapter
+from .source_intake import SourceChunk, normalize_source
+from .workflow import (
+    AssistedGenerationAdapter,
+    FixtureAssistedGenerationAdapter,
+    WorkflowRunResult,
+    plan_workflow,
+    run_workflow,
+)
+
+STATE_PATH = Path("output/workflows/generation-state.yaml")
+DEFAULT_PROFILE = "general-knowledge"
+WORKFLOW_BY_STAGE = {
+    "summary": ["generic-source-summary"],
+    "summarize": ["generic-source-summary"],
+    "extract": ["generic-source-entities"],
+    "entities": ["generic-source-entities"],
+    "relations": ["generic-source-relations"],
+    "evaluate": ["generic-source-evaluations"],
+    "evaluation": ["generic-source-evaluations"],
+    "all": [
+        "generic-source-summary",
+        "generic-source-entities",
+        "generic-source-relations",
+        "generic-source-evaluations",
+    ],
+}
+
+
+@dataclass(frozen=True)
+class GenerationRunResult:
+    root: str
+    status: str
+    stage: str
+    skipped: bool = False
+    stale: bool = False
+    workflows: list[dict[str, Any]] = field(default_factory=list)
+    metrics: dict[str, Any] = field(default_factory=dict)
+    history_snapshot_id: str = ""
+
+    def to_dict(self) -> dict[str, Any]:
+        data = asdict(self)
+        return {key: value for key, value in data.items() if value not in ("", [], {})}
+
+
+def init_generation_infospace(
+    workspace: str | Path,
+    source: str | Path,
+    slug: str,
+    *,
+    name: str,
+    profile: str = DEFAULT_PROFILE,
+    max_chunks: int | None = None,
+) -> Any:
+    chunks = normalize_source(source, max_chunks=max_chunks)
+    infospace = create_infospace(Path(workspace), slug, name=name)
+    _install_profile(infospace.root, profile)
+    _write_workflows(infospace.root, profile)
+    _register_source_chunks(infospace.root, chunks)
+    _write_state(
+        infospace.root,
+        {
+            "profile": profile,
+            "source": str(Path(source)),
+            "source_chunks": _source_state(infospace.root),
+            "profile_digest": _profile_digest(infospace.root, profile),
+            "stage_status": {},
+            "completed": False,
+            "created_at": _now(),
+            "updated_at": _now(),
+        },
+    )
+    return load_infospace(infospace.root)
+
+
+def plan_generation(root: str | Path, *, stage: str = "all") -> dict[str, Any]:
+    root_path = Path(root)
+    workflow_ids = _workflow_ids_for_stage(stage)
+    plans: list[dict[str, Any]] = []
+    for workflow_id in workflow_ids:
+        try:
+            plans.append(plan_workflow(root_path, workflow_id).to_dict())
+        except InfospaceError as exc:
+            plans.append(
+                {
+                    "workflow_id": workflow_id,
+                    "status": "blocked",
+                    "error": exc.to_dict(),
+                }
+            )
+    status = status_generation(root_path)
+    return {
+        "root": str(root_path),
+        "stage": stage,
+        "status": "planned",
+        "stale": status["stale"],
+        "source_chunk_count": status["source_chunk_count"],
+        "workflows": plans,
+    }
+
+
+def run_generation(
+    root: str | Path,
+    *,
+    stage: str = "all",
+    provider: str = "fixture",
+    model: str = "",
+    fixture_responses: str | Path | None = None,
+    resume: bool = False,
+    force: bool = False,
+) -> GenerationRunResult:
+    root_path = Path(root)
+    stage_key = stage.strip().lower()
+    state = _read_state(root_path)
+    status = status_generation(root_path)
+    workflow_ids = _workflow_ids_for_stage(stage_key)
+    if resume and not force and state.get("completed") is True and not status["stale"]:
+        return GenerationRunResult(
+            root=str(root_path),
+            status="skipped",
+            stage=stage,
+            skipped=True,
+            stale=False,
+            workflows=[],
+            metrics=status.get("metrics", {}),
+        )
+
+    adapter = (
+        _adapter_for(provider, model=model, fixture_responses=fixture_responses)
+        if workflow_ids
+        else None
+    )
+    workflow_results: list[dict[str, Any]] = []
+    for workflow_id in workflow_ids:
+        result = run_workflow(root_path, workflow_id, assisted_adapter=adapter)
+        workflow_results.append(result.to_dict())
+        state = _mark_workflow_completed(state, result)
+
+    metrics: dict[str, Any] = {}
+    snapshot_id = ""
+    if stage_key in {"all", "metrics"}:
+        check_result = _record_metrics(root_path)
+        metrics = check_result.metrics
+        snapshot_id = check_result.snapshot.snapshot_id
+        _write_generation_report(root_path, metrics, snapshot_id)
+
+    state.update(
+        {
+            "source_chunks": _source_state(root_path),
+            "profile_digest": _profile_digest(root_path, str(state.get("profile") or DEFAULT_PROFILE)),
+            "completed": stage_key in {"all", "metrics"},
+            "updated_at": _now(),
+            "last_run": {
+                "stage": stage,
+                "provider": provider,
+                "model": model,
+                "workflow_count": len(workflow_results),
+                "snapshot_id": snapshot_id,
+                "completed_at": _now(),
+            },
+        }
+    )
+    _write_state(root_path, state)
+    return GenerationRunResult(
+        root=str(root_path),
+        status="completed",
+        stage=stage,
+        skipped=False,
+        stale=False,
+        workflows=workflow_results,
+        metrics=metrics,
+        history_snapshot_id=snapshot_id,
+    )
+
+
+def status_generation(root: str | Path) -> dict[str, Any]:
+    root_path = Path(root)
+    infospace = load_infospace(root_path)
+    state = _read_state(root_path)
+    stale_sources = _stale_source_ids(infospace.root)
+    profile = str(state.get("profile") or DEFAULT_PROFILE)
+    stale_profile = bool(
+        state.get("profile_digest")
+        and state.get("profile_digest") != _profile_digest(infospace.root, profile)
+    )
+    evaluations = read_entity_evaluations(infospace.root / "output" / "evaluations")
+    history = get_history(infospace.root)
+    return {
+        "root": str(infospace.root),
+        "slug": infospace.config.slug,
+        "profile": profile,
+        "source_chunk_count": sum(1 for item in infospace.artifacts if item.kind == "source"),
+        "entity_count": sum(1 for item in infospace.artifacts if item.kind == "entity"),
+        "relation_count": sum(1 for item in infospace.artifacts if item.kind == "relation"),
+        "evaluation_count": len(evaluations),
+        "generated_count": sum(1 for item in infospace.artifacts if item.kind == "generated"),
+        "metrics": read_metrics_file(infospace.root / "output" / "metrics" / "metrics.yaml"),
+        "history_snapshot_count": len(history),
+        "latest_snapshot_id": history[-1].snapshot_id if history else "",
+        "stale": bool(stale_sources or stale_profile),
+        "stale_sources": stale_sources,
+        "stale_profile": stale_profile,
+        "completed": bool(state.get("completed", False)),
+        "stage_status": state.get("stage_status", {}),
+    }
+
+
+def _adapter_for(
+    provider: str,
+    *,
+    model: str,
+    fixture_responses: str | Path | None,
+) -> AssistedGenerationAdapter:
+    if fixture_responses:  # fixtures take precedence, even with --provider openrouter
+        return FixtureAssistedGenerationAdapter.from_file(Path(fixture_responses))
+    if provider == "openrouter":
+        return OpenRouterAssistedGenerationAdapter(model=model)
+    raise InfospaceError(
+        "missing_assisted_generation_adapter",
+        "Assisted generation requires --fixture-responses or --provider openrouter",
+        {"provider": provider},
+    )
+
+
+def _register_source_chunks(root: Path, chunks: list[SourceChunk]) -> None:
+    for chunk in chunks:
+        path = root / "artifacts" / "sources" / f"{chunk.chunk_id}.md"
+        path.parent.mkdir(parents=True, exist_ok=True)
+        path.write_text(chunk.markdown, encoding="utf-8")
+        register_artifact(
+            root,
+            artifact_id=f"source/{chunk.chunk_id}.md",
+            path=path,
+            kind="source",
+            title=chunk.title,
+            provenance={
+                "original_path": chunk.original_path,
+                "source_type": chunk.source_type,
+                "digest": chunk.digest,
+                "chunk_id": chunk.chunk_id,
+                "chunk_index": chunk.chunk_index,
+                "chunk_count": chunk.chunk_count,
+                "imported_at": chunk.imported_at,
+                "extractor_version": chunk.extractor_version,
+            },
+        )
+
+
+def _install_profile(root: Path, profile: str) -> None:
+    source = Path(__file__).parent / "profiles" / profile
+    if not source.is_dir():
+        raise InfospaceError(
+            "missing_generation_profile",
+            f"Generation profile does not exist: {profile}",
+            {"profile": profile, "path": str(source)},
+        )
+    profile_target = root / "profiles" / profile
+    template_target = root / "workflows" / "templates" / profile
+    shutil.copytree(source, profile_target, dirs_exist_ok=True)
+    shutil.copytree(source / "templates", template_target, dirs_exist_ok=True)
+
+
+def _write_workflows(root: Path, profile: str) -> None:
+    config_path = root / "infospace.yaml"
+    config = yaml.safe_load(config_path.read_text(encoding="utf-8")) or {}
+    config["schemas"] = {
+        **dict(config.get("schemas") or {}),
+        "entity": f"profiles/{profile}/contracts/entity.contract.md",
+        "relation": f"profiles/{profile}/contracts/relation.contract.md",
+        "evaluation": f"profiles/{profile}/contracts/evaluation.contract.md",
+    }
+    config["workflows"] = _profile_workflows(profile)
+    config_path.write_text(yaml.safe_dump(config, sort_keys=False), encoding="utf-8")
+
+
+def _profile_workflows(profile: str) -> list[dict[str, Any]]:
+    base = f"workflows/templates/{profile}"
+    return [
+        {
+            "id": "generic-source-summary",
+            "description": "Summarize normalized source chunks.",
+            "inputs": {"source": {"kind": "source"}},
+            "static_macros": {"profile": profile},
+            "stages": [
+                {
+                    "id": "summarize-source",
+                    "kind": "assisted",
+                    "input": "source",
+                    "template": f"{base}/summarize-source.md",
+                    "provider_hint": "openrouter",
+                    "output": {
+                        "path": "artifacts/generated/{{ input.slug }}-summary.md",
+                        "artifact_id": "generated/{{ input.slug }}-summary.md",
+                        "kind": "generated",
+                        "title": "{{ input.title }} Summary",
+                    },
+                }
+            ],
+        },
+        {
+            "id": "generic-source-entities",
+            "description": "Extract reusable entity artifacts from source chunks.",
+            "inputs": {"source": {"kind": "source"}},
+            "static_macros": {"profile": profile},
+            "stages": [
+                {
+                    "id": "extract-entities",
+                    "kind": "assisted",
+                    "input": "source",
+                    "template": f"{base}/extract-entities.md",
+                    "provider_hint": "openrouter",
+                    "output": {
+                        "path": "artifacts/generated/{{ input.slug }}-entities.md",
+                        "artifact_id": "generated/{{ input.slug }}-entities.md",
+                        "kind": "generated",
+                        "title": "{{ input.title }} Entity Bundle",
+                    },
+                },
+                {
+                    "id": "split-entities",
+                    "kind": "split_entities",
+                    "input": "source",
+                    "template": "",
+                    "static_macros": {"bundle_stage": "extract-entities"},
+                },
+            ],
+        },
+        {
+            "id": "generic-source-relations",
+            "description": "Extract relation artifacts from source chunks.",
+            "inputs": {"source": {"kind": "source"}},
+            "static_macros": {"profile": profile},
+            "stages": [
+                {
+                    "id": "extract-relations",
+                    "kind": "assisted",
+                    "input": "source",
+                    "template": f"{base}/extract-relations.md",
+                    "provider_hint": "openrouter",
+                    "output": {
+                        "path": "artifacts/relations/{{ input.slug }}-relations.md",
+                        "artifact_id": "relation/{{ input.slug }}-relations.md",
+                        "kind": "relation",
+                        "title": "{{ input.title }} Relations",
+                    },
+                }
+            ],
+        },
+        {
+            "id": "generic-source-evaluations",
+            "description": "Evaluate generated entities with the profile rubric.",
+            "inputs": {"entity": {"kind": "entity"}},
+            "static_macros": {"profile": profile},
+            "stages": [
+                {
+                    "id": "evaluate-entity",
+                    "kind": "assisted",
+                    "input": "entity",
+                    "template": f"{base}/evaluate-entity.md",
+                    "provider_hint": "openrouter",
+                    "output": {
+                        "path": "output/evaluations/{{ input.slug }}.md",
+                        "artifact_id": "generated/evaluation-{{ input.slug }}.md",
+                        "kind": "generated",
+                        "title": "{{ input.title }} Evaluation",
+                    },
+                }
+            ],
+        },
+    ]
+
+
+def _record_metrics(root: Path) -> Any:
+    infospace = load_infospace(root)
+    return record_check_results(
+        infospace.root,
+        run_collection_checks(infospace.artifacts),
+        artifact_evaluations=read_entity_evaluations(infospace.root / "output" / "evaluations"),
+        schema_name="generic-source",
+        metadata={"generator": "generic-source"},
+    )
+
+
+def _write_generation_report(root: Path, metrics: dict[str, Any], snapshot_id: str) -> None:
+    status = status_generation(root)
+    text = "\n".join(
+        [
+            "# Generation Report",
+            "",
+            f"Snapshot: {snapshot_id}",
+            f"Sources: {status['source_chunk_count']}",
+            f"Entities: {status['entity_count']}",
+            f"Relations: {status['relation_count']}",
+            f"Evaluations: {status['evaluation_count']}",
+            "",
+            "## Metrics",
+            "",
+            *[f"- {name}: {value}" for name, value in sorted(metrics.items())],
+            "",
+        ]
+    )
+    path = root / "reports" / "generation-summary.md"
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(text, encoding="utf-8")
+    register_artifact(
+        root,
+        artifact_id="generated/generation-summary.md",
+        path=path,
+        kind="generated",
+        title="Generation Summary",
+        provenance={"workflow_id": "generic-source-generator", "snapshot_id": snapshot_id},
+    )
+
+
+def _workflow_ids_for_stage(stage: str) -> list[str]:
+    normalized = stage.strip().lower()
+    if normalized == "intake":
+        return []
+    if normalized == "metrics":
+        return []
+    if normalized not in WORKFLOW_BY_STAGE:
+        raise InfospaceError(
+            "invalid_generation_stage",
+            f"Unsupported generation stage: {stage}",
+            {
+                "stage": stage,
+                "valid_stages": sorted([*WORKFLOW_BY_STAGE, "intake", "metrics"]),
+            },
+        )
+    return WORKFLOW_BY_STAGE[normalized]
+
+
+def _source_state(root: Path) -> dict[str, Any]:
+    infospace = load_infospace(root)
+    return {
+        item.id: {
+            "path": item.path,
+            "digest": item.provenance.get("digest", ""),
+            "title": item.title,
+            "source_type": item.provenance.get("source_type", ""),
+            "chunk_id": item.provenance.get("chunk_id", ""),
+        }
+        for item in infospace.artifacts
+        if item.kind == "source"
+    }
+
+
+def _stale_source_ids(root: Path) -> list[str]:
+    infospace = load_infospace(root)
+    stale: list[str] = []
+    for item in infospace.artifacts:
+        if item.kind != "source":
+            continue
+        path = infospace.root / item.path
+        expected = str(item.provenance.get("digest") or "")
+        if not path.is_file() or (expected and _digest_text(path.read_text(encoding="utf-8")) != expected):
+            stale.append(item.id)
+    return stale
+
+
+def _mark_workflow_completed(
+    state: dict[str, Any],
+    result: WorkflowRunResult,
+) -> dict[str, Any]:
+    stage_status = dict(state.get("stage_status") or {})
+    stage_status[result.workflow_id] = {
+        "status": result.status,
+        "run_id": result.run_id,
+        "output_artifact_ids": [output.artifact_id for output in result.outputs],
+        "updated_at": _now(),
+    }
+    return {**state, "stage_status": stage_status}
+
+
+def _profile_digest(root: Path, profile: str) -> str:
+    files: list[Path] = []
+    for base in (
+        root / "profiles" / profile,
+        root / "workflows" / "templates" / profile,
+    ):
+        if base.is_dir():
+            files.extend(path for path in sorted(base.rglob("*")) if path.is_file())
+    hasher = hashlib.sha256()
+    for path in files:
+        hasher.update(str(path.relative_to(root)).encode("utf-8"))
+        hasher.update(path.read_bytes())
+    return hasher.hexdigest()
+
+
+def _read_state(root: Path) -> dict[str, Any]:
+    path = root / STATE_PATH
+    if not path.is_file():
+        return {}
+    data = yaml.safe_load(path.read_text(encoding="utf-8"))
+    return data if isinstance(data, dict) else {}
+
+
+def _write_state(root: Path, state: dict[str, Any]) -> None:
+    path = root / STATE_PATH
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(yaml.safe_dump(state, sort_keys=False), encoding="utf-8")
+
+
+def _digest_text(text: str) -> str:
+    return hashlib.sha256(text.encode("utf-8")).hexdigest()
+
+
+def _now() -> str:
+    return datetime.now(timezone.utc).isoformat()
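
The profile staleness check rests on `_profile_digest`, which hashes relative paths and file contents in sorted order. The sketch below shows the same idea on a temporary directory. `tree_digest` is an illustrative stand-in, and the `profiles/demo` layout is invented for the example; it is not one of the shipped profiles.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def tree_digest(root: Path, bases: list[Path]) -> str:
    """SHA-256 over relative paths and bytes of every file under the bases.

    Sorting the walk makes the digest independent of filesystem order, and
    hashing the relative path means a rename changes the digest too.
    """
    hasher = hashlib.sha256()
    for base in bases:
        if not base.is_dir():
            continue  # a missing base contributes nothing
        for path in sorted(base.rglob("*")):
            if path.is_file():
                hasher.update(str(path.relative_to(root)).encode("utf-8"))
                hasher.update(path.read_bytes())
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    bases = [root / "profiles" / "demo", root / "workflows" / "templates" / "demo"]
    template = bases[0] / "templates" / "summarize-source.md"
    template.parent.mkdir(parents=True)
    template.write_text("Summarize the source.\n", encoding="utf-8")

    before = tree_digest(root, bases)
    unchanged = tree_digest(root, bases)

    # Editing an installed template should flip the digest.
    template.write_text("Summarize the source in three bullets.\n", encoding="utf-8")
    after = tree_digest(root, bases)
```

Comparing a stored digest with a freshly computed one is how `status_generation` derives `stale_profile`. In this sketch `before == unchanged`, while `after` differs.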
--- a/src/infospace_bench/openrouter.py
+++ b/src/infospace_bench/openrouter.py
@@ -0,0 +1,142 @@
+from __future__ import annotations
+
+import json
+import os
+import time
+import urllib.error
+import urllib.request
+from dataclasses import dataclass
+from typing import Any, Callable
+
+from .errors import InfospaceError
+from .workflow import AssistedGenerationRequest, AssistedGenerationResult
+
+OPENROUTER_ENDPOINT = "https://openrouter.ai/api/v1/chat/completions"
+Transport = Callable[[dict[str, Any], dict[str, str], str], dict[str, Any]]
+
+
+@dataclass(frozen=True)
+class OpenRouterAssistedGenerationAdapter:
+    model: str
+    api_key: str = ""
+    endpoint: str = OPENROUTER_ENDPOINT
+    transport: Transport | None = None
+    retry_limit: int = 2
+    timeout_seconds: float = 60.0
+
+    def __post_init__(self) -> None:
+        key = self.api_key or os.environ.get("OPENROUTER_API_KEY", "")
+        if not key:
+            raise InfospaceError(
+                "missing_openrouter_api_key",
+                "OPENROUTER_API_KEY is required for the OpenRouter provider",
+                {"env": "OPENROUTER_API_KEY"},
+            )
+        object.__setattr__(self, "api_key", key)
+        if not self.model:
+            raise InfospaceError(
+                "missing_openrouter_model",
+                "OpenRouter provider requires an explicit model",
+                {"option": "--model"},
+            )
+
+    def generate(
+        self,
+        request: AssistedGenerationRequest,
+    ) -> AssistedGenerationResult:
+        payload = {
+            "model": self.model,
+            "messages": [
+                {
+                    "role": "system",
+                    "content": (
+                        "Return concise, valid Markdown only. Preserve explicit "
+                        "contracts requested in the user prompt."
+                    ),
+                },
+                {"role": "user", "content": request.prompt},
+            ],
+            "metadata": {
+                "workflow_id": request.workflow_id,
+                "stage_id": request.stage_id,
+                "input_artifact_id": request.input_artifact_id,
+            },
+        }
+        headers = {
+            "Authorization": f"Bearer {self.api_key}",
+            "Content-Type": "application/json",
+            "HTTP-Referer": "https://github.com/markitect/infospace-bench",
+            "X-Title": "infospace-bench",
+        }
+        started = time.monotonic()
+        retry_count = 0
+        last_error = ""
+        while True:
+            try:
+                response = (
+                    self.transport(payload, headers, self.endpoint)
+                    if self.transport is not None
+                    else self._default_transport(payload, headers, self.endpoint)
+                )
+                choice = (response.get("choices") or [{}])[0]
+                message = choice.get("message") or {}
+                markdown = str(message.get("content") or "")
+                if not markdown:
+                    raise InfospaceError(
+                        "empty_openrouter_response",
+                        "OpenRouter returned an empty assistant response",
+                        {"model": self.model, "response_id": response.get("id")},
+                    )
+                return AssistedGenerationResult(
+                    markdown=markdown,
+                    provider="openrouter",
+                    metadata={
+                        "model": self.model,
+                        "request_id": str(response.get("id") or ""),
+                        "usage": response.get("usage") or {},
+                        "retry_count": retry_count,
+                        "duration_seconds": round(time.monotonic() - started, 3),
+                    },
+                )
+            except (urllib.error.HTTPError, urllib.error.URLError, TimeoutError) as exc:
+                last_error = str(exc)
+            except InfospaceError:
+                raise
+            except Exception as exc:  # pragma: no cover - defensive provider boundary
+                last_error = str(exc)
+
+            if retry_count >= self.retry_limit:
+                raise InfospaceError(
+                    "openrouter_request_failed",
+                    "OpenRouter request failed after bounded retries",
+                    {
+                        "model": self.model,
+                        "retry_count": retry_count,
+                        "error": last_error,
+                    },
+                )
+            retry_count += 1
+            time.sleep(min(2**retry_count, 8))
+
+    def _default_transport(
+        self,
+        payload: dict[str, Any],
+        headers: dict[str, str],
+        endpoint: str,
+    ) -> dict[str, Any]:
+        request = urllib.request.Request(
+            endpoint,
+            data=json.dumps(payload).encode("utf-8"),
+            headers=headers,
+            method="POST",
+        )
+        with urllib.request.urlopen(request, timeout=self.timeout_seconds) as response:
+            data = response.read().decode("utf-8")
+        parsed = json.loads(data)
+        if not isinstance(parsed, dict):
+            raise InfospaceError(
+                "invalid_openrouter_response",
+                "OpenRouter returned a non-object JSON response",
+                {"model": self.model},
+            )
+        return parsed
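
Review note: the loop in `generate` above is a bounded retry with capped exponential backoff. The following is a minimal standalone sketch of that pattern, not code from this commit. `call_with_retries`, `RetryExhausted`, and the injectable `sleep` parameter are hypothetical names introduced for illustration; only the `retry_limit` semantics and the `min(2**retry_count, 8)` delay are taken from the diff.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryExhausted(RuntimeError):
    """Hypothetical stand-in for the InfospaceError raised in the diff."""


def call_with_retries(
    operation: Callable[[], T],
    *,
    retry_limit: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[T, int]:
    """Run operation, retrying up to retry_limit times after the first failure.

    Returns the result together with the number of retries that were needed.
    """
    retry_count = 0
    while True:
        try:
            return operation(), retry_count
        except Exception as exc:  # assumption: every failure is treated as retryable
            last_error = str(exc)
        if retry_count >= retry_limit:
            raise RetryExhausted(f"failed after {retry_count} retries: {last_error}")
        retry_count += 1
        # Same delay schedule as the adapter: 2s, 4s, 8s, then capped at 8s.
        sleep(min(2**retry_count, 8))
```

With the default `retry_limit=2`, an operation is attempted at most three times, with delays of 2 and 4 seconds between attempts. Passing a recording function as `sleep` keeps tests fast, in the same spirit as the adapter's injectable `transport`.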
--- a/src/infospace_bench/profiles/general-knowledge/contracts/entity.contract.md
+++ b/src/infospace_bench/profiles/general-knowledge/contracts/entity.contract.md
@@ -0,0 +1,9 @@
+# Entity Contract
+
+Each generated entity must be a Markdown artifact with:
+
+- one top-level heading containing the entity title
+- a `## Definition` section
+- optional `## Context`, `## Source Evidence`, and `## Review Notes` sections
+
+Entity titles should be stable, short, and reusable across source chunks.
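
Review note: the two required parts of the entity contract above (exactly one top-level heading and a `## Definition` section) could be checked mechanically. This is a rough sketch under that reading of the contract; `entity_contract_problems` is a hypothetical helper, not a function from this commit, and it does not skip headings that appear inside fenced code blocks.

```python
from __future__ import annotations

import re

# A top-level heading is a line starting with a single "#" followed by a space.
H1_RE = re.compile(r"(?m)^# +\S.*$")
DEFINITION_RE = re.compile(r"(?m)^## +Definition\s*$")


def entity_contract_problems(markdown: str) -> list[str]:
    """Return human-readable contract violations; an empty list means it passes."""
    problems: list[str] = []
    headings = H1_RE.findall(markdown)
    if len(headings) != 1:
        problems.append(
            f"expected exactly one top-level heading, found {len(headings)}"
        )
    if not DEFINITION_RE.search(markdown):
        problems.append("missing '## Definition' section")
    return problems
```

The optional sections (`## Context`, `## Source Evidence`, `## Review Notes`) are deliberately left unchecked, since the contract does not require them.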
--- a/src/infospace_bench/profiles/general-knowledge/contracts/evaluation.contract.md
+++ b/src/infospace_bench/profiles/general-knowledge/contracts/evaluation.contract.md
@@ -0,0 +1,10 @@
+# Evaluation Contract
+
+Each evaluation must be Markdown with YAML frontmatter containing:
+
+- `artifact_id`
+- `evaluator`
+- `evaluated_at`
+- `scores`
+
+Scores should include groundedness and usefulness on a 0 to 5 scale.
--- a/src/infospace_bench/profiles/general-knowledge/contracts/relation.contract.md
+++ b/src/infospace_bench/profiles/general-knowledge/contracts/relation.contract.md
@@ -0,0 +1,11 @@
+# Relation Contract
+
+Each generated relation must be a Markdown artifact with:
+
+- one top-level heading containing the relation title
+- `## Subject`
+- `## Predicate`
+- `## Object`
+- optional `## Relation Type`, `## Evidence`, and `## Feedback Role`
+
+Subject and object values should match generated entity titles whenever possible.
--- a/src/infospace_bench/profiles/general-knowledge/contracts/summary.contract.md
+++ b/src/infospace_bench/profiles/general-knowledge/contracts/summary.contract.md
@@ -0,0 +1,7 @@
+# Summary Contract
+
+Each source summary should preserve:
+
+- the core claims or concepts
+- evidence phrases useful for later review
+- unresolved ambiguities or extraction risks
--- a/src/infospace_bench/profiles/general-knowledge/profile.yaml
+++ b/src/infospace_bench/profiles/general-knowledge/profile.yaml
@@ -0,0 +1,14 @@
+id: general-knowledge
+name: General Knowledge
+description: Generic infospace generation profile for local articles, ebooks, and knowledge collections.
+terminology:
+  source_chunk: Normalized source artifact
+  entity: Durable concept, claim, method, person, place, work, or object
+  relation: Typed link between two generated entities
+granularity:
+  default: Extract entities that can stand alone as useful review artifacts.
+evaluation_criteria:
+  - groundedness
+  - usefulness
+  - clarity
+  - provenance
--- a/src/infospace_bench/profiles/general-knowledge/templates/evaluate-entity.md
+++ b/src/infospace_bench/profiles/general-knowledge/templates/evaluate-entity.md
@@ -0,0 +1,14 @@
+# Evaluate Entity
+
+Profile: {{ macros.profile }}
+
+Evaluate the generated entity as Markdown with YAML frontmatter. Include
+`artifact_id`, `evaluator`, `evaluated_at`, and scores for groundedness and
+usefulness on a 0 to 5 scale.
+
+Entity artifact: {{ input.artifact_id }}
+Entity title: {{ input.title }}
+
+## Entity
+
+{{ input.content }}
--- a/src/infospace_bench/profiles/general-knowledge/templates/extract-entities.md
+++ b/src/infospace_bench/profiles/general-knowledge/templates/extract-entities.md
@@ -0,0 +1,15 @@
+# Extract Entities
+
+Profile: {{ macros.profile }}
+
+Extract reusable infospace entities from the source chunk. Return one Markdown
+bundle where each entity starts with `# Entity Title` and contains at least a
+`## Definition` section. Prefer durable concepts, claims, named methods,
+people, places, works, and objects over sentence fragments.
+
+Source title: {{ input.title }}
+Source artifact: {{ input.artifact_id }}
+
+## Source
+
+{{ input.content }}
--- a/src/infospace_bench/profiles/general-knowledge/templates/extract-relations.md
+++ b/src/infospace_bench/profiles/general-knowledge/templates/extract-relations.md
@@ -0,0 +1,14 @@
+# Extract Relations
+
+Profile: {{ macros.profile }}
+
+Extract a small set of important relations from the source chunk. Return one
+Markdown relation artifact with sections `## Subject`, `## Predicate`, and
+`## Object`. Use entity-style names for subject and object.
+
+Source title: {{ input.title }}
+Source artifact: {{ input.artifact_id }}
+
+## Source
+
+{{ input.content }}
--- a/src/infospace_bench/profiles/general-knowledge/templates/summarize-source.md
+++ b/src/infospace_bench/profiles/general-knowledge/templates/summarize-source.md
@@ -0,0 +1,13 @@
+# Summarize Source Chunk
+
+Profile: {{ macros.profile }}
+
+Summarize the source chunk as Markdown. Preserve concrete claims, named concepts,
+and evidence phrases that should guide later entity and relation extraction.
+
+Source title: {{ input.title }}
+Source artifact: {{ input.artifact_id }}
+
+## Source
+
+{{ input.content }}
--- a/src/infospace_bench/profiles/general-knowledge/templates/synthesize-report.md
+++ b/src/infospace_bench/profiles/general-knowledge/templates/synthesize-report.md
@@ -0,0 +1,6 @@
+# Synthesize Collection Report
+
+Profile: {{ macros.profile }}
+
+Synthesize a concise report from generated source summaries, entities,
+relations, evaluations, and collection metrics.
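
Review note: the templates above use dotted placeholders such as `{{ macros.profile }}` and `{{ input.title }}`. The rendering code is not part of this section of the diff, so the following is only a sketch of how such placeholders might be resolved, assuming plain dotted-path lookup in nested mappings. `render_template` and `PLACEHOLDER_RE` are hypothetical names, and the real renderer may support more syntax than this.

```python
from __future__ import annotations

import re
from typing import Any, Mapping

# Matches "{{ dotted.path }}" with optional whitespace inside the braces.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render_template(template: str, context: Mapping[str, Any]) -> str:
    """Replace each placeholder with the value found by walking the dotted path."""

    def resolve(match: re.Match[str]) -> str:
        value: Any = context
        for part in match.group(1).split("."):
            value = value[part]  # a KeyError surfaces an unknown placeholder early
        return str(value)

    return PLACEHOLDER_RE.sub(resolve, template)
```

Failing loudly on an unknown placeholder is a design choice of this sketch; it avoids sending a prompt that still contains literal `{{ ... }}` text to a provider.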
--- a/src/infospace_bench/source_intake.py
+++ b/src/infospace_bench/source_intake.py
@@ -0,0 +1,273 @@
+from __future__ import annotations
+
+import hashlib
+import html
+import re
+import zipfile
+from dataclasses import asdict, dataclass
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Iterable
+
+from .errors import InfospaceError
+from .semantics import slugify
+
+EXTRACTOR_VERSION = "generic-source-intake-v1"
+SUPPORTED_EXTENSIONS = {".md", ".markdown", ".txt", ".html", ".htm", ".epub"}
+HTML_TITLE_RE = re.compile(r"<title[^>]*>(?P<title>.*?)</title>", re.I | re.S)
+HTML_H1_RE = re.compile(r"<h1[^>]*>(?P<title>.*?)</h1>", re.I | re.S)
+SCRIPT_STYLE_RE = re.compile(r"<(script|style)[^>]*>.*?</\1>", re.I | re.S)
+TAG_RE = re.compile(r"<[^>]+>")
+
+
+@dataclass(frozen=True)
+class SourceChunk:
+    chunk_id: str
+    title: str
+    markdown: str
+    source_type: str
+    original_path: str
+    digest: str
+    chunk_index: int
+    chunk_count: int
+    imported_at: str
+    extractor_version: str = EXTRACTOR_VERSION
+
+    def to_dict(self) -> dict:
+        return asdict(self)
+
+
+@dataclass(frozen=True)
+class _SourceDocument:
+    title: str
+    markdown: str
+    source_type: str
+    original_path: str
+    base_slug: str
+
+
+def normalize_source(
+    source: str | Path,
+    *,
+    max_words: int = 800,
+    max_chunks: int | None = None,
+) -> list[SourceChunk]:
+    source_path = Path(source)
+    if not source_path.exists():
+        raise InfospaceError(
+            "missing_source",
+            f"Source path does not exist: {source_path}",
+            {"source": str(source_path)},
+        )
+    documents = list(_iter_documents(source_path))
+    if not documents:
+        raise InfospaceError(
+            "unsupported_source",
+            f"No supported source documents found: {source_path}",
+            {
+                "source": str(source_path),
+                "supported_extensions": sorted(SUPPORTED_EXTENSIONS),
+            },
+        )
+    imported_at = datetime.now(timezone.utc).isoformat()
+    chunks: list[SourceChunk] = []
+    used_ids: set[str] = set()
+    for document in documents:
+        pieces = _chunk_markdown(document.markdown, max_words=max_words)
+        for index, piece in enumerate(pieces):
+            title = document.title if len(pieces) == 1 else f"{document.title} Part {index + 1}"
+            base_id = (
+                document.base_slug if len(pieces) == 1 else f"{document.base_slug}-part-{index + 1:03d}"
+            )
+            chunk_id = _dedupe_chunk_id(base_id, used_ids)
+            chunks.append(
+                SourceChunk(
+                    chunk_id=chunk_id,
+                    title=title,
+                    markdown=piece,
+                    source_type=document.source_type,
+                    original_path=document.original_path,
+                    digest=_digest_text(piece),
+                    chunk_index=index,
+                    chunk_count=len(pieces),
+                    imported_at=imported_at,
+                )
+            )
+            if max_chunks is not None and max_chunks > 0 and len(chunks) >= max_chunks:
+                return chunks
+    return chunks
+
+
+def _iter_documents(source_path: Path) -> Iterable[_SourceDocument]:
+    if source_path.is_dir():
+        for path in sorted(source_path.rglob("*")):
+            if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
+                yield from _iter_documents(path)
+        return
+
+    suffix = source_path.suffix.lower()
+    if suffix in (".md", ".markdown"):
+        yield _markdown_document(source_path)
+    elif suffix == ".txt":
+        yield _text_document(source_path)
+    elif suffix in (".html", ".htm"):
+        yield _html_document(source_path, source_type="html")
+    elif suffix == ".epub":
+        yield from _epub_documents(source_path)
+
+
+def _markdown_document(path: Path) -> _SourceDocument:
+    markdown = _normalize_newlines(path.read_text(encoding="utf-8")).strip() + "\n"
+    title = _markdown_title(markdown) or _title_from_path(path)
+    return _SourceDocument(
+        title=title,
+        markdown=_ensure_h1(markdown, title),
+        source_type="markdown",
+        original_path=str(path),
+        base_slug=slugify(title) or slugify(path.stem) or "source",
+    )
+
+
+def _text_document(path: Path) -> _SourceDocument:
+    title = _title_from_path(path)
+    body = _normalize_newlines(path.read_text(encoding="utf-8")).strip()
+    markdown = f"# {title}\n\n{body}\n"
+    return _SourceDocument(
+        title=title,
+        markdown=markdown,
+        source_type="text",
+        original_path=str(path),
+        base_slug=slugify(title) or "source",
+    )
+
+
+def _html_document(
+    path: Path,
+    *,
+    source_type: str,
+    original_path: str | None = None,
+    text: str | None = None,
+) -> _SourceDocument:
+    raw = text if text is not None else path.read_text(encoding="utf-8")
+    title = _html_title(raw) or _title_from_path(path)
+    body = _html_to_text(raw)
+    if body.lower().startswith(title.lower()):
+        body = body[len(title) :].strip()
+    markdown = f"# {title}\n\n{body}\n"
+    return _SourceDocument(
+        title=title,
+        markdown=markdown,
+        source_type=source_type,
+        original_path=original_path or str(path),
+        base_slug=slugify(title) or slugify(path.stem) or "source",
+    )
+
+
+def _epub_documents(path: Path) -> Iterable[_SourceDocument]:
+    try:
+        with zipfile.ZipFile(path) as archive:
+            names = [
+                name
+                for name in sorted(archive.namelist())
+                if Path(name).suffix.lower() in {".html", ".htm", ".xhtml", ".txt", ".md"}
+                and not name.endswith("/")
+            ]
+            for name in names:
+                raw = archive.read(name).decode("utf-8", errors="replace")
+                pseudo_path = Path(name)
+                if pseudo_path.suffix.lower() in {".txt", ".md"}:
+                    title = _markdown_title(raw) or _title_from_path(pseudo_path)
+                    markdown = _ensure_h1(_normalize_newlines(raw).strip() + "\n", title)
+                    yield _SourceDocument(
+                        title=title,
+                        markdown=markdown,
+                        source_type="epub",
+                        original_path=f"{path}!{name}",
+                        base_slug=slugify(title) or slugify(pseudo_path.stem) or "source",
+                    )
+                else:
+                    yield _html_document(
+                        pseudo_path,
+                        source_type="epub",
+                        original_path=f"{path}!{name}",
+                        text=raw,
+                    )
+    except zipfile.BadZipFile as exc:
+        raise InfospaceError(
+            "invalid_epub_source",
+            f"EPUB source is not a readable zip archive: {path}",
+            {"source": str(path)},
+        ) from exc
+
+
+def _chunk_markdown(markdown: str, *, max_words: int) -> list[str]:
+    text = markdown.strip()
+    if max_words <= 0:
+        return [text + "\n"]
+    words = text.split()
+    if len(words) <= max_words:
+        return [text + "\n"]
+    chunks: list[str] = []
+    heading = _markdown_title(text) or "Source"
+    body_words = re.sub(r"(?m)^# .+?\n+", "", text, count=1).split()
+    for start in range(0, len(body_words), max_words):
+        part = " ".join(body_words[start : start + max_words]).strip()
+        chunks.append(f"# {heading} Part {len(chunks) + 1}\n\n{part}\n")
+    return chunks
+
+
+def _html_title(raw: str) -> str:
+    match = HTML_TITLE_RE.search(raw) or HTML_H1_RE.search(raw)
+    if not match:
+        return ""
+    return _collapse_ws(_html_to_text(match.group("title")))
+
+
+def _html_to_text(raw: str) -> str:
+    cleaned = SCRIPT_STYLE_RE.sub(" ", raw)
+    cleaned = re.sub(r"</(p|div|section|article|h[1-6]|li)>", "\n", cleaned, flags=re.I)
+    cleaned = TAG_RE.sub(" ", cleaned)
+    cleaned = html.unescape(cleaned)
+    lines = [_collapse_ws(line) for line in cleaned.splitlines()]
+    return "\n\n".join(line for line in lines if line).strip()
+
+
+def _ensure_h1(markdown: str, title: str) -> str:
+    if re.search(r"(?m)^#\s+\S", markdown):
+        return markdown
+    return f"# {title}\n\n{markdown.strip()}\n"
+
+
+def _markdown_title(markdown: str) -> str:
+    match = re.search(r"(?m)^#\s+(?P<title>.+?)\s*$", markdown)
+    return match.group("title").strip() if match else ""
+
+
+def _title_from_path(path: Path) -> str:
+    words = re.sub(r"[^A-Za-z0-9]+", " ", path.stem).strip()
+    return words.title() if words else "Source"
+
+
+def _dedupe_chunk_id(base_id: str, used_ids: set[str]) -> str:
+    candidate = base_id or "source"
+    if candidate not in used_ids:
+        used_ids.add(candidate)
+        return candidate
+    index = 2
+    while f"{candidate}-{index}" in used_ids:
+        index += 1
+    deduped = f"{candidate}-{index}"
+    used_ids.add(deduped)
+    return deduped
+
+
+def _digest_text(text: str) -> str:
+    return hashlib.sha256(text.encode("utf-8")).hexdigest()
+
+
+def _collapse_ws(value: str) -> str:
+    return re.sub(r"\s+", " ", value).strip()
+
+
+def _normalize_newlines(value: str) -> str:
+    return value.replace("\r\n", "\n").replace("\r", "\n")
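
Review note: the chunking in `_chunk_markdown` above is word-count based. The standalone sketch below restates that logic so it can be tried without the package; `chunk_by_words` is a hypothetical name, and the regular expressions are copied from the diff rather than taken from any published API.

```python
from __future__ import annotations

import re


def chunk_by_words(markdown: str, max_words: int) -> list[str]:
    """Split Markdown into parts of at most max_words body words each."""
    text = markdown.strip()
    # Short documents (heading words included in the count) stay in one piece.
    if max_words <= 0 or len(text.split()) <= max_words:
        return [text + "\n"]
    match = re.search(r"(?m)^#\s+(.+?)\s*$", text)
    heading = match.group(1) if match else "Source"
    # Drop the first H1 line so it is not repeated inside the first part.
    body_words = re.sub(r"(?m)^# .+?\n+", "", text, count=1).split()
    return [
        f"# {heading} Part {number}\n\n"
        + " ".join(body_words[start : start + max_words])
        + "\n"
        for number, start in enumerate(range(0, len(body_words), max_words), start=1)
    ]
```

Because the body is split on whitespace and rejoined with single spaces, paragraph breaks, lists, and sub-headings inside a long document are flattened in the resulting parts. That is a property of the approach as written, which may matter for sources where structure carries meaning.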
--- a/src/infospace_bench/workflow.py
+++ b/src/infospace_bench/workflow.py
@@ -273,10 +273,12 @@ class WorkflowStageRecord:
    input_artifact_id: str
    output_artifact_id: str = ""
    message: str = ""
+    provider: str = ""
+    metadata: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        data = asdict(self)
-        return {key: value for key, value in data.items() if value != ""}
+        return {key: value for key, value in data.items() if value not in ("", {}, [])}


@dataclass(frozen=True)
@@ -442,6 +444,7 @@ def _execute_workflow(
                    infospace.root,
                    dry_run=False,
                    provider=result.provider,
+                    provider_metadata=result.metadata,
                )
                outputs.append(output)
                stage_outputs[stage.id] = {
@@ -458,6 +461,8 @@ def _execute_workflow(
                        status="completed",
                        input_artifact_id=input_record.artifact_id,
                        output_artifact_id=output.artifact_id,
+                        provider=result.provider,
+                        metadata=result.metadata,
                    )
                )
            elif stage.kind == "split_entities":
@@ -645,6 +650,7 @@ def _resolve_output(
    *,
    dry_run: bool,
    provider: str = "",
+    provider_metadata: dict[str, Any] | None = None,
 ) -> WorkflowOutputRecord:
    if stage.output is None:
        raise InfospaceError(
@@ -673,6 +679,11 @@ def _resolve_output(
                    "stage_id": stage.id,
                    "input_artifact_id": input_record.artifact_id,
                    **({"provider": provider} if provider else {}),
+                    **(
+                        {"provider_metadata": provider_metadata}
+                        if provider_metadata
+                        else {}
+                    ),
                },
                relationships=[
                    {
--- a/tests/test_generic_generator.py
+++ b/tests/test_generic_generator.py
@@ -0,0 +1,301 @@
+import json
+import os
+import subprocess
+import sys
+import zipfile
+from pathlib import Path
+
+import yaml
+
+from infospace_bench.generator import (
+    init_generation_infospace,
+    run_generation,
+    status_generation,
+)
+from infospace_bench.openrouter import OpenRouterAssistedGenerationAdapter
+from infospace_bench.source_intake import normalize_source
+
+
+def cli_env() -> dict[str, str]:
+    env = os.environ.copy()
+    env["PYTHONPATH"] = os.pathsep.join(["src", "/home/worsch/markitect-tool/src"])
+    return env
+
+
+def fixture_responses(path: Path) -> None:
+    data = {
+        "responses": [
+            {
+                "stage_id": "summarize-source",
+                "input_artifact_id": "*",
+                "markdown": "# Source Summary\n\nThe source describes reusable knowledge work.\n",
+            },
+            {
+                "stage_id": "extract-entities",
+                "input_artifact_id": "*",
+                "markdown": (
+                    "# Knowledge Artifact\n\n"
+                    "## Definition\n\n"
+                    "A durable unit of structured knowledge derived from a source.\n\n"
+                    "## Context\n\n"
+                    "Generated from a generic source workflow.\n\n"
+                    "# Source Claim\n\n"
+                    "## Definition\n\n"
+                    "A claim preserved from the source for later review.\n\n"
+                    "## Context\n\n"
+                    "Used to keep provenance visible.\n"
+                ),
+            },
+            {
+                "stage_id": "extract-relations",
+                "input_artifact_id": "*",
+                "markdown": (
+                    "# Knowledge Artifact Supports Source Claim\n\n"
+                    "## Subject\n\n"
+                    "Knowledge Artifact\n\n"
+                    "## Predicate\n\n"
+                    "supports\n\n"
+                    "## Object\n\n"
+                    "Source Claim\n\n"
+                    "## Relation Type\n\n"
+                    "support\n\n"
+                    "## Evidence\n\n"
+                    "The source links durable artifacts to explicit claims.\n"
+                ),
+            },
+            {
+                "stage_id": "evaluate-entity",
+                "input_artifact_id": "*",
+                "markdown": (
+                    "---\n"
+                    "artifact_id: entity/knowledge-artifact.md\n"
+                    "evaluator: fixture\n"
+                    "evaluated_at: '2026-05-14T00:00:00'\n"
+                    "scores:\n"
+                    "  - name: groundedness\n"
+                    "    value: 4.0\n"
+                    "    max_value: 5.0\n"
+                    "  - name: usefulness\n"
+                    "    value: 4.0\n"
+                    "    max_value: 5.0\n"
+                    "---\n"
+                    "\n"
+                    "# Evaluation: entity/knowledge-artifact.md\n"
+                ),
+            },
+        ]
+    }
+    path.write_text(yaml.safe_dump(data, sort_keys=False), encoding="utf-8")
+
+
+def write_epub_fixture(path: Path) -> None:
+    with zipfile.ZipFile(path, "w") as archive:
+        archive.writestr("OEBPS/chapter1.xhtml", "<h1>Chapter One</h1><p>Alpha beta.</p>")
+        archive.writestr("OEBPS/chapter2.xhtml", "<h1>Chapter Two</h1><p>Gamma delta.</p>")
+
+
+def test_source_intake_accepts_article_ebook_and_folder(tmp_path: Path) -> None:
+    article = tmp_path / "article.html"
+    article.write_text(
+        "<html><head><title>Article Title</title></head>"
+        "<body><h1>Article Title</h1><p>One two three.</p></body></html>",
+        encoding="utf-8",
+    )
+    ebook = tmp_path / "book.epub"
+    write_epub_fixture(ebook)
+    folder = tmp_path / "collection"
+    folder.mkdir()
+    (folder / "note.md").write_text("# Note\n\nMarkdown source.", encoding="utf-8")
+    (folder / "memo.txt").write_text("Plain text source.", encoding="utf-8")
+
+    article_chunks = normalize_source(article)
+    ebook_chunks = normalize_source(ebook)
+    folder_chunks = normalize_source(folder)
+
+    assert article_chunks[0].source_type == "html"
+    assert article_chunks[0].title == "Article Title"
+    assert article_chunks[0].chunk_id == "article-title"
+    assert article_chunks[0].digest == normalize_source(article)[0].digest
+    assert [chunk.source_type for chunk in ebook_chunks] == ["epub", "epub"]
+    assert {chunk.source_type for chunk in folder_chunks} == {"markdown", "text"}
+    assert all(chunk.markdown.startswith("# ") for chunk in folder_chunks)
+
+
+def test_generate_from_source_cli_fixture_builds_infospace(tmp_path: Path) -> None:
+    source = tmp_path / "article.md"
+    source.write_text(
+        "# Reusable Knowledge\n\nA source about claims and durable artifacts.",
+        encoding="utf-8",
+    )
+    fixture = tmp_path / "responses.yaml"
+    fixture_responses(fixture)
+
+    result = subprocess.run(
+        [
+            sys.executable,
+            "-m",
+            "infospace_bench",
+            "generate",
+            "from-source",
+            str(source),
+            "--workspace",
+            str(tmp_path),
+            "--slug",
+            "article-space",
+            "--name",
+            "Article Space",
+            "--fixture-responses",
+            str(fixture),
+            "--apply",
+        ],
+        check=False,
+        env=cli_env(),
+        text=True,
+        capture_output=True,
+    )
+    assert result.returncode == 0, result.stderr
+    payload = json.loads(result.stdout)
+    root = Path(payload["root"])
+    status = subprocess.run(
+        [
+            sys.executable,
+            "-m",
+            "infospace_bench",
+            "generate",
+            "status",
+            str(root),
+        ],
+        check=False,
+        env=cli_env(),
+        text=True,
+        capture_output=True,
+    )
+    assert status.returncode == 0, status.stderr
+    status_payload = json.loads(status.stdout)
+
+    assert payload["status"] == "completed"
+    assert (root / "artifacts" / "sources" / "reusable-knowledge.md").is_file()
+    assert (root / "artifacts" / "entities" / "knowledge-artifact.md").is_file()
+    assert (root / "artifacts" / "relations" / "reusable-knowledge-relations.md").is_file()
+    assert (root / "output" / "metrics" / "metrics.yaml").is_file()
+    assert status_payload["source_chunk_count"] == 1
+    assert status_payload["entity_count"] == 2
+    assert status_payload["relation_count"] == 1
+    assert status_payload["stale"] is False
+
+
+def test_generate_from_ebook_and_folder_fixtures(tmp_path: Path) -> None:
+    fixture = tmp_path / "responses.yaml"
+    fixture_responses(fixture)
+    ebook = tmp_path / "book.epub"
+    write_epub_fixture(ebook)
+    folder = tmp_path / "folder"
+    folder.mkdir()
+    (folder / "first.md").write_text("# First\n\nOne source.", encoding="utf-8")
+    (folder / "second.txt").write_text("Second source.", encoding="utf-8")
+
+    for source, slug, expected_sources in (
+        (ebook, "book-space", 2),
+        (folder, "folder-space", 2),
+    ):
+        result = subprocess.run(
+            [
+                sys.executable,
+                "-m",
+                "infospace_bench",
+                "generate",
+                "from-source",
+                str(source),
+                "--workspace",
+                str(tmp_path),
+                "--slug",
+                slug,
+                "--name",
+                slug.replace("-", " ").title(),
+                "--fixture-responses",
+                str(fixture),
+                "--apply",
+            ],
+            check=False,
+            env=cli_env(),
+            text=True,
+            capture_output=True,
+        )
+        assert result.returncode == 0, result.stderr
+        payload = json.loads(result.stdout)
+        status = status_generation(Path(payload["root"]))
+        assert status["source_chunk_count"] == expected_sources
+        assert status["entity_count"] == 2
+        assert status["relation_count"] == expected_sources
+        assert status["history_snapshot_count"] == 1
+
+
+def test_generator_resume_is_idempotent_and_detects_stale_source(tmp_path: Path) -> None:
+    source = tmp_path / "note.md"
+    source.write_text("# Note\n\nInitial source.", encoding="utf-8")
+    fixture = tmp_path / "responses.yaml"
+    fixture_responses(fixture)
+    root = init_generation_infospace(tmp_path, source, "note-space", name="Note Space").root
+
+    first = run_generation(root, fixture_responses=fixture)
+    second = run_generation(root, fixture_responses=fixture, resume=True)
+    generated_source = root / "artifacts" / "sources" / "note.md"
+    generated_source.write_text("# Note\n\nChanged source.", encoding="utf-8")
+    stale_status = status_generation(root)
+
+    assert first.status == "completed"
+    assert second.status == "skipped"
+    assert second.skipped is True
+    assert stale_status["stale"] is True
+    assert stale_status["stale_sources"] == ["source/note.md"]
+
+
+def test_openrouter_adapter_uses_model_and_records_metadata() -> None:
+    requests: list[dict] = []
+
+    def transport(payload: dict, headers: dict[str, str], endpoint: str) -> dict:
+        requests.append({"payload": payload, "headers": headers, "endpoint": endpoint})
+        return {
+            "id": "or-request-1",
+            "choices": [{"message": {"content": "# Generated\n\nContent."}}],
+            "usage": {"prompt_tokens": 5, "completion_tokens": 3},
+        }
+
+    adapter = OpenRouterAssistedGenerationAdapter(
+        api_key="test-key",
+        model="openai/gpt-4o-mini",
+        transport=transport,
+        retry_limit=0,
+    )
+    result = adapter.generate(
+        type(
+            "Request",
+            (),
+            {
+                "prompt": "Generate markdown.",
+                "stage_id": "extract-entities",
+                "workflow_id": "generic-source-extract",
+                "input_artifact_id": "source/example.md",
+                "provider_hint": "openrouter",
+                "metadata": {},
+            },
+        )()
+    )
+
+    assert requests[0]["payload"]["model"] == "openai/gpt-4o-mini"
+    assert requests[0]["headers"]["Authorization"] == "Bearer test-key"
+    assert result.markdown == "# Generated\n\nContent."
+    assert result.provider == "openrouter"
+    assert result.metadata["model"] == "openai/gpt-4o-mini"
+    assert result.metadata["request_id"] == "or-request-1"
+    assert result.metadata["usage"]["completion_tokens"] == 3
+
+
+def test_generic_generator_docs_cover_openrouter_resume_and_cost_caps() -> None:
+    text = Path("docs/generic-source-generator.md").read_text(encoding="utf-8")
+
+    assert "OPENROUTER_API_KEY" in text
+    assert "--model" in text
+    assert "--max-chunks" in text
+    assert "resume" in text.lower()
+    assert "fixture-responses" in text
--- a/workplans/IB-WP-0015-generic-source-infospace-generator-cli.md
+++ b/workplans/IB-WP-0015-generic-source-infospace-generator-cli.md
@@ -4,7 +4,7 @@ type: workplan
 title: "Generic Source Infospace Generator CLI"
 domain: markitect
 repo: infospace-bench
-status: planned
+status: completed
 owner: markitect
 topic_slug: markitect
 created: "2026-05-14"
@@ -105,7 +105,7 @@ Default-safe modes:

 ```task
 id: IB-WP-0015-T01
-status: in_progress
+status: done
 priority: high
 state_hub_task_id: "08196bf2-9323-4cd8-860c-4306c965ed60"
 ```
@@ -128,7 +128,7 @@ state_hub_task_id: "08196bf2-9323-4cd8-860c-4306c965ed60"

 ```task
 id: IB-WP-0015-T02
-status: in_progress
+status: done
 priority: high
 state_hub_task_id: "5604796b-cb09-43ed-b3a9-5d4906790807"
 ```
@@ -152,7 +152,7 @@ state_hub_task_id: "5604796b-cb09-43ed-b3a9-5d4906790807"

 ```task
 id: IB-WP-0015-T03
-status: in_progress
+status: done
 priority: high
 state_hub_task_id: "c02720c5-1b82-458a-bf8c-9147af4fd9e9"
 ```
@@ -171,7 +171,7 @@ state_hub_task_id: "c02720c5-1b82-458a-bf8c-9147af4fd9e9"

 ```task
 id: IB-WP-0015-T04
-status: todo
+status: done
 priority: high
 state_hub_task_id: "21b50fbc-f43e-4b18-b012-976a5241f52a"
 ```
@@ -192,7 +192,7 @@ state_hub_task_id: "21b50fbc-f43e-4b18-b012-976a5241f52a"

 ```task
 id: IB-WP-0015-T05
-status: todo
+status: done
 priority: high
 state_hub_task_id: "ad882b6e-924e-4f9a-8e93-119aeadd8132"
 ```
@@ -216,7 +216,7 @@ state_hub_task_id: "ad882b6e-924e-4f9a-8e93-119aeadd8132"

 ```task
 id: IB-WP-0015-T06
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "3461eacf-e42a-455c-954c-849b0ad69fc1"
 ```
@@ -264,3 +264,18 @@ state_hub_task_id: "3461eacf-e42a-455c-954c-849b0ad69fc1"
  - `infospace-bench`: applied infospace generation workflow and CLI
  - `kontextual-engine`: durable runtime/retrieval/audit if needed later

+## Implementation Notes
+
+Completed on 2026-05-14.
+
+- Added generic source intake for Markdown, plain text, local HTML, EPUB-like
+  archives, and folder collections.
+- Added the `general-knowledge` profile with prompt templates and contracts.
+- Added an explicit OpenRouter assisted-generation adapter with mocked provider
+  tests and environment-based credential lookup.
+- Added `infospace-bench generate` subcommands for init, plan, run, resume,
+  status, and from-source flows.
+- Added generation state, resume skipping, source/profile stale detection,
+  metrics/history recording, and a manifest-backed generation report.
+- Added deterministic acceptance tests for article, ebook-like, and folder
+  generation using fixture responses.
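
The notes above mention generation state, resume skipping, and source/profile stale detection without showing how they fit together. One way to picture the idea is a minimal sketch that fingerprints the inputs and reruns only stages whose recorded fingerprint no longer matches. This is not the code added by this commit: `fingerprint`, `GenerationState`, `stages_to_run`, `mark_done`, and the `extract-relations` stage id are hypothetical names chosen for illustration, and the real state layout may differ.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(*parts: bytes) -> str:
    """Return a SHA-256 hex digest over the given byte strings."""
    digest = hashlib.sha256()
    for part in parts:
        # Length-prefix each part so (b"ab", b"c") and (b"a", b"bc") differ.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


@dataclass
class GenerationState:
    # Hypothetical shape: stage id -> input fingerprint recorded on completion.
    completed: dict[str, str] = field(default_factory=dict)

    def stages_to_run(self, stage_ids: list[str], current: str) -> list[str]:
        """Return stages that never ran or that ran against stale inputs."""
        return [s for s in stage_ids if self.completed.get(s) != current]

    def mark_done(self, stage_id: str, current: str) -> None:
        """Record that a stage finished for the given input fingerprint."""
        self.completed[stage_id] = current


# Usage: one stage finished before the run was interrupted.
stages = ["extract-entities", "extract-relations"]
state = GenerationState()
v1 = fingerprint(b"source text", b"profile v1")
state.mark_done("extract-entities", v1)

# Resuming with unchanged inputs skips the finished stage.
pending = state.stages_to_run(stages, v1)

# Editing the profile changes the fingerprint, so every stage is stale.
v2 = fingerprint(b"source text", b"profile v2")
stale = state.stages_to_run(stages, v2)
```

Keying completion on a content hash rather than a timestamp means a resume is safe to repeat: rerunning with identical inputs does no work, while any edit to the source or profile invalidates earlier results.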