generic source-to-infospace generator
@@ -31,6 +31,7 @@ Start with:
- `docs/legacy-infospace-migration-guide.md`
- `docs/replacement-readiness-decision.md`
- `docs/wealth-vsm-generation-pipeline.md`
- `docs/generic-source-generator.md`
- `infospaces/bootstrap-pilot/`
- `infospaces/wealth-vsm-legacy-slice/`
- `infospaces/wealth-vsm-generation-pilot/`
94
docs/generic-source-generator.md
Normal file
@@ -0,0 +1,94 @@
# Generic Source Generator

Date: 2026-05-14

## Purpose

`infospace-bench generate` turns a local article, ebook-like file, or folder of
knowledge sources into a manifest-backed infospace. It generalizes the
Wealth/VSM pilot into an explicit workflow path with deterministic fixture
support and an optional OpenRouter provider.

## Deterministic Run

Use fixture responses for repeatable tests and demos:

```bash
infospace-bench generate from-source ./examples/article.md \
  --workspace . \
  --slug article-space \
  --name "Article Space" \
  --profile general-knowledge \
  --fixture-responses ./examples/responses.yaml \
  --apply
```

The command creates normalized source chunks, installs the selected profile,
runs the declared workflows, writes entities, relations, evaluations, metrics,
history, and a generation report, then registers artifacts in
`artifacts/index.yaml`. Without `--apply`, `from-source` stops after
initialization and prints the generation plan instead.
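
Each normalized chunk is written to `artifacts/sources/<chunk_id>.md` and registered with provenance that includes a digest of its text; `generate status` later recomputes that digest to flag stale sources. A minimal sketch of such a record, assuming the stored digest is the same SHA-256-of-UTF-8 hash the generator's `_digest_text` computes, and using an in-memory dict in place of the real `artifacts/index.yaml` entry (whose full schema is not shown here). `chunk_record` and the chunk ID `article-001` are hypothetical:

```python
import hashlib


def digest_text(text: str) -> str:
    # SHA-256 over UTF-8 bytes, the same scheme as the generator's `_digest_text`.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunk_record(chunk_id: str, title: str, markdown: str) -> dict:
    # Hypothetical in-memory stand-in for one registered source artifact.
    return {
        "id": f"source/{chunk_id}.md",
        "path": f"artifacts/sources/{chunk_id}.md",
        "kind": "source",
        "title": title,
        "provenance": {"chunk_id": chunk_id, "digest": digest_text(markdown)},
    }


record = chunk_record("article-001", "Article (part 1)", "# Intro\n\nFirst chunk.\n")
print(record["id"])    # source/article-001.md
print(record["path"])  # artifacts/sources/article-001.md
```

Because the digest covers the exact chunk text, any edit to a file under `artifacts/sources/` changes it, which is what the stale check relies on.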

## Stepwise Workflow

```bash
infospace-bench generate init ./book.epub \
  --workspace . \
  --slug book-space \
  --name "Book Space" \
  --profile general-knowledge \
  --max-chunks 3

infospace-bench generate plan ./infospaces/book-space --stage all
infospace-bench generate run ./infospaces/book-space \
  --fixture-responses ./responses.yaml
infospace-bench generate status ./infospaces/book-space
```

`--max-chunks` limits how many normalized chunks are imported, which keeps
early experiments small and bounds provider cost; the default, `0`, applies no
limit. `generate plan` makes no provider calls. `generate status` shows chunk
counts, generated artifact counts, evaluations, metrics, history, and stale
source/profile inputs.
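
The CLI passes `--max-chunks` through `_optional_positive`, so `0` becomes `None` before it reaches `normalize_source`. A minimal sketch of the resulting cap, assuming the intake keeps the first N chunks in source order (the `source_intake` module is not part of this diff); `cap_chunks` and the chunk names are hypothetical:

```python
from __future__ import annotations

from collections.abc import Sequence


def optional_positive(value: int) -> int | None:
    # Same rule as the CLI's `_optional_positive`: 0 (the default) or a negative value means "no cap".
    return value if value > 0 else None


def cap_chunks(chunks: Sequence[str], max_chunks: int | None) -> list[str]:
    # Hypothetical: keep the first `max_chunks` chunks in source order; None keeps them all.
    if max_chunks is None:
        return list(chunks)
    return list(chunks[:max_chunks])


chunks = ["chunk-1", "chunk-2", "chunk-3", "chunk-4", "chunk-5"]
print(cap_chunks(chunks, optional_positive(3)))  # ['chunk-1', 'chunk-2', 'chunk-3']
print(cap_chunks(chunks, optional_positive(0)))  # all five chunks
```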

## OpenRouter

Live model calls are opt-in:

```bash
export OPENROUTER_API_KEY=...

infospace-bench generate run ./infospaces/book-space \
  --provider openrouter \
  --model openai/gpt-4o-mini \
  --stage all
```

Set `--model` to an OpenRouter model ID; the OpenRouter provider refuses to run
without one. The API key is read from `OPENROUTER_API_KEY` and is not written to
`infospace.yaml`. If `--fixture-responses` is also given, the fixture responses
take precedence over `--provider`. Default tests never make live provider calls.
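
Provider settings are checked when the OpenRouter adapter is constructed: an explicit `api_key` wins, otherwise `OPENROUTER_API_KEY` is read, and a missing key or model fails before any request is sent. A minimal sketch of that resolution order, with a hypothetical `SettingError` standing in for `InfospaceError` and an injectable `env` mapping so it runs without touching the real environment:

```python
from __future__ import annotations

import os
from collections.abc import Mapping


class SettingError(Exception):
    """Hypothetical stand-in for the project's InfospaceError."""


def resolve_openrouter_settings(
    model: str,
    api_key: str = "",
    env: Mapping[str, str] | None = None,
) -> tuple[str, str]:
    # An explicit key wins; otherwise fall back to the OPENROUTER_API_KEY variable.
    environ = os.environ if env is None else env
    key = api_key or environ.get("OPENROUTER_API_KEY", "")
    if not key:
        raise SettingError("OPENROUTER_API_KEY is required for the OpenRouter provider")
    # The model is checked second, in the same order as the adapter's __post_init__.
    if not model:
        raise SettingError("OpenRouter provider requires an explicit model")
    return key, model


print(resolve_openrouter_settings("openai/gpt-4o-mini", env={"OPENROUTER_API_KEY": "test-key"}))
# ('test-key', 'openai/gpt-4o-mini')
```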

## Resume

Use resume for interrupted or reviewed runs:

```bash
infospace-bench generate resume ./infospaces/book-space \
  --provider openrouter \
  --model openai/gpt-4o-mini
```

`resume` skips a run that has already completed when its inputs are unchanged.
Use `--force` when you intentionally want to rerun completed work. Inputs are
reported as stale when source artifact digests or installed profile/template
files change, and a stale infospace is rerun even without `--force`.
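
The skip decision at the top of `run_generation` combines four flags. A minimal sketch of that guard as a pure function (`should_skip` is hypothetical, not part of the package), assuming `stale` is the value reported by `status_generation`, i.e. true when any source chunk digest or the installed profile digest has changed:

```python
def should_skip(*, resume: bool, force: bool, completed: bool, stale: bool) -> bool:
    # Mirrors: resume and not force and state["completed"] is True and not status["stale"]
    return resume and not force and completed and not stale


# A completed, unchanged infospace is skipped on resume...
print(should_skip(resume=True, force=False, completed=True, stale=False))   # True
# ...but reruns when forced, when inputs are stale, or when the last run was partial.
print(should_skip(resume=True, force=True, completed=True, stale=False))    # False
print(should_skip(resume=True, force=False, completed=True, stale=True))    # False
print(should_skip(resume=True, force=False, completed=False, stale=False))  # False
```

Only `--stage all` (or `metrics`) sets `completed`, so resuming after a partial stage such as `--stage extract` always reruns.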

## Review Path

After generation:

- inspect `artifacts/sources/` for normalized input chunks
- inspect `artifacts/entities/` and `artifacts/relations/` for generated claims
- inspect `output/evaluations/` for rubric output
- run `infospace-bench validate <root>` and `infospace-bench graph <root>`
- review `reports/generation-summary.md`

Move from the generic profile to a specialized profile when the source domain
needs stricter terminology, narrower extraction granularity, or a discipline
lens such as VSM.
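
`reports/generation-summary.md` is written as `Key: value` count lines followed by a `## Metrics` bullet list. For scripted review, a minimal parser sketch, assuming that layout stays as `_write_generation_report` writes it; `parse_generation_summary` is hypothetical and the snapshot ID and metric names in the sample are made up:

```python
from __future__ import annotations

SAMPLE = """# Generation Report

Snapshot: snap-001
Sources: 3
Entities: 12
Relations: 4
Evaluations: 12

## Metrics

- coverage: 0.75
- entity_count: 12
"""


def parse_generation_summary(text: str) -> dict[str, dict[str, str]]:
    header: dict[str, str] = {}
    metrics: dict[str, str] = {}
    in_metrics = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Switch into the metrics list only under the "## Metrics" heading.
            in_metrics = line.strip() == "## Metrics"
            continue
        if in_metrics and line.startswith("- ") and ": " in line:
            name, value = line[2:].split(": ", 1)
            metrics[name] = value
        elif not in_metrics and ": " in line:
            key, value = line.split(": ", 1)
            header[key] = value
    # Values stay strings; convert them where a numeric comparison is needed.
    return {"header": header, "metrics": metrics}


summary = parse_generation_summary(SAMPLE)
print(summary["header"]["Entities"])  # 12
print(summary["metrics"])             # {'coverage': '0.75', 'entity_count': '12'}
```
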
@@ -11,6 +11,9 @@ dependencies = [
[project.scripts]
infospace-bench = "infospace_bench.cli:main"

[tool.setuptools.package-data]
infospace_bench = ["profiles/**/*"]

[tool.pytest.ini_options]
pythonpath = ["src", "../markitect-tool/src"]
testpaths = ["tests"]
@@ -10,6 +10,12 @@ from .checks import run_collection_checks
from .engine import engine_capability_contract, plan_asset_sync, sync_assets
from .errors import InfospaceError
from .evaluation_io import read_entity_evaluations
from .generator import (
    init_generation_infospace,
    plan_generation,
    run_generation,
    status_generation,
)
from .history import (
    build_viability_report,
    find_snapshot,
@@ -123,6 +129,72 @@ def build_parser() -> argparse.ArgumentParser:
        help="Run assisted stages with deterministic fixture responses",
    )

    generate = sub.add_parser("generate", help="Generate infospaces from sources")
    generate_sub = generate.add_subparsers(dest="generate_command", required=True)

    generate_init = generate_sub.add_parser(
        "init",
        help="Create a generation infospace from a local source",
    )
    generate_init.add_argument("source")
    generate_init.add_argument("--workspace", default=".")
    generate_init.add_argument("--slug", required=True)
    generate_init.add_argument("--name", required=True)
    generate_init.add_argument("--profile", default="general-knowledge")
    generate_init.add_argument("--max-chunks", type=int, default=0)

    generate_plan = generate_sub.add_parser(
        "plan",
        help="Plan generator work without provider calls",
    )
    generate_plan.add_argument("root")
    generate_plan.add_argument("--stage", default="all")

    generate_run = generate_sub.add_parser(
        "run",
        help="Run generator workflows for an infospace",
    )
    generate_run.add_argument("root")
    generate_run.add_argument("--stage", default="all")
    generate_run.add_argument("--provider", choices=["fixture", "openrouter"], default="fixture")
    generate_run.add_argument("--model", default="")
    generate_run.add_argument("--fixture-responses", default="")
    generate_run.add_argument("--resume", action="store_true")
    generate_run.add_argument("--force", action="store_true")

    generate_resume = generate_sub.add_parser(
        "resume",
        help="Resume generator workflows for an infospace",
    )
    generate_resume.add_argument("root")
    generate_resume.add_argument("--stage", default="all")
    generate_resume.add_argument("--provider", choices=["fixture", "openrouter"], default="fixture")
    generate_resume.add_argument("--model", default="")
    generate_resume.add_argument("--fixture-responses", default="")
    generate_resume.add_argument("--force", action="store_true")

    generate_status = generate_sub.add_parser(
        "status",
        help="Inspect generator status for an infospace",
    )
    generate_status.add_argument("root")

    generate_from_source = generate_sub.add_parser(
        "from-source",
        help="Initialize and optionally run generation from a local source",
    )
    generate_from_source.add_argument("source")
    generate_from_source.add_argument("--workspace", default=".")
    generate_from_source.add_argument("--slug", required=True)
    generate_from_source.add_argument("--name", required=True)
    generate_from_source.add_argument("--profile", default="general-knowledge")
    generate_from_source.add_argument("--stage", default="all")
    generate_from_source.add_argument("--provider", choices=["fixture", "openrouter"], default="fixture")
    generate_from_source.add_argument("--model", default="")
    generate_from_source.add_argument("--fixture-responses", default="")
    generate_from_source.add_argument("--max-chunks", type=int, default=0)
    generate_from_source.add_argument("--apply", action="store_true")

    engine = sub.add_parser("engine", help="Inspect and sync engine boundary state")
    engine_sub = engine.add_subparsers(dest="engine_command", required=True)

@@ -284,6 +356,73 @@ def main(argv: list[str] | None = None) -> int:
            )
        else:
            parser.error(f"Unhandled workflow command: {args.workflow_command}")
    elif args.command == "generate":
        if args.generate_command == "init":
            infospace = init_generation_infospace(
                Path(args.workspace),
                Path(args.source),
                args.slug,
                name=args.name,
                profile=args.profile,
                max_chunks=_optional_positive(args.max_chunks),
            )
            _write_json(
                {
                    "slug": infospace.config.slug,
                    "root": str(infospace.root),
                    "status": "initialized",
                }
            )
        elif args.generate_command == "plan":
            _write_json(plan_generation(Path(args.root), stage=args.stage))
        elif args.generate_command == "run":
            _write_json(
                run_generation(
                    Path(args.root),
                    stage=args.stage,
                    provider=args.provider,
                    model=args.model,
                    fixture_responses=args.fixture_responses or None,
                    resume=args.resume,
                    force=args.force,
                ).to_dict()
            )
        elif args.generate_command == "resume":
            _write_json(
                run_generation(
                    Path(args.root),
                    stage=args.stage,
                    provider=args.provider,
                    model=args.model,
                    fixture_responses=args.fixture_responses or None,
                    resume=True,
                    force=args.force,
                ).to_dict()
            )
        elif args.generate_command == "status":
            _write_json(status_generation(Path(args.root)))
        elif args.generate_command == "from-source":
            infospace = init_generation_infospace(
                Path(args.workspace),
                Path(args.source),
                args.slug,
                name=args.name,
                profile=args.profile,
                max_chunks=_optional_positive(args.max_chunks),
            )
            if args.apply:
                result = run_generation(
                    infospace.root,
                    stage=args.stage,
                    provider=args.provider,
                    model=args.model,
                    fixture_responses=args.fixture_responses or None,
                )
                _write_json(result.to_dict())
            else:
                _write_json(plan_generation(infospace.root, stage=args.stage))
        else:
            parser.error(f"Unhandled generate command: {args.generate_command}")
    elif args.command == "engine":
        if args.engine_command == "inspect":
            _write_json(
@@ -377,3 +516,7 @@ def _relationship_summary_payload(summary) -> dict:

def _write_json(payload: dict) -> None:
    print(json.dumps(payload, indent=2))


def _optional_positive(value: int) -> int | None:
    return value if value > 0 else None
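
The `generate` commands above are nested `argparse` subparsers with `required=True`, and `--model` and `--fixture-responses` default to empty strings that `main` turns into `None` with `or None`. A trimmed-down sketch of that wiring for `generate run` only (`build_demo_parser` is hypothetical and omits the other subcommands):

```python
import argparse


def build_demo_parser() -> argparse.ArgumentParser:
    # Hypothetical, reduced copy of the `generate run` wiring shown in the diff.
    parser = argparse.ArgumentParser(prog="infospace-bench")
    sub = parser.add_subparsers(dest="command", required=True)
    generate = sub.add_parser("generate", help="Generate infospaces from sources")
    generate_sub = generate.add_subparsers(dest="generate_command", required=True)
    run = generate_sub.add_parser("run", help="Run generator workflows for an infospace")
    run.add_argument("root")
    run.add_argument("--stage", default="all")
    run.add_argument("--provider", choices=["fixture", "openrouter"], default="fixture")
    run.add_argument("--model", default="")
    run.add_argument("--fixture-responses", default="")  # stored as args.fixture_responses
    return parser


args = build_demo_parser().parse_args(
    ["generate", "run", "./infospaces/book-space", "--provider", "openrouter", "--model", "openai/gpt-4o-mini"]
)
print(args.command, args.generate_command, args.root)  # generate run ./infospaces/book-space
print(args.provider, args.model, args.stage)           # openrouter openai/gpt-4o-mini all
print(args.fixture_responses or None)                  # None, as passed on to run_generation
```

An unsupported `--provider` value is rejected by `choices` before any generator code runs.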
525
src/infospace_bench/generator.py
Normal file
@@ -0,0 +1,525 @@
from __future__ import annotations

import hashlib
import shutil
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

import yaml

from .checks import run_collection_checks
from .errors import InfospaceError
from .evaluation_io import read_entity_evaluations
from .history import get_history, read_metrics_file, record_check_results
from .lifecycle import create_infospace, load_infospace, register_artifact
from .openrouter import OpenRouterAssistedGenerationAdapter
from .source_intake import SourceChunk, normalize_source
from .workflow import (
    AssistedGenerationAdapter,
    FixtureAssistedGenerationAdapter,
    WorkflowRunResult,
    plan_workflow,
    run_workflow,
)

STATE_PATH = Path("output/workflows/generation-state.yaml")
DEFAULT_PROFILE = "general-knowledge"
WORKFLOW_BY_STAGE = {
    "summary": ["generic-source-summary"],
    "summarize": ["generic-source-summary"],
    "extract": ["generic-source-entities"],
    "entities": ["generic-source-entities"],
    "relations": ["generic-source-relations"],
    "evaluate": ["generic-source-evaluations"],
    "evaluation": ["generic-source-evaluations"],
    "all": [
        "generic-source-summary",
        "generic-source-entities",
        "generic-source-relations",
        "generic-source-evaluations",
    ],
}


@dataclass(frozen=True)
class GenerationRunResult:
    root: str
    status: str
    stage: str
    skipped: bool = False
    stale: bool = False
    workflows: list[dict[str, Any]] = field(default_factory=list)
    metrics: dict[str, Any] = field(default_factory=dict)
    history_snapshot_id: str = ""

    def to_dict(self) -> dict[str, Any]:
        data = asdict(self)
        return {key: value for key, value in data.items() if value not in ("", [], {})}


def init_generation_infospace(
    workspace: str | Path,
    source: str | Path,
    slug: str,
    *,
    name: str,
    profile: str = DEFAULT_PROFILE,
    max_chunks: int | None = None,
) -> Any:
    chunks = normalize_source(source, max_chunks=max_chunks)
    infospace = create_infospace(Path(workspace), slug, name=name)
    _install_profile(infospace.root, profile)
    _write_workflows(infospace.root, profile)
    _register_source_chunks(infospace.root, chunks)
    _write_state(
        infospace.root,
        {
            "profile": profile,
            "source": str(Path(source)),
            "source_chunks": _source_state(infospace.root),
            "profile_digest": _profile_digest(infospace.root, profile),
            "stage_status": {},
            "completed": False,
            "created_at": _now(),
            "updated_at": _now(),
        },
    )
    return load_infospace(infospace.root)


def plan_generation(root: str | Path, *, stage: str = "all") -> dict[str, Any]:
    root_path = Path(root)
    workflow_ids = _workflow_ids_for_stage(stage)
    plans: list[dict[str, Any]] = []
    for workflow_id in workflow_ids:
        try:
            plans.append(plan_workflow(root_path, workflow_id).to_dict())
        except InfospaceError as exc:
            plans.append(
                {
                    "workflow_id": workflow_id,
                    "status": "blocked",
                    "error": exc.to_dict(),
                }
            )
    status = status_generation(root_path)
    return {
        "root": str(root_path),
        "stage": stage,
        "status": "planned",
        "stale": status["stale"],
        "source_chunk_count": status["source_chunk_count"],
        "workflows": plans,
    }


def run_generation(
    root: str | Path,
    *,
    stage: str = "all",
    provider: str = "fixture",
    model: str = "",
    fixture_responses: str | Path | None = None,
    resume: bool = False,
    force: bool = False,
) -> GenerationRunResult:
    root_path = Path(root)
    stage_key = stage.strip().lower()
    state = _read_state(root_path)
    status = status_generation(root_path)
    workflow_ids = _workflow_ids_for_stage(stage_key)
    if resume and not force and state.get("completed") is True and not status["stale"]:
        return GenerationRunResult(
            root=str(root_path),
            status="skipped",
            stage=stage,
            skipped=True,
            stale=False,
            workflows=[],
            metrics=status.get("metrics", {}),
        )

    adapter = (
        _adapter_for(provider, model=model, fixture_responses=fixture_responses)
        if workflow_ids
        else None
    )
    workflow_results: list[dict[str, Any]] = []
    for workflow_id in workflow_ids:
        result = run_workflow(root_path, workflow_id, assisted_adapter=adapter)
        workflow_results.append(result.to_dict())
        state = _mark_workflow_completed(state, result)

    metrics: dict[str, Any] = {}
    snapshot_id = ""
    if stage_key in {"all", "metrics"}:
        check_result = _record_metrics(root_path)
        metrics = check_result.metrics
        snapshot_id = check_result.snapshot.snapshot_id
        _write_generation_report(root_path, metrics, snapshot_id)

    state.update(
        {
            "source_chunks": _source_state(root_path),
            "profile_digest": _profile_digest(root_path, str(state.get("profile") or DEFAULT_PROFILE)),
            "completed": stage_key in {"all", "metrics"},
            "updated_at": _now(),
            "last_run": {
                "stage": stage,
                "provider": provider,
                "model": model,
                "workflow_count": len(workflow_results),
                "snapshot_id": snapshot_id,
                "completed_at": _now(),
            },
        }
    )
    _write_state(root_path, state)
    return GenerationRunResult(
        root=str(root_path),
        status="completed",
        stage=stage,
        skipped=False,
        stale=False,
        workflows=workflow_results,
        metrics=metrics,
        history_snapshot_id=snapshot_id,
    )


def status_generation(root: str | Path) -> dict[str, Any]:
    root_path = Path(root)
    infospace = load_infospace(root_path)
    state = _read_state(root_path)
    stale_sources = _stale_source_ids(infospace.root)
    profile = str(state.get("profile") or DEFAULT_PROFILE)
    stale_profile = bool(
        state.get("profile_digest")
        and state.get("profile_digest") != _profile_digest(infospace.root, profile)
    )
    evaluations = read_entity_evaluations(infospace.root / "output" / "evaluations")
    history = get_history(infospace.root)
    return {
        "root": str(infospace.root),
        "slug": infospace.config.slug,
        "profile": profile,
        "source_chunk_count": sum(1 for item in infospace.artifacts if item.kind == "source"),
        "entity_count": sum(1 for item in infospace.artifacts if item.kind == "entity"),
        "relation_count": sum(1 for item in infospace.artifacts if item.kind == "relation"),
        "evaluation_count": len(evaluations),
        "generated_count": sum(1 for item in infospace.artifacts if item.kind == "generated"),
        "metrics": read_metrics_file(infospace.root / "output" / "metrics" / "metrics.yaml"),
        "history_snapshot_count": len(history),
        "latest_snapshot_id": history[-1].snapshot_id if history else "",
        "stale": bool(stale_sources or stale_profile),
        "stale_sources": stale_sources,
        "stale_profile": stale_profile,
        "completed": bool(state.get("completed", False)),
        "stage_status": state.get("stage_status", {}),
    }


def _adapter_for(
    provider: str,
    *,
    model: str,
    fixture_responses: str | Path | None,
) -> AssistedGenerationAdapter:
    if fixture_responses:
        return FixtureAssistedGenerationAdapter.from_file(Path(fixture_responses))
    if provider == "openrouter":
        return OpenRouterAssistedGenerationAdapter(model=model)
    raise InfospaceError(
        "missing_assisted_generation_adapter",
        "Assisted generation requires --fixture-responses or --provider openrouter",
        {"provider": provider},
    )


def _register_source_chunks(root: Path, chunks: list[SourceChunk]) -> None:
    for chunk in chunks:
        path = root / "artifacts" / "sources" / f"{chunk.chunk_id}.md"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(chunk.markdown, encoding="utf-8")
        register_artifact(
            root,
            artifact_id=f"source/{chunk.chunk_id}.md",
            path=path,
            kind="source",
            title=chunk.title,
            provenance={
                "original_path": chunk.original_path,
                "source_type": chunk.source_type,
                "digest": chunk.digest,
                "chunk_id": chunk.chunk_id,
                "chunk_index": chunk.chunk_index,
                "chunk_count": chunk.chunk_count,
                "imported_at": chunk.imported_at,
                "extractor_version": chunk.extractor_version,
            },
        )


def _install_profile(root: Path, profile: str) -> None:
    source = Path(__file__).parent / "profiles" / profile
    if not source.is_dir():
        raise InfospaceError(
            "missing_generation_profile",
            f"Generation profile does not exist: {profile}",
            {"profile": profile, "path": str(source)},
        )
    profile_target = root / "profiles" / profile
    template_target = root / "workflows" / "templates" / profile
    shutil.copytree(source, profile_target, dirs_exist_ok=True)
    shutil.copytree(source / "templates", template_target, dirs_exist_ok=True)


def _write_workflows(root: Path, profile: str) -> None:
    config_path = root / "infospace.yaml"
    config = yaml.safe_load(config_path.read_text(encoding="utf-8")) or {}
    config["schemas"] = {
        **dict(config.get("schemas") or {}),
        "entity": f"profiles/{profile}/contracts/entity.contract.md",
        "relation": f"profiles/{profile}/contracts/relation.contract.md",
        "evaluation": f"profiles/{profile}/contracts/evaluation.contract.md",
    }
    config["workflows"] = _profile_workflows(profile)
    config_path.write_text(yaml.safe_dump(config, sort_keys=False), encoding="utf-8")


def _profile_workflows(profile: str) -> list[dict[str, Any]]:
    base = f"workflows/templates/{profile}"
    return [
        {
            "id": "generic-source-summary",
            "description": "Summarize normalized source chunks.",
            "inputs": {"source": {"kind": "source"}},
            "static_macros": {"profile": profile},
            "stages": [
                {
                    "id": "summarize-source",
                    "kind": "assisted",
                    "input": "source",
                    "template": f"{base}/summarize-source.md",
                    "provider_hint": "openrouter",
                    "output": {
                        "path": "artifacts/generated/{{ input.slug }}-summary.md",
                        "artifact_id": "generated/{{ input.slug }}-summary.md",
                        "kind": "generated",
                        "title": "{{ input.title }} Summary",
                    },
                }
            ],
        },
        {
            "id": "generic-source-entities",
            "description": "Extract reusable entity artifacts from source chunks.",
            "inputs": {"source": {"kind": "source"}},
            "static_macros": {"profile": profile},
            "stages": [
                {
                    "id": "extract-entities",
                    "kind": "assisted",
                    "input": "source",
                    "template": f"{base}/extract-entities.md",
                    "provider_hint": "openrouter",
                    "output": {
                        "path": "artifacts/generated/{{ input.slug }}-entities.md",
                        "artifact_id": "generated/{{ input.slug }}-entities.md",
                        "kind": "generated",
                        "title": "{{ input.title }} Entity Bundle",
                    },
                },
                {
                    "id": "split-entities",
                    "kind": "split_entities",
                    "input": "source",
                    "template": "",
                    "static_macros": {"bundle_stage": "extract-entities"},
                },
            ],
        },
        {
            "id": "generic-source-relations",
            "description": "Extract relation artifacts from source chunks.",
            "inputs": {"source": {"kind": "source"}},
            "static_macros": {"profile": profile},
            "stages": [
                {
                    "id": "extract-relations",
                    "kind": "assisted",
                    "input": "source",
                    "template": f"{base}/extract-relations.md",
                    "provider_hint": "openrouter",
                    "output": {
                        "path": "artifacts/relations/{{ input.slug }}-relations.md",
                        "artifact_id": "relation/{{ input.slug }}-relations.md",
                        "kind": "relation",
                        "title": "{{ input.title }} Relations",
                    },
                }
            ],
        },
        {
            "id": "generic-source-evaluations",
            "description": "Evaluate generated entities with the profile rubric.",
            "inputs": {"entity": {"kind": "entity"}},
            "static_macros": {"profile": profile},
            "stages": [
                {
                    "id": "evaluate-entity",
                    "kind": "assisted",
                    "input": "entity",
                    "template": f"{base}/evaluate-entity.md",
                    "provider_hint": "openrouter",
                    "output": {
                        "path": "output/evaluations/{{ input.slug }}.md",
                        "artifact_id": "generated/evaluation-{{ input.slug }}.md",
                        "kind": "generated",
                        "title": "{{ input.title }} Evaluation",
                    },
                }
            ],
        },
    ]


def _record_metrics(root: Path) -> Any:
    infospace = load_infospace(root)
    return record_check_results(
        infospace.root,
        run_collection_checks(infospace.artifacts),
        artifact_evaluations=read_entity_evaluations(infospace.root / "output" / "evaluations"),
        schema_name="generic-source",
        metadata={"generator": "generic-source"},
    )


def _write_generation_report(root: Path, metrics: dict[str, Any], snapshot_id: str) -> None:
    status = status_generation(root)
    text = "\n".join(
        [
            "# Generation Report",
            "",
            f"Snapshot: {snapshot_id}",
            f"Sources: {status['source_chunk_count']}",
            f"Entities: {status['entity_count']}",
            f"Relations: {status['relation_count']}",
            f"Evaluations: {status['evaluation_count']}",
            "",
            "## Metrics",
            "",
            *[f"- {name}: {value}" for name, value in sorted(metrics.items())],
            "",
        ]
    )
    path = root / "reports" / "generation-summary.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    register_artifact(
        root,
        artifact_id="generated/generation-summary.md",
        path=path,
        kind="generated",
        title="Generation Summary",
        provenance={"workflow_id": "generic-source-generator", "snapshot_id": snapshot_id},
    )


def _workflow_ids_for_stage(stage: str) -> list[str]:
    normalized = stage.strip().lower()
    if normalized == "intake":
        return []
    if normalized == "metrics":
        return []
    if normalized not in WORKFLOW_BY_STAGE:
        raise InfospaceError(
            "invalid_generation_stage",
            f"Unsupported generation stage: {stage}",
            {
                "stage": stage,
                "valid_stages": sorted([*WORKFLOW_BY_STAGE, "intake", "metrics"]),
            },
        )
    return WORKFLOW_BY_STAGE[normalized]


def _source_state(root: Path) -> dict[str, Any]:
    infospace = load_infospace(root)
    return {
        item.id: {
            "path": item.path,
            "digest": item.provenance.get("digest", ""),
            "title": item.title,
            "source_type": item.provenance.get("source_type", ""),
            "chunk_id": item.provenance.get("chunk_id", ""),
        }
        for item in infospace.artifacts
        if item.kind == "source"
    }


def _stale_source_ids(root: Path) -> list[str]:
    infospace = load_infospace(root)
    stale: list[str] = []
    for item in infospace.artifacts:
        if item.kind != "source":
            continue
        path = infospace.root / item.path
        expected = str(item.provenance.get("digest") or "")
        if not path.is_file() or (expected and _digest_text(path.read_text(encoding="utf-8")) != expected):
            stale.append(item.id)
    return stale


def _mark_workflow_completed(
    state: dict[str, Any],
    result: WorkflowRunResult,
) -> dict[str, Any]:
    stage_status = dict(state.get("stage_status") or {})
    stage_status[result.workflow_id] = {
        "status": result.status,
        "run_id": result.run_id,
        "output_artifact_ids": [output.artifact_id for output in result.outputs],
        "updated_at": _now(),
    }
    return {**state, "stage_status": stage_status}


def _profile_digest(root: Path, profile: str) -> str:
    files: list[Path] = []
    for base in (
        root / "profiles" / profile,
        root / "workflows" / "templates" / profile,
    ):
        if base.is_dir():
            files.extend(path for path in sorted(base.rglob("*")) if path.is_file())
    hasher = hashlib.sha256()
    for path in files:
        hasher.update(str(path.relative_to(root)).encode("utf-8"))
        hasher.update(path.read_bytes())
    return hasher.hexdigest()


def _read_state(root: Path) -> dict[str, Any]:
    path = root / STATE_PATH
    if not path.is_file():
        return {}
    data = yaml.safe_load(path.read_text(encoding="utf-8"))
    return data if isinstance(data, dict) else {}


def _write_state(root: Path, state: dict[str, Any]) -> None:
    path = root / STATE_PATH
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(yaml.safe_dump(state, sort_keys=False), encoding="utf-8")


def _digest_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()
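
`--stage` accepts several aliases, and two stage names run no workflows at all. A minimal sketch of the lookup, copying the `WORKFLOW_BY_STAGE` table above and raising a plain `ValueError` in place of `InfospaceError`; `records_metrics` is a hypothetical helper that mirrors which stages `run_generation` treats as finishing the run (recording metrics, writing the report, and setting `completed`):

```python
from __future__ import annotations

WORKFLOW_BY_STAGE = {
    "summary": ["generic-source-summary"],
    "summarize": ["generic-source-summary"],
    "extract": ["generic-source-entities"],
    "entities": ["generic-source-entities"],
    "relations": ["generic-source-relations"],
    "evaluate": ["generic-source-evaluations"],
    "evaluation": ["generic-source-evaluations"],
    "all": [
        "generic-source-summary",
        "generic-source-entities",
        "generic-source-relations",
        "generic-source-evaluations",
    ],
}
NO_WORKFLOW_STAGES = {"intake", "metrics"}


def workflow_ids_for_stage(stage: str) -> list[str]:
    # Case- and whitespace-insensitive, like `_workflow_ids_for_stage`.
    normalized = stage.strip().lower()
    if normalized in NO_WORKFLOW_STAGES:
        return []
    if normalized not in WORKFLOW_BY_STAGE:
        valid = sorted([*WORKFLOW_BY_STAGE, *NO_WORKFLOW_STAGES])
        raise ValueError(f"Unsupported generation stage: {stage} (valid: {', '.join(valid)})")
    return WORKFLOW_BY_STAGE[normalized]


def records_metrics(stage: str) -> bool:
    # Only these stages record metrics, write the report, and mark the run completed.
    return stage.strip().lower() in {"all", "metrics"}


for stage in ["all", " Summarize ", "relations", "metrics"]:
    print(repr(stage), workflow_ids_for_stage(stage), records_metrics(stage))
```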
142
src/infospace_bench/openrouter.py
Normal file
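
The adapter in this file retries transport failures with a capped exponential backoff: after the Nth retry is scheduled it sleeps `min(2**N, 8)` seconds, and it gives up once `retry_limit` retries have been used, so the default `retry_limit=2` allows three attempts. An `InfospaceError` such as an empty assistant response is re-raised immediately and not retried. A small sketch that reproduces only the delay schedule (`backoff_delays` is hypothetical and sends no requests):

```python
def backoff_delays(retry_limit: int, cap: int = 8) -> list[int]:
    # Same loop shape as the adapter: stop once retry_count reaches retry_limit,
    # otherwise increment it and sleep min(2**retry_count, cap) seconds.
    delays: list[int] = []
    retry_count = 0
    while retry_count < retry_limit:
        retry_count += 1
        delays.append(min(2**retry_count, cap))
    return delays


print(backoff_delays(2))  # [2, 4]: 3 attempts, 6 s of sleep in total before failing
print(backoff_delays(5))  # [2, 4, 8, 8, 8]
```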
@@ -0,0 +1,142 @@
from __future__ import annotations

import json
import os
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Any, Callable

from .errors import InfospaceError
from .workflow import AssistedGenerationRequest, AssistedGenerationResult

OPENROUTER_ENDPOINT = "https://openrouter.ai/api/v1/chat/completions"
Transport = Callable[[dict[str, Any], dict[str, str], str], dict[str, Any]]


@dataclass(frozen=True)
class OpenRouterAssistedGenerationAdapter:
    model: str
    api_key: str = ""
    endpoint: str = OPENROUTER_ENDPOINT
    transport: Transport | None = None
    retry_limit: int = 2
    timeout_seconds: float = 60.0

    def __post_init__(self) -> None:
        key = self.api_key or os.environ.get("OPENROUTER_API_KEY", "")
        if not key:
            raise InfospaceError(
                "missing_openrouter_api_key",
                "OPENROUTER_API_KEY is required for the OpenRouter provider",
                {"env": "OPENROUTER_API_KEY"},
            )
        object.__setattr__(self, "api_key", key)
        if not self.model:
            raise InfospaceError(
                "missing_openrouter_model",
                "OpenRouter provider requires an explicit model",
                {"option": "--model"},
            )

    def generate(
        self,
        request: AssistedGenerationRequest,
    ) -> AssistedGenerationResult:
        payload = {
            "model": self.model,
            "messages": [
                {
                    "role": "system",
                    "content": (
                        "Return concise, valid Markdown only. Preserve explicit "
                        "contracts requested in the user prompt."
                    ),
                },
                {"role": "user", "content": request.prompt},
            ],
            "metadata": {
                "workflow_id": request.workflow_id,
                "stage_id": request.stage_id,
                "input_artifact_id": request.input_artifact_id,
            },
        }
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
            "HTTP-Referer": "https://github.com/markitect/infospace-bench",
            "X-Title": "infospace-bench",
        }
        started = time.monotonic()
        retry_count = 0
        last_error = ""
        while True:
            try:
                response = (
                    self.transport(payload, headers, self.endpoint)
                    if self.transport is not None
                    else self._default_transport(payload, headers, self.endpoint)
                )
                choice = (response.get("choices") or [{}])[0]
                message = choice.get("message") or {}
                markdown = str(message.get("content") or "")
                if not markdown:
                    raise InfospaceError(
                        "empty_openrouter_response",
                        "OpenRouter returned an empty assistant response",
                        {"model": self.model, "response_id": response.get("id")},
                    )
                return AssistedGenerationResult(
                    markdown=markdown,
                    provider="openrouter",
                    metadata={
                        "model": self.model,
                        "request_id": str(response.get("id") or ""),
                        "usage": response.get("usage") or {},
                        "retry_count": retry_count,
                        "duration_seconds": round(time.monotonic() - started, 3),
                    },
                )
            except (urllib.error.HTTPError, urllib.error.URLError, TimeoutError) as exc:
                last_error = str(exc)
            except InfospaceError:
                raise
            except Exception as exc:  # pragma: no cover - defensive provider boundary
                last_error = str(exc)

            if retry_count >= self.retry_limit:
                raise InfospaceError(
                    "openrouter_request_failed",
                    "OpenRouter request failed after bounded retries",
                    {
                        "model": self.model,
                        "retry_count": retry_count,
                        "error": last_error,
                    },
                )
            retry_count += 1
            time.sleep(min(2**retry_count, 8))

    def _default_transport(
        self,
        payload: dict[str, Any],
        headers: dict[str, str],
        endpoint: str,
    ) -> dict[str, Any]:
        request = urllib.request.Request(
            endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers=headers,
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=self.timeout_seconds) as response:
            data = response.read().decode("utf-8")
        parsed = json.loads(data)
        if not isinstance(parsed, dict):
            raise InfospaceError(
                "invalid_openrouter_response",
                "OpenRouter returned a non-object JSON response",
|
||||
{"model": self.model},
|
||||
)
|
||||
return parsed
|
||||
@@ -0,0 +1,9 @@
# Entity Contract

Each generated entity must be a Markdown artifact with:

- one top-level heading containing the entity title
- a `## Definition` section
- optional `## Context`, `## Source Evidence`, and `## Review Notes` sections

Entity titles should be stable, short, and reusable across source chunks.
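
The entity contract is simple enough to check mechanically. A minimal sketch, assuming a bundle is split on top-level `# ` headings in the way the extract-entities prompt asks the model to format it; `split_entity_bundle` and `contract_problems` are hypothetical helpers, not functions from this commit, and the sketch does not skip `#` lines inside fenced code:

```python
import re

# Hypothetical helpers: the generator's own split/validation code is not shown in this diff.
H1_RE = re.compile(r"(?m)^# (?P<title>.+?)\s*$")
DEFINITION_RE = re.compile(r"(?m)^## Definition\s*$")


def split_entity_bundle(markdown: str) -> list[tuple[str, str]]:
    """Split a Markdown bundle into (title, body) pairs, one per top-level heading."""
    matches = list(H1_RE.finditer(markdown))
    entities = []
    for i, match in enumerate(matches):
        # Each entity runs from its own "# " heading to the next one (or the end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        entities.append((match.group("title"), markdown[match.start():end]))
    return entities


def contract_problems(title: str, body: str) -> list[str]:
    """Return entity-contract violations; an empty list means the entity passes."""
    problems = []
    if not title.strip():
        problems.append("entity has an empty title")
    if not DEFINITION_RE.search(body):
        problems.append(f"{title}: missing '## Definition' section")
    return problems


bundle = (
    "# Knowledge Artifact\n\n## Definition\n\nA durable unit of knowledge.\n\n"
    "# Source Claim\n\n## Context\n\nNo definition yet.\n"
)
for title, body in split_entity_bundle(bundle):
    print(title, contract_problems(title, body))
```

Here the first entity passes and the second is reported as missing its `## Definition` section. Optional sections such as `## Context` are not checked.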
@@ -0,0 +1,10 @@
# Evaluation Contract

Each evaluation must be Markdown with YAML frontmatter containing:

- `artifact_id`
- `evaluator`
- `evaluated_at`
- `scores`

Scores should include groundedness and usefulness on a 0 to 5 scale.
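
A sketch of checking an evaluation against this contract. It assumes the frontmatter has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, which the test suite uses) and that `scores` is a list of `name`/`value`/`max_value` mappings, as in the `evaluate-entity` test fixture. The contract itself does not fix the shape of `scores`, and `evaluation_problems` is a hypothetical name:

```python
# Assumed shapes: the frontmatter is already parsed into a dict, and "scores" is a
# list of {"name", "value", "max_value"} mappings like the evaluate-entity fixture.
REQUIRED_KEYS = ("artifact_id", "evaluator", "evaluated_at", "scores")
REQUIRED_SCORES = ("groundedness", "usefulness")


def evaluation_problems(frontmatter: dict) -> list[str]:
    """Return evaluation-contract violations; an empty list means it passes."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in frontmatter]
    scores = {
        item.get("name"): item
        for item in frontmatter.get("scores") or []
        if isinstance(item, dict)
    }
    for name in REQUIRED_SCORES:
        if name not in scores:
            problems.append(f"missing score: {name}")
            continue
        value = scores[name].get("value")
        # The contract fixes a 0 to 5 scale for these two scores.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or not 0 <= value <= 5:
            problems.append(f"{name} not on the 0 to 5 scale: {value!r}")
    return problems


fixture = {
    "artifact_id": "entity/knowledge-artifact.md",
    "evaluator": "fixture",
    "evaluated_at": "2026-05-14T00:00:00",
    "scores": [
        {"name": "groundedness", "value": 4.0, "max_value": 5.0},
        {"name": "usefulness", "value": 4.0, "max_value": 5.0},
    ],
}
print(evaluation_problems(fixture))
```

Extra scores such as `clarity` or `provenance` (listed in the profile's `evaluation_criteria`) are accepted but not required by this check.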
@@ -0,0 +1,11 @@
# Relation Contract

Each generated relation must be a Markdown artifact with:

- one top-level heading containing the relation title
- `## Subject`
- `## Predicate`
- `## Object`
- optional `## Relation Type`, `## Evidence`, and `## Feedback Role`

Subject and object values should match generated entity titles whenever possible.
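
A minimal sketch of reading a relation artifact and comparing its subject and object with the generated entity titles. The helper names are hypothetical. Because the contract only asks for matching titles "whenever possible", a caller would probably treat title mismatches as review warnings rather than hard failures:

```python
def relation_sections(markdown: str) -> dict[str, str]:
    """Collect the text under each '## ' heading of one relation artifact."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("# "):
            current = None  # the relation title is not a section
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def relation_problems(markdown: str, entity_titles: set[str]) -> list[str]:
    """Report missing required sections and subject/object names with no matching entity."""
    sections = relation_sections(markdown)
    problems = [
        f"missing ## {name}"
        for name in ("Subject", "Predicate", "Object")
        if not sections.get(name)
    ]
    for role in ("Subject", "Object"):
        value = sections.get(role)
        if value and value not in entity_titles:
            problems.append(f"{role} {value!r} matches no generated entity title")
    return problems


relation = (
    "# Knowledge Artifact Supports Source Claim\n\n"
    "## Subject\n\nKnowledge Artifact\n\n"
    "## Predicate\n\nsupports\n\n"
    "## Object\n\nSource Claim\n"
)
print(relation_problems(relation, {"Knowledge Artifact", "Source Claim"}))
```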
@@ -0,0 +1,7 @@
# Summary Contract

Each source summary should preserve:

- the core claims or concepts
- evidence phrases useful for later review
- unresolved ambiguities or extraction risks
14
src/infospace_bench/profiles/general-knowledge/profile.yaml
Normal file
@@ -0,0 +1,14 @@
id: general-knowledge
name: General Knowledge
description: Generic infospace generation profile for local articles, ebooks, and knowledge collections.
terminology:
  source_chunk: Normalized source artifact
  entity: Durable concept, claim, method, person, place, work, or object
  relation: Typed link between two generated entities
granularity:
  default: Extract entities that can stand alone as useful review artifacts.
evaluation_criteria:
  - groundedness
  - usefulness
  - clarity
  - provenance
@@ -0,0 +1,14 @@
# Evaluate Entity

Profile: {{ macros.profile }}

Evaluate the generated entity as Markdown with YAML frontmatter. Include
`artifact_id`, `evaluator`, `evaluated_at`, and scores for groundedness and
usefulness on a 0 to 5 scale.

Entity artifact: {{ input.artifact_id }}
Entity title: {{ input.title }}

## Entity

{{ input.content }}
@@ -0,0 +1,15 @@
# Extract Entities

Profile: {{ macros.profile }}

Extract reusable infospace entities from the source chunk. Return one Markdown
bundle where each entity starts with `# Entity Title` and contains at least a
`## Definition` section. Prefer durable concepts, claims, named methods,
people, places, works, and objects over sentence fragments.

Source title: {{ input.title }}
Source artifact: {{ input.artifact_id }}

## Source

{{ input.content }}
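
The prompt templates use `{{ ... }}` placeholders such as `{{ macros.profile }}` and `{{ input.title }}`. The engine that fills them is not part of this diff. The sketch below is a hypothetical stdlib-only substitution that resolves dotted names against nested dicts and raises `KeyError` for an unknown name; the real generator may use Jinja2 or its own renderer instead:

```python
import re

# Matches "{{ name }}" or "{{ dotted.name }}" with optional inner whitespace.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render_prompt(template: str, context: dict) -> str:
    """Fill dotted placeholders from nested dicts; unknown names raise KeyError."""

    def resolve(match: re.Match) -> str:
        value = context
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)

    return PLACEHOLDER_RE.sub(resolve, template)


template = "Profile: {{ macros.profile }}\n\nSource title: {{ input.title }}\n"
context = {
    "macros": {"profile": "general-knowledge"},
    "input": {"title": "Reusable Knowledge"},
}
print(render_prompt(template, context))
```

Failing loudly on a missing key keeps a typo in a template from silently sending an unfilled placeholder to the provider.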
@@ -0,0 +1,14 @@
# Extract Relations

Profile: {{ macros.profile }}

Extract a small set of important relations from the source chunk. Return one
Markdown relation artifact with sections `## Subject`, `## Predicate`, and
`## Object`. Use entity-style names for subject and object.

Source title: {{ input.title }}
Source artifact: {{ input.artifact_id }}

## Source

{{ input.content }}
@@ -0,0 +1,13 @@
# Summarize Source Chunk

Profile: {{ macros.profile }}

Summarize the source chunk as Markdown. Preserve concrete claims, named concepts,
and evidence phrases that should guide later entity and relation extraction.

Source title: {{ input.title }}
Source artifact: {{ input.artifact_id }}

## Source

{{ input.content }}
@@ -0,0 +1,6 @@
# Synthesize Collection Report

Profile: {{ macros.profile }}

Synthesize a concise report from generated source summaries, entities,
relations, evaluations, and collection metrics.
273
src/infospace_bench/source_intake.py
Normal file
@@ -0,0 +1,273 @@
from __future__ import annotations

import hashlib
import html
import re
import zipfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable

from .errors import InfospaceError
from .semantics import slugify

EXTRACTOR_VERSION = "generic-source-intake-v1"
SUPPORTED_EXTENSIONS = {".md", ".markdown", ".txt", ".html", ".htm", ".epub"}
HTML_TITLE_RE = re.compile(r"<title[^>]*>(?P<title>.*?)</title>", re.I | re.S)
HTML_H1_RE = re.compile(r"<h1[^>]*>(?P<title>.*?)</h1>", re.I | re.S)
SCRIPT_STYLE_RE = re.compile(r"<(script|style)[^>]*>.*?</\1>", re.I | re.S)
TAG_RE = re.compile(r"<[^>]+>")


@dataclass(frozen=True)
class SourceChunk:
    chunk_id: str
    title: str
    markdown: str
    source_type: str
    original_path: str
    digest: str
    chunk_index: int
    chunk_count: int
    imported_at: str
    extractor_version: str = EXTRACTOR_VERSION

    def to_dict(self) -> dict:
        return asdict(self)


@dataclass(frozen=True)
class _SourceDocument:
    title: str
    markdown: str
    source_type: str
    original_path: str
    base_slug: str


def normalize_source(
    source: str | Path,
    *,
    max_words: int = 800,
    max_chunks: int | None = None,
) -> list[SourceChunk]:
    source_path = Path(source)
    if not source_path.exists():
        raise InfospaceError(
            "missing_source",
            f"Source path does not exist: {source_path}",
            {"source": str(source_path)},
        )
    documents = list(_iter_documents(source_path))
    if not documents:
        raise InfospaceError(
            "unsupported_source",
            f"No supported source documents found: {source_path}",
            {
                "source": str(source_path),
                "supported_extensions": sorted(SUPPORTED_EXTENSIONS),
            },
        )
    imported_at = datetime.now(timezone.utc).isoformat()
    chunks: list[SourceChunk] = []
    used_ids: set[str] = set()
    for document in documents:
        pieces = _chunk_markdown(document.markdown, max_words=max_words)
        for index, piece in enumerate(pieces):
            title = document.title if len(pieces) == 1 else f"{document.title} Part {index + 1}"
            base_id = (
                document.base_slug if len(pieces) == 1 else f"{document.base_slug}-part-{index + 1:03d}"
            )
            chunk_id = _dedupe_chunk_id(base_id, used_ids)
            chunks.append(
                SourceChunk(
                    chunk_id=chunk_id,
                    title=title,
                    markdown=piece,
                    source_type=document.source_type,
                    original_path=document.original_path,
                    digest=_digest_text(piece),
                    chunk_index=index,
                    chunk_count=len(pieces),
                    imported_at=imported_at,
                )
            )
            if max_chunks is not None and max_chunks > 0 and len(chunks) >= max_chunks:
                return chunks
    return chunks


def _iter_documents(source_path: Path) -> Iterable[_SourceDocument]:
    if source_path.is_dir():
        for path in sorted(source_path.rglob("*")):
            if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
                yield from _iter_documents(path)
        return

    suffix = source_path.suffix.lower()
    if suffix in (".md", ".markdown"):
        yield _markdown_document(source_path)
    elif suffix == ".txt":
        yield _text_document(source_path)
    elif suffix in (".html", ".htm"):
        yield _html_document(source_path, source_type="html")
    elif suffix == ".epub":
        yield from _epub_documents(source_path)


def _markdown_document(path: Path) -> _SourceDocument:
    markdown = _normalize_newlines(path.read_text(encoding="utf-8")).strip() + "\n"
    title = _markdown_title(markdown) or _title_from_path(path)
    return _SourceDocument(
        title=title,
        markdown=_ensure_h1(markdown, title),
        source_type="markdown",
        original_path=str(path),
        base_slug=slugify(title) or slugify(path.stem) or "source",
    )


def _text_document(path: Path) -> _SourceDocument:
    title = _title_from_path(path)
    body = _normalize_newlines(path.read_text(encoding="utf-8")).strip()
    markdown = f"# {title}\n\n{body}\n"
    return _SourceDocument(
        title=title,
        markdown=markdown,
        source_type="text",
        original_path=str(path),
        base_slug=slugify(title) or "source",
    )


def _html_document(
    path: Path,
    *,
    source_type: str,
    original_path: str | None = None,
    text: str | None = None,
) -> _SourceDocument:
    raw = text if text is not None else path.read_text(encoding="utf-8")
    title = _html_title(raw) or _title_from_path(path)
    body = _html_to_text(raw)
    if body.lower().startswith(title.lower()):
        body = body[len(title) :].strip()
    markdown = f"# {title}\n\n{body}\n"
    return _SourceDocument(
        title=title,
        markdown=markdown,
        source_type=source_type,
        original_path=original_path or str(path),
        base_slug=slugify(title) or slugify(path.stem) or "source",
    )


def _epub_documents(path: Path) -> Iterable[_SourceDocument]:
    try:
        with zipfile.ZipFile(path) as archive:
            names = [
                name
                for name in sorted(archive.namelist())
                if Path(name).suffix.lower() in {".html", ".htm", ".xhtml", ".txt", ".md"}
                and not name.endswith("/")
            ]
            for name in names:
                raw = archive.read(name).decode("utf-8", errors="replace")
                pseudo_path = Path(name)
                if pseudo_path.suffix.lower() in {".txt", ".md"}:
                    title = _markdown_title(raw) or _title_from_path(pseudo_path)
                    markdown = _ensure_h1(_normalize_newlines(raw).strip() + "\n", title)
                    yield _SourceDocument(
                        title=title,
                        markdown=markdown,
                        source_type="epub",
                        original_path=f"{path}!{name}",
                        base_slug=slugify(title) or slugify(pseudo_path.stem) or "source",
                    )
                else:
                    yield _html_document(
                        pseudo_path,
                        source_type="epub",
                        original_path=f"{path}!{name}",
                        text=raw,
                    )
    except zipfile.BadZipFile as exc:
        raise InfospaceError(
            "invalid_epub_source",
            f"EPUB source is not a readable zip archive: {path}",
            {"source": str(path)},
        ) from exc


def _chunk_markdown(markdown: str, *, max_words: int) -> list[str]:
    text = markdown.strip()
    if max_words <= 0:
        return [text + "\n"]
    words = text.split()
    if len(words) <= max_words:
        return [text + "\n"]
    chunks: list[str] = []
    heading = _markdown_title(text) or "Source"
    body_words = re.sub(r"(?m)^# .+?\n+", "", text, count=1).split()
    for start in range(0, len(body_words), max_words):
        part = " ".join(body_words[start : start + max_words]).strip()
        chunks.append(f"# {heading} Part {len(chunks) + 1}\n\n{part}\n")
    return chunks


def _html_title(raw: str) -> str:
    match = HTML_TITLE_RE.search(raw) or HTML_H1_RE.search(raw)
    if not match:
        return ""
    return _collapse_ws(_html_to_text(match.group("title")))


def _html_to_text(raw: str) -> str:
    cleaned = SCRIPT_STYLE_RE.sub(" ", raw)
    cleaned = re.sub(r"</(p|div|section|article|h[1-6]|li)>", "\n", cleaned, flags=re.I)
    cleaned = TAG_RE.sub(" ", cleaned)
    cleaned = html.unescape(cleaned)
    lines = [_collapse_ws(line) for line in cleaned.splitlines()]
    return "\n\n".join(line for line in lines if line).strip()


def _ensure_h1(markdown: str, title: str) -> str:
    if re.search(r"(?m)^#\s+\S", markdown):
        return markdown
    return f"# {title}\n\n{markdown.strip()}\n"


def _markdown_title(markdown: str) -> str:
    match = re.search(r"(?m)^#\s+(?P<title>.+?)\s*$", markdown)
    return match.group("title").strip() if match else ""


def _title_from_path(path: Path) -> str:
    words = re.sub(r"[^A-Za-z0-9]+", " ", path.stem).strip()
    return words.title() if words else "Source"


def _dedupe_chunk_id(base_id: str, used_ids: set[str]) -> str:
    candidate = base_id or "source"
    if candidate not in used_ids:
        used_ids.add(candidate)
        return candidate
    index = 2
    while f"{candidate}-{index}" in used_ids:
        index += 1
    deduped = f"{candidate}-{index}"
    used_ids.add(deduped)
    return deduped


def _digest_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _collapse_ws(value: str) -> str:
    return re.sub(r"\s+", " ", value).strip()


def _normalize_newlines(value: str) -> str:
    return value.replace("\r\n", "\n").replace("\r", "\n")
@@ -273,10 +273,12 @@ class WorkflowStageRecord:
     input_artifact_id: str
     output_artifact_id: str = ""
     message: str = ""
+    provider: str = ""
+    metadata: dict[str, Any] = field(default_factory=dict)
 
     def to_dict(self) -> dict[str, Any]:
         data = asdict(self)
-        return {key: value for key, value in data.items() if value != ""}
+        return {key: value for key, value in data.items() if value not in ("", {}, [])}
 
 
 @dataclass(frozen=True)
@@ -442,6 +444,7 @@ def _execute_workflow(
                     infospace.root,
                     dry_run=False,
                     provider=result.provider,
+                    provider_metadata=result.metadata,
                 )
                 outputs.append(output)
                 stage_outputs[stage.id] = {
@@ -458,6 +461,8 @@ def _execute_workflow(
                         status="completed",
                         input_artifact_id=input_record.artifact_id,
                         output_artifact_id=output.artifact_id,
+                        provider=result.provider,
+                        metadata=result.metadata,
                     )
                 )
         elif stage.kind == "split_entities":
@@ -645,6 +650,7 @@ def _resolve_output(
     *,
     dry_run: bool,
     provider: str = "",
+    provider_metadata: dict[str, Any] | None = None,
 ) -> WorkflowOutputRecord:
     if stage.output is None:
         raise InfospaceError(
@@ -673,6 +679,11 @@ def _resolve_output(
             "stage_id": stage.id,
             "input_artifact_id": input_record.artifact_id,
             **({"provider": provider} if provider else {}),
+            **(
+                {"provider_metadata": provider_metadata}
+                if provider_metadata
+                else {}
+            ),
         },
         relationships=[
             {
301
tests/test_generic_generator.py
Normal file
@@ -0,0 +1,301 @@
import json
import os
import subprocess
import sys
import zipfile
from pathlib import Path

import yaml

from infospace_bench.generator import (
    init_generation_infospace,
    run_generation,
    status_generation,
)
from infospace_bench.openrouter import OpenRouterAssistedGenerationAdapter
from infospace_bench.source_intake import normalize_source


def cli_env() -> dict[str, str]:
    env = os.environ.copy()
    env["PYTHONPATH"] = "src:/home/worsch/markitect-tool/src"
    return env


def fixture_responses(path: Path) -> None:
    data = {
        "responses": [
            {
                "stage_id": "summarize-source",
                "input_artifact_id": "*",
                "markdown": "# Source Summary\n\nThe source describes reusable knowledge work.\n",
            },
            {
                "stage_id": "extract-entities",
                "input_artifact_id": "*",
                "markdown": (
                    "# Knowledge Artifact\n\n"
                    "## Definition\n\n"
                    "A durable unit of structured knowledge derived from a source.\n\n"
                    "## Context\n\n"
                    "Generated from a generic source workflow.\n\n"
                    "# Source Claim\n\n"
                    "## Definition\n\n"
                    "A claim preserved from the source for later review.\n\n"
                    "## Context\n\n"
                    "Used to keep provenance visible.\n"
                ),
            },
            {
                "stage_id": "extract-relations",
                "input_artifact_id": "*",
                "markdown": (
                    "# Knowledge Artifact Supports Source Claim\n\n"
                    "## Subject\n\n"
                    "Knowledge Artifact\n\n"
                    "## Predicate\n\n"
                    "supports\n\n"
                    "## Object\n\n"
                    "Source Claim\n\n"
                    "## Relation Type\n\n"
                    "support\n\n"
                    "## Evidence\n\n"
                    "The source links durable artifacts to explicit claims.\n"
                ),
            },
            {
                "stage_id": "evaluate-entity",
                "input_artifact_id": "*",
                "markdown": (
                    "---\n"
                    "artifact_id: entity/knowledge-artifact.md\n"
                    "evaluator: fixture\n"
                    "evaluated_at: '2026-05-14T00:00:00'\n"
                    "scores:\n"
                    "  - name: groundedness\n"
                    "    value: 4.0\n"
                    "    max_value: 5.0\n"
                    "  - name: usefulness\n"
                    "    value: 4.0\n"
                    "    max_value: 5.0\n"
                    "---\n"
                    "\n"
                    "# Evaluation: entity/knowledge-artifact.md\n"
                ),
            },
        ]
    }
    path.write_text(yaml.safe_dump(data, sort_keys=False), encoding="utf-8")


def write_epub_fixture(path: Path) -> None:
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("OEBPS/chapter1.xhtml", "<h1>Chapter One</h1><p>Alpha beta.</p>")
        archive.writestr("OEBPS/chapter2.xhtml", "<h1>Chapter Two</h1><p>Gamma delta.</p>")


def test_source_intake_accepts_article_ebook_and_folder(tmp_path: Path) -> None:
    article = tmp_path / "article.html"
    article.write_text(
        "<html><head><title>Article Title</title></head>"
        "<body><h1>Article Title</h1><p>One two three.</p></body></html>",
        encoding="utf-8",
    )
    ebook = tmp_path / "book.epub"
    write_epub_fixture(ebook)
    folder = tmp_path / "collection"
    folder.mkdir()
    (folder / "note.md").write_text("# Note\n\nMarkdown source.", encoding="utf-8")
    (folder / "memo.txt").write_text("Plain text source.", encoding="utf-8")

    article_chunks = normalize_source(article)
    ebook_chunks = normalize_source(ebook)
    folder_chunks = normalize_source(folder)

    assert article_chunks[0].source_type == "html"
    assert article_chunks[0].title == "Article Title"
    assert article_chunks[0].chunk_id == "article-title"
    assert article_chunks[0].digest == normalize_source(article)[0].digest
    assert [chunk.source_type for chunk in ebook_chunks] == ["epub", "epub"]
    assert {chunk.source_type for chunk in folder_chunks} == {"markdown", "text"}
    assert all(chunk.markdown.startswith("# ") for chunk in folder_chunks)


def test_generate_from_source_cli_fixture_builds_infospace(tmp_path: Path) -> None:
    source = tmp_path / "article.md"
    source.write_text(
        "# Reusable Knowledge\n\nA source about claims and durable artifacts.",
        encoding="utf-8",
    )
    fixture = tmp_path / "responses.yaml"
    fixture_responses(fixture)

    result = subprocess.run(
        [
            sys.executable,
            "-m",
            "infospace_bench",
            "generate",
            "from-source",
            str(source),
            "--workspace",
            str(tmp_path),
            "--slug",
            "article-space",
            "--name",
            "Article Space",
            "--fixture-responses",
            str(fixture),
            "--apply",
        ],
        check=False,
        env=cli_env(),
        text=True,
        capture_output=True,
    )
    assert result.returncode == 0, result.stderr
    payload = json.loads(result.stdout)
    root = Path(payload["root"])
    status = subprocess.run(
        [
            sys.executable,
            "-m",
            "infospace_bench",
            "generate",
            "status",
            str(root),
        ],
        check=False,
        env=cli_env(),
        text=True,
        capture_output=True,
    )
    assert status.returncode == 0, status.stderr
    status_payload = json.loads(status.stdout)

    assert payload["status"] == "completed"
    assert (root / "artifacts" / "sources" / "reusable-knowledge.md").is_file()
    assert (root / "artifacts" / "entities" / "knowledge-artifact.md").is_file()
    assert (root / "artifacts" / "relations" / "reusable-knowledge-relations.md").is_file()
    assert (root / "output" / "metrics" / "metrics.yaml").is_file()
    assert status_payload["source_chunk_count"] == 1
    assert status_payload["entity_count"] == 2
    assert status_payload["relation_count"] == 1
    assert status_payload["stale"] is False


def test_generate_from_ebook_and_folder_fixtures(tmp_path: Path) -> None:
    fixture = tmp_path / "responses.yaml"
    fixture_responses(fixture)
    ebook = tmp_path / "book.epub"
    write_epub_fixture(ebook)
    folder = tmp_path / "folder"
    folder.mkdir()
    (folder / "first.md").write_text("# First\n\nOne source.", encoding="utf-8")
    (folder / "second.txt").write_text("Second source.", encoding="utf-8")

    for source, slug, expected_sources in (
        (ebook, "book-space", 2),
        (folder, "folder-space", 2),
    ):
        result = subprocess.run(
            [
                sys.executable,
                "-m",
                "infospace_bench",
                "generate",
                "from-source",
                str(source),
                "--workspace",
                str(tmp_path),
                "--slug",
                slug,
                "--name",
                slug.replace("-", " ").title(),
                "--fixture-responses",
                str(fixture),
                "--apply",
            ],
            check=False,
            env=cli_env(),
            text=True,
            capture_output=True,
        )
        assert result.returncode == 0, result.stderr
        payload = json.loads(result.stdout)
        status = status_generation(Path(payload["root"]))
        assert status["source_chunk_count"] == expected_sources
        assert status["entity_count"] == 2
        assert status["relation_count"] == expected_sources
        assert status["history_snapshot_count"] == 1


def test_generator_resume_is_idempotent_and_detects_stale_source(tmp_path: Path) -> None:
    source = tmp_path / "note.md"
    source.write_text("# Note\n\nInitial source.", encoding="utf-8")
    fixture = tmp_path / "responses.yaml"
    fixture_responses(fixture)
    root = init_generation_infospace(tmp_path, source, "note-space", name="Note Space").root

    first = run_generation(root, fixture_responses=fixture)
    second = run_generation(root, fixture_responses=fixture, resume=True)
    generated_source = root / "artifacts" / "sources" / "note.md"
    generated_source.write_text("# Note\n\nChanged source.", encoding="utf-8")
    stale_status = status_generation(root)

    assert first.status == "completed"
    assert second.status == "skipped"
    assert second.skipped is True
    assert stale_status["stale"] is True
    assert stale_status["stale_sources"] == ["source/note.md"]


def test_openrouter_adapter_uses_model_and_records_metadata() -> None:
    requests: list[dict] = []

    def transport(payload: dict, headers: dict[str, str], endpoint: str) -> dict:
        requests.append({"payload": payload, "headers": headers, "endpoint": endpoint})
        return {
            "id": "or-request-1",
            "choices": [{"message": {"content": "# Generated\n\nContent."}}],
            "usage": {"prompt_tokens": 5, "completion_tokens": 3},
        }

    adapter = OpenRouterAssistedGenerationAdapter(
        api_key="test-key",
        model="openai/gpt-4o-mini",
        transport=transport,
        retry_limit=0,
    )
    result = adapter.generate(
        type(
            "Request",
            (),
            {
                "prompt": "Generate markdown.",
                "stage_id": "extract-entities",
                "workflow_id": "generic-source-extract",
                "input_artifact_id": "source/example.md",
                "provider_hint": "openrouter",
                "metadata": {},
            },
        )()
    )

    assert requests[0]["payload"]["model"] == "openai/gpt-4o-mini"
    assert requests[0]["headers"]["Authorization"] == "Bearer test-key"
    assert result.markdown == "# Generated\n\nContent."
    assert result.provider == "openrouter"
    assert result.metadata["model"] == "openai/gpt-4o-mini"
    assert result.metadata["request_id"] == "or-request-1"
    assert result.metadata["usage"]["completion_tokens"] == 3


def test_generic_generator_docs_cover_openrouter_resume_and_cost_caps() -> None:
    text = Path("docs/generic-source-generator.md").read_text(encoding="utf-8")

    assert "OPENROUTER_API_KEY" in text
    assert "--model" in text
    assert "--max-chunks" in text
    assert "resume" in text.lower()
    assert "fixture-responses" in text
@@ -4,7 +4,7 @@ type: workplan
 title: "Generic Source Infospace Generator CLI"
 domain: markitect
 repo: infospace-bench
-status: planned
+status: completed
 owner: markitect
 topic_slug: markitect
 created: "2026-05-14"
@@ -105,7 +105,7 @@ Default-safe modes:
 
 ```task
 id: IB-WP-0015-T01
-status: in_progress
+status: done
 priority: high
 state_hub_task_id: "08196bf2-9323-4cd8-860c-4306c965ed60"
 ```
@@ -128,7 +128,7 @@ state_hub_task_id: "08196bf2-9323-4cd8-860c-4306c965ed60"
 
 ```task
 id: IB-WP-0015-T02
-status: in_progress
+status: done
 priority: high
 state_hub_task_id: "5604796b-cb09-43ed-b3a9-5d4906790807"
 ```
@@ -152,7 +152,7 @@ state_hub_task_id: "5604796b-cb09-43ed-b3a9-5d4906790807"
 
 ```task
 id: IB-WP-0015-T03
-status: in_progress
+status: done
 priority: high
 state_hub_task_id: "c02720c5-1b82-458a-bf8c-9147af4fd9e9"
 ```
@@ -171,7 +171,7 @@ state_hub_task_id: "c02720c5-1b82-458a-bf8c-9147af4fd9e9"
 
 ```task
 id: IB-WP-0015-T04
-status: todo
+status: done
 priority: high
 state_hub_task_id: "21b50fbc-f43e-4b18-b012-976a5241f52a"
 ```
@@ -192,7 +192,7 @@ state_hub_task_id: "21b50fbc-f43e-4b18-b012-976a5241f52a"
 
 ```task
 id: IB-WP-0015-T05
-status: todo
+status: done
 priority: high
 state_hub_task_id: "ad882b6e-924e-4f9a-8e93-119aeadd8132"
 ```
@@ -216,7 +216,7 @@ state_hub_task_id: "ad882b6e-924e-4f9a-8e93-119aeadd8132"
 
 ```task
 id: IB-WP-0015-T06
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "3461eacf-e42a-455c-954c-849b0ad69fc1"
 ```
@@ -264,3 +264,18 @@ state_hub_task_id: "3461eacf-e42a-455c-954c-849b0ad69fc1"
 - `infospace-bench`: applied infospace generation workflow and CLI
 - `kontextual-engine`: durable runtime/retrieval/audit if needed later
 
+## Implementation Notes
+
+Completed on 2026-05-14.
+
+- Added generic source intake for Markdown, plain text, local HTML, EPUB-like
+  archives, and folder collections.
+- Added the `general-knowledge` profile with prompt templates and contracts.
+- Added an explicit OpenRouter assisted-generation adapter with mocked provider
+  tests and environment-based credential lookup.
+- Added `infospace-bench generate` subcommands for init, plan, run, resume,
+  status, and from-source flows.
+- Added generation state, resume skipping, source/profile stale detection,
+  metrics/history recording, and a manifest-backed generation report.
+- Added deterministic acceptance tests for article, ebook-like, and folder
+  generation using fixture responses.