guide-board/docs/ARCHITECTURE-BLUEPRINT.md

# Guide Board Core Architecture Blueprint

Status: draft
Created: 2026-05-07

## Purpose

This blueprint defines the first core architecture for `guide-board`: a
certification and compliance preparation framework that can orchestrate
extension-specific conformance harnesses, validators, repository-quality checks,
and procedural evidence packs without embedding domain policy in the core.

The design is based on recurring patterns from official or authority-backed
programs such as OGC TEAM Engine, OpenID Foundation Conformance Suite, CNCF
Kubernetes Conformance, web-platform-tests, Khronos CTS, NIST ACVP, HL7/FHIR
Inferno, Jakarta EE TCK, OPC UA CTT, NIST SCAP/OSCAL, CIS-CAT, and OpenSSF
Scorecard.

## Research Lessons

### Suite Engine

Examples: OGC TEAM Engine, OpenCMIS TCK, Jakarta EE TCK.

Pattern:

- installable suites with their own test definitions,
- command-line execution and sometimes web/API execution,
- target-specific input forms or profiles,
- raw logs plus structured result formats,
- conformance classes or capability areas,
- certification boundary outside normal self-testing.

Architecture lesson:

`guide-board` needs a runner bridge that can call external harnesses, capture
artifacts, and normalize tool-specific result formats without making the harness
part of the core.

Sources:

- [TEAM Engine](https://opengeospatial.github.io/teamengine/)
- [TEAM Engine User Guide](https://opengeospatial.github.io/teamengine/users.html)
- [Jakarta EE TCK Process](https://jakarta.ee/committees/specification/tckprocess/)
- [OpenCMIS TCK package](https://chemistry.apache.org/java/javadoc/org/apache/chemistry/opencmis/tck/package-summary.html)

### Hosted Or Local Certification Suite

Examples: OpenID Foundation Conformance Suite, Inferno.

Pattern:

- open source suite,
- hosted public/staging environments,
- local Docker execution,
- named test plans or test kits,
- logs and public result pages,
- fee or accredited-review boundary for formal certification.

Architecture lesson:

`guide-board` should model execution environment tiers, test plans, and
certification submission packages separately from normal development runs.

Sources:

- [OpenID Conformance Suite](https://openid.net/certification/about-conformance-suite/)
- [OpenID Certification](https://openid.net/certification/)
- [Inferno Framework](https://inferno-framework.github.io/about/)
- [Inferno Documentation](https://inferno-framework.github.io/docs)

### Submit-Results Program

Example: CNCF Kubernetes Conformance.

Pattern:

- vendors run the same open source conformance application used by the program,
- result artifacts are submitted for review,
- accepted results feed a public certification list,
- users can rerun the same conformance application to confirm behavior.

Architecture lesson:

An assessment package should be a first-class artifact with source metadata,
runner version, target identity, raw evidence, normalized results, and a review
boundary suitable for downstream submission.

Source:

- [CNCF Certified Kubernetes Software Conformance](https://www.cncf.io/certification/software-conformance/)

### Protocol Validation Service

Example: NIST ACVP.

Pattern:

- authority-operated demo and production services,
- client authentication,
- machine-to-machine protocol,
- generated test vectors and submitted responses,
- validation tied to an external authority process.

Architecture lesson:

Some extensions will not run a local test suite. They will coordinate a session
with an authority service. The core must support credential references, remote
session IDs, generated inputs, submitted responses, and external verdicts.

Source:

- [NIST ACVP](https://pages.nist.gov/ACVP/)

### Web-Scale Shared Test Repository

Example: web-platform-tests.

Pattern:

- shared specification-linked test repository,
- canonical manifest generation,
- multiple test types including automated, reference, and manual tests,
- local and public execution surfaces.

Architecture lesson:

`guide-board` check discovery should be manifest-driven where possible. It must
support heterogeneous check types instead of assuming every check is a simple
pass/fail command.

Sources:

- [web-platform-tests](https://web-platform-tests.org/)
- [Writing Your Own Runner](https://web-platform-tests.org/running-tests/custom-runner.html)
- [Running Tests from the Web](https://web-platform-tests.org/running-tests/from-web.html)

### Conformance Submission Package

Examples: Khronos Vulkan CTS and OpenXR CTS.

Pattern:

- automated and sometimes interactive test runs,
- XML result files,
- console output,
- build and CTS version metadata,
- explicit conformance statement,
- trademark or adopter-program boundary.

Architecture lesson:

The guide-board assessment package should preserve both normalized evidence and
the original submission-grade artifacts expected by an authority.

Sources:

- [Vulkan CTS Guide](https://docs.vulkan.org/guide/latest/vulkan_cts.html)
- [OpenXR CTS Usage Guide](https://registry.khronos.org/OpenXR/conformance/cts_usage.html)

### Restricted Tool

Examples: OPC UA CTT, CIS-CAT Pro.

Pattern:

- official tool may be restricted to members, licensees, or controlled access,
- tests are organized by profiles, facets, conformance units, benchmarks, or
  controls,
- command-line execution may exist for automation,
- redistribution is not allowed or not appropriate.

Architecture lesson:

`guide-board` must represent restricted harnesses as externally supplied runtime
assets. The registry can describe how to integrate them, but the core and
extensions must not vendor restricted tools or proprietary standard text.

Sources:

- [OPC UA Compliance Test Tool](https://opcfoundation.org/developer-tools/certification-test-tools/opc-ua-compliance-test-tool-uactt/)
- [CIS-CAT Pro Assessor](https://www.cisecurity.org/cybersecurity-tools/cis-cat-pro)

### Security Configuration And Assessment Content

Examples: NIST SCAP, OpenSCAP, CIS-CAT Pro.

Pattern:

- machine-readable security configuration content,
- profiles or tailored benchmarks,
- local or remote system assessment,
- automated and manual checks,
- reports mapped to controls.

Architecture lesson:

`guide-board` must support content-driven validators where the extension supplies
policy content and a scanner, not a fixed test suite. The evidence model must
handle manual, automated, and partially automated checks.

Sources:

- [NIST SCAP](https://csrc.nist.gov/Projects/Security-Content-Automation-Protocol)
- [NIST SCAP 1.3](https://csrc.nist.gov/projects/security-content-automation-protocol/scap-releases/scap-1-3)
- [OpenSCAP](https://www.open-scap.org/)

### Assessment Data Interchange

Example: NIST OSCAL.

Pattern:

- layered machine-readable models for controls, implementation, assessment
  plans, assessment results, and remediation milestones,
- multiple serializations such as JSON, XML, and YAML,
- assessment results expressed relative to a system and controls.

Architecture lesson:

`guide-board` should keep its internal evidence model small, but design it so
later OSCAL export is natural for compliance packs that need formal assessment
interchange.

Source:

- [NIST OSCAL Layers and Models](https://pages.nist.gov/OSCAL/learn/concepts/layer/)

### Repository Quality And Supply Chain Scoring

Example: OpenSSF Scorecard.

Pattern:

- automated checks over source repositories,
- score and risk level per check,
- aggregate posture score,
- remediation prompts,
- CI and API integration.

Architecture lesson:

Repository quality packs should be normal extensions. A score is not a
certification verdict; it is a normalized finding and trend signal.

Quality gates should be core policy decisions over retained posture, not
extension-specific verdicts. The first gate layer checks latest run status,
unexpected finding count, and whether the latest trend regressed.
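
As a minimal sketch of that first gate layer, the three inputs can be evaluated together. The `PostureSnapshot` type, its field names, the `"completed"` status value, and the `max_unexpected` threshold are assumptions made for illustration, not part of the core contracts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostureSnapshot:
    """Hypothetical retained posture for one target and assessment profile."""
    latest_run_status: str          # assumed values such as "completed" or "failed"
    unexpected_finding_count: int
    trend_regressed: bool


def evaluate_gate(posture, max_unexpected=0):
    """Return the reasons the gate fails; an empty list means the gate passes."""
    reasons = []
    if posture.latest_run_status != "completed":
        reasons.append(f"latest run status is {posture.latest_run_status!r}")
    if posture.unexpected_finding_count > max_unexpected:
        reasons.append(
            f"{posture.unexpected_finding_count} unexpected findings "
            f"exceed the limit of {max_unexpected}"
        )
    if posture.trend_regressed:
        reasons.append("latest trend regressed")
    return reasons
```

Returning reasons rather than a bare boolean keeps the gate decision explainable in reports.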

Sources:

- [OpenSSF Scorecard](https://openssf.org/projects/scorecard/)
- [Scorecard documentation](https://github.com/ossf/scorecard)

## Architecture Principles

- The core is extension-neutral.
- Authority, framework, and harness versions are captured as evidence, not merely
  described in prose.
- Local CLI behavior is the execution source of truth.
- Optional service APIs wrap the same contracts used by the CLI.
- Restricted harnesses and proprietary standards are mounted or referenced, not
  redistributed.
- Raw artifacts are preserved, but normalized evidence is the primary interface.
- Every assessment package must state its certification boundary.
- Manual, semi-automated, and fully automated checks all use the same evidence
  model.
- Expected gaps and waivers never suppress unexpected failures silently.
- Extension extraction to separate repositories should be possible without
  changing core contracts.

## Core Components

### Authority Catalog

Tracks source authorities, framework names, versions, official URLs, licensing
posture, access constraints, certification boundaries, and lifecycle status.

### Extension Registry

Discovers installed or incubating extensions. Each extension declares:

- extension ID,
- type,
- supported frameworks,
- source authority references,
- profile schemas,
- check groups,
- runner or validator entry points,
- normalizers,
- mappings,
- report fragments,
- dependency and license posture.

### Profile Registry

Loads and validates target profiles and assessment profiles.

Target profiles describe the subject being assessed: repository, service,
cluster, product, API, data archive, host, organization, process, or policy set.

Assessment profiles select frameworks, controls, check groups, expectations,
waivers, output policies, and retention policies.

### Local Service Facade

Wraps the CLI/core contracts in a dependency-light local HTTP API. The service
can list extensions, validate profiles, build plans, start assessment jobs,
inspect job status, and fetch generated reports.

The first implementation stores job status in memory and leaves durable evidence
in the normal run directory. It does not introduce separate execution semantics.

### Assessment Planner

Resolves an assessment profile into an executable run plan:

- selected extensions,
- selected check groups,
- required credentials,
- preflight checks,
- dependency checks,
- execution order,
- isolation and timeout policy,
- artifact retention policy.

At execution time, a failing preflight blocks downstream check groups for the
same extension so expensive or misleading harness steps are not invoked.
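
The blocking rule could look roughly like the sketch below. The step keys (`extension_id`, `check_group`, `kind`) and the `run_step` callable are hypothetical; only the `blocked` status comes from the result vocabulary in this document.

```python
def execute_plan(steps, run_step):
    """Run ordered steps; once an extension's preflight fails, block its later steps.

    `steps` is a list of dicts with hypothetical keys: extension_id, check_group,
    and kind ("preflight" or "check"). `run_step` returns a result status string.
    """
    blocked_extensions = set()
    results = []
    for step in steps:
        extension_id = step["extension_id"]
        if extension_id in blocked_extensions:
            # The harness is never invoked for a blocked step.
            results.append({**step, "result": "blocked"})
            continue
        status = run_step(step)
        results.append({**step, "result": status})
        if step["kind"] == "preflight" and status != "pass":
            blocked_extensions.add(extension_id)
    return results
```

Blocking is scoped to one extension, so a failing preflight in one extension does not stop unrelated extensions in the same run.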

### Runner Bridge

Executes or coordinates extension checks.

Supported runner kinds:

- local command,
- container command,
- in-process validator,
- remote protocol session,
- hosted test-suite interaction,
- manual evidence request,
- imported result package.
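
One way to keep these kinds explicit is an enumeration plus a handler registry. The sketch below is an assumption about shape only: the enum values, the `RUNNERS` registry, and the returned dict keys are illustrative, and only the local command kind is wired up.

```python
import subprocess
from enum import Enum


class RunnerKind(Enum):
    LOCAL_COMMAND = "local_command"
    CONTAINER_COMMAND = "container_command"
    IN_PROCESS_VALIDATOR = "in_process_validator"
    REMOTE_PROTOCOL_SESSION = "remote_protocol_session"
    HOSTED_SUITE_INTERACTION = "hosted_suite_interaction"
    MANUAL_EVIDENCE_REQUEST = "manual_evidence_request"
    IMPORTED_RESULT_PACKAGE = "imported_result_package"


def run_local_command(argv, timeout_seconds=60):
    """Run a harness command and capture its output without interpreting it."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A launch failure or timeout is an infrastructure problem, not a verdict.
        return {"exit_code": None, "stdout": "", "stderr": "", "error": str(exc)}
    return {
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "error": None,
    }


# Hypothetical registry; the other kinds would register their own handlers.
RUNNERS = {RunnerKind.LOCAL_COMMAND: run_local_command}


def dispatch(kind, *args, **kwargs):
    """Look up the handler for a runner kind and call it."""
    handler = RUNNERS.get(kind)
    if handler is None:
        raise NotImplementedError(f"no runner registered for {kind.value}")
    return handler(*args, **kwargs)
```

The runner only captures output; turning an exit code or log into a result status is left to the normalizer.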

### Artifact Store

Stores run artifacts by reference and checksum:

- raw logs,
- XML/JSON/HTML reports,
- screenshots or rendered documents,
- authority submission files,
- request/response transcripts,
- input forms,
- profile snapshots,
- source lockfiles.

The first implementation builds the assessment package artifact manifest from
runner-emitted artifact refs and computes checksums for files inside the run
directory. New runs also write a source lock and a submission package manifest
that fingerprint reviewable run files and summarize runner or normalizer
metadata reported by extensions.
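
A rough sketch of the checksum step follows, assuming SHA-256 and a manifest entry shaped as `{"path", "checksum"}`. Both the algorithm and the entry shape are assumptions; the source only states that checksums are computed for files inside the run directory.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large artifacts are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_artifact_manifest(run_dir, artifact_refs):
    """Checksum runner-emitted refs that resolve to files inside run_dir.

    Refs that point outside the run directory, or to missing files, are kept
    in the manifest with a checksum of None instead of being dropped.
    """
    root = Path(run_dir).resolve()
    entries = []
    for ref in artifact_refs:
        candidate = (root / ref).resolve()
        inside_run_dir = root in candidate.parents
        entry = {"path": ref, "checksum": None}
        if inside_run_dir and candidate.is_file():
            entry["checksum"] = "sha256:" + sha256_file(candidate)
        entries.append(entry)
    return entries
```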

### Normalizer

Converts extension output into guide-board evidence records.

The normalizer should preserve native identifiers such as test case IDs,
conformance class IDs, control IDs, profile IDs, benchmark IDs, or requirement
references.
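
For illustration, a normalizer for a hypothetical harness might look like the sketch below. The native result keys (`test_id`, `outcome`, `conformance_class`, `artifacts`) and the outcome strings are invented for the example; the output keys follow the `EvidenceItem` contract listed later in this document.

```python
# Assumed native outcome strings for a hypothetical harness.
STATUS_MAP = {"PASSED": "pass", "FAILED": "fail", "SKIPPED": "skipped"}


def normalize_result(native, run_id, extension_id):
    """Convert one native harness result into an EvidenceItem-shaped dict."""
    return {
        "id": f"{run_id}:{extension_id}:{native['test_id']}",
        "run_id": run_id,
        "extension_id": extension_id,
        "check_id": native["test_id"],
        # Unrecognized outcomes become "unknown" rather than a guessed verdict.
        "result": STATUS_MAP.get(native["outcome"], "unknown"),
        # Native identifiers are carried through unchanged for traceability.
        "facts": {
            "native_test_id": native["test_id"],
            "conformance_class": native.get("conformance_class"),
        },
        "requirement_refs": list(native.get("requirement_refs", [])),
        "artifact_refs": list(native.get("artifacts", [])),
    }
```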

### Mapping Engine

Maps evidence to:

- capabilities,
- controls,
- conformance classes,
- requirements,
- policy questions,
- repository quality dimensions,
- scorecard dimensions.

Mappings belong to extensions or assessment packs, not the core.

The first implementation loads extension-owned JSON mapping sets from
`extensions/<extension-id>/mappings/`, joins them to evidence `requirement_refs`,
and writes normalized mapping records under each run directory.
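
The join on `requirement_refs` can be sketched as follows. The mapping-set layout (`mappings`, `requirement_ref`, `target_type`, `target_id`) is an assumed shape for illustration; the actual JSON format is defined by the extensions.

```python
def apply_mappings(evidence_items, mapping_set):
    """Join evidence to mapping targets through shared requirement refs."""
    by_requirement = {}
    for mapping in mapping_set["mappings"]:
        by_requirement.setdefault(mapping["requirement_ref"], []).append(mapping)

    records = []
    for item in evidence_items:
        for ref in item.get("requirement_refs", []):
            for mapping in by_requirement.get(ref, []):
                records.append(
                    {
                        "evidence_id": item["id"],
                        "requirement_ref": ref,
                        "mapping_set_id": mapping_set["id"],
                        "target_type": mapping["target_type"],
                        "target_id": mapping["target_id"],
                    }
                )
    return records
```

Evidence whose requirement refs match no mapping simply produces no record, which keeps unmapped checks visible as a coverage gap rather than an error.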

### Expectation And Waiver Engine

Applies declared target posture after evidence normalization.

Use expectations for known optional behavior, unsupported-by-design features, and
accepted gaps.

Use waivers for time-bounded exceptions with owner, reason, expiry, and review
metadata.

The first implementation supports assessment-profile references to JSON
expectation and waiver sets. These policies annotate findings as expected or
waived after evidence normalization and finding creation.
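
A minimal sketch of the annotation step is shown below, assuming expectations and waivers match findings by `requirement_refs` and that `expires_at` is an ISO date string. Both are assumptions; the matching rules are not specified in this document.

```python
from datetime import date


def apply_policies(findings, expectations, waivers, today):
    """Annotate findings as expected or waived; findings are never removed."""
    expected_refs = {ref for e in expectations for ref in e["requirement_refs"]}
    annotated = []
    for finding in findings:
        refs = set(finding["requirement_refs"])
        result = dict(finding, expected=bool(refs & expected_refs), waiver_ref=None)
        for waiver in waivers:
            still_active = date.fromisoformat(waiver["expires_at"]) >= today
            if still_active and refs & set(waiver["requirement_refs"]):
                result["waiver_ref"] = waiver["id"]
                break
        annotated.append(result)
    return annotated


def unexpected_findings(annotated):
    """Failures that are neither expected nor covered by an active waiver."""
    return [
        f
        for f in annotated
        if f["status"] == "fail" and not f["expected"] and f["waiver_ref"] is None
    ]
```

Because an expired waiver no longer matches, the finding it used to cover returns to the unexpected set on the next run.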

### Report Builder

Builds human-readable and machine-readable outputs:

- compact JSON assessment package,
- Markdown summary,
- extension-specific fragments,
- submission package manifest,
- trend summaries,
- future OSCAL or other interchange exports.

### Retention Index

Keeps compact summaries over time while allowing raw artifact retention to be
bounded by policy. The first implementation writes `retention-summary.json` for
each run and can build a trend summary grouped by target and assessment profile.
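
The grouping could be sketched as below. The summary keys (`target_id`, `assessment_profile_id`, `created_at`, `unexpected_findings`) are hypothetical stand-ins for whatever `retention-summary.json` actually records, and "regressed" is assumed to mean more unexpected findings than the previous run.

```python
from collections import defaultdict


def build_trend(summaries):
    """Group run summaries by (target, assessment profile) and flag regressions."""
    groups = defaultdict(list)
    for summary in summaries:
        key = (summary["target_id"], summary["assessment_profile_id"])
        groups[key].append(summary)

    trend = {}
    for key, runs in groups.items():
        # Assumes created_at values are ISO 8601 strings in one consistent
        # format, so that string order matches chronological order.
        ordered = sorted(runs, key=lambda run: run["created_at"])
        counts = [run["unexpected_findings"] for run in ordered]
        regressed = len(counts) >= 2 and counts[-1] > counts[-2]
        trend[key] = {
            "runs": len(ordered),
            "latest_unexpected": counts[-1],
            "regressed": regressed,
        }
    return trend
```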

## Extension Archetypes

### Executable Harness Extension

Runs an external TCK, CTS, or conformance suite.

Examples: `open-cmis-tck`, OGC TEAM Engine, Jakarta EE TCK, Khronos CTS.

### Validator Extension

Validates structured artifacts against schemas, profiles, or data-stream
requirements.

Examples: SCAP content validation, FHIR resource validation.

### Protocol Service Extension

Coordinates with an external authority-operated service.

Example: NIST ACVP.

### Hosted Suite Extension

Uses a hosted or locally containerized suite with named test plans.

Examples: OpenID Conformance Suite, Inferno.

### Repository Quality Extension

Runs checks against repository configuration, development process, supply chain
signals, and release hygiene.

Example: OpenSSF Scorecard.

### Procedural Evidence Extension

Guides collection of policy, process, and control evidence where no official
executable harness exists.

Examples: GDPR, SOC 2, HIPAA, NF Z 42-013, NF 461, ISO 14641, ISO 15489.

Procedural packs use evidence request sets to describe artifact collection,
review roles, acceptance criteria, confidentiality, renewal expectations, and
waiver paths without reproducing restricted standard text. See
`docs/COMPLIANCE-EVIDENCE-PACKS.md`.

### Hybrid Extension

Combines automated checks, manual evidence, external auditor review, and imported
result packages.

## Core Data Contracts

The first implementation should define these as simple JSON/YAML schemas before
building complex runtime code.

### `Authority`

- `id`
- `name`
- `authority_type`
- `source_urls`
- `frameworks`
- `license_posture`
- `access_constraints`
- `certification_boundary`
- `lifecycle_status`

### `ExtensionManifest`

- `id`
- `name`
- `version`
- `extension_type`
- `supported_frameworks`
- `profile_schemas`
- `check_groups`
- `runner_entrypoints`
- `normalizers`
- `mappings`
- `report_fragments`
- `dependencies`
- `restricted_assets`

### `Framework`

- `id`
- `authority_id`
- `name`
- `version`
- `status`
- `source_urls`
- `requirement_index`
- `profile_index`
- `license_posture`

### `TargetProfile`

- `id`
- `subject_type`
- `subject_name`
- `environment`
- `scope`
- `endpoints`
- `artifacts`
- `credentials_ref`
- `declared_capabilities`
- `known_gaps`

### `AssessmentProfile`

- `id`
- `framework_refs`
- `extension_refs`
- `target_profile_ref`
- `selected_check_groups`
- `expectations_ref`
- `waivers_ref`
- `output_policy`
- `retention_policy`

### `CheckDefinition`

- `id`
- `extension_id`
- `check_type`
- `framework_refs`
- `requirement_refs`
- `inputs`
- `preconditions`
- `timeout`
- `runner_ref`
- `expected_artifacts`

### `RunPlan`

- `id`
- `assessment_profile_snapshot`
- `extension_snapshots`
- `source_lock`
- `ordered_steps`
- `credential_refs`
- `artifact_policy`
- `runtime_policy`

### `SourceLock`

- `framework_refs`
- `extension_refs`
- `frameworks`
- `extensions`
- `mapping_sets`
- `profiles`
- `policy_refs`
- `authorities`
- `metadata_hooks`

### `RawArtifact`

- `id`
- `run_id`
- `path`
- `media_type`
- `producer`
- `checksum`
- `created_at`
- `retention_class`

### `EvidenceItem`

- `id`
- `run_id`
- `extension_id`
- `check_id`
- `subject_ref`
- `result`
- `observations`
- `facts`
- `requirement_refs`
- `artifact_refs`
- `started_at`
- `completed_at`

### `Finding`

- `id`
- `run_id`
- `status`
- `severity`
- `classification`
- `requirement_refs`
- `evidence_refs`
- `expected`
- `waiver_ref`
- `remediation`

### `Waiver`

- `id`
- `scope`
- `requirement_refs`
- `reason`
- `owner`
- `approved_by`
- `created_at`
- `expires_at`
- `review_status`

### `AssessmentPackage`

- `id`
- `run_id`
- `target`
- `frameworks`
- `extensions`
- `source_lock`
- `summary`
- `findings`
- `evidence_refs`
- `artifact_manifest`
- `waivers`
- `certification_boundary`
- `created_at`

### `SubmissionPackage`

- `run_id`
- `package_identity`
- `source_lock_ref`
- `source_lock`
- `reports`
- `normalized_outputs`
- `profile_snapshots`
- `artifact_manifest`
- `reported_metadata`
- `certification_boundary`
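
Before full schemas exist, the field lists above can already be used for a shallow structural check. The sketch below covers two of the contracts and treats every listed field as required, which is an assumption; the eventual schemas may mark some fields optional.

```python
# Field names copied from the contract lists above.
CONTRACT_FIELDS = {
    "Finding": (
        "id", "run_id", "status", "severity", "classification",
        "requirement_refs", "evidence_refs", "expected", "waiver_ref",
        "remediation",
    ),
    "Waiver": (
        "id", "scope", "requirement_refs", "reason", "owner", "approved_by",
        "created_at", "expires_at", "review_status",
    ),
}


def contract_problems(kind, record):
    """Report fields missing from, or unknown to, the named contract."""
    fields = CONTRACT_FIELDS[kind]
    return {
        "missing": [name for name in fields if name not in record],
        "unknown": sorted(name for name in record if name not in fields),
    }
```

A check like this only covers field presence; types, enumerations, and cross-references would still need real schema validation.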

## Result Vocabulary

The evidence model should allow these statuses:

- `pass`
- `fail`
- `warning`
- `manual`
- `not_applicable`
- `skipped`
- `expected_gap`
- `waiver_applied`
- `unsupported_by_design`
- `infrastructure_error`
- `blocked`
- `unknown`

The reporting layer should distinguish at least:

- conformant evidence,
- nonconformant evidence,
- expected limitation,
- waived limitation,
- missing evidence,
- infrastructure failure,
- human review required.
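
One possible assignment of statuses to reporting categories is sketched below. The mapping itself is an assumption: in particular, where `warning`, `not_applicable`, `skipped`, `blocked`, and `unknown` belong is a judgment call this document does not settle.

```python
# Assumed mapping from result status to reporting category.
REPORT_CATEGORY = {
    "pass": "conformant evidence",
    "fail": "nonconformant evidence",
    "warning": "human review required",
    "manual": "human review required",
    "not_applicable": "expected limitation",
    "skipped": "missing evidence",
    "expected_gap": "expected limitation",
    "waiver_applied": "waived limitation",
    "unsupported_by_design": "expected limitation",
    "infrastructure_error": "infrastructure failure",
    "blocked": "missing evidence",
    "unknown": "missing evidence",
}


def report_category(status):
    """Map a result status to its reporting category, rejecting unknown values."""
    try:
        return REPORT_CATEGORY[status]
    except KeyError:
        raise ValueError(f"status {status!r} is not in the result vocabulary") from None
```

Rejecting statuses outside the vocabulary keeps a misspelled or extension-invented status from silently landing in a default category.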

## Proposed Repository Layout

```text
guide-board/
  INTENT.md
  README.md
  docs/
    ARCHITECTURE-BLUEPRINT.md
    schemas/
  extensions/
    CANDIDATES.md
    _template/
    sample-noop/
  runs/
  reports/
  workplans/
```

`runs/` and `reports/` hold locally generated output and should be ignored by default.
Production extensions should usually live in separate repositories and be
attached with `--extension-dir` or `GUIDE_BOARD_EXTENSION_PATHS`.

## Execution Flow

```text
discover extensions
  -> load authority catalog
  -> validate target profile
  -> validate assessment profile
  -> plan run
  -> run preflight
  -> execute checks
  -> collect artifacts
  -> normalize evidence
  -> map findings
  -> apply expectations and waivers
  -> build assessment package
  -> write reports
  -> retain summaries
```

## Run Directory Contract

Each run should be reproducible from captured metadata where possible.

```text
runs/<run-id>/
  run.json
  retention-summary.json
  plan.json
  sources.lock.json
  target-profile.snapshot.json
  assessment-profile.snapshot.json
  artifacts/
  normalized/
    evidence.json
    findings.json
    mappings.json
  reports/
    report.md
    assessment-package.json
    submission-package.json
  exports/
```
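
The layout above lends itself to a simple completeness check. The sketch below assumes every listed file and directory is required for a finished run, which the contract does not state explicitly.

```python
from pathlib import Path

# Paths taken from the run directory layout above.
REQUIRED_FILES = (
    "run.json",
    "retention-summary.json",
    "plan.json",
    "sources.lock.json",
    "target-profile.snapshot.json",
    "assessment-profile.snapshot.json",
    "normalized/evidence.json",
    "normalized/findings.json",
    "normalized/mappings.json",
    "reports/report.md",
    "reports/assessment-package.json",
    "reports/submission-package.json",
)
REQUIRED_DIRS = ("artifacts", "exports")


def missing_run_entries(run_dir):
    """List contract paths that are absent from a run directory."""
    root = Path(run_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```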

## Container And Service Model

The local CLI should come first. Containerization should preserve the same CLI
contracts.

Recommended container model:

- `guide-board-core` image contains the core CLI and schema tooling.
- Extension dependencies are either installed by extension-specific images or
  mounted as external assets.
- Profiles, credentials, runs, and reports are mounted explicitly.
- Restricted tools are mounted from licensed local paths.
- Network access is declared per extension and per assessment profile.

The baseline `Containerfile` packages the local CLI, schemas, sample profiles,
and incubating extensions. See `docs/CONTAINER.md` for mount contracts and the
extension-specific image path.

Optional service model:

- service lists extensions and profiles,
- service validates and plans runs,
- service starts jobs that call the CLI contracts,
- service streams status and exposes reports,
- service does not invent separate execution semantics.

Candidate API resources:

- `GET /extensions`
- `GET /authorities`
- `POST /profiles/validate`
- `POST /assessments/plan`
- `POST /runs`
- `GET /runs/{run_id}`
- `GET /runs/{run_id}/artifacts`
- `GET /runs/{run_id}/reports`

## Governance Model

### Extension Lifecycle

- `candidate`: researched and registered.
- `incubating`: has an intent and workplan.
- `active`: runnable through core contracts.
- `external`: maintained outside the repo but compatible.
- `deprecated`: retained for historical runs only.

### Challenge And Exclusion Handling

Use separate concepts:

- authority exclusion: imported from an official TCK or program process,
- extension challenge: local claim that a check is invalid or mis-mapped,
- target expectation: declared optional or unsupported behavior,
- waiver: approved and time-bounded exception,
- defect: unexpected product or process failure.

The report must make these visible separately.

The current policy layer loads challenge and exclusion refs from assessment
profiles, annotates findings and evidence, and keeps `unexpected_findings`
visible for gate semantics unless a finding is separately expected or waived.

### Source Locking

Each run should lock:

- extension version,
- framework version,
- harness version,
- authority source URLs,
- test suite IDs,
- mapping version,
- target profile snapshot,
- expectation and waiver refs.

The current source lock remains backward-compatible with the original
`framework_refs` and `extension_refs` fields while adding checksummed profiles,
mapping-set refs, optional policy refs, authority descriptors, and metadata
hooks for runners and normalizers.
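
A reduced sketch of such a lock is shown below. It keeps the original ref lists next to the richer descriptors and checksums profiles over a canonical JSON encoding. The SHA-256 choice, the canonicalization, and the descriptor shapes are assumptions, and the remaining `SourceLock` fields are omitted for brevity.

```python
import hashlib
import json


def checksum_json(obj):
    """Checksum a JSON-compatible object independent of key order."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_source_lock(frameworks, extensions, profiles):
    """Build a lock that keeps the original ref lists next to richer descriptors."""
    return {
        # Original fields, kept so older readers of the lock still work.
        "framework_refs": [fw["id"] for fw in frameworks],
        "extension_refs": [ext["id"] for ext in extensions],
        # Added descriptors with versions and checksums.
        "frameworks": [
            {"id": fw["id"], "version": fw.get("version")} for fw in frameworks
        ],
        "extensions": [
            {"id": ext["id"], "version": ext.get("version")} for ext in extensions
        ],
        "profiles": [
            {"id": profile["id"], "checksum": checksum_json(profile)}
            for profile in profiles
        ],
    }
```

Sorting keys before hashing means two profile snapshots that differ only in key order produce the same checksum.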

## Implementation Sequence

1. Create schema drafts for the core data contracts.
2. Add an extension manifest format and a minimal sample extension.
3. Build the CLI commands: `extensions list`, `profile validate`, `plan`, `run`,
   and `report`.
4. Integrate `open-cmis-tck` through the same contracts.
5. Add generated-output ignores for `runs/` and `reports/`.
6. Add container design after the CLI baseline is stable.
7. Add optional service API around the CLI job model.
8. Add OSCAL export and procedural evidence-pack support after the internal
   evidence model proves itself with executable extensions.

The first extension SDK contract is documented in `docs/EXTENSION-SDK.md`.