Add challenge and exclusion review handling

2026-05-16 02:58:18 +02:00
parent c8ac42154c
commit b1dff0440d
16 changed files with 644 additions and 21 deletions
--- a/docs/ARCHITECTURE-BLUEPRINT.md
+++ b/docs/ARCHITECTURE-BLUEPRINT.md
@@ -803,6 +803,9 @@ Use separate concepts:
 - defect: unexpected product or process failure.

 The report must make these visible separately.
+The current policy layer loads challenge and exclusion refs from assessment
+profiles, annotates findings and evidence, and keeps `unexpected_findings`
+visible for gate semantics unless a finding is separately expected or waived.

 ### Source Locking

--- a/docs/ASSESSMENT-OPERATIONS.md
+++ b/docs/ASSESSMENT-OPERATIONS.md
@@ -27,7 +27,8 @@ Every run needs:

 The target profile describes the candidate system or artifact being assessed.
 The assessment profile selects frameworks, extensions, check groups, runtime
-policy, waivers, expectations, and output policy.
+policy, expectations, waivers, challenges, authority exclusions, and output
+policy.

 ## CLI Flow

@@ -99,10 +100,10 @@ artifacts/
 ```

 `sources.lock.json` records the framework refs, extension versions, mapping
-sets, profile snapshots, policy refs, authority refs, and extension metadata
-hooks used for the run. `reports/submission-package.json` points at the
-reviewable package files, includes checksums where files exist, carries the raw
-artifact manifest, and repeats the certification boundary. It is a portable
+sets, profile snapshots, policy and review refs, authority refs, and extension
+metadata hooks used for the run. `reports/submission-package.json` points at
+the reviewable package files, includes checksums where files exist, carries the
+raw artifact manifest, and repeats the certification boundary. It is a portable
 handoff manifest for preparation evidence, not an authority-specific final
 submission.

@@ -200,6 +201,23 @@ Individual evidence items use:
 - `expected_gap`
 - `infrastructure_error`

+## Review State
+
+Assessment profiles may reference:
+
+- `expectations_ref`: known target posture, optional scope, or accepted gaps,
+- `waivers_ref`: approved, time-bounded exceptions,
+- `challenges_ref`: review claims that a finding, check, mapping, or native
+  result should be challenged,
+- `exclusions_ref`: authority or program exclusions that apply to selected
+  findings.
+
+Challenges and exclusions annotate findings and evidence. They do not silently
+turn failures into passing evidence and they do not reduce the
+`unexpected_findings` count used by default gates. Retained summaries expose
+separate counts for expected findings, waived findings, challenged findings,
+authority exclusions, unresolved defects, and unresolved review items.
+
 ## Candidate System Checklist

 Before starting a run against candidate software, confirm:
--- a/docs/COMPLIANCE-EVIDENCE-PACKS.md
+++ b/docs/COMPLIANCE-EVIDENCE-PACKS.md
@@ -8,8 +8,8 @@ Created: 2026-05-07
 Compliance evidence packs cover frameworks where guide-board cannot rely on an
 official executable harness. They help prepare and perform assessments by
 organizing evidence requests, expected artifacts, reviewer workflow, waivers,
-and run reports. They do not replace auditors, accredited certification bodies,
-legal counsel, or official standard text.
+challenges, authority exclusions, and run reports. They do not replace auditors,
+accredited certification bodies, legal counsel, or official standard text.

 Examples include GDPR, SOC 2, HIPAA, NF Z 42-013, NF 461, ISO 14641, ISO 15489,
 and similar procedural or control-oriented frameworks.
@@ -83,7 +83,7 @@ Each request should include:

 Requests should be phrased as collection guidance, not as legal conclusions.

-## Waivers And Expected Gaps
+## Review Policy Records

 Evidence packs use the same expectation and waiver model as executable
 extensions.
@@ -103,6 +103,16 @@ Use waivers for:

 Every waiver should include owner, reason, approval status, and expiry.

+Use challenges for disputed checks, disputed mappings, questions about imported
+native results, or evidence that needs a reviewer decision before it can be
+treated as a defect. Use authority exclusions only when a program, standard, or
+authorized reviewer excludes a requirement or check from the assessment scope.
+Both records should cite stable requirement refs, check refs, evidence refs, or
+authority source refs rather than reproducing restricted standard text.
+
+Challenges and exclusions make review state visible; they do not by themselves
+claim compliance or remove default gate-visible unexpected findings.
+
 ## Framework Notes

 GDPR packs should emphasize processing inventory, lawful basis records, data
@@ -129,6 +139,7 @@ extensions:

 - normalized evidence,
 - findings,
+- review annotations for expectations, waivers, challenges, and exclusions,
 - mapping records,
 - assessment packages,
 - retention summaries,
--- a/docs/EXTENSION-SDK.md
+++ b/docs/EXTENSION-SDK.md
@@ -250,6 +250,33 @@ Expectation sets mark known posture as expected. Waiver sets mark approved,
 time-bounded exceptions. Both are applied after findings are generated, and the
 assessment package records policy summary counts.

+## Challenges And Authority Exclusions
+
+Assessment profiles may also reference challenge and exclusion sets:
+
+```json
+{
+  "challenges_ref": "profiles/challenges/example.json",
+  "exclusions_ref": "profiles/exclusions/example.json"
+}
+```
+
+Challenge sets validate against `docs/schemas/challenge-set.schema.json`.
+Exclusion sets validate against `docs/schemas/exclusion-set.schema.json`.
+Records can match findings by requirement refs, check refs, evidence refs,
+result refs, or classification refs. They also carry owner, review status,
+rationale, authority source refs, review dates, optional expiry, native IDs,
+and free-form metadata.
+
+Use challenges when an extension author or assessment team believes a finding
+needs review because a check is invalid, a native harness result is disputed, or
+a mapping is wrong. Use exclusions when an authority or program explicitly
+removes a requirement, check, or result from the assessment scope. The core
+preserves these distinctions in findings, evidence review annotations,
+assessment packages, reports, and retained summaries, but default gate semantics
+still count the underlying finding as unexpected unless it is separately
+expected or waived.
+
 ## Python Runner Contract

 A Python runner receives one context object and returns one result object.
--- a/docs/schemas/assessment-package.schema.json
+++ b/docs/schemas/assessment-package.schema.json
@@ -17,6 +17,8 @@
    "evidence_refs",
    "artifact_manifest",
    "waivers",
+    "challenges",
+    "exclusions",
    "certification_boundary",
    "created_at"
  ],
@@ -34,6 +36,8 @@
    "evidence_refs": { "type": "array", "items": { "type": "string" } },
    "artifact_manifest": { "type": "array", "items": { "type": "object" } },
    "waivers": { "type": "array", "items": { "type": "object" } },
+    "challenges": { "type": "array", "items": { "type": "object" } },
+    "exclusions": { "type": "array", "items": { "type": "object" } },
    "certification_boundary": { "type": "string" },
    "created_at": { "type": "string" }
  }
--- a/docs/schemas/assessment-profile.schema.json
+++ b/docs/schemas/assessment-profile.schema.json
@@ -28,6 +28,8 @@
    },
    "expectations_ref": { "type": ["string", "null"] },
    "waivers_ref": { "type": ["string", "null"] },
+    "challenges_ref": { "type": ["string", "null"] },
+    "exclusions_ref": { "type": ["string", "null"] },
    "output_policy": { "type": "object" },
    "retention_policy": { "type": "object" },
    "runtime_policy": { "type": "object" }
--- a/docs/schemas/challenge-set.schema.json
+++ b/docs/schemas/challenge-set.schema.json
@@ -0,0 +1,56 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "title": "Guide Board Challenge Set",
+  "type": "object",
+  "additionalProperties": false,
+  "required": [
+    "id",
+    "target_profile_ref",
+    "challenges"
+  ],
+  "properties": {
+    "id": { "type": "string" },
+    "target_profile_ref": { "type": "string" },
+    "challenges": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "additionalProperties": false,
+        "required": [
+          "id",
+          "requirement_refs",
+          "check_refs",
+          "evidence_refs",
+          "result_refs",
+          "classification_refs",
+          "authority_source_refs",
+          "owner",
+          "review_status",
+          "rationale",
+          "created_at",
+          "review_due_at",
+          "expires_at",
+          "native_challenge_id",
+          "metadata"
+        ],
+        "properties": {
+          "id": { "type": "string" },
+          "requirement_refs": { "type": "array", "items": { "type": "string" } },
+          "check_refs": { "type": "array", "items": { "type": "string" } },
+          "evidence_refs": { "type": "array", "items": { "type": "string" } },
+          "result_refs": { "type": "array", "items": { "type": "string" } },
+          "classification_refs": { "type": "array", "items": { "type": "string" } },
+          "authority_source_refs": { "type": "array", "items": { "type": "string" } },
+          "owner": { "type": "string" },
+          "review_status": { "type": "string" },
+          "rationale": { "type": "string" },
+          "created_at": { "type": "string" },
+          "review_due_at": { "type": ["string", "null"] },
+          "expires_at": { "type": ["string", "null"] },
+          "native_challenge_id": { "type": ["string", "null"] },
+          "metadata": { "type": "object" }
+        }
+      }
+    }
+  }
+}
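
As a rough illustration of the record shape this schema requires, the sketch below builds a minimal challenge set and checks it for missing required keys. Only the key names come from `challenge-set.schema.json`; every value (IDs, refs, owner, status, dates) is hypothetical, and `missing_required` is a simplified stand-in for real JSON Schema validation that ignores types and `additionalProperties`.

```python
# Required keys copied from challenge-set.schema.json in this commit.
SET_REQUIRED = ["id", "target_profile_ref", "challenges"]
RECORD_REQUIRED = [
    "id", "requirement_refs", "check_refs", "evidence_refs", "result_refs",
    "classification_refs", "authority_source_refs", "owner", "review_status",
    "rationale", "created_at", "review_due_at", "expires_at",
    "native_challenge_id", "metadata",
]

# Hypothetical example values; only the key names come from the schema.
# How empty ref lists behave during matching is not shown in this diff.
challenge_set = {
    "id": "example-challenges",
    "target_profile_ref": "example-target",
    "challenges": [
        {
            "id": "challenge-example-1",
            "requirement_refs": ["example.requirement"],
            "check_refs": [],
            "evidence_refs": [],
            "result_refs": ["blocked"],
            "classification_refs": [],
            "authority_source_refs": [],
            "owner": "qa",
            "review_status": "open",  # assumed status value
            "rationale": "Native harness result is disputed.",
            "created_at": "2026-05-16",
            "review_due_at": None,
            "expires_at": None,
            "native_challenge_id": None,
            "metadata": {},
        }
    ],
}


def missing_required(document: dict) -> list[str]:
    """Return paths of required keys that are absent from the document."""
    missing = [key for key in SET_REQUIRED if key not in document]
    for index, record in enumerate(document.get("challenges", [])):
        missing += [
            f"challenges[{index}].{key}"
            for key in RECORD_REQUIRED
            if key not in record
        ]
    return missing


print(missing_required(challenge_set))
```

Because every record field is listed under `required`, nullable fields such as `review_due_at`, `expires_at`, and `native_challenge_id` still have to be present, with `null` when unused.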
--- a/docs/schemas/evidence-item.schema.json
+++ b/docs/schemas/evidence-item.schema.json
@@ -42,6 +42,7 @@
    },
    "observations": { "type": "array", "items": { "type": "string" } },
    "facts": { "type": "object" },
+    "review": { "type": "object" },
    "requirement_refs": { "type": "array", "items": { "type": "string" } },
    "artifact_refs": { "type": "array", "items": { "type": "string" } },
    "started_at": { "type": "string" },
--- a/docs/schemas/exclusion-set.schema.json
+++ b/docs/schemas/exclusion-set.schema.json
@@ -0,0 +1,60 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "title": "Guide Board Authority Exclusion Set",
+  "type": "object",
+  "additionalProperties": false,
+  "required": [
+    "id",
+    "target_profile_ref",
+    "exclusions"
+  ],
+  "properties": {
+    "id": { "type": "string" },
+    "target_profile_ref": { "type": "string" },
+    "exclusions": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "additionalProperties": false,
+        "required": [
+          "id",
+          "authority_ref",
+          "requirement_refs",
+          "check_refs",
+          "evidence_refs",
+          "result_refs",
+          "classification_refs",
+          "authority_source_refs",
+          "owner",
+          "approved_by",
+          "review_status",
+          "rationale",
+          "created_at",
+          "review_due_at",
+          "expires_at",
+          "native_exclusion_id",
+          "metadata"
+        ],
+        "properties": {
+          "id": { "type": "string" },
+          "authority_ref": { "type": "string" },
+          "requirement_refs": { "type": "array", "items": { "type": "string" } },
+          "check_refs": { "type": "array", "items": { "type": "string" } },
+          "evidence_refs": { "type": "array", "items": { "type": "string" } },
+          "result_refs": { "type": "array", "items": { "type": "string" } },
+          "classification_refs": { "type": "array", "items": { "type": "string" } },
+          "authority_source_refs": { "type": "array", "items": { "type": "string" } },
+          "owner": { "type": "string" },
+          "approved_by": { "type": ["string", "null"] },
+          "review_status": { "type": "string" },
+          "rationale": { "type": "string" },
+          "created_at": { "type": "string" },
+          "review_due_at": { "type": ["string", "null"] },
+          "expires_at": { "type": ["string", "null"] },
+          "native_exclusion_id": { "type": ["string", "null"] },
+          "metadata": { "type": "object" }
+        }
+      }
+    }
+  }
+}
--- a/docs/schemas/finding.schema.json
+++ b/docs/schemas/finding.schema.json
@@ -14,7 +14,10 @@
    "evidence_refs",
    "expected",
    "waiver_ref",
+    "challenge_ref",
+    "exclusion_ref",
    "policy_ref",
+    "review_status",
    "remediation"
  ],
  "properties": {
@@ -28,7 +31,10 @@
    "evidence_refs": { "type": "array", "items": { "type": "string" } },
    "expected": { "type": "boolean" },
    "waiver_ref": { "type": ["string", "null"] },
+    "challenge_ref": { "type": ["string", "null"] },
+    "exclusion_ref": { "type": ["string", "null"] },
    "policy_ref": { "type": ["string", "null"] },
+    "review_status": { "type": "string" },
    "remediation": { "type": ["string", "null"] }
  }
 }
--- a/src/guide_board/execution.py
+++ b/src/guide_board/execution.py
@@ -35,7 +35,15 @@ def run_assessment(
        assert_valid(item, "evidence-item")

    findings = _findings_for_evidence(run_id, evidence)
-    findings, policy_summary, applied_waivers = apply_policy(root, plan, findings)
+    (
+        findings,
+        policy_summary,
+        applied_waivers,
+        applied_challenges,
+        applied_exclusions,
+    ) = apply_policy(root, plan, evidence, findings)
+    for item in evidence:
+        assert_valid(item, "evidence-item")
    for finding in findings:
        assert_valid(finding, "finding")

@@ -52,6 +60,8 @@ def run_assessment(
        mapping_summary,
        policy_summary,
        applied_waivers,
+        applied_challenges,
+        applied_exclusions,
        created_at,
    )
    assert_valid(assessment_package, "assessment-package")
@@ -308,6 +318,7 @@ def _findings_for_evidence(run_id: str, evidence: list[dict[str, Any]]) -> list[
    for item in evidence:
        if item["result"] not in {"blocked", "fail", "infrastructure_error"}:
            continue
+        expected = _expected_for_item(item)
        findings.append(
            {
                "id": f"finding:{item['check_id']}",
@@ -318,9 +329,12 @@ def _findings_for_evidence(run_id: str, evidence: list[dict[str, Any]]) -> list[
                "classification": _classification_for_item(item),
                "requirement_refs": item["requirement_refs"],
                "evidence_refs": [item["id"]],
-                "expected": _expected_for_item(item),
+                "expected": expected,
                "waiver_ref": None,
+                "challenge_ref": None,
+                "exclusion_ref": None,
                "policy_ref": None,
+                "review_status": "expected" if expected else "unresolved_defect",
                "remediation": _remediation_for_item(item),
            }
        )
@@ -382,6 +396,8 @@ def _assessment_package(
    mapping_summary: dict[str, Any],
    policy_summary: dict[str, Any],
    applied_waivers: list[dict[str, Any]],
+    applied_challenges: list[dict[str, Any]],
+    applied_exclusions: list[dict[str, Any]],
    created_at: str,
 ) -> dict[str, Any]:
    summary = dict(Counter(item["result"] for item in evidence))
@@ -401,6 +417,8 @@ def _assessment_package(
        "evidence_refs": [item["id"] for item in evidence],
        "artifact_manifest": artifact_manifest,
        "waivers": applied_waivers,
+        "challenges": applied_challenges,
+        "exclusions": applied_exclusions,
        "certification_boundary": "Guide Board produces preparation evidence only and does not issue certifications or audit assurance.",
        "created_at": created_at,
    }
@@ -452,6 +470,7 @@ def _markdown_report(run_metadata: dict[str, Any], package: dict[str, Any]) -> s
        summary_lines = "- no evidence produced"
    mapping_lines = _mapping_summary_lines(package)
    policy_lines = _policy_summary_lines(package)
+    review_lines = _review_summary_lines(package)

    return "\n".join(
        [
@@ -473,6 +492,10 @@ def _markdown_report(run_metadata: dict[str, Any], package: dict[str, Any]) -> s
            "",
            policy_lines,
            "",
+            "## Review",
+            "",
+            review_lines,
+            "",
            "## Boundary",
            "",
            package["certification_boundary"],
@@ -502,10 +525,27 @@ def _policy_summary_lines(package: dict[str, Any]) -> str:
            f"- applied expectations: {summary.get('applied_expectations', 0)}",
            f"- applied waivers: {summary.get('applied_waivers', 0)}",
            f"- unexpected findings: {summary.get('unexpected_findings', 0)}",
+            f"- challenged findings: {summary.get('challenged_findings', 0)}",
+            f"- authority exclusions: {summary.get('authority_exclusions', 0)}",
+            f"- unresolved defects: {summary.get('unresolved_defects', 0)}",
        ]
    )


+def _review_summary_lines(package: dict[str, Any]) -> str:
+    findings = package.get("findings", [])
+    if not findings:
+        return "- no findings requiring review"
+    counts = Counter(
+        finding.get("review_status", "unreviewed")
+        for finding in findings
+        if isinstance(finding, dict)
+    )
+    return "\n".join(
+        f"- {status}: {count}" for status, count in sorted(counts.items())
+    )
+
+
 def _run_status(evidence: list[dict[str, Any]]) -> str:
    if any(item["result"] == "fail" for item in evidence):
        return "failed"
--- a/src/guide_board/planning.py
+++ b/src/guide_board/planning.py
@@ -262,6 +262,18 @@ def _build_source_lock(
                assessment.get("waivers_ref"),
                "waiver-set",
            ),
+            "challenges": _optional_policy_source_record(
+                root,
+                assessment_path,
+                assessment.get("challenges_ref"),
+                "challenge-set",
+            ),
+            "exclusions": _optional_policy_source_record(
+                root,
+                assessment_path,
+                assessment.get("exclusions_ref"),
+                "exclusion-set",
+            ),
        },
        "authorities": _authority_source_records(extensions),
        "metadata_hooks": {
--- a/src/guide_board/policy.py
+++ b/src/guide_board/policy.py
@@ -13,20 +13,36 @@ from guide_board.schema import assert_valid
 def apply_policy(
    root: Path,
    plan: dict[str, Any],
+    evidence: list[dict[str, Any]],
    findings: list[dict[str, Any]],
-) -> tuple[list[dict[str, Any]], dict[str, Any], list[dict[str, Any]]]:
+) -> tuple[
+    list[dict[str, Any]],
+    dict[str, Any],
+    list[dict[str, Any]],
+    list[dict[str, Any]],
+    list[dict[str, Any]],
+]:
    expectations = _load_optional_set(root, plan, "expectations_ref", "expectation-set")
    waiver_set = _load_optional_set(root, plan, "waivers_ref", "waiver-set")
+    challenge_set = _load_optional_set(root, plan, "challenges_ref", "challenge-set")
+    exclusion_set = _load_optional_set(root, plan, "exclusions_ref", "exclusion-set")
    waivers = waiver_set.get("waivers", []) if waiver_set else []
+    challenges = challenge_set.get("challenges", []) if challenge_set else []
+    exclusions = exclusion_set.get("exclusions", []) if exclusion_set else []

    applied_expectations = 0
    applied_waivers: list[dict[str, Any]] = []
+    applied_challenges: list[dict[str, Any]] = []
+    applied_exclusions: list[dict[str, Any]] = []
+    evidence_by_id = {item["id"]: item for item in evidence}

    for finding in findings:
        for expectation in expectations.get("expectations", []) if expectations else []:
            if _matches_rule(finding, expectation):
                finding["expected"] = expectation["expected"]
                finding["policy_ref"] = expectation["id"]
+                finding["review_status"] = "expected" if expectation["expected"] else "unresolved_defect"
+                _annotate_evidence(evidence_by_id, finding, "expectation_refs", expectation["id"])
                applied_expectations += 1
                break

@@ -37,20 +53,60 @@ def apply_policy(
                finding["waiver_ref"] = waiver["id"]
                finding["expected"] = True
                finding["policy_ref"] = waiver["id"]
+                finding["review_status"] = "waived"
                finding["remediation"] = f"Waived: {waiver['reason']}"
                applied_waivers.append(waiver)
+                _annotate_evidence(evidence_by_id, finding, "waiver_refs", waiver["id"])
+                break
+
+        for exclusion in exclusions:
+            if not _review_record_active(exclusion):
+                continue
+            if _matches_rule(finding, exclusion):
+                finding["exclusion_ref"] = exclusion["id"]
+                if finding.get("review_status") == "unresolved_defect":
+                    finding["review_status"] = "authority_excluded"
+                applied_exclusions.append(exclusion)
+                _annotate_evidence(evidence_by_id, finding, "exclusion_refs", exclusion["id"])
+                break
+
+        for challenge in challenges:
+            if not _review_record_active(challenge):
+                continue
+            if _matches_rule(finding, challenge):
+                finding["challenge_ref"] = challenge["id"]
+                if finding.get("review_status") == "unresolved_defect":
+                    finding["review_status"] = "challenged"
+                applied_challenges.append(challenge)
+                _annotate_evidence(evidence_by_id, finding, "challenge_refs", challenge["id"])
                break

    policy_summary = {
        "expectations_ref": plan["assessment_profile_snapshot"].get("expectations_ref"),
        "waivers_ref": plan["assessment_profile_snapshot"].get("waivers_ref"),
+        "challenges_ref": plan["assessment_profile_snapshot"].get("challenges_ref"),
+        "exclusions_ref": plan["assessment_profile_snapshot"].get("exclusions_ref"),
        "applied_expectations": applied_expectations,
        "applied_waivers": len(applied_waivers),
+        "challenged_findings": _unique_applied_count(findings, "challenge_ref"),
+        "authority_exclusions": _unique_applied_count(findings, "exclusion_ref"),
        "unexpected_findings": sum(
            1 for finding in findings if not finding.get("expected") and not finding.get("waiver_ref")
        ),
+        "unresolved_defects": sum(
+            1 for finding in findings if finding.get("review_status") == "unresolved_defect"
+        ),
+        "unresolved_review_items": sum(
+            1 for finding in findings if finding.get("review_status") in {"challenged", "authority_excluded"}
+        ),
    }
-    return findings, policy_summary, applied_waivers
+    return (
+        findings,
+        policy_summary,
+        _dedupe_records(applied_waivers),
+        _dedupe_records(applied_challenges),
+        _dedupe_records(applied_exclusions),
+    )


 def _load_optional_set(
@@ -94,6 +150,7 @@ def _matches_rule(finding: dict[str, Any], rule: dict[str, Any]) -> bool:
    return (
        _matches_any(finding.get("requirement_refs", []), rule.get("requirement_refs", []))
        and _matches_any([finding.get("check_id", "")], rule.get("check_refs", []))
+        and _matches_any(finding.get("evidence_refs", []), rule.get("evidence_refs", []))
        and _matches_scalar(finding.get("status"), rule.get("result_refs", []))
        and _matches_scalar(finding.get("classification"), rule.get("classification_refs", []))
    )
@@ -122,3 +179,57 @@ def _waiver_active(waiver: dict[str, Any]) -> bool:
    except ValueError:
        return False
    return expiry >= date.today()
+
+
+def _review_record_active(record: dict[str, Any]) -> bool:
+    status = record.get("review_status")
+    if status in {"rejected", "withdrawn", "closed", "expired"}:
+        return False
+    expires_at = record.get("expires_at")
+    if not expires_at:
+        return True
+    try:
+        expiry = date.fromisoformat(expires_at)
+    except ValueError:
+        return False
+    return expiry >= date.today()
+
+
+def _annotate_evidence(
+    evidence_by_id: dict[str, dict[str, Any]],
+    finding: dict[str, Any],
+    ref_key: str,
+    ref_value: str,
+) -> None:
+    for evidence_ref in finding.get("evidence_refs", []):
+        item = evidence_by_id.get(evidence_ref)
+        if item is None:
+            continue
+        review = item.setdefault(
+            "review",
+            {
+                "expectation_refs": [],
+                "waiver_refs": [],
+                "challenge_refs": [],
+                "exclusion_refs": [],
+            },
+        )
+        refs = review.setdefault(ref_key, [])
+        if ref_value not in refs:
+            refs.append(ref_value)
+
+
+def _unique_applied_count(findings: list[dict[str, Any]], ref_name: str) -> int:
+    return sum(1 for finding in findings if finding.get(ref_name))
+
+
+def _dedupe_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
+    seen = set()
+    deduped = []
+    for record in records:
+        record_id = record.get("id")
+        if not isinstance(record_id, str) or record_id in seen:
+            continue
+        seen.add(record_id)
+        deduped.append(record)
+    return deduped
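
The ordering in `apply_policy` has a consequence that is easy to miss: exclusions are applied before challenges, and each only relabels a finding whose `review_status` is still `unresolved_defect`. The sketch below reproduces that behaviour in isolation. It is a simplified illustration, not the module's code: matching is reduced to a `check_id` comparison, `record_active` takes an injectable `today`, and the record IDs and the `approved` and `open` status values are assumptions.

```python
from datetime import date


def record_active(record: dict, today: date) -> bool:
    """Simplified copy of _review_record_active with an injectable date."""
    if record.get("review_status") in {"rejected", "withdrawn", "closed", "expired"}:
        return False
    expires_at = record.get("expires_at")
    if not expires_at:
        return True
    try:
        return date.fromisoformat(expires_at) >= today
    except ValueError:
        # Unparseable expiry dates deactivate the record.
        return False


def annotate(finding: dict, exclusions: list[dict], challenges: list[dict], today: date) -> dict:
    """Apply exclusions first, then challenges, in the same order as apply_policy."""
    for ref_key, records, status in (
        ("exclusion_ref", exclusions, "authority_excluded"),
        ("challenge_ref", challenges, "challenged"),
    ):
        for record in records:
            if not record_active(record, today):
                continue
            # Assumption: matching reduced to check_id for brevity.
            if finding["check_id"] in record["check_refs"]:
                finding[ref_key] = record["id"]
                # Only an unresolved defect is relabelled; expected or
                # waived findings keep their existing status.
                if finding["review_status"] == "unresolved_defect":
                    finding["review_status"] = status
                break
    return finding


finding = {
    "check_id": "review",
    "expected": False,
    "waiver_ref": None,
    "challenge_ref": None,
    "exclusion_ref": None,
    "review_status": "unresolved_defect",
}
exclusions = [
    {"id": "exclusion-1", "check_refs": ["review"], "review_status": "approved", "expires_at": None}
]
challenges = [
    {"id": "challenge-1", "check_refs": ["review"], "review_status": "open", "expires_at": "2099-01-01"}
]

annotate(finding, exclusions, challenges, today=date(2026, 5, 16))

# Same predicate as the unexpected_findings count in policy_summary.
unexpected = sum(1 for f in [finding] if not f.get("expected") and not f.get("waiver_ref"))
print(finding["review_status"], finding["challenge_ref"], unexpected)
```

When both record types match, the finding carries both refs but reports `authority_excluded`, and it still counts toward `unexpected_findings`. This is the combination the new test in `tests/test_core.py` asserts.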
--- a/src/guide_board/retention.py
+++ b/src/guide_board/retention.py
@@ -37,6 +37,10 @@ def build_retention_summary(
            "unexpected_findings": policy_summary.get("unexpected_findings", 0),
            "expected_findings": sum(1 for finding in findings if finding.get("expected")),
            "waived_findings": sum(1 for finding in findings if finding.get("waiver_ref")),
+            "challenged_findings": policy_summary.get("challenged_findings", 0),
+            "authority_exclusions": policy_summary.get("authority_exclusions", 0),
+            "unresolved_defects": policy_summary.get("unresolved_defects", 0),
+            "unresolved_review_items": policy_summary.get("unresolved_review_items", 0),
            "mapping_target_count": len(
                assessment_package.get("mapping_summary", {}).get("targets", [])
            ),
@@ -197,6 +201,10 @@ def _run_projection(run: dict[str, Any]) -> dict[str, Any]:
        "unexpected_findings": _summary_int(summary, "unexpected_findings"),
        "finding_count": _summary_int(summary, "finding_count"),
        "artifact_count": _summary_int(summary, "artifact_count"),
+        "challenged_findings": _summary_int(summary, "challenged_findings"),
+        "authority_exclusions": _summary_int(summary, "authority_exclusions"),
+        "unresolved_defects": _summary_int(summary, "unresolved_defects"),
+        "unresolved_review_items": _summary_int(summary, "unresolved_review_items"),
        "run_dir": run.get("run_dir"),
    }

@@ -211,9 +219,10 @@ def _trend_between(
            "status_changed": False,
            "unexpected_findings_delta": 0,
            "finding_count_delta": 0,
             "artifact_count_delta": 0,
+            "unresolved_review_items_delta": 0,
             "evidence_result_deltas": {},
         }

    previous_summary = previous.get("summary", {})
    latest_summary = latest.get("summary", {})
@@ -230,6 +239,9 @@ def _trend_between(
    artifact_delta = _summary_int(latest_summary, "artifact_count") - _summary_int(
        previous_summary, "artifact_count"
    )
+    review_delta = _summary_int(latest_summary, "unresolved_review_items") - _summary_int(
+        previous_summary, "unresolved_review_items"
+    )
    previous_status = _status_for(previous)
    latest_status = _status_for(latest)

@@ -239,6 +251,7 @@ def _trend_between(
        "unexpected_findings_delta": unexpected_delta,
        "finding_count_delta": finding_delta,
        "artifact_count_delta": artifact_delta,
+        "unresolved_review_items_delta": review_delta,
        "evidence_result_deltas": evidence_deltas,
    }

--- a/tests/test_core.py
+++ b/tests/test_core.py
@@ -334,6 +334,69 @@ class CoreArchitectureTests(unittest.TestCase):
            self.assertEqual(len(mappings), 1)
            self.assertEqual(mappings[0]["target_id"], "profile-readiness")

+    def test_applies_challenges_and_exclusions_without_hiding_gate_failures(self) -> None:
+        with TemporaryDirectory() as temporary_directory:
+            temp_root = Path(temporary_directory)
+            extension_dir = temp_root / "review-noop"
+            _write_review_extension(extension_dir)
+            target_path = temp_root / "review-target.json"
+            assessment_path = temp_root / "review-assessment.json"
+            challenge_path = temp_root / "review-challenges.json"
+            exclusion_path = temp_root / "review-exclusions.json"
+            _write_review_target(target_path)
+            _write_review_assessment(assessment_path)
+            _write_review_challenges(challenge_path)
+            _write_review_exclusions(exclusion_path)
+
+            result = run_assessment(
+                ROOT,
+                target_path,
+                assessment_path,
+                temp_root / "runs" / "review",
+                [extension_dir],
+            )
+            run_dir = Path(result["run_dir"])
+            evidence = json.loads(
+                (run_dir / "normalized" / "evidence.json").read_text(encoding="utf-8")
+            )["evidence"]
+            assessment_package = json.loads(
+                (run_dir / "reports" / "assessment-package.json").read_text(encoding="utf-8")
+            )
+            retention = json.loads(
+                (run_dir / "retention-summary.json").read_text(encoding="utf-8")
+            )
+            report = (run_dir / "reports" / "report.md").read_text(encoding="utf-8")
+
+            self.assertEqual(result["status"], "blocked")
+            finding = assessment_package["findings"][0]
+            self.assertEqual(finding["challenge_ref"], "challenge-review-blocked")
+            self.assertEqual(finding["exclusion_ref"], "exclusion-review-blocked")
+            self.assertEqual(finding["review_status"], "authority_excluded")
+            self.assertFalse(finding["expected"])
+            self.assertEqual(assessment_package["policy_summary"]["unexpected_findings"], 1)
+            self.assertEqual(assessment_package["policy_summary"]["challenged_findings"], 1)
+            self.assertEqual(assessment_package["policy_summary"]["authority_exclusions"], 1)
+            self.assertEqual(assessment_package["policy_summary"]["unresolved_defects"], 0)
+            self.assertEqual(
+                evidence[1]["review"]["challenge_refs"],
+                ["challenge-review-blocked"],
+            )
+            self.assertEqual(
+                evidence[1]["review"]["exclusion_refs"],
+                ["exclusion-review-blocked"],
+            )
+            self.assertEqual(assessment_package["challenges"][0]["owner"], "qa")
+            self.assertEqual(assessment_package["exclusions"][0]["authority_ref"], "review-authority")
+            self.assertEqual(retention["summary"]["challenged_findings"], 1)
+            self.assertEqual(retention["summary"]["authority_exclusions"], 1)
+            self.assertEqual(retention["summary"]["unresolved_review_items"], 1)
+            self.assertIn("- authority_excluded: 1", report)
+
+            gate = evaluate_trend_gates(build_trend_summary(temp_root / "runs"))
+            self.assertEqual(gate["status"], "failed")
+            checks = {check["id"]: check for check in gate["groups"][0]["checks"]}
+            self.assertEqual(checks["unexpected-findings"]["observed"], 1)
+
    def test_serves_local_api_run_lifecycle(self) -> None:
        with TemporaryDirectory() as temporary_directory:
            service = start_service(ROOT, host="127.0.0.1", port=0)
@@ -742,5 +805,166 @@ def _write_schema_assessment(path: Path, runtime_policy: dict[str, object]) -> N
    )


+def _write_review_extension(extension_dir: Path) -> None:
+    extension_dir.mkdir(parents=True, exist_ok=True)
+    (extension_dir / "extension.json").write_text(
+        json.dumps(
+            {
+                "id": "review-noop",
+                "name": "Review No-op",
+                "version": "0.1.0",
+                "extension_type": "repository_quality",
+                "lifecycle_status": "incubating",
+                "supported_frameworks": ["review.framework.v1"],
+                "authorities": ["review-authority"],
+                "profile_schemas": ["target-profile", "assessment-profile"],
+                "check_groups": [
+                    {
+                        "id": "review",
+                        "name": "Review",
+                        "check_type": "repository_quality",
+                        "requirement_refs": ["review.requirement"],
+                        "runner_ref": "external-review",
+                    }
+                ],
+                "preflight_runner": None,
+                "runner_entrypoints": [
+                    {
+                        "id": "external-review",
+                        "kind": "external",
+                        "module_path": None,
+                        "callable": None,
+                        "command": None,
+                        "metadata": {"test_suite_id": "review-suite"},
+                        "description": "External runner used to produce reviewable blocked evidence.",
+                    }
+                ],
+                "normalizers": [],
+                "mappings": [],
+                "report_fragments": [],
+                "dependencies": [],
+                "restricted_assets": [],
+                "certification_boundary": "Review fixture only.",
+            }
+        ),
+        encoding="utf-8",
+    )
+
+
+def _write_review_target(path: Path) -> None:
+    path.write_text(
+        json.dumps(
+            {
+                "id": "review-target",
+                "subject_type": "repository",
+                "subject_name": "Review Target",
+                "environment": "test",
+                "scope": ["review"],
+                "endpoints": [],
+                "artifacts": [],
+                "credentials_ref": None,
+                "declared_capabilities": [],
+                "known_gaps": [],
+            }
+        ),
+        encoding="utf-8",
+    )
+
+
+def _write_review_assessment(path: Path) -> None:
+    path.write_text(
+        json.dumps(
+            {
+                "id": "review-assessment",
+                "framework_refs": ["review.framework.v1"],
+                "extension_refs": ["review-noop"],
+                "target_profile_ref": "review-target",
+                "selected_check_groups": {"review-noop": ["review"]},
+                "expectations_ref": None,
+                "waivers_ref": None,
+                "challenges_ref": "review-challenges.json",
+                "exclusions_ref": "review-exclusions.json",
+                "output_policy": {
+                    "report_formats": ["json", "markdown"],
+                    "artifact_retention": "summary-only",
+                },
+                "retention_policy": {
+                    "summary_days": 365,
+                    "raw_artifact_days": 0,
+                },
+                "runtime_policy": {
+                    "offline": True,
+                    "timeout_seconds": 2,
+                },
+            }
+        ),
+        encoding="utf-8",
+    )
+
+
+def _write_review_challenges(path: Path) -> None:
+    path.write_text(
+        json.dumps(
+            {
+                "id": "review-challenges",
+                "target_profile_ref": "review-target",
+                "challenges": [
+                    {
+                        "id": "challenge-review-blocked",
+                        "requirement_refs": ["review.requirement"],
+                        "check_refs": ["check-group:review-noop:review"],
+                        "evidence_refs": [],
+                        "result_refs": ["blocked"],
+                        "classification_refs": ["runner_not_implemented"],
+                        "authority_source_refs": ["review-authority:rule-1"],
+                        "owner": "qa",
+                        "review_status": "open",
+                        "rationale": "The external suite is not wired in this fixture.",
+                        "created_at": "2026-05-16",
+                        "review_due_at": "2026-06-16",
+                        "expires_at": None,
+                        "native_challenge_id": "native-challenge-1",
+                        "metadata": {"kind": "fixture"},
+                    }
+                ],
+            }
+        ),
+        encoding="utf-8",
+    )
+
+
+def _write_review_exclusions(path: Path) -> None:
+    path.write_text(
+        json.dumps(
+            {
+                "id": "review-exclusions",
+                "target_profile_ref": "review-target",
+                "exclusions": [
+                    {
+                        "id": "exclusion-review-blocked",
+                        "authority_ref": "review-authority",
+                        "requirement_refs": ["review.requirement"],
+                        "check_refs": ["check-group:review-noop:review"],
+                        "evidence_refs": [],
+                        "result_refs": ["blocked"],
+                        "classification_refs": ["runner_not_implemented"],
+                        "authority_source_refs": ["review-authority:rule-1"],
+                        "owner": "qa",
+                        "approved_by": "authority-reviewer",
+                        "review_status": "approved",
+                        "rationale": "Fixture demonstrates authority exclusion annotation.",
+                        "created_at": "2026-05-16",
+                        "review_due_at": "2026-06-16",
+                        "expires_at": None,
+                        "native_exclusion_id": "native-exclusion-1",
+                        "metadata": {"kind": "fixture"},
+                    }
+                ],
+            }
+        ),
+        encoding="utf-8",
+    )
+
+
 if __name__ == "__main__":
    unittest.main()
--- a/workplans/GUIDE-BOARD-WP-0005-challenge-and-exclusion-handling.md
+++ b/workplans/GUIDE-BOARD-WP-0005-challenge-and-exclusion-handling.md
@@ -4,12 +4,12 @@ type: workplan
 title: "Challenge And Exclusion Handling"
 repo: guide-board
 domain: markitect
-status: active
+status: completed
 owner: codex
 planning_priority: high
 planning_order: 5
 created: "2026-05-15"
-updated: "2026-05-15"
+updated: "2026-05-16"
 state_hub_workstream_id: "fb11e1c7-6c0c-4ec7-a163-da98b2fe9f8f"
 ---

@@ -42,7 +42,7 @@ but the core should preserve them without embedding domain policy.

 ```task
 id: GUIDE-BOARD-WP-0005-T001
-status: todo
+status: done
 priority: high
 state_hub_task_id: "6ff4e6f7-bce6-4e7f-a5af-e0c67cfa7e55"
 ```
@@ -57,11 +57,21 @@ Acceptance:
 - Keep the data contract usable by executable harnesses, hosted suites, and
  procedural packs.

+Progress:
+
+- Added `docs/schemas/challenge-set.schema.json` and
+  `docs/schemas/exclusion-set.schema.json`.
+- Added optional `challenges_ref` and `exclusions_ref` assessment profile
+  fields.
+- Supported requirement, check, evidence, result, classification, authority
+  source, owner, review status, rationale, review date, expiry, native ID, and
+  metadata fields.
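
As a rough illustration of how the selector fields listed above could be used, the sketch below matches a challenge or exclusion record against a finding. The finding field names (`requirement_ref`, `check_ref`, `result`, `classification`), the `SELECTORS` mapping, and the rule that an empty selector list places no constraint are all assumptions made for this example; the commit does not confirm the policy layer's actual matching logic.

```python
from typing import Any

# Hypothetical mapping from record selector lists to finding fields.
# These finding field names are assumptions, not taken from the schemas.
SELECTORS = {
    "requirement_refs": "requirement_ref",
    "check_refs": "check_ref",
    "result_refs": "result",
    "classification_refs": "classification",
}


def record_matches(record: dict[str, Any], finding: dict[str, Any]) -> bool:
    """Return True when every non-empty selector list contains the finding's value."""
    for selector, field in SELECTORS.items():
        allowed = record.get(selector) or []
        # Assumption: an empty selector list places no constraint on the finding.
        if allowed and finding.get(field) not in allowed:
            return False
    return True
```

Under these assumptions, a record shaped like the `challenge-review-blocked` fixture would match a blocked `runner_not_implemented` finding for `review.requirement` and would not match a finding with a different result.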
+
 ## D5.2 - Policy Application And Finding Annotation

 ```task
 id: GUIDE-BOARD-WP-0005-T002
-status: todo
+status: done
 priority: high
 state_hub_task_id: "fd384bd3-40c4-4344-8b7d-cb123dbf2cac"
 ```
@@ -76,11 +86,20 @@ Acceptance:
 - Add tests that prove challenge and exclusion records affect reporting without
  corrupting gate semantics.

+Progress:
+
+- Loaded challenge and exclusion refs through the policy layer.
+- Annotated findings with challenge refs, exclusion refs, and review status.
+- Annotated matching evidence with review refs.
+- Kept challenged and excluded findings counted by the default
+  `unexpected_findings` gate unless a finding is separately expected or waived.
+- Added tests proving challenged and excluded findings remain gate-visible.
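
The gate-visibility rule in these progress notes can be sketched as a small counting function. This is a minimal sketch, not the project's policy code: `summarize_policy` is a hypothetical helper, the `waived` key is an assumed field name, and only `challenge_ref`, `exclusion_ref`, and `expected` appear in the test fixture in this commit.

```python
from typing import Any


def summarize_policy(findings: list[dict[str, Any]]) -> dict[str, int]:
    """Count review annotations without hiding findings from the gate (hypothetical helper)."""
    summary = {
        "unexpected_findings": 0,
        "challenged_findings": 0,
        "authority_exclusions": 0,
    }
    for finding in findings:
        if finding.get("challenge_ref"):
            summary["challenged_findings"] += 1
        if finding.get("exclusion_ref"):
            summary["authority_exclusions"] += 1
        # Only an expectation or a waiver removes a finding from the gate count;
        # a challenge or exclusion is annotation only. "waived" is an assumed key.
        if not finding.get("expected") and not finding.get("waived"):
            summary["unexpected_findings"] += 1
    return summary
```

With a single finding that carries both a challenge and an exclusion but is neither expected nor waived, all three counters come out as 1, which mirrors the `policy_summary` assertions in the fixture test.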
+
 ## D5.3 - Report Visibility And Review Workflow

 ```task
 id: GUIDE-BOARD-WP-0005-T003
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "791071c0-8a9a-462b-83b3-75548bb8524f"
 ```
@@ -94,11 +113,19 @@ Acceptance:
  run.
 - Document how an operator should treat challenged or excluded findings.

+Progress:
+
+- Added Markdown report review summaries.
+- Added challenge, exclusion, unresolved defect, and unresolved review counts to
+  retention summaries and trend projections.
+- Included applied challenge and exclusion records in JSON assessment packages.
+- Exposed review counts through existing retained run helpers.
+
 ## D5.4 - Tests And Documentation

 ```task
 id: GUIDE-BOARD-WP-0005-T004
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "43b966da-af8d-479b-93bd-6b6741fdab37"
 ```
@@ -111,6 +138,14 @@ Acceptance:
 - Update assessment operations, extension SDK, and compliance evidence pack docs.
 - Keep certification boundary language explicit.

+Progress:
+
+- Added focused schema and policy tests through a fixture extension scenario.
+- Updated assessment operations, extension SDK, compliance evidence pack, and
+  architecture docs.
+- Kept boundary language explicit: challenges and exclusions are review state,
+  not certification conclusions.
+
 ## Definition Of Done

 - The core has separate, tested concepts for expectations, waivers, challenges,