diff --git a/docs/ARCHITECTURE-BLUEPRINT.md b/docs/ARCHITECTURE-BLUEPRINT.md index 5776d38..0934989 100644 --- a/docs/ARCHITECTURE-BLUEPRINT.md +++ b/docs/ARCHITECTURE-BLUEPRINT.md @@ -803,6 +803,9 @@ Use separate concepts: - defect: unexpected product or process failure. The report must make these visible separately. +The current policy layer loads challenge and exclusion refs from assessment +profiles, annotates findings and evidence, and keeps `unexpected_findings` +visible for gate semantics unless a finding is separately expected or waived. ### Source Locking diff --git a/docs/ASSESSMENT-OPERATIONS.md b/docs/ASSESSMENT-OPERATIONS.md index c5c1a52..4a7ef6b 100644 --- a/docs/ASSESSMENT-OPERATIONS.md +++ b/docs/ASSESSMENT-OPERATIONS.md @@ -27,7 +27,8 @@ Every run needs: The target profile describes the candidate system or artifact being assessed. The assessment profile selects frameworks, extensions, check groups, runtime -policy, waivers, expectations, and output policy. +policy, expectations, waivers, challenges, authority exclusions, and output +policy. ## CLI Flow @@ -99,10 +100,10 @@ artifacts/ ``` `sources.lock.json` records the framework refs, extension versions, mapping -sets, profile snapshots, policy refs, authority refs, and extension metadata -hooks used for the run. `reports/submission-package.json` points at the -reviewable package files, includes checksums where files exist, carries the raw -artifact manifest, and repeats the certification boundary. It is a portable +sets, profile snapshots, policy and review refs, authority refs, and extension +metadata hooks used for the run. `reports/submission-package.json` points at +the reviewable package files, includes checksums where files exist, carries the +raw artifact manifest, and repeats the certification boundary. It is a portable handoff manifest for preparation evidence, not an authority-specific final submission. 
@@ -200,6 +201,23 @@ Individual evidence items use: - `expected_gap` - `infrastructure_error` +## Review State + +Assessment profiles may reference: + +- `expectations_ref`: known target posture, optional scope, or accepted gaps, +- `waivers_ref`: approved, time-bounded exceptions, +- `challenges_ref`: review claims that a finding, check, mapping, or native + result should be challenged, +- `exclusions_ref`: authority or program exclusions that apply to selected + findings. + +Challenges and exclusions annotate findings and evidence. They do not silently +turn failures into passing evidence and they do not reduce the +`unexpected_findings` count used by default gates. Retained summaries expose +separate counts for expected findings, waived findings, challenged findings, +authority exclusions, unresolved defects, and unresolved review items. + ## Candidate System Checklist Before starting a run against candidate software, confirm: diff --git a/docs/COMPLIANCE-EVIDENCE-PACKS.md b/docs/COMPLIANCE-EVIDENCE-PACKS.md index 23de1e4..110f2f8 100644 --- a/docs/COMPLIANCE-EVIDENCE-PACKS.md +++ b/docs/COMPLIANCE-EVIDENCE-PACKS.md @@ -8,8 +8,8 @@ Created: 2026-05-07 Compliance evidence packs cover frameworks where guide-board cannot rely on an official executable harness. They help prepare and perform assessments by organizing evidence requests, expected artifacts, reviewer workflow, waivers, -and run reports. They do not replace auditors, accredited certification bodies, -legal counsel, or official standard text. +challenges, authority exclusions, and run reports. They do not replace auditors, +accredited certification bodies, legal counsel, or official standard text. Examples include GDPR, SOC 2, HIPAA, NF Z 42-013, NF 461, ISO 14641, ISO 15489, and similar procedural or control-oriented frameworks. @@ -83,7 +83,7 @@ Each request should include: Requests should be phrased as collection guidance, not as legal conclusions. 
-## Waivers And Expected Gaps +## Review Policy Records Evidence packs use the same expectation and waiver model as executable extensions. @@ -103,6 +103,16 @@ Use waivers for: Every waiver should include owner, reason, approval status, and expiry. +Use challenges for disputed checks, disputed mappings, imported native result +questions, or evidence that needs a reviewer decision before it can be treated +as a defect. Use authority exclusions only when a program, standard, or +authorized reviewer excludes a requirement or check from the assessment scope. +Both records should cite stable requirement refs, check refs, evidence refs, or +authority source refs rather than reproducing restricted standard text. + +Challenges and exclusions make review state visible; they do not by themselves +claim compliance or remove default gate-visible unexpected findings. + ## Framework Notes GDPR packs should emphasize processing inventory, lawful basis records, data @@ -129,6 +139,7 @@ extensions: - normalized evidence, - findings, +- review annotations for expectations, waivers, challenges, and exclusions, - mapping records, - assessment packages, - retention summaries, diff --git a/docs/EXTENSION-SDK.md b/docs/EXTENSION-SDK.md index 9bcfb48..cf5738e 100644 --- a/docs/EXTENSION-SDK.md +++ b/docs/EXTENSION-SDK.md @@ -250,6 +250,33 @@ Expectation sets mark known posture as expected. Waiver sets mark approved, time-bounded exceptions. Both are applied after findings are generated, and the assessment package records policy summary counts. +## Challenges And Authority Exclusions + +Assessment profiles may also reference challenge and exclusion sets: + +```json +{ + "challenges_ref": "profiles/challenges/example.json", + "exclusions_ref": "profiles/exclusions/example.json" +} +``` + +Challenge sets validate against `docs/schemas/challenge-set.schema.json`. +Exclusion sets validate against `docs/schemas/exclusion-set.schema.json`. 
+Records can match findings by requirement refs, check refs, evidence refs, +result refs, or classification refs. They also carry owner, review status, +rationale, authority source refs, review dates, optional expiry, native IDs, +and free-form metadata. + +Use challenges when an extension author or assessment team believes a finding +needs review because a check is invalid, a native harness result is disputed, or +a mapping is wrong. Use exclusions when an authority or program explicitly +removes a requirement, check, or result from the assessment scope. The core +preserves these distinctions in findings, evidence review annotations, +assessment packages, reports, and retained summaries, but default gate semantics +still count the underlying finding as unexpected unless it is separately +expected or waived. + ## Python Runner Contract A Python runner receives one context object and returns one result object. diff --git a/docs/schemas/assessment-package.schema.json b/docs/schemas/assessment-package.schema.json index 96867af..54a7c91 100644 --- a/docs/schemas/assessment-package.schema.json +++ b/docs/schemas/assessment-package.schema.json @@ -17,6 +17,8 @@ "evidence_refs", "artifact_manifest", "waivers", + "challenges", + "exclusions", "certification_boundary", "created_at" ], @@ -34,6 +36,8 @@ "evidence_refs": { "type": "array", "items": { "type": "string" } }, "artifact_manifest": { "type": "array", "items": { "type": "object" } }, "waivers": { "type": "array", "items": { "type": "object" } }, + "challenges": { "type": "array", "items": { "type": "object" } }, + "exclusions": { "type": "array", "items": { "type": "object" } }, "certification_boundary": { "type": "string" }, "created_at": { "type": "string" } } diff --git a/docs/schemas/assessment-profile.schema.json b/docs/schemas/assessment-profile.schema.json index c481374..84a5a8e 100644 --- a/docs/schemas/assessment-profile.schema.json +++ b/docs/schemas/assessment-profile.schema.json @@ -28,6 +28,8 @@ }, 
"expectations_ref": { "type": ["string", "null"] }, "waivers_ref": { "type": ["string", "null"] }, + "challenges_ref": { "type": ["string", "null"] }, + "exclusions_ref": { "type": ["string", "null"] }, "output_policy": { "type": "object" }, "retention_policy": { "type": "object" }, "runtime_policy": { "type": "object" } diff --git a/docs/schemas/challenge-set.schema.json b/docs/schemas/challenge-set.schema.json new file mode 100644 index 0000000..1e916d6 --- /dev/null +++ b/docs/schemas/challenge-set.schema.json @@ -0,0 +1,56 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "title": "Guide Board Challenge Set", + "type": "object", + "additionalProperties": false, + "required": [ + "id", + "target_profile_ref", + "challenges" + ], + "properties": { + "id": { "type": "string" }, + "target_profile_ref": { "type": "string" }, + "challenges": { + "type": "array", + "items": { + "type": "object", + "additionalProperties": false, + "required": [ + "id", + "requirement_refs", + "check_refs", + "evidence_refs", + "result_refs", + "classification_refs", + "authority_source_refs", + "owner", + "review_status", + "rationale", + "created_at", + "review_due_at", + "expires_at", + "native_challenge_id", + "metadata" + ], + "properties": { + "id": { "type": "string" }, + "requirement_refs": { "type": "array", "items": { "type": "string" } }, + "check_refs": { "type": "array", "items": { "type": "string" } }, + "evidence_refs": { "type": "array", "items": { "type": "string" } }, + "result_refs": { "type": "array", "items": { "type": "string" } }, + "classification_refs": { "type": "array", "items": { "type": "string" } }, + "authority_source_refs": { "type": "array", "items": { "type": "string" } }, + "owner": { "type": "string" }, + "review_status": { "type": "string" }, + "rationale": { "type": "string" }, + "created_at": { "type": "string" }, + "review_due_at": { "type": ["string", "null"] }, + "expires_at": { "type": ["string", "null"] }, + 
"native_challenge_id": { "type": ["string", "null"] }, + "metadata": { "type": "object" } + } + } + } + } +} diff --git a/docs/schemas/evidence-item.schema.json b/docs/schemas/evidence-item.schema.json index e4412a3..8977a06 100644 --- a/docs/schemas/evidence-item.schema.json +++ b/docs/schemas/evidence-item.schema.json @@ -42,6 +42,7 @@ }, "observations": { "type": "array", "items": { "type": "string" } }, "facts": { "type": "object" }, + "review": { "type": "object" }, "requirement_refs": { "type": "array", "items": { "type": "string" } }, "artifact_refs": { "type": "array", "items": { "type": "string" } }, "started_at": { "type": "string" }, diff --git a/docs/schemas/exclusion-set.schema.json b/docs/schemas/exclusion-set.schema.json new file mode 100644 index 0000000..9e084c9 --- /dev/null +++ b/docs/schemas/exclusion-set.schema.json @@ -0,0 +1,60 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "title": "Guide Board Authority Exclusion Set", + "type": "object", + "additionalProperties": false, + "required": [ + "id", + "target_profile_ref", + "exclusions" + ], + "properties": { + "id": { "type": "string" }, + "target_profile_ref": { "type": "string" }, + "exclusions": { + "type": "array", + "items": { + "type": "object", + "additionalProperties": false, + "required": [ + "id", + "authority_ref", + "requirement_refs", + "check_refs", + "evidence_refs", + "result_refs", + "classification_refs", + "authority_source_refs", + "owner", + "approved_by", + "review_status", + "rationale", + "created_at", + "review_due_at", + "expires_at", + "native_exclusion_id", + "metadata" + ], + "properties": { + "id": { "type": "string" }, + "authority_ref": { "type": "string" }, + "requirement_refs": { "type": "array", "items": { "type": "string" } }, + "check_refs": { "type": "array", "items": { "type": "string" } }, + "evidence_refs": { "type": "array", "items": { "type": "string" } }, + "result_refs": { "type": "array", "items": { "type": "string" } }, + 
"classification_refs": { "type": "array", "items": { "type": "string" } }, + "authority_source_refs": { "type": "array", "items": { "type": "string" } }, + "owner": { "type": "string" }, + "approved_by": { "type": ["string", "null"] }, + "review_status": { "type": "string" }, + "rationale": { "type": "string" }, + "created_at": { "type": "string" }, + "review_due_at": { "type": ["string", "null"] }, + "expires_at": { "type": ["string", "null"] }, + "native_exclusion_id": { "type": ["string", "null"] }, + "metadata": { "type": "object" } + } + } + } + } +} diff --git a/docs/schemas/finding.schema.json b/docs/schemas/finding.schema.json index ca030b8..7383d18 100644 --- a/docs/schemas/finding.schema.json +++ b/docs/schemas/finding.schema.json @@ -14,7 +14,10 @@ "evidence_refs", "expected", "waiver_ref", + "challenge_ref", + "exclusion_ref", "policy_ref", + "review_status", "remediation" ], "properties": { @@ -28,7 +31,10 @@ "evidence_refs": { "type": "array", "items": { "type": "string" } }, "expected": { "type": "boolean" }, "waiver_ref": { "type": ["string", "null"] }, + "challenge_ref": { "type": ["string", "null"] }, + "exclusion_ref": { "type": ["string", "null"] }, "policy_ref": { "type": ["string", "null"] }, + "review_status": { "type": "string" }, "remediation": { "type": ["string", "null"] } } } diff --git a/src/guide_board/execution.py b/src/guide_board/execution.py index ec33f0f..7ee6e35 100644 --- a/src/guide_board/execution.py +++ b/src/guide_board/execution.py @@ -35,7 +35,15 @@ def run_assessment( assert_valid(item, "evidence-item") findings = _findings_for_evidence(run_id, evidence) - findings, policy_summary, applied_waivers = apply_policy(root, plan, findings) + ( + findings, + policy_summary, + applied_waivers, + applied_challenges, + applied_exclusions, + ) = apply_policy(root, plan, evidence, findings) + for item in evidence: + assert_valid(item, "evidence-item") for finding in findings: assert_valid(finding, "finding") @@ -52,6 +60,8 @@ def 
run_assessment( mapping_summary, policy_summary, applied_waivers, + applied_challenges, + applied_exclusions, created_at, ) assert_valid(assessment_package, "assessment-package") @@ -308,6 +318,7 @@ def _findings_for_evidence(run_id: str, evidence: list[dict[str, Any]]) -> list[ for item in evidence: if item["result"] not in {"blocked", "fail", "infrastructure_error"}: continue + expected = _expected_for_item(item) findings.append( { "id": f"finding:{item['check_id']}", @@ -318,9 +329,12 @@ def _findings_for_evidence(run_id: str, evidence: list[dict[str, Any]]) -> list[ "classification": _classification_for_item(item), "requirement_refs": item["requirement_refs"], "evidence_refs": [item["id"]], - "expected": _expected_for_item(item), + "expected": expected, "waiver_ref": None, + "challenge_ref": None, + "exclusion_ref": None, "policy_ref": None, + "review_status": "expected" if expected else "unresolved_defect", "remediation": _remediation_for_item(item), } ) @@ -382,6 +396,8 @@ def _assessment_package( mapping_summary: dict[str, Any], policy_summary: dict[str, Any], applied_waivers: list[dict[str, Any]], + applied_challenges: list[dict[str, Any]], + applied_exclusions: list[dict[str, Any]], created_at: str, ) -> dict[str, Any]: summary = dict(Counter(item["result"] for item in evidence)) @@ -401,6 +417,8 @@ def _assessment_package( "evidence_refs": [item["id"] for item in evidence], "artifact_manifest": artifact_manifest, "waivers": applied_waivers, + "challenges": applied_challenges, + "exclusions": applied_exclusions, "certification_boundary": "Guide Board produces preparation evidence only and does not issue certifications or audit assurance.", "created_at": created_at, } @@ -452,6 +470,7 @@ def _markdown_report(run_metadata: dict[str, Any], package: dict[str, Any]) -> s summary_lines = "- no evidence produced" mapping_lines = _mapping_summary_lines(package) policy_lines = _policy_summary_lines(package) + review_lines = _review_summary_lines(package) return 
"\n".join( [ @@ -473,6 +492,10 @@ def _markdown_report(run_metadata: dict[str, Any], package: dict[str, Any]) -> s "", policy_lines, "", + "## Review", + "", + review_lines, + "", "## Boundary", "", package["certification_boundary"], @@ -502,10 +525,27 @@ def _policy_summary_lines(package: dict[str, Any]) -> str: f"- applied expectations: {summary.get('applied_expectations', 0)}", f"- applied waivers: {summary.get('applied_waivers', 0)}", f"- unexpected findings: {summary.get('unexpected_findings', 0)}", + f"- challenged findings: {summary.get('challenged_findings', 0)}", + f"- authority exclusions: {summary.get('authority_exclusions', 0)}", + f"- unresolved defects: {summary.get('unresolved_defects', 0)}", ] ) +def _review_summary_lines(package: dict[str, Any]) -> str: + findings = package.get("findings", []) + if not findings: + return "- no findings requiring review" + counts = Counter( + finding.get("review_status", "unreviewed") + for finding in findings + if isinstance(finding, dict) + ) + return "\n".join( + f"- {status}: {count}" for status, count in sorted(counts.items()) + ) + + def _run_status(evidence: list[dict[str, Any]]) -> str: if any(item["result"] == "fail" for item in evidence): return "failed" diff --git a/src/guide_board/planning.py b/src/guide_board/planning.py index 9776f52..2f3e93d 100644 --- a/src/guide_board/planning.py +++ b/src/guide_board/planning.py @@ -262,6 +262,18 @@ def _build_source_lock( assessment.get("waivers_ref"), "waiver-set", ), + "challenges": _optional_policy_source_record( + root, + assessment_path, + assessment.get("challenges_ref"), + "challenge-set", + ), + "exclusions": _optional_policy_source_record( + root, + assessment_path, + assessment.get("exclusions_ref"), + "exclusion-set", + ), }, "authorities": _authority_source_records(extensions), "metadata_hooks": { diff --git a/src/guide_board/policy.py b/src/guide_board/policy.py index 39c99df..839dd48 100644 --- a/src/guide_board/policy.py +++ 
b/src/guide_board/policy.py @@ -13,20 +13,36 @@ from guide_board.schema import assert_valid def apply_policy( root: Path, plan: dict[str, Any], + evidence: list[dict[str, Any]], findings: list[dict[str, Any]], -) -> tuple[list[dict[str, Any]], dict[str, Any], list[dict[str, Any]]]: +) -> tuple[ + list[dict[str, Any]], + dict[str, Any], + list[dict[str, Any]], + list[dict[str, Any]], + list[dict[str, Any]], +]: expectations = _load_optional_set(root, plan, "expectations_ref", "expectation-set") waiver_set = _load_optional_set(root, plan, "waivers_ref", "waiver-set") + challenge_set = _load_optional_set(root, plan, "challenges_ref", "challenge-set") + exclusion_set = _load_optional_set(root, plan, "exclusions_ref", "exclusion-set") waivers = waiver_set.get("waivers", []) if waiver_set else [] + challenges = challenge_set.get("challenges", []) if challenge_set else [] + exclusions = exclusion_set.get("exclusions", []) if exclusion_set else [] applied_expectations = 0 applied_waivers: list[dict[str, Any]] = [] + applied_challenges: list[dict[str, Any]] = [] + applied_exclusions: list[dict[str, Any]] = [] + evidence_by_id = {item["id"]: item for item in evidence} for finding in findings: for expectation in expectations.get("expectations", []) if expectations else []: if _matches_rule(finding, expectation): finding["expected"] = expectation["expected"] finding["policy_ref"] = expectation["id"] + finding["review_status"] = "expected" if expectation["expected"] else "unresolved_defect" + _annotate_evidence(evidence_by_id, finding, "expectation_refs", expectation["id"]) applied_expectations += 1 break @@ -37,20 +53,60 @@ def apply_policy( finding["waiver_ref"] = waiver["id"] finding["expected"] = True finding["policy_ref"] = waiver["id"] + finding["review_status"] = "waived" finding["remediation"] = f"Waived: {waiver['reason']}" applied_waivers.append(waiver) + _annotate_evidence(evidence_by_id, finding, "waiver_refs", waiver["id"]) + break + + for exclusion in exclusions: 
+ if not _review_record_active(exclusion): + continue + if _matches_rule(finding, exclusion): + finding["exclusion_ref"] = exclusion["id"] + if finding.get("review_status") == "unresolved_defect": + finding["review_status"] = "authority_excluded" + applied_exclusions.append(exclusion) + _annotate_evidence(evidence_by_id, finding, "exclusion_refs", exclusion["id"]) + break + + for challenge in challenges: + if not _review_record_active(challenge): + continue + if _matches_rule(finding, challenge): + finding["challenge_ref"] = challenge["id"] + if finding.get("review_status") == "unresolved_defect": + finding["review_status"] = "challenged" + applied_challenges.append(challenge) + _annotate_evidence(evidence_by_id, finding, "challenge_refs", challenge["id"]) break policy_summary = { "expectations_ref": plan["assessment_profile_snapshot"].get("expectations_ref"), "waivers_ref": plan["assessment_profile_snapshot"].get("waivers_ref"), + "challenges_ref": plan["assessment_profile_snapshot"].get("challenges_ref"), + "exclusions_ref": plan["assessment_profile_snapshot"].get("exclusions_ref"), "applied_expectations": applied_expectations, "applied_waivers": len(applied_waivers), + "challenged_findings": _unique_applied_count(findings, "challenge_ref"), + "authority_exclusions": _unique_applied_count(findings, "exclusion_ref"), "unexpected_findings": sum( 1 for finding in findings if not finding.get("expected") and not finding.get("waiver_ref") ), + "unresolved_defects": sum( + 1 for finding in findings if finding.get("review_status") == "unresolved_defect" + ), + "unresolved_review_items": sum( + 1 for finding in findings if finding.get("review_status") in {"challenged", "authority_excluded"} + ), } - return findings, policy_summary, applied_waivers + return ( + findings, + policy_summary, + _dedupe_records(applied_waivers), + _dedupe_records(applied_challenges), + _dedupe_records(applied_exclusions), + ) def _load_optional_set( @@ -94,6 +150,7 @@ def _matches_rule(finding: 
dict[str, Any], rule: dict[str, Any]) -> bool: return ( _matches_any(finding.get("requirement_refs", []), rule.get("requirement_refs", [])) and _matches_any([finding.get("check_id", "")], rule.get("check_refs", [])) + and _matches_any(finding.get("evidence_refs", []), rule.get("evidence_refs", [])) and _matches_scalar(finding.get("status"), rule.get("result_refs", [])) and _matches_scalar(finding.get("classification"), rule.get("classification_refs", [])) ) @@ -122,3 +179,57 @@ def _waiver_active(waiver: dict[str, Any]) -> bool: except ValueError: return False return expiry >= date.today() + + +def _review_record_active(record: dict[str, Any]) -> bool: + status = record.get("review_status") + if status in {"rejected", "withdrawn", "closed", "expired"}: + return False + expires_at = record.get("expires_at") + if not expires_at: + return True + try: + expiry = date.fromisoformat(expires_at) + except ValueError: + return False + return expiry >= date.today() + + +def _annotate_evidence( + evidence_by_id: dict[str, dict[str, Any]], + finding: dict[str, Any], + ref_key: str, + ref_value: str, +) -> None: + for evidence_ref in finding.get("evidence_refs", []): + item = evidence_by_id.get(evidence_ref) + if item is None: + continue + review = item.setdefault( + "review", + { + "expectation_refs": [], + "waiver_refs": [], + "challenge_refs": [], + "exclusion_refs": [], + }, + ) + refs = review.setdefault(ref_key, []) + if ref_value not in refs: + refs.append(ref_value) + + +def _unique_applied_count(findings: list[dict[str, Any]], ref_name: str) -> int: + return sum(1 for finding in findings if finding.get(ref_name)) + + +def _dedupe_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]: + seen = set() + deduped = [] + for record in records: + record_id = record.get("id") + if not isinstance(record_id, str) or record_id in seen: + continue + seen.add(record_id) + deduped.append(record) + return deduped diff --git a/src/guide_board/retention.py 
b/src/guide_board/retention.py index eab367c..8b980bd 100644 --- a/src/guide_board/retention.py +++ b/src/guide_board/retention.py @@ -37,6 +37,10 @@ def build_retention_summary( "unexpected_findings": policy_summary.get("unexpected_findings", 0), "expected_findings": sum(1 for finding in findings if finding.get("expected")), "waived_findings": sum(1 for finding in findings if finding.get("waiver_ref")), + "challenged_findings": policy_summary.get("challenged_findings", 0), + "authority_exclusions": policy_summary.get("authority_exclusions", 0), + "unresolved_defects": policy_summary.get("unresolved_defects", 0), + "unresolved_review_items": policy_summary.get("unresolved_review_items", 0), "mapping_target_count": len( assessment_package.get("mapping_summary", {}).get("targets", []) ), @@ -197,6 +201,10 @@ def _run_projection(run: dict[str, Any]) -> dict[str, Any]: "unexpected_findings": _summary_int(summary, "unexpected_findings"), "finding_count": _summary_int(summary, "finding_count"), "artifact_count": _summary_int(summary, "artifact_count"), + "challenged_findings": _summary_int(summary, "challenged_findings"), + "authority_exclusions": _summary_int(summary, "authority_exclusions"), + "unresolved_defects": _summary_int(summary, "unresolved_defects"), + "unresolved_review_items": _summary_int(summary, "unresolved_review_items"), "run_dir": run.get("run_dir"), } @@ -211,9 +219,10 @@ def _trend_between( "status_changed": False, "unexpected_findings_delta": 0, "finding_count_delta": 0, - "artifact_count_delta": 0, - "evidence_result_deltas": {}, - } + "artifact_count_delta": 0, + "unresolved_review_items_delta": 0, + "evidence_result_deltas": {}, + } previous_summary = previous.get("summary", {}) latest_summary = latest.get("summary", {}) @@ -230,6 +239,9 @@ def _trend_between( artifact_delta = _summary_int(latest_summary, "artifact_count") - _summary_int( previous_summary, "artifact_count" ) + review_delta = _summary_int(latest_summary, "unresolved_review_items") 
- _summary_int( + previous_summary, "unresolved_review_items" + ) previous_status = _status_for(previous) latest_status = _status_for(latest) @@ -239,6 +251,7 @@ def _trend_between( "unexpected_findings_delta": unexpected_delta, "finding_count_delta": finding_delta, "artifact_count_delta": artifact_delta, + "unresolved_review_items_delta": review_delta, "evidence_result_deltas": evidence_deltas, } diff --git a/tests/test_core.py b/tests/test_core.py index 60abc68..ce79b2c 100644 --- a/tests/test_core.py +++ b/tests/test_core.py @@ -334,6 +334,69 @@ class CoreArchitectureTests(unittest.TestCase): self.assertEqual(len(mappings), 1) self.assertEqual(mappings[0]["target_id"], "profile-readiness") + def test_applies_challenges_and_exclusions_without_hiding_gate_failures(self) -> None: + with TemporaryDirectory() as temporary_directory: + temp_root = Path(temporary_directory) + extension_dir = temp_root / "review-noop" + _write_review_extension(extension_dir) + target_path = temp_root / "review-target.json" + assessment_path = temp_root / "review-assessment.json" + challenge_path = temp_root / "review-challenges.json" + exclusion_path = temp_root / "review-exclusions.json" + _write_review_target(target_path) + _write_review_assessment(assessment_path) + _write_review_challenges(challenge_path) + _write_review_exclusions(exclusion_path) + + result = run_assessment( + ROOT, + target_path, + assessment_path, + temp_root / "runs" / "review", + [extension_dir], + ) + run_dir = Path(result["run_dir"]) + evidence = json.loads( + (run_dir / "normalized" / "evidence.json").read_text(encoding="utf-8") + )["evidence"] + assessment_package = json.loads( + (run_dir / "reports" / "assessment-package.json").read_text(encoding="utf-8") + ) + retention = json.loads( + (run_dir / "retention-summary.json").read_text(encoding="utf-8") + ) + report = (run_dir / "reports" / "report.md").read_text(encoding="utf-8") + + self.assertEqual(result["status"], "blocked") + finding = 
assessment_package["findings"][0] + self.assertEqual(finding["challenge_ref"], "challenge-review-blocked") + self.assertEqual(finding["exclusion_ref"], "exclusion-review-blocked") + self.assertEqual(finding["review_status"], "authority_excluded") + self.assertFalse(finding["expected"]) + self.assertEqual(assessment_package["policy_summary"]["unexpected_findings"], 1) + self.assertEqual(assessment_package["policy_summary"]["challenged_findings"], 1) + self.assertEqual(assessment_package["policy_summary"]["authority_exclusions"], 1) + self.assertEqual(assessment_package["policy_summary"]["unresolved_defects"], 0) + self.assertEqual( + evidence[1]["review"]["challenge_refs"], + ["challenge-review-blocked"], + ) + self.assertEqual( + evidence[1]["review"]["exclusion_refs"], + ["exclusion-review-blocked"], + ) + self.assertEqual(assessment_package["challenges"][0]["owner"], "qa") + self.assertEqual(assessment_package["exclusions"][0]["authority_ref"], "review-authority") + self.assertEqual(retention["summary"]["challenged_findings"], 1) + self.assertEqual(retention["summary"]["authority_exclusions"], 1) + self.assertEqual(retention["summary"]["unresolved_review_items"], 1) + self.assertIn("- authority_excluded: 1", report) + + gate = evaluate_trend_gates(build_trend_summary(temp_root / "runs")) + self.assertEqual(gate["status"], "failed") + checks = {check["id"]: check for check in gate["groups"][0]["checks"]} + self.assertEqual(checks["unexpected-findings"]["observed"], 1) + def test_serves_local_api_run_lifecycle(self) -> None: with TemporaryDirectory() as temporary_directory: service = start_service(ROOT, host="127.0.0.1", port=0) @@ -742,5 +805,166 @@ def _write_schema_assessment(path: Path, runtime_policy: dict[str, object]) -> N ) +def _write_review_extension(extension_dir: Path) -> None: + extension_dir.mkdir(parents=True, exist_ok=True) + (extension_dir / "extension.json").write_text( + json.dumps( + { + "id": "review-noop", + "name": "Review No-op", + 
"version": "0.1.0", + "extension_type": "repository_quality", + "lifecycle_status": "incubating", + "supported_frameworks": ["review.framework.v1"], + "authorities": ["review-authority"], + "profile_schemas": ["target-profile", "assessment-profile"], + "check_groups": [ + { + "id": "review", + "name": "Review", + "check_type": "repository_quality", + "requirement_refs": ["review.requirement"], + "runner_ref": "external-review", + } + ], + "preflight_runner": None, + "runner_entrypoints": [ + { + "id": "external-review", + "kind": "external", + "module_path": None, + "callable": None, + "command": None, + "metadata": {"test_suite_id": "review-suite"}, + "description": "External runner used to produce reviewable blocked evidence.", + } + ], + "normalizers": [], + "mappings": [], + "report_fragments": [], + "dependencies": [], + "restricted_assets": [], + "certification_boundary": "Review fixture only.", + } + ), + encoding="utf-8", + ) + + +def _write_review_target(path: Path) -> None: + path.write_text( + json.dumps( + { + "id": "review-target", + "subject_type": "repository", + "subject_name": "Review Target", + "environment": "test", + "scope": ["review"], + "endpoints": [], + "artifacts": [], + "credentials_ref": None, + "declared_capabilities": [], + "known_gaps": [], + } + ), + encoding="utf-8", + ) + + +def _write_review_assessment(path: Path) -> None: + path.write_text( + json.dumps( + { + "id": "review-assessment", + "framework_refs": ["review.framework.v1"], + "extension_refs": ["review-noop"], + "target_profile_ref": "review-target", + "selected_check_groups": {"review-noop": ["review"]}, + "expectations_ref": None, + "waivers_ref": None, + "challenges_ref": "review-challenges.json", + "exclusions_ref": "review-exclusions.json", + "output_policy": { + "report_formats": ["json", "markdown"], + "artifact_retention": "summary-only", + }, + "retention_policy": { + "summary_days": 365, + "raw_artifact_days": 0, + }, + "runtime_policy": { + "offline": True, + 
"timeout_seconds": 2, + }, + } + ), + encoding="utf-8", + ) + + +def _write_review_challenges(path: Path) -> None: + path.write_text( + json.dumps( + { + "id": "review-challenges", + "target_profile_ref": "review-target", + "challenges": [ + { + "id": "challenge-review-blocked", + "requirement_refs": ["review.requirement"], + "check_refs": ["check-group:review-noop:review"], + "evidence_refs": [], + "result_refs": ["blocked"], + "classification_refs": ["runner_not_implemented"], + "authority_source_refs": ["review-authority:rule-1"], + "owner": "qa", + "review_status": "open", + "rationale": "The external suite is not wired in this fixture.", + "created_at": "2026-05-16", + "review_due_at": "2026-06-16", + "expires_at": None, + "native_challenge_id": "native-challenge-1", + "metadata": {"kind": "fixture"}, + } + ], + } + ), + encoding="utf-8", + ) + + +def _write_review_exclusions(path: Path) -> None: + path.write_text( + json.dumps( + { + "id": "review-exclusions", + "target_profile_ref": "review-target", + "exclusions": [ + { + "id": "exclusion-review-blocked", + "authority_ref": "review-authority", + "requirement_refs": ["review.requirement"], + "check_refs": ["check-group:review-noop:review"], + "evidence_refs": [], + "result_refs": ["blocked"], + "classification_refs": ["runner_not_implemented"], + "authority_source_refs": ["review-authority:rule-1"], + "owner": "qa", + "approved_by": "authority-reviewer", + "review_status": "approved", + "rationale": "Fixture demonstrates authority exclusion annotation.", + "created_at": "2026-05-16", + "review_due_at": "2026-06-16", + "expires_at": None, + "native_exclusion_id": "native-exclusion-1", + "metadata": {"kind": "fixture"}, + } + ], + } + ), + encoding="utf-8", + ) + + if __name__ == "__main__": unittest.main() diff --git a/workplans/GUIDE-BOARD-WP-0005-challenge-and-exclusion-handling.md b/workplans/GUIDE-BOARD-WP-0005-challenge-and-exclusion-handling.md index 719832f..68c8b15 100644 --- 
a/workplans/GUIDE-BOARD-WP-0005-challenge-and-exclusion-handling.md
+++ b/workplans/GUIDE-BOARD-WP-0005-challenge-and-exclusion-handling.md
@@ -4,12 +4,12 @@ type: workplan
 title: "Challenge And Exclusion Handling"
 repo: guide-board
 domain: markitect
-status: active
+status: completed
 owner: codex
 planning_priority: high
 planning_order: 5
 created: "2026-05-15"
-updated: "2026-05-15"
+updated: "2026-05-16"
 state_hub_workstream_id: "fb11e1c7-6c0c-4ec7-a163-da98b2fe9f8f"
 ---
 
@@ -42,7 +42,7 @@ but the core should preserve them without embedding domain policy.
 
 ```task
 id: GUIDE-BOARD-WP-0005-T001
-status: todo
+status: done
 priority: high
 state_hub_task_id: "6ff4e6f7-bce6-4e7f-a5af-e0c67cfa7e55"
 ```
@@ -57,11 +57,21 @@ Acceptance:
 - Keep the data contract usable by executable harnesses, hosted suites, and
   procedural packs.
 
+Progress:
+
+- Added `docs/schemas/challenge-set.schema.json` and
+  `docs/schemas/exclusion-set.schema.json`.
+- Added optional `challenges_ref` and `exclusions_ref` assessment profile
+  fields.
+- Supported requirement, check, evidence, result, classification, authority
+  source, owner, review status, rationale, review date, expiry, native ID, and
+  metadata fields.
+
 ## D5.2 - Policy Application And Finding Annotation
 
 ```task
 id: GUIDE-BOARD-WP-0005-T002
-status: todo
+status: done
 priority: high
 state_hub_task_id: "fd384bd3-40c4-4344-8b7d-cb123dbf2cac"
 ```
@@ -76,11 +86,20 @@ Acceptance:
 - Add tests that prove challenge and exclusion records affect reporting without
   corrupting gate semantics.
 
+Progress:
+
+- Loaded challenge and exclusion refs through the policy layer.
+- Annotated findings with challenge refs, exclusion refs, and review status.
+- Annotated matching evidence with review refs.
+- Kept default `unexpected_findings` gate semantics visible unless a finding is
+  separately expected or waived.
+- Added tests proving challenged and excluded findings remain gate-visible.
+
 ## D5.3 - Report Visibility And Review Workflow
 
 ```task
 id: GUIDE-BOARD-WP-0005-T003
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "791071c0-8a9a-462b-83b3-75548bb8524f"
 ```
@@ -94,11 +113,19 @@ Acceptance:
   run.
 - Document how an operator should treat challenged or excluded findings.
 
+Progress:
+
+- Added Markdown report review summaries.
+- Added challenge, exclusion, unresolved defect, and unresolved review counts to
+  retention summaries and trend projections.
+- Included applied challenge and exclusion records in JSON assessment packages.
+- Exposed review counts through existing retained run helpers.
+
 ## D5.4 - Tests And Documentation
 
 ```task
 id: GUIDE-BOARD-WP-0005-T004
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "43b966da-af8d-479b-93bd-6b6741fdab37"
 ```
@@ -111,6 +138,14 @@ Acceptance:
 - Update assessment operations, extension SDK, and compliance evidence pack docs.
 - Keep certification boundary language explicit.
 
+Progress:
+
+- Added focused schema and policy tests through a fixture extension scenario.
+- Updated assessment operations, extension SDK, compliance evidence pack, and
+  architecture docs.
+- Kept boundary language explicit: challenges and exclusions are review state,
+  not certification conclusions.
+
 ## Definition Of Done
 
 - The core has separate, tested concepts for expectations, waivers, challenges,