feat: add LinkValidator for semantic link validation (Phase 3)

Implement comprehensive link validation as part of semantic validation:

Core Features:
- Link classification: internal, external, fragment, email
- Internal link validation: fragment anchors and file paths
- External link validation: HTTP/HTTPS with configurable timeout
- Email validation: mailto: link format checking
- Fragment policy enforcement: allow/disallow fragment identifiers
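
The classification step listed above can be sketched roughly as follows. This is an illustrative sketch only, not the code in link_validator.py: the function name `classify_link` and the rule that any target without an http, https, or mailto scheme counts as internal are assumptions.

```python
from urllib.parse import urlparse


def classify_link(target: str) -> str:
    """Return 'fragment', 'email', 'external', or 'internal' for a link target.

    Hypothetical helper for illustration; the real LinkValidator may differ.
    """
    target = target.strip()
    # A bare "#anchor" points at a heading in the same document.
    if target.startswith("#"):
        return "fragment"
    scheme = urlparse(target).scheme.lower()
    if scheme == "mailto":
        return "email"
    if scheme in ("http", "https"):
        return "external"
    # Assumption: everything else (relative paths, with or without a
    # trailing "#anchor") is treated as an internal link.
    return "internal"
```

For example, `classify_link("docs/guide.md#install")` yields `"internal"` under these assumptions, while `classify_link("#install")` yields `"fragment"`.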

Link Validator:
- markitect/validators/link_validator.py - Full link validation implementation
- Supports x-markitect-content-control.link_validation configuration
- Default: check internal links, skip external (fast)
- Opt-in external checking with --check-links flag

Integration:
- Updated SemanticValidator to include link_result in reports
- CLI already supports --check-links flag (line 1629 in cli.py)
- Link validation runs by default for internal links (fast)
- External link checking requires explicit --check-links flag

Test Coverage:
- Added 9 comprehensive tests for LinkValidator
- Tests cover: classification, broken links, fragments, email, statistics
- All 25 semantic validator tests passing (100%)

Documentation:
- Updated SCHEMA_MANAGEMENT_GUIDE.md with link validation section
- Added examples for broken links and external link checking
- Documented link types, validation rules, and configuration

Statistics Tracking:
- Links checked, internal/external/fragment/email counts
- Detailed error/warning reporting with line numbers
- Integration with existing semantic validation reporting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 03:41:03 +01:00
parent 0d78837a53
commit 20c0cfece7
5 changed files with 829 additions and 10 deletions


@@ -177,6 +177,9 @@ markitect validate my-document.md --schema manpage-schema-v1.0.md
# Only structural validation (classic mode)
markitect validate my-document.md --schema schema.json --no-semantic
# With external link checking (may be slow)
markitect validate my-document.md --schema manpage-schema-v1.0.md --check-links
# Strict mode (warnings become errors)
markitect validate my-document.md --schema manpage-schema-v1.0.md --strict
```
@@ -202,6 +205,14 @@ markitect validate my-document.md --schema manpage-schema-v1.0.md --strict
- **Quality Metrics**: Checks word counts, sentence counts
  - `min_words`, `max_words`: Word count requirements (WARNING)
  - `min_sentences`: Minimum sentence count (WARNING)
- **Link Validation**: Validates internal and external links (optional)
  - Internal links: Checked by default when semantic validation is enabled
    - Fragment links (`#section-name`) verified to exist (ERROR if broken)
    - Relative file paths checked for existence (ERROR if broken)
  - External links: Opt-in with the `--check-links` flag (may be slow)
    - HTTP/HTTPS URLs validated with HEAD requests (WARNING if broken)
  - Email validation: Validates `mailto:` link format (WARNING if invalid)
  - Fragment policy: Configurable allow/disallow of fragment identifiers
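
As a rough illustration of the fragment check described above, the sketch below collects heading anchors from a Markdown string and reports fragment links with no matching heading. The helper names (`slugify`, `find_broken_fragments`) and the GitHub-style slug rule are assumptions, not Markitect's actual implementation, and headings inside fenced code blocks are not filtered out.

```python
import re

# ATX headings such as "## Getting Started" (matched one line at a time).
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$")
# Inline links whose target is a bare fragment, e.g. [text](#anchor).
FRAGMENT_LINK_RE = re.compile(r"\[[^\]]*\]\(#([^)\s]+)\)")


def slugify(heading: str) -> str:
    """Assumed slug rule: lowercase, drop punctuation, spaces become hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)
    return re.sub(r"\s+", "-", slug)


def find_broken_fragments(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, fragment) pairs for links with no matching heading."""
    lines = markdown.splitlines()
    anchors = set()
    for line in lines:
        match = HEADING_RE.match(line)
        if match:
            anchors.add(slugify(match.group(1)))
    broken = []
    for line_no, line in enumerate(lines, start=1):
        for match in FRAGMENT_LINK_RE.finditer(line):
            if match.group(1).lower() not in anchors:
                broken.append((line_no, "#" + match.group(1)))
    return broken
```

Returning the line number alongside each broken fragment mirrors the line-numbered error reporting mentioned in the commit message.
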
### Validation Output
@@ -222,6 +233,9 @@ Section Validation:
Content Validation:
✅ All content requirements met
Link Validation:
✅ All 12 links valid
Summary:
Sections checked: 3
Sections found: 5
@@ -271,6 +285,30 @@ $ markitect validate doc.md --schema manpage-schema-v1.0.md --strict
Status: FAILED ❌ (warnings treated as errors)
```
**Example 4: Broken Internal Link**
```bash
$ markitect validate doc.md --schema manpage-schema-v1.0.md
Link Validation:
#nonexistent-section - Internal link target not found: #nonexistent-section
Errors: 1
Status: FAILED ❌
```
**Example 5: External Link Validation**
```bash
# Enable external link checking (may be slow)
$ markitect validate doc.md --schema manpage-schema-v1.0.md --check-links
Link Validation:
✅ http://example.com - Valid
⚠️ http://broken-link.invalid - External link unreachable: Name or service not known
Warnings: 1
Status: PASSED ✅
```
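
A minimal sketch of the HEAD-request check behind Example 5, using only the Python standard library. The function name `check_external_link`, its return shape, and the injectable `opener` parameter are assumptions made for illustration; how LinkValidator actually issues requests is not shown in this commit view.

```python
import urllib.error
import urllib.request


def check_external_link(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return (ok, message) for an http(s) URL, probed with a HEAD request.

    Hypothetical helper; `opener` is injectable so the check can be
    exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar.
        reason = getattr(exc, "reason", exc)
        return False, f"External link unreachable: {reason}"
    return status < 400, f"HTTP {status}"
```

Because an unreachable external link is reported as a warning rather than an error, a caller would record a failed result without failing the run unless `--strict` is set.
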
## Schema Naming Conventions
All schema filenames must follow this pattern: