# MarkiTect Content Capability

A self-contained capability for parsing and analyzing the main content of MarkdownMatters documents, excluding the frontmatter and tailmatter zones.

## Overview

The markitect-content capability extracts the main content of MarkdownMatters documents and computes statistics on it. It separates the main document content from the metadata zones (frontmatter and tailmatter) and reports word, line, paragraph, and character counts for the content alone.

## Features

- **Content Extraction**: Extract main markdown content without frontmatter/tailmatter zones
- **Content Statistics**: Calculate word count, line count, paragraph count, and character count
- **CLI Commands**: Direct command-line access to content operations
- **Contentmatter Preservation**: Preserves inline metadata (MMD key-value pairs) as part of content

## API

### Core Classes

#### `ContentParser`

Main parser class for content extraction and analysis.

```python
from pathlib import Path

from markitect_content import ContentParser

# Read the raw document, including any frontmatter and tailmatter
text = Path("document.md").read_text(encoding="utf-8")

parser = ContentParser()

# Extract content without matter zones
content = parser.extract_content(text)

# Calculate content statistics
stats = parser.calculate_stats(content)
```

#### `ContentStats`

Statistics data structure with content metrics.

```python
from markitect_content import ContentStats

# Stats object contains:
# - word_count: int
# - line_count: int
# - paragraph_count: int
# - character_count: int

# Convert to dictionary; `stats` comes from parser.calculate_stats(content)
# in the ContentParser example
stats_dict = stats.to_dict()
```

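To make the four metrics concrete, here is a minimal sketch of how such counts could be computed. `SketchStats` and `sketch_stats` are hypothetical names, not part of `markitect_content`, and the counting rules (whitespace-separated words, blank-line-separated paragraphs) are assumptions; the capability's own definitions may differ.

```python
import re
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SketchStats:
    """Hypothetical stand-in for ContentStats with the four documented fields."""

    word_count: int
    line_count: int
    paragraph_count: int
    character_count: int

    def to_dict(self) -> dict:
        # Mirror the documented to_dict() by returning the fields as a plain dict.
        return asdict(self)


def sketch_stats(content: str) -> SketchStats:
    # Assumption: a paragraph is a run of text separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", content) if p.strip()]
    return SketchStats(
        word_count=len(content.split()),       # whitespace-separated tokens
        line_count=len(content.splitlines()),  # blank lines are counted too
        paragraph_count=len(paragraphs),
        character_count=len(content),          # includes newlines
    )
```

For the string `"# Title\n\nHello world.\nSecond line.\n"`, this sketch reports 6 words, 4 lines, 2 paragraphs, and 35 characters.
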
### CLI Commands

#### `content-get`

Extract content without frontmatter and tailmatter.

```bash
markitect content-get --file document.md
```

#### `content-stats`

Calculate content statistics.

```bash
markitect content-stats --file document.md --format json
markitect content-stats --file document.md --format text
```

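The two `--format` values can be pictured as two renderings of the same statistics dictionary. The sketch below is illustrative only: `render_stats` is a hypothetical helper, and the actual layout printed by `markitect content-stats` may differ.

```python
import json


def render_stats(stats: dict, fmt: str = "text") -> str:
    """Hypothetical renderer for a stats dictionary (not the real CLI code)."""
    if fmt == "json":
        # Machine-readable form, suitable for piping into other tools.
        return json.dumps(stats, indent=2)
    if fmt == "text":
        # Human-readable form, one "key: value" pair per line (assumed layout).
        return "\n".join(f"{key}: {value}" for key, value in stats.items())
    raise ValueError(f"unsupported format: {fmt!r}")
```

The JSON form suits scripts that consume the counts, while the text form is easier to read in a terminal.
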
## Content Processing Rules

1. **Frontmatter Removal**: Removes YAML frontmatter blocks (`---...---`)
2. **Tailmatter Removal**: Removes tailmatter blocks (`` ```yaml tailmatter...``` ``)
3. **Contentmatter Preservation**: Keeps inline MMD key-value pairs
4. **Content Statistics**: Counts are calculated on the extracted content only, after both matter zones are removed

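The rules above can be sketched with two regular expressions. This is an illustration under stated assumptions, not the capability's implementation: `sketch_extract_content` is a hypothetical name, frontmatter is assumed to be a leading block delimited by `---` lines, tailmatter is assumed to be a fenced block opened with `yaml tailmatter` that closes the document, and a contentmatter pair is assumed to look like a `Key: value` line in the body.

```python
import re

FENCE = "`" * 3  # three backticks, built here to keep the example fence-safe

# Assumption: frontmatter is the first block, delimited by lines containing only ---.
FRONTMATTER = re.compile(r"\A---[ \t]*\n(?:.*?\n)?---[ \t]*(?:\n|\Z)", re.DOTALL)

# Assumption: tailmatter is a fenced "yaml tailmatter" block at the end of the document.
TAILMATTER = re.compile(
    rf"^{FENCE}yaml tailmatter[ \t]*\n.*?^{FENCE}[ \t]*\n?\s*\Z",
    re.DOTALL | re.MULTILINE,
)


def sketch_extract_content(text: str) -> str:
    """Strip both matter zones; everything else, including Key: value lines, stays."""
    text = FRONTMATTER.sub("", text, count=1)  # rule 1
    text = TAILMATTER.sub("", text, count=1)   # rule 2
    # Rule 3 needs no code: inline pairs are ordinary content and are left alone.
    return text.strip("\n")
```

Statistics (rule 4) would then be computed on the returned string rather than on the raw file.
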
## Installation

Install as an editable dependency in your MarkiTect environment:

```bash
pip install -e capabilities/markitect-content/
```

## Testing

Run the capability test suite:

```bash
cd capabilities/markitect-content/
pytest tests/
```

## Compliance

This capability follows the ComposableRepositoryParadigm:

- ✅ Src layout (PEP 660 compliant)
- ✅ Unidirectional dependencies
- ✅ Self-contained with own tests
- ✅ Independent configuration
- ✅ Clean API boundaries

## Dependencies

- click>=8.0.0 (for CLI commands)
- pytest>=7.0.0 (dev dependency for testing)