Fast Document Loading & CLI Manipulation #2

Closed
opened 2025-09-16 20:50:23 +00:00 by tegwick · 0 comments
Owner

Use case: The user can read a Markdown file from a given file path and save its parsed AST representation into the database. The file's metadata (e.g., name, path) should also be stored.

Core Performance Principle:

Parse once, manipulate many times: loading the cached AST into memory should be faster than re-parsing the Markdown source.

Redesigned Requirements:

  1. Storage Strategy (Performance-First)
  • SQLite for metadata (filename, timestamps, front matter)
  • Separate AST cache files (JSON/pickle) for fast deserialization
  • Cache invalidation based on file modification time
  • Memory-first architecture: the AST lives in memory and is persisted to disk only to speed up later loads
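
The mtime-based invalidation above could look roughly like the sketch below. It is only an illustration under assumptions: the function name `load_or_rebuild`, the `source_mtime_ns` key, and the JSON payload layout are hypothetical, and `parse` stands in for whatever parser.py exposes.

```python
import json
import os
from pathlib import Path


def load_or_rebuild(source: Path, cache: Path, parse) -> dict:
    """Return the AST for `source`, re-parsing only when its mtime changed.

    `parse` is a stand-in for the project's parser (assumption): it takes
    the Markdown text and returns a JSON-serializable AST.
    """
    mtime = source.stat().st_mtime_ns
    if cache.exists():
        payload = json.loads(cache.read_text(encoding="utf-8"))
        # The cache is valid only if it was built from this exact mtime.
        if payload.get("source_mtime_ns") == mtime:
            return payload["ast"]
    ast = parse(source.read_text(encoding="utf-8"))
    payload = {"source_mtime_ns": mtime, "ast": ast}
    cache.write_text(json.dumps(payload), encoding="utf-8")
    return ast
```

Storing the source mtime inside the cache payload, rather than comparing the two files' timestamps, avoids false hits when the cache and the source are written within the same clock tick.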
  2. CLI Workflow (Roundtrip Validation)

    # Insert document
    markitect add document.md

    # Manipulate content (examples)
    markitect modify document.md --add-section "New Section"
    markitect modify document.md --update-front-matter "status: draft"

    # Retrieve modified document
    markitect get document.md --output modified_document.md

    # Validate roundtrip integrity
    diff document.md modified_document.md  # Should show only intended changes

  3. Testable Subtasks for Issue #2:

2a. File Ingestion & AST Caching

  • Read markdown file from disk
  • Parse to AST using existing parser.py
  • Store metadata in DB + AST cache file
  • Verify cache is faster than re-parsing
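
As a rough sketch of subtask 2a, ingestion might write one metadata row to SQLite and one JSON cache file per document. The table layout, the `ingest` function, and `toy_parse` (a placeholder for parser.py, whose real interface is not shown in this issue) are all assumptions made for illustration.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical metadata schema; the real column set is not specified here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    path       TEXT PRIMARY KEY,
    name       TEXT NOT NULL,
    mtime_ns   INTEGER NOT NULL,
    cache_path TEXT NOT NULL
)
"""


def toy_parse(text: str) -> dict:
    """Placeholder for parser.py: one node per input line."""
    lines = text.splitlines()
    return {"type": "document",
            "children": [{"type": "line", "value": line} for line in lines]}


def ingest(conn: sqlite3.Connection, source: Path, cache_dir: Path,
           parse=toy_parse) -> Path:
    """Parse `source`, write its AST cache file, and record its metadata."""
    conn.execute(SCHEMA)
    ast = parse(source.read_text(encoding="utf-8"))
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Note: naming the cache by file name alone collides for same-named
    # files in different directories; a real version would need a unique key.
    cache_path = cache_dir / (source.name + ".ast.json")
    cache_path.write_text(json.dumps(ast), encoding="utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO documents (path, name, mtime_ns, cache_path) "
        "VALUES (?, ?, ?, ?)",
        (str(source.resolve()), source.name,
         source.stat().st_mtime_ns, str(cache_path)),
    )
    conn.commit()
    return cache_path
```

Using the resolved path as the primary key means re-adding the same file replaces its row instead of creating a duplicate.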

2b. AST Memory Management

  • Load AST from cache into memory representation
  • Design document object model for manipulation
  • Implement AST-to-markdown serialization
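
One possible shape for the document object model in 2b is sketched below. The `Document` and `Section` classes and the assumed AST layout (a dict with `front_matter` and `sections` keys) are hypothetical and much simpler than a real Markdown AST; the point is only to show loading from a cached dict and serializing back to Markdown.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    level: int = 1
    body: str = ""


@dataclass
class Document:
    front_matter: dict = field(default_factory=dict)
    sections: list = field(default_factory=list)

    @classmethod
    def from_ast(cls, ast: dict) -> "Document":
        """Build the in-memory model from a cached AST dict (assumed layout)."""
        return cls(
            front_matter=dict(ast.get("front_matter", {})),
            sections=[Section(**s) for s in ast.get("sections", [])],
        )

    def to_markdown(self) -> str:
        """Serialize front matter and sections back to Markdown text."""
        parts = []
        if self.front_matter:
            lines = [f"{key}: {value}" for key, value in self.front_matter.items()]
            parts.append("---\n" + "\n".join(lines) + "\n---")
        for section in self.sections:
            heading = "#" * section.level + " " + section.title
            parts.append(heading + ("\n\n" + section.body if section.body else ""))
        return "\n\n".join(parts) + "\n"
```

For example, `Document.from_ast({"front_matter": {"status": "draft"}, "sections": [{"title": "Intro", "body": "Hello."}]}).to_markdown()` yields a front-matter block followed by a `# Intro` heading and its body.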

2c. Basic CLI Interface

  • markitect add <file> command
  • markitect get <file> command
  • markitect list command
  • Basic error handling & validation
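
The three commands in 2c could be wired up with the standard-library `argparse` module, as in the sketch below. The subcommand names and the `--output` option come from this issue; `build_parser`, the help strings, and the stdout default are assumptions, not the actual markitect implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the add, get, and list subcommands."""
    parser = argparse.ArgumentParser(prog="markitect")
    # required=True makes argparse reject an invocation with no subcommand.
    sub = parser.add_subparsers(dest="command", required=True)

    add = sub.add_parser("add", help="ingest a Markdown file")
    add.add_argument("file")

    get = sub.add_parser("get", help="write a stored document back out")
    get.add_argument("file")
    get.add_argument("--output", help="destination path (assumed default: stdout)")

    sub.add_parser("list", help="list stored documents")
    return parser
```

A caller would then dispatch on `build_parser().parse_args().command`; argparse already covers the basic validation cases (unknown command, missing file argument) by exiting with a usage message.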

2d. Simple Content Manipulation

  • Modify front matter in AST
  • Add/remove sections programmatically
  • Roundtrip test: add → modify → retrieve → verify
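
The roundtrip check in 2d can be prototyped entirely in memory before the CLI exists. The sketch below assumes a toy AST layout (a dict with `front_matter` and `sections`) and hypothetical helpers `render`, `modify`, and `changed_lines`; it uses `difflib` in place of the `diff` command to confirm that only the intended lines changed.

```python
import copy
import difflib


def render(ast: dict) -> str:
    """Serialize the toy AST back to Markdown text."""
    lines = []
    if ast["front_matter"]:
        lines.append("---")
        lines += [f"{key}: {value}" for key, value in ast["front_matter"].items()]
        lines += ["---", ""]
    for section in ast["sections"]:
        lines += [f"## {section['title']}", ""]
        if section["body"]:
            lines += [section["body"], ""]
    return "\n".join(lines)


def modify(ast: dict, front_matter=None, new_section=None) -> dict:
    """Return a modified copy; the input AST is left untouched."""
    out = copy.deepcopy(ast)
    out["front_matter"].update(front_matter or {})
    if new_section:
        out["sections"].append({"title": new_section, "body": ""})
    return out


def changed_lines(before: str, after: str) -> list:
    """List added ('+ ') and removed ('- ') lines between two texts."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

Rendering the original and the modified AST and passing both to `changed_lines` should report the front-matter line and the new section heading, and nothing from the untouched sections.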

Technical Architecture:

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ markdown        │───▶│ AST cache        │───▶│ Memory AST      │
│ files           │    │ (.json/.pkl)     │    │ (fast access)   │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                      │
         ▼                       ▼                      ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ SQLite DB       │    │ Cache Index      │    │ CLI Commands    │
│ (metadata)      │    │ (performance)    │    │ (user interface)│
└─────────────────┘    └──────────────────┘    └─────────────────┘

Success Criteria:

  1. Performance: Loading the AST from cache takes less than 50% of the time needed to parse the Markdown source
  2. Functionality: Complete roundtrip without data loss
  3. Usability: Intuitive CLI for basic operations
  4. Testability: Each subtask has measurable validation
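
The performance criterion could be measured with a small `timeit` harness like the one below. This is a sketch under assumptions: `cache_speed_ratio` is a hypothetical helper, `parse` stands in for parser.py, and the measurement covers JSON deserialization from memory only, so disk I/O for the cache file would have to be added for a full comparison.

```python
import json
import timeit


def cache_speed_ratio(markdown_text: str, parse, repeat: int = 5,
                      number: int = 20) -> float:
    """Return cache-load time divided by parse time (lower is better).

    A result below 0.5 would satisfy the "< 50% of parsing time" criterion.
    """
    cached = json.dumps(parse(markdown_text))
    # Take the minimum over several runs to reduce scheduling noise.
    parse_time = min(timeit.repeat(lambda: parse(markdown_text),
                                   repeat=repeat, number=number))
    load_time = min(timeit.repeat(lambda: json.loads(cached),
                                  repeat=repeat, number=number))
    return load_time / parse_time
```

The ratio depends heavily on the parser and on document size, so it is worth running against representative files rather than a single small sample.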
tegwick added this to the Getting started project 2025-09-16 21:10:10 +00:00
tegwick moved this to Todo in Getting started on 2025-09-16 21:13:39 +00:00
tegwick changed title from Read and Store a Markdown File to Fast Document Loading & CLI Manipulation 2025-09-23 20:51:43 +00:00
tegwick moved this to Active in Getting started on 2025-09-24 22:01:46 +00:00
tegwick added the status:done label 2025-09-25 00:56:39 +00:00
tegwick moved this to Done in Getting started on 2025-09-25 01:02:01 +00:00