tegwick de49c76ff9 refactor: failed attempt at edit mode recovery and robustness implementation

This commit preserves work from a refactoring session that attempted to:

ACHIEVEMENTS:
- Implemented Robustness Principle with dual-mode error handling
- Created sophisticated error detection for edit mode failures
- Added comprehensive safety utilities in control-base.js
- Successfully recovered JavaScript components from git history
- Fixed template variable substitution and initialization flow
- Added detailed documentation (REFACTORING_SESSION_REPORT.md)

PROBLEMS:
- Violated GUARDRAILS.md by embedding JavaScript in Python strings
- Mixed old and new component systems without proper migration
- Content rendering issues - no visible content despite initialization
- Became overly complex trying to solve multiple problems simultaneously

LESSONS LEARNED:
- Focus is critical - solve one problem at a time
- Respect architectural constraints (keep JS separate from Python)
- Component migration requires explicit planning
- Incremental testing prevents complexity accumulation

RECOMMENDATION:
Reset to working commit and take focused, incremental approach
that respects GUARDRAILS.md while achieving core edit mode functionality.

See REFACTORING_SESSION_REPORT.md for detailed analysis.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-12 00:19:03 +01:00

ADR-002: Robustness Principle for Production Use

Status

Accepted - 2025-11-11

Context

The Markitect application operates in unpredictable client-side environments where JavaScript execution can fail due to malicious input, network issues, browser inconsistencies, missing dependencies, or resource exhaustion. Ad hoc error handling often lets a single failure cascade, crashing entire UI components or leaving the application in an unusable state.

Requirements

Fault Tolerance: System must continue operating when individual components fail
Security: Protection against malicious input and injection attacks
Resource Protection: Prevention of DoS attacks through resource exhaustion
Graceful Degradation: Non-essential features should fail without breaking core functionality
Error Containment: Failures should be isolated and not cascade throughout the system
User Experience: Users should never see white screens or completely broken interfaces
Developer Experience: Clear error reporting and debugging capabilities

Problem Statement

The existing JavaScript codebase was vulnerable to:

Uncaught Exceptions: Single errors could crash entire UI components
Input Validation Gaps: Malicious or malformed input could break processing
Resource Exhaustion: Large datasets could freeze the browser
Dependency Failures: Missing libraries or features caused complete breakdowns
DOM Manipulation Risks: Direct DOM access without safety checks
Cascading Failures: One component failure affecting others

Decision

We will implement the Robustness Principle as a comprehensive defensive programming strategy with multiple layers of protection throughout the JavaScript codebase. In development mode this is balanced with Fail Fast behavior, so that errors surface immediately instead of being masked by fallbacks and becoming hard to diagnose.

Alternatives Considered

Option 1: Robustness Principle (Selected)

Approach: Multiple defensive layers with graceful degradation
Implementation: Safe wrappers, input validation, error boundaries, resource limits

Option 2: Try-Catch Everything

Approach: Wrap all operations in try-catch blocks
Implementation: Granular exception handling without systematic approach

Option 3: Reactive Error Handling

Approach: Error handling through reactive programming patterns
Implementation: RxJS or similar libraries for error stream management

Option 4: Minimal Validation

Approach: Basic input checking with assumption of good data
Implementation: Simple null checks and basic validation

Decision Matrix

Criteria	Robustness Principle	Try-Catch All	Reactive Patterns	Minimal Validation
Fault Tolerance	✅ Comprehensive	⚠️ Inconsistent	✅ Good	❌ Poor
Security Protection	✅ Multi-layered	❌ Reactive only	⚠️ Limited	❌ Vulnerable
Resource Management	✅ Proactive limits	❌ No protection	⚠️ Some control	❌ No protection
Code Maintainability	✅ Systematic	❌ Scattered	⚠️ Complex	✅ Simple
Performance Impact	⚠️ Moderate overhead	⚠️ High overhead	❌ Library weight	✅ Minimal
Developer Experience	✅ Clear patterns	❌ Repetitive	❌ Learning curve	✅ Familiar
Error Recovery	✅ Graceful fallbacks	⚠️ Manual recovery	✅ Automatic retry	❌ System failure

Balanced Implementation: Robustness + Fail Fast

Development vs Production Behavior

Development Mode (Fail Fast):

Immediate exceptions on errors for fast debugging
Strict validation with no silent failures
Full error context and stack traces
Activated on localhost, 127.0.0.1, with ?strict=true in the URL, or when window.markitectStrictMode is set to true

Production Mode (Robust):

Graceful degradation and fallback behaviors
Silent recovery with detailed logging
User experience preservation
Default behavior in production environments

const MARKITECT_STRICT_MODE = (
    window.location.hostname === 'localhost' ||
    window.location.hostname === '127.0.0.1' ||
    window.location.search.includes('strict=true') ||
    window.markitectStrictMode === true
);
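The substring check on `window.location.search` also matches parameters such as `?nostrict=true`. A minimal sketch of a stricter variant follows, assuming a hypothetical helper named `isStrictMode` that receives the location and the override flag as arguments so it can be exercised outside a browser. It is an illustration, not the project's actual detection code.

```javascript
// Hypothetical helper (not part of the Markitect codebase).
// locationLike needs only `hostname` and `search`, like window.location.
function isStrictMode(locationLike, override = false) {
    const host = locationLike.hostname;
    if (host === 'localhost' || host === '127.0.0.1') return true;
    if (override === true) return true;

    // URLSearchParams compares the parameter name exactly, so
    // "?nostrict=true" does not enable strict mode.
    const params = new URLSearchParams(locationLike.search || '');
    return params.get('strict') === 'true';
}

// Usage with plain objects standing in for window.location
const onLocalhost = isStrictMode({ hostname: 'localhost', search: '' });
const viaQuery = isStrictMode({ hostname: 'example.com', search: '?strict=true' });
const lookalike = isStrictMode({ hostname: 'example.com', search: '?nostrict=true' });
```

In a browser the call would presumably be `isStrictMode(window.location, window.markitectStrictMode)`.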

Robustness Principle Implementation

Layer 1: Input Validation & Sanitization

Purpose: Prevent malicious or malformed data from entering the system

safeTextExtraction(element) {
    if (!this.validateElement(element)) {
        return '';
    }

    try {
        const text = element.textContent || element.innerText || '';
        return this.sanitizeText(text.trim());
    } catch (error) {
        console.warn('Text extraction failed:', error);
        return '';
    }
}

sanitizeText(text) {
    if (typeof text !== 'string') return '';

    const maxLength = 100000; // 100,000 UTF-16 code units (about 100 KB of ASCII text)
    return text
        .replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '') // Remove control chars
        .slice(0, maxLength); // Limit length
}
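To make the scope of this layer concrete, the sketch below restates `sanitizeText` as a standalone function (with `maxLength` turned into a parameter, which is an assumption made for illustration) and runs it on a few inputs. It shows that control characters are removed and length is capped, while tabs, newlines, and HTML markup pass through unchanged, so the result still has to be rendered as text rather than as HTML.

```javascript
// Standalone restatement of sanitizeText for illustration only.
function sanitizeText(text, maxLength = 100000) {
    if (typeof text !== 'string') return '';
    return text
        .replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '') // Remove control chars
        .slice(0, maxLength); // Limit length (UTF-16 code units)
}

const samples = {
    controlChars: sanitizeText('ab\x00c\x07d'),       // NUL and BEL are dropped
    whitespace: sanitizeText('line1\nline2\tend'),    // \n and \t are kept
    nonString: sanitizeText(12345),                   // non-strings become ''
    cappedLength: sanitizeText('x'.repeat(250000)).length,
    markup: sanitizeText('<b>bold</b>'),              // markup is NOT escaped
};
```

Because markup survives, callers would typically assign the result to `textContent` and avoid `innerHTML`.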

Layer 2: Error Boundaries with Fallbacks

Purpose: Contain failures and provide alternative execution paths

safeOperation(operation, fallback = null, context = 'Unknown') {
    try {
        return operation();
    } catch (error) {
        console.warn(`Operation failed in ${context}:`, error);

        // Fail Fast in development mode
        if (MARKITECT_STRICT_MODE) {
            console.error(`🚨 STRICT MODE: Operation failed in ${context}`);
            throw error; // Re-throw for immediate debugging
        }

        // Robust handling in production
        if (window.MarkitectDebugSystem) {
            window.MarkitectDebugSystem.addMessage(
                `Safe operation failed: ${error.message}`,
                'WARNING',
                'RobustnessSystem',
                { context, eventType: 'ERROR' }
            );
        }

        return typeof fallback === 'function' ? fallback() : fallback;
    }
}
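The dual-mode behavior can be seen in isolation with a small sketch. The factory `makeSafeOperation` and its `strict` and `log` options are hypothetical names introduced here so that both modes can run side by side without touching `window`; the control flow mirrors the `safeOperation` method above.

```javascript
// Hypothetical factory: builds a safeOperation bound to one mode.
function makeSafeOperation({ strict, log = () => {} }) {
    return function safeOperation(operation, fallback = null, context = 'Unknown') {
        try {
            return operation();
        } catch (error) {
            log(`Operation failed in ${context}: ${error.message}`);
            if (strict) throw error; // Fail Fast: surface the error immediately
            // Robust: call the fallback if it is a function, else return it as-is
            return typeof fallback === 'function' ? fallback() : fallback;
        }
    };
}

const production = makeSafeOperation({ strict: false });
const development = makeSafeOperation({ strict: true });

// The same failing operation under both modes
const parseBroken = (safeOp) =>
    safeOp(() => JSON.parse('{not json'), () => ({}), 'parseConfig');

const recovered = parseBroken(production); // falls back to an empty object

let thrown = null;
try {
    parseBroken(development);
} catch (error) {
    thrown = error; // the original SyntaxError from JSON.parse
}
```

Passing the fallback as a function defers its cost until a failure actually happens.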

Layer 3: Resource Limits & Timeout Protection

Purpose: Prevent resource exhaustion and infinite operations

// Element processing limits
const elements = Array.from(this.safeQuerySelectorAll(selector)); // NodeList has no slice()
const maxElements = 10000; // DoS protection
elements.slice(0, maxElements).forEach(processElement);

// Operation timeouts (for asynchronous work; a timer cannot interrupt a synchronous loop)
const timeout = setTimeout(() => {
    if (this.isOperationRunning) {
        console.warn('Operation timed out');
        this.cleanup();
    }
}, 30000); // 30 second safety timeout
// Call clearTimeout(timeout) once the operation completes normally
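As a rough sketch, the two limits can be packaged as small helpers. `capItems` and `startWatchdog` are hypothetical names, not existing Markitect utilities. The watchdog only helps with asynchronous work, since a timer callback cannot run while a synchronous loop is blocking the main thread.

```javascript
// Hypothetical helper: cap any iterable or array-like (NodeList, Set, Array).
function capItems(items, maxItems = 10000) {
    const all = Array.from(items);
    return {
        items: all.slice(0, maxItems),
        truncated: all.length > maxItems, // lets callers log or report the cut
    };
}

// Hypothetical helper: start a safety timer and return a cancel function.
function startWatchdog(ms, onTimeout) {
    const timer = setTimeout(onTimeout, ms);
    return () => clearTimeout(timer);
}

// Usage
const capped = capItems(new Set(['a', 'b', 'c']), 2);

const cancelWatchdog = startWatchdog(30000, () => {
    console.warn('Operation timed out');
});
// ... start the asynchronous operation here ...
cancelWatchdog(); // call when the operation settles, so the timer never fires late
```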

Layer 4: Graceful Degradation

Purpose: Maintain core functionality when non-essential features fail

// Dependency checking with fallbacks
initializeControl(controlClass, controlName, icon = '🔧') {
    if (!controlClass) {
        this.safeLog(`${controlName} class not available, skipping`, 'WARNING');
        return null;
    }

    try {
        const instance = new controlClass();
        return instance.createControl() ? instance : null;
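    } catch (error) {
        // Log before degrading so the failure is not silent
        this.safeLog(`${controlName} failed to initialize: ${error.message}`, 'WARNING');
        // Create minimal fallback for essential controls
        if (controlName === 'StatusControl') {
            return this.createFallbackControl(controlName, icon);
        }
        return null;
    }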
}
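The four possible outcomes of this pattern (class missing, class working, optional control broken, essential control broken) can be illustrated with stand-in classes. Everything below other than the `StatusControl` name is hypothetical: `TocControl`, `WorkingControl`, `BrokenControl`, and the `createFallback` callback exist only for this sketch.

```javascript
const ESSENTIAL_CONTROLS = new Set(['StatusControl']);

// Standalone variant of initializeControl; the fallback factory is injected.
function initializeControl(controlClass, controlName, createFallback) {
    if (!controlClass) return null; // dependency missing: skip the control
    try {
        const instance = new controlClass();
        return instance.createControl() ? instance : null;
    } catch (error) {
        // Only essential controls get a minimal replacement
        return ESSENTIAL_CONTROLS.has(controlName) ? createFallback(controlName) : null;
    }
}

// Stand-in classes for the demonstration
class WorkingControl {
    createControl() { return true; }
}
class BrokenControl {
    constructor() { throw new Error('missing dependency'); }
}
const createFallback = (name) => ({ name, fallback: true });

const outcomes = {
    missing: initializeControl(undefined, 'TocControl', createFallback),
    working: initializeControl(WorkingControl, 'TocControl', createFallback),
    brokenOptional: initializeControl(BrokenControl, 'TocControl', createFallback),
    brokenEssential: initializeControl(BrokenControl, 'StatusControl', createFallback),
};
```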

Layer 5: Safe DOM Manipulation

Purpose: Protect against DOM-related failures and validate operations

safeQuerySelector(selector, parent = document) {
    try {
        if (!parent || !parent.querySelector) {
            return null;
        }
        return parent.querySelector(selector);
    } catch (error) {
        console.warn(`Invalid selector: ${selector}`, error);
        return null;
    }
}

validateElement(element) {
    return element &&
           element.nodeType === Node.ELEMENT_NODE &&
           element.isConnected &&
           !element.closest('.control-panel'); // Avoid control elements
}
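`safeQuerySelectorAll` is used in Layer 3 but not shown in this record. One plausible shape, offered as an assumption rather than the actual implementation, mirrors `safeQuerySelector` and always returns a real array so that `slice` and `forEach` are safe to call on the result.

```javascript
// Sketch of a safeQuerySelectorAll that never throws and never returns null.
// globalThis.document is undefined outside a browser, which yields [].
function safeQuerySelectorAll(selector, parent = globalThis.document) {
    try {
        if (!parent || typeof parent.querySelectorAll !== 'function') {
            return [];
        }
        // Convert the NodeList so array methods such as slice() are available
        return Array.from(parent.querySelectorAll(selector));
    } catch (error) {
        console.warn(`Invalid selector: ${selector}`, error);
        return [];
    }
}

// Usage with a stand-in parent that behaves like a DOM node
const fakeParent = {
    querySelectorAll(selector) {
        if (selector === '##bad') throw new SyntaxError('not a valid selector');
        return ['first', 'second'];
    },
};

const found = safeQuerySelectorAll('p', fakeParent);
const invalid = safeQuerySelectorAll('##bad', fakeParent);
const noParent = safeQuerySelectorAll('p', null);
```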

Rationale

Why the Robustness Principle?

Systematic Approach: Unlike ad-hoc try-catch blocks, provides consistent protection patterns
Multiple Defense Layers: Each layer catches different types of failures
Proactive Protection: Prevents problems before they occur rather than just reacting
Maintainable Code: Clear patterns and utility functions reduce repetition
Production Ready: Designed for real-world environments with unpredictable conditions
Performance Conscious: Adds protection at a moderate, bounded overhead (see Consequences)

Why Not Try-Catch Everything?

Maintenance Burden: Scattered exception handling is hard to maintain
Inconsistent Coverage: Easy to miss critical paths
Poor Error Recovery: Just catching errors doesn't provide meaningful fallbacks
Performance Impact: Exception handling has overhead when overused

Why Not Reactive Patterns?

Complexity: RxJS adds significant learning curve and bundle size
Overkill: Our error handling needs don't require reactive streams
Library Dependency: Adds external dependency for core functionality
Framework Lock-in: Ties architecture to specific programming paradigm

Implementation Details

Core Protection Utilities

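// Central error handling system (method bodies omitted; see Layers 1, 2, and 5)
const RobustnessSystem = {
    safeOperation(operation, fallback, context) { /* Layer 2 */ },
    safeQuerySelector(selector, parent) { /* Layer 5 */ },
    safeQuerySelectorAll(selector, parent) { /* Layer 5 */ },
    validateElement(element) { /* Layer 5 */ },
    sanitizeText(text) { /* Layer 1 */ },
    safeTextExtraction(element) { /* Layer 1 */ }
};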

Integration Pattern

// Before: Fragile operation
function processDocument() {
    const stats = calculateStats(); // Could crash
    updateUI(stats); // Could crash
    saveToStorage(stats); // Could crash
}

// After: Robust operation (a method on the control, so `this` is bound)
processDocument() {
    const stats = this.safeOperation(
        () => this.calculateStats(),
        () => this.getDefaultStats(), // Evaluated only if calculateStats throws
        'calculateStats'
    );

    this.safeOperation(
        () => this.updateUI(stats),
        null,
        'updateUI'
    );

    this.safeOperation(
        () => this.saveToStorage(stats),
        null,
        'saveToStorage'
    );
}

Resource Protection Examples

// Memory limits
const characters = Math.min(sectionText.length, 1000000); // Cap at 1,000,000 characters

// Processing limits
Array.from(elements).slice(0, maxElements).forEach(processElement);

// Time limits
const timeout = setTimeout(cleanup, OPERATION_TIMEOUT);

Consequences

Positive

✅ System Stability: Individual component failures don't crash the entire application
✅ Security Hardening: Multiple layers protect against various attack vectors
✅ User Experience: Graceful degradation maintains usability during failures
✅ Developer Confidence: Clear patterns reduce fear of production failures
✅ Debugging Capability: Detailed error context and logging
✅ Maintenance Reduction: Fewer emergency fixes for production issues

Negative

⚠️ Performance Overhead: Additional validation and error checking adds some cost
⚠️ Code Complexity: More defensive code requires more careful implementation
⚠️ Initial Development Time: Building robust systems takes longer upfront

Mitigation Strategies

Performance: Use efficient validation techniques and avoid redundant checks
Complexity: Provide clear utility functions and documentation
Development Time: Treat as investment in reduced maintenance and debugging time

Testing Strategy

Robustness Testing Categories

Malicious Input Testing: XSS attempts, oversized data, invalid formats
Resource Exhaustion Testing: Large datasets, memory pressure scenarios
Dependency Failure Testing: Missing libraries, network failures
DOM Manipulation Edge Cases: Invalid selectors, disconnected elements
Timeout Scenarios: Long-running operations, infinite loops
Error Cascade Testing: Multiple simultaneous failures

Automated Testing

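// Example robustness test
describe('Robustness Principle', () => {
    it('should handle malicious text input safely', () => {
        const element = document.createElement('p');
        element.textContent = '<script>alert("xss")</script>\x07'.repeat(10000);
        document.body.appendChild(element); // validateElement requires a connected element

        const result = statusControl.safeTextExtraction(element);

        expect(result.length).toBeLessThanOrEqual(100000); // Respects limits
        expect(result).not.toMatch(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/); // Control chars stripped
        // Markup is returned as inert text; render it with textContent, never innerHTML
    });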

    it('should gracefully handle missing dependencies', () => {
        delete window.StatusControl;
        const result = MarkitectMain.initialize();

        expect(result).toBeDefined(); // Doesn't crash
        expect(window.statusControl).toBeNull(); // Graceful degradation
    });
});
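For the Error Cascade category, a framework-free check can confirm that several failures in one pass stay contained. The sketch below assumes production-mode behavior only and uses a trimmed stand-in for `safeOperation`; the step names come from the integration pattern, and the failing bodies are invented for the test.

```javascript
// Trimmed production-mode stand-in for safeOperation (no logging, no strict mode)
function safeOperation(operation, fallback = null) {
    try {
        return operation();
    } catch (error) {
        return typeof fallback === 'function' ? fallback() : fallback;
    }
}

// Two steps fail in the same pass; the third must still run.
const steps = [
    { name: 'calculateStats', run: () => { throw new Error('stats failed'); }, fallback: { words: 0 } },
    { name: 'updateUI', run: () => { throw new Error('ui failed'); }, fallback: null },
    { name: 'saveToStorage', run: () => 'saved', fallback: null },
];

const results = steps.map((step) => ({
    name: step.name,
    value: safeOperation(step.run, step.fallback),
}));
```

Each entry in `results` holds either the real return value or the fallback, so a later step never observes an exception from an earlier one.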

Future Considerations

Potential Enhancements

Metrics Collection: Track robustness events for system health monitoring
Adaptive Thresholds: Dynamic resource limits based on client capabilities
Recovery Strategies: More sophisticated fallback mechanisms
Performance Monitoring: Track overhead of robustness measures
User Feedback: Notify users when degraded functionality is active
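As one possible starting point for Metrics Collection, robustness events could be counted per context and read out periodically. `createRobustnessMetrics` and its methods are hypothetical names for this sketch; nothing like it exists in the codebase described above.

```javascript
// Hypothetical in-memory counter for robustness events.
function createRobustnessMetrics() {
    const counts = new Map();
    return {
        // Call from the catch branch of safeOperation with its context string
        record(context) {
            counts.set(context, (counts.get(context) || 0) + 1);
        },
        // Plain-object copy, suitable for logging or sending to a backend
        snapshot() {
            return Object.fromEntries(counts);
        },
    };
}

// Usage
const metrics = createRobustnessMetrics();
metrics.record('calculateStats');
metrics.record('calculateStats');
metrics.record('updateUI');
const report = metrics.snapshot();
```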

Evolution Path

The Robustness Principle provides foundation for:

Service Worker Integration: Offline robustness capabilities
Web Worker Offloading: Move intensive operations off main thread
Progressive Enhancement: Advanced features for capable browsers
Error Analytics: Aggregate error patterns for system improvements

Approval

Decided by: Claude Code Development Team
Date: 2025-11-11
Context: Production hardening and security enhancement
Next Review: After 6 months of production use or major security incidents
