kontextual-engine — Project Scope Suggestions

Research date: 2026-05-05
Purpose: convert market exploration into concrete scope guidance for the project and its INTENT.md.


kontextual-engine should be defined as:

A headless knowledge operations engine for turning heterogeneous information assets into persistent, contextual, governed, retrievable, transformable, and agent-operable knowledge.

This definition is broad enough to support CMS, DMS, ECM, file-service, knowledge-base, research, and AI-assistant use cases, but narrow enough to avoid becoming an unfocused clone of mature enterprise suites.


The project should start from the customer problem:

Corporate customers accumulate valuable information across files, folders, documents, records, datasets, applications, generated AI outputs, and knowledge bases. This information is economically underused because it is fragmented, inconsistently structured, weakly contextualized, hard to govern, difficult to retrieve, and unsafe to automate without explicit controls.

kontextual-engine addresses this demand by giving knowledge assets:

  • stable identity
  • metadata and context
  • relationships
  • provenance
  • lifecycle state
  • permissions and governance
  • search and retrieval
  • transformation workflows
  • API access
  • agent-safe operation

Strategic scope

In scope

kontextual-engine should provide reusable backend capabilities for:

  • ingesting heterogeneous information assets
  • representing assets as persistent entities
  • normalizing and extracting useful structure
  • assigning metadata, relationships, provenance, and lifecycle state
  • retrieving assets through keyword, filtered, semantic, and contextual search
  • transforming content into summaries, extracts, structured views, reports, and generated artifacts
  • orchestrating recurring knowledge workflows
  • exposing APIs for applications, automation systems, and AI agents
  • enforcing permissions, traceability, review, and governance controls

Out of scope as core identity

kontextual-engine should not define itself as:

  • a finished end-user CMS
  • a website builder
  • a generic office suite
  • a sync-and-share client
  • a simple file browser
  • a markdown-only tool
  • a pure vector database
  • a generic chatbot over documents
  • a single-domain knowledge base
  • a one-off automation script collection
  • a full replacement for mature ECM/DMS/records systems in its first maturity phases

These product categories can be supported at the edges, but they should not define the engine.


1. Context-first knowledge identity

Competitors often anchor identity in repositories, paths, records, pages, documents, or content models. kontextual-engine can differentiate by making identity more semantic and operational.

Recommended design focus:

  • stable asset IDs
  • source IDs and source aliases
  • semantic type
  • business context
  • relationship graph
  • provenance chain
  • lifecycle state
  • derived artifact lineage
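The identity attributes listed above can be sketched as a small in-memory registry. This is a minimal illustration, not the project's data model: the names `Asset`, `AssetRegistry`, `LifecycleState`, and every field and method on them are hypothetical, as are the three lifecycle states. The point it shows is that the asset ID is generated once and never derived from a path, so a source can move or gain aliases without changing identity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
from uuid import uuid4


class LifecycleState(Enum):
    # Hypothetical states; a real deployment would define its own.
    DRAFT = "draft"
    ACTIVE = "active"
    ARCHIVED = "archived"


@dataclass
class Asset:
    semantic_type: str                       # e.g. "contract", "meeting-notes"
    source_id: str                           # identifier in the originating system
    business_context: dict = field(default_factory=dict)
    source_aliases: set = field(default_factory=set)
    lifecycle_state: LifecycleState = LifecycleState.DRAFT
    derived_from: list = field(default_factory=list)   # asset IDs this one was derived from
    # Stable ID: generated once, independent of path or source location.
    asset_id: str = field(default_factory=lambda: uuid4().hex)


class AssetRegistry:
    def __init__(self) -> None:
        self._assets = {}        # asset_id -> Asset
        self._by_source = {}     # source_id or alias -> asset_id
        self._relations = set()  # (subject_id, predicate, object_id) triples

    def register(self, asset: Asset) -> str:
        self._assets[asset.asset_id] = asset
        for ref in {asset.source_id, *asset.source_aliases}:
            self._by_source[ref] = asset.asset_id
        return asset.asset_id

    def add_alias(self, asset_id: str, alias: str) -> None:
        # A moved or renamed source keeps the same asset identity.
        self._assets[asset_id].source_aliases.add(alias)
        self._by_source[alias] = asset_id

    def resolve(self, source_ref: str) -> Optional[Asset]:
        asset_id = self._by_source.get(source_ref)
        return self._assets.get(asset_id) if asset_id else None

    def relate(self, subject_id: str, predicate: str, object_id: str) -> None:
        self._relations.add((subject_id, predicate, object_id))

    def related(self, subject_id: str, predicate: str) -> list:
        return sorted(
            o for s, p, o in self._relations if s == subject_id and p == predicate
        )
```

Storing relationships as plain triples keeps the sketch short; a production engine would more likely back them with a graph or relational store.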

2. Traceable transformations

Many systems generate summaries or extract fields, but the strategic value lies in knowing where derived knowledge came from and how it was produced.

Recommended design focus:

  • transformations as first-class operations
  • explicit input/output asset links
  • versioned prompts/configuration where applicable
  • transformation metadata
  • review status
  • reproducibility hooks
  • rollback or supersession semantics
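One way to make transformations first-class is to record each run as its own object that links inputs to outputs and fingerprints the configuration used. The sketch below is illustrative only: `TransformationRecord`, `TransformationLog`, `config_fingerprint`, and the `"pending"` review status are assumed names, and hashing a canonical JSON form is just one possible reproducibility hook.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def config_fingerprint(config: dict) -> str:
    # Canonical JSON (sorted keys) so equal configs hash identically.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class TransformationRecord:
    operation: str              # e.g. "summarize", "extract-fields"
    input_asset_ids: tuple      # explicit links to source assets
    output_asset_id: str        # the derived artifact
    config: dict                # prompt version, model settings, etc.
    config_hash: str = field(init=False)
    review_status: str = "pending"          # hypothetical status value
    superseded_by: Optional[str] = None     # output ID of the replacing run
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        self.config_hash = config_fingerprint(self.config)


class TransformationLog:
    def __init__(self) -> None:
        self._records = {}  # output_asset_id -> TransformationRecord

    def record(self, rec: TransformationRecord) -> None:
        self._records[rec.output_asset_id] = rec

    def supersede(self, old_output_id: str, new_rec: TransformationRecord) -> None:
        # Supersession keeps the old record and points it at its replacement.
        self._records[old_output_id].superseded_by = new_rec.output_asset_id
        self.record(new_rec)

    def lineage(self, output_asset_id: str) -> tuple:
        return self._records[output_asset_id].input_asset_ids

    def current(self, output_asset_id: str) -> TransformationRecord:
        # Follow the supersession chain to the latest record.
        rec = self._records[output_asset_id]
        while rec.superseded_by is not None:
            rec = self._records[rec.superseded_by]
        return rec
```

Superseding rather than overwriting means an older summary stays addressable and auditable even after a newer prompt version replaces it.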

3. Agent-safe operation

Agents should not be treated merely as chat UIs. Agents need permissioned, explicit, auditable operations.

Recommended design focus:

  • scoped tool/API permissions
  • actor identity for human, service, and agent actors
  • precondition checks
  • dry-run support
  • review gates for risky actions
  • audit logs
  • reversible changes where possible
  • policy violation detection
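The controls above can be combined in a single gateway through which every actor, human or agent, must pass. The following is a rough sketch under stated assumptions: `Actor`, `OperationGateway`, the operation names `asset.tag` and `asset.delete`, and the four outcome strings are all invented for illustration, and the rule that only agents need approval for risky operations is one possible policy among many.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical operation names; deletion is treated as the risky one.
KNOWN_OPERATIONS = frozenset({"asset.tag", "asset.delete"})
RISKY_OPERATIONS = frozenset({"asset.delete"})


@dataclass(frozen=True)
class Actor:
    actor_id: str
    kind: str            # "human", "service", or "agent"
    scopes: frozenset    # operations this actor may request


class OperationGateway:
    def __init__(self, assets: dict) -> None:
        self.assets = assets   # asset_id -> metadata dict
        self.audit_log = []    # every request is logged, whatever the outcome

    def execute(self, actor: Actor, operation: str, asset_id: str,
                payload: Optional[dict] = None, dry_run: bool = False,
                approved_by: Optional[str] = None) -> str:
        outcome = self._decide(actor, operation, asset_id, dry_run, approved_by)
        if outcome == "applied":
            self._apply(operation, asset_id, payload or {})
        self.audit_log.append({
            "actor": actor.actor_id, "kind": actor.kind,
            "operation": operation, "asset_id": asset_id,
            "outcome": outcome, "approved_by": approved_by,
        })
        return outcome

    def _decide(self, actor, operation, asset_id, dry_run, approved_by) -> str:
        if operation not in KNOWN_OPERATIONS or operation not in actor.scopes:
            return "denied"            # scoped permission check
        if asset_id not in self.assets:
            return "denied"            # precondition: target must exist
        if (operation in RISKY_OPERATIONS and actor.kind == "agent"
                and approved_by is None):
            return "needs_review"      # review gate for risky agent actions
        if dry_run:
            return "dry_run"           # would succeed, but nothing changes
        return "applied"

    def _apply(self, operation: str, asset_id: str, payload: dict) -> None:
        if operation == "asset.tag":
            tags = self.assets[asset_id].setdefault("tags", set())
            tags.update(payload.get("tags", ()))
        elif operation == "asset.delete":
            del self.assets[asset_id]
```

Because the decision is computed before anything is applied, a dry run exercises exactly the same permission and precondition checks as a real call.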

4. Composable utility layer

CMS, DMS, ECM, file-service, knowledge-base, research, and AI-assistant capabilities should be framed as utilities built on the engine.

Recommended design focus:

  • APIs before UI
  • workflows before monolithic apps
  • exportability
  • integration adapters
  • schema extensibility
  • domain-specific extensions

5. Governance without becoming bureaucratic

Governance should be a capability, not a drag on utility.

Recommended design focus:

  • lightweight but explicit permissions
  • lifecycle state
  • review state
  • retention and archival hooks
  • audit log by default
  • policy-aware retrieval and transformation
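Policy-aware retrieval can stay lightweight if permission and lifecycle filters run before scoring, so restricted or archived content never influences the result list. The sketch below assumes a hypothetical asset shape (dictionaries with `asset_id`, `text`, `allowed_roles`, and `lifecycle_state` keys) and uses naive term counting as a stand-in for a real keyword or semantic index.

```python
def retrieve(query, assets, actor_roles, include_archived=False):
    """Return asset IDs matching the query that the actor is allowed to see.

    `assets` is an iterable of dicts; `actor_roles` and each asset's
    `allowed_roles` are sets of role names (all hypothetical shapes).
    """
    terms = query.lower().split()
    scored = []
    for asset in assets:
        # Policy filters first: permissions, then lifecycle state.
        if not actor_roles & asset["allowed_roles"]:
            continue
        if asset["lifecycle_state"] == "archived" and not include_archived:
            continue
        # Naive relevance: total occurrences of the query terms.
        text = asset["text"].lower()
        score = sum(text.count(term) for term in terms)
        if score:
            scored.append((-score, asset["asset_id"]))
    # Highest score first; ties broken by asset ID for stable output.
    return [asset_id for _, asset_id in sorted(scored)]
```

The same ordering of checks applies to transformations: filter the candidate inputs by policy first, then hand only the permitted assets to the transformation job.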

Suggested architecture-level scope boundaries

| Layer | Should kontextual-engine own it? | Notes |
| --- | --- | --- |
| Asset registry | Yes | Stable identity and core metadata should be central. |
| Source connectors | Yes, selectively | Build common connectors and allow extension. Do not try to support every enterprise app initially. |
| Storage abstraction | Yes | Assets may live in external systems, but the engine needs a durable representation. |
| Extraction / normalization | Yes | Required for search, metadata, AI, and transformations. |
| Search index | Yes or integrated | The engine must provide retrieval; it may use external search/vector systems internally. |
| Relationship graph | Yes | Core differentiator. |
| Workflow engine | Yes, initially simple | Needed for recurring knowledge operations and traceable transformations. |
| Permissions model | Yes | Must exist from the beginning even if initially simple. |
| Audit/provenance | Yes | Core trust capability. |
| End-user workspace UI | No, optional consumer | Useful later, but not the engine's identity. |
| Visual website CMS | No, optional extension | Publishing can be supported through APIs. |
| File sync client | No | Avoid competing directly with Box, Dropbox, OneDrive, and Egnyte. |
| Full records-management suite | Not initially | Support hooks and lifecycle state; specialized compliance can mature later. |
| General vector database | No | Use or integrate with search/vector systems; do not define the project as one. |

The first strong wedge should be:

Ingest a heterogeneous project or organizational knowledge corpus, assign stable asset identities, extract metadata and structure, build contextual relationships, support governed retrieval, and produce traceable derived artifacts through API-accessible workflows.

This wedge demonstrates the project's essence without requiring a full enterprise suite.

MVP capability package

  1. Asset registry
  2. Source ingestion for local files, markdown, PDFs, and office-like documents
  3. Metadata extraction and manual metadata override
  4. Stable source/provenance tracking
  5. Search and filtered retrieval
  6. Relationship model
  7. Traceable transformation jobs
  8. API access
  9. Basic permission model
  10. Audit log
  11. Agent-safe operation endpoints

MVP demonstration scenarios

  • “Turn a project folder into a contextual knowledge space.”
  • “Find and cite relevant knowledge assets across mixed formats.”
  • “Generate a traceable summary or report from selected sources.”
  • “Classify and enrich assets through a reviewable workflow.”
  • “Expose project knowledge to an agent through controlled APIs.”

Use language like:

  • “headless knowledge operations engine”
  • “heterogeneous information assets”
  • “persistent identity”
  • “contextual structure”
  • “governed access”
  • “retrievable meaning”
  • “traceable transformation”
  • “workflow-ready and agent-operable interfaces”

Avoid language like:

  • “runtime substrate” unless clarified for external readers
  • “system layer” without a self-contained explanation
  • references to other internal projects
  • “not the tooling layer” unless the tooling is explained generically
  • “AI-first” without grounding it in concrete utility

kontextual-engine exists to operate knowledge assets across heterogeneous sources by giving them durable identity, contextual structure, governed access, retrievable meaning, traceable transformation, and automation-ready interfaces.

Expanded version:

It supports the utility demand behind CMS, DMS, ECM, file-service, knowledge-base, research, and AI-assistant systems without becoming any one of those products. Its core role is to provide reusable backend capabilities for making fragmented information operational.


Risks to avoid

Risk 1: Becoming too broad

Trying to be a CMS, DMS, ECM, file server, RAG system, intranet, and workflow suite at the same time will dilute implementation quality.

Mitigation:

  • Frame these as utility domains supported by a shared engine.
  • Prioritize identity, context, retrieval, transformations, workflows, and governance.

Risk 2: Becoming “chat over files”

Many AI knowledge products reduce to a chatbot over indexed documents.

Mitigation:

  • Make traceability, lifecycle state, transformations, review, and workflows core.

Risk 3: Ignoring permissions until later

Permission retrofits are difficult and dangerous.

Mitigation:

  • Model actors, roles, permissions, and audit from the beginning.

Risk 4: Overfitting to one content format

The project should handle markdown well if useful, but the market demand is heterogeneous.

Mitigation:

  • Treat markdown, PDFs, documents, datasets, and records as asset types, not the system identity.

Risk 5: No clear buyer/use-case anchor

A general knowledge engine can sound abstract.

Mitigation:

  • Anchor early demos in concrete use cases: AI-ready project corpus, document workflow automation, governed retrieval, traceable report generation, contextual knowledge base.

Phase 1 — Engine credibility

  • asset registry
  • ingestion
  • metadata
  • provenance
  • search
  • API
  • audit log

Phase 2 — Knowledge operation

  • relationships
  • transformations
  • workflow jobs
  • review state
  • permissions
  • derived artifacts

Phase 3 — AI and agent operation

  • grounded answers
  • citations
  • agent-safe APIs
  • dry-run and review gates
  • evaluation metrics
  • prompt/config provenance

Phase 4 — Enterprise hardening

  • advanced governance
  • retention and legal hold hooks
  • scaling and performance
  • observability
  • connector ecosystem
  • export and migration tooling
