Preface: Methodology & Syllabus

Welcome to the CodaCite Textbook, the authoritative technical guide to the GraphRAG-based Document Intelligence platform. This documentation is designed to be read sequentially, progressing from high-level architectural intent to granular implementation details.

The CodaCite Manifesto

In an era of increasingly opaque AI systems, CodaCite is built upon five non-negotiable pillars:

  1. Absolute Provenance: Every AI-generated claim is anchored to a specific character span (start_char, end_char) in the source PDF. No blind trust.
  2. Local Sovereignty: All inference (LLM, embeddings, OCR) runs on-premises in Podman containers, so documents and model outputs never leave your infrastructure and you retain full ownership of your data.
  3. Graph-Augmented Retrieval: Relationships are not just stored; they are traversed through multi-hop reasoning to surface context that traditional vector search misses.
  4. Self-Correction: The system does not merely "search"; it reasons about the quality of its own retrieval, rewriting queries and grading retrieved context until the result meets engineering-grade precision.
  5. Anaphora Resolution: Coreference resolution maps pronouns to the entities they refer to, preserving semantic continuity across document fragments.
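To make the provenance pillar concrete, here is a minimal sketch of a citation record anchored by `start_char`/`end_char`. It assumes the offsets index into the text extracted from the PDF and form a half-open span `[start_char, end_char)`; the `Citation` class, `doc_id` field, and `verify_quote` helper are hypothetical illustrations, not CodaCite's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical provenance record: a claim anchored to a span of source text.

    Assumption: offsets index into the text extracted from the PDF and
    follow the half-open convention [start_char, end_char).
    """

    doc_id: str
    start_char: int  # inclusive
    end_char: int  # exclusive

    def resolve(self, source_text: str) -> str:
        """Return the exact source span this citation points to."""
        if not (0 <= self.start_char < self.end_char <= len(source_text)):
            raise ValueError(
                f"span [{self.start_char}, {self.end_char}) is out of range"
            )
        return source_text[self.start_char:self.end_char]


def verify_quote(citation: Citation, source_text: str, quoted: str) -> bool:
    """Hypothetical check: the quoted evidence must match the cited span exactly."""
    return citation.resolve(source_text) == quoted


if __name__ == "__main__":
    source = "The pump operates at 40 bar. It must be inspected monthly."
    cite = Citation(doc_id="manual.pdf", start_char=0, end_char=28)
    print(cite.resolve(source))
    print(verify_quote(cite, source, "The pump operates at 40 bar."))
```

Because the span is resolved against the source text rather than trusted from the model's output, a claim whose quoted evidence does not match its offsets can be rejected mechanically.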

The Syllabus

This "textbook" is organized into the following chapters:


[!NOTE] This documentation is a living artifact. All architectural changes must be reflected here to maintain the system's "textbook" integrity.