Preface: Methodology & Syllabus

Welcome to the CodaCite Textbook, the authoritative technical guide for the GraphRAG-based Document Intelligence platform. It is designed to be read sequentially, moving from high-level architectural intent to granular implementation details.

The CodaCite Manifesto

In an era of increasingly opaque AI systems, CodaCite is built upon five non-negotiable pillars:

Absolute Provenance: Every AI-generated claim is anchored to a specific character range (start_char, end_char) in the source PDF. No blind trust.
Local Sovereignty: All inference (LLM, Embeddings, OCR) runs on-premises in Podman containers, so no document data leaves your infrastructure and you retain full ownership of it.
Graph-Augmented Retrieval: Relationships are not just stored; they are traversed with multi-hop reasoning to surface context that traditional vector search misses.
Self-Correction: The system does not merely "search"; it reasons about its own retrieval quality, rewriting queries and grading the retrieved context until that context is judged precise enough to answer from.
Anaphora Resolution: Coreference resolution maps pronouns back to the entities they refer to, preserving semantic continuity across document fragments.
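
To make the Absolute Provenance pillar concrete, here is a minimal Python sketch of a claim anchored to a character range. Only the field names start_char and end_char come from the text above. The Citation class, its resolve method, the sample text, and the choice to treat end_char as exclusive are illustrative assumptions, not CodaCite's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical record linking one claim to a span of the source text."""

    claim: str
    start_char: int  # offset of the first cited character
    end_char: int    # assumed exclusive, matching Python slice semantics

    def resolve(self, source_text: str) -> str:
        """Return the exact source passage that backs the claim."""
        if not (0 <= self.start_char < self.end_char <= len(source_text)):
            raise ValueError("citation span falls outside the source text")
        return source_text[self.start_char:self.end_char]


# Sample text standing in for the text extracted from a source PDF.
source_text = "The pump operates at 40 psi. It must be inspected yearly."

start = source_text.index("40 psi")
citation = Citation(
    claim="The pump's operating pressure is 40 psi.",
    start_char=start,
    end_char=start + len("40 psi"),
)

print(citation.resolve(source_text))
```

Running the sketch prints the cited passage, "40 psi". A reader or a test can therefore check any claim against the original text, and a span that falls outside the document is rejected instead of being silently accepted.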

The Syllabus

This "textbook" is organized into the following chapters:

Chapter 1: System Architecture — Explores the "Vertical Slice" methodology and our modular monolith design.
Chapter 2: The Data Ingestion Lifecycle — A deep dive into the 8-phase transformation from raw text to structured knowledge.
Chapter 3: Search and Retrieval Mechanics — Details the physics of hybrid search and the LangGraph self-correction loop.
Chapter 4: Infrastructure and Foundation — Examines the role of SurrealDB and local model quantization.
Chapter 5: The User Interface — Discusses the UX philosophy of Notebook-scoped analysis.
Chapter 6: Operations & Quality Gates — Details the CI/CD pipeline and container orchestration.
Appendix A: Developer Context — Implementation heuristics and troubleshooting for AI agents.

[!NOTE] This documentation is a living artifact. All architectural changes must be reflected here to maintain the system's "textbook" integrity.