Preface: Methodology & Syllabus
Welcome to the CodaCite Textbook, the authoritative technical guide to the GraphRAG-based Document Intelligence platform. This documentation is designed to be read sequentially: it moves step by step from high-level architectural intent to granular implementation details.
The CodaCite Manifesto
In an era of increasingly opaque AI systems, CodaCite is built upon five non-negotiable pillars:
- Absolute Provenance: Every AI-generated claim is anchored to a specific character offset range (`start_char`, `end_char`) in the source PDF. No blind trust.
- Local Sovereignty: All inference (LLM, embeddings, OCR) is executed on-premises via Podman, ensuring zero data leakage and 100% data ownership.
- Graph-Augmented Retrieval: Relationships are not just stored; they are traversed through multi-hop reasoning to surface context that traditional vector search misses.
- Self-Correction: The system does not merely search; it evaluates the quality of its own retrieval, grading the retrieved context and rewriting queries until the context reaches engineering-grade precision.
- Anaphora Resolution: Advanced coreference resolution preserves semantic continuity across document fragments by mapping pronouns to the entities they refer to.
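To make the Absolute Provenance pillar concrete, here is a minimal sketch of how a character-offset anchor can be checked against its source. It assumes that `start_char` and `end_char` are half-open, Python-style indices into the text extracted from the PDF; the `Citation` class and `verify_citation` function are hypothetical names for illustration, not CodaCite's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical anchor tying a claim to the half-open span [start_char, end_char)."""
    document_id: str
    start_char: int
    end_char: int
    quoted_text: str


def verify_citation(citation: Citation, source_text: str) -> bool:
    """Return True only if the cited span exists and reproduces the quote verbatim."""
    # Reject spans that are empty, inverted, or fall outside the extracted text.
    if not 0 <= citation.start_char < citation.end_char <= len(source_text):
        return False
    # Slicing the source with the stored offsets must yield exactly the quoted text.
    return source_text[citation.start_char:citation.end_char] == citation.quoted_text


source = "The pump must be inspected every 500 operating hours."
print(verify_citation(Citation("manual.pdf", 4, 8, "pump"), source))  # True
print(verify_citation(Citation("manual.pdf", 4, 9, "pump"), source))  # False (span includes a trailing space)
```

Because the check is a plain string comparison, an offset that drifts by even one character is rejected rather than silently accepted.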
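The multi-hop traversal behind Graph-Augmented Retrieval can be illustrated as a breadth-first expansion from the entities matched by a query, bounded by a hop budget. This is a hypothetical sketch: an in-memory adjacency dict stands in for the real graph store, and `multi_hop_neighbors` and the toy entities are illustrative only.

```python
from collections import deque


def multi_hop_neighbors(graph: dict[str, list[str]], seeds: list[str], max_hops: int) -> dict[str, int]:
    """Expand breadth-first from seed entities; map each reachable entity to its hop distance."""
    distances = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distances[node] >= max_hops:
            continue  # do not expand past the hop budget
        for neighbor in graph.get(node, []):
            if neighbor not in distances:  # first visit in BFS is the shortest hop count
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Toy graph: a chunk about "Alice" never mentions "Project Falcon" directly,
# so a pure vector lookup on "Alice" would miss it; two hops reach it.
graph = {
    "Alice": ["Acme Corp"],
    "Acme Corp": ["Project Falcon"],
    "Project Falcon": ["Berlin"],
}
print(multi_hop_neighbors(graph, ["Alice"], max_hops=2))
# {'Alice': 0, 'Acme Corp': 1, 'Project Falcon': 2}
```

Bounding the traversal by hop count keeps the added context focused: each extra hop widens recall but also pulls in more loosely related entities.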
The Syllabus
This "textbook" is organized into the following chapters:
- Chapter 1: System Architecture — Explores the "Vertical Slice" methodology and our modular monolith design.
- Chapter 2: The Data Ingestion Lifecycle — A deep dive into the 8-phase transformation from raw text to structured knowledge.
- Chapter 3: Search and Retrieval Mechanics — Details the mechanics of hybrid search and the LangGraph self-correction loop.
- Chapter 4: Infrastructure and Foundation — Examines the role of SurrealDB and local model quantization.
- Chapter 5: The User Interface — Discusses the UX philosophy of Notebook-scoped analysis.
- Chapter 6: Operations & Quality Gates — Details the CI/CD pipeline and container orchestration.
- Appendix A: Developer Context — Implementation heuristics and troubleshooting for AI agents.
> [!NOTE]
> This documentation is a living artifact. All architectural changes must be reflected here to maintain the system's "textbook" integrity.