Document Processing

Extract and prepare documents for RAG systems.

Transform raw documents into structured, searchable chunks ready for embedding.


Document Parsing

Level: Beginner

Extract text, tables, and structure from PDFs, HTML, and Word documents, and use OCR for scanned documents.

Topics: pdf-parsing • ocr • html-extraction • table-extraction
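
As a small illustration of the html-extraction topic, the sketch below pulls visible text out of an HTML string using only Python's standard-library `html.parser`. The names `TextExtractor` and `extract_text` are hypothetical, chosen for this example. It assumes reasonably well-formed HTML and skips only `<script>` and `<style>` contents; real pages often need a dedicated library and extra handling for tables and layout.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []       # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the visible text fragments of `html`, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Calling `extract_text` on a page string returns its text fragments joined by newlines, which can then be passed on to a chunking step.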


Chunking Strategies

Level: Beginner

Split documents into chunks sized to balance context and specificity for better retrieval.

Topics: chunk-size • overlap • semantic-chunking • document-structure
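
To make the chunk-size and overlap topics concrete, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_text` and its default values are assumptions for illustration, and sizes are measured in characters for simplicity; many pipelines count tokens instead, or split on document structure or semantic boundaries.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters, so content near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Larger chunks keep more surrounding context per chunk, while smaller chunks make each retrieved piece more specific; the overlap reduces the chance that a sentence is cut off from the context it needs.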