Document Processing
Extract and prepare documents for RAG systems.
Transform raw documents into structured, searchable chunks ready for embedding.
Document Parsing
Level: Beginner
Extract text, tables, and structure from PDFs, HTML, and Word documents, and apply OCR to scanned documents.
Topics: pdf-parsing • ocr • html-extraction • table-extraction
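As a rough illustration of the html-extraction topic, the sketch below pulls visible text out of an HTML string using only Python's standard-library `html.parser`. The names `TextExtractor` and `html_to_text` are hypothetical, chosen for this example. It assumes well-formed markup, and it does not cover PDFs, Word files, tables, or OCR.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.parts.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text fragments of `html`, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


if __name__ == "__main__":
    sample = "<body><h1>Title</h1><p>Hello <b>world</b></p></body>"
    print(html_to_text(sample))
```

Each text fragment lands on its own line, so inline elements such as `<b>` split a sentence. A real pipeline would likely merge inline text and keep heading levels so that later chunking can follow the document structure.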
Chunking Strategies
Level: Beginner
Split documents into chunks large enough to preserve context yet small enough to stay specific, so that retrieval returns relevant passages.
Topics: chunk-size • overlap • semantic-chunking • document-structure
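To make the chunk-size and overlap topics concrete, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_text` and its default values are assumptions for illustration, and it counts characters rather than tokens. Semantic and structure-aware chunking are not shown.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split `text` into character windows of `chunk_size`,
    where each window repeats the last `overlap` characters
    of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

A larger `overlap` reduces the chance that a sentence is cut off at a chunk boundary, at the cost of storing and embedding more duplicated text.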