* feat: initialize rag
* refactor: using cosine distance metric for chromadb
* feat: use RecursiveCharacterTextSplitter as chunker strategy
* feat: support chucker and loader per data_type
* feat: adding JSON loader
* feat: adding CSVLoader
* feat: adding loader for DOCX files
* feat: add loader for MDX files
* feat: add loader for XML files
* feat: add loader for parser Webpage
* feat: support to load files from an entire directory
* feat: support to auto-load the loaders for additional DataType
* feat: add chuckers for some specific data type
- Each chunker uses separators specific to its content type
* feat: prevent document duplication and centralize content management
- Implement document deduplication logic in RAG
* Check for existing documents by source reference
* Compare doc IDs to detect content changes
* Automatically replace outdated content while preventing duplicates
- Centralize common functionality for better maintainability
* Create SourceContent class to handle URLs, files, and text uniformly
* Extract shared utilities (compute_sha256) to misc.py
* Standardize doc ID generation across all loaders
- Improve RAG system architecture
* All loaders now inherit consistent patterns from centralized BaseLoader
* Better separation of concerns with dedicated content management classes
* Standardized LoaderResult structure across all loader implementations
* chore: split text loaders file
* test: adding missing tests about RAG loaders
* refactor: QOL
* fix: add missing uv syntax on DOCXLoader