* initial knowledge

* WIP

* Adding core knowledge sources

* Improve types and better support for file paths

* added additional sources

* fix linting

* update yaml to include optional deps

* adding in lorenze feedback

* ensure embeddings are persisted

* improvements all around Knowledge class

* return this

* properly reset memory

* properly reset memory+knowledge

* consolodation and improvements

* linted

* cleanup rm unused embedder

* fix test

* fix duplicate

* generating cassettes for knowledge test

* updated default embedder

* None embedder to use default on pipeline cloning

* improvements

* fixed text_file_knowledge

* mypysrc fixes

* type check fixes

* added extra cassette

* just mocks

* linted

* mock knowledge query to not spin up db

* linted

* verbose run

* put a flag

* fix

* adding docs

* better docs

* improvements from review

* more docs

* linted

* rm print

* more fixes

* clearer docs

* added docstrings and type hints for cli

---------

Co-authored-by: João Moura <joaomdmoura@gmail.com>
Co-authored-by: Lorenze Jay <lorenzejaytech@gmail.com>
This commit is contained in:
Brandon Hancock (bhancock_ai)
2024-11-20 18:40:08 -05:00
committed by GitHub
parent fde1ee45f9
commit 14a36d3f5e
37 changed files with 2302 additions and 266 deletions

View File

@@ -0,0 +1,36 @@
from pathlib import Path
from typing import Union, List
from pydantic import Field
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
from typing import Dict, Any
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
class BaseFileKnowledgeSource(BaseKnowledgeSource):
"""Base class for knowledge sources that load content from files."""
file_path: Union[Path, List[Path]] = Field(...)
content: Dict[Path, str] = Field(init=False, default_factory=dict)
storage: KnowledgeStorage = Field(default_factory=KnowledgeStorage)
def model_post_init(self, _):
"""Post-initialization method to load content."""
self.content = self.load_content()
def load_content(self) -> Dict[Path, str]:
"""Load and preprocess file content. Should be overridden by subclasses."""
paths = [self.file_path] if isinstance(self.file_path, Path) else self.file_path
for path in paths:
if not path.exists():
raise FileNotFoundError(f"File not found: {path}")
if not path.is_file():
raise ValueError(f"Path is not a file: {path}")
return {}
def save_documents(self, metadata: Dict[str, Any]):
"""Save the documents to the storage."""
chunk_metadatas = [metadata.copy() for _ in self.chunks]
self.storage.save(self.chunks, chunk_metadatas)