mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-01-09 08:08:32 +00:00
Update docs (#1842)
* Update portkey docs
* Add more examples to Knowledge docs + clarify issue with `embedder`
* fix knowledge params and usage instructions
@@ -4,8 +4,6 @@ description: What is knowledge in CrewAI and how to use it.
icon: book
---

# Using Knowledge in CrewAI

## What is Knowledge?

Knowledge in CrewAI is a powerful system that allows AI agents to access and utilize external information sources during their tasks.
@@ -36,7 +34,20 @@ CrewAI supports various types of knowledge sources out of the box:
</Card>
</CardGroup>

## Supported Knowledge Parameters

| Parameter         | Type                           | Required | Description |
| :---------------- | :----------------------------- | :------- | :---------- |
| `sources`         | **List[BaseKnowledgeSource]**  | Yes      | List of knowledge sources that provide content to be stored and queried. Can include PDF, CSV, Excel, JSON, text files, or string content. |
| `collection_name` | **str**                        | No       | Name of the collection where the knowledge will be stored. Used to identify different sets of knowledge. Defaults to "knowledge" if not provided. |
| `storage`         | **Optional[KnowledgeStorage]** | No       | Custom storage configuration for managing how the knowledge is stored and retrieved. If not provided, a default storage will be created. |

## Quickstart Example

<Tip>
For file-based knowledge sources, place your files in a `knowledge` directory at the root of your project.
When creating the source, use paths relative to the `knowledge` directory.
</Tip>
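
The directory layout described in the tip can be sketched in plain Python. This is only an illustration: `resolve_knowledge_path` is a hypothetical helper, not part of CrewAI, and it assumes the project root is passed in explicitly (defaulting to the current working directory).

```python
from pathlib import Path

# Assumption: file-based sources look for files under "<project root>/knowledge".
KNOWLEDGE_DIR = Path("knowledge")


def resolve_knowledge_path(relative_path: str, root: Path = Path(".")) -> Path:
    """Map a path given relative to `knowledge/` to its location on disk.

    Hypothetical helper for illustration only; CrewAI does its own resolution.
    """
    if Path(relative_path).is_absolute():
        raise ValueError("Use a path relative to the knowledge directory")
    candidate = root / KNOWLEDGE_DIR / relative_path
    if not candidate.is_file():
        raise FileNotFoundError(f"Expected knowledge file at {candidate}")
    return candidate
```

With a file stored at `knowledge/document.txt`, the source would be created with `file_paths=["document.txt"]`, not `file_paths=["knowledge/document.txt"]`.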

Here's an example using string-based knowledge:
@@ -80,7 +91,8 @@ result = crew.kickoff(inputs={"question": "What city does John live in and how o
```

Here's another example with the `CrewDoclingSource`. `CrewDoclingSource` can handle multiple file formats, including TXT, PDF, DOCX, HTML, and more.

```python Code
from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
@@ -128,39 +140,192 @@ result = crew.kickoff(
)
```

## More Examples

Here are examples of how to use different types of knowledge sources:

### Text File Knowledge Source
```python
from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Create a text file knowledge source
# (paths are relative to the project's `knowledge` directory)
text_source = CrewDoclingSource(
    file_paths=["document.txt", "another.txt"]
)

# Create knowledge with text file source
knowledge = Knowledge(
    collection_name="text_knowledge",
    sources=[text_source]
)
```

### PDF Knowledge Source
```python
from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

# Create a PDF knowledge source
# (paths are relative to the project's `knowledge` directory)
pdf_source = PDFKnowledgeSource(
    file_paths=["document.pdf", "another.pdf"]
)

# Create knowledge with PDF source
knowledge = Knowledge(
    collection_name="pdf_knowledge",
    sources=[pdf_source]
)
```

### CSV Knowledge Source
```python
from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource

# Create a CSV knowledge source
# (paths are relative to the project's `knowledge` directory)
csv_source = CSVKnowledgeSource(
    file_paths=["data.csv"]
)

# Create knowledge with CSV source
knowledge = Knowledge(
    collection_name="csv_knowledge",
    sources=[csv_source]
)
```

### Excel Knowledge Source
```python
from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource

# Create an Excel knowledge source
# (paths are relative to the project's `knowledge` directory)
excel_source = ExcelKnowledgeSource(
    file_paths=["spreadsheet.xlsx"]
)

# Create knowledge with Excel source
knowledge = Knowledge(
    collection_name="excel_knowledge",
    sources=[excel_source]
)
```

### JSON Knowledge Source
```python
from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource

# Create a JSON knowledge source
# (paths are relative to the project's `knowledge` directory)
json_source = JSONKnowledgeSource(
    file_paths=["data.json"]
)

# Create knowledge with JSON source
knowledge = Knowledge(
    collection_name="json_knowledge",
    sources=[json_source]
)
```

## Knowledge Configuration

### Chunking Configuration

Knowledge sources automatically chunk content for processing.
You can control how content is split by setting the chunk size and overlap on a knowledge source:

```python
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

source = StringKnowledgeSource(
    content="Your content here",
    chunk_size=4000,     # Maximum characters per chunk (default: 4000)
    chunk_overlap=200    # Characters shared between consecutive chunks (default: 200)
)
```

The chunking configuration helps with:
- Breaking down large documents into manageable pieces
- Maintaining context through chunk overlap
- Optimizing retrieval accuracy
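
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal character-based sketch. It is an illustration under stated assumptions, not CrewAI's implementation: `chunk_text` is a hypothetical function, and it assumes chunks are plain character windows that advance by `chunk_size - chunk_overlap`.

```python
def chunk_text(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each window starts `step` characters after the previous one, so
    # consecutive windows share `chunk_overlap` characters.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# Small values make the overlap visible: each chunk repeats the last
# character of the previous one.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij', 'j']
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of storing more, partly redundant, chunks.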

### Embeddings Configuration

You can also configure the embedder for the knowledge store. This is useful if you want the knowledge store to use a different embedder than the one used for the agents.

The `embedder` parameter supports several embedding model providers, including:
- `openai`: OpenAI's embedding models
- `google`: Google's text embedding models
- `azure`: Azure OpenAI embeddings
- `ollama`: Local embeddings with Ollama
- `vertexai`: Google Cloud VertexAI embeddings
- `cohere`: Cohere's embedding models
- `bedrock`: AWS Bedrock embeddings
- `huggingface`: Hugging Face models
- `watson`: IBM Watson embeddings

Here's an example of how to configure the embedder for the knowledge store using Google's `text-embedding-004` model:
<CodeGroup>
```python Example
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
import os

# Get the GEMINI API key
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")

# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(
    content=content,
)

# Create an LLM with a temperature of 0 to make outputs as deterministic as possible
gemini_llm = LLM(
    model="gemini/gemini-1.5-pro-002",
    api_key=GEMINI_API_KEY,
    temperature=0,
)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="""You are a master at understanding people and their preferences.""",
    verbose=True,
    allow_delegation=False,
    llm=gemini_llm,
)

task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],
    # Use Google's embedding model for the knowledge store
    embedder={
        "provider": "google",
        "config": {
            "model": "models/text-embedding-004",
            "api_key": GEMINI_API_KEY,
        }
    }
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
```
```text Output
# Agent: About User
## Task: Answer the following questions about the user: What city does John live in and how old is he?

# Agent: About User
## Final Answer:
John is 30 years old and lives in San Francisco.
```
</CodeGroup>

## Clearing Knowledge

If you need to clear the knowledge stored in CrewAI, you can use the `crewai reset-memories` command with the `--knowledge` option.
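
If you want to trigger the same command from a script, one possible sketch is shown below. Only the command name and the `--knowledge` option come from the text above; `clear_knowledge` and its `dry_run` flag are hypothetical names introduced for illustration, and the sketch assumes the `crewai` CLI is on your `PATH`.

```python
import shutil
import subprocess


def reset_knowledge_command() -> list[str]:
    """Build the CLI invocation described above."""
    return ["crewai", "reset-memories", "--knowledge"]


def clear_knowledge(dry_run: bool = True) -> list[str]:
    """Return the command; run it only when dry_run is False (hypothetical helper)."""
    command = reset_knowledge_command()
    if dry_run:
        return command
    if shutil.which(command[0]) is None:
        raise RuntimeError("The `crewai` CLI was not found on PATH.")
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Dry run by default so nothing is deleted by accident.
    print(" ".join(clear_knowledge(dry_run=True)))
```

Defaulting to a dry run is a deliberate choice here, since clearing stored knowledge cannot be undone from the script.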