Update docs (#1842)

* Update portkey docs
* Add more examples to Knowledge docs + clarify issue with `embedder`
* fix knowledge params and usage instructions
2026-07-26 17:25:10 +00:00 · 2025-01-02 16:10:31 -05:00
parent 4bcc3b532d
commit c1172a685a
4 changed files with 244 additions and 87 deletions
--- a/docs/concepts/knowledge.mdx
+++ b/docs/concepts/knowledge.mdx
@@ -4,8 +4,6 @@ description: What is knowledge in CrewAI and how to use it.
 icon: book
 ---

-# Using Knowledge in CrewAI
-
 ## What is Knowledge?

 Knowledge in CrewAI is a powerful system that allows AI agents to access and utilize external information sources during their tasks.
@@ -36,7 +34,20 @@ CrewAI supports various types of knowledge sources out of the box:
  </Card>
 </CardGroup>

-## Quick Start
+## Supported Knowledge Parameters
+
+| Parameter                    | Type                                | Required | Description                                                                                                                                           |
+| :--------------------------- | :---------------------------------- | :------- | :---------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `sources`                  | **List[BaseKnowledgeSource]**        | Yes      | List of knowledge sources that provide content to be stored and queried. Can include PDF, CSV, Excel, JSON, text files, or string content.           |
+| `collection_name`          | **str**                              | No       | Name of the collection where the knowledge will be stored. Used to identify different sets of knowledge. Defaults to "knowledge" if not provided.     |
+| `storage`                  | **Optional[KnowledgeStorage]**       | No       | Custom storage configuration for managing how the knowledge is stored and retrieved. If not provided, a default storage will be created.              |
+
+## Quickstart Example
+
+<Tip>
+For file-based knowledge sources, place your files in a `knowledge` directory at the root of your project.
+When creating the source, use paths relative to the `knowledge` directory.
+</Tip>
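As a rough illustration of the tip above, the sketch below resolves a relative file path against a `knowledge` directory at the project root. The helper `resolve_knowledge_path` and the `KNOWLEDGE_DIR` constant are hypothetical names used only for illustration. They are not part of the CrewAI API, and the library's internal path handling may differ.

```python
from pathlib import Path

# Assumption: knowledge files live in "<project root>/knowledge".
KNOWLEDGE_DIR = "knowledge"


def resolve_knowledge_path(relative_path: str, project_root: str = ".") -> Path:
    """Join a path given relative to the knowledge directory onto the project root."""
    candidate = Path(relative_path)
    if candidate.is_absolute():
        raise ValueError("Pass a path relative to the knowledge directory")
    return Path(project_root) / KNOWLEDGE_DIR / candidate


# "reports/q1.pdf" is expected at <project root>/knowledge/reports/q1.pdf
print(resolve_knowledge_path("reports/q1.pdf").as_posix())
```

Under this assumption, a source created with `file_paths=["reports/q1.pdf"]` would expect the file at `knowledge/reports/q1.pdf`, not at `reports/q1.pdf` in the project root.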

 Here's an example using string-based knowledge:

@@ -80,7 +91,8 @@ result = crew.kickoff(inputs={"question": "What city does John live in and how o
 ```


-Here's another example with the `CrewDoclingSource`
+Here's another example with the `CrewDoclingSource`, which handles multiple file formats including TXT, PDF, DOCX, HTML, and more.
+
 ```python Code
 from crewai import LLM, Agent, Crew, Process, Task
 from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
@@ -128,39 +140,192 @@ result = crew.kickoff(
 )
 ```

+## More Examples
+
+Here are examples of how to use different types of knowledge sources:
+
+### Text File Knowledge Source
+```python
+from crewai.knowledge.knowledge import Knowledge
+from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
+
+# Create a text file knowledge source
+text_source = CrewDoclingSource(
+    file_paths=["document.txt", "another.txt"]
+)
+
+# Create knowledge with text file source
+knowledge = Knowledge(
+    collection_name="text_knowledge",
+    sources=[text_source]
+)
+```
+
+### PDF Knowledge Source
+```python
+from crewai.knowledge.knowledge import Knowledge
+from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource
+
+# Create a PDF knowledge source
+pdf_source = PDFKnowledgeSource(
+    file_paths=["document.pdf", "another.pdf"]
+)
+
+# Create knowledge with PDF source
+knowledge = Knowledge(
+    collection_name="pdf_knowledge",
+    sources=[pdf_source]
+)
+```
+
+### CSV Knowledge Source
+```python
+from crewai.knowledge.knowledge import Knowledge
+from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource
+
+# Create a CSV knowledge source
+csv_source = CSVKnowledgeSource(
+    file_paths=["data.csv"]
+)
+
+# Create knowledge with CSV source
+knowledge = Knowledge(
+    collection_name="csv_knowledge",
+    sources=[csv_source]
+)
+```
+
+### Excel Knowledge Source
+```python
+from crewai.knowledge.knowledge import Knowledge
+from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource
+
+# Create an Excel knowledge source
+excel_source = ExcelKnowledgeSource(
+    file_paths=["spreadsheet.xlsx"]
+)
+
+# Create knowledge with Excel source
+knowledge = Knowledge(
+    collection_name="excel_knowledge",
+    sources=[excel_source]
+)
+```
+
+### JSON Knowledge Source
+```python
+from crewai.knowledge.knowledge import Knowledge
+from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource
+
+# Create a JSON knowledge source
+json_source = JSONKnowledgeSource(
+    file_paths=["data.json"]
+)
+
+# Create knowledge with JSON source
+knowledge = Knowledge(
+    collection_name="json_knowledge",
+    sources=[json_source]
+)
+```
+
 ## Knowledge Configuration

 ### Chunking Configuration

-Control how content is split for processing by setting the chunk size and overlap.
+Knowledge sources automatically split content into chunks for processing.
+You can configure the chunk size and overlap on each knowledge source:

-```python Code
-knowledge_source = StringKnowledgeSource(
-    content="Long content...",
-    chunk_size=4000,     # Characters per chunk (default)
-    chunk_overlap=200    # Overlap between chunks (default)
+```python
+from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
+
+source = StringKnowledgeSource(
+    content="Your content here",
+    chunk_size=4000,      # Maximum size of each chunk, in characters (default: 4000)
+    chunk_overlap=200     # Overlap between consecutive chunks, in characters (default: 200)
 )
 ```

-## Embedder Configuration
+Chunking helps by:
+- Breaking down large documents into manageable pieces
+- Maintaining context through chunk overlap
+- Optimizing retrieval accuracy
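To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of character-based sliding-window chunking. The function `chunk_text` is a hypothetical helper written for illustration. It is an assumption about how such chunking can work, not a statement of CrewAI's internal implementation.

```python
def chunk_text(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows of at most chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts (chunk_size - chunk_overlap) characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# Small values make the overlap visible: each chunk repeats the last
# character of the previous chunk.
chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij', 'j']
```

A larger overlap preserves more context across chunk boundaries at the cost of storing more redundant text.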

-You can also configure the embedder for the knowledge store. This is useful if you want to use a different embedder for the knowledge store than the one used for the agents.
+### Embeddings Configuration

-```python Code
-...
+You can also configure the embedder for the knowledge store.
+This is useful when the knowledge store should use a different embedder than the one used for the agents.
+The `embedder` parameter supports the following embedding model providers:
+- `openai`: OpenAI's embedding models
+- `google`: Google's text embedding models
+- `azure`: Azure OpenAI embeddings
+- `ollama`: Local embeddings with Ollama
+- `vertexai`: Google Cloud VertexAI embeddings
+- `cohere`: Cohere's embedding models
+- `bedrock`: AWS Bedrock embeddings
+- `huggingface`: Hugging Face models
+- `watson`: IBM Watson embeddings
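The sketch below builds an embedder dictionary in the `{"provider": ..., "config": {...}}` shape that this page passes to `Crew(embedder=...)`, and rejects provider names that are not in the list above. The helper `build_embedder_config` and the `SUPPORTED_PROVIDERS` set are hypothetical and exist only for illustration. CrewAI does not necessarily validate providers this way.

```python
from typing import Any, Dict, Optional

# Provider names copied from the list above.
SUPPORTED_PROVIDERS = {
    "openai", "google", "azure", "ollama", "vertexai",
    "cohere", "bedrock", "huggingface", "watson",
}


def build_embedder_config(
    provider: str, model: str, api_key: Optional[str] = None
) -> Dict[str, Any]:
    """Return a dict in the {"provider": ..., "config": {...}} shape."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported embedder provider: {provider!r}")
    config: Dict[str, Any] = {"model": model}
    # Only include the key when one is supplied.
    if api_key is not None:
        config["api_key"] = api_key
    return {"provider": provider, "config": config}


embedder = build_embedder_config("openai", "text-embedding-3-small")
print(embedder)
```

Checking the provider name up front surfaces typos such as `"open-ai"` before any crew is constructed.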
+
+Here's an example of how to configure the embedder for the knowledge store using Google's `text-embedding-004` model:
+<CodeGroup>
+```python Example
+from crewai import Agent, Task, Crew, Process, LLM
+from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
+import os
+
+# Get the GEMINI API key
+GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
+
+# Create a knowledge source
+content = "The user's name is John. He is 30 years old and lives in San Francisco."
 string_source = StringKnowledgeSource(
-    content="Users name is John. He is 30 years old and lives in San Francisco.",
+    content=content,
 )
+
+# Create an LLM with a temperature of 0 for more deterministic outputs
+gemini_llm = LLM(
+    model="gemini/gemini-1.5-pro-002",
+    api_key=GEMINI_API_KEY,
+    temperature=0,
+)
+
+# Create an agent that answers questions about the user
+agent = Agent(
+    role="About User",
+    goal="You know everything about the user.",
+    backstory="""You are a master at understanding people and their preferences.""",
+    verbose=True,
+    allow_delegation=False,
+    llm=gemini_llm,
+)
+
+task = Task(
+    description="Answer the following questions about the user: {question}",
+    expected_output="An answer to the question.",
+    agent=agent,
+)
+
 crew = Crew(
-    ...
+    agents=[agent],
+    tasks=[task],
+    verbose=True,
+    process=Process.sequential,
    knowledge_sources=[string_source],
    embedder={
-        "provider": "openai",
-        "config": {"model": "text-embedding-3-small"},
-    },
+        "provider": "google",
+        "config": {
+            "model": "models/text-embedding-004",
+            "api_key": GEMINI_API_KEY,
+        }
+    }
 )
-```

+result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
+```
+```text Output
+# Agent: About User
+## Task: Answer the following questions about the user: What city does John live in and how old is he?
+
+# Agent: About User
+## Final Answer: 
+John is 30 years old and lives in San Francisco.
+```
+</CodeGroup>
+
 ## Clearing Knowledge

 If you need to clear the knowledge stored in CrewAI, you can use the `crewai reset-memories` command with the `--knowledge` option.