Update docs (#1842)

* Update portkey docs
* Add more examples to Knowledge docs + clarify issue with `embedder`
* fix knowledge params and usage instructions
2026-01-11 00:58:30 +00:00 · 2025-01-02 16:10:31 -05:00
parent 4bcc3b532d
commit c1172a685a
4 changed files with 244 additions and 87 deletions
--- a/docs/concepts/flows.mdx
+++ b/docs/concepts/flows.mdx
@@ -138,7 +138,7 @@ print("---- Final Output ----")
 print(final_output)
 ````
-``` text Output
+```text Output
 ---- Final Output ----
 Second method received: Output from first_method
 ````
--- a/docs/concepts/knowledge.mdx
+++ b/docs/concepts/knowledge.mdx
@@ -4,8 +4,6 @@ description: What is knowledge in CrewAI and how to use it.
 icon: book
 ---
 # Using Knowledge in CrewAI
 ## What is Knowledge?
 Knowledge in CrewAI is a powerful system that allows AI agents to access and utilize external information sources during their tasks.
@@ -36,7 +34,20 @@ CrewAI supports various types of knowledge sources out of the box:
  </Card>
 </CardGroup>
-## Quick Start
+## Supported Knowledge Parameters
 | Parameter                    | Type                                | Required | Description                                                                                                                                           |
 | :--------------------------- | :---------------------------------- | :------- | :---------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `sources`                  | **List[BaseKnowledgeSource]**        | Yes      | List of knowledge sources that provide content to be stored and queried. Can include PDF, CSV, Excel, JSON, text files, or string content.           |
 | `collection_name`          | **str**                              | No       | Name of the collection where the knowledge will be stored. Used to identify different sets of knowledge. Defaults to "knowledge" if not provided.     |
 | `storage`                  | **Optional[KnowledgeStorage]**       | No       | Custom storage configuration for managing how the knowledge is stored and retrieved. If not provided, a default storage will be created.              |
 ## Quickstart Example
 <Tip>
 For file-based knowledge sources, place your files in a `knowledge` directory at the root of your project.
 Also, use paths relative to the `knowledge` directory when creating the source.
 </Tip>
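As a rough illustration of the path convention in the tip above, the sketch below converts a project-root path into one relative to the `knowledge` directory. The helper name `to_knowledge_relative` is hypothetical and not part of CrewAI; it only shows the kind of path a file-based source is expected to receive.

```python
from pathlib import Path


def to_knowledge_relative(path: str, project_root: str = ".") -> str:
    """Return `path` expressed relative to `<project_root>/knowledge`.

    Hypothetical helper for illustration only. Raises ValueError when the
    file does not live under the knowledge directory.
    """
    knowledge_dir = Path(project_root, "knowledge").resolve()
    absolute = Path(project_root, path).resolve()
    # relative_to raises ValueError if `absolute` is outside knowledge_dir
    return absolute.relative_to(knowledge_dir).as_posix()


relative_path = to_knowledge_relative("knowledge/reports/q1.pdf")
```

Here `relative_path` is `reports/q1.pdf`, which is the form to use when creating the source.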
 Here's an example using string-based knowledge:
@@ -80,7 +91,8 @@ result = crew.kickoff(inputs={"question": "What city does John live in and how o
 ```
-Here's another example with the `CrewDoclingSource`
+Here's another example with the `CrewDoclingSource`, which can handle multiple file formats including TXT, PDF, DOCX, HTML, and more.
 ```python Code
 from crewai import LLM, Agent, Crew, Process, Task
 from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
@@ -128,39 +140,192 @@ result = crew.kickoff(
 )
 ```
 ## More Examples
 Here are examples of how to use different types of knowledge sources:
 ### Text File Knowledge Source
 ```python
 from crewai.knowledge.knowledge import Knowledge
 from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
 # Create a text file knowledge source
 text_source = CrewDoclingSource(
    file_paths=["document.txt", "another.txt"]
 )
 # Create knowledge with text file source
 knowledge = Knowledge(
    collection_name="text_knowledge",
    sources=[text_source]
 )
 ```
 ### PDF Knowledge Source
 ```python
 from crewai.knowledge.knowledge import Knowledge
 from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource
 # Create a PDF knowledge source
 pdf_source = PDFKnowledgeSource(
    file_paths=["document.pdf", "another.pdf"]
 )
 # Create knowledge with PDF source
 knowledge = Knowledge(
    collection_name="pdf_knowledge",
    sources=[pdf_source]
 )
 ```
 ### CSV Knowledge Source
 ```python
 from crewai.knowledge.knowledge import Knowledge
 from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource
 # Create a CSV knowledge source
 csv_source = CSVKnowledgeSource(
    file_paths=["data.csv"]
 )
 # Create knowledge with CSV source
 knowledge = Knowledge(
    collection_name="csv_knowledge",
    sources=[csv_source]
 )
 ```
 ### Excel Knowledge Source
 ```python
 from crewai.knowledge.knowledge import Knowledge
 from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource
 # Create an Excel knowledge source
 excel_source = ExcelKnowledgeSource(
    file_paths=["spreadsheet.xlsx"]
 )
 # Create knowledge with Excel source
 knowledge = Knowledge(
    collection_name="excel_knowledge",
    sources=[excel_source]
 )
 ```
 ### JSON Knowledge Source
 ```python
 from crewai.knowledge.knowledge import Knowledge
 from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource
 # Create a JSON knowledge source
 json_source = JSONKnowledgeSource(
    file_paths=["data.json"]
 )
 # Create knowledge with JSON source
 knowledge = Knowledge(
    collection_name="json_knowledge",
    sources=[json_source]
 )
 ```
 ## Knowledge Configuration
 ### Chunking Configuration
-Control how content is split for processing by setting the chunk size and overlap.
+Knowledge sources automatically chunk content for better processing. 
 You can configure chunking behavior in your knowledge sources:
-```python Code
+```python
-knowledge_source = StringKnowledgeSource(
+from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
-    content="Long content...",
+
-    chunk_size=4000,     # Characters per chunk (default)
+source = StringKnowledgeSource(
-    chunk_overlap=200    # Overlap between chunks (default)
+    content="Your content here",
    chunk_size=4000,      # Maximum size of each chunk (default: 4000)
    chunk_overlap=200     # Overlap between chunks (default: 200)
 )
 ```
-## Embedder Configuration
+The chunking configuration helps in:
 - Breaking down large documents into manageable pieces
 - Maintaining context through chunk overlap
 - Optimizing retrieval accuracy
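
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal character-based sketch. It is an assumption about how such splitting can work, not CrewAI's actual implementation; the function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so content cut at a
    boundary still appears with some surrounding context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


small_example = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
```

With a 10-character input, `chunk_size=4`, and `chunk_overlap=2`, each chunk repeats the last two characters of the previous one.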
-You can also configure the embedder for the knowledge store. This is useful if you want to use a different embedder for the knowledge store than the one used for the agents.
+### Embeddings Configuration
-```python Code
+You can also configure the embedder for the knowledge store. 
-...
+This is useful if you want to use a different embedder for the knowledge store than the one used for the agents.
 The `embedder` parameter supports various embedding model providers that include:
 - `openai`: OpenAI's embedding models
 - `google`: Google's text embedding models
 - `azure`: Azure OpenAI embeddings
 - `ollama`: Local embeddings with Ollama
 - `vertexai`: Google Cloud VertexAI embeddings
 - `cohere`: Cohere's embedding models
 - `bedrock`: AWS Bedrock embeddings
 - `huggingface`: Hugging Face models
 - `watson`: IBM Watson embeddings
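
The `embedder` value is a plain dictionary with a `provider` name and a nested `config`. The sketch below builds that dictionary and rejects provider names that are not in the list above. The helper `build_embedder_config` is hypothetical and not part of CrewAI, and which `config` keys each provider accepts is not covered here.

```python
# Provider names taken from the list above
SUPPORTED_PROVIDERS = {
    "openai", "google", "azure", "ollama", "vertexai",
    "cohere", "bedrock", "huggingface", "watson",
}


def build_embedder_config(provider: str, model: str, **extra) -> dict:
    """Return an embedder dictionary shaped as {"provider": ..., "config": {...}}.

    Hypothetical helper for illustration; extra keyword arguments such as
    api_key are copied into the nested config unchanged.
    """
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported embedder provider: {provider!r}")
    return {"provider": provider, "config": {"model": model, **extra}}


embedder = build_embedder_config(
    "google",
    "models/text-embedding-004",
    api_key="YOUR_API_KEY",  # placeholder, replace with a real key
)
```

The resulting dictionary can then be passed as the `embedder` argument when creating a `Crew`.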
 Here's an example of how to configure the embedder for the knowledge store using Google's `text-embedding-004` model:
 <CodeGroup>
 ```python Example
 from crewai import Agent, Task, Crew, Process, LLM
 from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
 import os
 # Get the GEMINI API key
 GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
 # Create a knowledge source
 content = "Users name is John. He is 30 years old and lives in San Francisco."
 string_source = StringKnowledgeSource(
-    content="Users name is John. He is 30 years old and lives in San Francisco.",
+    content=content,
 )
 # Create an LLM with a temperature of 0 to ensure deterministic outputs
 gemini_llm = LLM(
    model="gemini/gemini-1.5-pro-002",
    api_key=GEMINI_API_KEY,
    temperature=0,
 )
 # Create an agent with the knowledge store
 agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="""You are a master at understanding people and their preferences.""",
    verbose=True,
    allow_delegation=False,
    llm=gemini_llm,
 )
 task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
 )
 crew = Crew(
-    ...
+    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],
    embedder={
-        "provider": "openai",
+        "provider": "google",
-        "config": {"model": "text-embedding-3-small"},
+        "config": {
-    },
+            "model": "models/text-embedding-004",
            "api_key": GEMINI_API_KEY,
        }
    }
 )
 result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
 ```
 ```text Output
 # Agent: About User
 ## Task: Answer the following questions about the user: What city does John live in and how old is he?
 # Agent: About User
 ## Final Answer: 
 John is 30 years old and lives in San Francisco.
 ```
 </CodeGroup>
 ## Clearing Knowledge
 If you need to clear the knowledge stored in CrewAI, you can use the `crewai reset-memories` command with the `--knowledge` option.
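
If the reset needs to happen from a script rather than a terminal, one option is to invoke the same CLI command through `subprocess`. This is a minimal sketch that assumes the `crewai` executable is on `PATH`; the function name `reset_knowledge` is hypothetical.

```python
import shutil
import subprocess

# The command and option named in the paragraph above
RESET_KNOWLEDGE_CMD = ["crewai", "reset-memories", "--knowledge"]


def reset_knowledge() -> int:
    """Run `crewai reset-memories --knowledge` and return its exit code."""
    if shutil.which(RESET_KNOWLEDGE_CMD[0]) is None:
        raise FileNotFoundError("crewai CLI not found on PATH")
    return subprocess.run(RESET_KNOWLEDGE_CMD, check=False).returncode
```

Calling `reset_knowledge()` is equivalent to running the command by hand in the project directory.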
--- a/docs/how-to/Portkey-Observability-and-Guardrails.md
+++ b/docs/how-to/Portkey-Observability-and-Guardrails.md
@@ -1,4 +1,9 @@
-# Portkey Integration with CrewAI
+---
 title: Portkey Observability and Guardrails
 description: How to use Portkey with CrewAI
 icon: key
 ---
 <img src="https://raw.githubusercontent.com/siddharthsambharia-portkey/Portkey-Product-Images/main/Portkey-CrewAI.png" alt="Portkey CrewAI Header Image" width="70%" />
@@ -10,29 +15,24 @@ Portkey adds 4 core production capabilities to any CrewAI agent:
 3. Full-stack tracing & cost, performance analytics
 4. Real-time guardrails to enforce behavior
 ## Getting Started
-1. **Install Required Packages:**
+<Steps>
    <Step title="Install CrewAI and Portkey">
    ```bash
    pip install -qU crewai portkey-ai
    ```
    </Step>
    <Step title="Configure the LLM Client">
    To build CrewAI Agents with Portkey, you'll need two keys:
    - **Portkey API Key**: Sign up on the [Portkey app](https://app.portkey.ai/?utm_source=crewai&utm_medium=crewai&utm_campaign=crewai) and copy your API key
    - **Virtual Key**: Virtual Keys securely manage your LLM API keys in one place. Store your LLM provider API keys securely in Portkey's vault
-```bash
+    ```python
-pip install -qU crewai portkey-ai
+    from crewai import LLM
-```
+    from portkey_ai import createHeaders, PORTKEY_GATEWAY_URL
-2. **Configure the LLM Client:**
+    gpt_llm = LLM(
 To build CrewAI Agents with Portkey, you'll need two keys:
 - **Portkey API Key**: Sign up on the [Portkey app](https://app.portkey.ai/?utm_source=crewai&utm_medium=crewai&utm_campaign=crewai) and copy your API key
 - **Virtual Key**: Virtual Keys securely manage your LLM API keys in one place. Store your LLM provider API keys securely in Portkey's vault
 ```python
 from crewai import LLM
 from portkey_ai import createHeaders, PORTKEY_GATEWAY_URL
 gpt_llm = LLM(
        model="gpt-4",
        base_url=PORTKEY_GATEWAY_URL,
        api_key="dummy", # We are using Virtual key
@@ -40,44 +40,44 @@ gpt_llm = LLM(
            api_key="YOUR_PORTKEY_API_KEY",
            virtual_key="YOUR_VIRTUAL_KEY", # Enter your Virtual key from Portkey
        )
-)
+    )
-```
+    ```
    </Step>
    <Step title="Create and Run Your First Agent">
    ```python
    from crewai import Agent, Task, Crew
-3. **Create and Run Your First Agent:**
+    # Define your agents with roles and goals
-
+    coder = Agent(
 ```python
 from crewai import Agent, Task, Crew
 # Define your agents with roles and goals
 coder = Agent(
        role='Software developer',
        goal='Write clear, concise code on demand',
        backstory='An expert coder with a keen eye for software trends.',
        llm=gpt_llm
-)
+    )
-# Create tasks for your agents
+    # Create tasks for your agents
-task1 = Task(
+    task1 = Task(
        description="Define the HTML for making a simple website with heading- Hello World! Portkey is working!",
        expected_output="A clear and concise HTML code",
        agent=coder
-)
+    )
-# Instantiate your crew
+    # Instantiate your crew
-crew = Crew(
+    crew = Crew(
        agents=[coder],
        tasks=[task1],
-)
+    )
 result = crew.kickoff()
 print(result)
 ```
    result = crew.kickoff()
    print(result)
    ```
    </Step>
 </Steps>
 ## Key Features
 | Feature | Description |
-|---------|-------------|
+|:--------|:------------|
 | 🌐 Multi-LLM Support | Access OpenAI, Anthropic, Gemini, Azure, and 250+ providers through a unified interface |
 | 🛡️ Production Reliability | Implement retries, timeouts, load balancing, and fallbacks |
 | 📊 Advanced Observability | Track 40+ metrics including costs, tokens, latency, and custom metadata |
@@ -200,12 +200,3 @@ For detailed information on creating and managing Configs, visit the [Portkey do
 - [📊 Portkey Dashboard](https://app.portkey.ai/?utm_source=crewai&utm_medium=crewai&utm_campaign=crewai)
 - [🐦 Twitter](https://twitter.com/portkeyai)
 - [💬 Discord Community](https://discord.gg/DD7vgKK299)
--- a/docs/mint.json
+++ b/docs/mint.json
@@ -100,7 +100,8 @@
        "how-to/conditional-tasks",
        "how-to/agentops-observability",
        "how-to/langtrace-observability",
-        "how-to/openlit-observability"
+        "how-to/openlit-observability",
        "how-to/portkey-observability"
      ]
    },
    {