--- title: Knowledge description: What is knowledge in CrewAI and how to use it. icon: book --- ## What is Knowledge? Knowledge in CrewAI is a powerful system that allows AI agents to access and utilize external information sources during their tasks. Think of it as giving your agents a reference library they can consult while working. Key benefits of using Knowledge: - Enhance agents with domain-specific information - Support decisions with real-world data - Maintain context across conversations - Ground responses in factual information ## Supported Knowledge Sources CrewAI supports various types of knowledge sources out of the box: - Raw strings - Text files (.txt) - PDF documents - CSV files - Excel spreadsheets - JSON documents ## Supported Knowledge Parameters | Parameter | Type | Required | Description | | :--------------------------- | :---------------------------------- | :------- | :---------------------------------------------------------------------------------------------------------------------------------------------------- | | `sources` | **List[BaseKnowledgeSource]** | Yes | List of knowledge sources that provide content to be stored and queried. Can include PDF, CSV, Excel, JSON, text files, or string content. | | `collection_name` | **str** | No | Name of the collection where the knowledge will be stored. Used to identify different sets of knowledge. Defaults to "knowledge" if not provided. | | `storage` | **Optional[KnowledgeStorage]** | No | Custom storage configuration for managing how the knowledge is stored and retrieved. If not provided, a default storage will be created. | ## Quickstart Example For file-Based Knowledge Sources, make sure to place your files in a `knowledge` directory at the root of your project. Also, use relative paths from the `knowledge` directory when creating the source. Here's an example using string-based knowledge: ```python Code from crewai import Agent, Task, Crew, Process, LLM from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource # Create a knowledge source content = "Users name is John. He is 30 years old and lives in San Francisco." string_source = StringKnowledgeSource( content=content, ) # Create an LLM with a temperature of 0 to ensure deterministic outputs llm = LLM(model="gpt-4o-mini", temperature=0) # Create an agent with the knowledge store agent = Agent( role="About User", goal="You know everything about the user.", backstory="""You are a master at understanding people and their preferences.""", verbose=True, allow_delegation=False, llm=llm, ) task = Task( description="Answer the following questions about the user: {question}", expected_output="An answer to the question.", agent=agent, ) crew = Crew( agents=[agent], tasks=[task], verbose=True, process=Process.sequential, knowledge_sources=[string_source], # Enable knowledge by adding the sources here. You can also add more sources to the sources list. ) result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"}) ``` Here's another example with the `CrewDoclingSource`. The CrewDoclingSource is actually quite versatile and can handle multiple file formats including MD, PDF, DOCX, HTML, and more. You need to install `docling` for the following example to work: `uv add docling` ```python Code from crewai import LLM, Agent, Crew, Process, Task from crewai.knowledge.source.crew_docling_source import CrewDoclingSource # Create a knowledge source content_source = CrewDoclingSource( file_paths=[ "https://lilianweng.github.io/posts/2024-11-28-reward-hacking", "https://lilianweng.github.io/posts/2024-07-07-hallucination", ], ) # Create an LLM with a temperature of 0 to ensure deterministic outputs llm = LLM(model="gpt-4o-mini", temperature=0) # Create an agent with the knowledge store agent = Agent( role="About papers", goal="You know everything about the papers.", backstory="""You are a master at understanding papers and their content.""", verbose=True, allow_delegation=False, llm=llm, ) task = Task( description="Answer the following questions about the papers: {question}", expected_output="An answer to the question.", agent=agent, ) crew = Crew( agents=[agent], tasks=[task], verbose=True, process=Process.sequential, knowledge_sources=[ content_source ], # Enable knowledge by adding the sources here. You can also add more sources to the sources list. ) result = crew.kickoff( inputs={ "question": "What is the reward hacking paper about? Be sure to provide sources." } ) ``` ## More Examples Here are examples of how to use different types of knowledge sources: ### Text File Knowledge Source ```python from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource # Create a text file knowledge source text_source = TextFileKnowledgeSource( file_paths=["document.txt", "another.txt"] ) # Create crew with text file source on agents or crew level agent = Agent( ... knowledge_sources=[text_source] ) crew = Crew( ... knowledge_sources=[text_source] ) ``` ### PDF Knowledge Source ```python from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource # Create a PDF knowledge source pdf_source = PDFKnowledgeSource( file_paths=["document.pdf", "another.pdf"] ) # Create crew with PDF knowledge source on agents or crew level agent = Agent( ... knowledge_sources=[pdf_source] ) crew = Crew( ... knowledge_sources=[pdf_source] ) ``` ### CSV Knowledge Source ```python from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource # Create a CSV knowledge source csv_source = CSVKnowledgeSource( file_paths=["data.csv"] ) # Create crew with CSV knowledge source or on agent level agent = Agent( ... knowledge_sources=[csv_source] ) crew = Crew( ... knowledge_sources=[csv_source] ) ``` ### Excel Knowledge Source ```python from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource # Create an Excel knowledge source excel_source = ExcelKnowledgeSource( file_paths=["spreadsheet.xlsx"] ) # Create crew with Excel knowledge source on agents or crew level agent = Agent( ... knowledge_sources=[excel_source] ) crew = Crew( ... knowledge_sources=[excel_source] ) ``` ### JSON Knowledge Source ```python from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource # Create a JSON knowledge source json_source = JSONKnowledgeSource( file_paths=["data.json"] ) # Create crew with JSON knowledge source on agents or crew level agent = Agent( ... knowledge_sources=[json_source] ) crew = Crew( ... knowledge_sources=[json_source] ) ``` ## Knowledge Configuration ### Chunking Configuration Knowledge sources automatically chunk content for better processing. You can configure chunking behavior in your knowledge sources: ```python from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource source = StringKnowledgeSource( content="Your content here", chunk_size=4000, # Maximum size of each chunk (default: 4000) chunk_overlap=200 # Overlap between chunks (default: 200) ) ``` The chunking configuration helps in: - Breaking down large documents into manageable pieces - Maintaining context through chunk overlap - Optimizing retrieval accuracy ### Embeddings Configuration You can also configure the embedder for the knowledge store. This is useful if you want to use a different embedder for the knowledge store than the one used for the agents. The `embedder` parameter supports various embedding model providers that include: - `openai`: OpenAI's embedding models - `google`: Google's text embedding models - `azure`: Azure OpenAI embeddings - `ollama`: Local embeddings with Ollama - `vertexai`: Google Cloud VertexAI embeddings - `cohere`: Cohere's embedding models - `voyageai`: VoyageAI's embedding models - `bedrock`: AWS Bedrock embeddings - `huggingface`: Hugging Face models - `watson`: IBM Watson embeddings Here's an example of how to configure the embedder for the knowledge store using Google's `text-embedding-004` model: ```python Example from crewai import Agent, Task, Crew, Process, LLM from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource import os # Get the GEMINI API key GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY") # Create a knowledge source content = "Users name is John. He is 30 years old and lives in San Francisco." string_source = StringKnowledgeSource( content=content, ) # Create an LLM with a temperature of 0 to ensure deterministic outputs gemini_llm = LLM( model="gemini/gemini-1.5-pro-002", api_key=GEMINI_API_KEY, temperature=0, ) # Create an agent with the knowledge store agent = Agent( role="About User", goal="You know everything about the user.", backstory="""You are a master at understanding people and their preferences.""", verbose=True, allow_delegation=False, llm=gemini_llm, embedder={ "provider": "google", "config": { "model": "models/text-embedding-004", "api_key": GEMINI_API_KEY, } } ) task = Task( description="Answer the following questions about the user: {question}", expected_output="An answer to the question.", agent=agent, ) crew = Crew( agents=[agent], tasks=[task], verbose=True, process=Process.sequential, knowledge_sources=[string_source], embedder={ "provider": "google", "config": { "model": "models/text-embedding-004", "api_key": GEMINI_API_KEY, } } ) result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"}) ``` ```text Output # Agent: About User ## Task: Answer the following questions about the user: What city does John live in and how old is he? # Agent: About User ## Final Answer: John is 30 years old and lives in San Francisco. ``` ## Clearing Knowledge If you need to clear the knowledge stored in CrewAI, you can use the `crewai reset-memories` command with the `--knowledge` option. ```bash Command crewai reset-memories --knowledge ``` This is useful when you've updated your knowledge sources and want to ensure that the agents are using the most recent information. ## Agent-Specific Knowledge While knowledge can be provided at the crew level using `crew.knowledge_sources`, individual agents can also have their own knowledge sources using the `knowledge_sources` parameter: ```python Code from crewai import Agent, Task, Crew from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource # Create agent-specific knowledge about a product product_specs = StringKnowledgeSource( content="""The XPS 13 laptop features: - 13.4-inch 4K display - Intel Core i7 processor - 16GB RAM - 512GB SSD storage - 12-hour battery life""", metadata={"category": "product_specs"} ) # Create a support agent with product knowledge support_agent = Agent( role="Technical Support Specialist", goal="Provide accurate product information and support.", backstory="You are an expert on our laptop products and specifications.", knowledge_sources=[product_specs] # Agent-specific knowledge ) # Create a task that requires product knowledge support_task = Task( description="Answer this customer question: {question}", agent=support_agent ) # Create and run the crew crew = Crew( agents=[support_agent], tasks=[support_task] ) # Get answer about the laptop's specifications result = crew.kickoff( inputs={"question": "What is the storage capacity of the XPS 13?"} ) ``` Benefits of agent-specific knowledge: - Give agents specialized information for their roles - Maintain separation of concerns between agents - Combine with crew-level knowledge for layered information access ## Custom Knowledge Sources CrewAI allows you to create custom knowledge sources for any type of data by extending the `BaseKnowledgeSource` class. Let's create a practical example that fetches and processes space news articles. #### Space News Knowledge Source Example ```python Code from crewai import Agent, Task, Crew, Process, LLM from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource import requests from datetime import datetime from typing import Dict, Any from pydantic import BaseModel, Field class SpaceNewsKnowledgeSource(BaseKnowledgeSource): """Knowledge source that fetches data from Space News API.""" api_endpoint: str = Field(description="API endpoint URL") limit: int = Field(default=10, description="Number of articles to fetch") def load_content(self) -> Dict[Any, str]: """Fetch and format space news articles.""" try: response = requests.get( f"{self.api_endpoint}?limit={self.limit}" ) response.raise_for_status() data = response.json() articles = data.get('results', []) formatted_data = self._format_articles(articles) return {self.api_endpoint: formatted_data} except Exception as e: raise ValueError(f"Failed to fetch space news: {str(e)}") def _format_articles(self, articles: list) -> str: """Format articles into readable text.""" formatted = "Space News Articles:\n\n" for article in articles: formatted += f""" Title: {article['title']} Published: {article['published_at']} Summary: {article['summary']} News Site: {article['news_site']} URL: {article['url']} -------------------""" return formatted def add(self) -> None: """Process and store the articles.""" content = self.load_content() for _, text in content.items(): chunks = self._chunk_text(text) self.chunks.extend(chunks) self._save_documents() # Create knowledge source recent_news = SpaceNewsKnowledgeSource( api_endpoint="https://api.spaceflightnewsapi.net/v4/articles", limit=10, ) # Create specialized agent space_analyst = Agent( role="Space News Analyst", goal="Answer questions about space news accurately and comprehensively", backstory="""You are a space industry analyst with expertise in space exploration, satellite technology, and space industry trends. You excel at answering questions about space news and providing detailed, accurate information.""", knowledge_sources=[recent_news], llm=LLM(model="gpt-4", temperature=0.0) ) # Create task that handles user questions analysis_task = Task( description="Answer this question about space news: {user_question}", expected_output="A detailed answer based on the recent space news articles", agent=space_analyst ) # Create and run the crew crew = Crew( agents=[space_analyst], tasks=[analysis_task], verbose=True, process=Process.sequential ) # Example usage result = crew.kickoff( inputs={"user_question": "What are the latest developments in space exploration?"} ) ``` ```output Output # Agent: Space News Analyst ## Task: Answer this question about space news: What are the latest developments in space exploration? # Agent: Space News Analyst ## Final Answer: The latest developments in space exploration, based on recent space news articles, include the following: 1. SpaceX has received the final regulatory approvals to proceed with the second integrated Starship/Super Heavy launch, scheduled for as soon as the morning of Nov. 17, 2023. This is a significant step in SpaceX's ambitious plans for space exploration and colonization. [Source: SpaceNews](https://spacenews.com/starship-cleared-for-nov-17-launch/) 2. SpaceX has also informed the US Federal Communications Commission (FCC) that it plans to begin launching its first next-generation Starlink Gen2 satellites. This represents a major upgrade to the Starlink satellite internet service, which aims to provide high-speed internet access worldwide. [Source: Teslarati](https://www.teslarati.com/spacex-first-starlink-gen2-satellite-launch-2022/) 3. AI startup Synthetaic has raised $15 million in Series B funding. The company uses artificial intelligence to analyze data from space and air sensors, which could have significant applications in space exploration and satellite technology. [Source: SpaceNews](https://spacenews.com/ai-startup-synthetaic-raises-15-million-in-series-b-funding/) 4. The Space Force has formally established a unit within the U.S. Indo-Pacific Command, marking a permanent presence in the Indo-Pacific region. This could have significant implications for space security and geopolitics. [Source: SpaceNews](https://spacenews.com/space-force-establishes-permanent-presence-in-indo-pacific-region/) 5. Slingshot Aerospace, a space tracking and data analytics company, is expanding its network of ground-based optical telescopes to increase coverage of low Earth orbit. This could improve our ability to track and analyze objects in low Earth orbit, including satellites and space debris. [Source: SpaceNews](https://spacenews.com/slingshots-space-tracking-network-to-extend-coverage-of-low-earth-orbit/) 6. The National Natural Science Foundation of China has outlined a five-year project for researchers to study the assembly of ultra-large spacecraft. This could lead to significant advancements in spacecraft technology and space exploration capabilities. [Source: SpaceNews](https://spacenews.com/china-researching-challenges-of-kilometer-scale-ultra-large-spacecraft/) 7. The Center for AEroSpace Autonomy Research (CAESAR) at Stanford University is focusing on spacecraft autonomy. The center held a kickoff event on May 22, 2024, to highlight the industry, academia, and government collaboration it seeks to foster. This could lead to significant advancements in autonomous spacecraft technology. [Source: SpaceNews](https://spacenews.com/stanford-center-focuses-on-spacecraft-autonomy/) ``` #### Key Components Explained 1. **Custom Knowledge Source (`SpaceNewsKnowledgeSource`)**: - Extends `BaseKnowledgeSource` for integration with CrewAI - Configurable API endpoint and article limit - Implements three key methods: - `load_content()`: Fetches articles from the API - `_format_articles()`: Structures the articles into readable text - `add()`: Processes and stores the content 2. **Agent Configuration**: - Specialized role as a Space News Analyst - Uses the knowledge source to access space news 3. **Task Setup**: - Takes a user question as input through `{user_question}` - Designed to provide detailed answers based on the knowledge source 4. **Crew Orchestration**: - Manages the workflow between agent and task - Handles input/output through the kickoff method This example demonstrates how to: - Create a custom knowledge source that fetches real-time data - Process and format external data for AI consumption - Use the knowledge source to answer specific user questions - Integrate everything seamlessly with CrewAI's agent system #### About the Spaceflight News API The example uses the [Spaceflight News API](https://api.spaceflightnewsapi.net/v4/docs/), which: - Provides free access to space-related news articles - Requires no authentication - Returns structured data about space news - Supports pagination and filtering You can customize the API query by modifying the endpoint URL: ```python # Fetch more articles recent_news = SpaceNewsKnowledgeSource( api_endpoint="https://api.spaceflightnewsapi.net/v4/articles", limit=20, # Increase the number of articles ) # Add search parameters recent_news = SpaceNewsKnowledgeSource( api_endpoint="https://api.spaceflightnewsapi.net/v4/articles?search=NASA", # Search for NASA news limit=10, ) ``` ## Best Practices - Keep chunk sizes appropriate for your content type - Consider content overlap for context preservation - Organize related information into separate knowledge sources - Adjust chunk sizes based on content complexity - Configure appropriate embedding models - Consider using local embedding providers for faster processing