docs: Add transparency features for prompts and memory systems (#2902)

* docs: Fix major memory system documentation issues
  - Remove misleading deprecation warnings, fix confusing comments, clearly separate three memory approaches, provide accurate examples that match implementation
* fix: Correct broken image paths in README
  - Update crewai_logo.png and asset.png paths to point to docs/images/ directory instead of docs/ directly
* docs: Add system prompt transparency and customization guide
  - Add 'Understanding Default System Instructions' section to address black-box concerns
  - Document what CrewAI automatically injects into prompts
  - Provide code examples to inspect complete system prompts
  - Show 3 methods to override default instructions
  - Include observability integration examples with Langfuse
  - Add best practices for production prompt management
* docs: Fix implementation accuracy issues in memory documentation
  - Fix Ollama embedding URL parameter and remove unsupported Cohere input_type parameter
* docs: Reference observability docs instead of showing specific tool examples
* docs: Reorganize knowledge documentation for better developer experience
  - Move quickstart examples right after overview for immediate hands-on experience
  - Create logical learning progression: basics → configuration → advanced → troubleshooting
  - Add comprehensive agent vs crew knowledge guide with working examples
  - Consolidate debugging and troubleshooting in dedicated section
  - Organize best practices by topic in accordion format
  - Improve content flow from simple concepts to advanced features
  - Ensure all examples are grounded in actual codebase implementation
* docs: enhance custom LLM documentation with comprehensive examples and accurate imports
* docs: reorganize observability tools into dedicated section with comprehensive overview and improved navigation
* docs: rename how-to section to learn and add comprehensive overview page
* docs: finalize documentation reorganization and update navigation labels
* docs: enhance README with comprehensive badges, navigation links, and getting started video
2026-01-20 05:18:16 +00:00 · 2025-05-27 13:08:40 -04:00
parent e4e9bf343a
commit c90272d601
39 changed files with 2241 additions and 1172 deletions
--- a/docs/concepts/knowledge.mdx
+++ b/docs/concepts/knowledge.mdx
--- a/docs/concepts/memory.mdx
+++ b/docs/concepts/memory.mdx
@@ -46,22 +46,96 @@ crew = Crew(
 - **Storage Location**: Platform-specific location via `appdirs` package
 - **Custom Storage Directory**: Set `CREWAI_STORAGE_DIR` environment variable

-### Custom Embedder Configuration
+## Storage Location Transparency
+
+<Info>
+**Understanding Storage Locations**: CrewAI uses platform-specific directories to store memory and knowledge files following OS conventions. Understanding these locations helps with production deployments, backups, and debugging.
+</Info>
+
+### Where CrewAI Stores Files
+
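+By default, CrewAI uses the `appdirs` library to choose a platform-appropriate data directory. In the trees below, `{project_name}` is the value of `CREWAI_STORAGE_DIR` when that variable is set, and otherwise the name of the current working directory: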
+
+#### Default Storage Locations by Platform
+
+**macOS:**
+```
+~/Library/Application Support/{project_name}/
+├── knowledge/           # Knowledge base ChromaDB files
+├── short_term_memory/   # Short-term memory ChromaDB files  
+├── long_term_memory/    # Long-term memory ChromaDB files
+├── entities/            # Entity memory ChromaDB files
+└── long_term_memory_storage.db  # SQLite database
+```
+
+**Linux:**
+```
+~/.local/share/{project_name}/
+├── knowledge/
+├── short_term_memory/
+├── long_term_memory/
+├── entities/
+└── long_term_memory_storage.db
+```
+
+**Windows:**
+```
+C:\Users\{username}\AppData\Local\CrewAI\{project_name}\
+├── knowledge\
+├── short_term_memory\
+├── long_term_memory\
+├── entities\
+└── long_term_memory_storage.db
+```
+
+### Finding Your Storage Location
+
+To see exactly where CrewAI is storing files on your system:
+
 ```python
+from crewai.utilities.paths import db_storage_path
+import os
+
+# Get the base storage path
+storage_path = db_storage_path()
+print(f"CrewAI storage location: {storage_path}")
+
+# List the top-level storage entries and one level of their contents
+if os.path.exists(storage_path):
+    print("\nStored files and directories:")
+    for item in sorted(os.listdir(storage_path)):
+        item_path = os.path.join(storage_path, item)
+        if os.path.isdir(item_path):
+            print(f"📁 {item}/")
+            for subitem in sorted(os.listdir(item_path)):
+                print(f"   └── {subitem}")
+        else:
+            print(f"📄 {item}")
+else:
+    print("No CrewAI storage directory found yet.")
+```
+
+### Controlling Storage Locations
+
+#### Option 1: Environment Variable (Recommended)
+```python
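+import os
+
+# Set the variable before importing crewai, in case any default storage
+# path is resolved when crewai modules are first imported.
+# Use an absolute path: the value is joined onto the platform data
+# directory, so a relative value would end up nested inside it.
+os.environ["CREWAI_STORAGE_DIR"] = os.path.abspath("./my_project_storage")
+
+from crewai import Crew
+
+# All memory and knowledge will now be stored in ./my_project_storage/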
 crew = Crew(
    agents=[...],
    tasks=[...],
-    memory=True,
-    embedder={
-        "provider": "openai",
-        "config": {
-            "model": "text-embedding-3-small"
-        }
-    }
+    memory=True
 )
 ```
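
How the variable's value is interpreted matters. The sketch below models the resolution rule under two assumptions: CrewAI passes the `CREWAI_STORAGE_DIR` value to `appdirs` as the application name, and `appdirs` joins that name onto the platform data directory with `os.path.join`. `resolve_storage_dir` is a hypothetical helper, not a CrewAI API; it uses `posixpath` so the example behaves the same on every OS.

```python
import posixpath


def resolve_storage_dir(data_dir: str, project_name: str) -> str:
    """Hypothetical model of how the storage directory is resolved (assumption).

    data_dir: platform data directory, e.g. ~/.local/share on Linux
    project_name: the CREWAI_STORAGE_DIR value (or the working directory's name)
    """
    # join() discards data_dir entirely when project_name is an absolute path
    return posixpath.join(data_dir, project_name)


print(resolve_storage_dir("/home/me/.local/share", "my_app"))
# -> /home/me/.local/share/my_app
print(resolve_storage_dir("/home/me/.local/share", "./my_project_storage"))
# -> /home/me/.local/share/./my_project_storage
print(resolve_storage_dir("/home/me/.local/share", "/srv/crewai_storage"))
# -> /srv/crewai_storage
```

Under these assumptions, a relative value lands inside the platform data directory, while an absolute value is used as-is, because `os.path.join` drops everything before an absolute component.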

-### Custom Storage Paths
+#### Option 2: Custom Storage Paths
 ```python
 import os
 from crewai import Crew
@@ -69,16 +143,547 @@ from crewai.memory import LongTermMemory
 from crewai.memory.storage.ltm_sqlite_storage import LTMSQLiteStorage

 # Configure custom storage location
+custom_storage_path = "./storage"
+os.makedirs(custom_storage_path, exist_ok=True)
+
 crew = Crew(
    memory=True,
    long_term_memory=LongTermMemory(
        storage=LTMSQLiteStorage(
-            db_path=os.getenv("CREWAI_STORAGE_DIR", "./storage") + "/memory.db"
+            db_path=f"{custom_storage_path}/memory.db"
        )
    )
 )
 ```
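
A relative `db_path` such as `./storage/memory.db` is resolved against the working directory at run time, so starting the same script from another directory quietly creates a new, empty database. One way to avoid that is to anchor the path before passing it to `LTMSQLiteStorage`. The helper below is a hypothetical stdlib-only sketch, not part of CrewAI:

```python
from pathlib import Path


def absolute_db_path(base_dir: Path, filename: str = "memory.db") -> str:
    """Return an absolute path for a SQLite file under base_dir, creating base_dir if needed."""
    directory = Path(base_dir).expanduser().resolve()
    directory.mkdir(parents=True, exist_ok=True)
    return str(directory / filename)


# Example: keep the database next to this script instead of the working directory
# db_path = absolute_db_path(Path(__file__).parent / "storage")
```

For example, `LTMSQLiteStorage(db_path=absolute_db_path(Path(__file__).parent / "storage"))` keeps the database beside your script no matter where it is launched from.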

+#### Option 3: Project-Specific Storage
+```python
+import os
+from pathlib import Path
+
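+# Store in the project directory; resolve() makes the path absolute
+project_root = Path(__file__).resolve().parent
+storage_dir = project_root / "crewai_storage"
+
+# Set this before importing crewai, in case any default storage path
+# is resolved at import time
+os.environ["CREWAI_STORAGE_DIR"] = str(storage_dir)
+
+# Now all storage will be in <project_root>/crewai_storage/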
+```
+
+### Embedding Provider Defaults
+
+<Info>
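+**Default Embedding Provider**: CrewAI uses OpenAI embeddings for memory and knowledge by default, regardless of which LLM your agents use, so `OPENAI_API_KEY` must be set unless you configure another embedder. You can switch to a provider that matches your stack or to local embeddings.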
+</Info>
+
+#### Understanding Default Behavior
+```python
+# When using Claude as your LLM...
+from crewai import Agent, LLM
+
+agent = Agent(
+    role="Analyst",
+    goal="Analyze data",
+    backstory="Expert analyst",
+    llm=LLM(model="anthropic/claude-3-5-sonnet-20241022")  # Claude, via a LiteLLM-style model ID
+)
+
+# Memory and knowledge still use OpenAI embeddings by default, so
+# OPENAI_API_KEY is required unless you set `embedder` on the Crew
+```
+
+#### Customizing Embedding Providers
+```python
+from crewai import Crew
+
+# Option 1: Pair Claude with Voyage AI (Anthropic does not offer an embeddings API)
+crew = Crew(
+    agents=[agent],
+    tasks=[task],
+    memory=True,
+    embedder={
+        "provider": "voyageai",
+        "config": {
+            "api_key": "your-voyage-api-key",
+            "model": "voyage-large-2"
+        }
+    }
+)
+
+# Option 2: Use local embeddings (no external API calls)
+crew = Crew(
+    agents=[agent],
+    tasks=[task],
+    memory=True,
+    embedder={
+        "provider": "ollama",
+        "config": {"model": "mxbai-embed-large"}
+    }
+)
+```
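
If you want the embedder to follow the LLM you picked, a small helper can derive the configuration at startup. This is a hypothetical sketch, not a CrewAI feature. It assumes LiteLLM-style model IDs (`provider/model`), and the `VOYAGE_API_KEY` variable name is an assumption you should adapt to how you store secrets:

```python
import os
from typing import Any, Dict, Mapping, Optional


def embedder_for_llm(llm_model: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: choose an embedder config that pairs with the LLM's provider."""
    env = os.environ if env is None else env
    # IDs look like "anthropic/claude-..."; bare IDs are treated as OpenAI
    provider = llm_model.split("/", 1)[0] if "/" in llm_model else "openai"

    if provider == "ollama":
        # Local LLM: keep embeddings local as well
        return {"provider": "ollama", "config": {"model": "mxbai-embed-large"}}
    if provider == "anthropic" and env.get("VOYAGE_API_KEY"):
        # Anthropic has no embeddings API; Voyage AI is a common companion
        return {
            "provider": "voyageai",
            "config": {"api_key": env["VOYAGE_API_KEY"], "model": "voyage-large-2"},
        }
    # Fallback: the default provider (reads OPENAI_API_KEY from the environment)
    return {"provider": "openai", "config": {"model": "text-embedding-3-small"}}


print(embedder_for_llm("ollama/llama3", env={})["provider"])  # -> ollama
```

Pass the result as `embedder=embedder_for_llm("anthropic/claude-3-5-sonnet-20241022")` when building the `Crew`.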
+
+### Debugging Storage Issues
+
+#### Check Storage Permissions
+```python
+import os
+from crewai.utilities.paths import db_storage_path
+
+storage_path = db_storage_path()
+print(f"Storage path: {storage_path}")
+print(f"Path exists: {os.path.exists(storage_path)}")
+print(f"Is writable: {os.access(storage_path, os.W_OK) if os.path.exists(storage_path) else 'Path does not exist'}")
+
+# Create with proper permissions
+if not os.path.exists(storage_path):
+    os.makedirs(storage_path, mode=0o755, exist_ok=True)
+    print(f"Created storage directory: {storage_path}")
+```
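
`os.access` reports the permissions the process appears to have, which can disagree with reality on read-only mounts, network filesystems, or ACL-controlled volumes. A more direct check is to actually create and delete a temporary file in the storage directory. A stdlib-only sketch; `can_write` is a hypothetical helper name:

```python
import tempfile


def can_write(directory: str) -> bool:
    """Return True if a file can actually be created (and removed) in directory."""
    try:
        # NamedTemporaryFile deletes the probe file when the with-block exits
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".crewai-probe-"):
            pass
        return True
    except OSError:
        return False


# Usage with the storage path computed above:
# print(f"Writable (probe): {can_write(storage_path)}")
```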
+
+#### Inspect ChromaDB Collections
+```python
+import os
+import chromadb
+from crewai.utilities.paths import db_storage_path
+
+# Connect to CrewAI's knowledge ChromaDB
+storage_path = db_storage_path()
+chroma_path = os.path.join(storage_path, "knowledge")
+
+if os.path.exists(chroma_path):
+    client = chromadb.PersistentClient(path=chroma_path)
+
+    print("ChromaDB Collections:")
+    for item in client.list_collections():
+        # Some chromadb versions return names, others Collection objects
+        name = item if isinstance(item, str) else item.name
+        collection = client.get_collection(name)
+        print(f"  - {name}: {collection.count()} documents")
+else:
+    print("No ChromaDB storage found")
+```
+
+#### Reset Storage (Debugging)
+```python
+from crewai import Crew
+
+# Create (or reuse) the crew whose storage you want to reset
+crew = Crew(agents=[...], tasks=[...], memory=True)
+
+# Reset specific memory types
+crew.reset_memories(command_type='short')     # Short-term memory
+crew.reset_memories(command_type='long')      # Long-term memory
+crew.reset_memories(command_type='entity')    # Entity memory
+crew.reset_memories(command_type='knowledge') # Knowledge storage
+
+# Or reset everything at once
+crew.reset_memories(command_type='all')
+```
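
Resetting deletes stored memories, so it can help to snapshot the storage directory first. The sketch below copies it to a timestamped sibling folder with `shutil.copytree`; the helper name and naming scheme are hypothetical. Run it while no crew is active, since ChromaDB and SQLite files may be mid-write during a run.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_storage(storage_path: str) -> Path:
    """Copy the whole storage directory to '<name>-backup-<timestamp>' next to it."""
    source = Path(storage_path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"{source.name}-backup-{stamp}")
    shutil.copytree(source, target)  # raises FileExistsError if target already exists
    return target


# Usage:
# from crewai.utilities.paths import db_storage_path
# print(f"Backed up to {backup_storage(db_storage_path())}")
```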
+
+### Production Best Practices
+
+1. **Set `CREWAI_STORAGE_DIR`** to a known absolute path in production, so storage does not depend on the working directory name
+2. **Choose explicit embedding providers** to match your LLM setup
+3. **Monitor storage directory size** for large-scale deployments
+4. **Include storage directories** in your backup strategy
+5. **Set appropriate file permissions** (0o755 for directories, 0o644 for files)
+6. **Use project-relative paths** for containerized deployments
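
To act on item 3 (monitoring storage size), a stdlib-only sketch can total each top-level entry in the storage directory, for logging or alerting. The helper name is hypothetical:

```python
import os
from typing import Dict


def storage_usage(storage_path: str) -> Dict[str, int]:
    """Map each top-level file or directory in storage_path to its total size in bytes."""
    usage: Dict[str, int] = {}
    for entry in os.scandir(storage_path):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    # lstat avoids following (possibly broken) symlinks
                    total += os.lstat(os.path.join(root, name)).st_size
            usage[entry.name] = total
        else:
            usage[entry.name] = entry.stat(follow_symlinks=False).st_size
    return usage


# Usage:
# from crewai.utilities.paths import db_storage_path
# for name, size in sorted(storage_usage(db_storage_path()).items()):
#     print(f"{name:<35} {size / 1_048_576:8.2f} MiB")
```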
+
+### Common Storage Issues
+
+**"ChromaDB permission denied" errors:**
+```bash
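+# Restore read/write access for your user (X adds execute on directories only)
+chmod -R u+rwX "$(python -c 'from crewai.utilities.paths import db_storage_path; print(db_storage_path())')"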
+```
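
To apply the permission scheme from the best practices (0o755 for directories, 0o644 for files) without marking data files as executable, you can walk the storage tree from Python. This is a POSIX-only sketch that assumes your user owns the files; `normalize_permissions` is a hypothetical helper:

```python
import os


def normalize_permissions(root: str, dir_mode: int = 0o755, file_mode: int = 0o644) -> None:
    """Recursively apply dir_mode to directories and file_mode to files under root (POSIX)."""
    os.chmod(root, dir_mode)
    for current, dirs, files in os.walk(root):
        # Subdirectories are chmod-ed before os.walk descends into them
        for name in dirs:
            path = os.path.join(current, name)
            if not os.path.islink(path):
                os.chmod(path, dir_mode)
        for name in files:
            path = os.path.join(current, name)
            if not os.path.islink(path):  # don't change symlink targets elsewhere
                os.chmod(path, file_mode)


# Usage:
# from crewai.utilities.paths import db_storage_path
# normalize_permissions(db_storage_path())
```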
+
+**"Database is locked" errors:**
+```python
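+# Serialize access so only one process uses the storage at a time.
+# Unix only (fcntl); every process must use the same lock file.
+import fcntl
+import os
+from crewai.utilities.paths import db_storage_path
+
+storage_path = db_storage_path()
+lock_file = os.path.join(storage_path, ".crewai.lock")
+
+with open(lock_file, "w") as f:
+    try:
+        # Non-blocking exclusive lock: fails immediately if another process holds it
+        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
+    except BlockingIOError:
+        raise SystemExit("Another CrewAI process is using this storage directory")
+    # Your CrewAI code here; the lock is released when the file is closed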
+```
+
+**Storage not persisting between runs:**
+```python
+# Verify storage location is consistent
+import os
+from crewai.utilities.paths import db_storage_path
+
+print("CREWAI_STORAGE_DIR:", os.getenv("CREWAI_STORAGE_DIR"))
+print("Current working directory:", os.getcwd())
+print("Computed storage path:", db_storage_path())
+```
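
A common cause is launching the same crew from different working directories. Assuming the storage folder is named after `CREWAI_STORAGE_DIR` when it is set and after the current working directory otherwise (the `{project_name}` placeholder in the trees above), this sketch predicts which folder a run will use. `expected_project_name` is hypothetical and only mirrors that assumed rule:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def expected_project_name(env: Optional[Mapping[str, str]] = None, cwd: Optional[str] = None) -> str:
    """Predict the storage folder name: CREWAI_STORAGE_DIR if set, else the cwd's name."""
    env = os.environ if env is None else env
    override = env.get("CREWAI_STORAGE_DIR")
    if override:
        return override
    return Path(cwd or os.getcwd()).name


# Two runs from different directories would use two different folders:
print(expected_project_name(env={}, cwd="/home/me/app1"))  # -> app1
print(expected_project_name(env={}, cwd="/home/me/app2"))  # -> app2
```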
+
+## Custom Embedder Configuration
+
+CrewAI supports multiple embedding providers to give you flexibility in choosing the best option for your use case. Here's a comprehensive guide to configuring different embedding providers for your memory system.
+
+### Why Choose Different Embedding Providers?
+
+- **Cost Optimization**: Local embeddings (Ollama) are free after initial setup
+- **Privacy**: Keep your data local with Ollama or use your preferred cloud provider
+- **Performance**: Some models work better for specific domains or languages
+- **Consistency**: Match your embedding provider with your LLM provider
+- **Compliance**: Meet specific regulatory or organizational requirements
+
+### OpenAI Embeddings (Default)
+
+OpenAI provides reliable, high-quality embeddings that work well for most use cases.
+
+```python
+from crewai import Crew
+
+# Basic OpenAI configuration (uses environment OPENAI_API_KEY)
+crew = Crew(
+    agents=[...],
+    tasks=[...],
+    memory=True,
+    embedder={
+        "provider": "openai",
+        "config": {
+            "model": "text-embedding-3-small"  # or "text-embedding-3-large"
+        }
+    }
+)
+
+# Advanced OpenAI configuration
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "openai",
+        "config": {
+            "api_key": "your-openai-api-key",  # Optional: override env var
+            "model": "text-embedding-3-large",
+            "dimensions": 1536,  # Optional: reduce dimensions for smaller storage
+            "organization_id": "your-org-id"  # Optional: for organization accounts
+        }
+    }
+)
+```
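
The `dimensions` setting trades retrieval quality for storage. As a rough estimate, assuming each vector is stored as 32-bit floats (4 bytes per dimension) and ignoring index and metadata overhead, raw vector storage grows linearly with the dimension count:

```python
def raw_vector_bytes(num_documents: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Approximate raw vector storage, assuming float32 values and no index overhead."""
    return num_documents * dimensions * bytes_per_value


# 3072 is text-embedding-3-large's native size; 1536 and 256 are reduced sizes
for dims in (3072, 1536, 256):
    mib = raw_vector_bytes(100_000, dims) / 1_048_576
    print(f"{dims:>5} dims x 100k docs ~ {mib:,.0f} MiB")
```

For 100,000 documents this prints roughly 1,172, 586 and 98 MiB respectively; real on-disk usage will be higher once the vector index and document text are included.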
+
+### Azure OpenAI Embeddings
+
+For enterprise users with Azure OpenAI deployments.
+
+```python
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "openai",  # Use openai provider for Azure
+        "config": {
+            "api_key": "your-azure-api-key",
+            "api_base": "https://your-resource.openai.azure.com/",
+            "api_type": "azure",
+            "api_version": "2023-05-15",
+            "model": "text-embedding-3-small",
+            "deployment_id": "your-deployment-name"  # Azure deployment name
+        }
+    }
+)
+```
+
+### Google AI Embeddings
+
+Use Google's text embedding models for integration with Google Cloud services.
+
+```python
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "google",
+        "config": {
+            "api_key": "your-google-api-key",
+            "model": "text-embedding-004"  # or "text-embedding-preview-0409"
+        }
+    }
+)
+```
+
+### Vertex AI Embeddings
+
+For Google Cloud users with Vertex AI access.
+
+```python
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "vertexai",
+        "config": {
+            "project_id": "your-gcp-project-id",
+            "region": "us-central1",  # or your preferred region
+            "api_key": "your-service-account-key",
+            "model": "textembedding-gecko"
+        }
+    }
+)
+```
+
+### Ollama Embeddings (Local)
+
+Run embeddings locally for privacy and cost savings.
+
+```python
+# First, install and run Ollama locally, then pull an embedding model:
+# ollama pull mxbai-embed-large
+
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "ollama",
+        "config": {
+            "model": "mxbai-embed-large",  # or "nomic-embed-text"
+            "url": "http://localhost:11434/api/embeddings"  # Default Ollama URL
+        }
+    }
+)
+
+# For custom Ollama installations
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "ollama",
+        "config": {
+            "model": "mxbai-embed-large",
+            "url": "http://your-ollama-server:11434/api/embeddings"
+        }
+    }
+)
+```
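
Before kicking off a crew, you can confirm that the Ollama server is reachable and that the embedding model has been pulled. This sketch queries Ollama's `GET /api/tags` endpoint, which lists local models, using only the standard library. The helper names are hypothetical, and it assumes the server root is the `url` above without the `/api/embeddings` suffix.

```python
import json
import urllib.request


def model_in_tags(tags_payload: dict, model: str) -> bool:
    """True if `model` (with or without an explicit ':latest' tag) is in an /api/tags response."""
    names = {entry.get("name", "") for entry in tags_payload.get("models", [])}
    return model in names or f"{model}:latest" in names


def ollama_has_model(base_url: str, model: str, timeout: float = 2.0) -> bool:
    """Query {base_url}/api/tags and look for `model`; False if the server can't be reached."""
    url = f"{base_url.rstrip('/')}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return model_in_tags(json.load(response), model)
    except (OSError, ValueError):
        # URLError and timeouts subclass OSError; ValueError covers malformed JSON
        return False


# Usage:
# if not ollama_has_model("http://localhost:11434", "mxbai-embed-large"):
#     print("Model missing: run `ollama pull mxbai-embed-large`")
```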
+
+### Cohere Embeddings
+
+Use Cohere's embedding models for multilingual support.
+
+```python
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "cohere",
+        "config": {
+            "api_key": "your-cohere-api-key",
+            "model": "embed-english-v3.0"  # or "embed-multilingual-v3.0"
+        }
+    }
+)
+```
+
+### VoyageAI Embeddings
+
+High-performance embeddings optimized for retrieval tasks.
+
+```python
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "voyageai",
+        "config": {
+            "api_key": "your-voyage-api-key",
+            "model": "voyage-large-2",  # or "voyage-code-2" for code
+            "input_type": "document"  # or "query"
+        }
+    }
+)
+```
+
+### AWS Bedrock Embeddings
+
+For AWS users with Bedrock access.
+
+```python
+import boto3
+
+# Credentials and region come from the boto3 session
+# (boto3.Session also accepts aws_access_key_id / aws_secret_access_key)
+session = boto3.Session(region_name="us-east-1")
+
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "bedrock",
+        "config": {
+            "session": session,
+            "model": "amazon.titan-embed-text-v1"
+        }
+    }
+)
+```
+
+### Hugging Face Embeddings
+
+Use open-source models from Hugging Face.
+
+```python
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "huggingface",
+        "config": {
+            "api_key": "your-hf-token",  # Optional for public models
+            "model": "sentence-transformers/all-MiniLM-L6-v2",
+            "api_url": "https://api-inference.huggingface.co"  # or your custom endpoint
+        }
+    }
+)
+```
+
+### IBM Watson Embeddings
+
+For IBM Cloud users.
+
+```python
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "watson",
+        "config": {
+            "api_key": "your-watson-api-key",
+            "url": "your-watson-instance-url",
+            "model": "ibm/slate-125m-english-rtrvr"
+        }
+    }
+)
+```
+
+### Choosing the Right Embedding Provider
+
+| Provider | Best For | Pros | Cons |
+|:---------|:----------|:------|:------|
+| **OpenAI** | General use, reliability | High quality, well-tested | Cost, requires API key |
+| **Ollama** | Privacy, cost savings | Free, local, private | Requires local setup |
+| **Google AI** | Google ecosystem | Good performance | Requires Google account |
+| **Azure OpenAI** | Enterprise, compliance | Enterprise features | Complex setup |
+| **Cohere** | Multilingual content | Great language support | Specialized use case |
+| **VoyageAI** | Retrieval tasks | Optimized for search | Newer provider |
+
+### Environment Variable Configuration
+
+For security, store API keys in environment variables:
+
+```python
+import os
+from crewai import Crew
+
+# Usually set in your shell, CI secrets, or a .env file; shown here only for illustration
+os.environ["OPENAI_API_KEY"] = "your-openai-key"
+os.environ["GOOGLE_API_KEY"] = "your-google-key"
+os.environ["COHERE_API_KEY"] = "your-cohere-key"
+
+# Use without exposing keys in code
+crew = Crew(
+    memory=True,
+    embedder={
+        "provider": "openai",
+        "config": {
+            "model": "text-embedding-3-small"
+            # API key automatically loaded from environment
+        }
+    }
+)
+```
+
+### Testing Different Embedding Providers
+
+Compare embedding providers for your specific use case:
+
+```python
+from crewai import Crew
+
+# Test different providers with the same data
+providers_to_test = [
+    {
+        "name": "OpenAI",
+        "config": {
+            "provider": "openai",
+            "config": {"model": "text-embedding-3-small"}
+        }
+    },
+    {
+        "name": "Ollama",
+        "config": {
+            "provider": "ollama", 
+            "config": {"model": "mxbai-embed-large"}
+        }
+    }
+]
+
+for provider in providers_to_test:
+    print(f"\nTesting {provider['name']} embeddings...")
+    
+    # Create crew with specific embedder
+    crew = Crew(
+        agents=[...],
+        tasks=[...],
+        memory=True,
+        embedder=provider['config']
+    )
+    
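+    # Run your test and check the results
+    result = crew.kickoff()
+    print(f"{provider['name']} completed successfully")
+
+    # Clear stored memories before testing the next provider: embeddings with
+    # different dimensions (1536 for text-embedding-3-small, 1024 for
+    # mxbai-embed-large) cannot be stored in the same ChromaDB collections
+    crew.reset_memories(command_type='all')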
+```
+
+### Troubleshooting Embedding Issues
+
+**Model not found errors:**
+```python
+# Build the embedder, then embed a short string to confirm the model responds
+from crewai.utilities.embedding_configurator import EmbeddingConfigurator
+
+configurator = EmbeddingConfigurator()
+try:
+    embedder = configurator.configure_embedder({
+        "provider": "ollama",
+        "config": {"model": "mxbai-embed-large"}
+    })
+    embedder(["connectivity check"])  # makes a real request to the provider
+    print("Embedder configured and reachable")
+except Exception as e:
+    print(f"Configuration error: {e}")
+```
+
+**API key issues:**
+```python
+import os
+
+# Check if API keys are set
+required_keys = ["OPENAI_API_KEY", "GOOGLE_API_KEY", "COHERE_API_KEY"]
+for key in required_keys:
+    if os.getenv(key):
+        print(f"✅ {key} is set")
+    else:
+        print(f"❌ {key} is not set")
+```
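
The check above reports every key, even for providers you do not use. A variant can look only at the provider in your `embedder` config. This is a hypothetical sketch: the provider-to-variable mapping simply reuses the variable names from the examples in this section, a key passed as `config["api_key"]` is treated as sufficient, and providers outside the mapping (such as Ollama) are assumed to need no key.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Assumed mapping, reusing the environment variable names from this section
PROVIDER_ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "cohere": "COHERE_API_KEY",
}


def missing_api_key(embedder: Dict[str, Any], env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of a required but unset env var, or None if nothing is missing."""
    env = os.environ if env is None else env
    if embedder.get("config", {}).get("api_key"):
        return None  # key passed explicitly in the config
    var = PROVIDER_ENV_KEYS.get(embedder.get("provider", ""))
    if var and not env.get(var):
        return var
    return None


# Usage:
# problem = missing_api_key({"provider": "openai", "config": {"model": "text-embedding-3-small"}})
# if problem:
#     print(f"❌ {problem} is not set")
```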
+
+**Performance comparison:**
+```python
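+import time
+from crewai import Crew
+
+def test_embedding_performance(embedder_config):
+    # Note: this times the whole kickoff, including LLM calls, so it is only
+    # a rough end-to-end comparison rather than a pure embedding benchmark
+    start_time = time.perf_counter()
+
+    crew = Crew(
+        agents=[...],
+        tasks=[...],
+        memory=True,
+        embedder=embedder_config
+    )
+
+    # Run the crew; memory reads and writes use the configured embedder
+    crew.kickoff()
+
+    elapsed = time.perf_counter() - start_time
+    crew.reset_memories(command_type='all')  # avoid dimension clashes between providers
+    return elapsed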
+
+# Compare performance
+openai_time = test_embedding_performance({
+    "provider": "openai",
+    "config": {"model": "text-embedding-3-small"}
+})
+
+ollama_time = test_embedding_performance({
+    "provider": "ollama", 
+    "config": {"model": "mxbai-embed-large"}
+})
+
+print(f"OpenAI: {openai_time:.2f}s")
+print(f"Ollama: {ollama_time:.2f}s")
+```
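
Timing a whole `kickoff()` mixes embedding cost with LLM latency. To compare embedders more directly, you can time the embedding callable itself. This sketch accepts any function that maps a list of strings to a list of vectors, which is how ChromaDB-style embedding functions (such as the object returned by `configure_embedder` in the troubleshooting example) are called. `time_embedder` is a hypothetical helper, and `fake_embed` is a stand-in that exists only so the example runs offline.

```python
import statistics
import time
from typing import Callable, List, Sequence

Embedder = Callable[[List[str]], Sequence[Sequence[float]]]


def time_embedder(embed: Embedder, texts: List[str], repeats: int = 3) -> float:
    """Return the median wall-clock seconds to embed `texts` in one batch."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        vectors = embed(texts)
        durations.append(time.perf_counter() - start)
        if len(vectors) != len(texts):
            raise ValueError("embedder returned a different number of vectors than inputs")
    return statistics.median(durations)


# Offline stand-in: returns fixed-size zero vectors
def fake_embed(texts: List[str]) -> List[List[float]]:
    return [[0.0] * 8 for _ in texts]


sample = ["This is a test document"] * 32
print(f"fake: {time_embedder(fake_embed, sample):.6f}s")
# With CrewAI (assumption), pass the configured embedder instead:
# embedder = EmbeddingConfigurator().configure_embedder({...})
# print(f"ollama: {time_embedder(embedder, sample):.3f}s")
```

Taking the median of a few repeats dampens one-off effects such as a cold model load on the first Ollama request.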
+
 ## 2. User Memory with Mem0 (Legacy)

 <Warning>