mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-04-30 23:02:50 +00:00
New Memory Improvements (#4484)
Some checks failed
CodeQL Advanced / Analyze (actions) (push) Has been cancelled
CodeQL Advanced / Analyze (python) (push) Has been cancelled
Mark stale issues and pull requests / stale (push) Has been cancelled
Check Documentation Broken Links / Check broken links (push) Has been cancelled
Build uv cache / build-cache (3.10) (push) Has been cancelled
Build uv cache / build-cache (3.11) (push) Has been cancelled
Build uv cache / build-cache (3.12) (push) Has been cancelled
Build uv cache / build-cache (3.13) (push) Has been cancelled
* better DevEx

* Refactor: Update supported native providers and enhance memory handling

  - Removed "groq" and "meta" from the list of supported native providers in `llm.py`.
  - Added a safeguard in `flow.py` to ensure all background memory saves complete before returning.
  - Improved error handling in `unified_memory.py` to prevent exceptions during shutdown, ensuring smoother memory operations and event bus interactions.

* Enhance Memory System with Consolidation and Learning Features

  - Introduced memory consolidation mechanisms to prevent duplicate records during content saving, using similarity checks and LLM decision-making.
  - Implemented non-blocking save operations in the memory system, allowing agents to continue tasks while memory is being saved.
  - Added support for learning from human feedback, enabling the system to distill lessons from past corrections and improve future outputs.
  - Updated documentation to reflect new features and usage examples for memory consolidation and HITL learning.

* Enhance cyclic flow handling for or_() listeners

  - Updated the Flow class to ensure that all fired or_() listeners are cleared between cycle iterations, allowing them to fire again in subsequent cycles. This addresses a bug where listeners remained suppressed across iterations.
  - Added regression tests to verify that or_() listeners fire correctly on every iteration in cyclic flows, ensuring expected behavior in complex routing scenarios.
This commit is contained in:
All analysis degrades gracefully on LLM failure -- see [Failure Behavior](#failure-behavior).

## Memory Consolidation

When saving new content, the encoding pipeline automatically checks for similar existing records in storage. If the similarity is above `consolidation_threshold` (default 0.85), the LLM decides what to do:

- **keep** -- The existing record is still accurate and not redundant.
- **update** -- The existing record should be updated with new information (the LLM provides the merged content).
- **delete** -- The existing record is outdated, superseded, or contradicted.
- **insert_new** -- Whether the new content should also be inserted as a separate record.

This prevents duplicates from accumulating. For example, if you save "CrewAI ensures reliable operation" three times, consolidation recognizes the duplicates and keeps only one record.
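The four actions can be sketched in plain Python with the similarity metric and the LLM's decision stubbed out -- `consolidate`, `similarity`, and `decide` here are illustrative stand-ins, not CrewAI's actual internals:

```python
def consolidate(store, new_text, similarity, decide, threshold=0.85):
    """Sketch of save-time consolidation against the most similar record."""
    if not store:
        return [new_text]
    # Find the most similar existing record
    idx = max(range(len(store)), key=lambda i: similarity(store[i], new_text))
    if similarity(store[idx], new_text) < threshold:
        return store + [new_text]              # nothing similar enough: plain insert
    # Above the threshold, the LLM picks keep/update/delete and insert_new
    action, merged, insert_new = decide(store[idx], new_text)
    rest = store[:idx] + store[idx + 1:]
    if action == "keep":
        rest.append(store[idx])
    elif action == "update":
        rest.append(merged)
    # action == "delete": the old record is simply dropped
    if insert_new:
        rest.append(new_text)
    return rest
```

With a stub that treats exact matches as duplicates to keep, repeated saves collapse to a single record while distinct content is still inserted.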
### Intra-batch Dedup

When using `remember_many()`, items within the same batch are compared against each other before hitting storage. If two items have cosine similarity >= `batch_dedup_threshold` (default 0.98), the later one is silently dropped. This catches exact or near-exact duplicates within a single batch without any LLM calls (pure vector math).

```python
# Only 2 records are stored (the third is a near-duplicate of the first)
memory.remember_many([
    "CrewAI supports complex workflows.",
    "Python is a great language.",
    "CrewAI supports complex workflows.",  # dropped by intra-batch dedup
])
```
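Since the batch check is pure vector math, it can be sketched in a few lines -- the `embed` callable and `dedup_batch` helper are hypothetical, not part of the CrewAI API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def dedup_batch(items, embed, threshold=0.98):
    """Drop later items whose embedding is near-identical to an earlier one."""
    kept, kept_vecs = [], []
    for item in items:
        vec = embed(item)
        if any(cosine(vec, v) >= threshold for v in kept_vecs):
            continue  # near-duplicate of an earlier item: silently dropped
        kept.append(item)
        kept_vecs.append(vec)
    return kept
```

Earlier items always win; only the later duplicate is discarded, matching the documented behavior.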
## Non-blocking Saves

`remember_many()` is **non-blocking** -- it submits the encoding pipeline to a background thread and returns immediately. This means the agent can continue to the next task while memories are being saved.

```python
# Returns immediately -- save happens in background
memory.remember_many(["Fact A.", "Fact B.", "Fact C."])

# recall() automatically waits for pending saves before searching
matches = memory.recall("facts")  # sees all 3 records
```
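The underlying pattern can be sketched with `concurrent.futures` from the standard library -- `BackgroundSaver` is an illustrative toy, not the real implementation:

```python
from concurrent.futures import ThreadPoolExecutor, wait

class BackgroundSaver:
    """Toy model: saves run on a background thread; drain blocks until done."""
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []
        self._store = []

    def remember_many(self, items):
        # Returns immediately; persisting happens in the background.
        self._pending.append(self._pool.submit(self._store.extend, items))

    def drain_writes(self):
        # Block until every submitted save has completed.
        wait(self._pending)
        self._pending.clear()

    def recall(self, query):
        self.drain_writes()  # read barrier: searches always see pending writes
        return [s for s in self._store if query in s]
```

The `drain_writes()` call inside `recall()` is the read barrier described below: a query never races a save that was submitted before it.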
### Read Barrier

Every `recall()` call automatically calls `drain_writes()` before searching, ensuring the query always sees the latest persisted records. This is transparent -- you never need to think about it.

### Crew Shutdown

When a crew finishes, `kickoff()` drains all pending memory saves in its `finally` block, so no saves are lost even if the crew completes while background saves are still in flight.

### Standalone Usage

For scripts or notebooks where there's no crew lifecycle, call `drain_writes()` or `close()` explicitly:

```python
memory = Memory()
memory.remember_many(["Fact A.", "Fact B."])

# Option 1: Wait for pending saves
memory.drain_writes()

# Option 2: Drain and shut down the background pool
memory.close()
```
## Source and Privacy

Every memory record can carry a `source` tag for provenance tracking and a `private` flag for access control.

### Source Tracking

The `source` parameter identifies where a memory came from:

```python
# Tag memories with their origin
memory.remember("User prefers dark mode", source="user:alice")
memory.remember("System config updated", source="admin")
memory.remember("Agent found a bug", source="agent:debugger")

# Recall only memories from a specific source
matches = memory.recall("user preferences", source="user:alice")
```

### Private Memories

Private memories are only visible to recall when the `source` matches:

```python
# Store a private memory
memory.remember("Alice's API key is sk-...", source="user:alice", private=True)

# This recall sees the private memory (source matches)
matches = memory.recall("API key", source="user:alice")

# This recall does NOT see it (different source)
matches = memory.recall("API key", source="user:bob")

# Admin access: see all private records regardless of source
matches = memory.recall("API key", include_private=True)
```

This is particularly useful in multi-user or enterprise deployments where different users' memories should be isolated.
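The visibility rule can be expressed as a small predicate -- a sketch, where `visible`, `recall`, and the record dicts are illustrative, not CrewAI's storage format:

```python
def visible(record, source=None, include_private=False):
    """Mimics the documented visibility rule for private records."""
    if include_private:
        return True          # admin access: see everything
    if not record.get("private"):
        return True          # public records are always visible
    return record.get("source") == source  # private: only to the same source

def recall(records, source=None, include_private=False):
    return [r["text"] for r in records if visible(r, source, include_private)]

records = [
    {"text": "Alice's API key", "source": "user:alice", "private": True},
    {"text": "Team standup is at 10am", "source": "admin", "private": False},
]
```

Public records are visible to everyone; private records are filtered unless the caller's `source` matches or `include_private=True` is set.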
## RecallFlow (Deep Recall)

`recall()` supports two depths:

- **`depth="shallow"`** -- Direct vector search with composite scoring. Fast (~200ms), no LLM calls.
- **`depth="deep"` (default)** -- Runs a multi-step RecallFlow: query analysis, scope selection, parallel vector search, confidence-based routing, and optional recursive exploration when confidence is low.

**Smart LLM skip**: Queries shorter than `query_analysis_threshold` (default 200 characters) skip the LLM query analysis entirely, even in deep mode. Short queries like "What database do we use?" are already good search phrases -- the LLM analysis adds little value. This saves ~1-3s per recall for typical short queries. Only longer queries (e.g. full task descriptions) go through LLM distillation into targeted sub-queries.

```python
# Shallow: pure vector search, no LLM
matches = memory.recall("What did we decide?", limit=10, depth="shallow")

# Deep (default): intelligent retrieval with LLM analysis for long queries
matches = memory.recall(
    "Summarize all architecture decisions from this quarter",
    limit=10,
    depth="deep",
)
```
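The smart-skip rule reduces to a one-line predicate -- a sketch, with the function name invented for illustration:

```python
def needs_llm_analysis(query, depth="deep", threshold=200):
    """Short queries are used verbatim; only long deep-mode queries get LLM distillation."""
    return depth == "deep" and len(query) >= threshold
```

Shallow recalls never touch the LLM; deep recalls only do when the query is long enough to benefit from distillation into sub-queries.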
These deep-recall thresholds can be tuned when constructing `Memory`:

```python
memory = Memory(
    confidence_threshold_high=0.9,   # Only synthesize when very confident
    confidence_threshold_low=0.4,    # Explore deeper more aggressively
    exploration_budget=2,            # Allow up to 2 exploration rounds
    query_analysis_threshold=200,    # Skip LLM for queries shorter than this
)
```
```python
memory = Memory(embedder=my_embedder)
```

| Custom | `custom` | -- | Requires `embedding_callable`. |
## LLM Configuration

Memory uses an LLM for save analysis (scope, categories, importance inference), consolidation decisions, and deep recall query analysis. You can configure which model to use.

```python
from crewai import Memory, LLM

# Default: gpt-4o-mini
memory = Memory()

# Use a different OpenAI model
memory = Memory(llm="gpt-4o")

# Use Anthropic
memory = Memory(llm="anthropic/claude-3-haiku-20240307")

# Use Ollama for fully local/private analysis
memory = Memory(llm="ollama/llama3.2")

# Use Google Gemini
memory = Memory(llm="gemini/gemini-2.0-flash")

# Pass a pre-configured LLM instance with custom settings
llm = LLM(model="gpt-4o", temperature=0)
memory = Memory(llm=llm)
```

The LLM is initialized **lazily** -- it's only created when first needed. This means `Memory()` never fails at construction time, even if API keys aren't set. Errors only surface when the LLM is actually called (e.g. when saving without explicit scope/categories, or during deep recall).

For fully offline/private operation, use a local model for both the LLM and embedder:

```python
memory = Memory(
    llm="ollama/llama3.2",
    embedder={"provider": "ollama", "config": {"model_name": "mxbai-embed-large"}},
)
```
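Lazy initialization is the standard cached-attribute pattern; a minimal sketch, where `LazyMemory` and `_connect` are illustrative stand-ins for the real client construction:

```python
class LazyMemory:
    """The LLM client is only constructed on first use, so __init__ never fails."""
    def __init__(self, llm="gpt-4o-mini"):
        self._llm_spec = llm   # just a string/config at construction time
        self._llm = None

    @property
    def llm(self):
        if self._llm is None:
            # This is where a missing API key would first surface as an error.
            self._llm = self._connect(self._llm_spec)
        return self._llm

    def _connect(self, spec):
        # Stub standing in for real client construction.
        return f"client:{spec}"
```

Construction stores only the spec; the (potentially failing) connection is deferred until the first property access and then cached.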
## Storage Backend

- **Default**: LanceDB, stored under `./.crewai/memory` (or `$CREWAI_STORAGE_DIR/memory` if the env var is set, or the path you pass as `storage="path/to/dir"`).
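The lookup order can be sketched as a small helper (`resolve_storage_dir` is hypothetical, mirroring the documented precedence: explicit path, then env var, then default):

```python
import os

def resolve_storage_dir(storage=None):
    """Precedence: explicit path > $CREWAI_STORAGE_DIR/memory > ./.crewai/memory."""
    if storage is not None:
        return storage
    env = os.environ.get("CREWAI_STORAGE_DIR")
    if env:
        return os.path.join(env, "memory")
    return os.path.join(".", ".crewai", "memory")
```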
- When using a crew, confirm `memory=True` or `memory=Memory(...)` is set.

**Slow recall?**
- Use `depth="shallow"` for routine agent context. Reserve `depth="deep"` for complex queries.
- Increase `query_analysis_threshold` to skip LLM analysis for more queries.

**LLM analysis errors in logs?**
- Memory still saves/recalls with safe defaults. Check API keys, rate limits, and model availability if you want full LLM analysis.

**Background save errors in logs?**
- Memory saves run in a background thread. Errors are emitted as `MemorySaveFailedEvent` but don't crash the agent. Check logs for the root cause (usually LLM or embedder connection issues).

**Concurrent write conflicts?**
- LanceDB operations are serialized with a shared lock and retried automatically on conflict. This handles multiple `Memory` instances pointing at the same database (e.g. agent memory + crew memory). No action needed.

**Browse memory from the terminal:**
```bash
crewai memory  # Opens the TUI browser
```
All configuration is passed as keyword arguments to `Memory(...)`.

| Parameter | Default | Description |
|-----------|---------|-------------|
| `consolidation_threshold` | `0.85` | Similarity above which consolidation is triggered on save. Set to `1.0` to disable. |
| `consolidation_limit` | `5` | Max existing records to compare during consolidation. |
| `default_importance` | `0.5` | Importance assigned when not provided and LLM analysis is skipped. |
| `batch_dedup_threshold` | `0.98` | Cosine similarity for dropping near-duplicates within a `remember_many()` batch. |
| `confidence_threshold_high` | `0.8` | Recall confidence above which results are returned directly. |
| `confidence_threshold_low` | `0.5` | Recall confidence below which deeper exploration is triggered. |
| `complex_query_threshold` | `0.7` | For complex queries, explore deeper below this confidence. |
| `exploration_budget` | `1` | Number of LLM-driven exploration rounds during deep recall. |
| `query_analysis_threshold` | `200` | Queries shorter than this (in characters) skip LLM analysis during deep recall. |
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `default_outcome` | `str` | No | Outcome to use if no feedback provided. Must be in `emit` |
| `metadata` | `dict` | No | Additional data for enterprise integrations |
| `provider` | `HumanFeedbackProvider` | No | Custom provider for async/non-blocking feedback. See [Async Human Feedback](#async-human-feedback-non-blocking) |
| `learn` | `bool` | No | Enable HITL learning: distill lessons from feedback and pre-review future output. Default `False`. See [Learning from Feedback](#learning-from-feedback) |
| `learn_limit` | `int` | No | Max past lessons to recall for pre-review. Default `5` |

### Basic Usage (No Routing)
5. **Automatic persistence**: State is automatically saved when `HumanFeedbackPending` is raised and uses `SQLiteFlowPersistence` by default
6. **Custom persistence**: Pass a custom persistence instance to `from_pending()` if needed
## Learning from Feedback

The `learn=True` parameter enables a feedback loop between human reviewers and the memory system. When enabled, the system progressively improves its outputs by learning from past human corrections.

### How It Works

1. **After feedback**: The LLM extracts generalizable lessons from the output + feedback and stores them in memory with `source="hitl"`. If the feedback is just approval (e.g. "looks good"), nothing is stored.
2. **Before next review**: Past HITL lessons are recalled from memory and applied by the LLM to improve the output before the human sees it.

Over time, the human sees progressively better pre-reviewed output because each correction informs future reviews.
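The loop can be sketched end-to-end with the LLM calls stubbed out -- `hitl_round`, `distill`, and `pre_review` are illustrative stand-ins for the real distillation and pre-review steps, and `memory` is just a list here:

```python
def hitl_round(output, feedback, memory, distill, pre_review, limit=5):
    """One review round: pre-review with past lessons, then learn from feedback."""
    lessons = memory[-limit:]               # recall up to `limit` past lessons
    improved = pre_review(output, lessons)  # apply lessons before the human sees it
    lesson = distill(improved, feedback)    # extract a generalizable lesson
    if lesson:                              # plain approval distills to nothing
        memory.append(lesson)
    return improved
```

Each round both consumes earlier lessons and, when the feedback is a correction rather than an approval, contributes a new one.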
### Example

```python Code
class ArticleReviewFlow(Flow):
    @start()
    @human_feedback(
        message="Review this article draft:",
        emit=["approved", "needs_revision"],
        llm="gpt-4o-mini",
        learn=True,  # enable HITL learning
    )
    def generate_article(self):
        return self.crew.kickoff(inputs={"topic": "AI Safety"}).raw

    @listen("approved")
    def publish(self):
        print(f"Publishing: {self.last_human_feedback.output}")

    @listen("needs_revision")
    def revise(self):
        print("Revising based on feedback...")
```
**First run**: The human sees the raw output and says "Always include citations for factual claims." The lesson is distilled and stored in memory.

**Second run**: The system recalls the citation lesson, pre-reviews the output to add citations, then shows the improved version. The human's job shifts from "fix everything" to "catch what the system missed."

### Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| `learn` | `False` | Enable HITL learning |
| `learn_limit` | `5` | Max past lessons to recall for pre-review |

### Key Design Decisions

- **Same LLM for everything**: The `llm` parameter on the decorator is shared by outcome collapsing, lesson distillation, and pre-review. No need to configure multiple models.
- **Structured output**: Both distillation and pre-review use function calling with Pydantic models when the LLM supports it, falling back to text parsing otherwise.
- **Non-blocking storage**: Lessons are stored via `remember_many()`, which runs in a background thread -- the flow continues immediately.
- **Graceful degradation**: If the LLM fails during distillation, nothing is stored. If it fails during pre-review, the raw output is shown. Neither failure blocks the flow.
- **No scope/categories needed**: When storing lessons, only `source` is passed. The encoding pipeline infers scope, categories, and importance automatically.

<Note>
`learn=True` requires the Flow to have memory available. Flows get memory automatically by default, but if you've disabled it with `_skip_auto_memory`, HITL learning will be silently skipped.
</Note>
## Related Documentation

- [Flows Overview](/en/concepts/flows) - Learn about CrewAI Flows
- [Flow Persistence](/en/concepts/flows#persistence) - Persisting flow state
- [Routing with @router](/en/concepts/flows#router) - More about conditional routing
- [Human Input on Execution](/en/learn/human-input-on-execution) - Task-level human input
- [Memory](/en/concepts/memory) - The unified memory system used by HITL learning