Files
crewAI/docs/edge/en/enterprise/features/hallucination-guardrail.mdx
Lucas Gomide a237ebabba feat: adopt directory-based docs versioning with Edge channel (#6202)
* feat: adopt directory-based docs versioning with Edge channel

Switch docs.crewai.com from navigation-only versioning (every version
selector entry rendered the same docs/<lang>/* source files) to
Mintlify's directory-based versioning so each version selector entry
renders its own snapshot. Add an "Edge" channel under docs/edge/<lang>/*
that always reflects main HEAD for unreleased work, eliminating
pre-release leakage onto frozen release labels. External links to
canonical /<lang>/* URLs are preserved via wildcard redirects that
always land on the current default version.

Layout:
- docs/edge/<lang>/*         rolling source (you edit here)
- docs/edge/enterprise-api.*.yaml
- docs/v<X.Y.Z>/<lang>/*     frozen, immutable snapshots
- docs/v<X.Y.Z>/enterprise-api.*.yaml
- docs/images/               shared, append-only
- docs/docs.json             nav + redirects

URLs follow the Mintlify-idiomatic shape: /edge/<lang>/<page> for
Edge, /v<X.Y.Z>/<lang>/<page> for every frozen snapshot. The wildcard
redirects /<lang>/:slug* -> /<default>/<lang>/:slug* keep stale links
working, and every freeze rewrites them (plus all per-section/per-page
redirects) so destinations always resolve to the current default
without depending on a second redirect hop.

Release flow integration (devtools release):
- New module crewai_devtools.docs_versioning.freeze() materialises
  docs/v<X.Y.Z>/ from docs/edge/, rewrites openapi: refs inside the
  snapshot, inserts the version into every language block in
  docs.json, and refreshes all redirect destinations.
- _update_docs_and_create_pr() in cli.py now calls that freeze during
  Phase 2 of devtools release. Edge changelogs are updated first (so
  the snapshot freeze picks them up), then the snapshot is staged
  alongside docs.json, branched as docs/freeze-v<X.Y.Z>, and the PR
  is titled [docs-freeze] docs: snapshot and changelog for v<X.Y.Z>
  — the title prefix the new CI guard reads.
- The PR still gates tag, GitHub release, PyPI publish, and the
  enterprise release as before; no new PRs are added.
- Pre-releases (1.X.YaN, 1.X.YbN, ...) skip the snapshot — they ride
  Edge — and the docs PR title omits the [docs-freeze] prefix.
- docs_check (AI-generated docs scaffolding) writes to
  docs/edge/<lang>/* so newly-generated unreleased docs land in Edge
  and never accidentally touch a frozen snapshot.

Migration scripts (one-shot):
- scripts/docs/freeze_historical_versions.py reconstructs all 16
  historical snapshots (v1.10.0 .. v1.14.7) from git tags via
  git archive | tar, rewriting openapi: MDX refs so each snapshot
  reads its own enterprise-api YAML rather than the live one.
- scripts/docs/prefix_version_paths.py one-shot-migrates docs.json:
  rewrites every page path in 16 versioned blocks to point under
  docs/v<X.Y.Z>/, inserts a new Edge entry per language, tags
  v1.14.7 as Latest (default), prunes pages whose target file
  doesn't exist in the snapshot (e.g. docs/ar/ didn't exist before
  v1.12.0), and writes the wildcard + per-section redirects.
- scripts/docs/freeze_current_edge.py is now a thin CLI wrapper
  around docs_versioning.freeze for manual one-off freezes (e.g.
  retroactively snapshotting a forgotten release).

CI guards (.github/workflows/docs-snapshots.yml):
- Frozen snapshots under docs/v[0-9]*/ are immutable; only PRs whose
  title contains [docs-freeze] (i.e. release-cut PRs generated by
  devtools release or the manual wrapper) may modify them.
- Images under docs/images/ are append-only since snapshots share a
  single image directory. Deleting or renaming an image breaks every
  historical snapshot that still references it.

Restored docs/images/crewai-otel-export.png from PR #3673; it was
deleted in PR #4908 but v1.10.0 / v1.10.1 snapshots still reference
it. Restoring instead of editing the snapshots preserves historical
rendering fidelity and validates the new append-only rule
retroactively.

Tests:
- lib/devtools/tests/test_docs_versioning.py covers the freeze: file
  copy, openapi rewrite, version insertion, default demotion, redirect
  upserts, per-section redirect rewriting, idempotency, and invalid
  inputs.

Verified locally with mintlify broken-links: 0 broken links across
the full site (Edge + 16 frozen versions, 4 locales).

AGENTS.md (repo root) is the contributor guide for the new model;
RELEASING.md is the release-cut runbook; README's Contribution
section links to both.

Co-authored-by: Cursor <cursoragent@cursor.com>

* style: resolve linter issues

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-17 11:56:59 -04:00

252 lines
7.9 KiB
Plaintext

---
title: Hallucination Guardrail
description: "Prevent and detect AI hallucinations in your CrewAI tasks"
icon: "shield-check"
mode: "wide"
---
## Overview
The Hallucination Guardrail is an enterprise feature that validates AI-generated content to ensure it's grounded in facts and doesn't contain hallucinations. It analyzes task outputs against reference context and provides detailed feedback when potentially hallucinated content is detected.
## What are Hallucinations?
AI hallucinations occur when language models generate content that appears plausible but is factually incorrect or not supported by the provided context. The Hallucination Guardrail helps prevent these issues by:
- Comparing outputs against reference context
- Evaluating faithfulness to source material
- Providing detailed feedback on problematic content
- Supporting custom thresholds for validation strictness
## Basic Usage
### Setting Up the Guardrail
```python
from crewai.tasks.hallucination_guardrail import HallucinationGuardrail
from crewai import LLM
# Basic usage - will use task's expected_output as context
guardrail = HallucinationGuardrail(
llm=LLM(model="gpt-4o-mini")
)
# With explicit reference context
context_guardrail = HallucinationGuardrail(
context="AI helps with various tasks including analysis and generation.",
llm=LLM(model="gpt-4o-mini")
)
```
### Adding to Tasks
```python
from crewai import Task
# Create your task with the guardrail
task = Task(
description="Write a summary about AI capabilities",
expected_output="A factual summary based on the provided context",
agent=my_agent,
guardrail=guardrail # Add the guardrail to validate output
)
```
## Advanced Configuration
### Custom Threshold Validation
For stricter validation, you can set a custom faithfulness threshold (0-10 scale):
```python
# Strict guardrail requiring high faithfulness score
strict_guardrail = HallucinationGuardrail(
context="Quantum computing uses qubits that exist in superposition states.",
llm=LLM(model="gpt-4o-mini"),
threshold=8.0 # Requires score >= 8 to pass validation
)
```
### Including Tool Response Context
When your task uses tools, you can include tool responses for more accurate validation:
```python
# Guardrail with tool response context
weather_guardrail = HallucinationGuardrail(
context="Current weather information for the requested location",
llm=LLM(model="gpt-4o-mini"),
tool_response="Weather API returned: Temperature 22°C, Humidity 65%, Clear skies"
)
```
## How It Works
### Validation Process
1. **Context Analysis**: The guardrail compares task output against the provided reference context
2. **Faithfulness Scoring**: Uses an internal evaluator to assign a faithfulness score (0-10)
3. **Verdict Determination**: Determines if content is faithful or contains hallucinations
4. **Threshold Checking**: If a custom threshold is set, validates against that score
5. **Feedback Generation**: Provides detailed reasons when validation fails
### Validation Logic
- **Default Mode**: Uses verdict-based validation (FAITHFUL vs HALLUCINATED)
- **Threshold Mode**: Requires faithfulness score to meet or exceed the specified threshold
- **Error Handling**: Gracefully handles evaluation errors and provides informative feedback
## Guardrail Results
The guardrail returns structured results indicating validation status:
```python
# Example of guardrail result structure
{
"valid": False,
"feedback": "Content appears to be hallucinated (score: 4.2/10, verdict: HALLUCINATED). The output contains information not supported by the provided context."
}
```
### Result Properties
- **valid**: Boolean indicating whether the output passed validation
- **feedback**: Detailed explanation when validation fails, including:
- Faithfulness score
- Verdict classification
- Specific reasons for failure
## Integration with Task System
### Automatic Validation
When a guardrail is added to a task, it automatically validates the output before the task is marked as complete:
```python
# Task output validation flow
task_output = agent.execute_task(task)
validation_result = guardrail(task_output)
if validation_result.valid:
# Task completes successfully
return task_output
else:
# Task fails with validation feedback
raise ValidationError(validation_result.feedback)
```
### Event Tracking
The guardrail integrates with CrewAI's event system to provide observability:
- **Validation Started**: When guardrail evaluation begins
- **Validation Completed**: When evaluation finishes with results
- **Validation Failed**: When technical errors occur during evaluation
## Best Practices
### Context Guidelines
<Steps>
<Step title="Provide Comprehensive Context">
Include all relevant factual information that the AI should base its output on:
```python
context = """
Company XYZ was founded in 2020 and specializes in renewable energy solutions.
They have 150 employees and generated $50M revenue in 2023.
Their main products include solar panels and wind turbines.
"""
```
</Step>
<Step title="Keep Context Relevant">
Only include information directly related to the task to avoid confusion:
```python
# Good: Focused context
context = "The current weather in New York is 18°C with light rain."
# Avoid: Unrelated information
context = "The weather is 18°C. The city has 8 million people. Traffic is heavy."
```
</Step>
<Step title="Update Context Regularly">
Ensure your reference context reflects current, accurate information.
</Step>
</Steps>
### Threshold Selection
<Steps>
<Step title="Start with Default Validation">
Begin without custom thresholds to understand baseline performance.
</Step>
<Step title="Adjust Based on Requirements">
- **High-stakes content**: Use threshold 8-10 for maximum accuracy
- **General content**: Use threshold 6-7 for balanced validation
- **Creative content**: Use threshold 4-5 or default verdict-based validation
</Step>
<Step title="Monitor and Iterate">
Track validation results and adjust thresholds based on false positives/negatives.
</Step>
</Steps>
## Performance Considerations
### Impact on Execution Time
- **Validation Overhead**: Each guardrail adds ~1-3 seconds per task
- **LLM Efficiency**: Choose efficient models for evaluation (e.g., gpt-4o-mini)
### Cost Optimization
- **Model Selection**: Use smaller, efficient models for guardrail evaluation
- **Context Size**: Keep reference context concise but comprehensive
- **Caching**: Consider caching validation results for repeated content
## Troubleshooting
<Accordion title="Validation Always Fails">
**Possible Causes:**
- Context is too restrictive or unrelated to task output
- Threshold is set too high for the content type
- Reference context contains outdated information
**Solutions:**
- Review and update context to match task requirements
- Lower threshold or use default verdict-based validation
- Ensure context is current and accurate
</Accordion>
<Accordion title="False Positives (Valid Content Marked Invalid)">
**Possible Causes:**
- Threshold too high for creative or interpretive tasks
- Context doesn't cover all valid aspects of the output
- Evaluation model being overly conservative
**Solutions:**
- Lower threshold or use default validation
- Expand context to include broader acceptable content
- Test with different evaluation models
</Accordion>
<Accordion title="Evaluation Errors">
**Possible Causes:**
- Network connectivity issues
- LLM model unavailable or rate limited
- Malformed task output or context
**Solutions:**
- Check network connectivity and LLM service status
- Implement retry logic for transient failures
- Validate task output format before guardrail evaluation
</Accordion>
<Card title="Need Help?" icon="headset" href="mailto:support@crewai.com">
Contact our support team for assistance with hallucination guardrail configuration or troubleshooting.
</Card>