Compare commits

...

1 Commit

Author: Devin AI
SHA1: 27f33b201d
Date: 2025-08-22 06:06:09 +00:00
Message:

feat: add Confident AI observability integration documentation

- Add comprehensive Confident AI integration guide
- Include setup instructions, code examples, and best practices
- Update observability overview to include Confident AI card
- Follow existing documentation patterns and structure

Resolves #3383

Co-Authored-By: João <joao@crewai.com>
2 changed files with 141 additions and 0 deletions

View File

@@ -0,0 +1,137 @@
---
title: Confident AI Integration
description: Monitor and evaluate your CrewAI agents with Confident AI's comprehensive evaluation platform powered by DeepEval.
icon: shield-check
---
# Confident AI Overview
[Confident AI](https://confident-ai.com) is a comprehensive evaluation platform for LLM applications, powered by [DeepEval](https://github.com/confident-ai/deepeval). It provides advanced monitoring, evaluation, and optimization capabilities specifically designed for AI agent workflows.
Confident AI offers both tracing capabilities to monitor your agents in real-time and evaluation tools to assess the quality, safety, and performance of your CrewAI applications.
## Features
- **Real-time Monitoring**: Track agent interactions, task execution, and performance metrics
- **Comprehensive Evaluation**: Assess output quality, relevance, safety, and consistency
- **Cost Tracking**: Monitor LLM API usage and associated costs across your crews
- **Safety & Compliance**: Detect potential issues like bias, toxicity, and PII leaks
- **Performance Analytics**: Analyze execution times, success rates, and bottlenecks
- **Custom Metrics**: Define and track domain-specific evaluation criteria
- **Team Collaboration**: Share insights and collaborate on agent optimization
## Setup Instructions
<Steps>
<Step title="Install Dependencies">
```shell
pip install deepeval crewai
```
</Step>
<Step title="Get API Key">
1. Sign up at [Confident AI](https://confident-ai.com)
2. Navigate to your project settings
3. Copy your API key
</Step>
<Step title="Configure CrewAI">
Set your Confident AI API key and instrument CrewAI using `instrument_crewai`, then build and run your crew as usual:
```python
import os

from crewai import Task, Crew, Agent
from deepeval.integrations.crewai import instrument_crewai

# DeepEval reads the Confident AI API key from the CONFIDENT_API_KEY environment
# variable; replace the placeholder with the key from your project settings.
os.environ["CONFIDENT_API_KEY"] = "<your-confident-api-key>"

# Patch CrewAI so agent and task executions are traced to Confident AI.
instrument_crewai()

agent = Agent(
    role="Consultant",
    goal="Write a clear, concise explanation.",
    backstory="An expert consultant with a keen eye for software trends.",
)

task = Task(
    description="Explain the importance of {topic}",
    expected_output="A clear and concise explanation of the topic.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff(inputs={"topic": "AI"})
```
</Step>
<Step title="Add Evaluation (Optional)">
For comprehensive evaluation of your crew's outputs:
```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Define evaluation metrics
relevancy_metric = AnswerRelevancyMetric(threshold=0.7)
faithfulness_metric = FaithfulnessMetric(threshold=0.8)

# Execute the crew
result = crew.kickoff(inputs={"topic": "artificial intelligence"})

# Create a test case for evaluation
test_case = LLMTestCase(
    input="Explain the importance of artificial intelligence",
    actual_output=str(result),
    expected_output="A comprehensive explanation of AI's significance",
    # FaithfulnessMetric grades the output against a retrieval context, so supply
    # the source material the answer should stay true to.
    retrieval_context=["<reference material the explanation should be grounded in>"],
)

# Evaluate the output
evaluate([test_case], [relevancy_metric, faithfulness_metric])
```
</Step>
<Step title="View Results">
After running your CrewAI application with Confident AI integration:
1. Visit your [Confident AI dashboard](https://confident-ai.com/dashboard)
2. Navigate to your project to view traces and evaluations
3. Analyze agent performance, costs, and quality metrics
4. Set up alerts for performance thresholds or quality issues
</Step>
</Steps>
## Key Metrics Tracked
### Performance Metrics
- **Execution Time**: Duration of individual tasks and overall crew execution
- **Token Usage**: Input/output tokens consumed by each agent
- **API Latency**: Response times from LLM providers
- **Success Rate**: Percentage of successfully completed tasks
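Several of these numbers can also be inspected locally; a minimal sketch, assuming a recent CrewAI version where the `CrewOutput` returned by `kickoff` exposes a `token_usage` attribute (the `crew` object is the one built in the setup example):
```python
import time

start = time.perf_counter()
result = crew.kickoff(inputs={"topic": "AI"})  # `crew` from the setup example above
elapsed = time.perf_counter() - start

# Assumption: CrewOutput aggregates usage metrics under `token_usage`.
usage = result.token_usage
print(f"Execution time: {elapsed:.2f}s")
print(f"Prompt tokens: {usage.prompt_tokens}, completion tokens: {usage.completion_tokens}")
```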
### Quality Metrics
- **Answer Relevancy**: How well outputs address the given tasks
- **Faithfulness**: Accuracy and consistency of agent responses
- **Coherence**: Logical flow and structure of outputs
- **Safety**: Detection of harmful or inappropriate content
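Coherence is a good example of a criterion that can be expressed as a custom DeepEval `GEval` metric and tracked alongside the built-in ones; a minimal sketch (the criteria wording and threshold are illustrative):
```python
from deepeval import evaluate
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Custom LLM-as-a-judge metric; the criteria text and threshold are illustrative.
coherence_metric = GEval(
    name="Coherence",
    criteria="Assess whether the actual output is logically structured and easy to follow.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
    threshold=0.7,
)

test_case = LLMTestCase(
    input="Explain the importance of AI",
    actual_output=str(result),  # output from the crew run in the setup example
)
evaluate([test_case], [coherence_metric])
```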
### Cost Metrics
- **API Costs**: Real-time tracking of LLM usage costs
- **Cost per Task**: Economic efficiency analysis
- **Budget Monitoring**: Alerts for spending thresholds
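A rough cost-per-task estimate can also be derived locally from token counts; a back-of-the-envelope sketch with hypothetical per-million-token prices (substitute your provider's actual rates, and treat the platform's cost tracking as the authoritative number):
```python
# Hypothetical prices in USD per one million tokens -- use your provider's real rates.
PROMPT_PRICE_PER_M = 2.50
COMPLETION_PRICE_PER_M = 10.00

usage = result.token_usage  # as in the performance sketch above
estimated_cost = (
    usage.prompt_tokens * PROMPT_PRICE_PER_M
    + usage.completion_tokens * COMPLETION_PRICE_PER_M
) / 1_000_000
print(f"Estimated LLM cost for this crew run: ${estimated_cost:.4f}")
```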
## Best Practices
### Development Phase
- Start with basic tracing to understand agent behavior
- Implement evaluation metrics early in development
- Use custom metrics for domain-specific requirements
- Monitor resource usage during testing
### Production Phase
- Set up comprehensive monitoring and alerting
- Track performance trends over time
- Implement automated quality checks
- Maintain cost visibility and control
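One way to automate quality checks is to fold DeepEval assertions into an existing pytest suite so that a quality regression fails the build; a minimal sketch (the threshold is illustrative, and `crew` is the instrumented crew from the setup example, imported or built in a fixture):
```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_crew_output_relevancy():
    result = crew.kickoff(inputs={"topic": "AI"})  # `crew` from the setup example
    test_case = LLMTestCase(
        input="Explain the importance of AI",
        actual_output=str(result),
    )
    # Fails the test run if answer relevancy drops below the threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```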
### Continuous Improvement
- Regular performance reviews using Confident AI analytics
- A/B testing of different agent configurations
- Feedback loops for quality improvement
- Documentation of optimization insights
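As a concrete illustration of the A/B idea, the same task can be run under two agent configurations and scored with the same metric; a minimal sketch (both backstories are illustrative):
```python
from crewai import Agent, Crew, Task
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

metric = AnswerRelevancyMetric(threshold=0.7)
scores = {}

# Two illustrative variants of the same consultant agent.
variants = {
    "A": "An expert consultant with a keen eye for software trends.",
    "B": "A concise analyst who explains technical topics for executives.",
}

for label, backstory in variants.items():
    agent = Agent(
        role="Consultant",
        goal="Write a clear, concise explanation.",
        backstory=backstory,
    )
    task = Task(
        description="Explain the importance of {topic}",
        expected_output="A clear and concise explanation of the topic.",
        agent=agent,
    )
    result = Crew(agents=[agent], tasks=[task]).kickoff(inputs={"topic": "AI"})
    test_case = LLMTestCase(
        input="Explain the importance of AI",
        actual_output=str(result),
    )
    metric.measure(test_case)
    scores[label] = metric.score

print(scores)  # compare both configurations on the same relevancy metric
```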
For more detailed information and advanced configurations, visit the [Confident AI documentation](https://confident-ai.com/docs) and [DeepEval documentation](https://docs.deepeval.com/).

View File

@@ -57,6 +57,10 @@ Observability is crucial for understanding how your CrewAI agents perform, ident
<Card title="Weave" icon="network-wired" href="/en/observability/weave">
Weights & Biases platform for tracking and evaluating AI applications.
</Card>
<Card title="Confident AI" icon="shield-check" href="/en/observability/confident-ai">
Comprehensive evaluation platform powered by DeepEval for monitoring and optimizing agent performance.
</Card>
</CardGroup>
### Evaluation & Quality Assurance