From 27f33b201d976fb16c5ea0d00b9da6f89702cc0b Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Fri, 22 Aug 2025 06:06:09 +0000
Subject: [PATCH] feat: add Confident AI observability integration documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add comprehensive Confident AI integration guide
- Include setup instructions, code examples, and best practices
- Update observability overview to include Confident AI card
- Follow existing documentation patterns and structure

Resolves #3383

Co-Authored-By: João
---
 docs/en/observability/confident-ai.mdx | 137 +++++++++++++++++++++++++
 docs/en/observability/overview.mdx     |   4 +
 2 files changed, 141 insertions(+)
 create mode 100644 docs/en/observability/confident-ai.mdx

diff --git a/docs/en/observability/confident-ai.mdx b/docs/en/observability/confident-ai.mdx
new file mode 100644
index 000000000..4ba2412a1
--- /dev/null
+++ b/docs/en/observability/confident-ai.mdx
@@ -0,0 +1,137 @@
+---
+title: Confident AI Integration
+description: Monitor and evaluate your CrewAI agents with Confident AI's comprehensive evaluation platform powered by DeepEval.
+icon: shield-check
+---
+
+# Confident AI Overview
+
+[Confident AI](https://confident-ai.com) is a comprehensive evaluation platform for LLM applications, powered by [DeepEval](https://github.com/confident-ai/deepeval). It provides advanced monitoring, evaluation, and optimization capabilities specifically designed for AI agent workflows.
+
+Confident AI offers both tracing capabilities to monitor your agents in real time and evaluation tools to assess the quality, safety, and performance of your CrewAI applications.
+
+## Features
+
+- **Real-time Monitoring**: Track agent interactions, task execution, and performance metrics
+- **Comprehensive Evaluation**: Assess output quality, relevance, safety, and consistency
+- **Cost Tracking**: Monitor LLM API usage and associated costs across your crews
+- **Safety & Compliance**: Detect potential issues like bias, toxicity, and PII leaks
+- **Performance Analytics**: Analyze execution times, success rates, and bottlenecks
+- **Custom Metrics**: Define and track domain-specific evaluation criteria
+- **Team Collaboration**: Share insights and collaborate on agent optimization
+
+## Setup Instructions
+
+<Steps>
+  <Step title="Install Dependencies">
+    ```shell
+    pip install deepeval crewai
+    ```
+  </Step>
+  <Step title="Get Your API Key">
+    1. Sign up at [Confident AI](https://confident-ai.com)
+    2. Navigate to your project settings
+    3. Copy your API key
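+
+    With the key copied, keep it out of source code. A minimal sketch of exporting it in your shell, assuming recent DeepEval releases read the `CONFIDENT_API_KEY` environment variable (an assumption — verify the variable name against the DeepEval docs for your version):
+
+    ```shell
+    # Export the key so DeepEval can pick it up at runtime
+    # (variable name assumed; verify for your DeepEval version)
+    export CONFIDENT_API_KEY="your-confident-api-key"
+    ```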
+  </Step>
+  <Step title="Instrument CrewAI">
+    Instrument CrewAI with your Confident AI API key using `instrument_crewai`:
+
+    ```python
+    from crewai import Task, Crew, Agent
+    from deepeval.integrations.crewai import instrument_crewai
+
+    # Call once, before building agents, so every crew run is traced.
+    # You can omit api_key if CONFIDENT_API_KEY is set in the environment.
+    instrument_crewai(api_key="your-confident-api-key")
+
+    agent = Agent(
+        role="Consultant",
+        goal="Write clear, concise explanations.",
+        backstory="An expert consultant with a keen eye for software trends.",
+    )
+
+    task = Task(
+        description="Explain the importance of {topic}",
+        expected_output="A clear and concise explanation of the topic.",
+        agent=agent,
+    )
+
+    crew = Crew(agents=[agent], tasks=[task])
+
+    result = crew.kickoff(inputs={"topic": "AI"})
+    ```
+  </Step>
+  <Step title="Evaluate Crew Outputs">
+    For comprehensive evaluation of your crew's outputs:
+
+    ```python
+    from deepeval import evaluate
+    from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
+    from deepeval.test_case import LLMTestCase
+
+    # Define evaluation metrics
+    relevancy_metric = AnswerRelevancyMetric(threshold=0.7)
+    faithfulness_metric = FaithfulnessMetric(threshold=0.8)
+
+    # Execute the crew
+    result = crew.kickoff(inputs={"topic": "artificial intelligence"})
+
+    # Create a test case for evaluation. FaithfulnessMetric checks the
+    # output against retrieval_context, so that field is required.
+    test_case = LLMTestCase(
+        input="Explain the importance of artificial intelligence",
+        actual_output=str(result),
+        expected_output="A comprehensive explanation of AI's significance",
+        retrieval_context=["Source material the crew's answer should stay faithful to"],
+    )
+
+    # Evaluate the output
+    evaluate([test_case], [relevancy_metric, faithfulness_metric])
+    ```
+  </Step>
+  <Step title="View Results">
+    After running your CrewAI application with the Confident AI integration:
+
+    1. Visit your [Confident AI dashboard](https://confident-ai.com/dashboard)
+    2. Navigate to your project to view traces and evaluations
+    3. Analyze agent performance, costs, and quality metrics
+    4. Set up alerts for performance thresholds or quality issues
+  </Step>
+</Steps>
+
+## Key Metrics Tracked
+
+### Performance Metrics
+- **Execution Time**: Duration of individual tasks and overall crew execution
+- **Token Usage**: Input/output tokens consumed by each agent
+- **API Latency**: Response times from LLM providers
+- **Success Rate**: Percentage of successfully completed tasks
+
+### Quality Metrics
+- **Answer Relevancy**: How well outputs address the given tasks
+- **Faithfulness**: Accuracy and consistency of agent responses
+- **Coherence**: Logical flow and structure of outputs
+- **Safety**: Detection of harmful or inappropriate content
+
+### Cost Metrics
+- **API Costs**: Real-time tracking of LLM usage costs
+- **Cost per Task**: Economic efficiency analysis
+- **Budget Monitoring**: Alerts for spending thresholds
+
+## Best Practices
+
+### Development Phase
+- Start with basic tracing to understand agent behavior
+- Implement evaluation metrics early in development
+- Use custom metrics for domain-specific requirements (see the sketch after this list)
+- Monitor resource usage during testing
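+
+To illustrate that last point, here is a minimal sketch of a domain-specific metric built with DeepEval's `GEval`; the metric name, criteria, and threshold below are placeholder assumptions to adapt to your own requirements:
+
+```python
+from deepeval import evaluate
+from deepeval.metrics import GEval
+from deepeval.test_case import LLMTestCase, LLMTestCaseParams
+
+# A custom criteria-based metric; the name, criteria, and threshold
+# are illustrative placeholders rather than prescribed values
+tone_metric = GEval(
+    name="Consulting Tone",
+    criteria="Determine whether the output is professional, actionable, and free of unsupported claims.",
+    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
+    threshold=0.6,
+)
+
+test_case = LLMTestCase(
+    input="Explain the importance of AI",
+    actual_output=str(result),  # output from crew.kickoff(...)
+)
+
+evaluate([test_case], [tone_metric])
+```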
+
+### Production Phase
+- Set up comprehensive monitoring and alerting
+- Track performance trends over time
+- Implement automated quality checks
+- Maintain cost visibility and control
+
+### Continuous Improvement
+- Review performance regularly using Confident AI analytics
+- A/B test different agent configurations
+- Build feedback loops for quality improvement
+- Document optimization insights
+
+For more detailed information and advanced configurations, visit the [Confident AI documentation](https://confident-ai.com/docs) and [DeepEval documentation](https://docs.deepeval.com/).

diff --git a/docs/en/observability/overview.mdx b/docs/en/observability/overview.mdx
index e99858c9e..ca4e48a8c 100644
--- a/docs/en/observability/overview.mdx
+++ b/docs/en/observability/overview.mdx
@@ -57,6 +57,10 @@ Observability is crucial for understanding how your CrewAI agents perform, ident
     Weights & Biases platform for tracking and evaluating AI applications.
   </Card>
+  <Card title="Confident AI" icon="shield-check" href="/en/observability/confident-ai">
+    Comprehensive evaluation platform powered by DeepEval for monitoring and optimizing agent performance.
+  </Card>
 </CardGroup>

 ### Evaluation & Quality Assurance