From 27f33b201d976fb16c5ea0d00b9da6f89702cc0b Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Fri, 22 Aug 2025 06:06:09 +0000
Subject: [PATCH] feat: add Confident AI observability integration documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add comprehensive Confident AI integration guide
- Include setup instructions, code examples, and best practices
- Update observability overview to include Confident AI card
- Follow existing documentation patterns and structure

Resolves #3383

Co-Authored-By: João
---
 docs/en/observability/confident-ai.mdx | 137 +++++++++++++++++++++++++
 docs/en/observability/overview.mdx     |   4 +
 2 files changed, 141 insertions(+)
 create mode 100644 docs/en/observability/confident-ai.mdx

diff --git a/docs/en/observability/confident-ai.mdx b/docs/en/observability/confident-ai.mdx
new file mode 100644
index 000000000..4ba2412a1
--- /dev/null
+++ b/docs/en/observability/confident-ai.mdx
@@ -0,0 +1,137 @@
+---
+title: Confident AI Integration
+description: Monitor and evaluate your CrewAI agents with Confident AI's comprehensive evaluation platform powered by DeepEval.
+icon: shield-check
+---
+
+# Confident AI Overview
+
+[Confident AI](https://confident-ai.com) is a comprehensive evaluation platform for LLM applications, powered by [DeepEval](https://github.com/confident-ai/deepeval). It provides advanced monitoring, evaluation, and optimization capabilities specifically designed for AI agent workflows.
+
+Confident AI offers both tracing capabilities to monitor your agents in real time and evaluation tools to assess the quality, safety, and performance of your CrewAI applications.
+
+## Features
+
+- **Real-time Monitoring**: Track agent interactions, task execution, and performance metrics
+- **Comprehensive Evaluation**: Assess output quality, relevance, safety, and consistency
+- **Cost Tracking**: Monitor LLM API usage and associated costs across your crews
+- **Safety & Compliance**: Detect potential issues like bias, toxicity, and PII leaks
+- **Performance Analytics**: Analyze execution times, success rates, and bottlenecks
+- **Custom Metrics**: Define and track domain-specific evaluation criteria
+- **Team Collaboration**: Share insights and collaborate on agent optimization
+
+## Setup Instructions
+
+<Steps>
+  <Step title="Install Dependencies">
+    ```shell
+    pip install deepeval crewai
+    ```
+  </Step>
+  <Step title="Get Your API Key">
+    1. Sign up at [Confident AI](https://confident-ai.com)
+    2. Navigate to your project settings
+    3. Copy your API key
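+
+    With the key copied, keep it out of source code. A minimal sketch of exporting it in your shell, assuming recent DeepEval releases read the `CONFIDENT_API_KEY` environment variable (an assumption — verify the variable name against the DeepEval docs for your version):
+
+    ```shell
+    # Export the key so DeepEval can pick it up at runtime
+    # (variable name assumed; verify for your DeepEval version)
+    export CONFIDENT_API_KEY="your-confident-api-key"
+    ```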
+  </Step>
+  <Step title="Instrument CrewAI">
+    Instrument CrewAI with your Confident AI API key using `instrument_crewai`:
+
+    ```python
+    from crewai import Task, Crew, Agent
+    from deepeval.integrations.crewai import instrument_crewai
+
+    # Call once, before building agents, so every crew run is traced.
+    # You can omit api_key if CONFIDENT_API_KEY is set in the environment.
+    instrument_crewai(api_key="your-confident-api-key")
+
+    agent = Agent(
+        role="Consultant",
+        goal="Write clear, concise explanations.",
+        backstory="An expert consultant with a keen eye for software trends.",
+    )
+
+    task = Task(
+        description="Explain the importance of {topic}",
+        expected_output="A clear and concise explanation of the topic.",
+        agent=agent,
+    )
+
+    crew = Crew(agents=[agent], tasks=[task])
+
+    result = crew.kickoff(inputs={"topic": "AI"})
+    ```
+  </Step>
+  <Step title="Evaluate Crew Outputs">
+    For comprehensive evaluation of your crew's outputs:
+
+    ```python
+    from deepeval import evaluate
+    from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
+    from deepeval.test_case import LLMTestCase
+
+    # Define evaluation metrics
+    relevancy_metric = AnswerRelevancyMetric(threshold=0.7)
+    faithfulness_metric = FaithfulnessMetric(threshold=0.8)
+
+    # Execute the crew
+    result = crew.kickoff(inputs={"topic": "artificial intelligence"})
+
+    # Create a test case for evaluation. FaithfulnessMetric checks the
+    # output against retrieval_context, so that field is required.
+    test_case = LLMTestCase(
+        input="Explain the importance of artificial intelligence",
+        actual_output=str(result),
+        expected_output="A comprehensive explanation of AI's significance",
+        retrieval_context=["Source material the crew's answer should stay faithful to"],
+    )
+
+    # Evaluate the output
+    evaluate([test_case], [relevancy_metric, faithfulness_metric])
+    ```
+  </Step>
+  <Step title="View Results">
+    After running your CrewAI application with the Confident AI integration:
+
+    1. Visit your [Confident AI dashboard](https://confident-ai.com/dashboard)
+    2. Navigate to your project to view traces and evaluations
+    3. Analyze agent performance, costs, and quality metrics
+    4. Set up alerts for performance thresholds or quality issues
+  </Step>
+</Steps>
+
+## Key Metrics Tracked
+
+### Performance Metrics
+- **Execution Time**: Duration of individual tasks and overall crew execution
+- **Token Usage**: Input/output tokens consumed by each agent
+- **API Latency**: Response times from LLM providers
+- **Success Rate**: Percentage of successfully completed tasks
+
+### Quality Metrics
+- **Answer Relevancy**: How well outputs address the given tasks
+- **Faithfulness**: Accuracy and consistency of agent responses
+- **Coherence**: Logical flow and structure of outputs
+- **Safety**: Detection of harmful or inappropriate content
+
+### Cost Metrics
+- **API Costs**: Real-time tracking of LLM usage costs
+- **Cost per Task**: Economic efficiency analysis
+- **Budget Monitoring**: Alerts for spending thresholds
+
+## Best Practices
+
+### Development Phase
+- Start with basic tracing to understand agent behavior
+- Implement evaluation metrics early in development
+- Use custom metrics for domain-specific requirements (see the sketch after this list)
+- Monitor resource usage during testing
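+
+To illustrate that last point, here is a minimal sketch of a domain-specific metric built with DeepEval's `GEval`; the metric name, criteria, and threshold below are placeholder assumptions to adapt to your own requirements:
+
+```python
+from deepeval import evaluate
+from deepeval.metrics import GEval
+from deepeval.test_case import LLMTestCase, LLMTestCaseParams
+
+# A custom criteria-based metric; the name, criteria, and threshold
+# are illustrative placeholders rather than prescribed values
+tone_metric = GEval(
+    name="Consulting Tone",
+    criteria="Determine whether the output is professional, actionable, and free of unsupported claims.",
+    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
+    threshold=0.6,
+)
+
+test_case = LLMTestCase(
+    input="Explain the importance of AI",
+    actual_output=str(result),  # output from crew.kickoff(...)
+)
+
+evaluate([test_case], [tone_metric])
+```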
+
+### Production Phase
+- Set up comprehensive monitoring and alerting
+- Track performance trends over time
+- Implement automated quality checks
+- Maintain cost visibility and control
+
+### Continuous Improvement
+- Review performance regularly using Confident AI analytics
+- A/B test different agent configurations
+- Build feedback loops for quality improvement
+- Document optimization insights
+
+For more detailed information and advanced configurations, visit the [Confident AI documentation](https://confident-ai.com/docs) and [DeepEval documentation](https://docs.deepeval.com/).

diff --git a/docs/en/observability/overview.mdx b/docs/en/observability/overview.mdx
index e99858c9e..ca4e48a8c 100644
--- a/docs/en/observability/overview.mdx
+++ b/docs/en/observability/overview.mdx
@@ -57,6 +57,10 @@ Observability is crucial for understanding how your CrewAI agents perform, ident
     Weights & Biases platform for tracking and evaluating AI applications.
   </Card>
+  <Card title="Confident AI" icon="shield-check" href="/en/observability/confident-ai">
+    Comprehensive evaluation platform powered by DeepEval for monitoring and optimizing agent performance.
+  </Card>
 </CardGroup>

 ### Evaluation & Quality Assurance