Introduce Evaluator Experiment (#3133)

* feat: add exchanged messages in LLMCallCompletedEvent
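
The exchanged messages ride along on the completion event so evaluation metrics can inspect the actual prompt/response exchange. A rough, self-contained sketch of such an event payload (class name, field names, and types are assumptions for illustration, not the actual crewai definition):

    # Illustrative only -- not the real crewai event class.
    from typing import Any, Optional

    from pydantic import BaseModel

    class LLMCallCompletedEventSketch(BaseModel):
        response: str
        # Full prompt/response exchange, so evaluators can see what the LLM saw.
        messages: Optional[list[dict[str, Any]]] = None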

* feat: add GoalAlignment metric for Agent evaluation

* feat: add SemanticQuality metric for Agent evaluation

* feat: add Tool Metrics for Agent evaluation

* feat: add Reasoning Metrics for Agent evaluation, still in progress
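
The four metric commits above suggest a shared metric interface. A minimal sketch of what that shape might look like (class names and the evaluate signature are assumptions, not the shipped crewai API):

    # Assumed common shape for goal-alignment, semantic-quality, tool, and reasoning metrics.
    from abc import ABC, abstractmethod
    from dataclasses import dataclass

    @dataclass
    class MetricResult:
        score: float    # e.g. normalized 0.0 - 1.0
        feedback: str   # human-readable explanation reported to the user

    class EvaluationMetric(ABC):
        @abstractmethod
        def evaluate(self, agent, task, output: str, execution_trace: dict) -> MetricResult:
            ...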

* feat: add AgentEvaluator class

This class evaluates an Agent's results and reports them to the user.
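
A rough sketch of how such an evaluator could collect per-agent metric results and print a report (the class below is illustrative, not the actual crewai implementation):

    # Hypothetical evaluator -- structure and names are assumptions.
    from collections import defaultdict

    class AgentEvaluatorSketch:
        def __init__(self, metrics):
            self.metrics = metrics
            self.results = defaultdict(list)  # agent id -> list of metric results

        def evaluate(self, agent, task, output, execution_trace):
            for metric in self.metrics:
                self.results[agent.id].append(
                    metric.evaluate(agent, task, output, execution_trace)
                )

        def report(self):
            for agent_id, metric_results in self.results.items():
                for result in metric_results:
                    print(f"{agent_id}: {result.score:.2f} -- {result.feedback}")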

* fix: do not evaluate Agent by default

This is an experimental feature; we still need to refine it further.

* test: add Agent eval tests

* fix: render all feedback per iteration

* style: resolve linter issues

* style: fix mypy issues

* fix: allow messages to be empty on LLMCallCompletedEvent

* feat: add Experiment evaluation framework with baseline comparison
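
Baseline comparison presumably means scoring the current run against a stored baseline run per test case. A minimal, self-contained sketch under that assumption (function name and data shapes are illustrative):

    def compare_to_baseline(current: dict, baseline: dict) -> dict:
        """Label each test case as improved, regressed, unchanged, or new vs. the baseline."""
        verdicts = {}
        for case_id, score in current.items():
            if case_id not in baseline:
                verdicts[case_id] = "new"        # newly tracked test case
            elif score > baseline[case_id]:
                verdicts[case_id] = "improved"
            elif score < baseline[case_id]:
                verdicts[case_id] = "regressed"
            else:
                verdicts[case_id] = "unchanged"
        return verdicts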

* fix: reset evaluator for each experiment iteration

* fix: fix tracking of new test cases

* chore: split Experimental evaluation classes

* refactor: remove unused method

* refactor: isolate Console print in a dedicated class

* fix: make crew required to run an experiment

* fix: use time-aware timestamps to define experiment results
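
Presumably this refers to timezone-aware timestamps from the standard library, e.g.:

    from datetime import datetime, timezone

    # Aware timestamp; a naive datetime.now() would be ambiguous across timezones.
    recorded_at = datetime.now(timezone.utc)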

* test: add tests for Evaluator Experiment

* style: fix linter issues

* fix: encode string before hashing
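
hashlib hash objects only accept bytes, so the string has to be encoded first:

    import hashlib

    # sha256() rejects str input with a TypeError, so encode to bytes before hashing.
    case_hash = hashlib.sha256("some test case identifier".encode("utf-8")).hexdigest()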

* style: resolve linter issues

* feat: add experimental folder for beta features (#3141)

* test: move tests to experimental folder
Author: Lucas Gomide
Date: 2025-07-14 10:06:45 -03:00
Committed by: GitHub
Parent: 3ada4053bd
Commit: 1b6b2b36d9
27 changed files with 2512 additions and 16 deletions

@@ -0,0 +1,28 @@
import pytest
from unittest.mock import MagicMock
from crewai.agent import Agent
from crewai.task import Task

class BaseEvaluationMetricsTest:
    @pytest.fixture
    def mock_agent(self):
        agent = MagicMock(spec=Agent)
        agent.id = "test_agent_id"
        agent.role = "Test Agent"
        agent.goal = "Test goal"
        agent.tools = []
        return agent

    @pytest.fixture
    def mock_task(self):
        task = MagicMock(spec=Task)
        task.description = "Test task description"
        task.expected_output = "Test expected output"
        return task

    @pytest.fixture
    def execution_trace(self):
        return {
            "thinking": ["I need to analyze this data carefully"],
            "actions": ["Gathered information", "Analyzed data"]
        }
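
Tests in the suite pick up these fixtures by subclassing the base class. A hypothetical example of consuming them (the test class and assertions below are illustrative, not part of the diff):

    class TestSomeMetric(BaseEvaluationMetricsTest):
        def test_fixture_shapes(self, mock_agent, mock_task, execution_trace):
            assert mock_agent.role == "Test Agent"
            assert mock_task.expected_output == "Test expected output"
            assert "thinking" in execution_trace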