Introduce Evaluator Experiment (#3133)

* feat: add exchanged messages in LLMCallCompletedEvent

* feat: add GoalAlignment metric for Agent evaluation

* feat: add SemanticQuality metric for Agent evaluation

* feat: add Tool Metrics for Agent evaluation

* feat: add Reasoning Metrics for Agent evaluation, still in progress

* feat: add AgentEvaluator class

This class will evaluate the Agent's results and report them to the user

* fix: do not evaluate Agent by default

This is an experimental feature; we still need to refine it further

* test: add Agent eval tests

* fix: render all feedback per iteration

* style: resolve linter issues

* style: fix mypy issues

* fix: allow messages to be empty on LLMCallCompletedEvent

* feat: add Experiment evaluation framework with baseline comparison (see the sketch after this list)

* fix: reset evaluator for each experiment iteration

* fix: fix tracking of new test cases

* chore: split Experimental evaluation classes

* refactor: remove unused method

* refactor: isolate Console print in a dedicated class

* fix: make crew required to run an experiment

* fix: use time-aware timestamps to define the experiment result

* test: add tests for Evaluator Experiment

* style: fix linter issues

* fix: encode string before hashing

* style: resolve linter issues

* feat: add experimental folder for beta features (#3141)

* test: move tests to experimental folder
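
The baseline comparison, test-case hashing, and time-aware result handling mentioned in the commits above suggest the general shape sketched below. This is illustrative only: the names ExperimentResult, case_id, and compare_to_baseline are hypothetical placeholders, not the API introduced by this PR.

# Illustrative sketch only: names below are hypothetical, not crewAI's API.
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ExperimentResult:  # hypothetical container for one test case's outcome
    case_id: str
    score: float
    timestamp: datetime


def case_id(test_case: str) -> str:
    # "encode string before hashing": hash the UTF-8 bytes of the test case text
    return hashlib.sha256(test_case.encode("utf-8")).hexdigest()


def compare_to_baseline(current: list[ExperimentResult],
                        baseline: dict[str, float]) -> dict[str, float]:
    # Positive delta means the new run scored higher than the stored baseline;
    # test cases missing from the baseline are treated as new and skipped here.
    return {r.case_id: r.score - baseline[r.case_id]
            for r in current if r.case_id in baseline}


# "time-aware": timezone-aware timestamps avoid ambiguity when ordering runs.
result = ExperimentResult(case_id("My test case"), 0.92,
                          datetime.now(timezone.utc))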

Author: Lucas Gomide
Date: 2025-07-14 10:06:45 -03:00
Committed by: GitHub
Parent: 3ada4053bd
Commit: 1b6b2b36d9
27 changed files with 2512 additions and 16 deletions


@@ -1337,7 +1337,7 @@ class Crew(FlowTrackable, BaseModel):
         evaluator = CrewEvaluator(test_crew, llm_instance)
 
         if include_agent_eval:
-            from crewai.evaluation import create_default_evaluator
+            from crewai.experimental.evaluation import create_default_evaluator
             agent_evaluator = create_default_evaluator(crew=test_crew)
 
         for i in range(1, n_iterations + 1):
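
For context, a minimal usage sketch of the relocated import from the hunk above. Only the crewai.experimental.evaluation path and the create_default_evaluator(crew=...) call are confirmed by this diff; the Agent/Task/Crew setup is ordinary crewAI usage and not part of this change.

from crewai import Agent, Crew, Task
from crewai.experimental.evaluation import create_default_evaluator  # new path

researcher = Agent(
    role="Researcher",
    goal="Summarize a topic",
    backstory="An analyst agent used for evaluation runs.",
)
task = Task(
    description="Summarize the latest findings on the chosen topic.",
    expected_output="A short written summary.",
    agent=researcher,
)
crew = Crew(agents=[researcher], tasks=[task])

# Mirrors the diff's create_default_evaluator(crew=test_crew) call; how the
# evaluator reports its results afterwards is not shown in this hunk.
agent_evaluator = create_default_evaluator(crew=crew)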