Introducing Agent evaluation (#3130)

* feat: add exchanged messages in LLMCallCompletedEvent

* feat: add GoalAlignment metric for Agent evaluation

* feat: add SemanticQuality metric for Agent evaluation

* feat: add Tool Metrics for Agent evaluation

* feat: add Reasoning Metrics for Agent evaluation, still in progress

* feat: add AgentEvaluator class

This class evaluates an Agent's results and reports them to the user

* fix: do not evaluate Agent by default

This is an experimental feature; we still need to refine it further

* test: add Agent eval tests

* fix: render all feedback per iteration

* style: resolve linter issues

* style: fix mypy issues

* fix: allow messages to be empty on LLMCallCompletedEvent
Lucas Gomide
2025-07-11 14:18:03 -03:00
committed by GitHub
parent bf8fa3232b
commit 08fa3797ca
26 changed files with 2930 additions and 14 deletions