Files
crewAI/docs/core-concepts/Testing.md
Eduardo Chiarotti 2d2154ed65 feat: add crew Testing/Evaluating feature (#998)
* feat: add crew Testing/evalauting feature

* feat: add docs and add unit test

* feat: improve testing output table

* feat: add tests

* feat: fix type checking issue

* feat: add raise ValueError when testing if output is not the expected

* docs: add docs for Testing

* feat: improve tests and fix some issue

* feat: back to sync

* feat: change opdeai model

* feat: fix test
2024-07-26 14:23:51 -03:00

42 lines
2.0 KiB
Markdown

---
title: crewAI Testing
description: Learn how to test your crewAI Crew and evaluate their performance.
---
## Introduction
Testing is a crucial part of the development process, and it is essential to ensure that your crew is performing as expected. And with crewAI, you can easily test your crew and evaluate its performance using the built-in testing capabilities.
### Using the Testing Feature
We added the CLI command `crewai test` to make it easy to test your crew. This command will run your crew for a specified number of iterations and provide detailed performance metrics.
The parameters are `n_iterations` and `model` which are optional and default to 2 and `gpt-4o-mini` respectively. For now the only provider available is OpenAI.
```bash
crewai test
```
If you want to run more iterations or use a different model, you can specify the parameters like this:
```bash
crewai test --n_iterations 5 --model gpt-4o
```
What happens when you run the `crewai test` command is that the crew will be executed for the specified number of iterations, and the performance metrics will be displayed at the end of the run.
A table of scores at the end will show the performance of the crew in terms of the following metrics:
```
Task Scores
(1-10 Higher is better)
┏━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ Tasks/Crew ┃ Run 1 ┃ Run 2 ┃ Avg. Total ┃
┡━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ Task 1 │ 10.0 │ 9.0 │ 9.5 │
│ Task 2 │ 9.0 │ 9.0 │ 9.0 │
│ Crew │ 9.5 │ 9.0 │ 9.2 │
└────────────┴───────┴───────┴────────────┘
```
The example above shows the test results for two runs of the crew with two tasks, with the average total score for each task and the crew as a whole.