crewAI/docs/core-concepts/Testing.md at 2f09652d8511e4882f7a4eb7afe1a61c95059dee

---
title: crewAI Testing
description: Learn how to test your crewAI Crew and evaluate their performance.
---

Introduction

Testing is a crucial part of the development process: it confirms that your crew performs as expected. crewAI includes built-in testing capabilities that let you run your crew and evaluate its performance.

Using the Testing Feature

The CLI command crewai test makes it easy to test your crew. It runs your crew for a specified number of iterations and reports detailed performance metrics. It accepts two optional parameters, n_iterations and model, which default to 2 and gpt-4o-mini respectively. For now, the only provider available is OpenAI.

crewai test

If you want to run more iterations or use a different model, you can specify the parameters like this:

crewai test --n_iterations 5 --model gpt-4o

or using the short forms:

crewai test -n 5 -m gpt-4o
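If you want to launch the test command from a script (for example in a CI job), the invocation above can be assembled programmatically. The following is a minimal sketch, not part of crewAI itself: the helper name build_crewai_test_command is hypothetical, and it uses only the --n_iterations and --model flags described above. It builds the argument list without running anything.

```python
import shlex
from typing import List, Optional


def build_crewai_test_command(
    n_iterations: Optional[int] = None,
    model: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `crewai test` (hypothetical helper).

    Flags are only added when a value is given, so the CLI's own
    defaults apply otherwise.
    """
    if n_iterations is not None and n_iterations < 1:
        raise ValueError("n_iterations must be a positive integer")

    command = ["crewai", "test"]
    if n_iterations is not None:
        command += ["--n_iterations", str(n_iterations)]
    if model is not None:
        command += ["--model", model]
    return command


if __name__ == "__main__":
    # Print the command line instead of executing it.
    print(shlex.join(build_crewai_test_command(5, "gpt-4o")))
```

To actually execute the command, the returned list could be passed to subprocess.run from the standard library, assuming the crewai CLI is installed and the script runs from your crew project directory.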

When you run the crewai test command, the crew will be executed for the specified number of iterations, and the performance metrics will be displayed at the end of the run.

A table of scores at the end will show the performance of the crew in terms of the following metrics:

                                                     Tasks Scores
                                                (1-10 Higher is better)
┏━━━━━━━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Tasks/Crew/Agents  │ Run 1 │ Run 2 │ Avg. Total │ Agents                         │                                 ┃
┠────────────────────┼───────┼───────┼────────────┼────────────────────────────────┼─────────────────────────────────┨
┃ Task 1             │  9.0  │  9.5  │    9.2     │ - Professional Insights        │                                 ┃
┃                    │       │       │            │ Researcher                     │                                 ┃
┃                    │       │       │            │                                │                                 ┃
┃ Task 2             │  9.0  │ 10.0  │    9.5     │ - Company Profile Investigator │                                 ┃
┃                    │       │       │            │                                │                                 ┃
┃ Task 3             │  9.0  │  9.0  │    9.0     │ - Automation Insights          │                                 ┃
┃                    │       │       │            │ Specialist                     │                                 ┃
┃                    │       │       │            │                                │                                 ┃
┃ Task 4             │  9.0  │  9.0  │    9.0     │ - Final Report Compiler        │                                 ┃
┃                    │       │       │            │                                │ - Automation Insights           ┃
┃                    │       │       │            │                                │ Specialist                      ┃
┃ Crew               │ 9.00  │ 9.38  │    9.2     │                                │                                 ┃
┃ Execution Time (s) │  126  │  145  │    135     │                                │                                 ┃
┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━┷━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

The example above shows the test results for two runs of a crew with four tasks, with the average score for each task and for the crew as a whole, plus the execution time of each run.
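To see how the numbers in the table relate to each other, the sketch below recomputes the averages from the per-run task scores shown above. It assumes that each run's crew score is the plain mean of that run's task scores and that the "Avg. Total" column is the mean across runs. That assumption is consistent with the displayed values, but it is an inference from the table, not a statement about crewAI's internal implementation.

```python
from statistics import mean

# Per-run scores copied from the example table (Run 1, Run 2).
task_scores = {
    "Task 1": [9.0, 9.5],
    "Task 2": [9.0, 10.0],
    "Task 3": [9.0, 9.0],
    "Task 4": [9.0, 9.0],
}
execution_times = [126, 145]  # seconds, one entry per run

# Average of each task across runs (the "Avg. Total" column).
task_averages = {task: mean(scores) for task, scores in task_scores.items()}

# Assumed crew score per run: mean of all task scores in that run.
crew_per_run = [mean(run) for run in zip(*task_scores.values())]

# Crew average and execution time average across runs.
crew_average = mean(crew_per_run)
average_time = mean(execution_times)

if __name__ == "__main__":
    print(task_averages)   # Task 1 averages 9.25, shown rounded as 9.2
    print(crew_per_run)    # [9.0, 9.375], shown as 9.00 and 9.38
    print(crew_average)    # 9.1875, shown as 9.2
    print(average_time)    # 135.5, shown as 135
```

The table displays these values rounded, which is why 9.375 appears as 9.38 and 135.5 appears as 135.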
