diff --git a/docs/tools/patronusevaltool.mdx b/docs/tools/patronusevaltool.mdx
new file mode 100644
index 000000000..3e384e214
--- /dev/null
+++ b/docs/tools/patronusevaltool.mdx
@@ -0,0 +1,64 @@
---
title: Patronus Eval Tool
description: The `PatronusEvalTool` is designed to evaluate agent inputs, outputs, and context against contextually selected criteria and log the results to app.patronus.ai.
icon: shield
---

# `PatronusEvalTool`

## Description

The `PatronusEvalTool` evaluates agent inputs, outputs, and context against contextually selected criteria and logs the results to [app.patronus.ai](https://app.patronus.ai).
It uses the [Patronus AI](https://patronus.ai/) API to:
1. Fetch all criteria available to the user associated with the `PATRONUS_API_KEY`.
2. Select the criteria that best fit the defined `Task`.
3. Evaluate the inputs/outputs/context against the selected criteria and log the results to [app.patronus.ai](https://app.patronus.ai).

## Installation

To incorporate this tool into your project, follow the installation instructions below:

```shell
pip install 'crewai[tools]'
```

## Steps to Get Started

Follow these steps to use the PatronusEvalTool:

1. Confirm that the `crewai[tools]` package is installed in your Python environment.
2. Acquire a Patronus API key by registering for a free account at [patronus.ai](https://patronus.ai/).
3. Export your API key using `export PATRONUS_API_KEY=[YOUR_KEY_HERE]`.

## Example

This example demonstrates using the PatronusEvalTool to verify whether generated content is code. Here, the agent selects the `contains-code` predefined criteria, evaluates the output generated for the instruction, and logs the results to [app.patronus.ai](https://app.patronus.ai).

```python
from crewai import Agent, Crew, Task
from crewai_tools import PatronusEvalTool

patronus_eval_tool = PatronusEvalTool()

coding_agent = Agent(
    role="Coding Agent",
    goal="Generate high quality code and verify that the output is code by using Patronus AI's evaluation tool.",
    backstory="You are an experienced coder who can generate high quality python code. You can follow complex instructions accurately and effectively.",
    tools=[patronus_eval_tool],
    verbose=True,
)

generate_code = Task(
    description="Create a simple program to generate the first N numbers in the Fibonacci sequence. Select the most appropriate evaluator and criteria for evaluating your output.",
    expected_output="Program that generates the first N numbers in the Fibonacci sequence.",
    agent=coding_agent,
)

crew = Crew(agents=[coding_agent], tasks=[generate_code])
crew.kickoff()
```
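The tool authenticates with the `PATRONUS_API_KEY` environment variable exported in step 3 above. If evaluations do not appear at [app.patronus.ai](https://app.patronus.ai), a missing key is a likely cause; a minimal, purely illustrative pre-flight check before `crew.kickoff()`:

```python
import os

# Illustrative pre-flight check: the Patronus tools authenticate with the
# PATRONUS_API_KEY environment variable, so fail fast if it is not set.
if not os.environ.get("PATRONUS_API_KEY"):
    raise RuntimeError("PATRONUS_API_KEY is not set; export it before running the crew.")
```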
## Conclusion

With the `PatronusEvalTool`, users can build confidence in their agentic systems and improve the reliability of their product.
Through [patronus.ai](https://patronus.ai), agents can choose from the platform's predefined or custom-defined criteria to evaluate their outputs, making it easier to debug agentic pipelines.
Users can also define their own criteria at [app.patronus.ai](https://app.patronus.ai) or write a local evaluation function (guide [here](https://docs.patronus.ai/docs/experiment-evaluators)) for custom evaluation needs. To evaluate against specific predefined criteria or a local evaluator, use the `PatronusPredefinedCriteriaEvalTool` or `PatronusLocalEvaluatorTool` respectively.
\ No newline at end of file
diff --git a/docs/tools/patronuslocalevaluatortool.mdx b/docs/tools/patronuslocalevaluatortool.mdx
new file mode 100644
index 000000000..42b96e574
--- /dev/null
+++ b/docs/tools/patronuslocalevaluatortool.mdx
@@ -0,0 +1,75 @@
---
title: Patronus Local Evaluator Tool
description: The `PatronusLocalEvaluatorTool` is designed to evaluate agent inputs, outputs, and context with a user-defined function and log the results to app.patronus.ai.
icon: shield
---

# `PatronusLocalEvaluatorTool`

## Description

The `PatronusLocalEvaluatorTool` evaluates agent inputs, outputs, and context with a user-defined local function and logs the results to [app.patronus.ai](https://app.patronus.ai).
It uses the [Patronus AI](https://patronus.ai/) API to:
1. Evaluate the inputs/outputs/context against the user-defined metric or evaluation criteria.
2. Log the results to [app.patronus.ai](https://app.patronus.ai), where they can be visualized.

## Installation

To incorporate this tool into your project, follow the installation instructions below:

```shell
pip install 'crewai[tools]'
```

## Steps to Get Started

Follow these steps to use the PatronusLocalEvaluatorTool:

1. Confirm that the `crewai[tools]` package is installed in your Python environment.
2. Acquire a Patronus API key by registering for a free account at [patronus.ai](https://patronus.ai/).
3. Export your API key using `export PATRONUS_API_KEY=[YOUR_KEY_HERE]`.

## Example

This example demonstrates the use of the PatronusLocalEvaluatorTool. It registers a local evaluation function with the Patronus client and passes it to the tool by name.

```python
from crewai import Agent, Crew, Task
from crewai_tools import PatronusLocalEvaluatorTool
from patronus import Client, EvaluationResult

client = Client()

# Register a local evaluation function. For more details, see
# https://docs.patronus.ai/docs/experiment-evaluators
@client.register_local_evaluator("local_evaluator_name")
def my_evaluator(**kwargs):
    # pass_ expects a boolean pass/fail verdict
    return EvaluationResult(score=0.5, pass_=True, explanation="Explanation test")

patronus_eval_tool = PatronusLocalEvaluatorTool(
    evaluator="local_evaluator_name", evaluated_model_gold_answer="test"
)

coding_agent = Agent(
    role="Coding Agent",
    goal="Generate high quality code and verify that the output is code by using Patronus AI's evaluation tool.",
    backstory="You are an experienced coder who can generate high quality python code. You can follow complex instructions accurately and effectively.",
    tools=[patronus_eval_tool],
    verbose=True,
)

generate_code = Task(
    description="Create a simple program to generate the first N numbers in the Fibonacci sequence. Select the most appropriate evaluator and criteria for evaluating your output.",
    expected_output="Program that generates the first N numbers in the Fibonacci sequence.",
    agent=coding_agent,
)

crew = Crew(agents=[coding_agent], tasks=[generate_code])
crew.kickoff()
```
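Because local evaluators are plain Python functions, they can encode arbitrary domain checks. The sketch below registers an evaluator that passes only when the output appears to contain Python code; the evaluator name is made up for illustration, and the `evaluated_model_output` keyword it inspects is an assumption about what the tool forwards, hence the defensive lookup.

```python
from crewai_tools import PatronusLocalEvaluatorTool
from patronus import Client, EvaluationResult

client = Client()

# Illustrative sketch: pass only when the evaluated output looks like Python
# code. The "evaluated_model_output" kwarg is an assumption about what the
# tool forwards to local evaluators, so we read it defensively.
@client.register_local_evaluator("contains_python_code")
def contains_python_code(**kwargs):
    output = kwargs.get("evaluated_model_output") or ""
    has_code = "def " in output or "import " in output
    return EvaluationResult(
        score=1.0 if has_code else 0.0,
        pass_=has_code,
        explanation="Found Python code in the output."
        if has_code
        else "No Python code found in the output.",
    )

patronus_eval_tool = PatronusLocalEvaluatorTool(
    evaluator="contains_python_code", evaluated_model_gold_answer="N/A"
)
```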
## Conclusion

With the `PatronusLocalEvaluatorTool`, users can quickly define custom evaluation functions and build confidence in their agentic systems.
Through patronus.ai, custom metrics are logged to [app.patronus.ai](https://app.patronus.ai), making it easier to visualize trends and debug agentic pipelines.
Users can also define their own criteria at [app.patronus.ai](https://app.patronus.ai) or let the agent choose an existing evaluator on the Patronus platform.
To evaluate against specific predefined criteria, or to let the agent select criteria contextually, use the `PatronusPredefinedCriteriaEvalTool` or `PatronusEvalTool` respectively.
\ No newline at end of file
diff --git a/docs/tools/patronuspredefinedcriteriaevaltool.mdx b/docs/tools/patronuspredefinedcriteriaevaltool.mdx
new file mode 100644
index 000000000..7bbda28be
--- /dev/null
+++ b/docs/tools/patronuspredefinedcriteriaevaltool.mdx
@@ -0,0 +1,65 @@
---
title: Patronus Predefined Criteria Eval Tool
description: The `PatronusPredefinedCriteriaEvalTool` is designed to evaluate agent outputs against specific predefined criteria on the Patronus platform and log the results to app.patronus.ai.
icon: shield
---

# `PatronusPredefinedCriteriaEvalTool`

## Description

The `PatronusPredefinedCriteriaEvalTool` evaluates agent outputs against specific predefined criteria on the Patronus platform and logs the results to [app.patronus.ai](https://app.patronus.ai).
It uses the [Patronus AI](https://patronus.ai/) API to:
1. Evaluate the agent input, output, context, and gold answer (if available) against the given criteria.
2. Log the results to [app.patronus.ai](https://app.patronus.ai).

## Installation

To incorporate this tool into your project, follow the installation instructions below:

```shell
pip install 'crewai[tools]'
```

## Steps to Get Started

Follow these steps to use the PatronusPredefinedCriteriaEvalTool:

1. Confirm that the `crewai[tools]` package is installed in your Python environment.
2. Acquire a Patronus API key by registering for a free account at [patronus.ai](https://patronus.ai/).
3. Export your API key using `export PATRONUS_API_KEY=[YOUR_KEY_HERE]`.

## Example

This example demonstrates using the PatronusPredefinedCriteriaEvalTool to verify whether generated content is code. Here, the `contains-code` predefined criteria is fixed at tool creation; the agent evaluates the output generated for the instruction and logs the results to [app.patronus.ai](https://app.patronus.ai).

```python
from crewai import Agent, Crew, Task
from crewai_tools import PatronusPredefinedCriteriaEvalTool

# Select the "contains-code" criteria, using Patronus AI's default "judge" evaluator
patronus_eval_tool = PatronusPredefinedCriteriaEvalTool(
    evaluators=[{"evaluator": "judge", "criteria": "contains-code"}]
)

coding_agent = Agent(
    role="Coding Agent",
    goal="Generate high quality code and verify that the output is code by using Patronus AI's evaluation tool.",
    backstory="You are an experienced coder who can generate high quality python code. You can follow complex instructions accurately and effectively.",
    tools=[patronus_eval_tool],
    verbose=True,
)

generate_code = Task(
    description="Create a simple program to generate the first N numbers in the Fibonacci sequence. Select the most appropriate evaluator and criteria for evaluating your output.",
    expected_output="Program that generates the first N numbers in the Fibonacci sequence.",
    agent=coding_agent,
)

crew = Crew(agents=[coding_agent], tasks=[generate_code])
crew.kickoff()
```
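Since `evaluators` takes a list, more than one evaluator/criteria pair can plausibly be supplied to a single tool. The sketch below assumes this; `is-concise` is an illustrative criteria name, so confirm the criteria actually available on your account at [app.patronus.ai](https://app.patronus.ai).

```python
from crewai_tools import PatronusPredefinedCriteriaEvalTool

# Illustrative: check two criteria in one tool. "is-concise" is an example
# name; replace it with a criteria that exists on your Patronus account.
patronus_eval_tool = PatronusPredefinedCriteriaEvalTool(
    evaluators=[
        {"evaluator": "judge", "criteria": "contains-code"},
        {"evaluator": "judge", "criteria": "is-concise"},
    ]
)
```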
## Conclusion

With the `PatronusPredefinedCriteriaEvalTool`, users can conveniently evaluate the inputs, outputs, and context provided to the agent against criteria that are fixed in advance.
Through patronus.ai, these evaluations draw on the platform's predefined or custom-defined criteria, and the logged results make it easier to debug agentic pipelines.
If you want the agent to contextually select criteria from the list available at [app.patronus.ai](https://app.patronus.ai), or if a local evaluation function is preferred (guide [here](https://docs.patronus.ai/docs/experiment-evaluators)), use the `PatronusEvalTool` or `PatronusLocalEvaluatorTool` respectively.
\ No newline at end of file