updating docs

2026-07-08 08:25:09 +00:00 · 2024-08-30 00:15:06 -03:00
parent 2f2945169d
commit d03b89486d
2 changed files with 93 additions and 29 deletions
--- a/docs/telemetry/Telemetry.md
+++ b/docs/telemetry/Telemetry.md
@@ -5,24 +5,38 @@ description: Understanding the telemetry data collected by CrewAI and how it con

 ## Telemetry

-CrewAI utilizes anonymous telemetry to gather usage statistics with the primary goal of enhancing the library. Our focus is on improving and developing the features, integrations, and tools most utilized by our users. We don't offer a way to disable it now, but we will in the future.
+!!! note "Personal Information"
+    By default, we collect no data that would be considered personal information under GDPR and other privacy regulations.
+    We do collect tool names and agent roles, so do not include any personal information in them.
+    Because no personal information is collected by default, data residency is not a concern.
+    When `share_crew` is enabled, additional data is collected that may contain personal information if the user included it. Exercise caution when enabling this feature to ensure compliance with privacy regulations.

-It's pivotal to understand that **NO data is collected** concerning prompts, task descriptions, agents' backstories or goals, usage of tools, API calls, responses, any data processed by the agents, or secrets and environment variables, with the exception of the conditions mentioned. When the `share_crew` feature is enabled, detailed data including task descriptions, agents' backstories or goals, and other specific attributes are collected to provide deeper insights while respecting user privacy.
+CrewAI utilizes anonymous telemetry to gather usage statistics with the primary goal of enhancing the library. Our focus is on improving and developing the features, integrations, and tools most utilized by our users.

-### Data Collected Includes:
- **Version of CrewAI**: Assessing the adoption rate of our latest version helps us understand user needs and guide our updates.
- **Python Version**: Identifying the Python versions our users operate with assists in prioritizing our support efforts for these versions.
- **General OS Information**: Details like the number of CPUs and the operating system type (macOS, Windows, Linux) enable us to focus our development on the most used operating systems and explore the potential for OS-specific features.
- **Number of Agents and Tasks in a Crew**: Ensures our internal testing mirrors real-world scenarios, helping us guide users towards best practices.
- **Crew Process Utilization**: Understanding how crews are utilized aids in directing our development focus.
- **Memory and Delegation Use by Agents**: Insights into how these features are used help evaluate their effectiveness and future.
- **Task Execution Mode**: Knowing whether tasks are executed in parallel or sequentially influences our emphasis on enhancing parallel execution capabilities.
- **Language Model Utilization**: Supports our goal to improve support for the most popular languages among our users.
- **Roles of Agents within a Crew**: Understanding the various roles agents play aids in crafting better tools, integrations, and examples.
- **Tool Usage**: Identifying which tools are most frequently used allows us to prioritize improvements in those areas.
+It's pivotal to understand that by default, **NO personal data is collected** concerning prompts, task descriptions, agents' backstories or goals, usage of tools, API calls, responses, any data processed by the agents, or secrets and environment variables.
+When the `share_crew` feature is enabled, detailed data including task descriptions, agents' backstories or goals, and other specific attributes are collected to provide deeper insights. This expanded data collection may include personal information if users have incorporated it into their crews or tasks. Users should carefully consider the content of their crews and tasks before enabling `share_crew`. Users can disable telemetry entirely by setting the environment variable `OTEL_SDK_DISABLED` to `true`.
+
+### Data Explanation:
+| Defaulted | Data                                      | Reason and Specifics                                                                                                       |
+|-----------|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------|
+| Yes       | CrewAI and Python Version                 | Tracks software versions. Example: CrewAI v1.2.3, Python 3.8.10. No personal data. |
+| Yes       | Crew Metadata | Includes: randomly generated key and ID, process type (e.g., 'sequential', 'hierarchical'), boolean flag for memory usage (true/false), count of tasks, count of agents. All non-personal. |
+| Yes       | Agent Data | Includes: randomly generated key and ID, role name (should not include personal info), boolean settings (verbose, delegation enabled, code execution allowed), max iterations, max RPM, max retry limit, LLM info (see LLM Attributes), list of tool names (should not include personal info). No personal data. |
+| Yes       | Task Metadata | Includes: randomly generated key and ID, boolean execution settings (async_execution, human_input), associated agent's role and key, list of tool names. All non-personal. |
+| Yes       | Tool Usage Statistics | Includes: tool name (should not include personal info), number of usage attempts (integer), LLM attributes used. No personal data. |
+| Yes       | Test Execution Data | Includes: crew's randomly generated key and ID, number of iterations, model name used, quality score (float), execution time (in seconds). All non-personal. |
+| Yes       | Task Lifecycle Data | Includes: creation and execution start/end times, crew and task identifiers. Stored as spans with timestamps. No personal data. |
+| Yes       | LLM Attributes | Includes: name, model_name, model, top_k, temperature, and class name of the LLM. All technical, non-personal data. |
+| No        | Agent's Expanded Data | Includes: goal description, backstory text, i18n prompt file identifier. Users should ensure no personal info is included in text fields. |
+| No        | Detailed Task Information | Includes: task description, expected output description, context references. Users should ensure no personal info is included in these fields. |
+| No        | Environment Information | Includes: platform, release, system, version, and CPU count. Example: 'Windows 10', 'x86_64'. No personal data. |
+| No        | Crew and Task Inputs and Outputs | Includes: input parameters and output results as non-identifiable data. Users should ensure no personal info is included. |
+| No        | Comprehensive Crew Execution Data | Includes: detailed logs of crew operations, all agents and tasks data, final output. All non-personal and technical in nature. |
+
+Note: "No" in the "Defaulted" column indicates that this data is only collected when `share_crew` is set to `true`.

 ### Opt-In Further Telemetry Sharing
-Users can choose to share their complete telemetry data by enabling the `share_crew` attribute to `True` in their crew configurations. Enabling `share_crew` results in the collection of detailed crew and task execution data, including `goal`, `backstory`, `context`, and `output` of tasks. This enables a deeper insight into usage patterns while respecting the user's choice to share.
+Users can choose to share their complete telemetry data by setting the `share_crew` attribute to `True` in their crew configurations. Enabling `share_crew` results in the collection of detailed crew and task execution data, including `goal`, `backstory`, `context`, and `output` of tasks. This enables a deeper insight into usage patterns.

-### Updates and Revisions
-We are committed to maintaining the accuracy and transparency of our documentation. Regular reviews and updates are performed to ensure our documentation accurately reflects the latest developments of our codebase and telemetry practices. Users are encouraged to review this section for the most current information on our data collection practices and how they contribute to the improvement of CrewAI.
+!!! warning "Potential Personal Information"
+    If you enable `share_crew`, the collected data may include personal information if it has been incorporated into crew configurations, task descriptions, or outputs. Users should carefully review their data and ensure compliance with GDPR and other applicable privacy regulations before enabling this feature.
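
Reviewer note: the updated docs name `OTEL_SDK_DISABLED` as the opt-out switch but do not show it in use. A minimal sketch follows, assuming the variable has to be set before CrewAI is imported so that it is visible when telemetry is initialised (the diff does not confirm the exact timing). The helper `telemetry_disabled` is hypothetical and only mirrors the check described in the docs; it is not part of CrewAI.

```python
import os

# Opt out of telemetry for this process. Assumption: this runs before
# `import crewai`, so the setting is already in place when telemetry starts.
os.environ["OTEL_SDK_DISABLED"] = "true"


def telemetry_disabled(environ=os.environ) -> bool:
    """Hypothetical helper: report whether the opt-out variable is set to true."""
    return environ.get("OTEL_SDK_DISABLED", "false").strip().lower() == "true"


if __name__ == "__main__":
    print("telemetry disabled:", telemetry_disabled())
```

The same effect can be had from the shell by exporting `OTEL_SDK_DISABLED=true` before starting the program. The opt-in direction is separate: per the docs, it is controlled by the `share_crew` attribute on the crew rather than by an environment variable.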
--- a/src/crewai/telemetry/telemetry.py
+++ b/src/crewai/telemetry/telemetry.py
@@ -28,18 +28,6 @@ class Telemetry:
    agents backstories or goals nor responses or any data that is being
    processed by the agents, nor any secrets and env vars.

-    Data collected includes:
-    - Version of crewAI
-    - Version of Python
-    - General OS (e.g. number of CPUs, macOS/Windows/Linux)
-    - Number of agents and tasks in a crew
-    - Crew Process being used
-    - If Agents are using memory or allowing delegation
-    - If Tasks are being executed in parallel or sequentially
-    - Language model being used
-    - Roles of agents in a crew
-    - Tools names available
-
    Users can opt-in to sharing more complete data using the `share_crew`
    attribute in the Crew class.
    """
@@ -114,10 +102,17 @@ class Telemetry:
                                    "max_iter": agent.max_iter,
                                    "max_rpm": agent.max_rpm,
                                    "i18n": agent.i18n.prompt_file,
+                                    "function_calling_llm": json.dumps(
+                                        self._safe_llm_attributes(
+                                            agent.function_calling_llm
+                                        )
+                                    ),
                                    "llm": json.dumps(
                                        self._safe_llm_attributes(agent.llm)
                                    ),
                                    "delegation_enabled?": agent.allow_delegation,
+                                    "allow_code_execution?": agent.allow_code_execution,
+                                    "max_retry_limit": agent.max_retry_limit,
                                    "tools_names": [
                                        tool.name.casefold()
                                        for tool in agent.tools or []
@@ -165,7 +160,62 @@ class Telemetry:
                    self._add_attribute(
                        span, "crew_inputs", json.dumps(inputs) if inputs else None
                    )
-
+                else:
+                    self._add_attribute(
+                        span,
+                        "crew_agents",
+                        json.dumps(
+                            [
+                                {
+                                    "key": agent.key,
+                                    "id": str(agent.id),
+                                    "role": agent.role,
+                                    "verbose?": agent.verbose,
+                                    "max_iter": agent.max_iter,
+                                    "max_rpm": agent.max_rpm,
+                                    "function_calling_llm": json.dumps(
+                                        self._safe_llm_attributes(
+                                            agent.function_calling_llm
+                                        )
+                                    ),
+                                    "llm": json.dumps(
+                                        self._safe_llm_attributes(agent.llm)
+                                    ),
+                                    "delegation_enabled?": agent.allow_delegation,
+                                    "allow_code_execution?": agent.allow_code_execution,
+                                    "max_retry_limit": agent.max_retry_limit,
+                                    "tools_names": [
+                                        tool.name.casefold()
+                                        for tool in agent.tools or []
+                                    ],
+                                }
+                                for agent in crew.agents
+                            ]
+                        ),
+                    )
+                    self._add_attribute(
+                        span,
+                        "crew_tasks",
+                        json.dumps(
+                            [
+                                {
+                                    "key": task.key,
+                                    "id": str(task.id),
+                                    "async_execution?": task.async_execution,
+                                    "human_input?": task.human_input,
+                                    "agent_role": task.agent.role
+                                    if task.agent
+                                    else "None",
+                                    "agent_key": task.agent.key if task.agent else None,
+                                    "tools_names": [
+                                        tool.name.casefold()
+                                        for tool in task.tools or []
+                                    ],
+                                }
+                                for task in crew.tasks
+                            ]
+                        ),
+                    )
                span.set_status(Status(StatusCode.OK))
                span.end()
            except Exception:
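
Reviewer note: to make the new `else` branch easier to check against the "Agent Data" row of the docs table, here is a rough sketch of the per-agent payload it builds when `share_crew` is off. `StubTool`, `StubAgent`, and `default_agent_payload` are hypothetical stand-ins, the default values are placeholders rather than CrewAI's real defaults, and the `llm` and `function_calling_llm` fields are left out because `_safe_llm_attributes` is not shown in this diff.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubTool:
    """Hypothetical stand-in for a tool; only the name is reported."""
    name: str


@dataclass
class StubAgent:
    """Hypothetical stand-in exposing the attributes the else branch reads."""
    key: str
    id: str
    role: str
    goal: str = ""  # present on the agent, but never copied into the payload
    verbose: bool = False
    max_iter: int = 25  # placeholder value
    max_rpm: Optional[int] = None
    allow_delegation: bool = False
    allow_code_execution: bool = False
    max_retry_limit: int = 2  # placeholder value
    tools: Optional[List[StubTool]] = None


def default_agent_payload(agent: StubAgent) -> dict:
    """Mirror the keys sent per agent when share_crew is not enabled."""
    return {
        "key": agent.key,
        "id": str(agent.id),
        "role": agent.role,
        "verbose?": agent.verbose,
        "max_iter": agent.max_iter,
        "max_rpm": agent.max_rpm,
        "delegation_enabled?": agent.allow_delegation,
        "allow_code_execution?": agent.allow_code_execution,
        "max_retry_limit": agent.max_retry_limit,
        # Tool names are lower-cased with casefold(), as in the diff.
        "tools_names": [tool.name.casefold() for tool in agent.tools or []],
    }


agent = StubAgent(
    key="a1",
    id="1234",
    role="Researcher",
    goal="Find sources",
    tools=[StubTool("Web Search")],
)
payload = json.loads(json.dumps([default_agent_payload(agent)]))
print(payload[0]["tools_names"])
```

Note that `role` and the tool names are passed through as written by the user, which is why the docs ask that neither contain personal information.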