mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-01-07 15:18:29 +00:00
* Make tests green again
* Add Git validations for publishing tools (#1381) This commit prevents tools from being published if the underlying Git repository is unsynced with origin.
* fix: JSON encoding date objects (#1374)
* Update README (#1376)
* Change all instances of crewAI to CrewAI and fix installation step
* Update the example to use YAML format
* Update to come after setup and edits
* Remove double tool instance
* docs: correct miswritten command name (#1365) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Add `--force` option to `crewai tool publish` (#1383) This commit adds an option to bypass Git remote validations when publishing tools.
* add plotting to flows documentation (#1394)
* Brandon/cre 288 add telemetry to flows (#1391)
* Telemetry for flows
* store node names
* Brandon/cre 291 flow improvements (#1390)
* Implement joao feedback
* update colors for crew nodes
* clean up
* more linting clean up
* round legend corners --------- Co-authored-by: João Moura <joaomdmoura@gmail.com>
* quick fixes (#1385)
* quick fixes
* add generic name --------- Co-authored-by: João Moura <joaomdmoura@gmail.com>
* reduce import time by 6x (#1396)
* reduce import by 6x
* fix linting
* Added version details (#1402) Co-authored-by: João Moura <joaomdmoura@gmail.com>
* Update twitter logo to x-twitter (#1403)
* fix task cloning error (#1416)
* Migrate docs from MkDocs to Mintlify (#1423)
* add new mintlify docs
* add favicon.svg
* minor edits
* add github stats
* Fix/logger - fix #1412 (#1413)
* improved logger
* log file looks better
* better lines written to log file --------- Co-authored-by: João Moura <joaomdmoura@gmail.com>
* fixing tests
* preparing new version
* updating init
* Preparing new version
* Trying to fix linting and other warnings (#1417)
* Trying to fix linting
* fixing more type issues
* clean up ci
* more ci fixes --------- Co-authored-by: Eduardo Chiarotti <dudumelgaco@hotmail.com>
* Feat/poetry to uv migration (#1406)
* feat: Start migrating to UV
* feat: add uv to flows
* feat: update docs on Poetry -> uv
* feat: update docs and uv.lock
* feat: update tests and github CI
* feat: run ruff format
* feat: update typechecking
* feat: fix type checking
* feat: update python version
* feat: type checking gic
* feat: adapt uv command to run the tool repo
* Adapt tool build command to uv
* feat: update logic to only let projects with a crew be deployed
* feat: add uv to tools
* fix: tests
* fix: remove breakpoint
* fix: test
* feat: add crewai update to migrate from poetry to uv
* fix: tests
* feat: add validation for the `^` character in pyproject
* feat: add run_crew to pyproject if it doesn't exist
* feat: add validation for poetry migration
* fix: warning --------- Co-authored-by: Vinicius Brasil <vini@hey.com>
* fix: training issue (#1433)
* fix: training issue
* fix: output from crew
* fix: message
* Use a slice for the manager request. Make the task use the agent i18n settings (#1446)
* Fix Cache Typo in Documentation (#1441)
* Correct the role for the message being added to the messages list (#1438) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* fix typo in template file (#1432)
* Adapt Tools CLI to uv (#1455)
* Adapt Tools CLI to UV
* Fix failing test
* use the same i18n as the agent for tool usage (#1440) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Upgrade docs to mirror change from `Poetry` to `UV` (#1451)
* Update docs to use instead of
* Add Flows YouTube tutorial & link images
* feat: Add warning from poetry -> uv (#1458)
* feat/updated CLI to allow for model selection & submitting API keys (#1430)
* updated CLI to allow for submitting API keys
* updated click prompt to remove default number
* removed all unnecessary comments
* feat: implement crew creation CLI command - refactor code to multiple functions - Added ability for users to select provider and model when using the crewai create command and save the API key to .env
* refactored select_choice function for early return
* refactored select_provider to have an early return
* cleanup of comments
* refactor/Move functions into utils file, added new provider file and migrated functions there, new constants file + general function refactor
* small comment cleanup
* fix unnecessary deps --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com> Co-authored-by: Brandon Hancock <brandon@brandonhancock.io>
* Fix incorrect parameter name in Vision tool docs page (#1461) Co-authored-by: João Moura <joaomdmoura@gmail.com>
* Feat/memory base (#1444)
* byom - short/entity memory
* better
* rm unneeded
* fix text
* use context
* rm dep and sync
* type check fix
* fixed test using new cassette
* fixing types
* fixed types
* fix types
* fixed types
* fixing types
* fix type
* cassette update
* just mock the return of short term mem
* remove print
* try catch block
* added docs
* adding error handling here
* preparing new version
* fixing annotations
* fix tasks and agents ordering
* Avoiding exceptions
* feat: add poetry.lock to uv migration (#1468)
* fix tool calling issue (#1467)
* fix tool calling issue
* Update tool type check
* Drop print
* cutting new version
* new version
* Adapt `crewai tool install <tool>` to uv (#1481) This commit updates the tool install command to uv's new custom index feature. Related: https://github.com/astral-sh/uv/pull/7746/
* fix(docs): typo (#1470)
* drop unnecessary tests (#1484)
* drop unnecessary tests
* fix linting
* simplify flow (#1482)
* simplify flow
* propagate changes
* Update docs and scripts
* Template fix
* make flow kickoff sync
* Clean up docs
* Add Cerebras LLM example configuration to LLM docs (#1488)
* ensure original embedding config works (#1476)
* ensure original embedding config works
* some fixes
* raise error on unsupported provider
* WIP: brandons notes
* fixes
* rm prints
* fixed docs
* fixed run types
* updates to add more docs and correct imports with huggingface embedding server enabled --------- Co-authored-by: Brandon Hancock <brandon@brandonhancock.io>
* use copy to split testing and training on crews (#1491)
* use copy to split testing and training on crews
* make tests handle new copy functionality on train and test
* fix last test
* fix test
* preparing new version
* fix/fixed missing API prompt + CLI docs update (#1464)
* updated CLI to allow for submitting API keys
* updated click prompt to remove default number
* removed all unnecessary comments
* feat: implement crew creation CLI command - refactor code to multiple functions - Added ability for users to select provider and model when using the crewai create command and save the API key to .env
* refactored select_choice function for early return
* refactored select_provider to have an early return
* cleanup of comments
* refactor/Move functions into utils file, added new provider file and migrated functions there, new constants file + general function refactor
* small comment cleanup
* fix unnecessary deps
* Added docs for new CLI provider + fixed missing API prompt
* Minor doc updates
* allow user to bypass api key entry + incorrect number selected logic + ruff formatting
* ruff updates
* Fix spelling mistake --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com> Co-authored-by: Brandon Hancock <brandon@brandonhancock.io>
* chore(readme-fix): fixing step for 'running tests' in the contribution section (#1490) Co-authored-by: Eduardo Chiarotti <dudumelgaco@hotmail.com>
* support unsafe code execution. add in docker install and running checks. (#1496)
* support unsafe code execution. add in docker install and running checks.
* Update return type
* Fix memory imports for embedding functions (#1497)
* updating crewai version
* new version
* new version
* update plot command (#1504)
* feat: add tomli so we can support 3.10 (#1506)
* feat: add tomli so we can support 3.10
* feat: add validation for poetry data
* Forward install command options to `uv sync` (#1510) Allow passing additional options from `crewai install` directly to `uv sync`. This enables commands like `crewai install --locked` to work as expected by forwarding all flags and options to the underlying uv command.
* improve tool text description and args (#1512)
* improve tool text description and args
* fix lint
* Drop print
* add back in docstring
* Improve tooling docs
* Update flow docs to talk about self evaluation example
* Update flow docs to talk about self evaluation example
* Update flows.mdx - Fix link
* Update flows cli to allow you to easily add additional crews to a flow (#1525)
* Update flows cli to allow you to easily add additional crews to a flow
* fix failing test
* adding more error logs to test that's failing
* try again
* Bugfix/flows with multiple starts plus ands breaking (#1531)
* bugfix/flows-with-multiple-starts-plus-ands-breaking
* fix user found issue
* remove prints
* prepare new version
* Added security.md file (#1533)
* Disable telemetry explicitly (#1536)
* Disable telemetry explicitly
* fix linting
* revert parts to og
* Enhance log storage to support more data types (#1530)
* Add llm providers accordion group (#1534)
* add llm providers accordion group
* fix numbering
* Replace .netrc with uv environment variables (#1541) This commit replaces .netrc with uv environment variables for installing tools from private repositories. To store credentials, I created a new and reusable settings file for the CLI in `$HOME/.config/crewai/settings.json`. The issue with .netrc files is that they are applied system-wide and are scoped by hostname, meaning we can't differentiate tool repository requests from regular requests to CrewAI's API.
* refactor: Move BaseTool to main package and centralize tool description generation (#1514)
* move base_tool to main package and consolidate tool description generation
* update import path
* update tests
* update doc
* add base_tool test
* migrate agent delegation tools to use BaseTool
* update tests
* update import path for tool
* fix lint
* update param signature
* add from_langchain to BaseTool for backwards support of langchain tools
* fix the case where StructuredTool doesn't have func --------- Co-authored-by: c0dez <li@vitablehealth.com> Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Update docs (#1550)
* add llm providers accordion group
* fix numbering
* Fix directory tree & add llms to accordion
* Feat/ibm memory (#1549)
* Everything looks like it's working. Waiting for lorenze review.
* Update docs as well.
* clean up for PR
* add inputs to flows (#1553)
* add inputs to flows
* fix flows lint
* Increase providers fetching timeout
* Raise an error if an LLM doesn't return a response (#1548)
* docs update (#1558)
* add llm providers accordion group
* fix numbering
* Fix directory tree & add llms to accordion
* update crewai enterprise link in docs
* Feat/watson in cli (#1535)
* getting cli and .env to work together for different models
* support new models
* clean up prints
* Add support for cerebras
* Fix watson keys
* Fix flows to support cycles and added in test (#1556)
* fix missing config (#1557)
* making sure we don't check for agents that were not used in the crew
* preparing new version
* updating LLM docs
* preparing new version
* cutting new version
* preparing new version
* preparing new version
* add missing init
* fix LiteLLM callback replacement
* fix test_agent_usage_metrics_are_captured_for_hierarchical_process
* removing prints
* fix: Step callback issue (#1595)
* fix: Step callback issue
* fix: Add empty thought since it's required
* Cached prompt tokens on usage metrics
* do not include cached on total
* Fix crew_train_success test
* feat: Reduce level for Bandit and fix code to adapt (#1604)
* Add support for retrieving user preferences and memories using Mem0 (#1209)
* Integrate Mem0
* Update src/crewai/memory/contextual/contextual_memory.py Co-authored-by: Deshraj Yadav <deshraj@gatech.edu>
* pending commit for _fetch_user_memories
* update poetry.lock
* fixes mypy issues
* fix mypy checks
* New fixes for user_id
* remove memory_provider
* handle memory_provider
* checks for memory_config
* add mem0 to dependency
* Update pyproject.toml Co-authored-by: Deshraj Yadav <deshraj@gatech.edu>
* update docs
* update doc
* bump mem0 version
* fix api error msg and mypy issue
* mypy fix
* resolve comments
* fix memory usage without mem0
* mem0 version bump
* lazy import mem0 --------- Co-authored-by: Deshraj Yadav <deshraj@gatech.edu> Co-authored-by: João Moura <joaomdmoura@gmail.com> Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* upgrade chroma and adjust embedder function generator (#1607)
* upgrade chroma and adjust embedder function generator
* >= version
* linted
* preparing new version
* adding before and after crew
* Update CLI Watson supported models + docs (#1628)
* docs: add gh_token documentation to GithubSearchTool
* Move kickoff callbacks to crew's domain
* Cassettes
* Make mypy happy
* Knowledge (#1567)
* initial knowledge
* WIP
* Adding core knowledge sources
* Improve types and better support for file paths
* added additional sources
* fix linting
* update yaml to include optional deps
* adding in lorenze feedback
* ensure embeddings are persisted
* improvements all around Knowledge class
* return this
* properly reset memory
* properly reset memory+knowledge
* consolidation and improvements
* linted
* cleanup rm unused embedder
* fix test
* fix duplicate
* generating cassettes for knowledge test
* updated default embedder
* None embedder to use default on pipeline cloning
* improvements
* fixed text_file_knowledge
* mypysrc fixes
* type check fixes
* added extra cassette
* just mocks
* linted
* mock knowledge query to not spin up db
* linted
* verbose run
* put a flag
* fix
* adding docs
* better docs
* improvements from review
* more docs
* linted
* rm print
* more fixes
* clearer docs
* added docstrings and type hints for cli --------- Co-authored-by: João Moura <joaomdmoura@gmail.com> Co-authored-by: Lorenze Jay <lorenzejaytech@gmail.com>
* Updated README.md, fix typo(s) (#1637)
* Update Perplexity example in documentation (#1623)
* Fix threading
* preparing new version
* Log in to Tool Repository on `crewai login` (#1650) This commit adds an extra step to `crewai login` to ensure users also log in to the Tool Repository, that is, exchanging their Auth0 tokens for a Tool Repository username and password to be used by UV downloads and API tool uploads.
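The credential flow described in #1541 and #1650 (a reusable CLI settings file at `$HOME/.config/crewai/settings.json` instead of a system-wide, hostname-scoped .netrc) can be sketched as follows. This is an illustrative assumption, not the shipped implementation: the function names and the exact uv environment-variable names are hypothetical.

```python
import json
import os
from pathlib import Path

# Settings file location described in the #1541 commit message.
SETTINGS_PATH = Path.home() / ".config" / "crewai" / "settings.json"


def save_tool_credentials(username: str, password: str,
                          path: Path = SETTINGS_PATH) -> None:
    # Persist the Tool Repository credentials obtained on `crewai login`.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"username": username, "password": password}))


def uv_environment(path: Path = SETTINGS_PATH) -> dict:
    # Expose the credentials to uv via environment variables so they apply
    # only to the tool-repository index, unlike a system-wide .netrc entry.
    # The variable names below are assumed for illustration.
    env = dict(os.environ)
    if path.exists():
        creds = json.loads(path.read_text())
        env["UV_INDEX_CREWAI_USERNAME"] = creds["username"]
        env["UV_INDEX_CREWAI_PASSWORD"] = creds["password"]
    return env
```

A tool install would then run the uv subprocess with `env=uv_environment()`, keeping regular CrewAI API requests unaffected.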
* add knowledge to mint.json
* Improve typed task outputs (#1651)
* V1 working
* clean up imports and prints
* more clean up and add tests
* fixing tests
* fix test
* fix linting
* Fix tests
* Fix linting
* add doc string as requested by eduardo
* Update Github actions (#1639)
* actions/checkout@v4
* actions/cache@v4
* actions/setup-python@v5 --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* update (#1638) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* fix spelling issue found by @Jacques-Murray (#1660)
* Update readme for running mypy (#1614) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Feat/remove langchain (#1654)
* feat: add initial changes from langchain
* feat: remove kwargs from being processed
* feat: remove langchain, update uv.lock and fix type_hint
* feat: change docs
* feat: remove forced requirements for parameter
* feat: add tests for new structured tool
* feat: fix tests and adapt code for args
* Feat/remove langchain (#1668)
* feat: add initial changes from langchain
* feat: remove kwargs from being processed
* feat: remove langchain, update uv.lock and fix type_hint
* feat: change docs
* feat: remove forced requirements for parameter
* feat: add tests for new structured tool
* feat: fix tests and adapt code for args
* fix tool calling for langchain tools
* doc strings --------- Co-authored-by: Eduardo Chiarotti <dudumelgaco@hotmail.com>
* added knowledge to agent level (#1655)
* added knowledge to agent level
* linted
* added doc
* added from suggestions
* added test
* fixes from discussion
* fix docs
* fix test
* rm cassette for knowledge_sources test as it's a mock and update agent doc string
* fix test
* rm unused
* linted
* Update Agents docs to include two approaches for creating an agent: with and without YAML configuration
* Documentation Improvements: LLM Configuration and Usage (#1684)
* docs: improve tasks documentation clarity and structure - Add Task Execution Flow section - Add variable interpolation explanation - Add Task Dependencies section with examples - Improve overall document structure and readability - Update code examples with proper syntax highlighting
* docs: update agent documentation with improved examples and formatting - Replace DuckDuckGoSearchRun with SerperDevTool - Update code block formatting to be consistent - Improve template examples with actual syntax - Update LLM examples to use current models - Clean up formatting and remove redundant comments
* docs: enhance LLM documentation with Cerebras provider and formatting improvements
* docs: simplify LLMs documentation title
* docs: improve installation guide clarity and structure - Add clear Python version requirements with check command - Simplify installation options to recommended method - Improve upgrade section clarity for existing users - Add better visual structure with Notes and Tips - Update description and formatting
* docs: improve introduction page organization and clarity - Update organizational analogy in Note section - Improve table formatting and alignment - Remove emojis from component table for cleaner look - Add 'helps you' to make the note more action-oriented
* docs: add enterprise and community cards - Add Enterprise deployment card in quickstart - Add community card focused on open source discussions - Remove deployment reference from community description - Clean up introduction page cards - Remove link from Enterprise description text
* Fixes issues with result as answer not properly exiting LLM loop (#1689)
* v1 of fix implemented. Need to confirm with tokens.
* remove print statements
* preparing new version
* fix missing code in flows docs (#1690)
* docs: improve tasks documentation clarity and structure - Add Task Execution Flow section - Add variable interpolation explanation - Add Task Dependencies section with examples - Improve overall document structure and readability - Update code examples with proper syntax highlighting
* docs: update agent documentation with improved examples and formatting - Replace DuckDuckGoSearchRun with SerperDevTool - Update code block formatting to be consistent - Improve template examples with actual syntax - Update LLM examples to use current models - Clean up formatting and remove redundant comments
* docs: enhance LLM documentation with Cerebras provider and formatting improvements
* docs: simplify LLMs documentation title
* docs: improve installation guide clarity and structure - Add clear Python version requirements with check command - Simplify installation options to recommended method - Improve upgrade section clarity for existing users - Add better visual structure with Notes and Tips - Update description and formatting
* docs: improve introduction page organization and clarity - Update organizational analogy in Note section - Improve table formatting and alignment - Remove emojis from component table for cleaner look - Add 'helps you' to make the note more action-oriented
* docs: add enterprise and community cards - Add Enterprise deployment card in quickstart - Add community card focused on open source discussions - Remove deployment reference from community description - Clean up introduction page cards - Remove link from Enterprise description text
* docs: add code snippet to Getting Started section in flows.mdx --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Update reset memories command based on the SDK (#1688) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Update using langchain tools docs (#1664)
* Update example of how to use LangChain tools with correct syntax
* Use .env
* Add Code back --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* [FEATURE] Support for custom path in RAGStorage (#1659)
* added path to RAGStorage
* added path to short term and entity memory
* add path for long_term_storage for completeness --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* [Doc]: Add documentation for openlit observability (#1612)
* Create openlit-observability.mdx
* Update doc with images and steps
* Update mkdocs.yml and add OpenLIT guide link --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Fix indentation in llm-connections.mdx code block (#1573) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Knowledge project directory standard (#1691)
* Knowledge project directory standard
* fixed types
* comment fix
* made base file knowledge source an abstract class
* cleaner validator on model_post_init
* fix type checker
* cleaner refactor
* better template
* Update README.md (#1694) Corrected the statement which said users cannot disable telemetry; users can now disable it by setting the environment variable OTEL_SDK_DISABLED to true. Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Talk about getting structured consistent outputs with tasks.
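The telemetry opt-out referenced in the #1694 README fix comes down to a single environment variable; a minimal sketch (the helper name is illustrative, the variable is the standard OpenTelemetry one):

```python
import os


def disable_telemetry() -> None:
    # Per the #1694 README correction: setting OTEL_SDK_DISABLED to "true"
    # disables telemetry. Set it before importing crewai so the
    # OpenTelemetry SDK sees it at initialization time.
    os.environ["OTEL_SDK_DISABLED"] = "true"
```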
* remove all references to pipeline and pipeline router (#1661)
* remove all references to pipeline and router
* fix linting
* drop poetry.lock
* docs: add nvidia as provider (#1632) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* add knowledge demo + improve knowledge docs (#1706)
* Brandon/cre 509 hitl multiple rounds of followup (#1702)
* v1 of HITL working
* Drop print statements
* HITL code more robust. Still needs to be refactored.
* refactor and clearer messages
* Fix type issue
* fix tests
* Fix test again
* Drop extra print
* New docs about yaml crew with decorators. Simplify template crew with… (#1701)
* New docs about yaml crew with decorators. Simplify template crew with links
* Fix spelling issues.
* updating tools
* cutting new version
* Incorporate Stale PRs that have feedback (#1693)
* incorporate #1683
* add in --version flag to cli. closes #1679.
* Fix env issue
* Add in suggestions from @caike to make sure ragstorage doesn't exceed the os file limit. Also, included additional checks to support windows.
* remove poetry.lock as pointed out by @sanders41 in #1574.
* Incorporate feedback from crewai reviewer
* Incorporate @lorenzejay feedback
* drop metadata requirement (#1712)
* drop metadata requirement
* fix linting
* Update docs for new knowledge
* more linting
* more linting
* make save_documents private
* update docs to the new way we use knowledge and include clearing memory
* add support for langfuse with litellm (#1721)
* docs: Add quotes to agentops installing command (#1729)
* docs: Add quotes to agentops installing command
* feat: Add ContextualMemory to __init__
* feat: remove import due to circular import
* feat: update tasks config main template typos
* Fixed output_file not respecting system path (#1726) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* fix: typo error (#1732)
* Update crew_agent_executor.py typo error
* Update en.json typo error
* Fix Knowledge docs Spaceflight News API dead link
* call storage.search in user context search instead of memory.search (#1692) Co-authored-by: Eduardo Chiarotti <dudumelgaco@hotmail.com>
* Add doc structured tool (#1713)
* Add doc structured tool
* Fix example --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Pass the result of `_execute_tool_and_check_finality` to the callback parameter, so the result information is available early for parsing and pre-checking (#1716) Co-authored-by: xiaohan <fuck@qq.com> Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* format bullet points (#1734) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Add missing @functools.wraps when wrapping functions and preserve wrapped class name in @CrewBase. (#1560)
* Update annotations.py
* Update utils.py
* Update crew_base.py
* Update utils.py
* Update crew_base.py --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Fix disk I/O error when resetting short-term memory. (#1724)
* Fix disk I/O error when resetting short-term memory. Reset the chromadb client and nullify references before removing the directory.
* Nit for clarity
* did the same for knowledge_storage
* cleanup
* cleanup order
* Cleanup after the rm of the directories --------- Co-authored-by: Lorenze Jay <lorenzejaytech@gmail.com> Co-authored-by: Lorenze Jay <63378463+lorenzejay@users.noreply.github.com>
* restrict python version compatibility (#1731)
* drop 3.13
* revert
* Drop test cassette that was causing error
* trying to fix failing test
* adding thiago changes
* resolve final tests
* Drop skip
* Bugfix/restrict python version compatibility (#1736)
* drop 3.13
* revert
* Drop test cassette that was causing error
* trying to fix failing test
* adding thiago changes
* resolve final tests
* Drop skip
* drop pipeline
* Update pyproject.toml and uv.lock to drop crewai-tools as a default requirement (#1711)
* copy googles changes. Fix tests. Improve LLM file (#1737)
* copy googles changes. Fix tests. Improve LLM file
* Fix type issue
* fix: typo error (#1738)
* Update base_agent_tools.py typo error
* Update main.py typo error
* Update base_file_knowledge_source.py typo error
* Update test_main.py typo error
* Update en.json
* Update prompts.json --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Remove manager_callbacks reference (#1741)
* include event emitter in flows (#1740)
* include event emitter in flows
* Clean up
* Fix linter
* sort imports with isort rules by ruff linter (#1730)
* sort imports
* update --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com> Co-authored-by: Eduardo Chiarotti <dudumelgaco@hotmail.com>
* Added is_auto_end flag in agentops.end_session in crew.py (#1320) When using agentops, we have the option to pass the `skip_auto_end_session` parameter, which is supposed to prevent ending the session when the `end_session` function is called by Crew. The way it works now, `agentops.end_session` accepts an `is_auto_end` flag and crewai should have passed it as `True` (it's `False` by default). I have changed the code to pass is_auto_end=True. Co-authored-by: Brandon Hancock (bhancock_ai)
<109994880+bhancockio@users.noreply.github.com>
* Fix bool and null handling (#1771)
* include 12 but not 13
* change to <13 instead of <=12
* Gemini 2.0 (#1773)
* Update llms.mdx (Gemini 2.0) - Add Gemini 2.0 Flash to Gemini table. - Add link to 2 hosting paths for Gemini in Tip. - Change to lower-case model slugs vs names, for user convenience. - Add https://artificialanalysis.ai/ as alternate leaderboard. - Move Gemma to "other" tab.
* Update llm.py (Gemini 2.0) Add setting for Gemini 2.0 context window to llm.py --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* Remove relative import in flow `main.py` template (#1782)
* Add `tool.crewai.type` pyproject attribute in templates (#1789)
* Correcting a small grammatical issue that was bugging me: from _satisfy the expect criteria_ to _satisfies the expected criteria_ (#1783) Signed-off-by: PJ Hagerty <pjhagerty@gmail.com> Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
* feat: Add task guardrails feature (#1742)
* feat: Add task guardrails feature Add support for custom code guardrails in tasks that validate outputs before proceeding to the next task. Features include: - Optional task-level guardrail function - Pre-next-task execution timing - Tuple return format (success, data) - Automatic result/error routing - Configurable retry mechanism - Comprehensive documentation and tests Link to Devin run: https://app.devin.ai/sessions/39f6cfd6c5a24d25a7bd70ce070ed29a Co-Authored-By: Joe Moura <joao@crewai.com>
* fix: Add type check for guardrail result and remove unused import Co-Authored-By: Joe Moura <joao@crewai.com>
* fix: Remove unnecessary f-string prefix Co-Authored-By: Joe Moura <joao@crewai.com>
* feat: Add guardrail validation improvements - Add result/error exclusivity validation in GuardrailResult - Make return type annotations optional in Task guardrail validator - Improve error messages for validation failures Co-Authored-By: Joe Moura <joao@crewai.com>
* docs: Add comprehensive guardrails documentation - Add type hints and examples - Add error handling best practices - Add structured error response patterns - Document retry mechanisms - Improve documentation organization Co-Authored-By: Joe Moura <joao@crewai.com>
* refactor: Update guardrail functions to handle TaskOutput objects Co-Authored-By: Joe Moura <joao@crewai.com>
* feat: Add task guardrails feature Add support for custom code guardrails in tasks that validate outputs before proceeding to the next task. Features include: - Optional task-level guardrail function - Pre-next-task execution timing - Tuple return format (success, data) - Automatic result/error routing - Configurable retry mechanism - Comprehensive documentation and tests Link to Devin run: https://app.devin.ai/sessions/39f6cfd6c5a24d25a7bd70ce070ed29a Co-Authored-By: Joe Moura <joao@crewai.com>
* fix: Add type check for guardrail result and remove unused import Co-Authored-By: Joe Moura <joao@crewai.com>
* fix: Remove unnecessary f-string prefix Co-Authored-By: Joe Moura <joao@crewai.com>
* feat: Add guardrail validation improvements - Add result/error exclusivity validation in GuardrailResult - Make return type annotations optional in Task guardrail validator - Improve error messages for validation failures Co-Authored-By: Joe Moura <joao@crewai.com>
* docs: Add comprehensive guardrails documentation - Add type hints and examples - Add error handling best practices - Add structured error response patterns - Document retry mechanisms - Improve documentation organization Co-Authored-By: Joe Moura <joao@crewai.com>
* refactor: Update guardrail functions to handle TaskOutput objects Co-Authored-By: Joe Moura <joao@crewai.com>
* style: Fix import sorting in task guardrails files Co-Authored-By: Joe Moura <joao@crewai.com>
* fixing docs
* Fixing guardrails implementation
* docs: Enhance guardrail validator docstring with runtime validation rationale Co-Authored-By: Joe Moura <joao@crewai.com> --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joe Moura <joao@crewai.com> Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com> Co-authored-by: João Moura <joaomdmoura@gmail.com>
* feat: Add interpolate_only method and improve error handling (#1791)
* Fixed output_file not respecting system path
* Fixed yaml config not being escaped properly for output requirements
* feat: Add interpolate_only method and improve error handling - Add interpolate_only method for string interpolation while preserving JSON structure - Add comprehensive test coverage for interpolate_only - Add proper type annotation for logger using ClassVar - Improve error handling and documentation for _save_file method Co-Authored-By: Joe Moura <joao@crewai.com>
* fix: Sort imports to fix lint issues Co-Authored-By: Joe Moura <joao@crewai.com>
* fix: Reorganize imports using ruff --fix Co-Authored-By: Joe Moura <joao@crewai.com>
* fix: Consolidate imports and fix formatting Co-Authored-By: Joe Moura <joao@crewai.com>
* fix: Apply ruff automatic import sorting Co-Authored-By: Joe Moura <joao@crewai.com>
* fix: Sort imports using ruff --fix Co-Authored-By: Joe Moura <joao@crewai.com> --------- Co-authored-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com> Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com> Co-authored-by: Frieda Huang <124417784+frieda-huang@users.noreply.github.com> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joe Moura <joao@crewai.com>
* Feat/docling-support (#1763)
* added tool for docling support
* docling support installation
* use file_paths instead of file_path
* fix import
* organized imports
* run_type docs
* needs to be list
* fixed logic
* logged but file_path is backwards compatible
* use file_paths instead of file_path 2
* added test for multiple sources for file_paths
* fix run-types
* enabling local files to work and type cleanup
* linted
* fix test and types
* fixed run types
* fix types
* renamed to CrewDoclingSource
* linted
* added docs
* resolve conflicts --------- Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com> Co-authored-by: Brandon Hancock <brandon@brandonhancock.io>
* removed some redundancies (#1796)
* removed some redundancies
* cleanup
* Feat/joao flow improvement requests (#1795)
* Add in or and and in router
* In the middle of improving plotting
* final plot changes --------- Co-authored-by: João Moura <joaomdmoura@gmail.com>
* Adding Multimodal Abilities to Crew (#1805)
* initial fix on delegation tools
* fixing tests for delegations and coding
* Refactor prepare tool and adding initial add images logic
* supporting image tool
* fixing linter
* fix linter
* Making sure multimodal feature supports i18n
* fix linter and types
* fixing translations
* fix types and linter
* Revert "fixing linter" This reverts commit ef323e3487e62ee4f5bce7f86378068a5ac77e16.
* fix linters
* test
* fix
* fix
* fix linter
* fix
* ignore
* type improvements
* chore: removing crewai-tools from dev-dependencies (#1760) As mentioned in issue #1759, listing crewai-tools as a dev-dependency makes pip install it as a required dependency, not an optional one. Co-authored-by: João Moura <joaomdmoura@gmail.com>
* docs: add guide for multimodal agents (#1807) Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joe Moura <joao@crewai.com>
* Portkey Integration with CrewAI (#1233)
* Create Portkey-Observability-and-Guardrails.md
* crewAI update with new changes
* small change --------- Co-authored-by: siddharthsambharia-portkey <siddhath.s@portkey.ai> Co-authored-by: João Moura <joaomdmoura@gmail.com>
* fix: Change storage initialization to None for KnowledgeStorage (#1804)
* fix: Change storage initialization to None for KnowledgeStorage
* refactor: Change storage field to optional and improve error handling when saving documents --------- Co-authored-by: João Moura <joaomdmoura@gmail.com>
* fix: handle optional storage with null checks (#1808) Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: João Moura <joaomdmoura@gmail.com>
* docs: update README to highlight Flows (#1809)
* docs: highlight Flows feature in README Co-Authored-By: Joe Moura <joao@crewai.com>
* docs: enhance README with LangGraph comparison
and flows-crews synergy Co-Authored-By: Joe Moura <joao@crewai.com> * docs: replace initial Flow example with advanced Flow+Crew example; enhance LangGraph comparison Co-Authored-By: Joe Moura <joao@crewai.com> * docs: incorporate key terms and enhance feature descriptions Co-Authored-By: Joe Moura <joao@crewai.com> * docs: refine technical language, enhance feature descriptions, fix string interpolation Co-Authored-By: Joe Moura <joao@crewai.com> * docs: update README with performance metrics, feature enhancements, and course links Co-Authored-By: Joe Moura <joao@crewai.com> * docs: update LangGraph comparison with paragraph and P.S. section Co-Authored-By: Joe Moura <joao@crewai.com> --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joe Moura <joao@crewai.com> * Update README.md * docs: add agent-specific knowledge documentation and examples (#1811) Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joe Moura <joao@crewai.com> * fixing file paths for knowledge source * Fix interpolation for output_file in Task (#1803) (#1814) * fix: interpolate output_file attribute from YAML Co-Authored-By: Joe Moura <joao@crewai.com> * fix: add security validation for output_file paths Co-Authored-By: Joe Moura <joao@crewai.com> * fix: add _original_output_file private attribute to fix type-checker error Co-Authored-By: Joe Moura <joao@crewai.com> * fix: update interpolate_only to handle None inputs and remove duplicate attribute Co-Authored-By: Joe Moura <joao@crewai.com> * fix: improve output_file validation and error messages Co-Authored-By: Joe Moura <joao@crewai.com> * test: add end-to-end tests for output_file functionality Co-Authored-By: Joe Moura <joao@crewai.com> --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joe Moura <joao@crewai.com> * fix(manager_llm): handle coworker role name 
case/whitespace properly (#1820) * fix(manager_llm): handle coworker role name case/whitespace properly - Add .strip() to agent name and role comparisons in base_agent_tools.py - Add test case for varied role name cases and whitespace - Fix issue #1503 with manager LLM delegation Co-Authored-By: Joe Moura <joao@crewai.com> * fix(manager_llm): improve error handling and add debug logging - Add debug logging for better observability - Add sanitize_agent_name helper method - Enhance error messages with more context - Add parameterized tests for edge cases: - Embedded quotes - Trailing newlines - Multiple whitespace - Case variations - None values - Improve error handling with specific exceptions Co-Authored-By: Joe Moura <joao@crewai.com> * style: fix import sorting in base_agent_tools and test_manager_llm_delegation Co-Authored-By: Joe Moura <joao@crewai.com> * fix(manager_llm): improve whitespace normalization in role name matching Co-Authored-By: Joe Moura <joao@crewai.com> * style: fix import sorting in base_agent_tools and test_manager_llm_delegation Co-Authored-By: Joe Moura <joao@crewai.com> * fix(manager_llm): add error message template for agent tool execution errors Co-Authored-By: Joe Moura <joao@crewai.com> * style: fix import sorting in test_manager_llm_delegation.py Co-Authored-By: Joe Moura <joao@crewai.com> --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joe Moura <joao@crewai.com> * fix: add tiktoken as explicit dependency and document Rust requirement (#1826) * feat: add tiktoken as explicit dependency and document Rust requirement - Add tiktoken>=0.8.0 as explicit dependency to ensure pre-built wheels are used - Document Rust compiler requirement as fallback in README.md - Addresses issue #1824 tiktoken build failure Co-Authored-By: Joe Moura <joao@crewai.com> * fix: adjust tiktoken version to ~=0.7.0 for dependency compatibility - Update tiktoken dependency to ~=0.7.0 to resolve 
conflict with embedchain - Maintain compatibility with crewai-tools dependency chain - Addresses CI build failures Co-Authored-By: Joe Moura <joao@crewai.com> * docs: add troubleshooting section and make tiktoken optional Co-Authored-By: Joe Moura <joao@crewai.com> * Update README.md --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joe Moura <joao@crewai.com> Co-authored-by: João Moura <joaomdmoura@gmail.com> * Docstring, Error Handling, and Type Hints Improvements (#1828) * docs: add comprehensive docstrings to Flow class and methods - Added NumPy-style docstrings to all decorator functions - Added detailed documentation to Flow class methods - Included parameter types, return types, and examples - Enhanced documentation clarity and completeness Co-Authored-By: Joe Moura <joao@crewai.com> * feat: add secure path handling utilities - Add path_utils.py with safe path handling functions - Implement path validation and security checks - Integrate secure path handling in flow_visualizer.py - Add path validation in html_template_handler.py - Add comprehensive error handling for path operations Co-Authored-By: Joe Moura <joao@crewai.com> * docs: add comprehensive docstrings and type hints to flow utils (#1819) Co-Authored-By: Joe Moura <joao@crewai.com> * fix: add type annotations and fix import sorting Co-Authored-By: Joe Moura <joao@crewai.com> * fix: add type annotations to flow utils and visualization utils Co-Authored-By: Joe Moura <joao@crewai.com> * fix: resolve import sorting and type annotation issues Co-Authored-By: Joe Moura <joao@crewai.com> * fix: properly initialize and update edge_smooth variable Co-Authored-By: Joe Moura <joao@crewai.com> --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joe Moura <joao@crewai.com> * feat: add docstring (#1819) Co-authored-by: João Moura <joaomdmoura@gmail.com> * fix: Include agent knowledge in 
planning process (#1818) * test: Add test demonstrating knowledge not included in planning process Issue #1703: Add test to verify that agent knowledge sources are not currently included in the planning process. This test will help validate the fix once implemented. - Creates agent with knowledge sources - Verifies knowledge context missing from planning - Checks other expected components are present Co-Authored-By: Joe Moura <joao@crewai.com> * fix: Include agent knowledge in planning process Issue #1703: Integrate agent knowledge sources into planning summaries - Add agent_knowledge field to task summaries in planning_handler - Update test to verify knowledge inclusion - Ensure knowledge context is available during planning phase The planning agent now has access to agent knowledge when creating task execution plans, allowing for better informed planning decisions. Co-Authored-By: Joe Moura <joao@crewai.com> * style: Fix import sorting in test_knowledge_planning.py - Reorganize imports according to ruff linting rules - Fix I001 linting error Co-Authored-By: Joe Moura <joao@crewai.com> * test: Update task summary assertions to include knowledge field Co-Authored-By: Joe Moura <joao@crewai.com> * fix: Update ChromaDB mock path and fix knowledge string formatting Co-Authored-By: Joe Moura <joao@crewai.com> * fix: Improve knowledge integration in planning process with error handling Co-Authored-By: Joe Moura <joao@crewai.com> * fix: Update task summary format for empty tools and knowledge - Change empty tools message to 'agent has no tools' - Remove agent_knowledge field when empty - Update test assertions to match new format - Improve test messages for clarity Co-Authored-By: Joe Moura <joao@crewai.com> * fix: Update string formatting for agent tools in task summary Co-Authored-By: Joe Moura <joao@crewai.com> * fix: Update string formatting for agent tools in task summary Co-Authored-By: Joe Moura <joao@crewai.com> * fix: Update string formatting for agent tools and 
knowledge in task summary Co-Authored-By: Joe Moura <joao@crewai.com> * fix: Update knowledge field formatting in task summary Co-Authored-By: Joe Moura <joao@crewai.com> * style: Fix import sorting in test_planning_handler.py Co-Authored-By: Joe Moura <joao@crewai.com> * style: Fix import sorting order in test_planning_handler.py Co-Authored-By: Joe Moura <joao@crewai.com> * test: Add ChromaDB mocking to test_create_tasks_summary_with_knowledge_and_tools Co-Authored-By: Joe Moura <joao@crewai.com> --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joe Moura <joao@crewai.com> Co-authored-by: João Moura <joaomdmoura@gmail.com> * Suppressed userWarnings from litellm pydantic issues (#1833) * Suppressed userWarnings from litellm pydantic issues * change litellm version * Fix failling ollama tasks * Trying out timeouts * Trying out timeouts * trying next crew_test timeout * trying next crew_test timeout * timeout in crew_tests * timeout in crew_tests * more timeouts * more timeouts * crew_test changes werent applied * crew_test changes werent applied * revert uv.lock * revert uv.lock * add back in crewai tool dependencies and drop litellm version * add back in crewai tool dependencies and drop litellm version * tests should work now * tests should work now * more test changes * more test changes * Reverting uv.lock and pyproject * Reverting uv.lock and pyproject * Update llama3 cassettes * Update llama3 cassettes * sync packages with uv.lock * sync packages with uv.lock * more test fixes * fix tets * drop large file * final clean up * drop record new episodes --------- Signed-off-by: PJ Hagerty <pjhagerty@gmail.com> Co-authored-by: Thiago Moretto <168731+thiagomoretto@users.noreply.github.com> Co-authored-by: Thiago Moretto <thiago.moretto@gmail.com> Co-authored-by: Vini Brasil <vini@hey.com> Co-authored-by: Guilherme de Amorim <ggimenezjr@gmail.com> Co-authored-by: Tony Kipkemboi 
<iamtonykipkemboi@gmail.com> Co-authored-by: Eren Küçüker <66262604+erenkucuker@users.noreply.github.com> Co-authored-by: João Moura <joaomdmoura@gmail.com> Co-authored-by: Akesh kumar <155313882+akesh-0909@users.noreply.github.com> Co-authored-by: Lennex Zinyando <brizdigital@gmail.com> Co-authored-by: Shahar Yair <shya95@gmail.com> Co-authored-by: Eduardo Chiarotti <dudumelgaco@hotmail.com> Co-authored-by: Stephen Hankinson <shankinson@gmail.com> Co-authored-by: Muhammad Noman Fareed <60171953+shnoman97@users.noreply.github.com> Co-authored-by: dbubel <50341559+dbubel@users.noreply.github.com> Co-authored-by: Rip&Tear <84775494+theCyberTech@users.noreply.github.com> Co-authored-by: Rok Benko <115651717+rokbenko@users.noreply.github.com> Co-authored-by: Lorenze Jay <63378463+lorenzejay@users.noreply.github.com> Co-authored-by: Sam <sammcj@users.noreply.github.com> Co-authored-by: Maicon Peixinho <maiconpeixinho@icloud.com> Co-authored-by: Robin Wang <6220861+MottoX@users.noreply.github.com> Co-authored-by: C0deZ <c0dezlee@gmail.com> Co-authored-by: c0dez <li@vitablehealth.com> Co-authored-by: Gui Vieira <guilherme_vieira@me.com> Co-authored-by: Dev Khant <devkhant24@gmail.com> Co-authored-by: Deshraj Yadav <deshraj@gatech.edu> Co-authored-by: Gui Vieira <gui@crewai.com> Co-authored-by: Lorenze Jay <lorenzejaytech@gmail.com> Co-authored-by: Bob Conan <sufssl03@gmail.com> Co-authored-by: Andy Bromberg <abromberg@users.noreply.github.com> Co-authored-by: Bowen Liang <bowenliang@apache.org> Co-authored-by: Ivan Peevski <133036+ipeevski@users.noreply.github.com> Co-authored-by: Rok Benko <ksjeno@gmail.com> Co-authored-by: Javier Saldaña <cjaviersaldana@outlook.com> Co-authored-by: Ola Hungerford <olahungerford@gmail.com> Co-authored-by: Tom Mahler, PhD <tom@mahler.tech> Co-authored-by: Patcher <patcher@openlit.io> Co-authored-by: Feynman Liang <feynman.liang@gmail.com> Co-authored-by: Stephen <stephen-talari@users.noreply.github.com> Co-authored-by: Rashmi Pawar 
<168514198+raspawar@users.noreply.github.com> Co-authored-by: Frieda Huang <124417784+frieda-huang@users.noreply.github.com> Co-authored-by: Archkon <180910180+Archkon@users.noreply.github.com> Co-authored-by: Aviral Jain <avi.aviral140@gmail.com> Co-authored-by: lgesuellip <102637283+lgesuellip@users.noreply.github.com> Co-authored-by: fuckqqcom <9391575+fuckqqcom@users.noreply.github.com> Co-authored-by: xiaohan <fuck@qq.com> Co-authored-by: Piotr Mardziel <piotrm@gmail.com> Co-authored-by: Carlos Souza <caike@users.noreply.github.com> Co-authored-by: Paul Cowgill <pauldavidcowgill@gmail.com> Co-authored-by: Bowen Liang <liangbowen@gf.com.cn> Co-authored-by: Anmol Deep <anmol@getaidora.com> Co-authored-by: André Lago <andrelago.eu@gmail.com> Co-authored-by: Matt B <mattb@Matts-MacBook-Pro.local> Co-authored-by: Karan Vaidya <kaavee315@gmail.com> Co-authored-by: alan blount <alan@zeroasterisk.com> Co-authored-by: PJ <pjhagerty@gmail.com> Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joe Moura <joao@crewai.com> Co-authored-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com> Co-authored-by: João Igor <joaoigm@hotmail.com> Co-authored-by: siddharth Sambharia <siddharth.s@portkey.ai> Co-authored-by: siddharthsambharia-portkey <siddhath.s@portkey.ai> Co-authored-by: Erick Amorim <73451993+ericklima-ca@users.noreply.github.com> Co-authored-by: Marco Vinciguerra <88108002+VinciGit00@users.noreply.github.com>
604 lines
27 KiB
Python
"""Test Knowledge creation and querying functionality."""
|
|
|
|
from pathlib import Path
|
|
from typing import List, Union
|
|
from unittest.mock import patch
|
|
|
|
import pytest
|
|
|
|
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
|
|
from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource
|
|
from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource
|
|
from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource
|
|
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource
|
|
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
|
|
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource
|
|
|
|
|
|
@pytest.fixture(autouse=True)
|
|
def mock_vector_db():
|
|
"""Mock vector database operations."""
|
|
with patch("crewai.knowledge.storage.knowledge_storage.KnowledgeStorage") as mock:
|
|
# Mock the query method to return a predefined response
|
|
instance = mock.return_value
|
|
instance.query.return_value = [
|
|
{
|
|
"context": "Brandon's favorite color is blue and he likes Mexican food.",
|
|
"score": 0.9,
|
|
}
|
|
]
|
|
instance.reset.return_value = None
|
|
yield instance
|
|
|
|
|
|
@pytest.fixture(autouse=True)
|
|
def reset_knowledge_storage(mock_vector_db):
|
|
"""Fixture to reset knowledge storage before each test."""
|
|
yield
|
|
|
|
|
|
def test_single_short_string(mock_vector_db):
|
|
# Create a knowledge base with a single short string
|
|
content = "Brandon's favorite color is blue and he likes Mexican food."
|
|
string_source = StringKnowledgeSource(
|
|
content=content, metadata={"preference": "personal"}
|
|
)
|
|
mock_vector_db.sources = [string_source]
|
|
mock_vector_db.query.return_value = [{"context": content, "score": 0.9}]
|
|
# Perform a query
|
|
query = "What is Brandon's favorite color?"
|
|
results = mock_vector_db.query(query)
|
|
|
|
# Assert that the results contain the expected information
|
|
assert any("blue" in result["context"].lower() for result in results)
|
|
# Verify the mock was called
|
|
mock_vector_db.query.assert_called_once()
|
|
|
|
|
|
# @pytest.mark.vcr(filter_headers=["authorization"])
|
|
def test_single_2k_character_string(mock_vector_db):
|
|
# Create a 2k character string with various facts about Brandon
|
|
content = (
|
|
"Brandon is a software engineer who lives in San Francisco. "
|
|
"He enjoys hiking and often visits the trails in the Bay Area. "
|
|
"Brandon has a pet dog named Max, who is a golden retriever. "
|
|
"He loves reading science fiction books, and his favorite author is Isaac Asimov. "
|
|
"Brandon's favorite movie is Inception, and he enjoys watching it with his friends. "
|
|
"He is also a fan of Mexican cuisine, especially tacos and burritos. "
|
|
"Brandon plays the guitar and often performs at local open mic nights. "
|
|
"He is learning French and plans to visit Paris next year. "
|
|
"Brandon is passionate about technology and often attends tech meetups in the city. "
|
|
"He is also interested in AI and machine learning, and he is currently working on a project related to natural language processing. "
|
|
"Brandon's favorite color is blue, and he often wears blue shirts. "
|
|
"He enjoys cooking and often tries new recipes on weekends. "
|
|
"Brandon is a morning person and likes to start his day with a run in the park. "
|
|
"He is also a coffee enthusiast and enjoys trying different coffee blends. "
|
|
"Brandon is a member of a local book club and enjoys discussing books with fellow members. "
|
|
"He is also a fan of board games and often hosts game nights at his place. "
|
|
"Brandon is an advocate for environmental conservation and volunteers for local clean-up drives. "
|
|
"He is also a mentor for aspiring software developers and enjoys sharing his knowledge with others. "
|
|
"Brandon's favorite sport is basketball, and he often plays with his friends on weekends. "
|
|
"He is also a fan of the Golden State Warriors and enjoys watching their games. "
|
|
)
|
|
string_source = StringKnowledgeSource(
|
|
content=content, metadata={"preference": "personal"}
|
|
)
|
|
mock_vector_db.sources = [string_source]
|
|
mock_vector_db.query.return_value = [{"context": content, "score": 0.9}]
|
|
|
|
# Perform a query
|
|
query = "What is Brandon's favorite movie?"
|
|
results = mock_vector_db.query(query)
|
|
|
|
# Assert that the results contain the expected information
|
|
assert any("inception" in result["context"].lower() for result in results)
|
|
mock_vector_db.query.assert_called_once()
|
|
|
|
|
|
def test_multiple_short_strings(mock_vector_db):
|
|
# Create multiple short string sources
|
|
contents = [
|
|
"Brandon loves hiking.",
|
|
"Brandon has a dog named Max.",
|
|
"Brandon enjoys painting landscapes.",
|
|
]
|
|
string_sources = [
|
|
StringKnowledgeSource(content=content, metadata={"preference": "personal"})
|
|
for content in contents
|
|
]
|
|
|
|
# Mock the vector db query response
|
|
mock_vector_db.query.return_value = [
|
|
{"context": "Brandon has a dog named Max.", "score": 0.9}
|
|
]
|
|
|
|
mock_vector_db.sources = string_sources
|
|
|
|
# Perform a query
|
|
query = "What is the name of Brandon's pet?"
|
|
results = mock_vector_db.query(query)
|
|
|
|
# Assert that the correct information is retrieved
|
|
assert any("max" in result["context"].lower() for result in results)
|
|
# Verify the mock was called
|
|
mock_vector_db.query.assert_called_once()
|
|
|
|
|
|
def test_multiple_2k_character_strings(mock_vector_db):
|
|
# Create multiple 2k character strings with various facts about Brandon
|
|
contents = [
|
|
(
|
|
"Brandon is a software engineer who lives in San Francisco. "
|
|
"He enjoys hiking and often visits the trails in the Bay Area. "
|
|
"Brandon has a pet dog named Max, who is a golden retriever. "
|
|
"He loves reading science fiction books, and his favorite author is Isaac Asimov. "
|
|
"Brandon's favorite movie is Inception, and he enjoys watching it with his friends. "
|
|
"He is also a fan of Mexican cuisine, especially tacos and burritos. "
|
|
"Brandon plays the guitar and often performs at local open mic nights. "
|
|
"He is learning French and plans to visit Paris next year. "
|
|
"Brandon is passionate about technology and often attends tech meetups in the city. "
|
|
"He is also interested in AI and machine learning, and he is currently working on a project related to natural language processing. "
|
|
"Brandon's favorite color is blue, and he often wears blue shirts. "
|
|
"He enjoys cooking and often tries new recipes on weekends. "
|
|
"Brandon is a morning person and likes to start his day with a run in the park. "
|
|
"He is also a coffee enthusiast and enjoys trying different coffee blends. "
|
|
"Brandon is a member of a local book club and enjoys discussing books with fellow members. "
|
|
"He is also a fan of board games and often hosts game nights at his place. "
|
|
"Brandon is an advocate for environmental conservation and volunteers for local clean-up drives. "
|
|
"He is also a mentor for aspiring software developers and enjoys sharing his knowledge with others. "
|
|
"Brandon's favorite sport is basketball, and he often plays with his friends on weekends. "
|
|
"He is also a fan of the Golden State Warriors and enjoys watching their games. "
|
|
)
|
|
* 2, # Repeat to ensure it's 2k characters
|
|
(
|
|
"Brandon loves traveling and has visited over 20 countries. "
|
|
"He is fluent in Spanish and often practices with his friends. "
|
|
"Brandon's favorite city is Barcelona, where he enjoys the architecture and culture. "
|
|
"He is a foodie and loves trying new cuisines, with a particular fondness for sushi. "
|
|
"Brandon is an avid cyclist and participates in local cycling events. "
|
|
"He is also a photographer and enjoys capturing landscapes and cityscapes. "
|
|
"Brandon is a tech enthusiast and follows the latest trends in gadgets and software. "
|
|
"He is also a fan of virtual reality and owns a VR headset. "
|
|
"Brandon's favorite book is 'The Hitchhiker's Guide to the Galaxy'. "
|
|
"He enjoys watching documentaries and learning about history and science. "
|
|
"Brandon is a coffee lover and has a collection of coffee mugs from different countries. "
|
|
"He is also a fan of jazz music and often attends live performances. "
|
|
"Brandon is a member of a local running club and participates in marathons. "
|
|
"He is also a volunteer at a local animal shelter and helps with dog walking. "
|
|
"Brandon's favorite holiday is Christmas, and he enjoys decorating his home. "
|
|
"He is also a fan of classic movies and has a collection of DVDs. "
|
|
"Brandon is a mentor for young professionals and enjoys giving career advice. "
|
|
"He is also a fan of puzzles and enjoys solving them in his free time. "
|
|
"Brandon's favorite sport is soccer, and he often plays with his friends. "
|
|
"He is also a fan of FC Barcelona and enjoys watching their matches. "
|
|
)
|
|
* 2, # Repeat to ensure it's 2k characters
|
|
]
|
|
string_sources = [
|
|
StringKnowledgeSource(content=content, metadata={"preference": "personal"})
|
|
for content in contents
|
|
]
|
|
|
|
mock_vector_db.sources = string_sources
|
|
mock_vector_db.query.return_value = [{"context": contents[1], "score": 0.9}]
|
|
|
|
# Perform a query
|
|
query = "What is Brandon's favorite book?"
|
|
results = mock_vector_db.query(query)
|
|
|
|
# Assert that the correct information is retrieved
|
|
assert any(
|
|
"the hitchhiker's guide to the galaxy" in result["context"].lower()
|
|
for result in results
|
|
)
|
|
mock_vector_db.query.assert_called_once()
|
|
|
|
|
|
def test_single_short_file(mock_vector_db, tmpdir):
|
|
# Create a single short text file
|
|
content = "Brandon's favorite sport is basketball."
|
|
file_path = Path(tmpdir.join("short_file.txt"))
|
|
with open(file_path, "w") as f:
|
|
f.write(content)
|
|
|
|
file_source = TextFileKnowledgeSource(
|
|
file_paths=[file_path], metadata={"preference": "personal"}
|
|
)
|
|
mock_vector_db.sources = [file_source]
|
|
mock_vector_db.query.return_value = [{"context": content, "score": 0.9}]
|
|
# Perform a query
|
|
query = "What sport does Brandon like?"
|
|
results = mock_vector_db.query(query)
|
|
|
|
# Assert that the results contain the expected information
|
|
assert any("basketball" in result["context"].lower() for result in results)
|
|
mock_vector_db.query.assert_called_once()
|
|
|
|
|
|
def test_single_2k_character_file(mock_vector_db, tmpdir):
|
|
# Create a single 2k character text file with various facts about Brandon
|
|
content = (
|
|
"Brandon is a software engineer who lives in San Francisco. "
|
|
"He enjoys hiking and often visits the trails in the Bay Area. "
|
|
"Brandon has a pet dog named Max, who is a golden retriever. "
|
|
"He loves reading science fiction books, and his favorite author is Isaac Asimov. "
|
|
"Brandon's favorite movie is Inception, and he enjoys watching it with his friends. "
|
|
"He is also a fan of Mexican cuisine, especially tacos and burritos. "
|
|
"Brandon plays the guitar and often performs at local open mic nights. "
|
|
"He is learning French and plans to visit Paris next year. "
|
|
"Brandon is passionate about technology and often attends tech meetups in the city. "
|
|
"He is also interested in AI and machine learning, and he is currently working on a project related to natural language processing. "
|
|
"Brandon's favorite color is blue, and he often wears blue shirts. "
|
|
"He enjoys cooking and often tries new recipes on weekends. "
|
|
"Brandon is a morning person and likes to start his day with a run in the park. "
|
|
"He is also a coffee enthusiast and enjoys trying different coffee blends. "
|
|
"Brandon is a member of a local book club and enjoys discussing books with fellow members. "
|
|
"He is also a fan of board games and often hosts game nights at his place. "
|
|
"Brandon is an advocate for environmental conservation and volunteers for local clean-up drives. "
|
|
"He is also a mentor for aspiring software developers and enjoys sharing his knowledge with others. "
|
|
"Brandon's favorite sport is basketball, and he often plays with his friends on weekends. "
|
|
"He is also a fan of the Golden State Warriors and enjoys watching their games. "
|
|
) * 2 # Repeat to ensure it's 2k characters
|
|
file_path = Path(tmpdir.join("long_file.txt"))
|
|
with open(file_path, "w") as f:
|
|
f.write(content)
|
|
|
|
file_source = TextFileKnowledgeSource(
|
|
file_paths=[file_path], metadata={"preference": "personal"}
|
|
)
|
|
mock_vector_db.sources = [file_source]
|
|
mock_vector_db.query.return_value = [{"context": content, "score": 0.9}]
|
|
# Perform a query
|
|
query = "What is Brandon's favorite movie?"
|
|
results = mock_vector_db.query(query)
|
|
|
|
# Assert that the results contain the expected information
|
|
assert any("inception" in result["context"].lower() for result in results)
|
|
mock_vector_db.query.assert_called_once()
|
|
|
|
|
|
def test_multiple_short_files(mock_vector_db, tmpdir):
|
|
# Create multiple short text files
|
|
contents = [
|
|
{
|
|
"content": "Brandon works as a software engineer.",
|
|
"metadata": {"category": "profession", "source": "occupation"},
|
|
},
|
|
{
|
|
"content": "Brandon lives in New York.",
|
|
"metadata": {"category": "city", "source": "personal"},
|
|
},
|
|
{
|
|
"content": "Brandon enjoys cooking Italian food.",
|
|
"metadata": {"category": "hobby", "source": "personal"},
|
|
},
|
|
]
|
|
file_paths = []
|
|
for i, item in enumerate(contents):
|
|
file_path = Path(tmpdir.join(f"file_{i}.txt"))
|
|
with open(file_path, "w") as f:
|
|
f.write(item["content"])
|
|
file_paths.append((file_path, item["metadata"]))
|
|
|
|
file_sources = [
|
|
TextFileKnowledgeSource(file_paths=[path], metadata=metadata)
|
|
for path, metadata in file_paths
|
|
]
|
|
mock_vector_db.sources = file_sources
|
|
mock_vector_db.query.return_value = [
|
|
{"context": "Brandon lives in New York.", "score": 0.9}
|
|
]
|
|
# Perform a query
|
|
query = "What city does he reside in?"
|
|
results = mock_vector_db.query(query)
|
|
# Assert that the correct information is retrieved
|
|
assert any("new york" in result["context"].lower() for result in results)
|
|
mock_vector_db.query.assert_called_once()
|
|
|
|
|
|
def test_multiple_2k_character_files(mock_vector_db, tmpdir):
    # Create multiple 2k character text files with various facts about Brandon
    contents = [
        (
            "Brandon loves traveling and has visited over 20 countries. "
            "He is fluent in Spanish and often practices with his friends. "
            "Brandon's favorite city is Barcelona, where he enjoys the architecture and culture. "
            "He is a foodie and loves trying new cuisines, with a particular fondness for sushi. "
            "Brandon is an avid cyclist and participates in local cycling events. "
            "He is also a photographer and enjoys capturing landscapes and cityscapes. "
            "Brandon is a tech enthusiast and follows the latest trends in gadgets and software. "
            "He is also a fan of virtual reality and owns a VR headset. "
            "Brandon's favorite book is 'The Hitchhiker's Guide to the Galaxy'. "
            "He enjoys watching documentaries and learning about history and science. "
            "Brandon is a coffee lover and has a collection of coffee mugs from different countries. "
            "He is also a fan of jazz music and often attends live performances. "
            "Brandon is a member of a local running club and participates in marathons. "
            "He is also a volunteer at a local animal shelter and helps with dog walking. "
            "Brandon's favorite holiday is Christmas, and he enjoys decorating his home. "
            "He is also a fan of classic movies and has a collection of DVDs. "
            "Brandon is a mentor for young professionals and enjoys giving career advice. "
            "He is also a fan of puzzles and enjoys solving them in his free time. "
            "Brandon's favorite sport is soccer, and he often plays with his friends. "
            "He is also a fan of FC Barcelona and enjoys watching their matches. "
        )
        * 2,  # Repeat to ensure it's 2k characters
        (
            "Brandon is a software engineer who lives in San Francisco. "
            "He enjoys hiking and often visits the trails in the Bay Area. "
            "Brandon has a pet dog named Max, who is a golden retriever. "
            "He loves reading science fiction books, and his favorite author is Isaac Asimov. "
            "Brandon's favorite movie is Inception, and he enjoys watching it with his friends. "
            "He is also a fan of Mexican cuisine, especially tacos and burritos. "
            "Brandon plays the guitar and often performs at local open mic nights. "
            "He is learning French and plans to visit Paris next year. "
            "Brandon is passionate about technology and often attends tech meetups in the city. "
            "He is also interested in AI and machine learning, and he is currently working on a project related to natural language processing. "
            "Brandon's favorite color is blue, and he often wears blue shirts. "
            "He enjoys cooking and often tries new recipes on weekends. "
            "Brandon is a morning person and likes to start his day with a run in the park. "
            "He is also a coffee enthusiast and enjoys trying different coffee blends. "
            "Brandon is a member of a local book club and enjoys discussing books with fellow members. "
            "He is also a fan of board games and often hosts game nights at his place. "
            "Brandon is an advocate for environmental conservation and volunteers for local clean-up drives. "
            "He is also a mentor for aspiring software developers and enjoys sharing his knowledge with others. "
            "Brandon's favorite sport is basketball, and he often plays with his friends on weekends. "
            "He is also a fan of the Golden State Warriors and enjoys watching their games. "
        )
        * 2,  # Repeat to ensure it's 2k characters
    ]
    file_paths = []
    for i, content in enumerate(contents):
        file_path = Path(tmpdir.join(f"long_file_{i}.txt"))
        with open(file_path, "w") as f:
            f.write(content)
        file_paths.append(file_path)

    file_sources = [
        TextFileKnowledgeSource(file_paths=[path], metadata={"preference": "personal"})
        for path in file_paths
    ]
    mock_vector_db.sources = file_sources
    mock_vector_db.query.return_value = [
        {
            "context": "Brandon's favorite book is 'The Hitchhiker's Guide to the Galaxy'.",
            "score": 0.9,
        }
    ]

    # Perform a query
    query = "What is Brandon's favorite book?"
    results = mock_vector_db.query(query)

    # Assert that the correct information is retrieved
    assert any(
        "the hitchhiker's guide to the galaxy" in result["context"].lower()
        for result in results
    )
    mock_vector_db.query.assert_called_once()


@pytest.mark.vcr(filter_headers=["authorization"])
def test_hybrid_string_and_files(mock_vector_db, tmpdir):
    # Create string sources
    string_contents = [
        "Brandon is learning French.",
        "Brandon visited Paris last summer.",
    ]
    string_sources = [
        StringKnowledgeSource(content=content, metadata={"preference": "personal"})
        for content in string_contents
    ]

    # Create file sources
    file_contents = [
        "Brandon prefers tea over coffee.",
        "Brandon's favorite book is 'The Alchemist'.",
    ]
    file_paths = []
    for i, content in enumerate(file_contents):
        file_path = Path(tmpdir.join(f"file_{i}.txt"))
        with open(file_path, "w") as f:
            f.write(content)
        file_paths.append(file_path)

    file_sources = [
        TextFileKnowledgeSource(file_paths=[path], metadata={"preference": "personal"})
        for path in file_paths
    ]

    # Combine string and file sources
    mock_vector_db.sources = string_sources + file_sources
    mock_vector_db.query.return_value = [{"context": file_contents[1], "score": 0.9}]

    # Perform a query
    query = "What is Brandon's favorite book?"
    results = mock_vector_db.query(query)

    # Assert that the correct information is retrieved
    assert any("the alchemist" in result["context"].lower() for result in results)
    mock_vector_db.query.assert_called_once()


def test_pdf_knowledge_source(mock_vector_db):
    # Get the directory of the current file
    current_dir = Path(__file__).parent
    # Construct the path to the PDF file
    pdf_path = current_dir / "crewai_quickstart.pdf"

    # Create a PDFKnowledgeSource
    pdf_source = PDFKnowledgeSource(
        file_paths=[pdf_path], metadata={"preference": "personal"}
    )
    mock_vector_db.sources = [pdf_source]
    mock_vector_db.query.return_value = [
        {"context": "crewai create crew latest-ai-development", "score": 0.9}
    ]

    # Perform a query
    query = "How do you create a crew?"
    results = mock_vector_db.query(query)

    # Assert that the correct information is retrieved
    assert any(
        "crewai create crew latest-ai-development" in result["context"].lower()
        for result in results
    )
    mock_vector_db.query.assert_called_once()


@pytest.mark.vcr(filter_headers=["authorization"])
def test_csv_knowledge_source(mock_vector_db, tmpdir):
    """Test CSVKnowledgeSource with a simple CSV file."""
    # Create a CSV file with sample data
    csv_content = [
        ["Name", "Age", "City"],
        ["Brandon", "30", "New York"],
        ["Alice", "25", "Los Angeles"],
        ["Bob", "35", "Chicago"],
    ]
    csv_path = Path(tmpdir.join("data.csv"))
    with open(csv_path, "w", encoding="utf-8") as f:
        for row in csv_content:
            f.write(",".join(row) + "\n")

    # Create a CSVKnowledgeSource
    csv_source = CSVKnowledgeSource(
        file_paths=[csv_path], metadata={"preference": "personal"}
    )
    mock_vector_db.sources = [csv_source]
    mock_vector_db.query.return_value = [
        {"context": "Brandon is 30 years old.", "score": 0.9}
    ]

    # Perform a query
    query = "How old is Brandon?"
    results = mock_vector_db.query(query)

    # Assert that the correct information is retrieved
    assert any("30" in result["context"] for result in results)
    mock_vector_db.query.assert_called_once()


def test_json_knowledge_source(mock_vector_db, tmpdir):
    """Test JSONKnowledgeSource with a simple JSON file."""
    import json

    # Create a JSON file with sample data
    json_data = {
        "people": [
            {"name": "Brandon", "age": 30, "city": "New York"},
            {"name": "Alice", "age": 25, "city": "Los Angeles"},
            {"name": "Bob", "age": 35, "city": "Chicago"},
        ]
    }
    json_path = Path(tmpdir.join("data.json"))
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(json_data, f)

    # Create a JSONKnowledgeSource
    json_source = JSONKnowledgeSource(
        file_paths=[json_path], metadata={"preference": "personal"}
    )
    mock_vector_db.sources = [json_source]
    mock_vector_db.query.return_value = [
        {"context": "Alice lives in Los Angeles.", "score": 0.9}
    ]

    # Perform a query
    query = "Where does Alice reside?"
    results = mock_vector_db.query(query)

    # Assert that the correct information is retrieved
    assert any("los angeles" in result["context"].lower() for result in results)
    mock_vector_db.query.assert_called_once()


def test_excel_knowledge_source(mock_vector_db, tmpdir):
    """Test ExcelKnowledgeSource with a simple Excel file."""
    # Create an Excel file with sample data
    import pandas as pd

    excel_data = {
        "Name": ["Brandon", "Alice", "Bob"],
        "Age": [30, 25, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
    df = pd.DataFrame(excel_data)
    excel_path = Path(tmpdir.join("data.xlsx"))
    df.to_excel(excel_path, index=False)

    # Create an ExcelKnowledgeSource
    excel_source = ExcelKnowledgeSource(
        file_paths=[excel_path], metadata={"preference": "personal"}
    )
    mock_vector_db.sources = [excel_source]
    mock_vector_db.query.return_value = [
        {"context": "Brandon is 30 years old.", "score": 0.9}
    ]

    # Perform a query
    query = "What is Brandon's age?"
    results = mock_vector_db.query(query)

    # Assert that the correct information is retrieved
    assert any("30" in result["context"] for result in results)
    mock_vector_db.query.assert_called_once()


def test_docling_source(mock_vector_db):
    docling_source = CrewDoclingSource(
        file_paths=[
            "https://lilianweng.github.io/posts/2024-11-28-reward-hacking/",
        ],
    )
    mock_vector_db.sources = [docling_source]
    mock_vector_db.query.return_value = [
        {
            "context": "Reward hacking is a technique used to improve the performance of reinforcement learning agents.",
            "score": 0.9,
        }
    ]
    # Perform a query
    query = "What is reward hacking?"
    results = mock_vector_db.query(query)

    assert any("reward hacking" in result["context"].lower() for result in results)
    mock_vector_db.query.assert_called_once()


def test_multiple_docling_sources():
    urls: List[Union[Path, str]] = [
        "https://lilianweng.github.io/posts/2024-11-28-reward-hacking/",
        "https://lilianweng.github.io/posts/2024-07-07-hallucination/",
    ]
    docling_source = CrewDoclingSource(file_paths=urls)

    assert docling_source.file_paths == urls
    assert docling_source.content is not None


def test_file_path_validation():
    """Test file path validation for knowledge sources."""
    current_dir = Path(__file__).parent
    pdf_path = current_dir / "crewai_quickstart.pdf"

    # Test valid single file_path
    source = PDFKnowledgeSource(file_path=pdf_path)
    assert source.safe_file_paths == [pdf_path]

    # Test valid file_paths list
    source = PDFKnowledgeSource(file_paths=[pdf_path])
    assert source.safe_file_paths == [pdf_path]

    # Test both file_path and file_paths provided (should use file_paths)
    source = PDFKnowledgeSource(file_path=pdf_path, file_paths=[pdf_path])
    assert source.safe_file_paths == [pdf_path]

    # Test neither file_path nor file_paths provided
    with pytest.raises(
        ValueError,
        match="file_path/file_paths must be a Path, str, or a list of these types",
    ):
        PDFKnowledgeSource()