This commit fixes the validation error that occurred when using the
google-generativeai embedder provider with a flat configuration format.
Changes:
1. Made the 'config' field optional in GenerativeAiProviderSpec by adding
'total=False' and marking 'provider' as Required, consistent with other
provider specs like VertexAIProviderSpec.
2. Added normalization in the Crew class to automatically convert flat
embedder configs to nested format before validation. This allows users
to use either format:
- Flat: {'provider': 'google-generativeai', 'api_key': '...', 'model_name': '...'}
- Nested: {'provider': 'google-generativeai', 'config': {'api_key': '...', 'model_name': '...'}}
3. Updated the embedder factory to support both flat and nested config
formats by checking for the presence of 'config' key and extracting
config fields accordingly.
4. Added comprehensive tests to verify both formats work correctly:
- Test for flat config format (the issue reported in #3741)
- Test for nested config format (recommended format)
- Test for TypedDict validation
Fixes#3741
Co-Authored-By: João <joao@crewai.com>
- prefix provider env vars with embeddings_
- rename watson → watsonx in providers
- add deprecation warning and alias for legacy 'watson' key (to be removed in v1.0.0)
- introduce baseembeddingsprovider and helper for embedding functions
- add core embedding types and migrate providers, factory, and storage modules
- remove unused type aliases and fix pydantic schema error
- update providers with env var support and related fixes
- add batch_size field to baseragconfig (default=100)
- update chromadb/qdrant clients and factories to use batch_size
- extract and filter batch_size from embedder config in knowledgestorage
- fix large csv files exceeding embedder token limits (#3574)
- remove unneeded conditional for type
Co-authored-by: Vini Brasil <vini@hey.com>
- support nested config format with embedderconfig typeddict
- fix parsing for model/model_name compatibility
- add validation, typing_extensions, and improved type hints
- enhance embedding factory with env var injection and provider support
- add tests for openai, azure, and all embedding providers
- misc fixes: test file rename, updated mocking patterns
- Add limit and score_threshold to BaseRagConfig, propagate to clients
- Update default search params in RAG storage, knowledge, and memory (limit=5, threshold=0.6)
- Fix linting (ruff, mypy, PERF203) and refactor save logic
- Update tests for new defaults and ChromaDB behavior
- Sanitize ChromaDB collection names and use original dir naming
- Add persistent client with file locking to the ChromaDB factory
- Add upsert support to the ChromaDB client
- Suppress ChromaDB deprecation warnings for `model_fields`
- Extract `suppress_logging` into shared `logger_utils`
- Update tests to reflect upsert behavior
- Docs: add additional note
### RAG Config System
* Added ChromaDB client creation via config with sensible defaults
* Introduced optional imports and shared RAG config utilities/schema
* Enabled embedding function support with ChromaDB provider integration
* Refactored configs for immutability and stronger type safety
* Removed unused code and expanded test coverage
Add ChromaDB client implementation with async support
- Implement core collection operations (create, get_or_create, delete)
- Add search functionality with cosine similarity scoring
- Include both sync and async method variants
- Add type safety with NamedTuples and TypeGuards
- Extract utility functions to separate modules
- Default to cosine distance metric for text similarity
- Add comprehensive test coverage
TODO:
- l2, ip score calculations are not settled on