mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-04-09 04:28:16 +00:00
`_initialize_backend_batch` executes while holding `_batch_ready_cv`,
blocking any concurrent thread waiting for batch readiness (e.g. event
handlers). The previous retry logic could hold the lock for an
unacceptable amount of time:
Previous worst case (max_retries=2, all attempts timing out):
3 attempts × 30s HTTP timeout + 2 × 0.2s sleep = ~90.4s
Plus auth-fallback recursive call: up to ~180s total
New worst case (max_retries=1, all attempts timing out):
2 attempts × 30s HTTP timeout + 1 × 0.2s sleep = ~60.2s
Exceptions (timeout, connection error) now abort immediately:
1 attempt × 30s HTTP timeout = ~30s (no sleep, no retry)
Changes:
- Reduce max_retries from 2 to 1 (5xx responses still retry once)
- Move try/except outside the retry loop so any exception
(timeout, connection error) aborts immediately without retrying
or sleeping, capping the exception path at a single HTTP timeout
Addresses review comment on PR #4947 about retry sleeps holding the
batch initialization lock and adding unexpected latency.