* perf: reduce framework overhead for NVIDIA benchmarks
- Lazy initialize event bus thread pool and event loop on first emit()
instead of at import time (~200ms savings)
- Skip trace listener registration (50+ handlers) when tracing disabled
- Skip trace prompt in non-interactive contexts (isatty check) to avoid
20s timeout in CI/Docker/API servers
- Skip flush() when no events were emitted (avoids 30s timeout waste)
- Add _has_pending_events flag to track if any events were emitted
- Add _executor_initialized flag for lazy init double-checked locking
All existing behavior preserved when tracing IS enabled. No public APIs
changed - only conditional guards added.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: address PR review comments — tracing override, executor init order, stdin guard, unused import
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* style: fix ruff formatting in trace_listener.py and utils.py
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Iris Clawd <iris@crewai.com>
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>