Commit Graph

10 Commits

Author SHA1 Message Date
Joao Moura
ef39974bd8 feat: add configurable case timeout for benchmarking and testing
- Introduced a case_timeout parameter in the benchmark and test functions to allow dynamic timeout settings.
- Updated the project configuration template to include a default case_timeout value of 90 seconds.
- Enhanced the handling of timeouts in benchmark results to reflect the configured case_timeout.
2026-05-14 00:28:44 -04:00
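
Note on ef39974bd8: a per-case timeout of this kind could be enforced roughly as below. This is a minimal sketch, not the code from the commit. Only the `case_timeout` name and the 90-second default come from the commit message; `CaseResult` and `run_case_with_timeout` are hypothetical names, and the use of `asyncio.wait_for` is an assumption about how the limit is applied.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

DEFAULT_CASE_TIMEOUT = 90.0  # seconds; the default named in the commit message


@dataclass
class CaseResult:  # hypothetical result record, for illustration only
    name: str
    passed: bool
    timed_out: bool = False
    timeout_used: Optional[float] = None


async def run_case_with_timeout(
    name: str,
    case_fn: Callable[[], Awaitable[bool]],
    case_timeout: float = DEFAULT_CASE_TIMEOUT,
) -> CaseResult:
    """Run one case, recording the configured timeout if it is exceeded."""
    try:
        passed = await asyncio.wait_for(case_fn(), timeout=case_timeout)
        return CaseResult(name, bool(passed))
    except asyncio.TimeoutError:
        # Report the timeout that was actually configured, not a fixed constant.
        return CaseResult(name, False, timed_out=True, timeout_used=case_timeout)
```

Storing `timeout_used` on the result is one way for reported results to reflect the configured value, as the third bullet of the commit describes.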
Joao Moura
2897535799 feat: enhance benchmarking and evaluation features
- Introduced a new judge tool for submitting evaluation scores with structured parameters.
- Added a function to parse judge results from various response formats.
- Updated the benchmark command to handle iterations more effectively, allowing configuration from the command line or config file.
- Implemented a method to save run results to a JSON file for better tracking of test outcomes.
- Enhanced progress display to show current iteration during benchmark runs.
- Updated project configuration template to clarify test iteration settings.
2026-05-14 00:23:32 -04:00
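
Note on 2897535799: the commit mentions parsing judge results "from various response formats" without listing them. The sketch below assumes three formats (a dict, a JSON string possibly wrapped in prose, and a plain `score: <number>` line). The function names and the `score` key are hypothetical and not taken from the repository.

```python
import json
import re
from typing import Any, Optional


def _score_from_mapping(data: dict) -> Optional[float]:
    """Return data['score'] as a float if it is a real number, else None."""
    value = data.get("score")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)


def parse_judge_result(response: Any) -> Optional[float]:
    """Extract a score from a dict, a JSON string, or free text; None if absent."""
    if isinstance(response, dict):
        return _score_from_mapping(response)
    if not isinstance(response, str):
        return None
    # Try the whole string as JSON first, then the outermost {...} span inside it.
    candidates = [response]
    braces = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if braces:
        candidates.append(braces.group(0))
    for text in candidates:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            score = _score_from_mapping(data)
            if score is not None:
                return score
    # Last resort: a "score: 0.8" or "score = 0.8" fragment in plain text.
    line = re.search(r"score\s*[:=]\s*(-?\d+(?:\.\d+)?)", response, flags=re.IGNORECASE)
    return float(line.group(1)) if line else None
```

Returning `None` instead of raising lets a caller decide whether an unparseable verdict counts as a failure or triggers a retry.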
alex-clawd
48a861aa1a fix: resolve all CI failures — format, lint, mypy, and review comments
- Format: auto-reformat agent_tui.py, benchmark.py, coworker_tools.py via ruff
- Lint: 0 remaining errors after format pass
- Mypy: fix _NullPrinter to subclass Printer for type compatibility in
  executor.py, planning.py, and skill_builder.py; add isinstance(r, Message)
  guards in spawn_tools.py; annotate return types and fix dict type params
  and MCPToolResolver logger type in new_agent.py; add missing printer args
  to get_llm_response calls
- cli.py: fix _read_config to use sentinel so falsy values (0, false) are
  returned correctly instead of being treated as missing keys
- create_agent.py: replace regex-based JSONC comment stripper with a
  token-aware parser that preserves // inside quoted strings (e.g. URLs)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 12:28:25 -07:00
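
Note on 48a861aa1a: the `_read_config` fix describes a sentinel so that falsy values are not mistaken for missing keys. A minimal sketch of that pattern follows. `read_config_value` and `_MISSING` are hypothetical names, and the real `_read_config` signature is not shown in the commit message.

```python
from typing import Any

_MISSING = object()  # unique sentinel: "key absent" is distinct from 0, False, ""


def read_config_value(config: dict, key: str, default: Any = None) -> Any:
    """Return config[key] even when it is falsy; use default only if the key is absent."""
    value = config.get(key, _MISSING)
    if value is _MISSING:
        return default
    return value


def read_config_value_buggy(config: dict, key: str, default: Any = None) -> Any:
    """The failure mode being fixed: `or` discards 0 and False as if they were missing."""
    return config.get(key) or default
```

With `{"threshold": 0}` and a default of `0.7`, the sentinel version returns `0` while the `or` version returns `0.7`.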
alex-clawd
d744b37723 fix: deduplicate JSONC stripping, guard progress callback, and fix _read_config
- Extract `_strip_jsonc` as the single shared helper in `create_agent.py`,
  replacing the three duplicate implementations in `agent_tui.py`,
  `benchmark.py`, and the inline regex in `cli.py::_read_config`.
- Apply `_strip_jsonc` (including trailing-comma removal) inside
  `_read_config` so JSONC config.json files are parsed correctly.
- Add `if progress is not None:` guard inside `_make_progress_cb._cb`
  to prevent a `NoneType` call when running in verbose mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 12:28:25 -07:00
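
Note on d744b37723: the shared `_strip_jsonc` helper is described as handling comments and trailing commas, and 48a861aa1a says it must preserve `//` inside quoted strings. The sketch below shows one way to do that in a single string-aware pass. It is an illustration under those stated requirements, not the repository's implementation; `strip_jsonc` is a hypothetical stand-in name.

```python
import json


def strip_jsonc(text: str) -> str:
    """Remove // and /* */ comments and trailing commas without touching strings."""
    out = []
    pending_comma = None  # index in `out` of a comma that may prove to be trailing
    in_string = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character verbatim
                i += 1
            elif ch == '"':
                in_string = False
            i += 1
            continue
        if text.startswith("//", i):
            while i < n and text[i] != "\n":
                i += 1  # skip to end of line; the newline itself is kept
            continue
        if text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        if not ch.isspace():
            if ch in "}]" and pending_comma is not None:
                out[pending_comma] = ""  # the comma was trailing: drop it
            pending_comma = len(out) if ch == "," else None
            if ch == '"':
                in_string = True
        out.append(ch)
        i += 1
    return "".join(out)


def load_jsonc(text: str):
    """Parse JSONC text by stripping it down to plain JSON first."""
    return json.loads(strip_jsonc(text))
```

Tracking the comma's position and blanking it only when the next significant character closes an object or array avoids a second regex pass, which is where commas and slashes inside strings tend to get corrupted.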
alex-clawd
94b5e2ea7b fix: address CI failures — ruff, mypy, mock OpenAI tests, JSONC support
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 12:28:25 -07:00
alex-clawd
2ddc348ad2 fix: resolve lint, type-check, and test failures
- B904: raise KeyboardInterrupt from err in cli_provider.py
- mypy: add TYPE_CHECKING import for SQLiteConversationStorage, annotate
  _initialized class var in TaskScheduler, fix Match type params and
  Returning Any in create_agent.py
- tests: mock aget_llm_response in 3 integration tests that fail when
  network is blocked but OPENAI_API_KEY is set
- flow.py: use asyncio.run_coroutine_threadsafe() instead of asyncio.run()
  when a loop is already running in ask() and say()
- cli.py: fix threshold=0.0 treated as falsy by using `is not None` check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 12:28:25 -07:00
Joao Moura
75651f962d feat: introduce room management and agent selection in TUI
- Added a `CreateRoomScreen` modal for creating new rooms with agent selection and engagement options.
- Updated the main TUI layout to include a sidebar for room management, allowing users to create and switch between rooms.
- Enhanced the configuration handling to support room definitions and engagement modes.
- Refactored existing code to accommodate new room functionalities and improve overall structure.

These changes enhance the user experience by enabling better organization and interaction with multiple agents in the CrewAI framework.
2026-05-13 12:28:25 -07:00
Joao Moura
fc85637e60 feat: enhance benchmark case loading and CLI threshold handling
- Introduced a new `LoadedCases` class to encapsulate benchmark cases and optional thresholds, improving data management.
- Updated `load_benchmark_cases` function to support loading cases from both bare arrays and object wrappers with a threshold.
- Modified CLI options to allow dynamic threshold configuration, defaulting to a value from `config.json` if not specified.
- Enhanced error handling for invalid benchmark case formats and added tests to validate new functionality.

These changes aim to improve the flexibility and usability of benchmark case management within the CrewAI framework.
2026-05-13 12:28:25 -07:00
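
Note on fc85637e60: the two accepted case-file shapes (a bare array, or an object wrapper carrying a threshold) could be handled as sketched below. The names `LoadedCases` and `load_benchmark_cases` come from the commit message. Everything else is an assumption: the `cases` and `threshold` key names, the field layout, and the choice to take JSON text rather than a file path.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class LoadedCases:
    """Benchmark cases plus an optional pass threshold from the case file."""
    cases: List[Any] = field(default_factory=list)
    threshold: Optional[float] = None


def load_benchmark_cases(text: str) -> LoadedCases:
    """Accept either `[...]` or `{"cases": [...], "threshold": ...}`."""
    data = json.loads(text)
    if isinstance(data, list):
        return LoadedCases(cases=data)  # bare array: no threshold supplied
    if isinstance(data, dict) and isinstance(data.get("cases"), list):
        threshold = data.get("threshold")
        if threshold is not None:  # `is not None`, so 0.0 is preserved
            threshold = float(threshold)
        return LoadedCases(cases=data["cases"], threshold=threshold)
    raise ValueError("expected a JSON array of cases or an object with a 'cases' array")
```

Leaving `threshold` as `None` for bare arrays lets the caller fall back to the CLI option or the `config.json` value, as the third bullet of the commit describes.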
Joao Moura
6cb29dce65 feat: enhance agent TUI and CLI with streaming responses and model selection improvements
- Added a `_safe_render` function to escape Rich markup and convert markdown to Rich format.
- Implemented token-by-token streaming for agent responses in the TUI, improving user experience during interactions.
- Updated the CLI to allow selection of LLM providers and models, enhancing flexibility in agent creation.
- Refactored benchmark case paths to use a `tests` directory instead of `benchmarks`.
- Introduced a `last_stream_result` property in the `NewAgent` class to retrieve the latest streaming response.

These changes aim to provide a more interactive and user-friendly experience in managing agents within the CrewAI framework.
2026-05-13 12:28:25 -07:00
Joao Moura
fe7f730546 feat: add interactive agent creation and TUI for multi-agent interaction
- Introduced a new `create_agent` command for interactive agent definition.
- Added `agent_tui.py` for a conversational TUI supporting multi-agent interactions.
- Updated CLI to support agent creation and training workflows.
- Enhanced `.gitignore` to exclude demo files and configuration artifacts.
- Implemented a benchmark runner for testing agent performance against defined cases.

This commit lays the groundwork for a more interactive and user-friendly experience in managing agents within the CrewAI framework.
2026-05-13 12:28:25 -07:00