fix(hitl): log pre-review failures and add learn_strict mode

Closes #5725.

In _pre_review_with_lessons (lib/crewai/src/crewai/flow/human_feedback.py),
a broad `except Exception: return method_output` silently swallowed any
LLM, network, auth, or structured-output failure during the pre-review
step. Callers could not distinguish pre-reviewed output from raw output,
and there was no log or event surfaced.
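A minimal sketch of that failure mode (hypothetical helper names; the real code lives in lib/crewai/src/crewai/flow/human_feedback.py):

```python
# Hypothetical sketch of the pre-fix behavior: any failure in the LLM
# pre-review call is swallowed and the raw output is returned, so the
# caller cannot tell whether pre-review actually ran.
def pre_review_with_lessons(method_output, llm_pre_review):
    try:
        # May raise on LLM, network, auth, or structured-output errors.
        return llm_pre_review(method_output)
    except Exception:
        # Silent fallback: indistinguishable from a successful pre-review.
        return method_output
```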

Changes:
- Add a module logger.
- Narrow the try/except in _pre_review_with_lessons so that the
  `mem is None` and `not matches` short-circuits stay outside the
  failure path (those are not error cases).
- On recall or LLM pre-review failure, log a WARNING with exc_info=True
  so the silent fallback is observable. Continue to fall back to the raw
  method_output so the flow does not break.
- Add an opt-in learn_strict=True parameter on the human_feedback
  decorator and HumanFeedbackConfig that re-raises pre-review failures
  instead of falling back, for callers that need fail-closed behavior.
- Update the Graceful degradation docs section to reflect the new
  observable-by-default behavior and document learn_strict.
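Taken together, the changed failure path looks roughly like this (a sketch under assumed helper names, not the actual implementation):

```python
import logging

logger = logging.getLogger("crewai.flow.human_feedback")

def pre_review_with_lessons(method_output, llm_pre_review, learn_strict=False):
    # The `mem is None` / `not matches` short-circuits would sit here,
    # outside the try block, because they are not error cases.
    try:
        return llm_pre_review(method_output)
    except Exception:
        # Make the fallback observable: WARNING plus the full traceback.
        logger.warning(
            "HITL pre-review failed; falling back to raw method output",
            exc_info=True,
        )
        if learn_strict:
            raise  # fail closed for callers that need pre-review to have run
        return method_output
```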

Tests (lib/crewai/tests/test_human_feedback_decorator.py):
- New TestHumanFeedbackPreReviewFailure class with 7 tests covering
  WARNING logging on LLM and recall failures, learn_strict propagation
  in both sync and async paths, and config introspection of the new
  learn_strict flag.
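The WARNING-logging assertion can be sketched like this (a stand-alone stand-in with a hand-rolled capture handler; the real tests in lib/crewai/tests/test_human_feedback_decorator.py exercise the actual decorator):

```python
import logging

logger = logging.getLogger("crewai.flow.human_feedback")

class _CaptureHandler(logging.Handler):
    """Collects emitted records so assertions can inspect them."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

def pre_review_with_lessons(method_output, llm_pre_review):
    # Stand-in for the real helper: fall back on failure, but log first.
    try:
        return llm_pre_review(method_output)
    except Exception:
        logger.warning("HITL pre-review failed", exc_info=True)
        return method_output

def check_warning_logged_on_llm_failure():
    handler = _CaptureHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.WARNING)
    try:
        def boom(_):
            raise RuntimeError("llm down")
        # Fallback still returns the raw output...
        assert pre_review_with_lessons("raw", boom) == "raw"
        # ...but the failure is now observable, with the traceback attached.
        assert any(
            r.levelno == logging.WARNING and r.exc_info
            for r in handler.records
        )
    finally:
        logger.removeHandler(handler)
```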

Co-Authored-By: João <joao@crewai.com>
Devin AI
2026-05-06 04:25:15 +00:00
parent ec8a522c2c
commit d633aaa5fa
3 changed files with 336 additions and 11 deletions


@@ -682,6 +682,7 @@ class ArticleReviewFlow(Flow):
 | Parameter | Default | Description |
 |-----------|---------|-------------|
 | `learn` | `False` | Enable HITL learning |
+| `learn_strict` | `False` | If `True`, pre-review failures are re-raised instead of falling back to raw output |
 | `learn_limit` | `5` | Max past lessons to recall for pre-review |
 
 ### Key Design Decisions
@@ -689,7 +690,8 @@ class ArticleReviewFlow(Flow):
 - **Same LLM for everything**: The `llm` parameter on the decorator is shared by outcome collapsing, lesson distillation, and pre-review. No need to configure multiple models.
 - **Structured output**: Both distillation and pre-review use function calling with Pydantic models when the LLM supports it, falling back to text parsing otherwise.
 - **Non-blocking storage**: Lessons are stored via `remember_many()` which runs in a background thread -- the flow continues immediately.
-- **Graceful degradation**: If the LLM fails during distillation, nothing is stored. If it fails during pre-review, the raw output is shown. Neither failure blocks the flow.
+- **Observable graceful degradation**: If the LLM fails during distillation, nothing is stored. If it fails during pre-review, the raw output is shown to the human and the failure is logged at `WARNING` level with the full traceback (`exc_info=True`) under the `crewai.flow.human_feedback` logger -- so the silent fallback is detectable. Neither failure blocks the flow.
+- **Strict mode**: Pass `learn_strict=True` to make pre-review fail closed -- failures (LLM error, network/auth error, structured-output parse error, memory `recall` error) propagate out of the flow method instead of being swallowed. Use this when downstream code must be able to assume that pre-review actually executed.
 - **No scope/categories needed**: When storing lessons, only `source` is passed. The encoding pipeline infers scope, categories, and importance automatically.
 
 <Note>