fix: ensure lite agents course-correct on validation errors
Some checks failed
CodeQL Advanced / Analyze (actions) (push) Has been cancelled
CodeQL Advanced / Analyze (python) (push) Has been cancelled
Notify Downstream / notify-downstream (push) Has been cancelled

* fix: ensure lite agents course-correct on validation errors

* chore: update cassettes and test expectations

* fix: ensure multiple guardrails propogate
This commit is contained in:
Greyson LaLonde
2025-11-05 19:02:11 -05:00
committed by GitHub
parent 61ad1fb112
commit 7e6171d5bc
28 changed files with 8125 additions and 10295 deletions

View File

@@ -1,35 +1,37 @@
interactions:
- request:
body: '{"messages": [{"role": "system", "content": "You are Scorer. You''re an
expert scorer, specialized in scoring titles.\nYour personal goal is: Score
the title\nTo give my best complete final answer to the task use the exact following
body: '{"messages":[{"role":"system","content":"You are Scorer. You''re an expert
scorer, specialized in scoring titles.\nYour personal goal is: Score the title\nTo
give my best complete final answer to the task respond using the exact following
format:\n\nThought: I now can give a great answer\nFinal Answer: Your final
answer must be the great and the most complete as possible, it must be outcome
described.\n\nI MUST use these formats, my job depends on it!"}, {"role": "user",
"content": "\nCurrent Task: Give me an integer score between 1-5 for the following
title: ''The impact of AI in the future of work''\n\nThis is the expect criteria
for your final answer: The score of the title.\nyou MUST return the actual complete
content as the final answer, not a summary.\n\nBegin! This is VERY important
to you, use the tools available and give your best Final Answer, your job depends
on it!\n\nThought:"}], "model": "gpt-4o"}'
described.\n\nI MUST use these formats, my job depends on it!"},{"role":"user","content":"\nCurrent
Task: Give me an integer score between 1-5 for the following title: ''The impact
of AI in the future of work''\n\nThis is the expected criteria for your final
answer: The score of the title.\nyou MUST return the actual complete content
as the final answer, not a summary.\nEnsure your final answer strictly adheres
to the following OpenAPI schema: {\n \"properties\": {\n \"score\": {\n \"title\":
\"Score\",\n \"type\": \"integer\"\n }\n },\n \"required\": [\n \"score\"\n ],\n \"title\":
\"ScoreOutput\",\n \"type\": \"object\",\n \"additionalProperties\": false\n}\n\nDo
not include the OpenAPI schema in the final output. Ensure the final output
does not include any code block markers like ```json or ```python.\n\nBegin!
This is VERY important to you, use the tools available and give your best Final
Answer, your job depends on it!\n\nThought:"}],"model":"gpt-4.1-mini"}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
- gzip, deflate, zstd
connection:
- keep-alive
content-length:
- '915'
- '1394'
content-type:
- application/json
cookie:
- __cf_bm=9.8sBYBkvBR8R1K_bVF7xgU..80XKlEIg3N2OBbTSCU-1727214102-1.0.1.1-.qiTLXbPamYUMSuyNsOEB9jhGu.jOifujOrx9E2JZvStbIZ9RTIiE44xKKNfLPxQkOi6qAT3h6htK8lPDGV_5g;
_cfuvid=lbRdAddVWV6W3f5Dm9SaOPWDUOxqtZBSPr_fTW26nEA-1727213194587-0.0.1.1-604800000
host:
- api.openai.com
user-agent:
- OpenAI/Python 1.47.0
- OpenAI/Python 1.109.1
x-stainless-arch:
- arm64
x-stainless-async:
@@ -39,29 +41,32 @@ interactions:
x-stainless-os:
- MacOS
x-stainless-package-version:
- 1.47.0
x-stainless-raw-response:
- 'true'
- 1.109.1
x-stainless-read-timeout:
- '600'
x-stainless-retry-count:
- '0'
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.11.7
- 3.12.9
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
content: "{\n \"id\": \"chatcmpl-AB7gN2SDetZsIJf8dMDl2RwE5Qyvp\",\n \"object\":
\"chat.completion\",\n \"created\": 1727214503,\n \"model\": \"gpt-4o-2024-05-13\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"Thought: I now can give a great answer\\nFinal
Answer: 4\",\n \"refusal\": null\n },\n \"logprobs\": null,\n
\ \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
186,\n \"completion_tokens\": 15,\n \"total_tokens\": 201,\n \"completion_tokens_details\":
{\n \"reasoning_tokens\": 0\n }\n },\n \"system_fingerprint\": \"fp_52a7f40b0b\"\n}\n"
body:
string: !!binary |
H4sIAAAAAAAAAwAAAP//jFLBbpwwEL3zFaM5LxGw7GbDraoUpZGqtmoOTbMRcswAToxt2SZpu9p/
rwybhbSJlAsS8+Y9vzczuwgARYUFIG+Z552R8cfrJvly+e3Hkj4/XC8Ntz//nIrvbXZxcfX1HBeB
oe/uiftn1gnXnZHkhVYjzC0xT0E1PV1ny02yXq0HoNMVyUBrjI/zkzTuhBJxlmSrOMnjND/QWy04
OSzgJgIA2A3fYFRV9AsLSBbPlY6cYw1hcWwCQKtlqCBzTjjPlMfFBHKtPKnB+1Wr+6b1BXwCpZ+A
MwWNeCRg0IQAwJR7IrtV50IxCR+GvwJ2W3RcW9pike/nypbq3rEQT/VSzgCmlPYsjGfIdHtA9scU
UjfG6jv3DxVroYRrS0vMaRUcO68NDug+ArgdptW/GAAaqzvjS68faHguO8tHPZy2NKHp5gB67Zmc
6ss0W7yiV1bkmZBuNm/kjLdUTdRpOayvhJ4B0Sz1/25e0x6TC9W8R34COCfjqSqNpUrwl4mnNkvh
iN9qO055MIyO7KPgVHpBNmyiopr1crwsdL+dp66shWrIGivG86pNmfNss0rrzTrDaB/9BQAA//8D
AJ9ashhtAwAA
headers:
CF-Cache-Status:
- DYNAMIC
CF-RAY:
- 8c85fa763ef91cf3-GRU
- REDACTED-RAY
Connection:
- keep-alive
Content-Encoding:
@@ -69,37 +74,54 @@ interactions:
Content-Type:
- application/json
Date:
- Tue, 24 Sep 2024 21:48:23 GMT
- Wed, 05 Nov 2025 22:10:56 GMT
Server:
- cloudflare
Set-Cookie:
- __cf_bm=REDACTED;
path=/; expires=Wed, 05-Nov-25 22:40:56 GMT; domain=.api.openai.com; HttpOnly;
Secure; SameSite=None
- _cfuvid=REDACTED;
path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None
Strict-Transport-Security:
- max-age=31536000; includeSubDomains; preload
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
cf-cache-status:
- DYNAMIC
openai-organization:
- crewai-iuxna1
- user-hortuttj2f3qtmxyik2zxf4q
openai-processing-ms:
- '194'
- '770'
openai-project:
- proj_fL4UBWR1CMpAAdgzaSKqsVvA
openai-version:
- '2020-10-01'
strict-transport-security:
- max-age=31536000; includeSubDomains; preload
x-envoy-upstream-service-time:
- '796'
x-openai-proxy-wasm:
- v0.1
x-ratelimit-limit-requests:
- '10000'
- '500'
x-ratelimit-limit-tokens:
- '30000000'
- '200000'
x-ratelimit-remaining-requests:
- '9999'
- '499'
x-ratelimit-remaining-tokens:
- '29999781'
- '199687'
x-ratelimit-reset-requests:
- 6ms
- 120ms
x-ratelimit-reset-tokens:
- 0s
- 93ms
x-request-id:
- req_5345a8fffc6276bb9d0a23edecd063ff
http_version: HTTP/1.1
status_code: 200
- req_REDACTED
status:
code: 200
message: OK
version: 1