interactions:
- request:
    body: '{"messages": [{"role": "system", "content": "You are a precise assistant that creates structured summaries of agent conversations. You preserve critical context needed for seamless task continuation."}, {"role": "user", "content": "Analyze the following conversation and create a structured summary that preserves all information needed to continue the task seamlessly.\n\n\n[USER]: Research the latest developments in large language models. Focus on architecture improvements and training techniques.\n\n[ASSISTANT]: I''ll research the latest developments in large language models. Based on my knowledge, recent advances include:\n1. Mixture of Experts (MoE) architectures\n2. Improved attention mechanisms like Flash Attention\n3. Better training data curation techniques\n4. Constitutional AI and RLHF improvements\n\n[USER]: Can you go deeper on the MoE architectures? What are the key papers?\n\n[ASSISTANT]: Key papers on Mixture of Experts:\n- Switch Transformers (Google, 2021) - simplified MoE routing\n- GShard - scaling to 600B parameters\n- Mixtral (Mistral AI) - open-source MoE model\nThe main advantage is computational efficiency: only a subset of experts is activated per token.\n\n\nCreate a summary with these sections:\n1. **Task Overview**: What is the agent trying to accomplish?\n2. **Current State**: What has been completed so far? What step is the agent on?\n3. **Important Discoveries**: Key facts, data, tool results, or findings that must not be lost.\n4. **Next Steps**: What should the agent do next based on the conversation?\n5. **Context to Preserve**: Any specific values, names, URLs, code snippets, or details referenced in the conversation.\n\nWrap your entire summary in tags.\n\n\n[Your structured summary here]\n"}], "stream": false, "temperature": 0}'
    headers:
      Accept:
      - application/json
      Connection:
      - keep-alive
      Content-Length:
      - '1849'
      Content-Type:
      - application/json
      User-Agent:
      - X-USER-AGENT-XXX
      accept-encoding:
      - ACCEPT-ENCODING-XXX
      api-key:
      - X-API-KEY-XXX
      authorization:
      - AUTHORIZATION-XXX
      x-ms-client-request-id:
      - X-MS-CLIENT-REQUEST-ID-XXX
    method: POST
    uri: https://fake-azure-endpoint.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-12-01-preview
  response:
    body:
      string: '{"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"protected_material_code":{"filtered":false,"detected":false},"protected_material_text":{"filtered":false,"detected":false},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"finish_reason":"stop","index":0,"logprobs":null,"message":{"annotations":[],"content":"\u003csummary\u003e\n1. **Task Overview**: The user has requested research on the latest developments in large language models, specifically focusing on architecture improvements and training techniques.\n\n2. **Current State**: The assistant has provided an initial overview of recent advances in large language models, including Mixture of Experts (MoE) architectures, improved attention mechanisms, better training data curation techniques, and advancements in Constitutional AI and Reinforcement Learning from Human Feedback (RLHF).\n\n3. **Important Discoveries**: \n - Recent advances in large language models include:\n 1. Mixture of Experts (MoE) architectures\n 2. Improved attention mechanisms like Flash Attention\n 3. Better training data curation techniques\n 4. Constitutional AI and RLHF improvements\n - Key papers on Mixture of Experts:\n - Switch Transformers (Google, 2021) - simplified MoE routing\n - GShard - scaling to 600B parameters\n - Mixtral (Mistral AI) - open-source MoE model\n - The main advantage of MoE architectures is computational efficiency, as only a subset of experts is activated per token.\n\n4. **Next Steps**: The assistant should delve deeper into the Mixture of Experts architectures, potentially summarizing the key findings and implications from the identified papers.\n\n5. **Context to Preserve**: \n - Key papers: \n - Switch Transformers (Google, 2021)\n - GShard\n - Mixtral (Mistral AI)\n - Focus on computational efficiency of MoE architectures.\n\u003c/summary\u003e","refusal":null,"role":"assistant"}}],"created":1770849953,"id":"chatcmpl-D8DFx1H1zzEerW5H0BWfuwmio2sz1","model":"gpt-4o-mini-2024-07-18","object":"chat.completion","prompt_filter_results":[{"prompt_index":0,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"jailbreak":{"filtered":false,"detected":false},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}}}],"system_fingerprint":"fp_f97eff32c5","usage":{"completion_tokens":328,"completion_tokens_details":{"accepted_prediction_tokens":0,"audio_tokens":0,"reasoning_tokens":0,"rejected_prediction_tokens":0},"prompt_tokens":368,"prompt_tokens_details":{"audio_tokens":0,"cached_tokens":0},"total_tokens":696}} '
    headers:
      Content-Length:
      - '2786'
      Content-Type:
      - application/json
      Date:
      - Wed, 11 Feb 2026 22:45:56 GMT
      Strict-Transport-Security:
      - STS-XXX
      apim-request-id:
      - APIM-REQUEST-ID-XXX
      azureml-model-session:
      - AZUREML-MODEL-SESSION-XXX
      x-accel-buffering:
      - 'no'
      x-content-type-options:
      - X-CONTENT-TYPE-XXX
      x-ms-client-request-id:
      - X-MS-CLIENT-REQUEST-ID-XXX
      x-ms-deployment-name:
      - gpt-4o-mini
      x-ms-rai-invoked:
      - 'true'
      x-ms-region:
      - X-MS-REGION-XXX
      x-ratelimit-limit-requests:
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - X-RATELIMIT-REMAINING-REQUESTS-XXX
      x-ratelimit-remaining-tokens:
      - X-RATELIMIT-REMAINING-TOKENS-XXX
      x-request-id:
      - X-REQUEST-ID-XXX
    status:
      code: 200
      message: OK
version: 1