
Python: [BREAKING] Python: Fix orchestration outputs so as_agent() returns the final answer only. Align other orchestration outputs. #5301

Open
moonbox3 wants to merge 13 commits into microsoft:main from moonbox3:improve-orchestration-outputs

Conversation

@moonbox3
Contributor

@moonbox3 moonbox3 commented Apr 16, 2026

Motivation and Context

Important

Breaking changes are scoped to the still-experimental agent-framework-orchestrations package. Core agent-framework workflow code is non-breaking — the new AgentExecutor.emit_data_events kwarg defaults to False (no behavior change), and WorkflowAgent additively consumes data events it previously ignored.

workflow.as_agent().run(prompt) for SequentialBuilder returned the user's input plus every agent's reply instead of the last agent's answer. The same class of bug affected Concurrent, GroupChat, Magentic, and Handoff: they all yielded list[Message] conversation dumps as output events, and intermediate agents emitted output events indistinguishable from the final answer when intermediate_outputs=True.

An ADR (#4799) proposed adding an is_run_completed flag on WorkflowEvent to discriminate final from intermediate outputs. This PR takes a simpler path that uses primitives the framework already has — no new event types or discriminator flags.

Design

Two layers of discrimination, each at the right boundary:

Event layer (workflow consumers): output events carry the workflow's final answer; data events carry intermediate participant activity. Orchestrations restrict output_executors to the terminator, so only the answer surfaces on the output channel. Intermediate participants surface (when intermediate_outputs=True) on the data channel via AgentExecutor(emit_data_events=True), which mirrors yield_output to WorkflowEvent.emit. Workflow consumers branch on event.type.
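
As a rough illustration of the consumer side of the event layer, the sketch below branches on event.type. The Event dataclass and the split_events helper are stand-ins invented for this example and are not the framework's WorkflowEvent API; the "output" and "data" type values are assumed from the description above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Event:
    """Hypothetical stand-in for a workflow event; not the real WorkflowEvent."""

    type: str  # "output" for the final answer, "data" for intermediate activity
    executor_id: str
    data: Any


def split_events(events: list[Event]) -> tuple[list[Any], list[Any]]:
    """Separate final answers from intermediate participant activity."""
    final: list[Any] = []
    intermediate: list[Any] = []
    for event in events:
        if event.type == "output":
            final.append(event.data)
        elif event.type == "data":
            intermediate.append(event.data)
        # Any other event type is ignored in this sketch.
    return final, intermediate


events = [
    Event("data", "writer", "draft"),
    Event("data", "reviewer", "notes"),
    Event("output", "reviewer", "final answer"),
]
final, intermediate = split_events(events)
print(final)
print(intermediate)
```

With this split, a consumer that only wants the answer reads the first list and can drop the second without inspecting payload shapes.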

Content layer (as_agent() consumers): WorkflowAgent rewrites text content from data events as text_reasoning Content blocks before merging into the final AgentResponse. Consumers iterate msg.contents and branch on content.type — the same rendering path they already use for Claude thinking and OpenAI reasoning. No new field on Message/AgentResponse/WorkflowEvent.
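
A minimal sketch of the content-layer idea, under stated assumptions: the Content dataclass and both functions below are hypothetical stand-ins, not the framework's classes or the PR's actual helpers. Only the "text" and "text_reasoning" type names are taken from the description above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Content:
    """Hypothetical stand-in for a content block; not the framework's Content."""

    type: str  # e.g. "text" or "text_reasoning"
    text: str


def rewrite_text_to_reasoning(contents: list[Content]) -> list[Content]:
    """Return new blocks with "text" relabeled as "text_reasoning"; others pass through."""
    return [replace(c, type="text_reasoning") if c.type == "text" else c for c in contents]


def render(contents: list[Content]) -> str:
    """Consumer side: branch on content.type to mark reasoning apart from the answer."""
    lines = []
    for c in contents:
        prefix = "[reasoning] " if c.type == "text_reasoning" else ""
        lines.append(prefix + c.text)
    return "\n".join(lines)


# Intermediate participant text is rewritten; the final answer is left as plain text.
intermediate = rewrite_text_to_reasoning([Content("text", "draft from agent 1")])
final = [Content("text", "answer from agent 2")]
print(render(intermediate + final))
```

The point of the design is visible in render: a consumer that already distinguishes reasoning blocks from answer text needs no new field to tell intermediate output apart.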

Per-orchestration contract

| Orchestration | Terminal output event |
|---|---|
| Sequential (agent terminator) | Last agent's AgentResponse (streaming chunks; the last AgentExecutor itself is the workflow's output executor) |
| Sequential (custom-executor terminator) | The executor's list[Message] (unchanged) |
| Concurrent (default aggregator) | AgentResponse with one assistant message per participant (no user prompt prepended) |
| Concurrent (custom aggregator) | Whatever the aggregator yields |
| GroupChat | AgentResponse with the orchestrator's completion message |
| Magentic | AgentResponse with the manager's synthesized final answer |
| Handoff | No terminal yield; per-agent responses already surface as output events as agents speak |

Behavior changes (orchestrations package)

  • workflow.as_agent().run(prompt) now returns only the meaningful answer for each orchestration, matching the agent contract.
  • ConcurrentBuilder default aggregator output changed from list[Message] (user + per-agent) to AgentResponse (per-agent only). Callers consuming output_events[0].data will see an AgentResponse instead of a list.
  • SequentialBuilder, GroupChat, and Magentic terminal output changed from list[Message] (full conversation) to AgentResponse (answer only).
  • Handoff no longer emits a synthetic terminal event. The last agent's output event is the end of the stream.
  • intermediate_outputs=True no longer floods the output channel — intermediate participants surface on the data channel and arrive at as_agent() consumers as text_reasoning content.
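
For callers migrating off the old list[Message] shape, one possible approach is a small adapter that accepts either payload. This is only a sketch: Message and AgentResponse below are simplified stand-ins defined for the example (the messages attribute and the role strings are assumptions), and assistant_texts is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Message:
    """Hypothetical stand-in for the framework's Message."""

    role: str
    text: str


@dataclass
class AgentResponse:
    """Hypothetical stand-in for the framework's AgentResponse."""

    messages: list[Message] = field(default_factory=list)


def assistant_texts(payload: Union[AgentResponse, list[Message]]) -> list[str]:
    """Extract assistant replies from either the old or the new output shape."""
    messages = payload.messages if isinstance(payload, AgentResponse) else payload
    return [m.text for m in messages if m.role == "assistant"]


# Old shape: user prompt plus per-agent replies. New shape: per-agent replies only.
old_shape = [Message("user", "prompt"), Message("assistant", "a"), Message("assistant", "b")]
new_shape = AgentResponse([Message("assistant", "a"), Message("assistant", "b")])
print(assistant_texts(old_shape) == assistant_texts(new_shape))
```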

Review feedback addressed

  • Flag renamed emit_intermediate_data → emit_data_events (mechanically accurate; orchestration semantics live at the orchestration layer).
  • data events are not "re-introducing the removed AgentResponsesEvent" — they're the unified event model's documented intermediate-data channel (WorkflowEvent.emit). The new behavior is the content rewrite at the WorkflowAgent boundary, not a new event class.
  • Mutation safety: the streaming branch now constructs fresh AgentResponseUpdate instances instead of mutating shared payloads (regression test added).
  • Concurrent duplication: emit_data_events is suppressed when the default aggregator is in use; only enabled with custom aggregators where intermediate visibility is genuinely separate from the aggregator's transformed answer.
  • HIL terminator: SequentialBuilder now correctly recognizes AgentApprovalExecutor as a valid terminator and routes its inner output to the parent via allow_direct_output=True.
  • HIL data forwarding: AgentApprovalExecutor now forwards inner data events through to the parent so wrapped intermediate participants don't silently drop their reasoning.
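
The mutation-safety item can be pictured with a short sketch. The Update dataclass is a hypothetical stand-in for a streaming payload, not AgentResponseUpdate itself; the example only shows the general pattern of building a fresh instance so that other holders of the original object are unaffected.

```python
from dataclasses import dataclass, replace


@dataclass
class Update:
    """Hypothetical stand-in for a streaming update payload."""

    content_type: str
    text: str


def as_reasoning(update: Update) -> Update:
    """Build a fresh instance instead of mutating the shared payload."""
    return replace(update, content_type="text_reasoning")


shared = Update("text", "partial chunk")  # may also be held by another consumer
converted = as_reasoning(shared)
print(shared.content_type, converted.content_type)
```

Had as_reasoning assigned to update.content_type in place, every other consumer holding the same object would have seen the rewritten type as well.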

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

@moonbox3 moonbox3 self-assigned this Apr 16, 2026
Copilot AI review requested due to automatic review settings April 16, 2026 06:28
@github-actions github-actions bot changed the title [BREAKING] Python: Fix orchestration outputs so as_agent() returns the final answer only. Align other orchestration outputs. Python: [BREAKING] Python: Fix orchestration outputs so as_agent() returns the final answer only. Align other orchestration outputs. Apr 16, 2026
@moonbox3
Contributor Author

moonbox3 commented Apr 16, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| packages/core/agent_framework/_workflows | | | | |
| _agent.py | 380 | 73 | 80% | 66, 74–80, 116–117, 210, 271, 284, 353, 364, 366, 426, 432, 442–443, 450, 452, 458, 520–521, 530, 544, 570, 572, 605–607, 609, 611, 613, 618, 623, 684, 714, 716, 733, 772–775, 781, 787, 791–792, 795–801, 805–806, 814, 921, 928, 934–935, 946, 978, 985, 1006, 1015, 1019, 1021–1023, 1030 |
| _agent_executor.py | 206 | 17 | 91% | 171, 197, 238, 262, 282–283, 363–365, 367, 377–378, 425, 501–502, 574, 580 |
| _workflow_executor.py | 190 | 31 | 83% | 94, 404, 464, 487, 489, 497–498, 503, 505, 510, 512, 565, 600–606, 610–612, 620, 625, 636, 646, 650, 656, 660, 670, 674 |
| packages/orchestrations/agent_framework_orchestrations | | | | |
| _base_group_chat_orchestrator.py | 168 | 12 | 92% | 109, 277, 292, 326–328, 332, 351, 413, 459–461 |
| _concurrent.py | 144 | 26 | 81% | 51, 60–61, 69–70, 89–90, 95, 113, 118, 123–124, 145, 155, 162, 230, 246, 249, 306, 336, 338–339, 341, 352, 365, 369 |
| _group_chat.py | 329 | 70 | 78% | 173, 336, 343, 372, 383–384, 390, 395, 416, 420, 433–434, 447, 462–463, 465, 481, 508–513, 515, 549–552, 554, 559–563, 651, 654, 693, 696, 699, 702, 710, 722–723, 725–726, 728–729, 731, 736, 739, 748, 754, 798–799, 803–804, 818–819, 821–822, 853–854, 920, 939, 947, 952–954, 961, 971 |
| _handoff.py | 335 | 52 | 84% | 104–105, 107, 160–170, 172, 174, 176, 181, 312, 337, 364, 390, 454, 497, 505, 509–510, 541–543, 548–550, 670, 673, 686, 748, 753, 760, 770, 772, 791, 793, 875–876, 908–909, 1010, 1017, 1089–1090, 1092 |
| _magentic.py | 592 | 91 | 84% | 64–73, 78, 82–93, 258, 269, 273, 293, 354, 363, 365, 407, 424, 433–434, 436–438, 440, 451, 594, 596, 636, 686, 722–724, 726, 736, 744–745, 812–815, 906, 912, 918, 960, 998, 1030, 1047, 1058, 1113–1114, 1118–1120, 1144, 1168–1169, 1182, 1198, 1220, 1265–1266, 1304–1305, 1469, 1472, 1481, 1484, 1489, 1540–1541, 1582–1583, 1631, 1661, 1719, 1733, 1744 |
| _orchestration_request_info.py | 57 | 0 | 100% | |
| _sequential.py | 92 | 6 | 93% | 52, 164, 175, 181, 228, 263 |
| TOTAL | 28351 | 3294 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 5672 | 30 💤 | 0 ❌ | 0 🔥 | 1m 32s ⏱️ |

Contributor

Copilot AI left a comment


Pull request overview

This PR changes Python orchestration output semantics so workflow.as_agent().run(prompt) returns only the final “answer” (as an AgentResponse) while intermediate participant activity is surfaced via data events (instead of additional output events). It adds a single new knob (AgentExecutor(emit_intermediate_data=...)) and updates orchestrations, tests, and a sample to align with the clarified contract.

Changes:

  • Add AgentExecutor.emit_intermediate_data to emit parallel data events for AgentResponse / AgentResponseUpdate.
  • Update Sequential/Concurrent/GroupChat/Magentic/Handoff orchestrations to reserve output for terminal answers and use data for intermediate agent activity.
  • Update orchestration tests and a sample to validate/illustrate the new terminal vs intermediate event behavior.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| python/samples/03-workflows/agents/sequential_workflow_as_agent.py | Updates sample to reflect as_agent() returning only the final agent response and notes intermediate observation via data. |
| python/packages/orchestrations/tests/test_sequential.py | Rewrites sequential tests to assert terminal output is last agent only and intermediate activity is data when enabled. |
| python/packages/orchestrations/tests/test_magentic.py | Updates Magentic tests to expect AgentResponse terminal outputs and data events for intermediate updates. |
| python/packages/orchestrations/tests/test_handoff.py | Updates Handoff tests to reflect no synthetic terminal output and per-agent outputs as the stream result. |
| python/packages/orchestrations/tests/test_group_chat.py | Updates GroupChat tests to expect a single terminal AgentResponse from the orchestrator and intermediate agent updates as data. |
| python/packages/orchestrations/tests/test_concurrent.py | Updates Concurrent tests to expect default aggregator returns AgentResponse with assistant messages only (no user prompt). |
| python/packages/orchestrations/agent_framework_orchestrations/_sequential.py | Changes sequential wiring so the last agent executor is the workflow output executor; uses data events for intermediate participants. |
| python/packages/orchestrations/agent_framework_orchestrations/_orchestration_request_info.py | Plumbs emit_intermediate_data through AgentApprovalExecutor into the inner AgentExecutor. |
| python/packages/orchestrations/agent_framework_orchestrations/_magentic.py | Changes Magentic terminal output to AgentResponse; adds participant emit_intermediate_data plumbing; restricts output executor to orchestrator. |
| python/packages/orchestrations/agent_framework_orchestrations/_handoff.py | Removes terminal yield; relies on per-agent output events as the observable result. |
| python/packages/orchestrations/agent_framework_orchestrations/_group_chat.py | Changes GroupChat completion to yield an AgentResponse; uses participant emit_intermediate_data; restricts outputs to orchestrator. |
| python/packages/orchestrations/agent_framework_orchestrations/_concurrent.py | Changes default aggregator to yield AgentResponse (assistant replies only) and uses participant emit_intermediate_data for intermediate observation. |
| python/packages/orchestrations/agent_framework_orchestrations/_base_group_chat_orchestrator.py | Updates base orchestrator termination/max-round outputs to yield AgentResponse completion messages. |
| python/packages/core/tests/workflow/test_agent_executor.py | Adds core tests asserting emit_intermediate_data produces data events in both streaming and non-streaming modes. |
| python/packages/core/agent_framework/_workflows/_agent_executor.py | Implements emit_intermediate_data by emitting WorkflowEvent.emit(...)/type='data' alongside existing outputs. |
| python/packages/core/agent_framework/_workflows/_agent.py | Updates WorkflowAgent to consume data events carrying AgentResponse / AgentResponseUpdate as part of agent responses. |

Comment thread python/samples/03-workflows/agents/sequential_workflow_as_agent.py Outdated
Comment thread python/packages/orchestrations/agent_framework_orchestrations/_concurrent.py Outdated
Contributor Author

@moonbox3 moonbox3 left a comment


Automated Code Review

Reviewers: 3 | Confidence: 82%

✓ Correctness

This PR refactors orchestration output from list[Message] to AgentResponse and introduces emit_intermediate_data on AgentExecutor to surface intermediate participants via data events. The core mechanics are sound: output_executors is now always set (not conditioned on intermediate_outputs), intermediate agents emit data events, and the _agent.py adapter layer correctly processes both output and data events. The existing unresolved review comments remain valid (AgentApprovalExecutor as last participant in sequential, sample file stale imports/docstrings, data-event propagation through WorkflowExecutor, and docstring ordering in concurrent aggregator). I found no additional correctness bugs beyond those already raised.

✓ Security Reliability

This PR refactors orchestration outputs from list[Message] to AgentResponse and introduces emit_intermediate_data on AgentExecutor to surface intermediate participants as data events while reserving output events for the terminal answer. The design is clean and the output_executors / data-event split is well-implemented. The key security/reliability issues already identified in previous review comments (AgentApprovalExecutor as terminal participant producing no output, docstring ordering inaccuracy, data event forwarding in WorkflowExecutor) remain unresolved and still apply. One new concern: the handoff workflow removes all terminal yield_output calls, meaning a termination condition that fires before any agent speaks (e.g., from user input alone) will cause the workflow to go idle with zero output events — consumers relying on at least one output would see silent completion.

✓ Test Coverage

The PR introduces a significant refactoring of orchestration output contracts (from list[Message] to AgentResponse) and adds emit_intermediate_data wiring to surface per-agent responses as data events. Test coverage for sequential, group-chat, and magentic workflows is solid, with good new tests for non-streaming, streaming, intermediate outputs, and as_agent scenarios. However, there are notable gaps: ConcurrentBuilder's intermediate_outputs=True path is completely untested despite new wiring, the handoff async-termination test was weakened to a bare IDLE-state check, and sequential intermediate_outputs is only tested in non-streaming mode.


Automated review by moonbox3's agents

Comment thread python/packages/orchestrations/tests/test_concurrent.py
Copilot and others added 7 commits April 16, 2026 08:39
1. Sample cleanup: Remove commented-out FoundryChatClient block and update
   prerequisites to reference OPENAI_CHAT_MODEL_ID instead of FOUNDRY_* vars.

2. Sequential approval output: Change _EndWithConversation.end_with_agent_executor_response
   from a no-op sink to yield response.agent_response. When the last participant is
   AgentApprovalExecutor (via with_request_info), _EndWithConversation is the output
   executor so the yield produces the terminal answer. When the last participant is a
   regular AgentExecutor, _EndWithConversation is not in output_executors so the yield
   is silently filtered out.

3. Forward data events through WorkflowExecutor: _process_workflow_result now also
   forwards 'data' events from sub-workflows so that emit_intermediate_data=True on
   AgentExecutor works correctly when wrapped in AgentApprovalExecutor.

4. Concurrent docstring: Update _AggregateAgentConversations docstring to say
   'deterministic participant order' instead of 'completion order'.

5. Add test_concurrent_intermediate_outputs_emits_data_events verifying that
   ConcurrentBuilder(intermediate_outputs=True) emits per-participant data events
   alongside the single aggregated output event.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…outputs (microsoft#5301)

Address PR review comments 2, 3, and 5:

- Add test_sequential_request_info_last_participant_emits_output:
  Verifies that when the last participant is wrapped via with_request_info()
  (AgentApprovalExecutor), the workflow still emits a terminal output after
  approval, exercising the _EndWithConversation.end_with_agent_executor_response
  fallback path.

- Add test_sequential_request_info_with_intermediate_outputs_emits_data_events:
  Verifies that emit_intermediate_data=True works correctly through
  AgentApprovalExecutor wrapping—WorkflowExecutor._process_result already
  forwards data events from sub-workflows, so intermediate agent responses
  surface as data events in the parent workflow.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…#5301)

Update cast() calls in _group_chat.py and _magentic.py to use
WorkflowContext[Never, AgentResponse] instead of the old
WorkflowContext[Never, list[Message]], matching the updated method
signatures in _base_group_chat_orchestrator.py.

Fix _sequential.py _EndWithConversation.end_with_agent_executor_response
to declare WorkflowContext[Any, AgentResponse] so yield_output accepts
AgentResponse[None].

Fix _workflow_executor.py data event forwarding to handle nullable
executor_id.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract event.data into a typed local variable before the isinstance
check to avoid pyright narrowing it to AgentResponse[Unknown].

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…icrosoft#5301)

Add pyright: ignore[reportMissingImports] to orjson imports that are
already guarded by try/except ImportError, matching the existing pattern
used elsewhere in the samples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member

@eavanvalkenburg eavanvalkenburg left a comment


small nit on the env var names, otherwise good to go

Comment thread python/samples/03-workflows/agents/sequential_workflow_as_agent.py Outdated
Comment thread python/samples/03-workflows/agents/sequential_workflow_as_agent.py Outdated
Comment thread python/samples/03-workflows/agents/sequential_workflow_as_agent.py
Reverts the mistaken switch from FoundryChatClient to OpenAIChatClient
in the sequential workflow as agent sample.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@lokitoth lokitoth added the needs_port_to_dotnet label (Indicate this item needs to also be done for .Net) Apr 16, 2026
Comment thread python/packages/core/agent_framework/_workflows/_agent_executor.py Outdated
Comment thread python/packages/core/agent_framework/_workflows/_agent.py
Comment thread python/packages/core/agent_framework/_workflows/_agent.py
… reasoning conversion

Layered on top of the prior review-feedback work in this branch.

Renames:
- AgentExecutor.emit_intermediate_data -> emit_data_events (mechanical
  rename; orchestration semantics live at the orchestration layer, not
  the general-purpose executor). Forwarded through MagenticAgentExecutor,
  AgentApprovalExecutor, and all orchestration call sites.
- HandoffAgentExecutor._check_terminate_and_yield -> _should_terminate
  (pure predicate; no longer yields anything). HandoffBuilder docstring
  rewritten to describe the new per-agent AgentResponse output contract.

WorkflowAgent reasoning-content conversion:
- Add _rewrite_text_to_reasoning(contents) and _msg_as_reasoning(msg)
  helpers; the as_agent() path now reframes text content from data events
  as text_reasoning Content blocks before merging into the AgentResponse.
- Consumers iterate msg.contents and branch on content.type — same path
  they already use for Claude thinking and OpenAI reasoning. No new
  field on Message/AgentResponse/WorkflowEvent.
- Streaming branch constructs fresh AgentResponseUpdate instances instead
  of mutating shared payloads (regression test added).
- Helper _msg_maybe_reasoning consolidates the conditional rewrite at
  three call sites in the non-streaming conversion.

Tests:
- TestWorkflowAgentReasoningHelpers + TestWorkflowAgentDataEventReasoningConversion
  add 9 new tests covering helpers, non-streaming, streaming, mixed content,
  already-reasoning passthrough, and mutation-safety regression.
- Updated test_sequential_as_agent_with_intermediate_outputs_includes_chain
  to assert text_reasoning content for intermediate agents.
@moonbox3 moonbox3 requested a review from TaoChenOSU April 17, 2026 04:05
The streaming conversion path narrowed event.data via isinstance against
generic AgentResponse, producing AgentResponse[Unknown] and tripping
reportUnknownVariableType/reportUnknownMemberType. Binding data: Any
before the check keeps runtime behavior identical while restoring a fully
known type for downstream access.
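
The typing pattern this commit message describes can be sketched as follows. Response is a hypothetical generic class standing in for AgentResponse, and text_from_event is an illustrative helper; the claim about how pyright narrows the type is taken from the commit message rather than verified here.

```python
from typing import Any, Generic, Optional, TypeVar

T = TypeVar("T")


class Response(Generic[T]):
    """Hypothetical generic payload standing in for AgentResponse."""

    def __init__(self, text: str, value: Optional[T] = None) -> None:
        self.text = text
        self.value = value


def text_from_event(event_data: object) -> Optional[str]:
    # Bind to Any before the isinstance check. Per the commit message, checking
    # the original variable against the unparameterized generic narrowed it to
    # Response[Unknown] and tripped strict pyright rules.
    data: Any = event_data
    if isinstance(data, Response):
        return data.text
    return None


print(text_from_event(Response("hello")))
print(text_from_event("not a response"))
```

Runtime behavior is the same with or without the Any binding; the change only affects what the type checker infers.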

Labels

needs_port_to_dotnet (Indicate this item needs to also be done for .Net), python

5 participants