workaround: task use-after-free on MSVC due to symmetric transfer codegen bug #2020

Closed
lixin-wei wants to merge 1 commit into NVIDIA:main from lixin-wei:fix_msvc

@lixin-wei (Contributor) commented Apr 12, 2026

The Problem

I hit a segmentation fault on MSVC (_MSC_VER 1944).

The root cause appears to be a bug in MSVC's code generation for symmetric transfer. It generates something like:

// MSVC's compiler-generated pseudocode:
handle = awaiter.await_suspend(current_coro);
current_coro.frame->__temp = handle;   // store into suspended frame
handle.resume();                        // regular call, not tail-jump

That intermediate write back into the suspended coroutine's frame is the problem. In stdexec's task, await_suspend calls __completed(), which destroys the coroutine frame (via __sink(task)) as part of its cleanup. When await_suspend returns, the frame is already freed — but MSVC still writes the returned handle into it, causing a write-after-free crash.

I searched around, but this bug does not appear to be fixed yet.

The Workaround

This PR works around the bug by disabling symmetric transfer on MSVC.
Doing so makes test_task_awaits_inline_sndr_without_stack_overflow fail, because that test depends on symmetric transfer, so I also disabled it on MSVC.

MSVC does not implement symmetric transfer as a true tail call (DevCom
#10454102).  When await_suspend returns coroutine_handle<>, MSVC writes
the returned handle back into the suspended coroutine's frame before
transferring to it.  In task<T>'s __completed_awaiter, __completed()
frees the coroutine frame (via __sink, or indirectly through the
set_value/set_error/set_stopped completion chain), so that write hits
freed memory -- a use-after-free that causes crashes under concurrent
task completions.

Fix: on MSVC, use the void-returning await_suspend overload and call
resume() explicitly.  This avoids the stale write entirely.

Trade-off: this loses MSVC's (already non-tail-call) symmetric transfer
in __completed_awaiter, so deeply nested co_await chains of task<T>
within task<T> will grow the call stack O(N) instead of O(1).  The
stack overflow test iteration count is reduced on MSVC accordingly.

Also adds two regression tests that exercise concurrent task completion
paths (flat spawn_future + recursive tree) to cover the bug.

Made-with: Cursor

copy-pr-bot bot commented Apr 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@lixin-wei (Contributor, Author) commented

This is fixed in _MSC_VER 1950, so I'm closing this PR.
