gh-148653: Fix SIGSEGV in marshal.loads for self-referencing tuples#148652
gh-148653: Fix SIGSEGV in marshal.loads for self-referencing tuples#148652mjbommar wants to merge 1 commit intopython:mainfrom
Conversation
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
Trivial self-reference is broken. Pushing fix in a moment |
8b1fa09 to
7c214ea
Compare
|
@serhiy-storchaka - should I force push to trigger a retest? There was a flakey upstream apt failure |
|
I re-ran it. |
|
Try to load |
Good catch! I found a few other weird cases that regress:
|
|
There are many other examples, it was just the simplest case. It seems you searched examples by some kind of fuzz generator (or just asked LLM?). You cannot be sure that that you will cover all cases. But they can be found by simple combinatorics. I'll add more tests. Some of cases can be created in Python, others (like self-containing tuple) can only be artificially constructed marshal data. We should support the former, because they can occur in real program. We should prevent crashes and creation of inconsistent data structures when loading the latter. I am not sure that changing |
I ran a simple test with
I'm beginning to appreciate your last sentence here...I think a proper fix for this would likely end up causing a performance impact. If you want a PR to expand the test coverage so no one makes the same mistake I did, I'd be happy to submit it for now and defer the design questions to you all. |
|
See #148700 |
PR summary
Fix a deterministic SIGSEGV in
marshal.loads()caused byTYPE_TUPLE,TYPE_LIST,TYPE_DICT, andTYPE_SETregistering containers inp->refsviaR_REF()before their slots are populated. A crafted payload with aTYPE_REFback-reference to the partial container can reach a hashing site (PySet_Addcallingtuplehash), which callsPyObject_Hash(NULL)on unfilled slots.The fix adopts the existing two-phase
r_ref_reserve()/r_ref_insert()pattern already used byTYPE_FROZENSET,TYPE_CODE, andTYPE_SLICE. ThePy_Noneplaceholder inp->refsis detected by the existingTYPE_REFhandler at line 1675, which raisesValueError("bad marshal data (invalid reference)")instead of crashing.16-byte reproducer:
Includes regression tests for tuple, list, set, and dict self-reference payloads.
Originally filed as GHSA-m7gv-g5p9-9qqq; PSRT assessed as outside the security threat model (marshal.loads is documented as not secure against malicious data). Converting to public issue + PR per their guidance.
First time committer introduction
I use Python extensively in data science and legal tech. I found this while building an automated vulnerability scanner seeded from prior CPython CVE fix diffs (CVE-2018-20406, CVE-2022-48560). Happy to iterate on the patch.
AI Disclosure
Claude Code (Anthropic) assisted with grepping the codebase, drafting the byte-stream reproducer, and constructing the regression tests. I reviewed and understand all code changes. The fix follows the existing
r_ref_reserve/r_ref_inserttwo-phase pattern already used by TYPE_FROZENSET/TYPE_CODE/TYPE_SLICE in the same file.