feat(rust): add configurable size guardrails by ayush00git · Pull Request #3579 · apache/fory

ayush00git · 2026-04-16T18:01:20Z

Why?

To prevent excessive allocation from malicious untrusted payloads in the Rust runtime.

What does this PR do?

This brings the Rust implementation into parity with the C++ runtime by introducing configurable guardrails for binary sizes and collection counts.

Related issues

#3409

AI Contribution Checklist

AI Usage Disclosure (only when substantial AI assistance = yes):

If yes, my PR description includes the required ai_review summary and screenshot evidence of the final clean AI review results from both fresh reviewers on the current PR diff or current HEAD after the latest code changes.

The implementation looks solid, matches the C++ reference implementation accurately, and is ready for the main branch. I've reviewed your code changes across config.rs, context.rs, and the serializer module, verifying edge cases and consistency.

Here are the key aspects that verify the correctness of the Rust implementation:

Guardrail Logic Parity with C++:

max_collection_size correctly intercepts untrusted payload lengths before large iterations or memory allocations occur in Vec, HashMap, BTreeMap, and other collection variants. This perfectly encapsulates the C++ read_collection_data_slow patterns.
max_binary_size accurately targets raw-byte slice operations across primitive_list.rs without inappropriately restricting types like String, mirroring how C++ omits this check inside string_serializer.h.
Accurate Error Propagation:

The addition of Error::SizeLimitExceeded works brilliantly with the new Fory configuration model. You're properly halting deserialization and passing clear payload dimensions to Error::size_limit_exceeded, ensuring graceful failure without panics or memory exhaustion.
Validation and Pre-Allocation Consistency:

Checking max_collection_size just prior to enforcing buffer limits inside check_collection_len and check_map_len is well-ordered. By prioritizing the runtime limit checks, you avoid false OOMs or deep iterations if Fory receives heavily corrupted payload headers.

Does this PR introduce any user-facing change?

Does this PR introduce any public API change? ((Yes, adds Fory configuration methods max_binary_size() and max_collection_size(), as well as Error::SizeLimitExceeded))
Does this PR introduce any binary protocol compatibility change?

Benchmark

ayush00git · 2026-04-16T18:12:13Z

Hey @chaokunyang
Have a look at the implementation and let me know the changes.
I don't write rust, but these issues (including streaming deserialization support) were pending from a long time, so I used AI to code but I have reviewed it.
Duplicate PR - #3421

chaokunyang · 2026-04-17T02:41:55Z

@ayush00git Could you run benchmarks/rust and compare with main branch?

ayush00git · 2026-04-17T05:44:17Z

main branch -

feat/rust-sizeguards branch -

ayush00git · 2026-04-17T05:47:45Z

some areas like MediaContentList serialization/deserialization, Sample serialization and StructList deserialization are showing regressions averagely of around 20%, i'll investigate these ones. most probably this is due to field type validations.

ayush00git · 2026-04-17T08:14:17Z

StructList and MediaContentList serialize calls still shows around 10% regression

feat/rust-sizeguards

## Benchmark Results

### Timing Results (nanoseconds)

| Datatype         | Operation   | fory (ns) | protobuf (ns) | Fastest |
| ---------------- | ----------- | --------- | ------------- | ------- |
| Struct           | Serialize   | 68.2      | 122.5         | fory    |
| Struct           | Deserialize | 37.9      | 64.8          | fory    |
| Sample           | Serialize   | 102.9     | 566.3         | fory    |
| Sample           | Deserialize | 162.6     | 868.7         | fory    |
| MediaContent     | Serialize   | 219.4     | 332.2         | fory    |
| MediaContent     | Deserialize | 280.4     | 599.8         | fory    |
| StructList       | Serialize   | 192.0     | 606.2         | fory    |
| StructList       | Deserialize | 143.2     | 444.3         | fory    |
| SampleList       | Serialize   | 391.3     | 4002.1        | fory    |
| SampleList       | Deserialize | 1279.0    | 4939.9        | fory    |
| MediaContentList | Serialize   | 856.0     | 2501.9        | fory    |
| MediaContentList | Deserialize | 1676.1    | 3206.9        | fory    |

### Throughput Results (ops/sec)

| Datatype         | Operation   | fory TPS   | protobuf TPS | Fastest |
| ---------------- | ----------- | ---------- | ------------ | ------- |
| Struct           | Serialize   | 14,665,552 | 8,161,267    | fory    |
| Struct           | Deserialize | 26,369,222 | 15,434,242   | fory    |
| Sample           | Serialize   | 9,721,007  | 1,765,880    | fory    |
| Sample           | Deserialize | 6,151,575  | 1,151,198    | fory    |
| MediaContent     | Serialize   | 4,558,716  | 3,010,144    | fory    |
| MediaContent     | Deserialize | 3,565,952  | 1,667,167    | fory    |
| StructList       | Serialize   | 5,208,605  | 1,649,702    | fory    |
| StructList       | Deserialize | 6,985,191  | 2,250,883    | fory    |
| SampleList       | Serialize   | 2,555,323  | 249,869      | fory    |
| SampleList       | Deserialize | 781,861    | 202,433      | fory    |
| MediaContentList | Serialize   | 1,168,170  | 399,696      | fory    |
| MediaContentList | Deserialize | 596,623    | 311,828      | fory    |

main

## Benchmark Results

### Timing Results (nanoseconds)

| Datatype         | Operation   | fory (ns) | protobuf (ns) | Fastest |
| ---------------- | ----------- | --------- | ------------- | ------- |
| Struct           | Serialize   | 67.5      | 123.3         | fory    |
| Struct           | Deserialize | 38.3      | 63.4          | fory    |
| Sample           | Serialize   | 101.4     | 561.7         | fory    |
| Sample           | Deserialize | 165.6     | 919.2         | fory    |
| MediaContent     | Serialize   | 213.0     | 332.2         | fory    |
| MediaContent     | Deserialize | 281.9     | 568.0         | fory    |
| StructList       | Serialize   | 175.2     | 678.8         | fory    |
| StructList       | Deserialize | 141.8     | 453.0         | fory    |
| SampleList       | Serialize   | 448.6     | 3831.5        | fory    |
| SampleList       | Deserialize | 1347.9    | 4977.6        | fory    |
| MediaContentList | Serialize   | 759.1     | 2429.7        | fory    |
| MediaContentList | Deserialize | 1665.3    | 3674.4        | fory    |

### Throughput Results (ops/sec)

| Datatype         | Operation   | fory TPS   | protobuf TPS | Fastest |
| ---------------- | ----------- | ---------- | ------------ | ------- |
| Struct           | Serialize   | 14,815,693 | 8,109,642    | fory    |
| Struct           | Deserialize | 26,132,177 | 15,766,902   | fory    |
| Sample           | Serialize   | 9,864,852  | 1,780,215    | fory    |
| Sample           | Deserialize | 6,040,471  | 1,087,903    | fory    |
| MediaContent     | Serialize   | 4,695,056  | 3,009,782    | fory    |
| MediaContent     | Deserialize | 3,547,861  | 1,760,563    | fory    |
| StructList       | Serialize   | 5,707,437  | 1,473,231    | fory    |
| StructList       | Deserialize | 7,052,684  | 2,207,652    | fory    |
| SampleList       | Serialize   | 2,229,008  | 260,994      | fory    |
| SampleList       | Deserialize | 741,895    | 200,900      | fory    |
| MediaContentList | Serialize   | 1,317,402  | 411,573      | fory    |
| MediaContentList | Deserialize | 600,492    | 272,153      | fory    |

…ed buffer space

ayush00git · 2026-04-17T14:02:06Z

@chaokunyang
The benches are now fine, just MediaContentList on serialization calls is showing a regression of around 6%, rest every parameter is either in noise or improved.

updated feat/rust-sizeguards bench -

Timing Results (nanoseconds)

Datatype	Operation	fory (ns)	protobuf (ns)	Fastest
Struct	Serialize	74.2	138.3	fory
Struct	Deserialize	38.4	69.8	fory
Sample	Serialize	102.5	578.6	fory
Sample	Deserialize	164.8	881.1	fory
MediaContent	Serialize	229.7	330.2	fory
MediaContent	Deserialize	287.9	523.5	fory
StructList	Serialize	172.4	652.9	fory
StructList	Deserialize	144.6	413.2	fory
SampleList	Serialize	399.5	3725.5	fory
SampleList	Deserialize	1269.1	5017.5	fory
MediaContentList	Serialize	808.5	2440.5	fory
MediaContentList	Deserialize	1646.0	3546.2	fory

Throughput Results (ops/sec)

Datatype	Operation	fory TPS	protobuf TPS	Fastest
Struct	Serialize	13,468,195	7,231,181	fory
Struct	Deserialize	26,035,565	14,319,262	fory
Sample	Serialize	9,754,194	1,728,250	fory
Sample	Deserialize	6,069,066	1,134,996	fory
MediaContent	Serialize	4,352,936	3,028,376	fory
MediaContent	Deserialize	3,473,790	1,910,220	fory
StructList	Serialize	5,801,474	1,531,581	fory
StructList	Deserialize	6,917,064	2,419,960	fory
SampleList	Serialize	2,503,317	268,420	fory
SampleList	Deserialize	787,960	199,302	fory
MediaContentList	Serialize	1,236,889	409,752	fory
MediaContentList	Deserialize	607,533	281,992	fory

ayush00git · 2026-04-17T14:10:23Z

@chaokunyang
Do we plan to roll out this PR and the streaming deserialization support (which is still pending) in the v0.17.0 release ?

feat(rust): add configurable size guardrails

7e31257

ayush00git requested review from chaokunyang and theweipeng as code owners April 16, 2026 18:01

feat: added size guard tests

8e058cf

fix: moved size check errors to a cold helper

a4f38ad

fix: move config.clone() to creation closure and defined a pre-reserv…

91c120d

…ed buffer space

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(rust): add configurable size guardrails#3579

feat(rust): add configurable size guardrails#3579
ayush00git wants to merge 4 commits intoapache:mainfrom
ayush00git:feat/rust-sizeguards

ayush00git commented Apr 16, 2026 •

edited

Loading

Uh oh!

ayush00git commented Apr 16, 2026

Uh oh!

chaokunyang commented Apr 17, 2026

Uh oh!

ayush00git commented Apr 17, 2026

Uh oh!

ayush00git commented Apr 17, 2026

Uh oh!

ayush00git commented Apr 17, 2026

Uh oh!

ayush00git commented Apr 17, 2026

Uh oh!

ayush00git commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ayush00git commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why?

What does this PR do?

Related issues

AI Contribution Checklist

Does this PR introduce any user-facing change?

Benchmark

Uh oh!

ayush00git commented Apr 16, 2026

Uh oh!

chaokunyang commented Apr 17, 2026

Uh oh!

ayush00git commented Apr 17, 2026

Uh oh!

ayush00git commented Apr 17, 2026

Uh oh!

ayush00git commented Apr 17, 2026

Uh oh!

ayush00git commented Apr 17, 2026

Timing Results (nanoseconds)

Throughput Results (ops/sec)

Uh oh!

ayush00git commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ayush00git commented Apr 16, 2026 •

edited

Loading