Convert heavy queries from 5xx to 4xx #7374

Open
eeldaly wants to merge 21 commits into cortexproject:master from eeldaly:query-4xx
@eeldaly eeldaly commented Mar 24, 2026

What this PR does:
This PR introduces a deadline in the querier (default 59s) that cancels a query shortly before the upstream request timeout is reached. When this deadline is hit, queries that spent longer than a configurable threshold (default 40s) in PromQL evaluation are returned as 4XX instead of 5XX. Both the conversion and the 1s-earlier timeout are disabled by default.

Default new configs:
querier.timeout-classification-enabled: false
querier.timeout-classification-deadline: 59s
querier.timeout-classification-eval-threshold: 40s
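
The decision described above can be illustrated with a minimal Go sketch. This is not the code from this PR: the `TimeoutClassification` struct, its field names, and the `classifyTimeout` and `defaultConfig` helpers are hypothetical, and the sketch only assumes the three defaults listed above plus the 504 and 422 status codes shown in the response outputs.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// TimeoutClassification mirrors the three settings listed above.
// The struct and field names are hypothetical, not taken from the PR diff.
type TimeoutClassification struct {
	Enabled       bool
	Deadline      time.Duration // querier cancels the query after this long
	EvalThreshold time.Duration // evaluation time above which the query counts as "heavy"
}

// defaultConfig returns the defaults stated in the PR description.
func defaultConfig() TimeoutClassification {
	return TimeoutClassification{
		Enabled:       false,
		Deadline:      59 * time.Second,
		EvalThreshold: 40 * time.Second,
	}
}

// classifyTimeout picks the HTTP status for a query that already hit the
// deadline, given how long it spent in PromQL evaluation.
func classifyTimeout(cfg TimeoutClassification, evalTime time.Duration) int {
	if cfg.Enabled && evalTime > cfg.EvalThreshold {
		// Heavy query: report a client-side error (4XX).
		return http.StatusUnprocessableEntity
	}
	// Feature disabled, or the time was spent elsewhere: keep the 5XX.
	return http.StatusGatewayTimeout
}

func main() {
	cfg := defaultConfig()
	cfg.Enabled = true
	fmt.Println(classifyTimeout(cfg, 45*time.Second))
	fmt.Println(classifyTimeout(cfg, 10*time.Second))
}
```

With the feature left disabled, the sketch always returns 504, matching the statement that the conversion is off by default.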

Response Outputs:

Current output on timeout:

'Response code: 504\n'
{'Date': 'Thu, 26 Mar 2026 20:59:06 GMT', 'Content-Type': 'text/plain', 'Content-Length': '24', 'Connection': 'keep-alive', 'x-amzn-RequestId': 'ee98f602-651c-4783-946e-f547a8d88e32', 'server': 'amazon'}
''
upstream request timeout

New output on timeout (Less than 40s PromQL evaluation time):

'Response code: 504\n'
{'Date': 'Thu, 26 Mar 2026 22:21:05 GMT', 'Content-Type': 'application/json', 'Content-Length': '75', 'Connection': 'keep-alive', 'x-amzn-RequestId': 'ec88ba9b-7d21-45f4-8996-418b2e7b812b', 'server': 'amazon', 'vary': 'Origin'}
''
{"status":"error","errorType":"timeout","error":"upstream request timeout"}

New output on timeout (More than 40s PromQL evaluation time):

'Response code: 422\n'
{'Date': 'Thu, 26 Mar 2026 21:05:46 GMT', 'Content-Type': 'application/json', 'Content-Length': '138', 'Connection': 'keep-alive', 'x-amzn-RequestId': 'bf5b69b7-8d11-4e9e-95be-09fbfa017bbd', 'server': 'amazon', 'vary': 'Origin'}
''
{"status":"error","errorType":"execution","error":"query timed out: query spent too long in evaluation - consider simplifying your query"}
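
For callers, the practical difference between the two new outputs is whether a retry makes sense. The Go sketch below shows one way a client might act on the status code and JSON body shown above. The `apiError` type and the `shouldRetry` policy are illustrative assumptions, not part of this PR or of any Cortex client library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// apiError matches the JSON error body shown in the outputs above.
type apiError struct {
	Status    string `json:"status"`
	ErrorType string `json:"errorType"`
	Error     string `json:"error"`
}

// shouldRetry is a hypothetical client policy: retry gateway timeouts,
// but not queries that were rejected as too expensive to evaluate.
func shouldRetry(statusCode int, body []byte) bool {
	if statusCode != http.StatusGatewayTimeout {
		// 422 (and anything else) is not treated as retryable in this sketch.
		return false
	}
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		// Plain-text body, as in the current (pre-change) 504 output.
		return true
	}
	return e.ErrorType == "timeout"
}

func main() {
	timeoutBody := []byte(`{"status":"error","errorType":"timeout","error":"upstream request timeout"}`)
	heavyBody := []byte(`{"status":"error","errorType":"execution","error":"query timed out: query spent too long in evaluation - consider simplifying your query"}`)

	fmt.Println(shouldRetry(http.StatusGatewayTimeout, timeoutBody))
	fmt.Println(shouldRetry(http.StatusUnprocessableEntity, heavyBody))
}
```

Retrying a 422 would most likely repeat the same expensive evaluation, which is why the sketch treats it as final and leaves it to the user to simplify the query.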

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
eeldaly added 2 commits March 26, 2026 15:24
eeldaly added 8 commits March 30, 2026 15:59
@eeldaly eeldaly marked this pull request as ready for review March 31, 2026 16:59
@dosubot dosubot bot added the component/querier, type/feature, and type/production labels Mar 31, 2026
eeldaly added 4 commits March 31, 2026 11:11
# The total time before the querier proactively cancels a query for timeout
# classification.
# CLI flag: -querier.timeout-classification-deadline
[timeout_classification_deadline: <duration> | default = 59s]
Do you think we can align some of the defaults with the default engine timeout in Cortex?

@yeya24 yeya24 left a comment

One last ask from me: please add a doc describing the use case and how users can use this feature.

eeldaly and others added 6 commits April 14, 2026 09:44