fix(arun_many): implement total_timeout and global dispatcher watchdog to prevent hangs (#1914) by DevAbdullah90 · Pull Request #1923 · unclecode/crawl4ai

DevAbdullah90 · 2026-04-16T11:43:36Z

Problem

Issue #1914 reported that arun_many() could hang indefinitely despite having a page_timeout set. Investigation revealed that page_timeout only covers the initial navigation (page.goto), but does not protect against hangs in:

- Long-running JavaScript execution (js_code).
- Complex CSS extraction or structural parsing.
- Occasional browser/context hangs during post-processing.

Because the dispatchers were simply awaiting the arun() task, any hang within a single URL would block a concurrency slot indefinitely and could eventually stall the entire batch.
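The stall described above can be reproduced without a browser. The following is a minimal sketch using only asyncio; `fake_crawl`, `run_batch`, and `demo` are hypothetical stand-ins for illustration and are not part of crawl4ai. With a single concurrency slot, one task that never returns keeps every other URL waiting.

```python
import asyncio


async def fake_crawl(url: str) -> str:
    # Hypothetical stand-in for arun(): the "hang" URL never completes.
    if url == "hang":
        await asyncio.Event().wait()  # the event is never set
    await asyncio.sleep(0.01)
    return f"ok:{url}"


async def run_batch(urls, max_concurrent: int = 1):
    sem = asyncio.Semaphore(max_concurrent)

    async def worker(url: str) -> str:
        async with sem:
            # No watchdog here: the slot is held for as long as the crawl runs.
            return await fake_crawl(url)

    tasks = [asyncio.create_task(worker(u)) for u in urls]
    # Observe the batch for a short window instead of waiting forever.
    done, pending = await asyncio.wait(tasks, timeout=0.2)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


def demo():
    # The hung URL takes the only slot first, so nothing else can start.
    return asyncio.run(run_batch(["hang", "a", "b"]))
```

With a larger `max_concurrent` the batch only stalls once enough hung URLs have accumulated to occupy every slot, which matches the "could eventually stall" wording above.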

Solution

This PR introduces a global watchdog at the dispatcher level and a new configuration parameter to provide strict timeout guarantees for every URL in a batch.

New Configuration: Added total_timeout (ms) to CrawlerRunConfig. This defines the maximum allowed time for the entire crawl operation (navigation + JS + extraction).
Dispatcher Watchdog:
- Updated both MemoryAdaptiveDispatcher and SemaphoreDispatcher to wrap the arun call in an asyncio.wait_for watchdog.
- Fallback Logic: If total_timeout is not set explicitly, the watchdog defaults to page_timeout plus 30 seconds, so legacy configurations that only set page_timeout are still protected.
Graceful Error Handling:
- Catch asyncio.TimeoutError at the dispatcher level.
- Log a clear TIMEOUT status with self.crawler.logger.error_status.
- Return a failed CrawlResult with a descriptive message: "Crawl task exceeded total timeout of X seconds".
- This allows the rest of the batch to continue processing uninterrupted.

Changes

Modified crawl4ai/async_configs.py: Added total_timeout parameter.
Modified crawl4ai/async_dispatcher.py: Implemented asyncio.wait_for in both dispatchers.
Added tests/unit/test_dispatcher_timeout.py: New unit tests for explicit and fallback timeout enforcement.
Updated pyproject.toml: Moved pytest and pytest-asyncio to the dev dependency group.

Verification Results

Reproduction Case: Verified with a script that injected a while(true){} hang in js_code.
- Before: Hung indefinitely.
- After: Timed out gracefully according to the configured total_timeout and returned the summary correctly.
Unit Tests: Ran the new test suite; both tests passed.
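The before/after behaviour can be approximated without a browser. The sketch below is not the reproduction script or the test file from this PR; `fake_crawl`, `run_batch`, and `summarize` are hypothetical names, and plain dictionaries stand in for crawl results. It shows a hung URL being timed out inside its concurrency slot so that the remaining URLs still complete.

```python
import asyncio


async def fake_crawl(url: str) -> dict:
    # Hypothetical stand-in for arun(): the "hang" URL never completes.
    if url == "hang":
        await asyncio.Event().wait()
    await asyncio.sleep(0.01)
    return {"url": url, "success": True}


async def run_batch(urls, timeout_s: float, max_concurrent: int = 1):
    sem = asyncio.Semaphore(max_concurrent)

    async def worker(url: str) -> dict:
        async with sem:
            try:
                # The watchdog bounds how long this slot can stay occupied.
                return await asyncio.wait_for(fake_crawl(url), timeout=timeout_s)
            except asyncio.TimeoutError:
                return {"url": url, "success": False}

    # gather preserves input order, so results line up with urls.
    return await asyncio.gather(*(worker(u) for u in urls))


def summarize(results) -> tuple:
    # Returns (succeeded, failed) counts for the batch.
    ok = sum(1 for r in results if r["success"])
    return ok, len(results) - ok
```

Calling `asyncio.run(run_batch(["hang", "a", "b"], timeout_s=0.1))` returns one failed entry for the hung URL and two successful ones, even with a single concurrency slot.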

Fixes #1914

uv run python -m pytest tests/unit/test_dispatcher_timeout.py
# 2 passed in 6.65s

Commit 4e5511c: fix(arun_many): implement total_timeout and global dispatcher watchdog to prevent hangs (Issue unclecode#1914)

DevAbdullah90 wants to merge 1 commit into unclecode:main from DevAbdullah90:fix/arun-many-timeout
