feat(brightdata): add Bright Data integration with 8 tools #4183
Merged
6 commits
6191590 feat(brightdata): add Bright Data integration with 8 tools (waleedlatif1)
d710533 fix(brightdata): address PR review feedback (waleedlatif1)
12fed93 lint (waleedlatif1)
703058c fix(agiloft): change bgColor to white; fix docs truncation (waleedlatif1)
bf0286a fix(brightdata): avoid inner quotes in description to fix docs genera… (waleedlatif1)
eb50282 fix(brightdata): disable incompatible DuckDuckGo and Yandex URL params (waleedlatif1)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,201 @@ | ||
| --- | ||
| title: Bright Data | ||
| description: Scrape websites, search engines, and extract structured data | ||
| --- | ||
|
|
||
| import { BlockInfoCard } from "@/components/ui/block-info-card" | ||
|
|
||
| <BlockInfoCard | ||
| type="brightdata" | ||
| color="#FFFFFF" | ||
| /> | ||
|
|
||
| ## Usage Instructions | ||
|
|
||
| Integrate Bright Data into the workflow. Scrape any URL with Web Unlocker, search Google and other engines with SERP API, discover web content ranked by intent, or trigger pre-built scrapers for structured data extraction. | ||
|
|
||
|
|
||
|
|
||
| ## Tools | ||
|
|
||
| ### `brightdata_scrape_url` | ||
|
|
||
| Fetch content from any URL using Bright Data Web Unlocker. Bypasses anti-bot protections, CAPTCHAs, and IP blocks automatically. | ||
|
|
||
| #### Input | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | --------- | ---- | -------- | ----------- | | ||
| | `apiKey` | string | Yes | Bright Data API token | | ||
| | `zone` | string | Yes | Web Unlocker zone name from your Bright Data dashboard \(e.g., "web_unlocker1"\) | | ||
| | `url` | string | Yes | The URL to scrape \(e.g., "https://example.com/page"\) | | ||
| | `format` | string | No | Response format: "raw" for HTML or "json" for parsed content. Defaults to "raw" | | ||
| | `country` | string | No | Two-letter country code for geo-targeting \(e.g., "us", "gb"\) | | ||
|
|
||
| #### Output | ||
|
|
||
| | Parameter | Type | Description | | ||
| | --------- | ---- | ----------- | | ||
| | `content` | string | The scraped page content \(HTML or JSON depending on format\) | | ||
| | `url` | string | The URL that was scraped | | ||
| | `statusCode` | number | HTTP status code of the response | | ||
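
Reviewer note: the input table for `brightdata_scrape_url` could be mirrored by a small validation step before the tool is called. The sketch below is only an illustration under stated assumptions. `ScrapeUrlParams` and `normalizeScrapeUrlParams` are hypothetical names that do not appear in this PR, and the http(s) check is an assumption rather than documented behavior.

```typescript
// Hypothetical types and helper; field names mirror the Input table above.
type ScrapeFormat = "raw" | "json"

interface ScrapeUrlParams {
  apiKey: string
  zone: string
  url: string
  format?: ScrapeFormat
  country?: string
}

// Checks the three required fields, then fills in the documented default.
function normalizeScrapeUrlParams(
  params: ScrapeUrlParams
): ScrapeUrlParams & { format: ScrapeFormat } {
  const required = ["apiKey", "zone", "url"] as const
  for (const key of required) {
    if (params[key].trim() === "") {
      throw new Error(`Missing required parameter: ${key}`)
    }
  }
  // Assumption: only http(s) URLs make sense as scrape targets.
  if (!/^https?:\/\//i.test(params.url)) {
    throw new Error("url must start with http:// or https://")
  }
  if (params.country !== undefined && !/^[a-zA-Z]{2}$/.test(params.country)) {
    throw new Error("country must be a two-letter code")
  }
  const normalized: ScrapeUrlParams & { format: ScrapeFormat } = {
    ...params,
    format: params.format ?? "raw", // documented default
  }
  if (params.country !== undefined) {
    normalized.country = params.country.toLowerCase()
  }
  return normalized
}
```

Lowercasing `country` follows the lowercase examples in the table ("us", "gb"); whether the API itself is case-sensitive is not stated in the docs.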
|
|
||
| ### `brightdata_serp_search` | ||
|
|
||
| Search Google, Bing, DuckDuckGo, or Yandex and get structured search results using Bright Data SERP API. | ||
|
|
||
| #### Input | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | --------- | ---- | -------- | ----------- | | ||
| | `apiKey` | string | Yes | Bright Data API token | | ||
| | `zone` | string | Yes | SERP API zone name from your Bright Data dashboard \(e.g., "serp_api1"\) | | ||
| | `query` | string | Yes | The search query \(e.g., "best project management tools"\) | | ||
| | `searchEngine` | string | No | Search engine to use: "google", "bing", "duckduckgo", or "yandex". Defaults to "google" | | ||
| | `country` | string | No | Two-letter country code for localized results \(e.g., "us", "gb"\) | | ||
| | `language` | string | No | Two-letter language code \(e.g., "en", "es"\) | | ||
| | `numResults` | number | No | Number of results to return \(e.g., 10, 20\). Defaults to 10 | | ||
|
|
||
| #### Output | ||
|
|
||
| | Parameter | Type | Description | | ||
| | --------- | ---- | ----------- | | ||
| | `results` | array | Array of search results | | ||
| | ↳ `title` | string | Title of the search result | | ||
| | ↳ `url` | string | URL of the search result | | ||
| | ↳ `description` | string | Snippet or description of the result | | ||
| | ↳ `rank` | number | Position in search results | | ||
| | `query` | string | The search query that was executed | | ||
| | `searchEngine` | string | The search engine that was used | | ||
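
Reviewer note: a consumer of the `brightdata_serp_search` output might want the best-ranked hits first. The following is a minimal sketch, assuming that a lower `rank` means a higher position on the results page. The interface and function names are hypothetical and only mirror the Output table above.

```typescript
// Hypothetical shapes mirroring the Output table above.
interface SerpResult {
  title: string
  url: string
  description: string
  rank: number
}

interface SerpSearchOutput {
  results: SerpResult[]
  query: string
  searchEngine: string
}

// Returns the `limit` best-ranked results without mutating the input.
// Assumption: rank 1 is the top position.
function topResults(output: SerpSearchOutput, limit: number): SerpResult[] {
  return output.results
    .slice()
    .sort((a, b) => a.rank - b.rank)
    .slice(0, Math.max(0, limit))
}
```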
|
|
||
| ### `brightdata_discover` | ||
|
|
||
| AI-powered web discovery that finds and ranks results by intent. Returns up to 1,000 results with optional cleaned page content for RAG and verification. | ||
|
|
||
| #### Input | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | --------- | ---- | -------- | ----------- | | ||
| | `apiKey` | string | Yes | Bright Data API token | | ||
| | `query` | string | Yes | The search query \(e.g., "competitor pricing changes enterprise plan"\) | | ||
| | `numResults` | number | No | Number of results to return, up to 1000. Defaults to 10 | | ||
| | `intent` | string | No | Describes what the agent is trying to accomplish, used to rank results by relevance \(e.g., "find official pricing pages and change notes"\) | | ||
| | `includeContent` | boolean | No | Whether to include cleaned page content in results | | ||
| | `format` | string | No | Response format: "json" or "markdown". Defaults to "json" | | ||
| | `language` | string | No | Search language code \(e.g., "en", "es", "fr"\). Defaults to "en" | | ||
| | `country` | string | No | Two-letter ISO country code for localized results \(e.g., "us", "gb"\) | | ||
|
|
||
| #### Output | ||
|
|
||
| | Parameter | Type | Description | | ||
| | --------- | ---- | ----------- | | ||
| | `results` | array | Array of discovered web results ranked by intent relevance | | ||
| | ↳ `url` | string | URL of the discovered page | | ||
| | ↳ `title` | string | Page title | | ||
| | ↳ `description` | string | Page description or snippet | | ||
| | ↳ `relevanceScore` | number | AI-calculated relevance score for intent-based ranking | | ||
| | ↳ `content` | string | Cleaned page content in the requested format \(when includeContent is true\) | | ||
| | `query` | string | The search query that was executed | | ||
| | `totalResults` | number | Total number of results returned | | ||
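
Reviewer note: the `numResults` limit (default 10, up to 1000) and the `relevanceScore` field lend themselves to two small helpers. This is a sketch under assumptions: the helper names are hypothetical, the lower bound of 1 is not stated in the docs, and the filter assumes a higher `relevanceScore` means a more relevant result.

```typescript
const DISCOVER_MAX_RESULTS = 1000 // documented upper bound
const DISCOVER_DEFAULT_RESULTS = 10 // documented default

// Clamps a requested result count into the documented range.
// Assumption: at least one result must be requested.
function clampNumResults(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return DISCOVER_DEFAULT_RESULTS
  }
  return Math.min(DISCOVER_MAX_RESULTS, Math.max(1, Math.floor(requested)))
}

// Hypothetical shape mirroring the Output table above.
interface DiscoverResult {
  url: string
  title: string
  description: string
  relevanceScore: number
  content?: string
}

// Keeps results at or above a caller-chosen score threshold.
function filterByRelevance(
  results: DiscoverResult[],
  minScore: number
): DiscoverResult[] {
  return results.filter((result) => result.relevanceScore >= minScore)
}
```

The score scale is not documented here, so any threshold passed to `filterByRelevance` would need to be chosen empirically.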
|
|
||
| ### `brightdata_sync_scrape` | ||
|
|
||
| Scrape URLs synchronously using a Bright Data pre-built scraper and get structured results directly. Supports up to 20 URLs with a 1-minute timeout. | ||
|
|
||
| #### Input | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | --------- | ---- | -------- | ----------- | | ||
| | `apiKey` | string | Yes | Bright Data API token | | ||
| | `datasetId` | string | Yes | Dataset scraper ID from your Bright Data dashboard \(e.g., "gd_l1viktl72bvl7bjuj0"\) | | ||
| | `urls` | string | Yes | JSON array of URL objects to scrape, up to 20 \(e.g., \[\{"url": "https://example.com/product"\}\]\) | | ||
| | `format` | string | No | Output format: "json", "ndjson", or "csv". Defaults to "json" | | ||
| | `includeErrors` | boolean | No | Whether to include error reports in results | | ||
|
|
||
| #### Output | ||
|
|
||
| | Parameter | Type | Description | | ||
| | --------- | ---- | ----------- | | ||
| | `data` | array | Array of scraped result objects with fields specific to the dataset scraper used | | ||
| | `snapshotId` | string | Snapshot ID returned if the request exceeded the 1-minute timeout and switched to async processing | | ||
| | `isAsync` | boolean | Whether the request fell back to async mode \(true means use snapshot ID to retrieve results\) | | ||
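
Reviewer note: two details of `brightdata_sync_scrape` seem easy to get wrong: `urls` is a JSON string rather than an array, and the output may switch to async mode. The sketch below illustrates one way a caller might handle both. All names are hypothetical, and the exact output shape when `isAsync` is true is an assumption based on the table above.

```typescript
interface UrlEntry {
  url: string
}

const SYNC_SCRAPE_MAX_URLS = 20 // documented limit for the sync scraper

// Parses the JSON-string `urls` input and enforces the entry limit.
function parseUrlsInput(
  raw: string,
  maxUrls: number = SYNC_SCRAPE_MAX_URLS
): UrlEntry[] {
  const parsed: unknown = JSON.parse(raw)
  if (!Array.isArray(parsed)) {
    throw new Error("urls must be a JSON array")
  }
  if (parsed.length > maxUrls) {
    throw new Error(`urls supports at most ${maxUrls} entries`)
  }
  return parsed.map((entry: unknown, index: number) => {
    const candidate = entry as { url?: unknown } | null
    if (typeof candidate !== "object" || candidate === null || typeof candidate.url !== "string") {
      throw new Error(`urls[${index}] must be an object with a string url field`)
    }
    return { url: candidate.url }
  })
}

// Hypothetical shape mirroring the Output table above.
interface SyncScrapeOutput {
  data?: unknown[]
  snapshotId?: string
  isAsync: boolean
}

type SyncScrapeResolution =
  | { kind: "data"; data: unknown[] }
  | { kind: "pending"; snapshotId: string }

// Decides whether results are available now or must be fetched by snapshot ID.
function resolveSyncScrape(output: SyncScrapeOutput): SyncScrapeResolution {
  if (output.isAsync) {
    if (!output.snapshotId) {
      throw new Error("isAsync is true but no snapshotId was returned")
    }
    return { kind: "pending", snapshotId: output.snapshotId }
  }
  return { kind: "data", data: output.data ?? [] }
}
```

A `pending` resolution would then be followed up with `brightdata_snapshot_status` and `brightdata_download_snapshot`.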
|
|
||
| ### `brightdata_scrape_dataset` | ||
|
|
||
| Trigger a Bright Data pre-built scraper to extract structured data from URLs. Supports 660+ scrapers for platforms like Amazon, LinkedIn, Instagram, and more. | ||
|
|
||
| #### Input | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | --------- | ---- | -------- | ----------- | | ||
| | `apiKey` | string | Yes | Bright Data API token | | ||
| | `datasetId` | string | Yes | Dataset scraper ID from your Bright Data dashboard \(e.g., "gd_l1viktl72bvl7bjuj0"\) | | ||
| | `urls` | string | Yes | JSON array of URL objects to scrape \(e.g., \[\{"url": "https://example.com/product"\}\]\) | | ||
| | `format` | string | No | Output format: "json" or "csv". Defaults to "json" | | ||
|
|
||
| #### Output | ||
|
|
||
| | Parameter | Type | Description | | ||
| | --------- | ---- | ----------- | | ||
| | `snapshotId` | string | The snapshot ID to retrieve results later | | ||
| | `status` | string | Status of the scraping job \(e.g., "triggered", "running"\) | | ||
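
Reviewer note: since `urls` is typed as a string holding a JSON array of URL objects, callers that start from a plain list of URLs need a conversion step. A minimal sketch, with the hypothetical helper name `toUrlsParam`:

```typescript
// Converts plain URL strings into the JSON-string form the `urls` input expects,
// e.g. [{"url": "https://example.com/product"}].
function toUrlsParam(urls: string[]): string {
  return JSON.stringify(urls.map((url) => ({ url })))
}
```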
|
|
||
| ### `brightdata_snapshot_status` | ||
|
|
||
| Check the progress of an async Bright Data scraping job. Returns status: starting, running, ready, or failed. | ||
|
|
||
| #### Input | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | --------- | ---- | -------- | ----------- | | ||
| | `apiKey` | string | Yes | Bright Data API token | | ||
| | `snapshotId` | string | Yes | The snapshot ID returned when the collection was triggered \(e.g., "s_m4x7enmven8djfqak"\) | | ||
|
|
||
| #### Output | ||
|
|
||
| | Parameter | Type | Description | | ||
| | --------- | ---- | ----------- | | ||
| | `snapshotId` | string | The snapshot ID that was queried | | ||
| | `datasetId` | string | The dataset ID associated with this snapshot | | ||
| | `status` | string | Current status of the snapshot: "starting", "running", "ready", or "failed" | | ||
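
Reviewer note: the four documented status values suggest a simple polling loop between triggering a job and downloading it. The sketch below is illustrative only. `checkStatus` and `sleep` are injected, hypothetical callbacks standing in for a call to `brightdata_snapshot_status` and a timer, and the attempt limit and interval are arbitrary choices, not values from this PR.

```typescript
type SnapshotStatus = "starting" | "running" | "ready" | "failed"
type NextStep = "wait" | "download" | "abort"

// Maps each documented status to what a caller would do next.
function nextStepFor(status: SnapshotStatus): NextStep {
  switch (status) {
    case "starting":
    case "running":
      return "wait"
    case "ready":
      return "download"
    case "failed":
      return "abort"
  }
}

// Polls until the snapshot is ready or failed, or attempts run out.
async function waitForSnapshot(
  checkStatus: () => Promise<SnapshotStatus>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts: number,
  intervalMs: number
): Promise<"download" | "abort" | "timeout"> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const step = nextStepFor(await checkStatus())
    if (step !== "wait") {
      return step
    }
    await sleep(intervalMs)
  }
  return "timeout"
}
```

Injecting `sleep` keeps the loop testable without real timers; on a `timeout` outcome a caller might choose to invoke `brightdata_cancel_snapshot`.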
|
|
||
| ### `brightdata_download_snapshot` | ||
|
|
||
| Download the results of a completed Bright Data scraping job using its snapshot ID. The snapshot must have ready status. | ||
|
|
||
| #### Input | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | --------- | ---- | -------- | ----------- | | ||
| | `apiKey` | string | Yes | Bright Data API token | | ||
| | `snapshotId` | string | Yes | The snapshot ID returned when the collection was triggered \(e.g., "s_m4x7enmven8djfqak"\) | | ||
| | `format` | string | No | Output format: "json", "ndjson", "jsonl", or "csv". Defaults to "json" | | ||
| | `compress` | boolean | No | Whether to compress the results | | ||
|
|
||
| #### Output | ||
|
|
||
| | Parameter | Type | Description | | ||
| | --------- | ---- | ----------- | | ||
| | `data` | array | Array of scraped result records | | ||
| | `format` | string | The content type of the downloaded data | | ||
| | `snapshotId` | string | The snapshot ID that was downloaded | | ||
|
|
||
| ### `brightdata_cancel_snapshot` | ||
|
|
||
| Cancel an active Bright Data scraping job using its snapshot ID. Terminates data collection in progress. | ||
|
|
||
| #### Input | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | --------- | ---- | -------- | ----------- | | ||
| | `apiKey` | string | Yes | Bright Data API token | | ||
| | `snapshotId` | string | Yes | The snapshot ID of the collection to cancel \(e.g., "s_m4x7enmven8djfqak"\) | | ||
|
|
||
| #### Output | ||
|
|
||
| | Parameter | Type | Description | | ||
| | --------- | ---- | ----------- | | ||
| | `snapshotId` | string | The snapshot ID that was cancelled | | ||
| | `cancelled` | boolean | Whether the cancellation was successful | | ||
|
|
||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -18,6 +18,7 @@ | ||
| "attio", | ||
| "box", | ||
| "brandfetch", | ||
| "brightdata", | ||
| "browser_use", | ||
| "calcom", | ||
| "calendly", | ||
|
|
||