High-performance web scraping API that renders any website in a browser and returns the HTML, with support for browser instructions
If you need to scrape a website, Piloterr’s Website Crawler API is built for the job. It is a high-performance API that returns the HTML content of any website, including those that rely on JavaScript rendering, and it can bypass Cloudflare, Akamai, PerimeterX, and DataDome. It requires only a website URL and an API key. Optional parameters control the API’s behavior, such as following redirects, waiting for JavaScript to render, and waiting for specific elements to appear in the DOM. Typical uses are web scraping, data extraction, and content analysis.
For security and ethical reasons, Piloterr implements a comprehensive filtering system for websites accessible through this endpoint. This measure helps prevent misuse and ensures compliance with legal and ethical standards. If you wish to use this endpoint, please contact our support team to request the addition of your domain to our allowlist. We’ll review each request carefully, considering factors such as content appropriateness, legal compliance, and potential impact on our infrastructure. This process helps maintain the quality and reliability of our service for all users.
This parameter specifies the private key you’ll need for Piloterr access.
A website URL with either the http or https protocol.
Some code-heavy websites need time to fully render. To make Piloterr wait before it returns the rendered HTML, use the wait parameter with a value in seconds between 0 and 30. The Piloterr headless browsers will then wait for that duration before returning the page’s HTML.
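The 0 to 30 second range can be checked on the client before a request is sent. The following is a minimal sketch, not Piloterr's own client code: the helper `build_query` is hypothetical, and the parameter name `query` for the target URL is an assumption, since this page does not name it. Only `wait` and its range come from the text above.

```python
from urllib.parse import urlencode


def build_query(target_url: str, wait: int = 0) -> str:
    """Return a URL-encoded query string with a validated `wait` value."""
    if not target_url.startswith(("http://", "https://")):
        raise ValueError("target_url must use the http or https protocol")
    if not 0 <= wait <= 30:
        raise ValueError("wait must be between 0 and 30 seconds")
    # "query" is an assumed placeholder name for the URL parameter.
    return urlencode({"query": target_url, "wait": wait})


print(build_query("https://example.com", wait=5))
# prints: query=https%3A%2F%2Fexample.com&wait=5
```

Validating locally avoids spending a request on a value the API would reject.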
It’s sometimes necessary to wait for a particular element to appear in the DOM before Piloterr returns the HTML content. Our headless browsers will wait for the CSS or XPath selector passed in this parameter before returning the HTML. For example:
wait_for=#loading-container
wait_for=.content-loaded
wait_for=div.main-content#user-profile
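One practical detail when sending these selectors in a query string: a literal `#` starts the URL fragment, so it has to be percent-encoded or everything after it will not reach the server. A short sketch using Python's standard library, assuming the selector is passed as a query parameter:

```python
from urllib.parse import urlencode

selectors = [
    "#loading-container",
    ".content-loaded",
    "div.main-content#user-profile",
]

# urlencode percent-encodes "#" as "%23" so it is not read as a fragment.
encoded = [urlencode({"wait_for": selector}) for selector in selectors]

for line in encoded:
    print(line)
# prints:
# wait_for=%23loading-container
# wait_for=.content-loaded
# wait_for=div.main-content%23user-profile
```

Most HTTP clients do this encoding automatically when parameters are passed as a dictionary rather than concatenated into the URL by hand.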
By default, Piloterr does not block ads. Set this parameter to true to block ads.
Set this parameter to the maximum number of seconds Piloterr will wait for the page to load.
An array of browser navigation instructions to execute during page rendering. This allows you to control scrolling and other browser actions to trigger dynamic content loading or simulate human behavior.
Currently supports two instruction types:
scroll
scroll_to_bottom
Please refer to the Browser Instructions section for more details.
This functionality allows executing navigation instructions in the browser to control scrolling and other actions.
scroll
Allows scrolling the page by a precise number of pixels.
Parameters:
type: "scroll"
x: Number of pixels to scroll horizontally (positive = right, negative = left)
y: Number of pixels to scroll vertically (positive = down, negative = up)
duration: (optional) Duration in seconds for smooth scrolling (default: 0 = instant)
wait_time_s: (optional) Wait time in seconds after the instruction (default: 0)
Example:
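A minimal sketch of a `scroll` instruction built from the fields listed above. The helper `scroll_instruction` is hypothetical and exists only for illustration; the field names and defaults are the documented ones.

```python
import json


def scroll_instruction(x, y, duration=0, wait_time_s=0):
    """Build a `scroll` instruction dict using the documented fields."""
    return {
        "type": "scroll",
        "x": x,                      # horizontal pixels (positive = right)
        "y": y,                      # vertical pixels (positive = down)
        "duration": duration,        # seconds of smooth scrolling, 0 = instant
        "wait_time_s": wait_time_s,  # pause in seconds after the scroll
    }


# Scroll down 1000 pixels over 2 seconds, then pause for 1 second.
instruction = scroll_instruction(x=0, y=1000, duration=2, wait_time_s=1)
print(json.dumps(instruction))
# prints: {"type": "scroll", "x": 0, "y": 1000, "duration": 2, "wait_time_s": 1}
```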
scroll_to_bottom
Automatically scrolls to the bottom of the page.
Parameters:
type: "scroll_to_bottom"
duration: (optional) Duration in seconds for smooth scrolling (default: 0 = instant)
wait_time_s: (optional) Wait time in seconds after the instruction (default: 0)
Example:
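A minimal sketch of a `scroll_to_bottom` instruction using only the documented fields. The values 3 and 1 are illustrative.

```python
import json

# Scroll smoothly to the bottom over 3 seconds, then pause for 1 second
# so that lazily loaded content has a chance to appear.
instruction = {"type": "scroll_to_bottom", "duration": 3, "wait_time_s": 1}

print(json.dumps(instruction))
# prints: {"type": "scroll_to_bottom", "duration": 3, "wait_time_s": 1}
```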
Add the browser_instructions parameter to your request with a list of instructions:
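The sketch below shows one way to assemble such a request's query string. It rests on two assumptions that this page does not confirm: that the list is sent as a JSON-encoded query parameter, and that the target URL parameter is named `query` (a placeholder). No request is sent; the code only builds and checks the encoded parameters.

```python
import json
from urllib.parse import parse_qs, urlencode

browser_instructions = [
    {"type": "scroll", "x": 0, "y": 800, "duration": 1, "wait_time_s": 1},
    {"type": "scroll_to_bottom", "duration": 2},
]

params = {
    "query": "https://example.com",  # assumed placeholder parameter name
    "timeout": 30,                   # illustrative value, in seconds
    # Assumption: the instruction list travels as compact JSON text.
    "browser_instructions": json.dumps(browser_instructions, separators=(",", ":")),
}

query_string = urlencode(params)

# Round-trip check: decoding the query string recovers the original list.
decoded = json.loads(parse_qs(query_string)["browser_instructions"][0])
print(decoded == browser_instructions)
```

Instructions are listed in the order they should run, so the scroll by 800 pixels comes before the scroll to the bottom.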
The duration parameter controls the smoothness of scrolling:
duration: 0 (default): Instant scroll, immediate movement to final position
duration: > 0: Progressive scroll over the specified duration
Adjust timeout if using significant durations
Smooth scrolling (duration > 0) takes more time but better simulates human behavior
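To pick a timeout that leaves room for smooth scrolling, it can help to add up the time the instructions themselves will take. The helper below is a hypothetical heuristic, not part of the API. It assumes instructions run one after another and that their `duration` and `wait_time_s` count against the timeout, neither of which this page states explicitly.

```python
def minimum_timeout(instructions, page_load_s=0.0):
    """Estimate a lower bound, in seconds, for the timeout parameter."""
    total = page_load_s
    for instruction in instructions:
        # Both fields are optional and default to 0 per the parameter lists.
        total += instruction.get("duration", 0)
        total += instruction.get("wait_time_s", 0)
    return total


instructions = [
    {"type": "scroll", "x": 0, "y": 1000, "duration": 2, "wait_time_s": 1},
    {"type": "scroll_to_bottom", "duration": 3},
]

print(minimum_timeout(instructions))                 # prints: 6.0
print(minimum_timeout(instructions, page_load_s=5))  # prints: 11
```

The result is only a floor. Network latency and rendering time come on top, so a real timeout should include some headroom.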