Crawl and distill page trees

Start from one URL, limit depth, preserve parent-child structure, and inspect cleaned content page by page.

New Job

Create crawl

The root page Skrapp should start from.

Depth 0 keeps only the root page.

Maximum number of pages the crawl will accept.

Optional settings

Restrict crawling to one subtree. Leave empty to infer the subtree from the start URL.

Comma-separated prefixes to skip during discovery.
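
As a rough illustration of how the settings above might interact, here is a minimal sketch of a depth-limited breadth-first crawl over an in-memory link graph. It is not Skrapp's actual implementation. The names `infer_scope`, `parse_prefixes`, `crawl`, and the `links` dictionary are hypothetical, and two rules are assumptions: the inferred scope is taken to be the start URL's directory, and the skip prefixes are treated as full URL prefixes.

```python
from collections import deque
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def infer_scope(start_url: str) -> str:
    """Assumed rule: the scope is the start URL's directory."""
    parts = urlsplit(start_url)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{directory}"


def parse_prefixes(raw: str) -> List[str]:
    """Split a comma-separated setting into non-empty, trimmed prefixes."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def crawl(
    start_url: str,
    links: Dict[str, List[str]],  # hypothetical stand-in for fetching a page's links
    max_depth: int,
    max_pages: int,
    scope: str = "",
    exclude: str = "",
) -> Dict[str, Optional[str]]:
    """Return a child -> parent map describing the accepted page tree."""
    scope = scope or infer_scope(start_url)
    skip = parse_prefixes(exclude)
    parent: Dict[str, Optional[str]] = {start_url: None}  # the root has no parent
    depth = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if depth[url] >= max_depth:
            continue  # depth 0 therefore keeps only the root page
        for child in links.get(url, []):
            if len(parent) >= max_pages:
                return parent  # upper bound on accepted pages reached
            if child in parent:
                continue  # already accepted via an earlier parent
            if not child.startswith(scope):
                continue  # outside the allowed subtree
            if any(child.startswith(p) for p in skip):
                continue  # matches a skip prefix
            parent[child] = url
            depth[child] = depth[url] + 1
            queue.append(child)
    return parent


SITE = {
    "https://example.com/docs/": [
        "https://example.com/docs/a",
        "https://example.com/docs/private/x",
        "https://example.com/blog/",
    ],
    "https://example.com/docs/a": ["https://example.com/docs/a/b"],
}

tree = crawl(
    "https://example.com/docs/",
    SITE,
    max_depth=2,
    max_pages=10,
    exclude="https://example.com/docs/private",
)
```

In this sketch the blog link is dropped because it falls outside the inferred scope, the private page is dropped by the skip prefix, and each accepted page records the page that first linked to it.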

Overview

How Skrapp works

1. Discover pages

Runs a BFS crawl, keeps accepted pages in scope, and stores the page tree.

2. Extract raw content

Renders each page, captures the main content, and preserves the raw markdown for review.

3. Distill content

Scores content blocks with generic, site-independent signals: page type, position, density, and repetition across the crawl.
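
To make the repetition signal from the distill step concrete, the following is a minimal sketch that scores each block by how many crawled pages it appears on and drops blocks that repeat widely, such as navigation bars and footers. It covers only repetition. Page type, position, and density are not modeled, and the function names, the linear scoring formula, and the `min_score` threshold are all assumptions made for illustration, not Skrapp's actual scoring.

```python
from collections import Counter
from typing import Dict, List


def repetition_scores(pages: Dict[str, List[str]]) -> Dict[str, float]:
    """Score blocks from 1.0 (unique to one page) down to 0.0 (on every page)."""
    counts: Counter = Counter()
    for blocks in pages.values():
        # Count each block at most once per page.
        counts.update({b.strip() for b in blocks})
    total = len(pages)
    if total <= 1:
        # With a single page there is nothing to compare against.
        return {block: 1.0 for block in counts}
    return {block: 1.0 - (n - 1) / (total - 1) for block, n in counts.items()}


def distill(pages: Dict[str, List[str]], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Keep only the blocks whose repetition score reaches min_score."""
    scores = repetition_scores(pages)
    return {
        url: [b for b in blocks if scores[b.strip()] >= min_score]
        for url, blocks in pages.items()
    }


PAGES = {
    "/a": ["Home | Docs | Blog", "Install with pip.", "(c) Example"],
    "/b": ["Home | Docs | Blog", "Configure the crawler.", "(c) Example"],
    "/c": ["Home | Docs | Blog", "Read the FAQ.", "(c) Example"],
}

cleaned = distill(PAGES)
```

Here the navigation line and the footer appear on all three pages, so they score 0.0 and are removed, while each page's unique sentence scores 1.0 and is kept.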

Recent Jobs

Latest activity

Job | Status | URL | Discovered | Succeeded | Created
No jobs yet.