Quickly scan a website, extract entire content (text, html or markdown) or specific divs or spans, save as csv or json
Version: 4.4.1
License: Free To Try $12.00
Operating System: Mac OS X
WebScraper uses the Integrity v6 Engine to quickly scan a website and can currently output the data as csv or json.
The output can include various metadata and the entire content of each page (as text, html or markdown), and it can extract parts of the pages (currently a named class, id or itemprop of divs, spans, dd's or p's).
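As an illustration of the kind of extraction described above, here is a minimal Python sketch that pulls the text of a div, span, dd or p by its class, id or itemprop and writes the result as csv. It is not WebScraper's own code: the names `AttrExtractor`, `extract` and `to_csv` and the sample page are hypothetical, and the sketch assumes reasonably well-formed markup.

```python
import csv
import io
from html.parser import HTMLParser

TARGET_TAGS = {"div", "span", "dd", "p"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag


class AttrExtractor(HTMLParser):
    """Collect the text of div/span/dd/p elements matching one attribute value."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr = attr      # "class", "id" or "itemprop"
        self.value = value
        self.depth = 0        # nesting depth inside a matched element
        self.parts = []       # text fragments of the current match
        self.results = []     # finished matches

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside a match: track nesting so we know when it ends.
            if tag not in VOID_TAGS:
                self.depth += 1
            return
        if tag in TARGET_TAGS:
            found = dict(attrs).get(self.attr) or ""
            # An attribute such as class may hold several space-separated names.
            if self.value in found.split():
                self.depth = 1
                self.parts = []

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace in the collected text.
                self.results.append(" ".join("".join(self.parts).split()))

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract(html, attr, value):
    """Return the text of every matching element, in document order."""
    parser = AttrExtractor(attr, value)
    parser.feed(html)
    parser.close()
    return parser.results


def to_csv(rows, columns):
    """Serialise a list of dicts as csv text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample page, one output row per page.
page = ('<div class="product"><span itemprop="name">Blue <b>Widget</b></span>'
        '<p id="price">$12.00</p></div>')
row = {
    "name": extract(page, "itemprop", "name")[0],
    "price": extract(page, "id", "price")[0],
}
print(to_csv([row], ["name", "price"]))
```

Each output column here corresponds to one (attribute, value) pair, which roughly mirrors choosing columns with checkboxes; a json export would only need `json.dumps([row])` in place of `to_csv`.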
WebScraper is new. Please use it for free, and please get in touch with any requests, bug reports or observations.
- Easy to scan a site: just enter the starting url and press Go
- Easy to export: checkboxes for the columns you want
- Plenty of options / configuration
- Configuration of various limits on the crawl and the output file size
Version 4.4.1: dark-mode-ready
- Fixes bug that could result in column information (complex setup) becoming misaligned after dragging and dropping to reorder the columns
- Other small fixes
Version 4.4.0: dark-mode-ready
- Adds 'crawl above starting directory' control
- Fixes some issues with markdown generation
- Improves the scan 'blacklist / whitelist rules' in the UI
- Other small fixes
Version 4.1.1:
- Adds capability of downloading images to a folder during the scan
- Adds option to filter output file
- Allows editing of your table columns
- Also allows re-ordering of columns
- Unifies the helper windows
Version 2.0.3:
- Fixes problem causing scan to sometimes continue with the previous scan
- Fixes a problem with scanning locally (file://)
- Fixes helpers not working with local html files
- Fixes problem with scrolling within the class helper
- Adds field 'information page' alongside blacklist and whitelist fields
Version 2.0.2:
- Improved navigation
- Adds save/load a project
- More efficient programme flow, can scan larger sites
- Many smaller fixes and enhancements
Version 1.4.3:
- Important fix to the crawling engine around auto-detection of whether starting url is a page or directory in ambiguous cases (this affects the scope of the scan)
Version 1.4.2:
- Changes to the interface, neater and more user-friendly
- Fixes crash/hang when app is unlicensed and fewer than 5 pages have been scanned
- Fixes not always recognising an end comment
- Fixes occasional issue, changes to starting url field not being recognised right away
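
The Version 1.4.3 note says that deciding whether the starting url is a page or a directory affects the scope of the scan. The Python sketch below shows why, under an assumed heuristic: a trailing slash means directory, a dot in the last path segment means page, and anything else is ambiguous. This is not the Integrity engine's actual logic; `scope_prefix`, `in_scope` and the `ambiguous_is_directory` flag are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit


def scope_prefix(start_url, ambiguous_is_directory=True):
    """Return the url prefix that a scan starting at start_url would stay under.

    Assumed heuristic (not the app's real rule):
      - path ends with "/"           -> directory, scope is that directory
      - last segment contains "."    -> page, scope is its parent directory
      - otherwise (ambiguous)        -> decided by ambiguous_is_directory
    """
    parts = urlsplit(start_url)
    path = parts.path or "/"
    last = path.rsplit("/", 1)[-1]
    parent = path[: len(path) - len(last)]  # path up to and including the last "/"
    if path.endswith("/"):
        directory = path
    elif "." in last:
        directory = parent
    elif ambiguous_is_directory:
        directory = path + "/"
    else:
        directory = parent
    return f"{parts.scheme}://{parts.netloc}{directory}"


def in_scope(url, start_url, **kwargs):
    """True if url falls under the scope derived from start_url."""
    return url.startswith(scope_prefix(start_url, **kwargs))


start = "https://example.com/docs"  # ambiguous: page or directory?
other = "https://example.com/blog/post"
print(scope_prefix(start))                                # treated as a directory
print(scope_prefix(start, ambiguous_is_directory=False))  # treated as a page
print(in_scope(other, start))
print(in_scope(other, start, ambiguous_is_directory=False))
```

With the same starting url, the two interpretations give different scopes: read as a directory, only urls under /docs/ are scanned; read as a page, the whole site root is in scope.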