TheArchivist is a web crawler that downloads and archives web content for later reference.
The program starts from a given URL and retrieves that page. It then scans the page for links and recursively attempts to retrieve every file the page links to. This process continues until one of several stop criteria is reached.
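The crawl loop described above can be sketched as a breadth-first traversal. This is a minimal illustration, not TheArchivist's actual code: the function names, the injected `fetch` callable, and the two stop criteria (`max_pages`, `max_depth`) are hypothetical stand-ins for the program's own, unspecified stop criteria.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects href/src attribute values from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 100,
          max_depth: int = 3) -> dict[str, str]:
    """Breadth-first crawl starting at start_url.

    fetch returns a resource body as text, or None if retrieval failed.
    The crawl stops when no links remain, when max_pages resources have
    been stored, or at links deeper than max_depth (hypothetical criteria).
    """
    archive: dict[str, str] = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue and len(archive) < max_pages:
        url, depth = queue.popleft()
        body = fetch(url)
        if body is None:
            continue  # skip resources that could not be retrieved
        archive[url] = body
        if depth >= max_depth:
            continue  # keep this resource but do not follow its links
        parser = LinkExtractor()
        parser.feed(body)
        for link in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            absolute, _ = urldefrag(urljoin(url, link))
            if absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return archive
```

Passing `fetch` in as a parameter keeps the sketch testable with an in-memory dictionary of pages (for example `crawl("http://example.com/", site.get)`); a networked version could wrap `urllib.request.urlopen` instead.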
TheArchivist can also rewrite absolute URLs as links relative to the download hierarchy, which makes the archive self-contained: archived pages link to their local copies rather than back to the original servers.
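One way such rewriting could work is sketched below. The archive layout (`host/path`, with directory URLs saved as `index.html`) and all function names are assumptions for illustration; TheArchivist's real layout is not documented here. The regex-based attribute matching is also a simplification that only handles quoted `href`/`src` values.

```python
import posixpath
import re
from urllib.parse import urldefrag, urljoin, urlsplit

# Matches quoted href/src attributes (simplified; hypothetical helper).
ATTR_RE = re.compile(r'(\b(?:href|src)\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def local_path(url: str) -> str:
    """Map a URL to its file inside the archive (assumed layout: host/path,
    with directory URLs stored as index.html)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return parts.netloc + path


def relative_link(page_url: str, target_url: str) -> str:
    """Path from the archived copy of page_url to that of target_url."""
    page_dir = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(local_path(target_url), start=page_dir)


def rewrite_links(page_url: str, html: str, archived: set[str]) -> str:
    """Point every href/src that resolves to an archived URL at its local copy."""
    def repl(match: re.Match) -> str:
        target, fragment = urldefrag(urljoin(page_url, match.group(3)))
        if target not in archived:
            return match.group(0)  # leave links to un-archived resources alone
        new = relative_link(page_url, target)
        if fragment:
            new += "#" + fragment
        quote = match.group(2)
        return match.group(1) + quote + new + quote
    return ATTR_RE.sub(repl, html)
```

Links to resources outside the archive are left pointing at the original server in this sketch, so only files that were actually downloaded are redirected to local paths.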
The latest release fixes several issues: it corrects the parsing of JavaScript function references, adds .php and .asp to the file extensions recognized as "html", and fixes a bug that truncated crawls run without the "legal servers" restriction.
Version 2.1x (b10): N/A