A configurable web traversal engine
Version: 0.025
WWW::Robot is a configurable web traversal engine (for web robots & agents).
License: Perl Artistic License
Operating System: Linux
use WWW::Robot;

my $robot = WWW::Robot->new(
    'NAME'    => 'MyRobot',
    'VERSION' => '1.000',
    'EMAIL'   => 'email@example.com'
);

# ... configure the robot's operation ...

$robot->run( 'http://www.foobar.com/' );
This module implements a configurable web traversal engine for a robot or other web agent. Given an initial web page (URL), the Robot fetches the contents of that page, extracts all links on it, and adds them to a list of URLs to visit.
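The fetch, extract, and enqueue cycle described above can be illustrated with a small standalone sketch. It does not use WWW::Robot itself: the in-memory %pages table stands in for real HTTP requests, the regex link extractor is a deliberate simplification of real HTML parsing, and the names extract_links and traverse are hypothetical. The first-in, first-out visiting order is also an assumption made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical in-memory "web": URL => HTML body.
# A real robot would fetch each page over HTTP instead.
my %pages = (
    'http://example.com/'  => '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>',
    'http://example.com/a' => '<a href="http://example.com/b">B</a>',
    'http://example.com/b' => '<a href="http://example.com/">home</a>',
);

# Pull href values out of a page with a simple regex.
# Illustrative only; this is not a substitute for an HTML parser.
sub extract_links {
    my ($html) = @_;
    return $html =~ /href="([^"]+)"/g;
}

# Visit pages starting from $start, queueing each newly seen link once.
sub traverse {
    my ($start) = @_;
    my @queue = ($start);
    my %seen  = ($start => 1);
    my @visited;

    while (defined(my $url = shift @queue)) {
        my $html = $pages{$url};
        next unless defined $html;        # unknown page: nothing to fetch
        push @visited, $url;

        for my $link (extract_links($html)) {
            next if $seen{$link}++;       # skip URLs already queued or visited
            push @queue, $link;
        }
    }
    return @visited;
}

my @order = traverse('http://example.com/');
print "$_\n" for @order;
```

The %seen hash is what keeps the traversal from looping forever when pages link back to each other, as the three sample pages do.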
Features of the Robot module include:
* Follows the Robot Exclusion Protocol.
* Supports the META element proposed extensions to the Protocol.
* Implements many of the Guidelines for Robot Writers.
* Builds on standard Perl 5 modules for WWW, HTTP, HTML, etc.
Each application (robot instance) configures the engine using hooks, which are Perl functions that the Robot engine invokes at specific points in its control loop.
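The hook mechanism can be pictured as a table of named callbacks that the engine consults while it runs. The sketch below is a standalone illustration of that idea, not the module's own interface: add_hook, invoke_hooks, and the hook names 'follow-url-test' and 'on-contents' are all hypothetical names chosen for this example, and the module's documentation should be consulted for its real hook names and argument lists.

```perl
use strict;
use warnings;

# Hypothetical hook registry: hook name => list of code references.
my %hooks;

# Register a function under a hook name.
sub add_hook {
    my ($name, $code) = @_;
    push @{ $hooks{$name} }, $code;
}

# Call every function registered under $name and return their results.
sub invoke_hooks {
    my ($name, @args) = @_;
    return map { $_->(@args) } @{ $hooks{$name} || [] };
}

# Application side: decide which URLs to follow and what to do with contents.
my @log;
add_hook('follow-url-test', sub {
    my ($url) = @_;
    return $url =~ m{^http://example\.com/} ? 1 : 0;
});
add_hook('on-contents', sub {
    my ($url, $body) = @_;
    push @log, "$url: " . length($body) . " bytes";
    return;
});

# Engine side: a stand-in control loop over two in-memory pages.
my %pages = (
    'http://example.com/'       => 'hello',
    'http://other.example.org/' => 'skip me',
);
for my $url (sort keys %pages) {
    my ($follow) = invoke_hooks('follow-url-test', $url);
    next unless $follow;                  # the application vetoed this URL
    invoke_hooks('on-contents', $url, $pages{$url});
}

print "$_\n" for @log;
```

Keeping policy (which URLs to follow, what to do with a page) in callbacks means the traversal loop itself never needs to change between applications.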
The Robot engine obeys the Robot Exclusion Protocol, as well as a proposed addition to it. See "SEE ALSO" for references to documents describing the Robot Exclusion Protocol and web robots.
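For readers unfamiliar with the protocol, the core check amounts to comparing a URL path against the Disallow prefixes listed in a site's robots.txt file. The following is a minimal sketch under simplifying assumptions: it only reads records that start with a single "User-agent: *" line, ignores any other directives, and does not show how the module performs this check internally. The function names disallowed_prefixes and path_allowed are hypothetical.

```perl
use strict;
use warnings;

# Collect Disallow path prefixes from the "User-agent: *" record.
# Simplification: assumes one User-agent line per record.
sub disallowed_prefixes {
    my ($robots_txt) = @_;
    my $applies = 0;
    my @prefixes;

    for my $line (split /\r?\n/, $robots_txt) {
        $line =~ s/\s*#.*//;                       # strip trailing comments
        if ($line =~ /^User-agent:\s*(\S+)/i) {
            $applies = ($1 eq '*') ? 1 : 0;        # does this record apply to everyone?
        }
        elsif ($applies && $line =~ /^Disallow:\s*(\S+)/i) {
            push @prefixes, $1;                    # an empty Disallow is skipped (allows all)
        }
    }
    return @prefixes;
}

# A path is allowed unless it starts with one of the disallowed prefixes.
sub path_allowed {
    my ($path, @prefixes) = @_;
    for my $prefix (@prefixes) {
        return 0 if index($path, $prefix) == 0;
    }
    return 1;
}

my $robots_txt = "User-agent: *\nDisallow: /private/   # keep out\nDisallow: /tmp\n";
my @prefixes   = disallowed_prefixes($robots_txt);

for my $path ('/index.html', '/private/notes.html') {
    printf "%-22s %s\n", $path, path_allowed($path, @prefixes) ? 'allowed' : 'disallowed';
}
```

A production robot would normally rely on an established robots.txt parser rather than hand-rolled matching like this, since real files contain agent-specific records and other edge cases.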