WWWGrab is a web scraper that extracts data from web pages. It uses DTBuild transformations to parse information and runs URL scans and SQL database operations. The tool is designed to generate databases from URL lists.
Because it can run sequences of URL scans and SQL database operations, WWWGrab allows multiple passes over data generated at runtime. Parsers for data transformation are created in the DTBuild data transformation workshop. When WWWGrab retrieves a web page, it passes the page to the DTBuild engine, which transforms it with the specified parser.
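The fetch-then-transform flow can be sketched generically. This is not WWWGrab's actual API: the `LinkParser` class below is a hypothetical stand-in for a DTBuild parser, shown only to illustrate the idea of handing a retrieved page to a transformation engine.

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Stand-in for a DTBuild parser: extracts href targets from a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def transform(page_html):
    # In WWWGrab the page would be handed to the DTBuild engine;
    # here a plain Python parser plays that role.
    parser = LinkParser()
    parser.feed(page_html)
    return parser.links

page = ('<html><body>'
        '<a href="http://example.com/a">A</a>'
        '<a href="http://example.com/b">B</a>'
        '</body></html>')
print(transform(page))  # ['http://example.com/a', 'http://example.com/b']
```

The extracted URLs could then feed the next scan pass, as described below.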
WWWGrab is driven by a list of tasks stored in a database. Two task types are supported: scanning a URL list and executing an SQL list. The user can therefore combine multiple URL scans and SQL executions in a single task list for a more automated workflow.
One example of the combined flexibility of WWWGrab and DTBuild: scan an initial list of URLs, generate a new URL list, modify the generated list with SQL, scan the modified list, generate another URL list, and so on.
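The scan/SQL alternation above can be sketched as a small task loop. Everything here is a hypothetical illustration, not WWWGrab's interface: an in-memory SQLite table stands in for the task database, and `fake_fetch` stands in for page retrieval plus DTBuild parsing.

```python
import sqlite3

# Stand-in for the WWWGrab database holding the generated URL list.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE urls (url TEXT, done INTEGER DEFAULT 0)")

def scan_urls(fetch):
    """Task type 1: scan pending URLs; 'fetch' returns newly discovered URLs."""
    pending = [r[0] for r in db.execute("SELECT url FROM urls WHERE done = 0")]
    for url in pending:
        for new_url in fetch(url):
            db.execute("INSERT INTO urls (url) VALUES (?)", (new_url,))
        db.execute("UPDATE urls SET done = 1 WHERE url = ?", (url,))

def run_sql(statements):
    """Task type 2: execute an SQL list, e.g. to filter the generated URLs."""
    for stmt in statements:
        db.execute(stmt)

# Pass 1 scans the seed URL, an SQL pass prunes the result,
# and pass 2 scans what remains.
db.execute("INSERT INTO urls (url) VALUES ('http://example.com/start')")
fake_fetch = lambda url: (["http://example.com/a", "http://example.com/skip"]
                          if url.endswith("start") else [])
scan_urls(fake_fetch)
run_sql(["DELETE FROM urls WHERE url LIKE '%skip%'"])
scan_urls(fake_fetch)
print([r[0] for r in db.execute("SELECT url FROM urls ORDER BY url")])
```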
WWWGrab/DTBuild offers a number of practical features: recursive parsing of nested HTML/XML tags and comments, wide-string (Unicode) input/output, an ODBC interface that displays database layout information to the user, and a trace mode for debugging.
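To illustrate what recursive handling of nested tags and comments means in practice, here is a generic sketch (not WWWGrab's implementation) that walks a page and records each tag at its nesting depth, along with any comments:

```python
from html.parser import HTMLParser

class NestedOutline(HTMLParser):
    """Records start tags and comments, indented by nesting depth."""
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_comment(self, data):
        self.events.append("  " * self.depth + "<!-- " + data.strip() + " -->")

outline = NestedOutline()
outline.feed("<div><!-- note --><ul><li>x</li><li>y</li></ul></div>")
print("\n".join(outline.events))
```

The indentation in the output mirrors the nesting of the source markup, which is the kind of structure a recursive parser exposes to a transformation.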
A user-defined function interface permits the execution of custom DLL code, and configuration assistance is available to make the tool more accessible. For more information about WWWGrab's capabilities, consult the DTBuild help.
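The general mechanism behind a user-defined function interface (calling into native DLL or shared-library code from a host program) can be demonstrated with Python's `ctypes`. This is only an analogue; WWWGrab's actual DLL calling convention is not documented here. On Windows one would load a `.dll` by path; the sketch below locates the C runtime on POSIX and calls a known function from it.

```python
import ctypes
import ctypes.util

# Load native library code at runtime, analogous to a host loading
# a user-defined function DLL (illustrative only, not WWWGrab's API).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the signature of a known C function before calling it.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"user-defined"))  # 12
```

Declaring `argtypes`/`restype` up front matters: without them, `ctypes` guesses the signature, which can silently truncate values on 64-bit platforms.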
Version 1.33: N/A
Version 1.31: New options, bug fixes
Version 1.27: New options, bug fixes