scRUBYt! is a powerful web extraction framework written in Ruby. It is easy to learn and use: its concise DSL lets you navigate through web pages, extract the data of interest, then query, transform, and save it.
About Ruby:
Ruby is a popular dynamic, reflective, general-purpose programming language that originated in Japan in the mid-1990s. Designed and developed by Yukihiro "Matz" Matsumoto, Ruby combines syntax inspired by Perl with Smalltalk-like features. It supports multiple programming paradigms, including functional, object-oriented, and imperative, and features a dynamic type system and automatic memory management. Ruby shares features with other programming languages such as Python, Perl, Lisp, Dylan, and CLU. Its reference implementation is written in C.
What's new in this release:
This release introduces several new features and improvements to ScRUBYt!:
- Script pattern: the ability to evaluate custom functions on the input of the pattern
- Constant pattern: the possibility to add constant patterns with the syntax: pattern 'Hello world', :type => :constant
- Text pattern: this forms the foundation for the new output method to_flat_xml, which creates feed-like flat XML instead of hierarchical XML. When delimiters are specified, to_flat_xml splits up the concatenated hash results.
- Change in semantics for "div[stuff]" style examples: divs that contain "stuff" are now matched, rather than only divs whose whole text is "stuff". Generalization is false by default.
- Possibility to define arbitrary delimiters for to_hash (used when the result contains commas)
- Changes in the logging module: logging has been extracted into a separate class to allow for filtering, and the logger can now be set to nil (to disable logging). Logging must now be explicitly enabled.
- Changes in the download pattern: an array of files to ignore during download can be specified (e.g. 'nopicture.gif'), timeouts during downloads are handled instead of crashing, and downloading now works for more URL types that previously failed.
- Entirely new test suite, with coverage measured by rcov and tests added continuously toward full coverage.
- Fixed the infamous regex bug that caused the pricegrabber scenario to fail.
- Do not evaluate the detail pattern twice.
- Fixed dependencies (namely parse_tree_reloaded).
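Two of the changes above, the "contains" matching semantics and custom delimiters for to_hash, can be illustrated with a small plain-Ruby sketch. This does not use scRUBYt! or reproduce its internals: the helper names (exact_match?, contains_match?, join_results) and the sample data are hypothetical and exist only to show why each change matters.

```ruby
# Plain-Ruby illustration only; none of these helpers are part of scRUBYt!.

# Old semantics as described above: the whole text must equal the example.
def exact_match?(text, example)
  text.strip == example
end

# New semantics: the text only needs to contain the example.
def contains_match?(text, example)
  text.include?(example)
end

# Concatenate several extracted values with a caller-chosen delimiter.
def join_results(values, delimiter = ',')
  values.join(delimiter)
end

# Hypothetical div texts.
divs = ['stuff', 'some stuff here', 'unrelated']
exact    = divs.select { |text| exact_match?(text, 'stuff') }
contains = divs.select { |text| contains_match?(text, 'stuff') }

# Hypothetical extracted values that themselves contain commas.
prices    = ['1,299', '2,450']
ambiguous = join_results(prices)       # a comma delimiter collides with the data
safe      = join_results(prices, '|')  # a custom delimiter keeps values intact

puts exact.inspect
puts contains.inspect
puts ambiguous.split(',').inspect
puts safe.split('|').inspect
```

With a comma as the delimiter, splitting the joined string yields four fragments instead of the two original values. Choosing a delimiter that does not occur in the data avoids that.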
Requirements:
ScRUBYt! requires Ruby to run.
Conclusion:
ScRUBYt! is a web extraction framework built around an easy-to-use DSL. This release adds new pattern types, a flat XML output method, and more robust downloading and logging, along with several bug fixes.
Version 0.3.4: N/A