TagSoup is a SAX2 parser written in Java.
Version: 1.0.5
Operating System: Linux
TagSoup is a SAX2 parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: nasty and brutish, though quite often far from short.
By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. It is a parser, not a whole application; it isn't intended to permanently clean up bad HTML, as HTML Tidy does, only to parse it on the fly.
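As a rough illustration of what "providing a SAX interface" means in practice, the sketch below plugs an ordinary SAX ContentHandler into TagSoup and lists the href of every link in an HTML file. The class name LinkCollector and its helper methods are hypothetical, made up for this example. It assumes the TagSoup jar is on the classpath at run time; the parser class org.ccil.cowan.tagsoup.Parser is loaded by name so that the example compiles with the JDK alone. It uses current Java syntax rather than the Java 1.4.2 baseline.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects the href attribute of every <a> element.
class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Accept either the local name or the qualified name, so the handler
        // works whether or not the reader is namespace-aware.
        if ("a".equalsIgnoreCase(localName) || "a".equalsIgnoreCase(qName)) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    List<String> getLinks() {
        return links;
    }

    // Loads TagSoup's XMLReader implementation by class name (assumes the
    // TagSoup jar is on the classpath; throws ClassNotFoundException otherwise).
    static XMLReader newTagSoupReader() throws Exception {
        return (XMLReader) Class.forName("org.ccil.cowan.tagsoup.Parser")
                .getConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java LinkCollector <html-file>");
            return;
        }
        XMLReader reader = newTagSoupReader();
        LinkCollector handler = new LinkCollector();
        reader.setContentHandler(handler);
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // The HTML is parsed on the fly; nothing is written back to disk.
            reader.parse(new InputSource(in));
        }
        for (String link : handler.getLinks()) {
            System.out.println(link);
        }
    }
}
```

Because the handler depends only on the standard org.xml.sax interfaces, the same class could be attached to any other XMLReader; only newTagSoupReader is specific to TagSoup.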
Requirements:
· Java 1.4.2 or later
What's New in This Release:
· The main fix concerns HTML comments, which were very badly broken: any > character would terminate one, so commenting out elements did not work properly.
· Everything should now be correct.
· Everyone who possibly can should update.
· Additionally, &#Xnnnn; (with capital X) now works, some debugging code was removed from PYXWriter, a Unicode BOM at the beginning of a document is skipped, and the new version of Saxon is supported as an XSLT processor.
· Documentation has been added on SAX features and properties specific to TagSoup.