HTML2fo is a converter that transforms HTML into XSL-FO (XSL Formatting Objects).
You may wonder why a separate tool is needed when an XSLT stylesheet could do the job. The reason is that an XSLT processor only accepts well-formed XML as input, whereas HTML2fo can also convert HTML documents that are not well-formed XML. This matters because HTML files that are not properly formed are difficult to edit or manipulate with XML tools.
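To illustrate the difference between XML-conforming and non-conforming input, here is a minimal Python sketch. It is not HTML2fo's own code; the sample markup and the helper names (is_well_formed_xml, TagCollector, collect) are made up for this example. A strict XML parser rejects the sample, while a tolerant HTML parser still reports its tags and text.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: <br> is never closed and <b> is left open.
SAMPLE = "<p>First line<br>second line <b>bold text</p>"


def is_well_formed_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Tolerant parser that records start tags and text as it sees them."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def collect(text):
    """Feed text to the tolerant parser and return (tags, joined text)."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags, "".join(collector.text)


if __name__ == "__main__":
    print("well-formed XML:", is_well_formed_xml(SAMPLE))
    print("tolerant parse:", collect(SAMPLE))
```

An XSLT-based pipeline would stop at the first check, which is the gap a tolerant converter fills.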
HTML2fo supports several features, including input that is not well-formed HTML. Such code may not be processed correctly, but you will still get output. This is particularly helpful if you edit HTML files with a poor WYSIWYG editor such as Word. Depending on how badly the input is broken, the feature may not work at all, and in some cases it may result in a core dump.
It also supports tables, including colspans, with automatic column width calculation. In a first pass, if a cell without a colspan has a width setting, the corresponding column gets that width. In a second pass, HTML2fo tries to calculate the remaining widths from cells that span several columns. Whatever space is left is divided among the columns that still have no width; this is also what happens for tables that carry no column width information at all. Rowspans are fully supported, including in combination with colspans. Borders on individual cells aren't supported, but you can decide whether every cell has a border or none does. Background color is also supported.
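The width calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not HTML2fo's actual algorithm: the function name column_widths and the (colspan, width) row format are hypothetical, rowspans are ignored, conflicting widths for one column are resolved by taking the larger value, and leftover space is split evenly.

```python
def column_widths(rows, table_width):
    """Estimate column widths for a table.

    rows: list of rows, each a list of (colspan, width) tuples, where
          width is a number or None when the cell has no width setting.
    table_width: total width available to the table.
    """
    n_cols = max(sum(span for span, _ in row) for row in rows)
    widths = [None] * n_cols

    # Pass 1: a cell without colspan that has a width fixes its column.
    for row in rows:
        col = 0
        for span, width in row:
            if span == 1 and width is not None:
                # Assumption: on conflict, keep the larger width.
                widths[col] = max(widths[col] or 0, width)
            col += span

    # Pass 2: derive widths of still-unknown columns from spanning cells.
    for row in rows:
        col = 0
        for span, width in row:
            if span > 1 and width is not None:
                covered = range(col, col + span)
                unknown = [c for c in covered if widths[c] is None]
                known = sum(widths[c] for c in covered if widths[c] is not None)
                if unknown:
                    share = max(width - known, 0) / len(unknown)
                    for c in unknown:
                        widths[c] = share
            col += span

    # Pass 3: split the remaining space among columns without any width.
    unknown = [c for c in range(n_cols) if widths[c] is None]
    if unknown:
        used = sum(w for w in widths if w is not None)
        share = max(table_width - used, 0) / len(unknown)
        for c in unknown:
            widths[c] = share
    return widths


if __name__ == "__main__":
    example = [
        [(1, 40), (2, 100), (1, None)],
        [(1, None), (1, 30), (1, None), (1, None)],
    ]
    print(column_widths(example, 200))
```

In the example, columns 0 and 1 get their widths from plain cells, column 2 is derived from the spanning cell, and column 3 receives the leftover space.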
For fonts, HTML2fo supports size, style (bold, italic, underline), and color. Links, both internal and external, are also supported. A combination of file name and anchor, like referrered_file.html#marker, is converted to an external reference. In a reference to an .htm or .html file, the extension is changed to .pdf unless the basename is the same as that of the file being converted.
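The link rule can be illustrated with a short Python sketch. The function name rewrite_link and the file names are hypothetical, and leaving a same-basename link unchanged is an assumption: the description only says that such links are not converted to .pdf.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def rewrite_link(href, current_file):
    """Rewrite .htm/.html link targets to .pdf, keeping any #fragment.

    href: the link target as written in the HTML source.
    current_file: name of the HTML file being converted.
    """
    parts = urlsplit(href)
    if not parts.path:
        # Pure "#marker" link: internal reference, nothing to rewrite.
        return href

    root, ext = posixpath.splitext(parts.path)
    if ext.lower() not in (".htm", ".html"):
        return href

    current_root = posixpath.splitext(posixpath.basename(current_file))[0]
    if posixpath.basename(root) == current_root:
        # Same basename as the converted file: assumed to stay as it is.
        return href

    return urlunsplit(parts._replace(path=root + ".pdf"))


if __name__ == "__main__":
    for link in ("other.html#marker", "manual.html#intro", "#top", "image.png"):
        print(link, "->", rewrite_link(link, "manual.html"))
```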
The latest release of HTML2fo is primarily a bug-fix release. It changes the handling of the page-break and HTML lang properties and of the img tag, and it fixes a memory access violation along with some other bugs.
Version 0.4.2: N/A