Python and Tidy.
yum install tidy
yum install python-tidy

Then it's time to visit the documentation page, which isn't really very useful. Here is how you filter a document through tidy.
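import tidy
# Assumption: parseString here is the one from xml.dom.minidom.
from xml.dom.minidom import parseString

# basedir and file are set elsewhere in the script.
options = dict(output_xhtml=1, add_xml_decl=1, indent=1, tidy_mark=0)
tidyDoc = tidy.parse(basedir + file, **options)

Tidy's output can be used as input to create a DOM Document.

domDoc = parseString(str(tidyDoc))

One thing I found rather annoying about tidy is that it doesn't recognize the <dt> tag.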
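
To make the DOM step a bit more concrete, here is a minimal sketch of what you can do once tidy has handed back well-formed XHTML. It assumes the DOM parser is xml.dom.minidom from the standard library, and it uses a hand-written XHTML string as a stand-in for tidy's real output, so it runs without tidy installed. The helper text_of and the variable names are made up for illustration.

```python
from xml.dom.minidom import parseString

# Hand-written stand-in for tidy's XHTML output; in the real script
# this string would come from str(tidyDoc).
xhtml = (
    '<?xml version="1.0"?>'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Example</title></head>'
    '<body><dl><dt>Term</dt><dd>Definition</dd></dl></body>'
    '</html>'
)

# Build the DOM Document from the XHTML string.
dom_doc = parseString(xhtml)


def text_of(node):
    """Concatenate the text nodes that are direct children of an element."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


# Pull a couple of values out of the tree by tag name.
title = text_of(dom_doc.getElementsByTagName("title")[0])
terms = [text_of(dt) for dt in dom_doc.getElementsByTagName("dt")]
```

The point of going through tidy first is that minidom only accepts well-formed XML, so arbitrary HTML has to be cleaned up into XHTML before parseString will take it.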