HTML Document Parser: HtmlCleaner

jopen, 12 years ago
     <a href="/misc/goto?guid=4959499543027097061"> <img border="0" alt="Html文档解析器 HtmlCleaner" src="https://simg.open-open.com/show/8f7be556bdf74662b8e12a915d9deeb6.jpg" width="198" height="53" /> </a>    <br /> HtmlCleaner is an open-source HTML document parser written in Java. It reorganizes every element of an HTML document and produces a well-formed HTML document. By default, it follows rules similar to those most web browsers use when building the Document Object Model (DOM). Users can, however, supply custom tags and rule sets for filtering and matching.    <br />
    <h3>Features:</h3>
    <ul>
     <li>HtmlCleaner parses input HTML and generates a tree structure suitable for programmatic manipulation.</li>
     <li>Serializers are responsible for outputting the DOM structure to XML, HTML, DOM, or JDOM.</li>
     <li>The parsing phase relies on tag descriptions, which can be customized by the user.</li>
     <li>HtmlCleaner's behaviour can be configured through a number of parameters.</li>
     <li>HtmlCleaner is thread-safe, meaning that a single instance can clean multiple HTML sources at the same time.</li>
     <li>HtmlCleaner can be used from Java code, from the command line, or as an Ant task.</li>
     <li>HtmlCleaner requires JRE 1.5 or later.</li>
    </ul>
    <p><strong>Project homepage:</strong> <a href="http://www.open-open.com/lib/view/home/1324371733999" target="_blank">http://www.open-open.com/lib/view/home/1324371733999</a></p>
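<p>To make rule-driven cleaning more concrete, here is a toy sketch in plain Java (standard library only). It is not HtmlCleaner's API or its actual algorithm. The class <code>ToyTagBalancer</code> and its two rule sets are hypothetical stand-ins for the customizable tag descriptions listed above. One set names void elements that never get a closing tag. The other names elements that implicitly close an open element of the same name, as browsers do for <code>&lt;p&gt;</code> and <code>&lt;li&gt;</code>. Stray end tags are dropped, and anything still open at the end of the input is closed.</p>

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy tag balancer: a hypothetical illustration, NOT HtmlCleaner's API or algorithm.
final class ToyTagBalancer {
    // Groups: 1 = "/" for end tags, 2 = tag name, 3 = attributes, 4 = text run, 5 = stray '<'.
    private static final Pattern TOKEN =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>|([^<]+)|(<)");

    // Hypothetical "tag descriptions"; immutable, so one instance is safe to share.
    private final Set<String> voidTags;   // never get a closing tag, e.g. br, img
    private final Set<String> noNesting;  // opening one closes an open same-name element, e.g. p, li

    ToyTagBalancer(Set<String> voidTags, Set<String> noNesting) {
        this.voidTags = Collections.unmodifiableSet(new HashSet<>(voidTags));
        this.noNesting = Collections.unmodifiableSet(new HashSet<>(noNesting));
    }

    String clean(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // per-call state only
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(4) != null) {            // plain text: copy as-is
                out.append(m.group(4));
                continue;
            }
            if (m.group(5) != null) {            // '<' that does not start a tag
                out.append("&lt;");
                continue;
            }
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            String attrs = m.group(3).replaceAll("\\s*/\\s*$", ""); // drop the "/" of "<br/>"
            if (closing) {
                // Drop end tags for void elements and for elements that are not open.
                if (voidTags.contains(name) || !open.contains(name)) {
                    continue;
                }
                // Close every element opened after the matching one, then the match itself.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            } else if (voidTags.contains(name)) {
                out.append('<').append(name).append(attrs).append(" />");
            } else {
                if (noNesting.contains(name) && name.equals(open.peek())) {
                    open.pop();                  // implicit close, e.g. <p>a<p>b
                    out.append("</").append(name).append('>');
                }
                out.append('<').append(name).append(attrs).append('>');
                open.push(name);
            }
        }
        while (!open.isEmpty()) {                // close whatever is still open
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ToyTagBalancer cleaner = new ToyTagBalancer(
                Set.of("br", "hr", "img", "input", "meta"),
                Set.of("p", "li"));
        // prints <ul><li>one</li><li>two</li></ul><p>a<br />b</p><p>c</p>
        System.out.println(cleaner.clean("<ul><li>one<li>two</ul><p>a<br>b<p>c</div>"));
    }
}
```

<p>Because the only mutable state (the stack of open tags) lives inside <code>clean()</code>, a single instance can be shared across threads. That is one plausible way to get the kind of thread safety the feature list describes, though the listing does not say how HtmlCleaner achieves it. The toy also skips much of what a real cleaner handles: it does not reopen formatting tags such as <code>&lt;i&gt;</code> after a misnested <code>&lt;/b&gt;</code>, does not escape <code>&amp;</code> in text, and does not understand comments or a <code>&gt;</code> inside quoted attribute values.</p>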