jsoup 1.8.1 released with major performance improvements!

jopen, 12 years ago

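jsoup 1.8.1 has been released!

jsoup 1.8.1 significantly improves the performance of text and tree serialization, lets you choose between HTML and XML output, and includes many other improvements and bug fixes. This release is now available for download.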

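The changes are as follows:

Improvements

Output syntax can now be set to HTML or XML; HTML is the default.
Improved the performance of Element.text().
Improved the performance of Element.html().
Reduced file read time, giving roughly a 10% speedup when parsing files.
Added Element.cssSelector().
Tightened the scope of what characters are escaped in attributes and text nodes, to align with the spec.
When pretty-print is disabled, Element.html() no longer trims outer whitespace.
The HTML Cleaner now allows span tags in the basic whitelist, and span and div tags in the relaxed whitelist.
Relaxed doctype validation, so a doctype no longer has to specify a name.
CSS selectors now support quoted attribute values.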
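
A few of the items above can be tried out with a short sketch. It assumes jsoup 1.8.1 or later is on the classpath; the class name OutputSyntaxSketch, the helper names and the sample markup are made up for illustration. It switches the output syntax between HTML and XML, and checks that the selector returned by Element.cssSelector() finds the same element again, using a quoted attribute value in the initial query.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class OutputSyntaxSketch {
    // Parse the HTML and serialize the body using either HTML or XML output syntax.
    static String render(String html, boolean asXml) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings()
           .syntax(asXml ? Document.OutputSettings.Syntax.xml
                         : Document.OutputSettings.Syntax.html)
           .prettyPrint(false); // keep the output on one line for easy comparison
        return doc.body().html();
    }

    // Find the first match for the query, ask it for its CSS selector,
    // and check that this selector leads back to the same element.
    static boolean selectorRoundTrips(String html, String query) {
        Document doc = Jsoup.parse(html);
        Element el = doc.select(query).first();
        if (el == null) {
            return false; // nothing matched the query
        }
        String selector = el.cssSelector();
        return doc.select(selector).first() == el;
    }

    public static void main(String[] args) {
        String html = "<div id=main><p class=note>Hello<br>world</p></div>";
        System.out.println(render(html, false)); // HTML syntax: void tags are left open
        System.out.println(render(html, true));  // XML syntax: void tags are self-closed
        // The query uses a quoted attribute value.
        System.out.println(selectorRoundTrips(html, "p[class=\"note\"]"));
    }
}
```

The XML setting matters mostly when the serialized output is handed to a strict XML consumer; for ordinary web pages the HTML default is usually what you want.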

Bug fixes

Fixed an issue where <svg><img/></svg> was parsed as <svg><image/></svg>
Fixed an issue where a UTF-8 BOM character was not detected if the HTTP response did not specify a charset, and the HTML body did, leading to the head contents incorrectly being parsed into the body. Changed the behavior so that when the UTF-8 BOM is detected, it will take precedence for determining the charset to decode with.
Fixed an issue in parsing a base URI when loading a URL containing a http-equiv element.
Fixed an issue for Java 1.5 / Android 2.2 compatibility, and verify it doesn't regress.
Fixed an issue that would throw an NPE when trying to set invalid HTML into a title element.
Fixed support for nth-of-type selectors with unknown tags.
Added support for application/*+xml mimetypes.
Fixed support for allowing script tags in cleaner whitelists.
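
The BOM fix above boils down to a precedence rule: a detected UTF-8 BOM wins over any declared charset. The following is a minimal sketch of that rule using only the Java standard library. It is not jsoup's actual implementation; the class BomCharsetSketch, its method names and the UTF-8 fallback are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomCharsetSketch {
    // True if the body starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] body) {
        return body.length >= 3
                && (body[0] & 0xFF) == 0xEF
                && (body[1] & 0xFF) == 0xBB
                && (body[2] & 0xFF) == 0xBF;
    }

    // A detected BOM takes precedence; otherwise use the declared charset
    // (for example from an HTTP header or a meta tag) if it is supported.
    static Charset chooseCharset(byte[] body, String declaredCharset) {
        if (hasUtf8Bom(body)) {
            return StandardCharsets.UTF_8;
        }
        if (declaredCharset != null && Charset.isSupported(declaredCharset)) {
            return Charset.forName(declaredCharset);
        }
        return StandardCharsets.UTF_8; // assumed fallback for this sketch
    }

    // Decode the body with the chosen charset, skipping the BOM bytes if present.
    static String decode(byte[] body, String declaredCharset) {
        Charset charset = chooseCharset(body, declaredCharset);
        int offset = hasUtf8Bom(body) ? 3 : 0;
        return new String(body, offset, body.length - offset, charset);
    }

    public static void main(String[] args) {
        // BOM followed by the UTF-8 encoding of U+00E9; the declared charset is wrong.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 0xC3, (byte) 0xA9};
        System.out.println(chooseCharset(withBom, "ISO-8859-1"));
    }
}
```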

See the release notes for more details.

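jsoup is a Java HTML parser that can parse a URL or a string of HTML directly. It provides a very convenient API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods.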
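
As a rough illustration of that API, the sketch below parses an HTML string, extracts link targets with a CSS selector, and then modifies the matched links in a jQuery-like chained style. It assumes jsoup is on the classpath; the class JsoupBasicsSketch, its method names and the sample markup are hypothetical. Fetching from a URL would go through Jsoup.connect(url).get() instead of Jsoup.parse(html), which is left out here so the example runs without network access.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class JsoupBasicsSketch {
    // Extract data: collect the href of every link that has one.
    static List<String> linkTargets(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<String>();
        for (Element link : doc.select("a[href]")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    // Manipulate data: add a class and a rel attribute to every link with an href,
    // then return the modified body markup.
    static String markLinks(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a[href]").addClass("external").attr("rel", "nofollow");
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>See <a href='http://example.com/a'>A</a> and "
                + "<a href='/b'>B</a>, not <a name=x>C</a>.</p>";
        System.out.println(linkTargets(html));
        System.out.println(markLinks(html));
    }
}
```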

The main features of jsoup are as follows:

jsoup is released under the MIT license and can safely be used in commercial projects.

Source: http://www.oschina.net/news/55684/jsoup-1-8-1