An HTML/XML/RSS parser for JS - NodeHtmlParser

jopen, 12 years ago

node-htmlparser is an HTML/XML/RSS parser written in JavaScript.

A forgiving HTML/XML/RSS parser written in JS for both the browser and Node.js (yes, despite the name it works just fine in any modern browser). The parser can handle streams (chunked data) and supports custom handlers for writing custom DOMs/output.
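To make the idea of chunked input plus a custom handler concrete, here is a minimal toy sketch. It is not node-htmlparser's implementation or API: the names ChunkedTagScanner, onText, onTag and collectEvents are hypothetical, and the scanner only splits input into text runs and raw tags. What it does show is the general pattern, assuming a parser that buffers an incomplete tag until the next chunk arrives and reports what it finds to a handler object instead of building a fixed DOM.

```javascript
// Toy illustration only: NOT node-htmlparser's implementation or API.
// All names here (ChunkedTagScanner, onText, onTag, collectEvents) are hypothetical.
function ChunkedTagScanner(handler) {
    this.handler = handler;
    this.buffer = ""; // input that cannot be classified yet
}

// Feed one chunk; emit every complete text run and tag found so far.
ChunkedTagScanner.prototype.write = function (chunk) {
    this.buffer += chunk;
    while (this.buffer.length > 0) {
        var lt = this.buffer.indexOf("<");
        if (lt === -1) {
            // No tag start: everything buffered is text.
            this.handler.onText(this.buffer);
            this.buffer = "";
        } else if (lt > 0) {
            // Text in front of the next tag.
            this.handler.onText(this.buffer.slice(0, lt));
            this.buffer = this.buffer.slice(lt);
        } else {
            var gt = this.buffer.indexOf(">");
            if (gt === -1) {
                return; // tag is split across chunks: wait for more data
            }
            this.handler.onTag(this.buffer.slice(1, gt));
            this.buffer = this.buffer.slice(gt + 1);
        }
    }
};

// Signal end of input; an unterminated "<..." is flushed as text.
ChunkedTagScanner.prototype.end = function () {
    if (this.buffer.length > 0) {
        this.handler.onText(this.buffer);
        this.buffer = "";
    }
};

// A custom handler that records a flat event list instead of a DOM.
function collectEvents(chunks) {
    var events = [];
    var scanner = new ChunkedTagScanner({
        onText: function (text) { events.push(["text", text]); },
        onTag: function (name) { events.push(["tag", name]); }
    });
    chunks.forEach(function (chunk) { scanner.write(chunk); });
    scanner.end();
    return events;
}
```

Calling collectEvents(["Xyz <b", ">bold</", "b>"]) feeds a tag that is cut in half by a chunk boundary; the scanner holds the partial "<b" back until the closing ">" arrives. Swapping in a different handler object changes the output format without touching the scanning logic, which is the point of a handler-based design.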

Sample code:

    var htmlparser = require("htmlparser");
    var util = require("util");

    // Deliberately malformed input to show how forgiving the parser is.
    var rawHtml = "Xyz <script language= javascript>var foo = '<<bar>>';< /  script><!--<!-- Waah! -- -->";

    var handler = new htmlparser.DefaultHandler(function (error, dom) {
        if (error) {
            // [...do something for errors...]
        } else {
            // [...parsing done, do something...]
        }
    });

    var parser = new htmlparser.Parser(handler);
    parser.parseComplete(rawHtml);

    // Print the full DOM tree (a depth of null means unlimited nesting).
    console.log(util.inspect(handler.dom, false, null));

Project home page: http://www.open-open.com/lib/view/home/1338389924948