Felix Boehm edited this page Jan 4, 2024 · 25 revisions

Usage

const parser = new htmlparser.Parser(handler /*: Object */, options /*?: Object */);

Streaming

parser.write(chunk);
// ...
parser.write(chunk);
// ...
parser.end();

Note: Avoid passing decodeEntities: false to the constructor (the option is described below); see #105 for a potential issue.

Note: When streaming non-ASCII data, use a StringDecoder so that multi-byte characters are not split apart at chunk boundaries.
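
A minimal sketch of the StringDecoder note above, assuming the chunks arrive as UTF-8 Buffers. The helper name writeBuffers is hypothetical and not part of htmlparser2; it relies only on the write and end methods shown above.

```javascript
const { StringDecoder } = require("string_decoder");

// Hypothetical helper (not part of htmlparser2): feeds UTF-8 Buffers to any
// object that exposes write(string) and end(), such as a Parser instance.
function writeBuffers(parser, buffers) {
  const decoder = new StringDecoder("utf8");
  for (const buffer of buffers) {
    // decoder.write() holds back an incomplete multi-byte sequence until
    // the remaining bytes arrive in a later chunk.
    const text = decoder.write(buffer);
    if (text.length > 0) parser.write(text);
  }
  // Flush anything the decoder is still holding, then finish parsing.
  const rest = decoder.end();
  if (rest.length > 0) parser.write(rest);
  parser.end();
}
```

Decoding before calling write means the parser only ever sees complete characters, regardless of where the underlying byte stream was cut.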

Events

The handler object may define any of the following keys. Each value must be a function; the parser will break otherwise.

  • onopentag(name /*: string */, attributes /*: { [attributeName: string]: string } */)
  • onopentagname(name /*: string */)
  • onattribute(name /*: string */, value /*: string */)
  • ontext(text /*: string */)
  • onclosetag(name /*: string */)
  • onprocessinginstruction(name /*: string */, data /*: string */)
  • oncomment(data /*: string */)
  • oncommentend()
  • oncdatastart()
  • oncdataend()
  • onerror(error /*: Error */)
  • onreset()
  • onend()
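
As an illustration of the list above, here is a sketch of a handler object that records opening tags and text. The factory name createCollector and the shape of its result object are assumptions made for this example, not part of htmlparser2; only the event names and their arguments come from the list.

```javascript
// Hypothetical factory (not part of htmlparser2): builds a handler whose keys
// are event names from the list above and whose values are all functions.
function createCollector() {
  const result = { tags: [], text: "", finished: false };
  const handler = {
    onopentag(name, attributes) {
      result.tags.push({ name, attributes });
    },
    ontext(text) {
      // Append rather than assign: ontext can fire more than once.
      result.text += text;
    },
    onreset() {
      result.tags.length = 0;
      result.text = "";
      result.finished = false;
    },
    onend() {
      result.finished = true;
    },
  };
  return { handler, result };
}

// Intended use, following the constructor form shown under Usage:
//   const { handler, result } = createCollector();
//   const parser = new htmlparser.Parser(handler);
//   parser.write(chunk);
//   parser.end();
```

Events that are not needed can simply be left out of the handler object.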

Methods

write (alias: parseChunk)

Parses a chunk of data and calls the corresponding callbacks.

end (alias: done)

Parses the end of the buffer, clears the stack, and calls onend.

reset

Resets the buffer and the stack, then calls onreset.

parseComplete

Resets the parser, parses the data, and calls end.
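
Going by the descriptions above, parseComplete behaves roughly like calling reset, write, and end in sequence. The sketch below spells that out; the function name parseWhole is hypothetical, and the real implementation may differ in detail.

```javascript
// Hypothetical stand-in for parseComplete, written only in terms of the
// methods documented above. `parser` is any object with reset(), write()
// and end(), such as a Parser instance.
function parseWhole(parser, data) {
  parser.reset();     // resets buffer and stack, calls onreset
  parser.write(data); // parses the data and calls the matching callbacks
  parser.end();       // parses the end of the buffer, calls onend
}
```

This is convenient when the whole document is already in memory and one parser instance is reused for several documents.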

Option: xmlMode

Indicates whether special tags (<script> and <style>) should get special treatment and whether "empty" tags (e.g. <br>) can have children. If false, the content of special tags will be text only.

For feeds and other XML content (documents that don't consist of HTML), set this to true. Default: false.

Option: decodeEntities

If set to true, entities within the document will be decoded. Defaults to true.

Option: lowerCaseTags

If set to true, all tags will be lowercased. If xmlMode is disabled, this defaults to true.

Option: lowerCaseAttributeNames

If set to true, all attribute names will be lowercased. If xmlMode is disabled, this defaults to true.

Option: recognizeCDATA

If set to true, CDATA sections will be recognized as text even if the xmlMode option is not enabled. Note: If xmlMode is set to true, CDATA sections will always be recognized as text.

Option: recognizeSelfClosing

If set to true, self-closing tags will trigger the onclosetag event even if xmlMode is not set to true. Note: If xmlMode is set to true, self-closing tags will always be recognized.
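
To tie the options together, here is a sketch of two options objects that could be passed as the second constructor argument. The variable names feedOptions and htmlOptions are illustrative only; the option names and the stated defaults are taken from the descriptions above.

```javascript
// For feeds and other XML content: xmlMode alone is enough, since CDATA
// sections and self-closing tags are always recognized when it is true.
const feedOptions = {
  xmlMode: true,
};

// For HTML that also contains CDATA sections or self-closing tags:
// keep xmlMode off and opt in to the two features individually.
const htmlOptions = {
  xmlMode: false,             // the default
  decodeEntities: true,       // the default; see the note about #105
  recognizeCDATA: true,
  recognizeSelfClosing: true,
};

// Intended use, following the constructor form shown under Usage:
//   const parser = new htmlparser.Parser(handler, feedOptions);
```

lowerCaseTags and lowerCaseAttributeNames are left out of htmlOptions because they already default to true when xmlMode is disabled.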