Parse HTML in Java

This example shows how to parse HTML in Java using jsoup. Java has many HTML parsers, each aimed at different purposes, and developers often wonder which one is best before committing to one. Jsoup is a very good place to start.

The following Java code fetches a URL, finds elements by class name, and lists all the links available on the page.

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws IOException {

        // fetch the page and parse it into a Document
        Document doc = Jsoup.connect("http://www.programcreek.com").get();

        // select elements by class name using a CSS selector
        Elements titles = doc.select(".entrytitle");

        // print all titles on the main page
        for (Element e : titles) {
            System.out.println("text: " + e.text());
            System.out.println("html: " + e.html());
        }

        // print all available links on the page
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            // "abs:href" resolves relative links against the page URL
            System.out.println("link: " + link.attr("abs:href"));
        }
    }
}

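You can download the jsoup Java HTML parser from jsoup.org, or by simply searching for “jsoup”. It is also published on Maven Central as org.jsoup:jsoup. The jar must be on the classpath both when you compile and when you run the example.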
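If the jar is missing at run time, the program fails with a ClassNotFoundException for org.jsoup.Jsoup. As a rough sketch (not part of jsoup itself), the following standard-library-only check reports whether that class can be loaded. The class name ClasspathCheck and the helper isClassAvailable are hypothetical names chosen for illustration.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded this class.
    public static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // org.jsoup.Jsoup is the entry-point class used by the example above
        String jsoupClass = "org.jsoup.Jsoup";
        if (isClassAvailable(jsoupClass)) {
            System.out.println("jsoup found on the classpath");
        } else {
            System.out.println("jsoup NOT found; add the jar with -cp or as a project library");
        }
    }
}
```

Run it with the same classpath you intend to use for Main. For example, assuming the jar was saved as jsoup.jar in the current directory (the real file name normally includes a version number), java -cp .:jsoup.jar ClasspathCheck on Linux or macOS should report that jsoup was found; on Windows the separator is ; instead of :.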

5 thoughts on “Parse HTML in Java”

  1. Richard Dickinson, this is because your classpath is not correct. I followed the same steps and got this error. I was running the project with this command: java -cp target/htmlLParser-1.0-SNAPSHOT.jar com.fatBas.com.Main. I was getting the error because -cp was not defined. Then I ran the class from main.java by right-clicking on main.java, and it worked. Hope this helps.

  2. ClassNotFoundException: org.jsoup.Jsoup ... Easy solution: download jsoup (search Google) and add it as a library in your project.

  3. I’ve probably made an error compiling but when I try this I get these errors:

    java Main
    Exception in thread "main" java.lang.NoClassDefFoundError: org/jsoup/Jsoup
    at Main.main(Main.java:34)
    Caused by: java.lang.ClassNotFoundException: org.jsoup.Jsoup
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more

    Any ideas?
