[jsoup] HTML parsing


Java HTML Parser

Parsing a string as an XML document. Effect: the output has the same shape as the input fragment, and nothing is added around it.

Document doc = Jsoup.parse(html, "", Parser.xmlParser());
System.out.println(doc.html());

Fragment

hello

Result of Document doc = Jsoup.parse(html, "", Parser.xmlParser()); :
hello
Result of Document doc = Jsoup.parse(html); :
hello
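
The difference between the two parsers can be checked with a small sketch. The sample markup `<p>hello</p>` and the class name `ParserComparison` are assumptions made for illustration, since the tags of the original fragment are not shown above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class ParserComparison {
    // XML parser: the markup is kept as given, no html/head/body wrapper is added.
    static Document parseAsXml(String markup) {
        return Jsoup.parse(markup, "", Parser.xmlParser());
    }

    // Default HTML parser: the fragment is placed inside a complete HTML document.
    static Document parseAsHtml(String markup) {
        return Jsoup.parse(markup);
    }

    public static void main(String[] args) {
        String markup = "<p>hello</p>"; // assumed sample input
        System.out.println("XML parser:");
        System.out.println(parseAsXml(markup).html());
        System.out.println("HTML parser:");
        System.out.println(parseAsHtml(markup).html());
    }
}
```

With the XML parser the document contains only the `p` element, while the HTML parser produces a document that also has `html`, `head`, and `body` elements.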

Parsing a string into a document

String html = "<html><head><title>First html parse</title></head>"
        + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
System.out.println(doc.html());

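Once a string has been parsed, the resulting `Document` can be queried. The following is a minimal sketch; the sample markup (including its tags) and the class name `ParseStringExample` are assumptions for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParseStringExample {
    // Parse a complete HTML string into a Document.
    static Document parse(String html) {
        return Jsoup.parse(html);
    }

    public static void main(String[] args) {
        // Assumed sample markup
        String html = "<html><head><title>First html parse</title></head>"
                + "<body><p>Parsed HTML into a doc.</p></body></html>";
        Document doc = parse(html);

        // Text of the <title> element
        System.out.println(doc.title());

        // First <p> element, or null if the document has none
        Element paragraph = doc.select("p").first();
        if (paragraph != null) {
            System.out.println(paragraph.text());
        }
    }
}
```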
Parsing a string as a body fragment

String html = "<div><p>Lorem ipsum.</p>";
Document doc = Jsoup.parseBodyFragment(html);
Element body = doc.body();
System.out.println(body.html());

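`parseBodyFragment` places the fragment inside the `body` of an empty document and lets the parser balance any unclosed tags. The sketch below illustrates this with a deliberately unclosed `div`; the sample markup and the class name `BodyFragmentExample` are assumptions.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class BodyFragmentExample {
    // Parse a fragment and return the body element that contains it.
    static Element parseFragment(String fragment) {
        return Jsoup.parseBodyFragment(fragment).body();
    }

    public static void main(String[] args) {
        String fragment = "<div><p>Lorem ipsum.</p>"; // the div is left unclosed on purpose
        Element body = parseFragment(fragment);

        // The serialized body contains a properly closed div
        System.out.println(body.html());

        // The body has a single child element, the div
        System.out.println(body.children().size());
        System.out.println(body.child(0).tagName());
    }
}
```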
Loading a document from a URL

Document doc = Jsoup.connect("http://www.lianhu.gov.cn/").get();
String title = doc.title();
System.out.println(title);
Building a customized request
Document doc = Jsoup.connect("http://www.lianhu.gov.cn/")
        .data("query", "Java")
        .userAgent("Mozilla")
        .cookie("auth", "token")
        .timeout(3000)
        .post();
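
Network requests can fail or time out, so it may help to separate building the request from sending it and to handle `IOException`. The following is a rough sketch under that assumption; the class name `CustomRequestExample` and the helper `buildRequest` are hypothetical, and the request parameters are the ones used above.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CustomRequestExample {
    // Builds the request only; nothing is sent over the network here.
    static Connection buildRequest(String url) {
        return Jsoup.connect(url)
                .data("query", "Java")      // form parameter
                .userAgent("Mozilla")       // User-Agent header
                .cookie("auth", "token")    // request cookie
                .timeout(3000)              // timeout in milliseconds
                .method(Connection.Method.POST);
    }

    public static void main(String[] args) {
        try {
            // execute() sends the request and returns the raw response
            Connection.Response response = buildRequest("http://www.lianhu.gov.cn/").execute();
            System.out.println(response.statusCode());

            // parse() turns the response body into a Document
            Document doc = response.parse();
            System.out.println(doc.title());
        } catch (IOException e) {
            // Covers timeouts, connection errors, and HTTP error statuses
            System.err.println("Request failed: " + e.getMessage());
        }
    }
}
```

Using `execute()` instead of `post()` gives access to the status code and headers before the body is parsed.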

Loading a document from a file

File input = new File("D:/deya/vhost/zizhou/index.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
System.out.println(doc.html());
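
The third argument of `Jsoup.parse(File, String, String)` is the base URI, which jsoup uses to resolve relative links found in the file. The sketch below shows this with a temporary file so that it runs anywhere; the sample markup, the class name `FileBaseUriExample`, and the helper `firstAbsoluteLink` are assumptions for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FileBaseUriExample {
    // Writes the markup to a temporary file, parses the file with the given
    // base URI, and returns the first link resolved to an absolute URL.
    static String firstAbsoluteLink(String html, String baseUri) {
        File input = null;
        try {
            input = File.createTempFile("jsoup-sample", ".html");
            Files.write(input.toPath(), html.getBytes(StandardCharsets.UTF_8));
            Document doc = Jsoup.parse(input, "UTF-8", baseUri);
            Element link = doc.select("a[href]").first();
            return link == null ? "" : link.absUrl("href");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (input != null) {
                input.delete(); // clean up the temporary file
            }
        }
    }

    public static void main(String[] args) {
        String html = "<a href=\"/news/1.html\">News</a>"; // assumed sample markup
        // The relative href is resolved against the base URI
        System.out.println(firstAbsoluteLink(html, "http://example.com/"));
    }
}
```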