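Java basics: using the Jsoup parser on the backend to get an element's HTML from a string by its id (the Java backend equivalent of the frontend looking up an element by id)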
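1. Scenario
In an e-commerce project, product descriptions are indispensable. Each project needs a different kind of description, so there is no one-size-fits-all format.
Part of the data inside a product description is data we need ourselves, such as the price or the size chart.
The problem is how to extract that data without relying on the frontend.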
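2. Approach
First, look at how the product description is stored: in my case the whole description is stored as a single string in a table column.
Since it is a string, we can parse it with Jsoup to get a Document object (other approaches work too).
Then call getElementById("element id") to get the element.
Here I want the element itself, HTML tags included,
so calling toString() is enough. If you only need the inner content without the HTML tags, use text() instead.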
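The steps above can be sketched as two small helpers. This is a minimal sketch, assuming the jsoup library is on the classpath; the class name, method names, and the sample HTML are made up for illustration. It also guards against a missing id, because getElementById returns null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ElementByIdSketch {

    /** Outer HTML of the element (its own tag included), or null if no element has that id. */
    static String outerHtmlById(String html, String id) {
        Document doc = Jsoup.parse(html);      // parse the stored string into a DOM
        Element el = doc.getElementById(id);   // null when the id is absent
        return el == null ? null : el.outerHtml(); // same string that toString() returns
    }

    /** Text content only (all tags stripped), or null if no element has that id. */
    static String textById(String html, String id) {
        Element el = Jsoup.parse(html).getElementById(id);
        return el == null ? null : el.text();
    }

    public static void main(String[] args) {
        // Hypothetical sample description
        String html = "<p>intro</p><div id=\"price\"><b>19.99</b> USD</div>";
        System.out.println(outerHtmlById(html, "price")); // the <div> with its children
        System.out.println(textById(html, "price"));      // prints "19.99 USD"
        System.out.println(textById(html, "missing"));    // prints "null"
    }
}
```

If you need the markup inside the element but not the element's own tag, Element.html() returns the inner HTML.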
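3. Desired result
The goal is the complete HTML of the two size chart elements in the product description, the ones with ids sizechart-template1 and sizechart-template2, plus the plain text of the first one.
4. Code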
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.junit.Test; // JUnit 4 assumed; with JUnit 5 use org.junit.jupiter.api.Test

/**
 * Purpose: in Java, get an element from an HTML string by the element's id
 *
 * @author 王子威
 */
@Test
public void extractChart() {
    // Product description: sample data. The element types are illustrative; only the ids matter.
    String desc = "<p>啊啊啊啊</p>\n"
            + "<div id=\"sizechart-template1\">\n"
            + "<table>\n"
            + "<tr><th>Size</th><th>Label Size</th><th>Bust</th><th>Waist</th><th>Length</th><th>Height</th></tr>\n"
            + "<tr><td>100</td><td>56cm/22.0</td><td>23cm/9.1</td><td>11cm/4.3</td><td>36cm/14.2</td><td>36cm/14.2</td></tr>\n"
            + "<tr><td>100</td><td>56cm/22.0</td><td>23cm/9.1</td><td>11cm/4.3</td><td>36cm/14.2</td><td>36cm/14.2</td></tr>\n"
            + "</table>\n"
            + "</div>\n"
            + "<div id=\"sizechart-template2\">\n"
            + "<table>\n"
            + "<tr><td>Size:100</td><td>Label Size:56cm/22.0</td><td>Bust:23cm/9.1</td><td>Waist:11cm/4.3</td><td>Length:36cm/14.2</td><td>Height:36cm/14.2</td></tr>\n"
            + "<tr><td>Size:100</td><td>Label Size:56cm/22.0</td><td>Bust:23cm/9.1</td><td>Waist:11cm/4.3</td><td>Length:36cm/14.2</td><td>Height:36cm/14.2</td></tr>\n"
            + "</table>\n"
            + "</div>\n";

    // Parse the string into a Document
    Document doc = Jsoup.parse(desc);

    // Look up each <div> by its id
    Element elementById1 = doc.getElementById("sizechart-template1");
    Element elementById2 = doc.getElementById("sizechart-template2");

    // Element to String: the full HTML, tags included
    String a = elementById1.toString();
    System.out.println("a = " + a);
    String b = elementById2.toString();
    System.out.println("b = " + b);

    // Text content only, without the HTML tags
    String text = elementById1.text();
    System.out.println("text = " + text);
}
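When the size chart values are needed individually rather than as one block of HTML, the same lookup can be followed by a CSS selector query. This is only a sketch, assuming the size chart is an HTML table made of tr, th and td elements inside the element with the given id. The class name, method name, and sample markup are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class SizeChartRowsSketch {

    /** Returns the cell texts of every table row under the element with the given id. */
    static List<List<String>> rowsById(String html, String id) {
        List<List<String>> rows = new ArrayList<>();
        Element root = Jsoup.parse(html).getElementById(id);
        if (root == null) {
            return rows; // id not found: return an empty result instead of failing
        }
        for (Element tr : root.select("tr")) {           // every row below the element
            List<String> cells = new ArrayList<>();
            for (Element cell : tr.select("th, td")) {   // header and data cells
                cells.add(cell.text());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened size chart
        String html = "<div id=\"chart\"><table>"
                + "<tr><th>Size</th><th>Bust</th></tr>"
                + "<tr><td>100</td><td>23cm/9.1</td></tr>"
                + "</table></div>";
        System.out.println(rowsById(html, "chart")); // prints [[Size, Bust], [100, 23cm/9.1]]
    }
}
```

Returning a list of rows keeps the header row as the first entry, so the caller can decide whether to map the remaining rows to field names.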