Java: Extract data from an HTML table and convert to JSON
Original question: http://stackoverflow.com/questions/27560039/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Miller
I have an HTML table that I want to parse and convert to JSON.
<table cellspacing="0" style="height: 24px;">
<tr class="tr-hover">
<th rowspan="15" scope="row">Network</th>
<td class="ttl"><a href="network-bands.php3">Technology</a></td>
<td class="nfo"><a href="#" class="link-network-detail collapse">GSM</a></td>
</tr>
<tr class="tr-toggle">
<td class="ttl"><a href="network-bands.php3">2G bands</a></td>
<td class="nfo">GSM 900 / 1800 - SIM 1 & SIM 2</td>
</tr>
<tr class="tr-toggle">
<td class="ttl"><a href="glossary.php3?term=gprs">GPRS</a></td>
<td class="nfo">Class 12</td>
</tr>
<tr class="tr-toggle">
<td class="ttl"><a href="glossary.php3?term=edge">EDGE</a></td>
<td class="nfo">Yes</td>
</tr>
</table>
In the table above, take this header cell:
<th rowspan="15" scope="row">Network</th>
The JSON array name should be "Network".
<td class="ttl"><a href="network-bands.php3">Technology</a></td>
Technology is a subheading of Network, so it must be a JSON element inside the Network array. The value stored under Technology should come from the corresponding cell:
<td class="nfo"><a href="#" class="link-network-detail collapse">GSM</a></td>
I hope my question is clear. How can I do that?
Answered by Jared Rummler
Here is an answer using Jsoup and JSON as dependencies:
final String HTML = "<table cellspacing=\"0\" style=\"height: 24px;\">\r\n<tr class=\"tr-hover\">\r\n<th rowspan=\"15\" scope=\"row\">Network</th>\r\n<td class=\"ttl\"><a href=\"network-bands.php3\">Technology</a></td>\r\n<td class=\"nfo\"><a href=\"#\" class=\"link-network-detail collapse\">GSM</a></td>\r\n</tr>\r\n<tr class=\"tr-toggle\">\r\n<td class=\"ttl\"><a href=\"network-bands.php3\">2G bands</a></td>\r\n<td class=\"nfo\">GSM 900 / 1800 - SIM 1 & SIM 2</td>\r\n</tr> \r\n<tr class=\"tr-toggle\">\r\n<td class=\"ttl\"><a href=\"glossary.php3?term=gprs\">GPRS</a></td>\r\n<td class=\"nfo\">Class 12</td>\r\n</tr> \r\n<tr class=\"tr-toggle\">\r\n<td class=\"ttl\"><a href=\"glossary.php3?term=edge\">EDGE</a></td>\r\n<td class=\"nfo\">Yes</td>\r\n</tr>\r\n</table>";
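// Imports needed (assuming the org.json library for the JSON classes):
// org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
// org.jsoup.select.Elements, org.json.JSONArray, org.json.JSONObject
Document document = Jsoup.parse(HTML);
Element table = document.select("table").first();
// The text of the first <th> becomes the name of the JSON array
String arrayName = table.select("th").first().text();
JSONObject jsonObj = new JSONObject();
JSONArray jsonArr = new JSONArray();
// "ttl" cells hold the keys, "nfo" cells hold the matching values
Elements ttls = table.getElementsByClass("ttl");
Elements nfos = table.getElementsByClass("nfo");
JSONObject jo = new JSONObject();
for (int i = 0, l = ttls.size(); i < l; i++) {
    String key = ttls.get(i).text();
    String value = nfos.get(i).text();
    jo.put(key, value);
}
jsonArr.put(jo);
jsonObj.put(arrayName, jsonArr);
System.out.println(jsonObj.toString());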
Output (formatted):
{
    "Network": [
        {
            "2G bands": "GSM 900 / 1800 - SIM 1 & SIM 2",
            "Technology": "GSM",
            "GPRS": "Class 12",
            "EDGE": "Yes"
        }
    ]
}
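Note that the keys in the output above do not follow the row order of the table (Technology is the first row, but "2G bands" is printed first). A JSON object is an unordered collection, so this is still valid JSON. If the order matters to you, one option is to collect the pairs in an insertion-ordered map before serializing. The following is only a minimal sketch using the standard library: the class name `OrderedPairs` and the helpers `quote` and `toJson` are made up for illustration, and the hand-rolled escaping only covers backslashes and double quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OrderedPairs {
    // Minimal JSON string escaping: backslash and double quote only
    // (enough for the sample data, not a full JSON encoder).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders the map as a JSON object, keeping the map's iteration order.
    static String toJson(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append(quote(e.getKey())).append(":").append(quote(e.getValue()));
            first = false;
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap iterates in insertion order, i.e. table row order here.
        Map<String, String> network = new LinkedHashMap<>();
        network.put("Technology", "GSM");
        network.put("2G bands", "GSM 900 / 1800 - SIM 1 & SIM 2");
        network.put("GPRS", "Class 12");
        network.put("EDGE", "Yes");
        System.out.println("{\"Network\":[" + toJson(network) + "]}");
    }
}
```

In the answer's code, the same map could be filled inside the loop in place of `jo.put(key, value)`; whether a given JSON library keeps that order when it serializes is something to check for the version you use.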
Answered by Vaishnav Raghunathan
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
public class Test1
{
    public static void main(String[] args)
    {
        TableElements("https://www.w3schools.com/html/html_tables.asp", "customers");
    }

    public static void TableElements(String link, String id)
    {
        StringBuilder b = new StringBuilder();
        // Provide the ChromeDriver location
        System.setProperty("webdriver.chrome.driver", "C:/Users/xyz/Desktop/Ecllipse/chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        driver.get(link);
        WebElement element;
        // Get the table markup: look it up by id first, then fall back to class name
        try
        {
            element = driver.findElement(By.id(id));
        }
        catch (Exception e)
        {
            element = driver.findElement(By.className(id));
        }
        String html = element.getAttribute("innerHTML");
        // Wrap the table's inner HTML in a minimal document so Jsoup can parse it
        b.append(html);
        b.insert(0, "<html><body><table>");
        b.append("</table></body></html>");
        Document document = Jsoup.parse(b.toString());
        Element table = document.select("table").first();
        // Select the header (th) and data (td) cells
        JSONArray jsonArr = new JSONArray();
        Elements ttls = table.getElementsByTag("th");
        Elements nfos = table.getElementsByTag("td");
        String key = "";
        String value = "";
        // Pair each td with the th of its column; j wraps back to 0 at the end of each row
        for (int i = 0, j = 0; i < nfos.size(); i++, j++)
        {
            if (j >= ttls.size())
            {
                j = 0;
            }
            key = ttls.get(j).text();
            value = nfos.get(i).text();
            JSONObject jo = new JSONObject();
            try
            {
                jo.put(key, value);
            }
            catch (JSONException e)
            {
                System.out.println("Unable to add objects to Json Array!");
            }
            jsonArr.put(jo);
        }
        String ji = "";
        int j = 0;
        // Walk the serialized array and blank out the braces between the
        // single-pair objects of one row, so each row becomes one JSON object
        for (char ch : jsonArr.toString().toCharArray())
        {
            if (ch == '}')
            {
                j++;
                if (j % ttls.size() != 0)
                    ch = ' ';
            }
            else if (ch == '{')
            {
                if (j % ttls.size() != 0)
                    ch = ' ';
            }
            ji += ch;
        }
        System.out.println(ji);
        driver.close();
    }
}
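The code above first creates one JSON object per cell and then merges the objects of each row by editing braces in the serialized string, which would break if a cell's text itself contained `{` or `}`. The same grouping can be done before anything is serialized. The following is only a sketch of that idea using the standard library: plain string lists stand in for the `th` and `td` texts, the sample values are invented, and `RowGrouper` and `groupRows` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowGrouper {
    // Groups a flat list of cell texts into rows: one map per row,
    // each cell keyed by the header at the same column position.
    static List<Map<String, String>> groupRows(List<String> headers, List<String> cells) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (headers.isEmpty()) {
            return rows; // no headers, so there is nothing to key the cells by
        }
        Map<String, String> row = null;
        for (int i = 0; i < cells.size(); i++) {
            int col = i % headers.size();
            if (col == 0) {
                // First column: start a new row
                row = new LinkedHashMap<>();
                rows.add(row);
            }
            row.put(headers.get(col), cells.get(i));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Invented sample data standing in for the th and td texts
        List<String> headers = Arrays.asList("Company", "Contact", "Country");
        List<String> cells = Arrays.asList(
                "A Corp", "Ann", "Germany",
                "B Ltd", "Bob", "Mexico");
        for (Map<String, String> r : groupRows(headers, cells)) {
            System.out.println(r);
        }
    }
}
```

Each resulting map could then be turned into a `JSONObject` and added to the `JSONArray`, so no post-processing of the output string is needed.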