C# Html Agility Pack: loop through table rows and columns
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/14968729/
Asked by mpora
I have a table like this
<table border="0" cellpadding="0" cellspacing="0" id="table2">
<tr>
<th>Name
</th>
<th>Age
</th>
</tr>
<tr>
<td>Mario
</td>
<th>Age: 78
</td>
</tr>
<tr>
<td>Jane
</td>
<td>Age: 67
</td>
</tr>
<tr>
<td>James
</td>
<th>Age: 92
</td>
</tr>
</table>
I want to use Html Agility Pack to parse it. I have tried this code to no avail:
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
{
    foreach (HtmlNode col in row.SelectNodes("//td"))
    {
        Response.Write(col.InnerText);
    }
}
What am I doing wrong?
Accepted answer by mpora
I had to provide the full XPath. I got it by using Firebug, following a suggestion by @Coda (https://stackoverflow.com/a/3104048/1238850), and I ended up with this code:
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("/html/body/table/tbody/tr/td/table[@id='table2']/tbody/tr"))
{
    HtmlNodeCollection cells = row.SelectNodes("td");
    for (int i = 0; i < cells.Count; ++i)
    {
        if (i == 0)
        {
            Response.Write("Person Name : " + cells[i].InnerText + "<br>");
        }
        else
        {
            Response.Write("Other attributes are: " + cells[i].InnerText + "<br>");
        }
    }
}
I am sure it can be written way better than this but it is working for me now.
Answer by agentnega
Why don't you just select the td elements directly?
foreach (HtmlNode col in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td"))
Response.Write(col.InnerText);
Alternatively, if you really need the tr elements separately for some other processing, drop the // from the inner query and do:
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
    foreach (HtmlNode col in row.SelectNodes("td"))
        Response.Write(col.InnerText);
Of course, that will only work if the td elements are direct children of the tr elements, but they should be, right?
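To illustrate the point about dropping the //: in XPath, an expression that starts with // is evaluated from the document root even when it is called on a row node, while td (or .//td) stays relative to that row. The sketch below assumes the HtmlAgilityPack NuGet package and uses a hypothetical, well-formed sample table rather than the markup from the question; XPathScopeDemo and CellsPerRow are made-up names for illustration. It also guards against SelectNodes returning null when a row has no td cells, such as the header row.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class XPathScopeDemo
{
    // Hypothetical sample markup with valid td cells (unlike the table in the question).
    public const string Html =
        "<table id='table2'>" +
        "<tr><th>Name</th><th>Age</th></tr>" +
        "<tr><td>Mario</td><td>Age: 78</td></tr>" +
        "<tr><td>Jane</td><td>Age: 67</td></tr>" +
        "</table>";

    // Collects the text of every cell matched by cellXPath, row by row.
    public static List<string> CellsPerRow(string html, string cellXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr");
        if (rows == null) return result; // SelectNodes returns null when nothing matches

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes(cellXPath);
            if (cells == null) continue; // e.g. the header row, which only has th cells

            foreach (HtmlNode cell in cells)
                result.Add(cell.InnerText.Trim());
        }
        return result;
    }

    public static void Main()
    {
        // "td" is relative to each row; "//td" restarts from the document root for every row.
        Console.WriteLine("relative td: " + CellsPerRow(Html, "td").Count);
        Console.WriteLine("absolute //td: " + CellsPerRow(Html, "//td").Count);
    }
}
```

With the relative query each data cell is collected once; with //td every row returns every td in the document, so the same cells repeat once per row.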
EDIT:
var cols = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td");
for (int ii = 0; ii < cols.Count; ii += 2)
{
    string name = cols[ii].InnerText.Trim();
    int age = int.Parse(cols[ii + 1].InnerText.Split(' ')[1]);
}
There's probably a more impressive way to do this with LINQ.
使用 LINQ 可能有一种更令人印象深刻的方法来做到这一点。
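One possible LINQ version of the loop above, offered as a sketch rather than the answerer's own code. It assumes the HtmlAgilityPack NuGet package, C# 7 tuples, and a well-formed table in which every data row has exactly two td cells with the age written as "Age: NN"; LinqTableDemo, ParsePeople and the sample markup are hypothetical names and data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class LinqTableDemo
{
    // Hypothetical, well-formed sample markup.
    public const string Html =
        "<table id='table2'>" +
        "<tr><th>Name</th><th>Age</th></tr>" +
        "<tr><td>Mario</td><td>Age: 78</td></tr>" +
        "<tr><td>Jane</td><td>Age: 67</td></tr>" +
        "</table>";

    public static List<(string Name, int Age)> ParsePeople(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("table")
            .Where(table => table.GetAttributeValue("id", "") == "table2")
            .SelectMany(table => table.Descendants("tr"))
            // Elements("td") only looks at direct children, so header rows yield an empty list.
            .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
            .Where(cells => cells.Count == 2)
            // "Age: 78" -> "78"; int.Parse throws if the cell does not follow that pattern.
            .Select(cells => (Name: cells[0], Age: int.Parse(cells[1].Split(' ')[1])))
            .ToList();
    }

    public static void Main()
    {
        foreach (var person in ParsePeople(Html))
            Console.WriteLine(person.Name + " is " + person.Age);
    }
}
```

Because Descendants and Elements return empty sequences instead of null, this form needs no null checks, at the cost of not using XPath.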
Answer by Cristian Lupascu
I've run the code and it displays only the names, which is correct, because the ages are defined using invalid HTML: <th></td> (probably a typo).
By the way, the code can be simplified to only one loop:
foreach (var cell in doc.DocumentNode.SelectNodes("//table[@id='table2']/tr/td"))
{
    Response.Write(cell.InnerText);
}
Here's the code I used to test: http://pastebin.com/euzhUAAh
这是我用来测试的代码:http: //pastebin.com/euzhUAAh
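When cells go missing like this, one way to check whether the markup itself is at fault is to look at the parse errors that Html Agility Pack records while loading. The sketch below assumes the HtmlAgilityPack NuGet package; ParseErrorDemo and Describe are hypothetical names, and the sample row mimics the mismatched th and td tags from the question. Which errors are reported, if any, depends on the library version, so no particular output is claimed here.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class ParseErrorDemo
{
    // Returns one line of text per parse error recorded for the given markup.
    public static List<string> Describe(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var lines = new List<string>();
        foreach (HtmlParseError error in doc.ParseErrors)
            lines.Add(error.Code + " at line " + error.Line + ": " + error.Reason);
        return lines;
    }

    public static void Main()
    {
        // A row that opens a th but closes a td, as in the question's table.
        string html = "<table id=\"table2\"><tr><td>Mario</td><th>Age: 78</td></tr></table>";

        foreach (string line in Describe(html))
            Console.WriteLine(line);
    }
}
```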
Answer by Nader Vaghari
I did the same thing in my project with this:
private List<PhrasalVerb> ExtractVerbsFromMainPage(string content)
{
    var verbs = new List<PhrasalVerb>();
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(content);
    var rows = doc.DocumentNode.SelectNodes("//table[@class='idioms-table']//tr");
    rows.RemoveAt(0); // remove the header row
    foreach (var row in rows)
    {
        var cols = row.SelectNodes("td");
        verbs.Add(new PhrasalVerb
        {
            Uid = Guid.NewGuid(),
            Name = cols[0].InnerHtml,
            Definition = cols[1].InnerText,
            // Parse once; fall back to 0 when the cell is not a valid integer.
            Count = int.TryParse(cols[2].InnerText, out int count) ? count : 0
        });
    }
    return verbs;
}