C# Html Agility Pack: looping through table rows and columns

Question

Asked by mpora

I have a table like this

<table border="0" cellpadding="0" cellspacing="0" id="table2">
    <tr>
        <th>Name
        </th>
        <th>Age
        </th>
    </tr>
        <tr>
        <td>Mario
        </td>
        <th>Age: 78
        </td>
    </tr>
            <tr>
        <td>Jane
        </td>
        <td>Age: 67
        </td>
    </tr>
            <tr>
        <td>James
        </td>
        <th>Age: 92
        </td>
    </tr>
</table>

And want to use HTML Agility Pack to parse it. I have tried this code to no avail:

foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
{
    foreach (HtmlNode col in row.SelectNodes("//td"))
    { 
        Response.Write(col.InnerText); 
    }
}

What am I doing wrong?

Answer 1

Accepted answer by mpora

I had to provide the full XPath. I got it with Firebug, following a suggestion by @Coda (https://stackoverflow.com/a/3104048/1238850), and ended up with this code:

foreach (HtmlNode row in doc.DocumentNode.SelectNodes("/html/body/table/tbody/tr/td/table[@id='table2']/tbody/tr"))
{
    HtmlNodeCollection cells = row.SelectNodes("td");
    for (int i = 0; i < cells.Count; ++i)
    {
        if (i == 0)
        { Response.Write("Person Name : " + cells[i].InnerText + "<br>"); }
        else {
            Response.Write("Other attributes are: " + cells[i].InnerText + "<br>"); 
        }
    }
}

I am sure it can be written way better than this but it is working for me now.
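
One way this could be tightened up, as a minimal sketch rather than a definitive version: as far as I know, SelectNodes returns null (not an empty collection) when nothing matches, at least with the default settings. A row that contains only th cells, like the header row in the question, would then make cells.Count throw. The sample markup and the console output below are assumptions for illustration; the original writes to Response instead.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup; only the id matches the table in the question.
        const string html = @"
<table id='table2'>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Mario</td><td>Age: 78</td></tr>
  <tr><td>Jane</td><td>Age: 67</td></tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Guard against a missing table: SelectNodes yields null when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr");
        if (rows == null)
        {
            Console.WriteLine("Table not found.");
            return;
        }

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("td");
            if (cells == null)
            {
                continue; // e.g. the header row, which has <th> cells only
            }

            for (int i = 0; i < cells.Count; ++i)
            {
                string label = i == 0 ? "Person Name : " : "Other attributes are: ";
                Console.WriteLine(label + cells[i].InnerText.Trim());
            }
        }
    }
}
```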

Answer 2

Answered by agentnega

Why don't you just select the tds directly?

foreach (HtmlNode col in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td"))
    Response.Write(col.InnerText);

Alternatively, if you really need the trs separately for some other processing, drop the // from the inner query and do:

foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
    foreach (HtmlNode col in row.SelectNodes("td"))
        Response.Write(col.InnerText);

Of course that will only work if the tds are direct children of the trs but they should be, right?
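
To illustrate why dropping the // matters, here is a small sketch under assumed sample markup (the HTML below is hypothetical, not the asker's real page). An XPath that starts with // is evaluated from the document root even when it is called on a row node, while "td" and ".//td" are relative to that row.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup with two rows of two cells each.
        const string html = @"
<table id='table2'>
  <tr><td>Mario</td><td>Age: 78</td></tr>
  <tr><td>Jane</td><td>Age: 67</td></tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode firstRow = doc.DocumentNode.SelectSingleNode("//table[@id='table2']//tr");

        // "//td" is absolute: it searches the whole document, not just this row.
        int absolute = firstRow.SelectNodes("//td").Count;
        // "td" matches direct child cells of this row only.
        int children = firstRow.SelectNodes("td").Count;
        // ".//td" matches cells anywhere below this row.
        int descendants = firstRow.SelectNodes(".//td").Count;

        Console.WriteLine($"//td: {absolute}, td: {children}, .//td: {descendants}");
    }
}
```

With this sample, the first count should cover every cell in the document, while the other two cover only the first row's cells. If the cells might be nested deeper than direct children, ".//td" is the relative form to reach for.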

EDIT:

var cols = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td");
for (int ii = 0; ii < cols.Count; ii=ii+2)
{
    string name = cols[ii].InnerText.Trim();
    int age = int.Parse(cols[ii+1].InnerText.Split(' ')[1]);
}

There's probably a more impressive way to do this with LINQ.
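
A possible LINQ version, as a rough sketch only: it assumes well-formed rows with exactly two td cells each and an "Age: NN" format, and the sample markup and the anonymous type's property names are made up for illustration. Working row by row avoids the index arithmetic of the loop above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical, well-formed sample markup.
        const string html = @"
<table id='table2'>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Mario</td><td>Age: 78</td></tr>
  <tr><td>Jane</td><td>Age: 67</td></tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes may return null, so fall back to an empty sequence.
        IEnumerable<HtmlNode> rows = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr");
        if (rows == null)
        {
            rows = Enumerable.Empty<HtmlNode>();
        }

        var people = rows
            .Select(row => row.Elements("td").ToList()) // direct <td> children of each row
            .Where(cells => cells.Count == 2)           // skips the header row (th only)
            .Select(cells => new
            {
                Name = cells[0].InnerText.Trim(),
                Age = int.Parse(cells[1].InnerText.Replace("Age:", "").Trim())
            });

        foreach (var person in people)
        {
            Console.WriteLine($"{person.Name} is {person.Age}");
        }
    }
}
```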

Answer 3

Answered by Cristian Lupascu

I've run the code and it displays only the Names, which is correct, because the Ages are defined using invalid HTML: <th></td> (probably a typo).

By the way, the code can be simplified to only one loop:

foreach (var cell in doc.DocumentNode.SelectNodes("//table[@id='table2']/tr/td"))
{
    Response.Write(cell.InnerText);
}

Here's the code I used to test: http://pastebin.com/euzhUAAh
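
If some cells really are meant to be th elements, one option is an XPath union that picks up both cell types per row. This is only a sketch: the sample markup below is an assumption in which the mismatched closing tags have been corrected to </th>, so it does not show how the parser recovers from the <th>...</td> typo.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical markup: same shape as the question's table,
        // but with matching <th>...</th> tags in the data rows.
        const string html = @"
<table id='table2'>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Mario</td><th>Age: 78</th></tr>
  <tr><td>Jane</td><td>Age: 67</td></tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr");
        if (rows == null)
        {
            return; // no matching table
        }

        foreach (HtmlNode row in rows)
        {
            // "th|td" is an XPath union: direct child cells of either kind.
            HtmlNodeCollection cells = row.SelectNodes("th|td");
            if (cells == null)
            {
                continue; // row without any cells
            }

            Console.WriteLine(string.Join(" | ", cells.Select(c => c.InnerText.Trim())));
        }
    }
}
```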

Answer 4

Answered by Nader Vaghari

I did the same thing in a project with this:

private List<PhrasalVerb> ExtractVerbsFromMainPage(string content)
{
    var verbs = new List<PhrasalVerb>();
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(content);
    var rows = doc.DocumentNode.SelectNodes("//table[@class='idioms-table']//tr");
    rows.RemoveAt(0); // remove the header row
    foreach (var row in rows)
    {
        var cols = row.SelectNodes("td");
        verbs.Add(new PhrasalVerb
        {
            Uid = Guid.NewGuid(),
            Name = cols[0].InnerHtml,
            Definition = cols[1].InnerText,
            // Fall back to 0 when the third cell is not a valid integer.
            Count = int.TryParse(cols[2].InnerText, out int count) ? count : 0
        });
    }
    return verbs;
}
