C# HtmlAgilityPack: selecting nodes and subnodes

Question

Asked by The Hyman

Hope somebody can help me.

Let's say I have an HTML document that contains multiple divs like this example:

<div class="search_hit">

    <span prop="name">Richard Winchester</span>
    <span prop="company">Kodak</span>
    <span prop="street">Arlington Road 1</span>

</div>
<div class="search_hit">

    <span prop="name">Ted Mosby</span>
    <span prop="company">HP</span>
    <span prop="street">Arlington Road 2</span>

</div>

I'm using HtmlAgilityPack to get the HTML document. What I need to know is: how can I get the spans for each "search_hit" div?

My first thought was something like this:

foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']"))
{
     foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes("//span[@prop]"))
     {

     }
}

Each div should be an object with the included spans as properties, i.e.:

public class Record
{
    public string Name { get; set; }
    public string company { get; set; }
    public string street { get; set; }
}

And this list should then be filled:

public List<Record> Results = new List<Record>();

But the XPath I'm using does not search within the subnode as it should. It seems to search the whole document again and again.

I already got it working in the sense that I get all the spans of the whole page, but then I have no relation between the spans and the divs: I no longer know which span belongs to which div.

Does somebody know a solution? I've played around so much that I'm totally confused now :)

Any help is appreciated!

Answer 1

Accepted answer by shriek

The following works for me. The important bit, just as BeniBela noted, is to add a dot in the second call to 'SelectNodes'.

List<Record> lstRecords=new List<Record>();
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']"))
{
  Record record=new Record();
  foreach (HtmlNode node2 in node.SelectNodes(".//span[@prop]"))
  {
    string attributeValue = node2.GetAttributeValue("prop", "");
    if (attributeValue == "name")
    {
      record.Name = node2.InnerText;
    }
    else if (attributeValue == "company")
    {
      record.company = node2.InnerText;
    }
    else if (attributeValue == "street")
    {
      record.street = node2.InnerText;
    }
  }
  lstRecords.Add(record);
}

Answer 2

Answered by BeniBela

If you use //, the search starts from the beginning of the document, no matter which node you call it on.

Use .// to search all descendants of the current node:

 foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes(".//span[@prop]"))

Or drop the prefix entirely to search only direct children:

 foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes("span[@prop]"))
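
To make the difference between the three prefixes concrete, here is a minimal sketch that counts the matches for each one when the query is run on the first "search_hit" div. It assumes the HtmlAgilityPack NuGet package is referenced; the class name XPathScopeDemo, the helper CountFromFirstHit and the inline sample markup are made up for illustration and are not part of the original question.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Sample markup mirroring the question: two hits with three spans each.
    public const string SampleHtml = @"
<div class='search_hit'>
    <span prop='name'>Richard Winchester</span>
    <span prop='company'>Kodak</span>
    <span prop='street'>Arlington Road 1</span>
</div>
<div class='search_hit'>
    <span prop='name'>Ted Mosby</span>
    <span prop='company'>HP</span>
    <span prop='street'>Arlington Road 2</span>
</div>";

    // Hypothetical helper: evaluates the XPath on the FIRST search_hit div
    // and returns how many nodes it selects.
    public static int CountFromFirstHit(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        HtmlNode firstHit = doc.DocumentNode.SelectSingleNode("//div[@class='search_hit']");
        HtmlNodeCollection found = firstHit.SelectNodes(xpath);

        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        return found == null ? 0 : found.Count;
    }

    public static void Main()
    {
        // "//" is anchored at the document root, so it also sees the second div's spans.
        Console.WriteLine(CountFromFirstHit("//span[@prop]"));   // 6
        // ".//" is limited to descendants of the current node.
        Console.WriteLine(CountFromFirstHit(".//span[@prop]"));  // 3
        // No prefix: direct children of the current node only.
        Console.WriteLine(CountFromFirstHit("span[@prop]"));     // 3
    }
}
```

In this sample the last two queries give the same count only because the spans are direct children of the div; with deeper nesting, the unprefixed form would miss them.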

Answer 3

Answered by Oscar Mederos

First of all, take a look at this: Html Agility Pack - Problem selecting subnode

Here is a full working solution for your question:

IList<Record> results = new List<Record>();
foreach (var node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']")) {
    var record = new Record();
    record.Name = node.SelectSingleNode(".//span[@prop='name']").InnerText;
    record.company = node.SelectSingleNode(".//span[@prop='company']").InnerText;
    record.street = node.SelectSingleNode(".//span[@prop='street']").InnerText;
    results.Add(record);
}

If you read the question I pointed you to, you will see that using ./span[@prop='name'] gives exactly the same result here, since those span nodes are (direct) children of the div node.

If the span nodes do not have those prop attributes, and you want to assign them depending on the order in which they appear, you can do:

foreach (var node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']")) {
    var spanNodes = node.SelectNodes("./span");
    var record = new Record();
    record.Name = spanNodes[0].InnerText;
    record.company = spanNodes[1].InnerText;
    record.street = spanNodes[2].InnerText;
    results.Add(record);
}
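
One caveat with reading .InnerText straight off SelectSingleNode: the call returns null when a div lacks the requested span, and the property access then throws a NullReferenceException. The following is only a sketch of a more tolerant variant, assuming the HtmlAgilityPack NuGet package is referenced; SafeRecordReader, Parse and TextOrEmpty are hypothetical names introduced here, and falling back to an empty string is just one possible policy.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public class Record
{
    public string Name { get; set; }
    public string company { get; set; }
    public string street { get; set; }
}

public static class SafeRecordReader
{
    // Hypothetical helper: text of the span with the given prop value,
    // or an empty string when the div has no such span.
    private static string TextOrEmpty(HtmlNode hit, string prop)
    {
        HtmlNode span = hit.SelectSingleNode(".//span[@prop='" + prop + "']");
        return span?.InnerText.Trim() ?? string.Empty;
    }

    public static List<Record> Parse(string html)
    {
        var results = new List<Record>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null by default when no div matches at all.
        HtmlNodeCollection hits = doc.DocumentNode.SelectNodes("//div[@class='search_hit']");
        if (hits == null)
        {
            return results;
        }

        foreach (HtmlNode hit in hits)
        {
            results.Add(new Record
            {
                Name = TextOrEmpty(hit, "name"),
                company = TextOrEmpty(hit, "company"),
                street = TextOrEmpty(hit, "street")
            });
        }
        return results;
    }
}
```

Called as SafeRecordReader.Parse(html), this yields one Record per div, with empty strings for any span that is missing.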

Answer 4

Answered by The Hyman

Shame on me :)

All of you were right.

I found the problem. This NullReferenceException kept nagging me, so I spent more time looking at it in detail. In between all those divs there was one div with the same class='search_hit' attribute but without any spans inside. That's why it throws an error at the second loop.

foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.SelectNodes("//span[@prop]/ancestor::div[@class='search_hit']"))
{
    Record rec = new Record();
    foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes(".//span[@prop]"))
    {
    }
    rList.Results.Add(rec);
}

The code above is working.
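
An alternative to filtering the divs in the XPath is to keep the simple outer query and check the inner result for null before looping, since SelectNodes returns null by default when nothing matches. This is only a sketch under that assumption, with the HtmlAgilityPack NuGet package referenced; HitParser and Parse are hypothetical names, and skipping span-less divs is the policy chosen here to match the behaviour described above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public class Record
{
    public string Name { get; set; }
    public string company { get; set; }
    public string street { get; set; }
}

public static class HitParser
{
    public static List<Record> Parse(string html)
    {
        var results = new List<Record>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection hits = doc.DocumentNode.SelectNodes("//div[@class='search_hit']");
        if (hits == null)
        {
            return results; // no search_hit divs in the document
        }

        foreach (HtmlNode hit in hits)
        {
            HtmlNodeCollection spans = hit.SelectNodes(".//span[@prop]");
            if (spans == null)
            {
                continue; // this div has no spans, so there is nothing to record
            }

            var record = new Record();
            foreach (HtmlNode span in spans)
            {
                // Map each prop value onto the matching Record property.
                switch (span.GetAttributeValue("prop", ""))
                {
                    case "name": record.Name = span.InnerText; break;
                    case "company": record.company = span.InnerText; break;
                    case "street": record.street = span.InnerText; break;
                }
            }
            results.Add(record);
        }
        return results;
    }
}
```

The explicit check costs a few more lines than the ancestor:: query but keeps the reason for skipping a div visible in the code.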

Thank you guys for your time and help!

Answer 5

Answered by ibrahim ozboluk

I used this, with the class changed to an id:

HtmlNodeCollection nodes = dokuman.DocumentNode.SelectNodes("//div[@id='search_hit']//span[@prop]");

for (int i = 0; i < nodes.Count; i++)
{
    // Each selected span becomes its own record.
    var record = new Record();
    record.Name = nodes[i].InnerText;
    results.Add(record);
}
