Parsing an HTML Table in C#

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me), citing the original URL: http://stackoverflow.com/questions/13005098/

Tags: c#, parsing, html-agility-pack, html-table

Asked by user1764351

I have an HTML page that contains a table, and I want to parse that table in a C# Windows Forms application.

http://www.mufap.com.pk/payout-report.php?tab=01

This is the web page I want to parse. I have tried:

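// Asker's attempt with identifier casing normalized. The original call, getelementbyname, is not a real method;
// this assumes document is an HtmlAgilityPack HtmlDocument and richTextBox1 is a RichTextBox on the form.
foreach (HtmlNode a in document.DocumentNode.Descendants("tr"))
{
    richTextBox1.Text = a.InnerText;
}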

I tried something like this, but it does not give me the data in tabular form, because I am simply printing all the tr elements. Please help me with this. Thanks, and sorry for my English.

Accepted answer by L.B

Using Html Agility Pack:

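// Requires the HtmlAgilityPack NuGet package plus these namespaces.
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Download the raw HTML of the page.
WebClient webClient = new WebClient();
string page = webClient.DownloadString("http://www.mufap.com.pk/payout-report.php?tab=01");

// Parse it into a document that can be queried with XPath and LINQ.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);

// Result: one inner list per data row, one string per cell.
List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='mydata']")
            .Descendants("tr")
            .Skip(1)                                  // skip the first row (presumably the header)
            .Where(tr => tr.Elements("td").Count() > 1) // keep only rows with more than one td
            .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
            .ToList();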
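
Since the question asks for tabular output, here is a minimal sketch of one way to turn a List<List<string>> like the one above into aligned text. It is not part of the original answer: FormatTable is a hypothetical helper, the sample rows are made up, and it is written with C# 9 top-level statements so it runs as a small console program. Inside a form you would assign the returned string to richTextBox1.Text instead of printing it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: pads each cell to the width of the widest value in its column.
static string FormatTable(List<List<string>> rows)
{
    if (rows.Count == 0) return string.Empty;

    // Rows may have different cell counts, so size the width array by the longest row.
    int columnCount = rows.Max(r => r.Count);
    int[] widths = new int[columnCount];
    foreach (List<string> row in rows)
        for (int i = 0; i < row.Count; i++)
            widths[i] = Math.Max(widths[i], row[i].Length);

    // Join the padded cells with " | " and drop trailing padding on each line.
    IEnumerable<string> lines = rows.Select(row =>
        string.Join(" | ", row.Select((cell, i) => cell.PadRight(widths[i]))).TrimEnd());
    return string.Join(Environment.NewLine, lines);
}

// Made-up sample rows standing in for the parsed result.
var table = new List<List<string>>
{
    new List<string> { "Fund A", "2012-10-01", "1.25" },
    new List<string> { "Fund B Long Name", "2012-10-02", "0.5" },
};

string text = FormatTable(table);
Console.WriteLine(text);
```

The columns only line up when the text is shown in a monospaced font, so a RichTextBox would need one set. Binding the rows to a grid control such as DataGridView is another option if real columns are wanted.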

Answer by Nour Sabouny

Do you mean something like this?

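// Note: in Html Agility Pack, SelectNodes returns null (not an empty collection)
// when nothing matches, so real code should null-check each result before iterating.
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table")) {
    // This is the table.
    // "tr" matches only direct children; use ".//tr" if the rows sit inside a tbody.
    foreach (HtmlNode row in table.SelectNodes("tr")) {
        // This is the row.
        foreach (HtmlNode cell in row.SelectNodes("th|td")) {
            // This is the cell.
        }
    }
}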

Answer by grayhat

I am late to this, but one way to do what you ask using plain vanilla C# code may be the following:

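// Needs System.Collections.Generic, System.Net (WebUtility) and System.Windows.Forms (HtmlDocument, HtmlElement).
/// <summary>
/// parses the table at the given index in the document and returns a list containing all the data with columns separated by tabs
/// e.g.: records = getTableData(doc, 0);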
/// </summary>
/// <param name="doc">HtmlDocument to work with</param>
/// <param name="number">table index (base 0)</param>
/// <returns>list containing the table data</returns>
public List<string> getTableData(HtmlDocument doc, int number)
{
  HtmlElementCollection tables = doc.GetElementsByTagName("table");
  int idx=0;
  List<string> data = new List<string>();

  foreach (HtmlElement tbl in tables)
  {
    if (idx++ == number)
    {
      data = getTableData(tbl);
      break;
    }
  }
  return data;
}

/// <summary>
/// parses the given table element and returns a list containing all the data with columns separated by tabs
/// e.g.: records = getTableData(getElement(doc, "table", "id", "table1")); (getElement is a helper not shown here)
/// </summary>
/// <param name="tbl">HtmlElement table to work with</param>
/// <returns>list containing the table data</returns>
public List<string> getTableData(HtmlElement tbl)
{
  int nrec = 0;
  List<string> data = new List<string>();
  string rowBuff;

  HtmlElementCollection rows = tbl.GetElementsByTagName("tr");
  HtmlElementCollection cols;
  foreach (HtmlElement tr in rows)
  {
    cols = tr.GetElementsByTagName("td");
    nrec++;
    rowBuff = nrec.ToString();
    foreach (HtmlElement td in cols)
    {
      rowBuff += "\t" + WebUtility.HtmlDecode(td.InnerText);
    }
    data.Add(rowBuff);
  }

  return data;
}

The above lets you extract data from a table in two ways: by the table's index within the page (useful for tables that have no id or name), or by passing the table's HtmlElement directly to the function (faster, but only practical for tables you can look up by id or name). Note that I chose to return a List<string> in which each entry starts with the row number, followed by that row's columns separated by tab characters. You can easily change the code to return the data in whatever other format you prefer.
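
As one illustration of changing the output format, the sketch below splits the tab-separated records back into one string array per row. It is not from the original answer: SplitRecords is a hypothetical helper, the sample data is made up to match the shape getTableData produces, and it uses C# 9 top-level statements so it runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns "2\tFund A\t1.25" style records into one string[] per row,
// dropping the leading row-number field that getTableData prepends.
static List<string[]> SplitRecords(List<string> lines)
{
    return lines
        .Select(line => line.Split('\t').Skip(1).ToArray())
        .Where(cells => cells.Length > 0)   // skip rows that had no td cells, e.g. th-only header rows
        .ToList();
}

// Made-up sample in the same shape getTableData returns.
var sample = new List<string> { "1", "2\tFund A\t1.25", "3\tFund B\t0.5" };
List<string[]> parsed = SplitRecords(sample);

foreach (string[] rowCells in parsed)
    Console.WriteLine(string.Join(", ", rowCells));
```

This assumes no cell text itself contains a tab character. If that can happen, collecting the cells into a list inside getTableData, instead of joining them with tabs, would avoid the ambiguity.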