Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/10513529/

Getting data from HTML table into a datatable

Tags: c#, html, linq, xpath, html-agility-pack

Asked by Hyman Eker

Ok so I need to query a live website to get data from a table, put this HTML table into a DataTable and then use this data. I have so far managed to use Html Agility Pack and XPath to get to each row in the table I need but I know there must be a way to parse it into a DataTable. (C#) The code I am currently using is:

string htmlCode = "";
using (WebClient client = new WebClient())
{
    htmlCode = client.DownloadString("http://www.website.com");
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(htmlCode);

//My attempt at LINQ to solve the issue (not sure where to go from here)
var myTable = doc.DocumentNode
    .Descendants("table")
    .Where(t => t.Attributes["summary"].Value == "Table One")
    .FirstOrDefault();

//Finds all the odd rows (which are the ones I actually need, but I would prefer a
//DataTable containing all the rows!)
foreach (HtmlNode cell in doc.DocumentNode.SelectNodes("//tr[@class='odd']/td"))
{
    string test = cell.InnerText;
    //Have not gone further than this yet!
}

The HTML table on the website I am querying looks like this:

<table summary="Table One">
    <tbody>
        <tr class="odd">
            <td>Some Text</td>
            <td>Some Value</td>
        </tr>
        <tr class="even">
            <td>Some Text1</td>
            <td>Some Value1</td>
        </tr>
        <tr class="odd">
            <td>Some Text2</td>
            <td>Some Value2</td>
        </tr>
        <tr class="even">
            <td>Some Text3</td>
            <td>Some Value3</td>
        </tr>
        <tr class="odd">
            <td>Some Text4</td>
            <td>Some Value4</td>
        </tr>
    </tbody>
</table>

I'm not sure whether it is better or easier to use LINQ + HAP or XPath + HAP to get the desired result. I tried both with limited success, as you can probably see. This is the first time I have written a program that queries a website, or interacts with one in any way, so I am very unsure at the moment! Thanks in advance for any help :)

Accepted answer by jessehouwing

There's no such method out of the box in the HTML Agility Pack, but it shouldn't be too hard to create one. There are samples out there that convert XML to a DataTable using LINQ to XML. These can be reworked into what you need.

If needed I can help out creating the whole method, but not today :).
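
As a rough illustration of what such a method could look like, here is a minimal sketch. The class name HtmlTableReader, the method name ToDataTable, and the generated column names ("Column1", "Column2", ...) are made up for this example and are not part of Html Agility Pack. It assumes every cell should be stored as a string and that the table contains no nested tables.

```csharp
using System;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlTableReader
{
    // Hypothetical helper (not part of Html Agility Pack): copies the <td>
    // cells of every row of the first table matching tableXPath into a
    // DataTable whose columns are all strings.
    public static DataTable ToDataTable(string html, string tableXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new DataTable();
        HtmlNode table = doc.DocumentNode.SelectSingleNode(tableXPath);
        if (table == null)
            return result; // no matching table: return an empty DataTable

        // ".//tr" finds the rows whether or not the markup has a <tbody>.
        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection rows = table.SelectNodes(".//tr");
        if (rows == null)
            return result;

        foreach (HtmlNode row in rows)
        {
            // Decode entities such as &nbsp; and trim surrounding whitespace.
            string[] cells = row.Elements("td")
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToArray();
            if (cells.Length == 0)
                continue; // e.g. a header row that only contains <th> cells

            // Add columns until the table is as wide as this row.
            while (result.Columns.Count < cells.Length)
                result.Columns.Add("Column" + (result.Columns.Count + 1), typeof(string));

            result.Rows.Add(cells);
        }
        return result;
    }
}
```

With the table from the question, a call such as HtmlTableReader.ToDataTable(htmlCode, "//table[@summary='Table One']") should give a DataTable with two string columns and one row per tr element. Converting a column to decimal or another type would be a separate step.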

Answer by Hyman Eker

This is my solution. May be a bit messy but it is working perfectly at the moment :D

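string htmlCode = "";
using (WebClient client = new WebClient())
{
    // Send a User-Agent header; presumably the site rejects requests without one.
    client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
    htmlCode = client.DownloadString("http://www.website.com");
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(htmlCode);

DataTable dt = new DataTable();
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Value", typeof(decimal));

int count = 0;
decimal rowValue = 0;
bool isDecimal = false;
// Note: SelectNodes returns null (not an empty collection) when nothing matches.
foreach (var row in doc.DocumentNode.SelectNodes("//table[@summary='Table Name']/tbody/tr"))
{
    DataRow dr = dt.NewRow();
    foreach (var cell in row.SelectNodes("td"))
    {
        if ((count % 2 == 0))
        {
            // Even cells (the first cell of each two-cell row) hold the name.
            dr["Name"] = cell.InnerText.Replace("&nbsp;", " ");
        }
        else
        {
            // Odd cells hold the value. Drop the thousands dots, turn the
            // decimal comma into a dot, then parse with the invariant culture
            // so the machine's regional settings do not change the result.
            isDecimal = decimal.TryParse(
                cell.InnerText.Replace(".", "").Replace(",", "."),
                System.Globalization.NumberStyles.Number,
                System.Globalization.CultureInfo.InvariantCulture,
                out rowValue);
            if (isDecimal)
            {
                dr["Value"] = rowValue;
            }
            dt.Rows.Add(dr);
        }
        count++;
    }
}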
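
The two Replace calls above seem to assume values written like 1.234,56, with a dot between thousands and a comma before the decimals. An alternative, sketched here under that same assumption, is to describe the format once and let decimal.TryParse handle it. The class name EuropeanNumbers is invented for this example.

```csharp
using System;
using System.Globalization;

public static class EuropeanNumbers
{
    // Assumed input format: '.' groups thousands and ',' marks the decimals,
    // as in "1.234,56". Adjust the separators if the site uses another format.
    private static readonly NumberFormatInfo Format = new NumberFormatInfo
    {
        NumberGroupSeparator = ".",
        NumberDecimalSeparator = ","
    };

    // Returns true and sets value when text is a number in the assumed format.
    public static bool TryParse(string text, out decimal value)
    {
        return decimal.TryParse(text, NumberStyles.Number, Format, out value);
    }
}
```

Because the format is passed explicitly, the result does not depend on the regional settings of the machine running the code.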

Answer by Abide Masaraure

Using some of Hyman Eker's code above and some code from Mark Gravell (see post here), I managed to come up with a solution. At the time of writing, this code snippet obtains the 2012 public holidays in South Africa.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Web;
using System.Net;
using HtmlAgilityPack;



namespace WindowsFormsApplication
{
    public partial class Form1 : Form
    {
        private DataTable dt;
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {

            string htmlCode = "";
            using (WebClient client = new WebClient())
            {
                client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
                htmlCode = client.DownloadString("http://www.info.gov.za/aboutsa/holidays.htm");
            }
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

            doc.LoadHtml(htmlCode);

            dt = new DataTable();
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("Value", typeof(string));

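            int count = 0;

            // SelectNodes returns null when nothing matches, so guard each lookup.
            HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
            if (tables != null)
            {
                foreach (HtmlNode table in tables)
                {
                    // Only process the table whose id is "table2".
                    if (table.Id != "table2")
                        continue;

                    HtmlNodeCollection rows = table.SelectNodes("tr");
                    if (rows == null)
                        continue;

                    foreach (HtmlNode row in rows)
                    {
                        HtmlNodeCollection cells = row.SelectNodes("td");
                        if (cells == null)
                            continue;

                        DataRow dr = dt.NewRow();
                        foreach (var cell in cells)
                        {
                            if ((count % 2 == 0))
                            {
                                dr["Name"] = cell.InnerText.Replace("&nbsp;", " ");
                            }
                            else
                            {
                                dr["Value"] = cell.InnerText.Replace("&nbsp;", " ");
                                dt.Rows.Add(dr);
                            }
                            count++;
                        }
                    }
                }
            }

            // Bind the grid once, after all rows have been collected.
            dataGridView1.DataSource = dt;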
        }

    }
}

Answer by Shankar Acharya

Simple logic to convert an HtmlTable to a DataTable:

//Define your webtable
public static HtmlTable table
{
    get
    {
        // "parent" is the containing control; it is not shown in this snippet.
        HtmlTable htmlTable = new HtmlTable(parent);
        htmlTable.SearchProperties.Add("id", "searchId");
        return htmlTable;
    }
}

//Convert a webtable to datatable
public static DataTable getTable
{
    get
    {
        DataTable dtTable = new DataTable("TableName");
        UITestControlCollection rows = table.Rows;

        // The first row supplies the column names.
        UITestControlCollection headers = rows[0].GetChildren();
        foreach (HtmlHeaderCell header in headers)
        {
            if (header.InnerText != null)
                dtTable.Columns.Add(header.InnerText);
        }

        // Every remaining row becomes one DataTable row.
        for (int i = 1; i < rows.Count; i++)
        {
            UITestControlCollection cells = rows[i].GetChildren();
            string[] data = new string[cells.Count];
            int counter = 0;
            foreach (HtmlCell cell in cells)
            {
                if (cell.InnerText != null)
                    data[counter] = cell.InnerText;
                counter++;
            }
            dtTable.Rows.Add(data);
        }
        return dtTable;
    }
}

Answer by Kent Ong

You can try

    DataTable.Rows[i].Cells[j].InnerText;

Here DataTable is the id of your table, i is the row index, and j is the cell index within that row.
