PHP: Extracting data from HTML table rows and columns

Question

Asked by Sourav

How can I extract data from an HTML table in PHP? The data is in this format:

Table 1

<tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr>

Table 2

<tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr>

Table 3

<tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr>

I want to get Data and Data_Text (or Data_Text_1 and Data_Text_2) from the 3 tables. I've used:

$html = file_get_contents($link);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes  = $xpath->query('//td[]');
$nodes2 = $xpath->query('//td[]');

But it can't show any data!

I'll offer a bounty for this question the day after tomorrow.
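One possible reason nothing is shown, assuming the query really is '//td[]' as printed above (the predicate may simply have been lost in formatting): an empty predicate is not valid XPath, so DOMXPath::query() returns false instead of a node list. The sketch below contrasts that with a valid predicate. The '@class="body"' predicate and the inline sample markup are assumptions made for illustration, not the original query.

```php
<?php
// Minimal sample row, modelled on Table 1 from the question.
$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser notices instead of printing them
$doc->loadHTML('<table><tr><td class="body">Data_Text</td></tr></table>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// An empty predicate is malformed XPath: query() returns false (and raises a warning,
// suppressed here with @ so the return value can be inspected).
$bad = @$xpath->query('//td[]');
var_dump($bad === false);

// Assumed replacement predicate, for illustration only.
$good = $xpath->query('//td[@class="body"]');
foreach ($good as $td) {
    echo trim($td->textContent), PHP_EOL;
}
```

Checking the return value of query() against false before iterating makes this kind of failure visible instead of silent.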

Answer 1

Answered by pdizz

Using simple_html_dom.php...

<?php

include 'simple_html_dom.php';

$html = file_get_html('thetable.html');

$rows = $html->find('tr');
foreach($rows as $row) {
    echo $row->plaintext;
}

?>

Or use 'td'...

<?php

include 'simple_html_dom.php';

$html = file_get_html('thetable.html');

$cells = $html->find('td');
foreach($cells as $cell) {
    echo $cell->plaintext;
}

?>

Answer 2

Answered by Nicolás Ozimica

Given an HTML document called xpathTables.html like this:

<html>
  <body>
    <table>
      <tbody>
        <tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr>
      </tbody> 
    </table>

    <table>
      <tbody>
        <tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr>
      </tbody>
    </table>

    <table>
      <tbody>
        <tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr>
      </tbody>
    </table>
  </body>
</html>

And this PHP script:

<?php

$link = "xpathTables.html";

$html = file_get_contents($link);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$tables = $doc->getElementsByTagName('table');

$nodes  = $xpath->query('.//tbody/tr/td/a/b', $tables->item(0));
var_dump($nodes->item(0)->nodeValue);
$nodes  = $xpath->query('.//tbody/tr/td[@class="body"]', $tables->item(0));
var_dump($nodes->item(1)->nodeValue);

$nodes  = $xpath->query('.//tbody/tr/th/div[@id="Data"]', $tables->item(1));
var_dump($nodes->item(0)->nodeValue);
$nodes  = $xpath->query('.//tbody/tr/td', $tables->item(1));
var_dump($nodes->item(0)->nodeValue);
$nodes  = $xpath->query('.//tbody/tr/td', $tables->item(1));
var_dump($nodes->item(1)->nodeValue);

$nodes  = $xpath->query('.//tbody/tr/td/a', $tables->item(2));
var_dump($nodes->item(0)->nodeValue);
$nodes  = $xpath->query('.//tbody/tr/td', $tables->item(2));
var_dump($nodes->item(1)->nodeValue);

You get this output:

string(4) "DATA"
string(9) "Data_Text"
string(4) "Data"
string(11) "Data_Text_1"
string(11) "Data_Text_2"
string(4) "DATA"
string(9) "Data_Text"

I didn't fully understand your question, so I made this example to show all the text nodes your tables contain. If you are interested in only some of those nodes, pick the XPath queries that select them.

I included the table and tbody tags just to make the example more HTML-like.
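If the goal is every cell of every row rather than hand-picked nodes, the per-table queries above can be generalised into a loop. The following is only a sketch under stated assumptions: extractRows() is a hypothetical helper name, and the sample row is Table 2 from the question, inlined as a string.

```php
<?php
// Hypothetical helper: returns one array of trimmed cell strings per <tr>.
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser notices out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        // Direct th/td children of the current row, in document order.
        foreach ($xpath->query('./*[self::th or self::td]', $tr) as $cell) {
            // textContent flattens nested markup such as <a><b>DATA</b></a>.
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

$html = '<table><tr><th><div id="Data">Data</div></th>'
      . '<td>Data_Text_1</td><td>Data_Text_2</td></tr></table>';
print_r(extractRows($html));
```

Because the row query is '//tr', the sketch does not depend on whether a tbody element is present in the markup.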

Answer 3

Answered by Dimitre Novatchev

Use this single XPath expression:

/*/table/tr//text()[normalize-space()]

This selects any text node that does not consist solely of whitespace characters and that is a descendant of any tr element that is a child of a table element that is a child of the top element of the document.

XSLT-based verification:

 <xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
   "/*/table/tr//text()[normalize-space()]"/>

. . . . . . .
  <xsl:for-each select=
    "/*/table/tr//text()[normalize-space()]">
    "<xsl:copy-of select="."/>"
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document:

<html>
 <table>
    <tr>
        <td class="body" valign="top">
            <a href="example">
                <b>DATA</b>
            </a>
        </td>
        <td class="body" valign="top">Data_Text</td>
    </tr>
 </table>

 <table>
    <tr>
        <th>
            <div id="Data">Data</div>
        </th>
        <td>Data_Text_1</td>
        <td>Data_Text_2</td>
    </tr>
 </table>

 <table>
    <tr>
        <td width="120">
            <a href="example" target="_blank">DATA</a>
        </td>
        <td>Data_Text</td>
    </tr>
 </table>
</html>

the XPath expression is evaluated and the selected text nodes are output twice: first as the result of the evaluation, where they appear concatenated, and then with each selected node on a separate line and surrounded by quotes:

DATAData_TextDataData_Text_1Data_Text_2DATAData_Text

. . . . . . .

"DATA"

"Data_Text"

"Data"

"Data_Text_1"

"Data_Text_2"

"DATA"

"Data_Text"
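The same idea can be tried from PHP with DOMXPath. This is a sketch with one deliberate change, stated as an assumption: DOMDocument::loadHTML() places the tables under html/body, so the path '/*/table/tr' from the expression above would not match there, and the looser '//table//tr' is used instead. The sample markup is the three tables from the question, inlined as a string.

```php
<?php
// The three sample tables from the question, wrapped in html/body.
$html = <<<'HTML'
<html><body>
<table><tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr></table>
<table><tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr></table>
<table><tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr></table>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser notices out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Adapted expression: every non-whitespace text node below any table row.
$texts = [];
foreach ($xpath->query('//table//tr//text()[normalize-space()]') as $node) {
    $texts[] = trim($node->nodeValue);
}
print_r($texts);
```

The [normalize-space()] predicate is what drops the whitespace-only text nodes that indentation between tags would otherwise produce.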
