Original question: http://stackoverflow.com/questions/10369350/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow.
PHP: extract data from HTML table rows and columns
Question by Sourav
How can I extract data from an HTML table in PHP? The data comes in these formats:
Table 1
<tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr>
Table 2
<tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr>
Table 3
<tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr>
I want to get Data and Data_Text (or Data_Text_1 and Data_Text_2) from the three tables.
I've tried:
$html = file_get_contents($link);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//td[]');
$nodes2 = $xpath->query('//td[]');
But it doesn't show any data!
I'll offer a bounty on this question the day after tomorrow.
Answer by pdizz
Using Simple HTML DOM (simple_html_dom.php):
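<?php
// simple_html_dom.php is the third-party Simple HTML DOM parser
include 'simple_html_dom.php';
$html = file_get_html('thetable.html');
// find('tr') returns every table row in the document
$rows = $html->find('tr');
foreach($rows as $row) {
    // plaintext is the row's text with the tags stripped; print one row per line
    echo $row->plaintext, PHP_EOL;
}
?>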
Or select the individual 'td' cells:
<?php
include 'simple_html_dom.php';
$html = file_get_html('thetable.html');
// find('td') returns every data cell; th cells (like "Data" in Table 2) are not included
$cells = $html->find('td');
foreach($cells as $cell) {
    echo $cell->plaintext, PHP_EOL;
}
?>
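For comparison, here is a minimal sketch of the same row-by-row dump using PHP's built-in DOM extension instead of the third-party simple_html_dom.php. It assumes the HTML is already in a string (the inline sample below stands in for thetable.html), and DOMNode::textContent is only roughly equivalent to Simple HTML DOM's plaintext.

```php
<?php
// Sample rows from the question; stands in for the contents of thetable.html.
$html = '<table>'
    . '<tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr>'
    . '<tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr>'
    . '<tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr>'
    . '</table>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them (real-world HTML is often sloppy).
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$rowTexts = [];
foreach ($doc->getElementsByTagName('tr') as $row) {
    // textContent concatenates the text of all descendant nodes of the row.
    $rowTexts[] = $row->textContent;
}

foreach ($rowTexts as $text) {
    echo $text, PHP_EOL;
}
```

Because there is no whitespace between the tags in the sample, each row's text comes out glued together (for example "DATAData_Text"); per-cell access, as in the 'td' variant above, avoids that.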
Answer by Nicolás Ozimica
Given an HTML document called xpathTables.html like this:
<html>
<body>
<table>
<tbody>
<tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr>
</tbody>
</table>
<table>
<tbody>
<tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr>
</tbody>
</table>
<table>
<tbody>
<tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr>
</tbody>
</table>
</body>
</html>
And this PHP script:
<?php
$link = "xpathTables.html";
$html = file_get_contents($link);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$tables = $doc->getElementsByTagName('table');
$nodes = $xpath->query('.//tbody/tr/td/a/b', $tables->item(0));
var_dump($nodes->item(0)->nodeValue);
$nodes = $xpath->query('.//tbody/tr/td[@class="body"]', $tables->item(0));
var_dump($nodes->item(1)->nodeValue);
$nodes = $xpath->query('.//tbody/tr/th/div[@id="Data"]', $tables->item(1));
var_dump($nodes->item(0)->nodeValue);
$nodes = $xpath->query('.//tbody/tr/td', $tables->item(1));
var_dump($nodes->item(0)->nodeValue);
$nodes = $xpath->query('.//tbody/tr/td', $tables->item(1));
var_dump($nodes->item(1)->nodeValue);
$nodes = $xpath->query('.//tbody/tr/td/a', $tables->item(2));
var_dump($nodes->item(0)->nodeValue);
$nodes = $xpath->query('.//tbody/tr/td', $tables->item(2));
var_dump($nodes->item(1)->nodeValue);
You get this output:
string(4) "DATA"
string(9) "Data_Text"
string(4) "Data"
string(11) "Data_Text_1"
string(11) "Data_Text_2"
string(4) "DATA"
string(9) "Data_Text"
string(4) "DATA"
string(9) "Data_Text"
string(4) "Data"
string(11) "Data_Text_1"
string(11) "Data_Text_2"
string(4) "DATA"
string(9) "Data_Text"
I didn't fully understand your question, so I made this example to show all the text nodes in your tables. If you are only interested in some of those nodes, pick the XPath queries that select them.
I included the table and tbody tags just to make the example more HTML-like.
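A possible variation (not part of the original answer) that avoids writing one query per index: loop over every row and collect the trimmed text of each th/td child, treating the first cell as the label and the rest as values. This is a sketch assuming the question's three rows as inline sample HTML; the relative query .//tr matches rows whether or not an explicit tbody is present.

```php
<?php
// Sample input: the question's three tables, without tbody elements.
$html = <<<'HTML'
<html><body>
<table><tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr></table>
<table><tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr></table>
<table><tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr></table>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect HTML without printing warnings
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//table') as $table) {
    // './/tr' is relative to the current table and works with or without tbody.
    foreach ($xpath->query('.//tr', $table) as $tr) {
        $cells = [];
        // Direct th/td children of this row, returned in document order.
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        // First cell is the label ("DATA" / "Data"); the remaining cells are the values.
        $label = array_shift($cells);
        $rows[] = ['label' => $label, 'values' => $cells];
    }
}

print_r($rows);
```

The same loop handles all three row shapes because it never hard-codes a cell index or a class attribute.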
Answer by Dimitre Novatchev
Use this single XPath expression:
/*/table/tr//text()[normalize-space()]
This selects every text node that does not consist only of whitespace characters and that is a descendant of any tr element that is a child of a table element that is a child of the document's top element.
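To run a selection like this from PHP, DOMXPath::query() can evaluate it directly. The sketch below is an adaptation, not the answer's exact expression: it assumes the input is parsed with DOMDocument::loadHTML(), whose HTML parser typically adds html/body wrappers, so the absolute path /*/table/tr is replaced with the descendant form //table//tr (which also tolerates an explicit tbody).

```php
<?php
// Sample input: the question's three tables as a single HTML string.
$html = '<table><tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td>'
    . '<td class="body" valign="top">Data_Text</td></tr></table>'
    . '<table><tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr></table>'
    . '<table><tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$texts = [];
// The normalize-space() predicate drops whitespace-only text nodes.
foreach ($xpath->query('//table//tr//text()[normalize-space()]') as $textNode) {
    $texts[] = $textNode->nodeValue;
}

echo implode(PHP_EOL, $texts), PHP_EOL;
```

The text nodes come back in document order, matching the quoted list in the XSLT output.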
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/table/tr//text()[normalize-space()]"/>
. . . . . . .
<xsl:for-each select=
"/*/table/tr//text()[normalize-space()]">
"<xsl:copy-of select="."/>"
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<html>
<table>
<tr>
<td class="body" valign="top">
<a href="example">
<b>DATA</b>
</a>
</td>
<td class="body" valign="top">Data_Text</td>
</tr>
</table>
<table>
<tr>
<th>
<div id="Data">Data</div>
</th>
<td>Data_Text_1</td>
<td>Data_Text_2</td>
</tr>
</table>
<table>
<tr>
<td width="120">
<a href="example" target="_blank">DATA</a>
</td>
<td>Data_Text</td>
</tr>
</table>
</html>
the XPath expression is evaluated and the selected text nodes are output twice: first as the result of the evaluation, where they appear concatenated, and then with each selected node on a separate line, surrounded by quotes:
DATAData_TextDataData_Text_1Data_Text_2DATAData_Text
. . . . . . .
. . . . . . .
"DATA"
"Data_Text"
"Data"
"Data_Text_1"
"Data_Text_2"
"DATA"
"Data_Text"

