Extracting specific data from a web page using PHP
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/11567632/
Asked by Daniel Silva
Possible Duplicate:
HTML Scraping in Php
I would like to know whether there is a way, using PHP, to get a specific string of text from a web page when that text is updated every now and then. I've searched "all over the internet" and found nothing. I saw that preg_match could do it, but I didn't understand how to use it.
Imagine that a web page contains this:
<div name="changeable_text">**GET THIS TEXT**</div>
How can I do it using PHP, after having used file_get_contents to put the page in a variable?
Thanks in advance :)
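For reference, the preg_match approach mentioned in the question could look roughly like the sketch below. The function name extractDivTextWithRegex is hypothetical, and the inline sample string stands in for whatever file_get_contents returns. It assumes the target div contains no nested div elements and that name is the first attribute on the tag; a regular expression is fragile against real-world markup, so treat this as an illustration only.

```php
<?php
// Hypothetical helper (not from the original post): returns the text inside
// <div name="..."> or null when no such div is found.
function extractDivTextWithRegex(string $html, string $name): ?string
{
    // preg_quote() escapes the name so it is matched literally.
    // The "s" modifier lets "." span newlines; ".*?" stops at the first </div>.
    $pattern = '/<div\s+name="' . preg_quote($name, '/') . '"[^>]*>(.*?)<\/div>/s';

    if (preg_match($pattern, $html, $matches) === 1) {
        // $matches[1] holds the first capture group, i.e. the div's contents.
        return trim($matches[1]);
    }
    return null;
}

// Inline sample markup stands in for the result of file_get_contents($url).
$html = '<div name="changeable_text">**GET THIS TEXT**</div>';
echo extractDivTextWithRegex($html, 'changeable_text'), PHP_EOL;
```

preg_match returns 1 on a match, 0 on no match, and false on error, which is why the result is compared strictly against 1.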
Answered by nickb
You can use DOMDocument, like this:
$html = file_get_contents( $url);
libxml_use_internal_errors( true);
$doc = new DOMDocument;
$doc->loadHTML( $html);
$xpath = new DOMXpath( $doc);
// A name attribute on a <div>???
$node = $xpath->query( '//div[@name="changeable_text"]')->item( 0);
echo $node->textContent; // This will print **GET THIS TEXT**
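The snippet above assumes the page always contains the div; if the query matches nothing, item(0) returns null and reading textContent fails. A minimal defensive sketch of the same DOMDocument/DOMXPath approach follows. The function name extractDivTextWithXPath is hypothetical, the inline HTML stands in for the downloaded page, and $name is assumed to contain no double quotes because it is concatenated straight into the XPath expression.

```php
<?php
// Hypothetical wrapper around the DOMDocument/DOMXPath calls used above.
function extractDivTextWithXPath(string $html, string $name): ?string
{
    // Collect libxml parse warnings internally instead of emitting them,
    // and remember the previous setting so it can be restored.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // Assumption: $name contains no double quotes.
    $nodes = $xpath->query('//div[@name="' . $name . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no matching div on the page
    }

    return trim($nodes->item(0)->textContent);
}

// Inline sample markup stands in for file_get_contents($url).
$html = '<html><body><div name="changeable_text">**GET THIS TEXT**</div></body></html>';
$text = extractDivTextWithXPath($html, 'changeable_text');
echo $text ?? 'not found', PHP_EOL;
```

Returning null rather than an empty string lets the caller tell "div missing" apart from "div present but empty".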
Answered by Kai Mattern
You might want to have a look at the Simple HTML DOM Library.
There is a little tutorial here: http://www.developertutorials.com/tutorials/php/easy-screen-scraping-in-php-simple-html-dom-library-simplehtmldom-398/
That one is a screen scraping API that lets you feed HTML to it and then get parts of it using a jQuery-like selector language.
Answered by Celeritas
You're talking about data scraping: the act of extracting data from human-readable output. In your case this is whatever is between the <div> tags. Use PHP's DOM extension to get to the tag you want and extract the data. Search Google for a PHP DOM tutorial.
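As a rough illustration of the DOM extension without XPath, the sketch below walks every div and compares its name attribute. The function name findDivTextByName and the sample markup are assumptions made for this example; only DOMDocument, getElementsByTagName, getAttribute, and textContent come from the DOM extension itself.

```php
<?php
// Hypothetical helper: scans all <div> elements and returns the text of the
// first one whose name attribute equals $name, or null if none matches.
function findDivTextByName(string $html, string $name): ?string
{
    // Suppress libxml warnings about imperfect HTML while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('div') as $div) {
        // getAttribute() returns an empty string when the attribute is absent.
        if ($div->getAttribute('name') === $name) {
            return trim($div->textContent);
        }
    }
    return null;
}

// Inline sample markup stands in for the downloaded page.
$html = '<html><body><div>other</div>'
      . '<div name="changeable_text">**GET THIS TEXT**</div></body></html>';
echo findDivTextByName($html, 'changeable_text') ?? 'not found', PHP_EOL;
```

Iterating over the elements is more verbose than an XPath query, but it avoids building an expression string from user-supplied input.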
Answered by spiralclick
// file_get_html() comes from the Simple HTML DOM library (simple_html_dom.php)
$elements = file_get_html('url will go here');
foreach ($elements->find('element') as $ele) {
    // traverse according to your preferences
}
// return or output

