Extracting specific data from a web page using PHP

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11567632/

Date: 2020-08-25 00:47:15  Source: igfitidea

Tags: php, html, screen-scraping

Asked by Daniel Silva

Possible Duplicate:
HTML Scraping in PHP

I would like to know if there is any way, using PHP, to get a specific string of text from a webpage, where that text is updated every now and then. I've searched "all over the internet" and have found nothing. I saw that preg_match could do it, but I didn't understand how to use it.

Imagine that a webpage contains this:

<div name="changeable_text">**GET THIS TEXT**</div>

How can I do it using PHP, after having used file_get_contents to put the page in a variable?

Thanks in advance :)

Answered by nickb

You can use DOMDocument, like this:

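$html = file_get_contents($url); // $url holds the address of the page

// Collect warnings about malformed HTML instead of printing them
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// A name attribute on a <div>???
$node = $xpath->query('//div[@name="changeable_text"]')->item(0);

// item(0) returns null when nothing matches
if ($node !== null) {
    echo $node->textContent; // This will print **GET THIS TEXT**
}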
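
For comparison, the preg_match route mentioned in the question can be sketched as below. This is only an illustration under assumptions: the helper name extractWithRegex is hypothetical, the inline $html string stands in for the result of file_get_contents, and the pattern expects name to be the first attribute of the <div>. Regular expressions tend to break when the markup changes, which is one reason a DOM parser is usually the safer choice.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text
// inside <div name="changeable_text">...</div>, or null when nothing matches.
function extractWithRegex(string $html): ?string
{
    // [^>]* tolerates extra attributes after name="...".
    // (.*?) lazily captures up to the first closing </div>.
    // The "s" modifier lets "." match newlines as well.
    $pattern = '/<div\s+name="changeable_text"[^>]*>(.*?)<\/div>/s';

    if (preg_match($pattern, $html, $matches) === 1) {
        return trim($matches[1]); // $matches[1] is the first capture group
    }
    return null;
}

// Sample markup standing in for file_get_contents($url).
$html = '<div name="changeable_text">**GET THIS TEXT**</div>';
echo extractWithRegex($html), PHP_EOL; // prints **GET THIS TEXT**
```

Note that this pattern would stop at the first </div>, so it would not cope with nested <div> elements inside the target.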

Answered by Kai Mattern

You might want to have a look at the Simple HTML DOM Library.

There is a little tutorial here: http://www.developertutorials.com/tutorials/php/easy-screen-scraping-in-php-simple-html-dom-library-simplehtmldom-398/

It is a screen-scraping API that lets you feed HTML to it and then pick out parts of it using a jQuery-like selector syntax.

Answered by Celeritas

You're talking about data scraping: the act of extracting data from human-readable output. In your case this is whatever is between the <div> tags. Use PHP's DOM extension to get to the tag you want and extract the data. Search Google for a PHP DOM tutorial.
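
A minimal sketch of that approach with PHP's DOM extension, walking the <div> elements and checking their name attribute. The helper name findDivTextByName and the inline sample markup are assumptions made for illustration; in practice $html would come from file_get_contents.

```php
<?php
// Hypothetical helper: returns the text of the first <div> whose name
// attribute equals $name, or null if there is no such element.
function findDivTextByName(string $html, string $name): ?string
{
    // Real-world HTML is rarely valid; collect parser warnings silently
    // and restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName returns a DOMNodeList, which can be iterated.
    foreach ($doc->getElementsByTagName('div') as $div) {
        // getAttribute returns an empty string when the attribute is absent.
        if ($div->getAttribute('name') === $name) {
            return trim($div->textContent);
        }
    }
    return null;
}

// Sample markup standing in for file_get_contents($url).
$html = '<html><body><div name="changeable_text">**GET THIS TEXT**</div></body></html>';
echo findDivTextByName($html, 'changeable_text'), PHP_EOL; // prints **GET THIS TEXT**
```

Returning null for a missing element lets the caller decide how to handle a page whose layout has changed.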

Answered by spiralclick

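// file_get_html() comes from the Simple HTML DOM library;
// the include path below is an assumption, adjust it to your setup
require_once 'simple_html_dom.php';

$elements = file_get_html('url will go here');

foreach ($elements->find('element') as $ele) {
    // traverse according to your preferences
}

// return or output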