PHP: selecting a specific div from an external webpage using cURL
Original question: http://stackoverflow.com/questions/2559440/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Selecting a specific div from an external webpage using cURL
Asked by Paul
Hi, can anyone help me select a specific div from the content of a webpage?
Let's say I want to get the div with id="wrapper_content" from the webpage http://www.test.com/page3.php.
My current code looks something like this: (not working)
//REG EXP.
$s_searchFor = '@^/.dont know what to put here..@ui';
//CURL
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, 'http://www.test.com/page3.php');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
if(!preg_match($s_searchFor, $ch))
{
    $file_contents = curl_exec($ch);
}
curl_close($ch);
// display file
echo $file_contents;
So I'd like to know how I can use regular expressions to find a specific div, and how to unset the rest of the webpage so that $file_contents contains only the div.
Answered by Yacoby
HTML isn't a regular language, so you shouldn't use regex to parse it. Instead I would recommend an HTML parser such as Simple HTML DOM or DOM.
If you were going to use Simple HTML DOM you would do something like the following:
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
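The other option named above, PHP's built-in DOM extension, can do the same job without a third-party file. The following is only a minimal sketch under a few assumptions: extractDivById() is a hypothetical helper name, the sample markup stands in for whatever curl_exec($ch) returned, and the id is assumed not to contain a double quote.

```php
<?php
// Sketch: extract one div by id with PHP's built-in DOM extension.
// extractDivById() is a hypothetical helper, not part of any library.
function extractDivById(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // nothing to parse (for example, the download failed)
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: $id contains no double quote.
    $nodes = $xpath->query('//div[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no div with that id
    }

    // Passing a node to saveHTML() serializes that element and its children only.
    return $doc->saveHTML($nodes->item(0));
}

// Stand-in for the string returned by curl_exec($ch).
$file_contents = '<html><body><div id="header">Menu</div>'
    . '<div id="wrapper_content"><p>Hello</p></div></body></html>';

$div = extractDivById($file_contents, 'wrapper_content');
echo $div === null ? 'div not found' : $div;
```

Returning null when the div is missing lets the caller decide what to do, rather than echoing an empty or partial page.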
Even if you used regex your code still wouldn't work correctly. You need to get the contents of the page before you can use regex.
//wrong: $ch is the cURL handle, not the page contents
if(!preg_match($s_searchFor, $ch)){
    $file_contents = curl_exec($ch);
}
//right
$file_contents = curl_exec($ch); //get the page contents
if(preg_match($s_searchFor, $file_contents, $matches)){ //match the element
    $file_contents = $matches[0]; //keep only the matched element
}
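To illustrate why a parser is the safer choice, here is a hedged sketch of what $s_searchFor might look like and where it breaks. The pattern is an assumption, not taken from the original answer: it expects the id to be written with double quotes, and it only works when the target div contains no nested div. The firstMatch() helper and both sample strings are made up for the demonstration.

```php
<?php
// Sketch only: a lazy pattern that stops at the first closing </div> it finds.
// Assumes the id attribute is written with double quotes.
$s_searchFor = '@<div[^>]*\bid="wrapper_content"[^>]*>.*?</div>@si';

// Hypothetical sample pages: one flat, one with a nested div.
$flat = '<div id="wrapper_content"><p>Hello</p></div><div id="footer">Bye</div>';
$nested = '<div id="wrapper_content"><div class="inner">Hello</div><p>More</p></div>';

// Hypothetical helper: return the first match, or null if there is none.
function firstMatch(string $pattern, string $html): ?string
{
    return preg_match($pattern, $html, $matches) === 1 ? $matches[0] : null;
}

$flatResult = firstMatch($s_searchFor, $flat);
$nestedResult = firstMatch($s_searchFor, $nested);

echo $flatResult, "\n";   // the complete target div
echo $nestedResult, "\n"; // ends at the inner </div>, so "<p>More</p>" is lost
```

The pattern cannot count opening and closing tags, so any nested div truncates the result. That is the practical consequence of HTML not being a regular language.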
Answered by Amit Garg
include('simple_html_dom.php');
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
Download simple_html_dom.php
Answered by imightbeinatree at Cloudspace
Check out hpricot; it lets you elegantly select sections.
First you would use curl to get the document, then use hpricot to get the part you need.

