Selecting a specific div from an external webpage using cURL

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/2559440/

Date: 2020-08-25 06:56:40  Source: igfitidea

php, regex, html, curl

Asked by Paul

Hi, can anyone help me select a specific div from the content of a webpage?

Let's say I want to get the div with id="wrapper_content" from the webpage http://www.test.com/page3.php.

My current code looks something like this (not working):

//REG EXP.
$s_searchFor = '@^/.dont know what to put here..@ui';    

//CURL
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, 'http://www.test.com/page3.php');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
if(!preg_match($s_searchFor, $ch))
{
  $file_contents = curl_exec($ch);
}
curl_close($ch);

// display file
echo $file_contents;

So I'd like to know how I can use regular expressions to find a specific div, and how to unset the rest of the webpage so that $file_contents only contains the div.

Answered by Yacoby

HTML isn't a regular language, so you shouldn't use regex to parse it. Instead, I would recommend an HTML parser such as Simple HTML DOM or DOM.

If you were going to use Simple HTML DOM, you would do something like the following:

$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
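For comparison, the same selection can be sketched with PHP's built-in DOM extension (the "DOM" option mentioned above), which needs no extra download. This is only a minimal sketch: the helper name extract_element_by_id and the sample HTML string are made up for illustration, and in practice $file_contents would hold the result of curl_exec().

```php
<?php
// Hypothetical helper (not from the original answer): returns the outer HTML
// of the first div with the given id, or null if no such div exists.
function extract_element_by_id(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; buffer parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $id is a trusted value without double quotes; escape it if it comes from user input.
    $nodes = $xpath->query('//div[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // Passing a node to saveHTML() serializes just that node and its children.
    return $doc->saveHTML($nodes->item(0));
}

// Sample input standing in for the page fetched with cURL.
$file_contents = '<html><body><div id="header">Menu</div>'
    . '<div id="wrapper_content"><p>Hello</p></div></body></html>';

echo extract_element_by_id($file_contents, 'wrapper_content');
```

Unlike a regex, this approach keeps working when the target div contains nested divs, because the parser tracks the element tree rather than the raw text.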

Even if you used regex, your code still wouldn't work correctly. You need to get the contents of the page before you can run a regex against it.

//wrong
if(!preg_match($s_searchFor, $ch)){
    $file_contents = curl_exec($ch);
}

//right
$file_contents = curl_exec($ch); //get the page contents
if (preg_match($s_searchFor, $file_contents, $matches)) { //match the element
    $file_contents = $matches[0]; //set the file_contents var to the matched element
}
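To make the regex route concrete, here is one pattern that could stand in for the placeholder in the question. It is an illustrative sketch, not a recommendation: it assumes the id is written as id="wrapper_content" with double quotes, and it stops at the first closing </div>, so it returns truncated markup as soon as the target div contains another div. The sample string is made up and stands in for the value returned by curl_exec().

```php
<?php
// Illustrative pattern only: matches a div with id="wrapper_content" and everything
// up to the first closing </div>. It breaks if the div contains nested divs.
$s_searchFor = '@<div[^>]*\bid="wrapper_content"[^>]*>.*?</div>@si';

// Stand-in for the string returned by curl_exec($ch).
$file_contents = '<body><div id="nav">Menu</div>'
    . '<div id="wrapper_content"><p>Hello</p></div></body>';

if (preg_match($s_searchFor, $file_contents, $matches)) {
    $file_contents = $matches[0]; // keep only the matched div
} else {
    $file_contents = ''; // no match: nothing to display
}

echo $file_contents;
```

The nested-div limitation is the practical reason an HTML parser is the safer choice here.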

Answered by Amit Garg

include('simple_html_dom.php');
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
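The snippet above assumes $file_contents already holds the page. One way to fill it, sketched here with a hypothetical fetch_page helper that is not part of the original answer, is to wrap the cURL calls from the question and check the result before handing it to the parser. The redirect option is an added assumption.

```php
<?php
// Hypothetical helper: fetches a URL with cURL and returns the response body,
// or throws if the transfer fails.
function fetch_page(string $url, int $timeout = 5): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds to wait while connecting
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // assumption: follow HTTP redirects

    $body = curl_exec($ch);
    $error = curl_error($ch);
    // The handle is released when $ch goes out of scope at the end of the function.

    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    return $body;
}

// Usage (URL taken from the question):
// $file_contents = fetch_page('http://www.test.com/page3.php');
```

Throwing on failure keeps a false return value from curl_exec() out of str_get_html().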

Download simple_html_dom.php


Answered by imightbeinatree at Cloudspace

Check out hpricot; it lets you elegantly select sections.

First you would use curl to get the document, then use hpricot to get the part you need.