Why can't PHP cURL fetch the content of a web page?

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/814149/


curl not working for getting a web page content, why?

php, curl, screen-scraping, web-scraping

Asked by Sotheby it

I am using a cURL script to go to a link and get its content for further manipulation. Here are the link and the cURL script:

<?php
$url = 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543';

// cURL script to get the content of the given URL
$ch = curl_init();

// set the target URL
curl_setopt($ch, CURLOPT_URL, $url);

// request as if Firefox
curl_setopt($ch, CURLOPT_HTTPHEADER, array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15"));

curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
echo $result;
?>

But the website is not accepting the request when it comes from the script: the result is a "user exception" page. If I simply paste the URL into a browser, the page opens perfectly fine.

Please help; what am I doing wrong here?

Thanks and regards


Answered by Alan Storm

I ran the following program/script and the page was downloaded correctly. This most likely means the server you're running your script from can't reach the server at "criminaljustice.state.ny.us". Either your server is misconfigured, or their server is explicitly blocking you, which is a common result of aggressive screen scraping.

<?php
$url = 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HTTPHEADER, Array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15") ); 
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result= curl_exec ($ch);
curl_close ($ch);
echo $result;

Additional troubleshooting tip: if you have shell access to the machine your PHP script runs from, run the following command:

curl -I 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543'

This will output the response headers, which may contain some clue as to why your request is failing.

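If you don't have shell access, a similar check can be done from PHP itself. Below is a minimal sketch (reusing the $url from the question) that asks cURL for the headers only and reads the HTTP status code via curl_getinfo(); the options used are standard PHP cURL constants, not anything taken from the original answer:

<?php
$url = 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, true);         // include response headers in the returned string
curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD-style request: headers only, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the result instead of printing it

$headers = curl_exec($ch);

if ($headers === false) {
    // a transport-level failure (DNS, timeout, connection refused, ...)
    echo 'cURL error: ' . curl_error($ch) . "\n";
} else {
    echo 'HTTP status: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
    echo $headers;
}
curl_close($ch);
?>

A transport error points at connectivity from your server; a 403 or similar status points at the remote side blocking the request.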

Answered by Sotheby it

I had the same issue, which ended up being the CURLOPT_FOLLOWLOCATION option not being set. I thought cURL would set it to true by default, but apparently not. Once I set it, the full site came through with no problem.
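For reference, redirect-following is indeed off by default in PHP's cURL. A minimal sketch of enabling it (the CURLOPT_MAXREDIRS cap is just a common safety limit, not part of the original answer):

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP 3xx redirects (disabled by default)
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // give up after a handful of redirects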

Answered by xkcd150

For the user agent, I think you want to use the CURLOPT_USERAGENT constant:

curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");

Answered by alex

Is the user agent meant to be in an array like that? I haven't seen it done like that before.


Try just using a plain string, i.e.


curl_setopt($ch, CURLOPT_HTTPHEADER, 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15');