Original URL: http://stackoverflow.com/questions/4184869/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to disguise your PHP script as a browser?
Question by pandronic
We've been using information from a site for a while now (which the site allows as long as you credit the source, and we do), and we've been copying the information by hand. As you can imagine, this becomes tedious pretty fast, so I've been trying to automate the process by fetching the information with a PHP script.
The URL I'm trying to fetch is:
http://mediaforest.ro/weeklycharts/viewchart.aspx?r=WeeklyChartRadioLocal&y=2010&w=46 08-11-10 14-11-10
If I enter it in a browser, it works; if I try file_get_contents(), I get "Bad Request".
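For reference, file_get_contents() can also send browser-like headers without cURL: PHP's http stream wrapper takes them from a stream context. This is a minimal sketch; browser_context() is a hypothetical helper name, the header values are copied from the cURL attempt in this question, and headers alone may not be enough to avoid a "Bad Request".

```php
<?php
// Sketch: build a stream context so file_get_contents() sends browser-like headers.
// browser_context() is a hypothetical helper; header values come from the question.
function browser_context()
{
    $headers = array(
        'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-us,en;q=0.5',
    );
    return stream_context_create(array(
        'http' => array(
            'method' => 'GET',
            // The http wrapper accepts header lines joined with CRLF.
            'header' => implode("\r\n", $headers),
        ),
    ));
}

// Usage (performs a real network request, so it is left commented out):
// $html = file_get_contents($url, false, browser_context());
```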
I figured they check whether the client is a browser, so I rolled a cURL-based solution:
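// The URL exactly as given above, spaces included
$url = 'http://mediaforest.ro/weeklycharts/viewchart.aspx?r=WeeklyChartRadioLocal&y=2010&w=46 08-11-10 14-11-10';

$ch = curl_init();
// Headers copied from the browser (Firefox 3.6) request
$header = array(
    'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12',
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-us,en;q=0.5',
    'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Keep-Alive: 115',
    'Connection: keep-alive',
);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // return the body instead of printing it
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');  // sends Accept-Encoding AND decompresses the reply
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // read cookies from this file
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');  // write cookies back on curl_close()
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
$result = curl_exec($ch);
curl_close($ch);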
I've checked, and the headers are identical to my browser's, but I still get "Bad Request".
So I tried another solution:
http://www.php.net/manual/en/function.curl-setopt.php#78046
Unfortunately, this doesn't work either, and I'm out of ideas. What am I missing?
Accepted answer by Reese Moore
Try escaping your URL (encode each space as %20); it works for me that way:
http://mediaforest.ro/weeklycharts/viewchart.aspx?r=WeeklyChartRadioLocal&y=2010&w=46%2008-11-10%2014-11-10
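Instead of escaping by hand, the query string can be built with http_build_query(). A minimal sketch: chart_url() is a hypothetical helper, the parameter names r, y and w are taken from the URL in the question, and the encoding-type argument needs PHP 5.4 or later. PHP_QUERY_RFC3986 encodes spaces as %20, matching the escaped URL above; the default RFC 1738 mode would produce '+' instead.

```php
<?php
// Sketch: build the chart URL with a percent-encoded query string.
// chart_url() is a hypothetical helper; parameter names come from the question's URL.
function chart_url($week)
{
    $base = 'http://mediaforest.ro/weeklycharts/viewchart.aspx';
    $query = http_build_query(
        array(
            'r' => 'WeeklyChartRadioLocal',
            'y' => 2010,
            'w' => $week,   // contains spaces, e.g. '46 08-11-10 14-11-10'
        ),
        '',                 // no numeric prefix
        '&',                // argument separator
        PHP_QUERY_RFC3986   // spaces become %20, not '+'
    );
    return $base . '?' . $query;
}

echo chart_url('46 08-11-10 14-11-10'), "\n";
// http://mediaforest.ro/weeklycharts/viewchart.aspx?r=WeeklyChartRadioLocal&y=2010&w=46%2008-11-10%2014-11-10
```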
Answer by ThiefMaster
Use curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
You can, of course, replace the user agent with another one.
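A sketch that sets the user agent with CURLOPT_USERAGENT and also returns the HTTP status code, so a 400 "Bad Request" can be told apart from other failures. It assumes the cURL extension is installed and that the URL passed in is already escaped; browser_curl_options() and fetch_page() are hypothetical helper names.

```php
<?php
// Sketch: fetch a page with cURL, setting the user agent via CURLOPT_USERAGENT.
// browser_curl_options() and fetch_page() are hypothetical helpers.
function browser_curl_options($url)
{
    return array(
        CURLOPT_URL            => $url,  // must already be percent-encoded
        CURLOPT_RETURNTRANSFER => true,  // return the body as a string
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12',
        CURLOPT_ENCODING       => '',    // advertise all supported encodings and decode the reply
    );
}

function fetch_page($url)
{
    $ch = curl_init();
    curl_setopt_array($ch, browser_curl_options($url));
    $body = curl_exec($ch);                           // false on transport failure
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);  // e.g. 400 for "Bad Request"
    $error = curl_error($ch);
    curl_close($ch);
    return array('status' => $status, 'body' => $body, 'error' => $error);
}

// Usage (performs a real network request):
// $page = fetch_page('http://mediaforest.ro/weeklycharts/viewchart.aspx?r=WeeklyChartRadioLocal&y=2010&w=46%2008-11-10%2014-11-10');
// if ($page['status'] !== 200) { echo "HTTP {$page['status']} {$page['error']}\n"; }
```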
However, "Bad Request" is most likely NOT related to a missing or bad user agent. It sounds like the web server itself doesn't like your request, not the application behind the requested URI.
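This fits the escaping fix: in HTTP/1.1 the request line is the method, the request-target and the protocol version separated by single spaces, so a client that sends the URL's spaces unencoded produces a request line the server cannot parse, and RFC 7230 says such a request should get a 400 (Bad Request) before any application code runs. Browsers percent-encode the space before sending, which is why the URL works there. The check below is a simplified, hypothetical validator for illustration, not how any particular server parses requests.

```php
<?php
// Sketch: a literal space in the URL breaks the "method SP target SP version" layout.
// is_well_formed_request_line() is a hypothetical, simplified validator.
function is_well_formed_request_line($line)
{
    $parts = explode(' ', $line);
    return count($parts) === 3
        && $parts[0] !== ''
        && $parts[1] !== ''
        && preg_match('#^HTTP/\d\.\d$#', $parts[2]) === 1;
}

$target  = '/weeklycharts/viewchart.aspx?r=WeeklyChartRadioLocal&y=2010&w=46 08-11-10 14-11-10';
$raw     = 'GET ' . $target . ' HTTP/1.1';
$escaped = 'GET ' . str_replace(' ', '%20', $target) . ' HTTP/1.1';

var_dump(is_well_formed_request_line($raw));     // bool(false): the spaces split it into 5 tokens
var_dump(is_well_formed_request_line($escaped)); // bool(true)
```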