PHP file_get_contents() gives me 403 Forbidden
Original question: http://stackoverflow.com/questions/11680709/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original question and attribute it to the original authors (not me): StackOverflow
file_get_contents() gives me 403 Forbidden
Asked by Steven
I have a partner who has created some content for me to scrape.
I can access the page with my browser, but when I try to use file_get_contents(), I get a 403 Forbidden.
I've tried using stream_context_create, but that's not helping - it might be because I don't know what should go in there.
1) Is there any way for me to scrape the data?
2) If not, and if my partner is not allowed to configure the server to give me access, what can I do then?
The code I've tried using:
    $opts = array(
        'http' => array(
            'user_agent' => 'My company name',
            'method'     => "GET",
            'header'     => implode("\r\n", array(
                'Content-type: text/plain;'
            ))
        )
    );
    $context = stream_context_create($opts);
    // Get header content
    $_header = file_get_contents($partner_url, false, $context);
Answered by Cleric
This is not a problem in your script; it's a feature of your partner's web server security.
It's hard to say exactly what's blocking you; most likely it's some sort of block against scraping. If your partner has access to his web server's setup, that might help pinpoint the cause.
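One way to narrow down what is blocking the request is to look at the status line and headers the server sends back with the 403. The sketch below is an illustration, not part of the original answer: `parseStatusCode` is a hypothetical helper, and it assumes the header list has the shape PHP puts in `$http_response_header` after an HTTP request made through file_get_contents().

```php
<?php
/**
 * Hypothetical helper: extract the HTTP status code from a list of raw
 * response header lines, such as the $http_response_header array that PHP
 * fills in after file_get_contents() on an http:// URL.
 * Returns null when no status line is found.
 */
function parseStatusCode(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // A status line looks like "HTTP/1.1 403 Forbidden".
        // After redirects there can be several; keep the last one.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}
```

If the `http` context options include `'ignore_errors' => true`, file_get_contents() returns the body of the error page instead of failing, and `parseStatusCode($http_response_header)` tells you which status came back. The error body sometimes says which rule rejected the request.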
What you could do is "fake a web browser" by setting the User-Agent header so that your request imitates a standard web browser.
I would recommend cURL for this; good documentation for it is easy to find.
    // Create a cURL handle
    $ch = curl_init();
    // Set the URL to fetch
    curl_setopt($ch, CURLOPT_URL, "example.com");
    // Return the transfer as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Send a browser-like User-Agent header
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    // $output contains the response body, or false on failure
    $output = curl_exec($ch);
    // Close the cURL handle to free up system resources
    curl_close($ch);
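If you would rather keep file_get_contents(), the same idea can be expressed through a stream context. This is a sketch under assumptions, not the answerer's code: `browserContext` is a hypothetical helper name, the User-Agent string is simply the example value from the cURL snippet, and there is no guarantee that the partner's server checks only this header.

```php
<?php
/**
 * Hypothetical helper: build a stream context that sends a browser-like
 * User-Agent header with HTTP requests.
 */
function browserContext(string $userAgent)
{
    $opts = [
        'http' => [
            'method'        => 'GET',
            'user_agent'    => $userAgent,
            // Return the response body even on 4xx/5xx so it can be inspected.
            'ignore_errors' => true,
        ],
    ];
    return stream_context_create($opts);
}

// Example value taken from the cURL snippet; any browser-like string may do.
$context = browserContext('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

// Usage (placeholder URL, not executed here):
// $body = file_get_contents('http://example.com/', false, $context);
```

The `user_agent` context option is only used when the `header` option does not already contain a User-Agent line, so set it in one place only.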
Answered by Abid Hussain
    // Set the user agent first
    ini_set('user_agent', 'Mozilla/4.0 (compatible; MSIE 6.0)');
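The `user_agent` ini setting is the value PHP's HTTP stream wrapper sends by default, so changing it affects later file_get_contents() calls that do not set their own user agent in a context. A small sketch of how that might look in a script; the URL in the comment is a placeholder.

```php
<?php
// Set the default User-Agent for PHP's HTTP stream wrapper.
// ini_set() returns the previous value, or false on failure.
$previous = ini_set('user_agent', 'Mozilla/4.0 (compatible; MSIE 6.0)');

// Confirm that the setting took effect for this script.
$current = ini_get('user_agent');

// Later HTTP requests through file_get_contents() now send this value,
// unless a stream context supplies its own user agent:
// $body = file_get_contents('http://example.com/');
```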
Answered by ARIF MAHMUD RANA
I have two things in mind. First, if you're opening a URI with special characters, such as spaces, you need to encode the URI with urlencode(). Second, a URL can be used as a filename with this function if the fopen wrappers have been enabled.
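Both points can be checked in code. The sketch below is illustrative only: `encodePath` is a hypothetical helper, and it uses rawurlencode() on each path segment rather than urlencode() on the whole URL, because urlencode() would also escape the `:` and `/` that the URL needs and turns spaces into `+`, which is meant for query strings.

```php
<?php
/**
 * Hypothetical helper: percent-encode each segment of a URL path while
 * leaving the "/" separators intact.
 */
function encodePath(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

// Placeholder host and path, for illustration only.
$url = 'http://example.com' . encodePath('/partner data/feed 1.xml');

// file_get_contents() can only open URLs when allow_url_fopen is enabled.
$urlWrappersEnabled = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
```

If `$urlWrappersEnabled` is false, file_get_contents() cannot open http:// URLs at all, and cURL is the usual alternative.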

