PHP file_get_contents() gives me 403 Forbidden

Question

Asked by Steven

I have a partner who has created some content for me to scrape.
I can access the page with my browser, but when I try to use file_get_contents, I get a 403 Forbidden.

I've tried using stream_context_create, but that's not helping - it might be because I don't know what should go in there.

1) Is there any way for me to scrape the data?
2) If not, and if my partner is not allowed to configure the server to give me access, what can I do then?

The code I've tried using:

$opts = array(
  'http'=>array(
    'user_agent' => 'My company name',
    'method'=>"GET",
    'header'=> implode("\r\n", array(
      'Content-type: text/plain;'
    ))
  )
);

$context = stream_context_create($opts);

//Get header content
$_header = file_get_contents($partner_url,false, $context);

Answer 1

Answered by Cleric

This is not a problem in your script; it's a feature of your partner's web server security.

It's hard to say exactly what's blocking you; most likely it's some sort of block against scraping. If your partner has access to the web server's configuration, that might help pinpoint the cause.

What you could do is "fake a web browser" by setting the User-Agent header so that your request imitates a standard web browser.

I would recommend cURL for this; good documentation for it is easy to find.

    // create curl resource
    $ch = curl_init();

    // set url
    curl_setopt($ch, CURLOPT_URL, "example.com");

    // return the transfer as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // send a browser-like User-Agent header
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

    // $output contains the response body, or false on failure
    $output = curl_exec($ch);

    // close curl resource to free up system resources
    curl_close($ch);
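
To confirm whether the User-Agent change actually gets past the 403, it can help to read the HTTP status code back from cURL. The following is only a sketch: the function name fetch_with_user_agent, the 10-second timeout, and the choice to follow redirects are assumptions added for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): fetch a URL with a
// browser-like User-Agent and report the HTTP status alongside the body.
function fetch_with_user_agent(string $url, string $userAgent): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body as a string
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);  // header the server inspects
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // assumption: follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);            // assumption: 10 second limit

    $body   = curl_exec($ch);                          // false on transport failure
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);   // 0 if no response arrived
    $error  = ($body === false) ? curl_error($ch) : null;

    // The handle is released when $ch goes out of scope.
    return ['status' => $status, 'body' => $body, 'error' => $error];
}

// Usage (placeholder URL and User-Agent string):
// $result = fetch_with_user_agent('http://example.com/', 'Mozilla/5.0 (compatible; ExampleBot/1.0)');
// if ($result['status'] === 403) { /* still blocked: ask the partner what the server checks */ }
```

A status of 403 with a browser-like User-Agent suggests the server is filtering on something else, such as the client IP address or other headers.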

Answer 2

Answered by Abid Hussain

// Set the User-Agent first, before calling file_get_contents()
ini_set('user_agent', 'Mozilla/4.0 (compatible; MSIE 6.0)');
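
The ini_set() call changes the default User-Agent for every HTTP stream in the script. A per-request alternative, closer to the stream_context_create attempt in the question, might look like the sketch below. The helper names build_browser_context and parse_status_code, the Accept header, and the timeout are assumptions for illustration; ignore_errors is set so that the response body is still returned on a 403 and the status line can be inspected.

```php
<?php
// Hypothetical helper: build a stream context that sends a browser-like
// User-Agent and keeps the response body even on 4xx/5xx statuses.
function build_browser_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'        => 'GET',
            'user_agent'    => $userAgent,
            'header'        => "Accept: text/html,*/*;q=0.8\r\n", // assumption
            'ignore_errors' => true,  // do not fail outright on 403
            'timeout'       => 10,    // assumption: 10 second limit
        ],
    ]);
}

// Extract the numeric code from a status line such as "HTTP/1.1 403 Forbidden".
// Returns null when the line does not look like an HTTP status line.
function parse_status_code(string $statusLine): ?int
{
    return preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m) ? (int) $m[1] : null;
}

// Usage (placeholder URL and User-Agent string):
// $context = build_browser_context('Mozilla/5.0 (compatible; ExampleBot/1.0)');
// $body    = file_get_contents('http://example.com/', false, $context);
// $status  = isset($http_response_header[0]) ? parse_status_code($http_response_header[0]) : null;
```

After an HTTP request made with file_get_contents(), PHP fills the local variable $http_response_header with the response header lines, and the first entry is the status line.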

Answer 3

Answered by ARIF MAHMUD RANA

I have two things in mind. First, if you're opening a URI with special characters, such as spaces, you need to encode the URI with urlencode(). Second, a URL can be used as a filename with this function only if the fopen wrappers have been enabled (the allow_url_fopen setting).
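
Both points can be checked in a few lines. This is a sketch with hypothetical helper names (encode_url_path, url_fopen_enabled). Note one deliberate substitution: it uses rawurlencode() rather than urlencode() for the path, because urlencode() turns a space into "+", which is meant for query strings, while rawurlencode() produces "%20". Each path segment is encoded separately so the slashes are preserved.

```php
<?php
// Hypothetical helper: percent-encode each segment of a URL path so that
// spaces and other special characters are safe to pass to file_get_contents().
function encode_url_path(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

// file_get_contents() accepts URLs only when allow_url_fopen is enabled.
function url_fopen_enabled(): bool
{
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Usage (placeholder host and path):
// if (url_fopen_enabled()) {
//     $url  = 'http://example.com' . encode_url_path('/my files/report 1.txt');
//     $body = file_get_contents($url);
// }
```

Neither check addresses a 403 caused by User-Agent filtering, but they rule out two other common reasons for file_get_contents() failing on a URL.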
