PHP: How to get the real URL after file_get_contents if redirection happens?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/4323985/
Asked by HappyDeveloper
I'm using file_get_contents() to grab content from a site, and amazingly it works even if the URL I pass as an argument redirects to another URL.
The problem is that I need to know the new URL. Is there a way to do that?
Accepted answer by alex
You could make the request with cURL instead of file_get_contents().
Something like this should work:
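$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);          // include response headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // do not follow the redirect, only read its target
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
$a = curl_exec($ch);
// Header names are case-insensitive, so match at line start with the i flag.
if (preg_match('#^Location:\s*(.*)$#mi', $a, $r)) {
    $l = trim($r[1]); // target of the first redirect only
}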
Answer by Jakub Zalas
If you need to use file_get_contents() instead of curl, don't follow redirects automatically:
$context = stream_context_create(
    array(
        'http' => array(
            'follow_location' => false
        )
    )
);
$html = file_get_contents('http://www.example.com/', false, $context);
var_dump($http_response_header);
Answer inspired by: How do I ignore a moved-header with file_get_contents in PHP?
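The redirect target can then be read from the header lines. Below is a minimal sketch, assuming the headers arrive as an array of raw lines in the shape of $http_response_header. The helper name find_location_header and the sample header lines are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: returns the value of the first Location header
// found in a list of raw header lines, or null if there is none.
function find_location_header(array $headers): ?string
{
    foreach ($headers as $line) {
        // Header names are case-insensitive, hence the i flag.
        if (preg_match('/^Location:\s*(.*)$/i', $line, $m)) {
            return trim($m[1]);
        }
    }
    return null;
}

// Sample header lines (assumed), shaped like $http_response_header.
$sample = array(
    'HTTP/1.1 302 Found',
    'Content-Type: text/html',
    'location: http://www.example.com/new',
);
var_dump(find_location_header($sample));
```

In real use you would pass $http_response_header right after the file_get_contents() call. The value may be a relative URL, which this sketch returns unchanged.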
Answer by Renaud
Everything in one function:
function get_web_page( $url ) {
    $res = array();
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // do not return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $res['content'] = $content;
    $res['url'] = $header['url'];
    return $res;
}
print_r(get_web_page("http://www.example.com/redirectfrom"));
Answer by Martin Prikryl
A complete solution using the bare file_get_contents (note the in-out $url parameter):
function get_url_contents_and_final_url(&$url)
{
    do
    {
        $context = stream_context_create(
            array(
                "http" => array(
                    "follow_location" => false,
                ),
            )
        );
        $result = file_get_contents($url, false, $context);
        $pattern = "/^Location:\s*(.*)$/i";
        $location_headers = preg_grep($pattern, $http_response_header);
        if (!empty($location_headers) &&
            preg_match($pattern, array_values($location_headers)[0], $matches))
        {
            $url = $matches[1];
            $repeat = true;
        }
        else
        {
            $repeat = false;
        }
    }
    while ($repeat);
    return $result;
}
Note that this works only with an absolute URL in the Location header. If you need to support relative URLs, see PHP: How to resolve a relative url.
For example, if you use the solution from the answer by @Joyce Babu, replace:
$url = $matches[1];
with:
$url = getAbsoluteURL($matches[1], $url);
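To give a rough idea of what such a resolver has to do, here is a simplified sketch. It is not the getAbsoluteURL function from the referenced answer. The name resolve_location is hypothetical, and the sketch assumes the base URL is a valid absolute http or https URL. It does not normalize "../" segments, does not handle query-only references such as "?page=2", and ignores userinfo in the base URL.

```php
<?php
// Hypothetical helper: resolves a Location value against the URL that
// returned it. Simplified; see the limitations listed in the lead-in.
function resolve_location(string $location, string $base): string
{
    // Already absolute: starts with a scheme such as "http:".
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $location)) {
        return $location;
    }

    // Assumes $base parses cleanly and contains a scheme and a host.
    $parts = parse_url($base);
    $scheme = $parts['scheme'];
    $authority = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative: "//host/path" reuses the base scheme.
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location;
    }

    // Absolute path: "/path" replaces the whole base path.
    if (strpos($location, '/') === 0) {
        return $scheme . '://' . $authority . $location;
    }

    // Relative path: replace everything after the last "/" of the base path.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $scheme . '://' . $authority . $dir . $location;
}
```

With a helper like this, the replacement line would read $url = resolve_location($matches[1], $url); instead. For production use, a resolver that follows RFC 3986 in full is the safer choice.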