How to get final URL after following HTTP redirections in pure PHP?
Original question: http://stackoverflow.com/questions/3799134/
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, link to the original, and attribute it to the original authors (not me): Stack Overflow
Asked by Weboide
What I'd like to do is find out the final URL after following the redirections.
I would prefer not to use cURL. I would like to stick with pure PHP (stream wrappers).
Right now I have a URL (let's say http://domain.test), and I use get_headers() to get specific headers from that page. get_headers() also returns multiple Location: headers (see Edit below). Is there a way to use those headers to build the final URL? Or is there a PHP function that would do this automatically?
Edit: get_headers() follows redirections and returns all the headers for each response/redirection, so I have all the Location: headers.
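To make the question concrete, here is a minimal sketch of how the flat list returned by get_headers($url) could be split into one entry per response. The function name split_redirect_chain() and the sample header lines are assumptions made for illustration; they are not part of the original question.

```php
<?php
// Hypothetical helper: split the flat list returned by get_headers($url)
// into one entry per response, keeping the status code and Location value.
function split_redirect_chain(array $headerLines): array
{
    $hops = [];
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A status line starts a new response in the chain.
            $hops[] = ['status' => (int) $m[1], 'location' => null];
        } elseif ($hops && preg_match('/^Location:\s*(.*)$/i', $line, $m)) {
            // Header names are case-insensitive, hence the /i flag.
            $hops[count($hops) - 1]['location'] = trim($m[1]);
        }
    }
    return $hops;
}

// Sample data shaped like get_headers() output; the URLs are made up.
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://domain.test/step2',
    'HTTP/1.1 302 Found',
    'location: http://domain.test/final',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];
$hops = split_redirect_chain($sample);
```

With this shape, the last entry whose location is not null holds the final URL, provided the server sent an absolute URL in that header.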
Answered by webjay
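function getRedirectUrl ($url) {
    // Send HEAD requests so no response body is downloaded.
    // Note: this changes the default context for the whole script.
    stream_context_set_default(array(
        'http' => array(
            'method' => 'HEAD'
        )
    ));
    // Second argument 1: return the headers as an associative array.
    $headers = get_headers($url, 1);
    if ($headers !== false && isset($headers['Location'])) {
        return $headers['Location'];
    }
    return false;
}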
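One caveat with the function above: stream_context_set_default() changes the default context for the rest of the script, so later HTTP stream requests would also be sent as HEAD. A possible alternative, sketched here under the assumption of PHP 7.1 or later (where get_headers() accepts a context argument), is to build a per-call context. The helper name make_head_context() and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: build a per-call stream context for HEAD requests
// instead of changing the script-wide default context.
function make_head_context(int $maxRedirects = 10, float $timeout = 15.0)
{
    return stream_context_create([
        'http' => [
            'method'        => 'HEAD',
            'max_redirects' => $maxRedirects, // upper bound on redirects followed
            'timeout'       => $timeout,      // read timeout in seconds
        ],
    ]);
}

$context = make_head_context(5);
// Usage (needs network access, so it is left commented out here):
// $headers = get_headers('http://domain.test', 1, $context);
```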
Additionally...
As was mentioned in a comment, the final item in $headers['Location'] will be your final URL after all redirects. It's important to note, though, that it won't always be an array. When there is only a single redirect, it is a plain string. In that case, trying to access the last array element will most likely return a single character. Not ideal.
If you are only interested in the final URL, after all the redirects, I would suggest changing
    return $headers['Location'];
to
    return is_array($headers['Location']) ? array_pop($headers['Location']) : $headers['Location'];
... which is just shorthand for
    if(is_array($headers['Location'])){
        return array_pop($headers['Location']);
    }else{
        return $headers['Location'];
    }
This fix will take care of either case (array, non-array), and remove the need to weed out the final URL after calling the function.
In the case where there are no redirects, the function will return false. Similarly, the function will also return false for invalid URLs (invalid for any reason). Therefore, it is important to check the URL for validity before running this function, or else incorporate the redirect check somewhere into your validation.
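The array/non-array handling described above can be wrapped in a small helper that works on any header array, which also makes it easy to try without a network connection. This is only a sketch: the name last_location() is hypothetical, and the lower-casing of keys is an addition that is not part of the answer.

```php
<?php
// Hypothetical helper: pick the last Location value from an associative
// header array such as the one returned by get_headers($url, 1).
function last_location(array $headers)
{
    // Header names are case-insensitive, so normalise the keys first.
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['location'])) {
        return false; // no redirect was reported
    }
    $location = $headers['location'];
    // Several redirects give an array; a single redirect gives a string.
    return is_array($location) ? end($location) : $location;
}
```

One limitation of this sketch: if a chain mixes "Location" and "location" keys, array_change_key_case() keeps only the value of the key that appears later in the array, so the order across the two spellings is not guaranteed.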
Answered by xaav
/**
 * get_redirect_url()
 * Gets the address that the provided URL redirects to,
 * or FALSE if there's no redirect.
 *
 * @param string $url
 * @return string
 */
function get_redirect_url($url){
    $redirect_url = null;
    $url_parts = @parse_url($url);
    if (!$url_parts) return false;
    if (!isset($url_parts['host'])) return false; //can't process relative URLs
    if (!isset($url_parts['path'])) $url_parts['path'] = '/';
    $sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
    if (!$sock) return false;
    $request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n";
    $request .= 'Host: ' . $url_parts['host'] . "\r\n";
    $request .= "Connection: Close\r\n\r\n";
    fwrite($sock, $request);
    $response = '';
    while(!feof($sock)) $response .= fread($sock, 8192);
    fclose($sock);
    if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
        if ( substr($matches[1], 0, 1) == "/" )
            return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
        else
            return trim($matches[1]);
    } else {
        return false;
    }
}
/**
 * get_all_redirects()
 * Follows and collects all redirects, in order, for the given URL.
 *
 * @param string $url
 * @return array
 */
function get_all_redirects($url){
    $redirects = array();
    while ($newurl = get_redirect_url($url)){
        if (in_array($newurl, $redirects)){
            break;
        }
        $redirects[] = $newurl;
        $url = $newurl;
    }
    return $redirects;
}
/**
 * get_final_url()
 * Gets the address that the URL ultimately leads to.
 * Returns $url itself if it isn't a redirect.
 *
 * @param string $url
 * @return string
 */
function get_final_url($url){
    $redirects = get_all_redirects($url);
    if (count($redirects)>0){
        return array_pop($redirects);
    } else {
        return $url;
    }
}
And, as always, give credit:
http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
Answered by Paul Dixon
While the OP wanted to avoid cURL, it's best to use it when it's available. Here's a solution which has the following advantages:
- uses curl for all the heavy lifting, so works with https
- copes with servers which return a lower-cased location header name (neither xaav's nor webjay's answer handles this)
- allows you to control how deep you want to go before giving up
Here's the function:
function findUltimateDestination($url, $maxRequests = 10)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRequests);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    //customize user agent if you desire...
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);
    $url=curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close ($ch);
    return $url;
}
Here's a more verbose version which allows you to inspect the redirection chain rather than let curl follow it.
function findUltimateDestination($url, $maxRequests = 10)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    //customize user agent if you desire...
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');
    while ($maxRequests--) {
        //fetch
        curl_setopt($ch, CURLOPT_URL, $url);
        $response = curl_exec($ch);
        //try to determine redirection url
        $location = '';
        if (in_array(curl_getinfo($ch, CURLINFO_HTTP_CODE), [301, 302, 303, 307, 308])) {
            if (preg_match('/Location:(.*)/i', $response, $match)) {
                $location = trim($match[1]);
            }
        }
        if (empty($location)) {
            //we've reached the end of the chain...
            return $url;
        }
        //build next url
        if ($location[0] == '/') {
            $u = parse_url($url);
            $url = $u['scheme'] . '://' . $u['host'];
            if (isset($u['port'])) {
                $url .= ':' . $u['port'];
            }
            $url .= $location;
        } else {
            $url = $location;
        }
    }
    return null;
}
As an example of a redirection chain which this function handles, but the others do not, try this:
echo findUltimateDestination('http://dx.doi.org/10.1016/j.infsof.2016.05.005');
At the time of writing, this involves 4 requests, with a mixture of Location and location headers involved.
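The "build next url" step in the verbose version treats any Location value that does not start with "/" as an absolute URL. Servers may also send scheme-relative ("//host/path") or path-relative ("page.html") values, so a slightly more general resolver could look like the sketch below. The function name resolve_location() is hypothetical, and the sketch deliberately ignores "../" segments and query-only references.

```php
<?php
// Hypothetical helper: turn a Location value into an absolute URL,
// using the URL that was just requested as the base.
function resolve_location(string $base, string $location): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location; // already absolute
    }
    $u = parse_url($base);
    $scheme = $u['scheme'] ?? 'http';
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location; // scheme-relative
    }
    $origin = $scheme . '://' . ($u['host'] ?? '');
    if (isset($u['port'])) {
        $origin .= ':' . $u['port'];
    }
    if ($location !== '' && $location[0] === '/') {
        return $origin . $location; // host-relative
    }
    // Path-relative: replace everything after the last slash of the base path.
    $path = $u['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}
```

Inside the loop, the whole if/else that builds the next URL could then be replaced by a single call such as $url = resolve_location($url, $location);.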
Answered by Houssem BDIOUI
xaav's answer is very good, except for the following two issues:
- It does not support the HTTPS protocol => The solution was proposed as a comment on the original site: http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
- Some sites will not work since they will not recognise the underlying user agent (client browser) => This is simply fixed by adding a User-Agent header field. I added an Android user agent (you can find other user agent examples, according to your needs, here: http://www.useragentstring.com/pages/useragentstring.php):
$request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n";
Here's the modified answer:
/**
 * get_redirect_url()
 * Gets the address that the provided URL redirects to,
 * or FALSE if there's no redirect.
 *
 * @param string $url
 * @return string
 */
function get_redirect_url($url){
    $redirect_url = null;
    $url_parts = @parse_url($url);
    if (!$url_parts) return false;
    if (!isset($url_parts['host'])) return false; //can't process relative URLs
    if (!isset($url_parts['path'])) $url_parts['path'] = '/';
    $sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
    if (!$sock) return false;
    $request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n";
    $request .= 'Host: ' . $url_parts['host'] . "\r\n";
    $request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n";
    $request .= "Connection: Close\r\n\r\n";
    fwrite($sock, $request);
    $response = '';
    while(!feof($sock)) $response .= fread($sock, 8192);
    fclose($sock);
    if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
        if ( substr($matches[1], 0, 1) == "/" )
            return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
        else
            return trim($matches[1]);
    } else {
        return false;
    }
}
/**
 * get_all_redirects()
 * Follows and collects all redirects, in order, for the given URL.
 *
 * @param string $url
 * @return array
 */
function get_all_redirects($url){
    $redirects = array();
    while ($newurl = get_redirect_url($url)){
        if (in_array($newurl, $redirects)){
            break;
        }
        $redirects[] = $newurl;
        $url = $newurl;
    }
    return $redirects;
}
/**
 * get_final_url()
 * Gets the address that the URL ultimately leads to.
 * Returns $url itself if it isn't a redirect.
 *
 * @param string $url
 * @return string
 */
function get_final_url($url){
    $redirects = get_all_redirects($url);
    if (count($redirects)>0){
        return array_pop($redirects);
    } else {
        return $url;
    }
}
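The modified code still opens a plain connection on port 80, so the first issue (HTTPS) is not addressed in it. One common way to handle it with fsockopen(), which is not necessarily the approach from the comment on the original site, is to prefix the host with ssl:// and default to port 443. The helper name socket_target() is hypothetical, and the sketch assumes the OpenSSL extension is loaded.

```php
<?php
// Hypothetical helper: choose the fsockopen() target for a parsed URL.
function socket_target(array $url_parts): array
{
    $scheme = strtolower($url_parts['scheme'] ?? 'http');
    $secure = ($scheme === 'https');
    // fsockopen() accepts an ssl:// prefix when the OpenSSL extension is loaded.
    $host = ($secure ? 'ssl://' : '') . $url_parts['host'];
    $port = isset($url_parts['port'])
        ? (int) $url_parts['port']
        : ($secure ? 443 : 80);
    return [$host, $port];
}

list($host, $port) = socket_target(parse_url('https://domain.test/path'));
// Usage inside get_redirect_url() (needs network access):
// $sock = fsockopen($host, $port, $errno, $errstr, 30);
```

The Host: request header should still carry the bare host name from $url_parts['host'], without the ssl:// prefix.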
Answered by mature
Added to the code from the answers by @xaav and @Houssem BDIOUI: handling for the 404 error case and for the case when the URL gives no response. In those cases get_final_url($url) returns the strings 'Error: 404 Not Found' and 'Error: No Responce'.
/**
 * get_redirect_url()
 * Gets the address that the provided URL redirects to,
 * or FALSE if there's no redirect,
 * or 'Error: No Responce',
 * or 'Error: 404 Not Found'
 *
 * @param string $url
 * @return string
 */
function get_redirect_url($url)
{
    $redirect_url = null;
    $url_parts = @parse_url($url);
    if (!$url_parts)
        return false;
    if (!isset($url_parts['host']))
        return false; //can't process relative URLs
    if (!isset($url_parts['path']))
        $url_parts['path'] = '/';
    $sock = @fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
    if (!$sock) return 'Error: No Responce';
    $request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?' . $url_parts['query'] : '') . " HTTP/1.1\r\n";
    $request .= 'Host: ' . $url_parts['host'] . "\r\n";
    $request .= "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36\r\n";
    $request .= "Connection: Close\r\n\r\n";
    fwrite($sock, $request);
    $response = '';
    while (!feof($sock))
        $response .= fread($sock, 8192);
    fclose($sock);
    if (stripos($response, '404 Not Found') !== false)
    {
        return 'Error: 404 Not Found';
    }
    if (preg_match('/^Location: (.+?)$/m', $response, $matches))
    {
        if (substr($matches[1], 0, 1) == "/")
            return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
        else
            return trim($matches[1]);
    } else
    {
        return false;
    }
}
/**
 * get_all_redirects()
 * Follows and collects all redirects, in order, for the given URL.
 *
 * @param string $url
 * @return array
 */
function get_all_redirects($url)
{
    $redirects = array();
    while ($newurl = get_redirect_url($url))
    {
        if (in_array($newurl, $redirects))
        {
            break;
        }
        $redirects[] = $newurl;
        $url = $newurl;
    }
    return $redirects;
}
/**
 * get_final_url()
 * Gets the address that the URL ultimately leads to.
 * Returns $url itself if it isn't a redirect,
 * or 'Error: No Responce'
 * or 'Error: 404 Not Found',
 *
 * @param string $url
 * @return string
 */
function get_final_url($url)
{
    $redirects = get_all_redirects($url);
    if (count($redirects) > 0)
    {
        return array_pop($redirects);
    } else
    {
        return $url;
    }
}
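The stripos() check above looks for "404 Not Found" anywhere in the response, so it would also fire if that text happened to appear in a header value. A narrower check, sketched here with a hypothetical helper named response_status(), reads the status code from the first line of the response only.

```php
<?php
// Hypothetical helper: read the status code from the first line of a raw
// HTTP response, or return null if there is no valid status line.
function response_status(string $response): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $response, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Made-up response: a redirect whose target URL contains the text "404 Not Found".
$raw = "HTTP/1.1 301 Moved Permanently\r\n"
     . "Location: http://domain.test/404 Not Found\r\n\r\n";
```

In get_redirect_url() the 404 branch could then be written as if (response_status($response) === 404), leaving the rest of the function unchanged.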