PHP: How do I use cURL to find where I am being redirected to?

Question

Asked by Thomas Van Nuffel

I'm trying to make curl follow a redirect but I can't quite get it to work right. I have a string that I want to send as a GET param to a server and get the resulting URL.

Example:

String = Kobold Worker
Url = www.wowhead.com/search?q=Kobold+Worker

If you go to that URL it will redirect you to "www.wowhead.com/npc=257". I want curl to return this URL to my PHP code so that I can extract the "npc=257" and use it.

Current code:

function npcID($name) {
    $urltopost = "http://www.wowhead.com/search?q=" . $name;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1");
    curl_setopt($ch, CURLOPT_URL, $urltopost);
    curl_setopt($ch, CURLOPT_REFERER, "http://www.wowhead.com");
    curl_setopt($ch, CURLOPT_HTTPHEADER, Array("Content-Type:application/x-www-form-urlencoded"));
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

This, however, returns www.wowhead.com/search?q=Kobold+Worker and not www.wowhead.com/npc=257.

I suspect PHP is returning before the external redirect happens. How can I fix this?

Answer 1

Answered by Matt Gibson

To make cURL follow a redirect, use:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

Erm... I don't think you're actually executing the curl... Try:

curl_exec($ch);

...after setting the options, and before the curl_getinfo() call.
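
Putting those two points together, a corrected version of the question's function might look roughly like the sketch below. The function names finalUrl() and extractNpcId() are hypothetical (they are not from the original post), and whether the site still redirects searches this way is an assumption, so the network call is left commented out.

```php
<?php
// Sketch only: resolve the URL cURL ends up at, then pull the NPC id out of it.
// finalUrl() and extractNpcId() are illustrative names, not part of any library.

/** Returns the URL reached after following redirects, or null on failure. */
function finalUrl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow Location: headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);        // stop on redirect loops
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it

    // The step missing from the question: without curl_exec() no request is sent,
    // so CURLINFO_EFFECTIVE_URL would just echo back the URL that was set.
    if (curl_exec($ch) === false) {
        return null;
    }
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

/** Extracts the digits after "npc=" from a URL, or null if there are none. */
function extractNpcId(string $url): ?string
{
    return preg_match('~npc=(\d+)~', $url, $m) ? $m[1] : null;
}

// Usage (performs a real HTTP request, so it is commented out here):
// $final = finalUrl('http://www.wowhead.com/search?q=' . urlencode('Kobold Worker'));
// $id    = $final !== null ? extractNpcId($final) : null;
```

Passing the search term through urlencode() keeps spaces and other special characters from producing a malformed query string.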

EDIT: If you just want to find out where a page redirects to, I'd use the advice here, and just use cURL to grab the headers and extract the Location: header from them:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
if (preg_match('~Location: (.*)~i', $result, $match)) {
   $location = trim($match[1]);
}

Answer 2

Answered by Luca Camillo

Add this line to the cURL initialization:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

and call curl_getinfo() before curl_close():

$redirectURL = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL );

Example:

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,0); 
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
$html = curl_exec($ch);
$redirectURL = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL );
curl_close($ch);

Answer 3

Answered by GR1NN3R

The answer above didn't work for me on one of my servers, something to do with basedir, so I reworked it a little. The code below works on all my servers.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$a = curl_exec($ch);
curl_close( $ch ); 
// the returned headers
$headers = explode("\n",$a);
// if there is no redirection this will be the final url
$redir = $url;
// loop through the headers and check for a Location: str
$j = count($headers);
for ($i = 0; $i < $j; $i++) {
    // if we find the Location header strip it and fill the redir var
    if (strpos($headers[$i], "Location:") !== false) {
        $redir = trim(str_replace("Location:", "", $headers[$i]));
        break;
    }
}
// do whatever you want with the result
echo $redir;

Answer 4

Answered by broox

The chosen answer here is decent, but it's case sensitive and doesn't protect against relative Location: headers (which some sites send) or pages that actually have the phrase "Location:" in their content (which Zillow currently does).

A bit sloppy, but a couple quick edits to make this a bit smarter are:

function getOriginalURL($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $result = curl_exec($ch);
    $httpStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // if it's not a redirection (3XX), move along
    if ($httpStatus < 300 || $httpStatus >= 400)
        return $url;

    // look for a location: header to find the target URL
    if(preg_match('/location: (.*)/i', $result, $r)) {
        $location = trim($r[1]);

        // if the location is a relative URL, attempt to make it absolute
        if (preg_match('/^\/(.*)/', $location)) {
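            $urlParts = parse_url($url);
            // start from an empty base and use !empty() so a missing
            // scheme, host or port does not trigger an undefined-key warning
            $baseURL = '';
            if (!empty($urlParts['scheme']))
                $baseURL = $urlParts['scheme'].'://';

            if (!empty($urlParts['host']))
                $baseURL .= $urlParts['host'];

            if (!empty($urlParts['port']))
                $baseURL .= ':'.$urlParts['port'];

            return $baseURL.$location;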
        }

        return $location;
    }
    return $url;
}

Note that this still only goes 1 redirection deep. To go deeper, you actually need to get the content and follow the redirects.
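
One way to go more than one hop deep is to repeat the header-only request in a loop, reading Location only from the header block so that body text cannot match. The following is a minimal sketch under stated assumptions, not a drop-in replacement: parseLocation(), resolveLocation(), fetchHeaders() and followRedirects() are hypothetical names, the fetcher is injectable so the loop can be exercised without a network, and relative paths are resolved naively (no "../" normalization).

```php
<?php
// Sketch: follow a redirect chain hop by hop. All function names are illustrative.

/** Returns the Location value from a raw header block, or null if absent. */
function parseLocation(string $rawHeaders): ?string
{
    // ^ with the m flag anchors to the start of a header line; i ignores case.
    if (preg_match('/^Location:[ \t]*(.+?)[ \t]*\r?$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

/** Turns a possibly relative Location value into an absolute URL (naive). */
function resolveLocation(string $base, string $location): string
{
    if ($location === '') {
        return $base;
    }
    if (preg_match('~^[a-z][a-z0-9+.-]*://~i', $location)) {
        return $location; // already absolute
    }
    $p = parse_url($base);
    $origin = ($p['scheme'] ?? 'http') . '://' . ($p['host'] ?? '')
            . (isset($p['port']) ? ':' . $p['port'] : '');
    if ($location[0] === '/') {
        return $origin . $location; // root-relative
    }
    // path-relative: replace the last segment of the base path
    $dir = preg_replace('~/[^/]*$~', '/', $p['path'] ?? '/');
    return $origin . $dir . $location;
}

/** Default fetcher: one request, returns [status code, raw header block]. */
function fetchHeaders(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    $status     = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    // keep only the header block so the body can never be searched
    return [$status, substr($response, 0, $headerSize)];
}

/** Follows 3xx responses until a non-redirect is reached or $maxHops is hit. */
function followRedirects(string $url, ?callable $fetch = null, int $maxHops = 10): string
{
    $fetch = $fetch ?? 'fetchHeaders';
    for ($hop = 0; $hop < $maxHops; $hop++) {
        [$status, $headers] = $fetch($url);
        $location = ($status >= 300 && $status < 400) ? parseLocation($headers) : null;
        if ($location === null) {
            return $url; // not a redirect, so this is the final URL
        }
        $url = resolveLocation($url, $location);
    }
    throw new RuntimeException("More than $maxHops redirects");
}
```

Calling followRedirects($url) with no second argument uses the cURL-based fetcher; passing a closure that returns canned [status, headers] pairs makes the loop easy to check offline.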

Answer 5

Answered by Igor Parra

Sometimes you need to get HTTP headers, but at the same time you don't want to return those headers.

This skeleton takes care of cookies and HTTP redirects using recursion. The main idea here is to avoid returning HTTP headers to the client code.

You can build a very strong curl class over it. Add POST functionality, etc.

<?php

class curl {

  static private $cookie_file            = '';
  static private $user_agent             = '';  
  static private $max_redirects          = 10;  
  static private $followlocation_allowed = true;

  function __construct()
  {
    // set a file to store cookies
    self::$cookie_file = 'cookies.txt';

    // set some general User Agent
    self::$user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';

    if ( ! file_exists(self::$cookie_file) || ! is_writable(self::$cookie_file))
    {
      throw new Exception('Cookie file missing or not writable.');
    }

    // check for PHP settings that prevent
    // CURLOPT_FOLLOWLOCATION from working
    if (ini_get('open_basedir') != '' || ini_get('safe_mode') == 'On')
    {
      self::$followlocation_allowed = false;
    }    
  }

  /**
   * Main method for GET requests
   * @param  string $url URI to get
   * @return string      request's body
   */
  static public function get($url)
  {
    $process = curl_init($url);    

    self::_set_basic_options($process);

    // this function is in charge of output request's body
    // so DO NOT include HTTP headers
    curl_setopt($process, CURLOPT_HEADER, 0);

    if (self::$followlocation_allowed)
    {
      // if PHP settings allow it use AUTOMATIC REDIRECTION
      curl_setopt($process, CURLOPT_FOLLOWLOCATION, true);
      curl_setopt($process, CURLOPT_MAXREDIRS, self::$max_redirects); 
    }
    else
    {
      curl_setopt($process, CURLOPT_FOLLOWLOCATION, false);
    }

    $return = curl_exec($process);

    if ($return === false)
    {
      throw new Exception('Curl error: ' . curl_error($process));
    }

    // test for redirection HTTP codes
    $code = curl_getinfo($process, CURLINFO_HTTP_CODE);
    if ($code == 301 || $code == 302)
    {
      curl_close($process);

      try
      {
        // go to extract new Location URI
        $location = self::_parse_redirection_header($url);
      }
      catch (Exception $e)
      {
        throw $e;
      }

      // IMPORTANT return 
      return self::get($location);
    }

    curl_close($process);

    return $return;
  }

  static function _set_basic_options($process)
  {

    curl_setopt($process, CURLOPT_USERAGENT, self::$user_agent);
    curl_setopt($process, CURLOPT_COOKIEFILE, self::$cookie_file);
    curl_setopt($process, CURLOPT_COOKIEJAR, self::$cookie_file);
    curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
    // curl_setopt($process, CURLOPT_VERBOSE, 1);
    // curl_setopt($process, CURLOPT_SSL_VERIFYHOST, false);
    // curl_setopt($process, CURLOPT_SSL_VERIFYPEER, false);
  }

  static function _parse_redirection_header($url)
  {
    $process = curl_init($url);    

    self::_set_basic_options($process);

    // NOW we need to parse HTTP headers
    curl_setopt($process, CURLOPT_HEADER, 1);

    $return = curl_exec($process);

    if ($return === false)
    {
      throw new Exception('Curl error: ' . curl_error($process));
    }

    curl_close($process);

    if ( ! preg_match('#Location: (.*)#', $return, $location))
    {
      throw new Exception('No Location found');
    }

    if (self::$max_redirects-- <= 0)
    {
      throw new Exception('Max redirections reached trying to get: ' . $url);
    }

    return trim($location[1]);
  }

}

Answer 6

Answered by Patrick Valibus

Lots of regex here. Although I really like regexes, this way seems more stable to me:

$resultCurl=curl_exec($curl); //get curl result
//Optional line if you want to store the http status code
$headerHttpCode=curl_getinfo($curl,CURLINFO_HTTP_CODE);

//let's use dom and xpath
$dom = new \DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($resultCurl, LIBXML_HTML_NODEFDTD);
libxml_use_internal_errors(false);
$xpath = new \DOMXPath($dom);
$head=$xpath->query("/html/body/p/a/@href");

$newUrl=$head[0]->nodeValue;

The location part is a link in the HTML sent by Apache, so XPath is well suited to recovering it.

Answer 7

Answered by Abhilash Nayak

You can use:

$redirectURL = curl_getinfo($ch,CURLINFO_REDIRECT_URL);
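
For context, CURLINFO_REDIRECT_URL is only filled in when CURLOPT_FOLLOWLOCATION is disabled and the request has actually been executed; with automatic following enabled it is empty and CURLINFO_EFFECTIVE_URL holds the final URL instead. A minimal sketch, where firstRedirectTarget() is a hypothetical wrapper name:

```php
<?php
// Sketch only: ask cURL itself for the next redirect target instead of parsing headers.
// firstRedirectTarget() is an illustrative name, not a library function.

/** Returns the URL the server redirects $url to, or null if there is no redirect. */
function firstRedirectTarget(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // do not follow; just report
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // keep the body out of the output

    if (curl_exec($ch) === false) {
        return null; // request failed
    }

    // Empty or false when the response was not a redirect.
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return $target ?: null;
}

// Usage (performs a real HTTP request, so it is commented out here):
// $next = firstRedirectTarget('http://www.wowhead.com/search?q=Kobold+Worker');
```

Because cURL resolves the value against the request URL, this approach also avoids the relative-Location handling that header parsing needs.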
