Source: http://stackoverflow.com/questions/16872082/ (StackOverflow)
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
How can I send cookies using PHP curl in addition to CURLOPT_COOKIEFILE?
Asked by BastiaanWW
I am scraping some content from a website after a form submission. The problem is that the script fails intermittently, roughly 2 times out of 5. I am using PHP curl with COOKIEFILE and COOKIEJAR to handle the cookies. However, when I compared the headers sent by my browser (visiting the target website and capturing them with Live HTTP Headers) with the headers sent by PHP, I saw many differences.
My browser sent a lot more cookie variables than PHP curl. I think this difference might be because JavaScript is responsible for setting most of the cookies, but I'm not sure about this.
I am using the code below to do the scraping, followed by the headers sent by PHP curl and by my browser:
$ckfile = tempnam ("/tmp", 'cookiename');
$url = 'https://www.domain.com/firststep';
$poststring = 'variable1=4&variable2=5';
$ch = curl_init ($url);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $poststring);
$output = curl_exec ($ch);
curl_close($ch);
$url = 'https://www.domain.com/nextstep';
$poststring = 'variableB1=4&variableB2=5';
$ch = curl_init ($url);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $poststring);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
$output = curl_exec ($ch);
$headers = curl_getinfo($ch, CURLINFO_HEADER_OUT);
curl_close($ch);
print_r($headers);
// Gives:
POST /d-cobs-web/doffers.html;jsessionid=7BC2A5277A4EB07D9A7237A707BE1366 HTTP/1.1
User-Agent: Mozilla
Host: domain.subdomain.nl
Accept: */*
Cookie: JSESSIONID=7BC2A5277A4EB07D9A7237A707BE1366; www-20480=MIFBNLFDFAAA
Content-Length: 187
Content-Type: application/x-www-form-urlencoded
// Where live http headers gives:
POST /d-cobs-web/doffers.html;jsessionid=7BC2A5277A4EB07D9A7237A707BE1366 HTTP/1.1
Host: domain.subdomain.nl
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: nl,en-us;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: https://domain.subdomain.nl/dd/doffers.html?returnUrl=https%3A%2F%2Fttcc.subdomain.nl%2Fdd%2Fpreferences.html%3FValueChanged%3Dfalse&BEGBA=&departureDate=13-06-2013&extChangeTime=&pax2=0&bp=&pax1=1&pax4=0&bk=&pax3=0&shopId=&xtpage=&partner=NSINT&bc=&xt_pc=&ov=&departureTime=&comfortClass=2&destination=DEBHF&thalysTicketless=&beneUser=&debugDOffer=&logonId=&valueChanged=&iDomesticOrigin=&rp=&returnTime=&locale=nl_NL&vu=&thePassWeekend=false&returnDate=&xtsite=&pax=A&lc2=&lc1=&lc4=&lc3=&lc6=&lc5=&BECRA=&passType2=&custId=&lc9=&iDomesticDestination=&passType1=A&lc7=&lc8=&origin=NLASC&toporef=&pid=&passType4=&returnTimeType=1&passType3=&departureTimeType=1&socusId=&idr3=&xtn2=&loyaltyCard=&idr2=&idr1=&thePassBusiness=false&cid=14812
Content-Length: 219
Cookie: subdomainPARTNER=NSINT; JSESSIONID=CB3FEB3AC72AD61A80BFED91D3FD96CA; www-20480=MHFBNLFDFAAA; campaignPos=5; www-47873=MGFBNLFDFAAA; __utma=1.993399624.1370027094.1370040145.1370082133.5; __utmc=1; __utmz=1.1370027094.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); BCSessionID=5dc05787-c2c8-43e1-9abe-93989970b087; BCPermissionLevel=PERSONAL; __utmb=1.1.10.1370082133
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
AJAXREQUEST=_viewRoot&doffersForm=doffersForm&doffersForm%3AvalueChanged=&doffersForm%3ArequestValid=true&javax.faces.ViewState=j_id3&doffersForm%3Aj_id937=doffersForm%3Aj_id937&valueChanged=false&AJAX%3AEVENTS_COUNT=1&
I would like to use:
$headers = array();
$headers[] = 'Cookie: ' . $cookie;
and:
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
where:
$cookie = 'subdomainPARTNER=NSINT; JSESSIONID=CB3FEB3AC72AD61A80BFED91D3FD96CA; www-20480=MHFBNLFDFAAA; campaignPos=5; www-47873=MGFBNLFDFAAA; __utma=1.993399624.1370027094.1370040145.1370082133.5; __utmc=1; __utmz=1.1370027094.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); BCSessionID=5dc05787-c2c8-43e1-9abe-93989970b087; BCPermissionLevel=PERSONAL; __utmb=1.1.10.1370082133';
Some of the parameters in the cookie above I might be able to scrape from the content of the website, but not all of them. Some I might be able to read from $ckfile, but I don't know how to do that. In particular, I cannot get the utma, utmc, utmz, utmcsr, utmccn and utmcmd values from anywhere; I think these are generated by JavaScript.
Question 1: Am I doing something wrong with the cookie handling in the current code, given that PHP curl sends very few cookie variables and the browser sends many more? Further: can the other differences between the headers sent by the browser and by PHP curl prevent the right content from being returned?
Question 2: Are the cookie variables missing because JavaScript sets those cookies?
Question 3: What is the best way to handle the cookies to make sure that all required cookies are sent to the remote server?
Your help is very welcome!
Answered by Sabuj Hassan
If the cookie is generated by a script, you can send that cookie manually alongside the cookies from the file (using the cookie-file option). For example:
# sending manually set cookie
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: test=cookie"));
# sending cookies from file
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
In this case curl will send your defined cookie along with the cookies from the file.
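As a rough sketch of the same idea, the manual cookies can be kept in an array and joined into the `name=value; name=value` form before being handed to curl. The helper names `build_cookie_string()` and `apply_cookies()` are made up for illustration, and the sketch uses `CURLOPT_COOKIE` (the option shown in a later answer) rather than a raw `Cookie:` header; the cookie values are the ones from the question.

```php
<?php
// Hypothetical helper: join name => value pairs into the "a=1; b=2" form
// that CURLOPT_COOKIE (or a manual "Cookie:" header) expects.
function build_cookie_string(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: apply both cookie sources to an existing curl handle.
function apply_cookies($ch, string $cookieFile, array $extraCookies): void
{
    // Cookies persisted in the jar file.
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    // Extra cookies supplied by hand, sent in addition to the jar's cookies.
    curl_setopt($ch, CURLOPT_COOKIE, build_cookie_string($extraCookies));
}

$extra = ['campaignPos' => '5', 'subdomainPARTNER' => 'NSINT'];
echo build_cookie_string($extra), "\n"; // campaignPos=5; subdomainPARTNER=NSINT
```

With a handle from `curl_init()`, calling `apply_cookies($ch, $ckfile, $extra)` before `curl_exec($ch)` should be enough to send both sets.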
If the cookie is generated through JavaScript, you have to trace how it is generated, and then you can send it using the method above (through the HTTP header).
The utma, utmc and utmz cookies appear when the request is sent from Mozilla (the browser). You shouldn't need to worry about them.
Finally, your approach is fine. Just make sure you use an absolute path for the cookie file (e.g. /var/dir/cookie.txt) rather than a relative one.
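Once the jar sits at a known absolute path, it can also be inspected from PHP, which is one way to read cookies back out of `$ckfile`. The sketch below assumes the jar is in the Netscape cookie file format that libcurl writes (seven tab-separated fields per line); `read_cookie_jar()` is a hypothetical helper, not part of the curl extension.

```php
<?php
// Sketch: parse a Netscape-format cookie jar (the format libcurl writes for
// CURLOPT_COOKIEJAR) into a name => value array.
function read_cookie_jar(string $path): array
{
    $cookies = [];
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $cookies; // file missing or unreadable
    }
    foreach ($lines as $line) {
        // HttpOnly cookies are stored with a "#HttpOnly_" prefix on the domain.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line[0] === '#') {
            continue; // ordinary comment line
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}

// Example usage with the $ckfile from the question:
// $cookies = read_cookie_jar($ckfile);
// echo $cookies['JSESSIONID'] ?? 'not set';
```

Note that libcurl writes the jar when the handle is closed, so the file should be read after `curl_close($ch)`.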
Always enable verbose mode when working with curl. It helps a lot when tracing requests and will save you a lot of time.
curl_setopt($ch, CURLOPT_VERBOSE, true);
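By default the verbose trace goes to STDERR, which is easy to miss when the script runs under a web server. A possible sketch, assuming you would rather capture the trace in a variable, is to point `CURLOPT_STDERR` at an in-memory stream. `enable_verbose_capture()` and `read_verbose_log()` are hypothetical helper names.

```php
<?php
// Sketch: send libcurl's verbose trace to an in-memory stream instead of
// STDERR so it can be logged or inspected afterwards.
function enable_verbose_capture($ch)
{
    $log = fopen('php://temp', 'w+');
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_STDERR, $log);
    return $log; // keep this stream open until the transfer has finished
}

// Read back everything written to the log stream so far.
function read_verbose_log($log): string
{
    rewind($log);
    return stream_get_contents($log);
}

// Example usage (requires a real request, so it is left commented out):
// $ch = curl_init('https://www.domain.com/firststep');
// $log = enable_verbose_capture($ch);
// curl_exec($ch);
// echo read_verbose_log($log);
```

As far as I know, PHP's `CURLINFO_HEADER_OUT` option can suppress the verbose output, so it may be necessary to drop that option from the question's code while tracing.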
Answered by Dasitha Abeysinghe
Try the code below:
$cookieFile = "cookies.txt";
if(!file_exists($cookieFile)) {
$fh = fopen($cookieFile, "w");
fwrite($fh, "");
fclose($fh);
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $apiCall);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $jsonDataEncoded);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // Cookie aware
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile); // Cookie aware
curl_setopt($ch, CURLOPT_VERBOSE, true);
// Execute once and keep the result; calling curl_exec() twice would send the request twice.
$response = curl_exec($ch);
if ($response === false) {
die('Error: "' . curl_error($ch) . '" - Code: ' . curl_errno($ch));
}
curl_close($ch);
$result = json_decode($response, true);
echo '<pre>';
var_dump($result);
echo'</pre>';
I hope this will help you.
Best regards, Dasitha.
Answered by Serhii Andriichuk
Here is a list of examples for sending cookies - https://github.com/andriichuk/php-curl-cookbook#cookies
$curlHandler = curl_init();
curl_setopt_array($curlHandler, [
CURLOPT_URL => 'https://httpbin.org/cookies',
CURLOPT_RETURNTRANSFER => true,
CURLOPT_COOKIEFILE => $cookieFile,
CURLOPT_COOKIE => 'foo=bar;baz=foo',
/**
* Or set header
* CURLOPT_HTTPHEADER => [
'Cookie: foo=bar;baz=foo',
]
*/
]);
$response = curl_exec($curlHandler);
curl_close($curlHandler);
echo $response;
Answered by Atanas Atanasov
I think the only cookie you need is JSESSIONID=xxx..
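To check which cookies the server itself sets (as opposed to those added by JavaScript), one option is to look at the `Set-Cookie` lines in the raw response headers, for example as returned when `CURLOPT_HEADER` is enabled. This is only a sketch: `extract_cookie()` is a hypothetical helper, and the header text is made up from the values shown in the question.

```php
<?php
// Sketch: pull one cookie's value out of raw response headers.
function extract_cookie(string $rawHeaders, string $name): ?string
{
    // Match "Set-Cookie: <name>=<value>" at the start of a header line;
    // the value ends at the first ";" or line break.
    $pattern = '/^Set-Cookie:\s*' . preg_quote($name, '/') . '=([^;\r\n]*)/mi';
    if (preg_match($pattern, $rawHeaders, $m)) {
        return $m[1];
    }
    return null; // the server did not set this cookie
}

$headers = "HTTP/1.1 200 OK\r\n"
    . "Set-Cookie: JSESSIONID=7BC2A5277A4EB07D9A7237A707BE1366; Path=/; Secure\r\n"
    . "Set-Cookie: www-20480=MIFBNLFDFAAA; path=/\r\n\r\n";
echo extract_cookie($headers, 'JSESSIONID'), "\n"; // 7BC2A5277A4EB07D9A7237A707BE1366
```

Cookies that never show up in a `Set-Cookie` header were most likely created client-side.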
Also, NEVER share your cookies, because someone may access your personal data that way, especially when they are session cookies. These cookies will stop working once you log out of the site.