Original question: http://stackoverflow.com/questions/27231113/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
999 Error Code on HEAD request to LinkedIn
Asked by charltoons
We're using a curl HEAD request in a PHP application to verify the validity of generic links. We check the status code just to make sure that the link the user has entered is valid. Links to all websites have succeeded, except LinkedIn.
While it seems to work locally (Mac), when we attempt the request from any of our Ubuntu servers, LinkedIn returns a 999 status code. Not an API request, just a simple curl like we do for every other link. We've tried on a few different machines and tried altering the user agent, but no dice. How do I modify our curl so that working links return a 200?
A sample HEAD request:
curl -I --url https://www.linkedin.com/company/linkedin
Sample Response on Ubuntu machine:
HTTP/1.1 999 Request denied
Date: Tue, 18 Nov 2014 23:20:48 GMT
Server: ATS
X-Li-Pop: prod-lva1
Content-Length: 956
Content-Type: text/html
To respond to @alexandru-guzinschi a little better: we've tried masking the user agent. To sum up our trials:
- Mac machine + Mac UA => works
- Mac machine + Windows UA => works
- Ubuntu remote machine + (no UA change) => fails
- Ubuntu remote machine + Mac UA => fails
- Ubuntu remote machine + Windows UA => fails
- Ubuntu local virtual machine (on Mac) + (no UA change) => fails
- Ubuntu local virtual machine (on Mac) + Windows UA => works
- Ubuntu local virtual machine (on Mac) + Mac UA => works
So now I'm thinking they block any curl requests that don't provide an alternate UA and also block hosting providers?
Is there any other way I can check, from an Ubuntu machine using PHP, whether a link to LinkedIn is valid or whether it will lead to their 404 page?
Answered by Alexandru Guzinschi
It looks like they filter requests based on the user-agent:
$ curl -I --url https://www.linkedin.com/company/linkedin | grep HTTP
HTTP/1.1 999 Request denied
$ curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I --url https://www.linkedin.com/company/linkedin | grep HTTP
HTTP/1.1 200 OK
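For use in a script, the same check can be reduced to just the numeric status code. This is a minimal sketch, not part of the original answer: the helper name `head_status` and the variable `BROWSER_UA` are hypothetical, and the user agent string is simply the one from the command above.

```shell
# Hypothetical helper: print only the HTTP status code of a HEAD request
# that is sent with a browser-like User-Agent.
BROWSER_UA="Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"

head_status() {
  # -s: no progress output, -I: send a HEAD request,
  # -o /dev/null: discard the response headers,
  # -w '%{http_code}': print the numeric status code,
  # -A: set the User-Agent header.
  curl -s -I -o /dev/null -w '%{http_code}' -A "$BROWSER_UA" --url "$1"
}

# Usage (needs network access):
#   head_status "https://www.linkedin.com/company/linkedin"
```

As the other answers note, a browser-like user agent may not be enough when the request comes from a hosting provider's IP address.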
Answered by Andrey Izman
I found a workaround: it is important to set the accept-encoding header:
curl --url "https://www.linkedin.com/in/izman" \
--header "user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36" \
--header "accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
--header "accept-encoding:gzip, deflate, sdch, br" \
| gunzip
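A possible variation, sketched here as an assumption rather than as the answerer's method: curl's `--compressed` option requests a compressed response in an encoding that curl itself supports and decompresses it automatically, so no pipe to gunzip is needed. The helper name `fetch_page` and the variable `BROWSER_UA` are hypothetical, and this snippet cannot guarantee that LinkedIn will accept the request from a given IP address.

```shell
# Hypothetical helper: fetch a page with browser-like headers and let curl
# handle the Accept-Encoding header and decompression via --compressed.
BROWSER_UA="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36"

fetch_page() {
  # -s: no progress output, -A: set the User-Agent header,
  # -H: add an extra request header, --compressed: request and decode
  # a compressed response.
  curl -s --compressed \
    -A "$BROWSER_UA" \
    -H "accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
    --url "$1"
}

# Usage (needs network access):
#   fetch_page "https://www.linkedin.com/in/izman"
```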
Answered by olefrank
It seems that LinkedIn filters on both user agent AND IP address. I tried this both at home and from a Digital Ocean node:
curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I --url https://www.linkedin.com/company/linkedin
From home I got a 200 OK, from DO I got 999 Denied...
So you need a proxy service like HideMyAss or another one (I haven't tested it, so I can't say whether it works). Here is a good comparison of proxy services.
Or you could set up a proxy on your home network, for example by using a Raspberry Pi to proxy your requests. Here is a guide on that.
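Since a 999 appears to depend on who is asking (user agent or IP address) rather than on the link itself, a link checker may want to treat it as "could not determine" instead of "broken". A minimal sketch under that assumption; the function name `classify_status` and the labels it prints are hypothetical:

```shell
# Hypothetical helper: map an HTTP status code to a link-check verdict.
classify_status() {
  case "$1" in
    2??)     echo "valid" ;;    # request succeeded
    404|410) echo "invalid" ;;  # page not found or gone
    999)     echo "blocked" ;;  # LinkedIn denied the request; link state unknown
    *)       echo "unknown" ;;  # anything else needs a closer look
  esac
}

# Usage:
#   classify_status 999
```

With this split, links that come back as "blocked" can be retried later through a proxy or from a different network instead of being rejected outright.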
Answered by dmarlow
A proxy would work, but I think there's another way around it. From AWS and other clouds, I see that the request is blocked by IP. I can issue the request from my own machine and it works just fine.
I did notice that the response to a request from the cloud service contains some JS that the browser has to execute to take you to a login page. Once there, you can log in and access the page. The login page is shown only to those accessing via a blocked IP.
If you use a headless client that executes JS, or maybe go straight to the subsequent link and provide the credentials of a LinkedIn user, you may be able to bypass it.
Answered by Muhammad Numan
LinkedIn does not allow direct access. They have blacklisted Heroku and AWS IP addresses, and the only way to access the data is through their APIs. If you want to scrape LinkedIn, you can do so from a local machine or a headless browser, or you can use a proxy, because LinkedIn has blocked many server IPs.