Java: How to make chromedriver undetectable
Original question: http://stackoverflow.com/questions/42169488/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to make chromedriver undetectable
Asked by bryce
This is my first Stack Overflow question so please bear with me.
I have read this SO question, which led me to wonder: is it possible to make chromedriver completely undetectable?
For my own curiosity's sake I have tested the method described and found that it was unsuccessful in creating a completely anonymous browser.
I read through the driver's documentation and found this:
partial interface Navigator { readonly attribute boolean webdriver; };
The webdriver IDL attribute of the Navigator interface must return the value of the webdriver-active flag, which is initially false.
This property allows websites to determine that the user agent is under control by WebDriver, and can be used to help mitigate denial-of-service attacks.
However, I cannot find where these tags are even located through the browser console or in the source code.
I would imagine this is responsible for the detection of chromedriver. However, after combing through the source code, I could not find this interface, which has left me wondering whether this feature is included in the current chromedriver at all. Even if it is not, I know that the current chromedriver is detectable by websites and by services such as Distil.
Answered by freesoul
To make ChromeDriver undetectable to Distil checkpoints (which are described nicely in this Stack Overflow post), you need to ensure that your browser does not contain any variable in its window or document prototypes that reveals that you are using a webdriver, such as the one you mention.
You can use software such as Selenium along with ChromeDriver and Chrome, as long as you take some precautions and make some fixes to the binaries. This method applies only to the headed version. If you wish to use headless mode, you will need to take additional measures to pass the window/rendering tests, described here.
1. Fix Chrome binary, or use an old version
First, let's deal with navigator.webdriver being set to true. This is defined by the W3C protocol here as part of the 'NavigatorAutomationInformation' of browsers, which extends the Navigator interface. How to remove it? That project has a lot of files, third-party code, Blink web runtimes, etc. So, instead of going crazy trying to figure out how it all works, and since Chromium is open source, just be clever and search Google for the commit that introduced it. Here is the link. Pay attention to these files:
- third_party/WebKit/Source/core/frame/Navigator.h, which holds the line of code `bool webdriver() const { return true; }`. This method is supposed to always return true, as you can see.
- third_party/WebKit/Source/core/frame/Navigator.idl, which contains the extensions of Navigator, including our `Navigator implements NavigatorAutomationInformation;`, which is being committed. Interesting, isn't it?
- third_party/WebKit/Source/core/frame/NavigatorAutomationInformation.idl, which contains the extension itself, with a read-only variable, webdriver: `[ NoInterfaceObject, // Always used on target of 'implements' Exposed=(Window), RuntimeEnabled=AutomationControlled ] interface NavigatorAutomationInformation { readonly attribute boolean webdriver; };`
To get rid of this functionality, it should be enough to comment out the line in Navigator.idl which extends Navigator with this functionality, and then compile the source (compiling in Linux here). However, this is a laborious task for almost any computer and can take several hours. If you look at the date of the commit, it was in Oct 2017, so another option is to download any version of Chrome released before that date. To search for mirrors, you can google for inurl:/deb/pool/main/g/google-chrome-stable/.
2. Fix ChromeDriver
Distil checks the regex rule '/\$[a-z]dc_/' against window variables, and ChromeDriver adds one, as mentioned here, which satisfies that condition. As they mention, you have to edit call_function.js in the source code and redefine the variable var key = '$cdc_asdjflasutopfhvcZLmcfl_'; with something else. Alternatively, and probably more easily, you can use a hex editor to update the existing binary.
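To make the rule above concrete, here is a minimal Java sketch. It only illustrates the check and the kind of edit a hex editor would make; it is not Distil's actual code and not a tested patching tool. The class name, the method names, and the replacement key `$xyz_...` are hypothetical, and the sketch assumes the new key must have the same byte length as the old one so that nothing else in the binary shifts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

final class CdcKeySketch {
    // The rule quoted in the answer: "$", one lowercase letter, then "dc_".
    static final Pattern DRIVER_KEY = Pattern.compile("\\$[a-z]dc_");

    /** Returns true if a property name contains a match for the rule. */
    static boolean looksLikeDriverKey(String propertyName) {
        return DRIVER_KEY.matcher(propertyName).find();
    }

    /**
     * Returns a copy of data with every occurrence of oldKey replaced by newKey.
     * Both keys must be non-empty and of equal length (assumption: offsets in
     * the binary must not move).
     */
    static byte[] replaceSameLength(byte[] data, String oldKey, String newKey) {
        byte[] from = oldKey.getBytes(StandardCharsets.US_ASCII);
        byte[] to = newKey.getBytes(StandardCharsets.US_ASCII);
        if (from.length == 0 || from.length != to.length) {
            throw new IllegalArgumentException("keys must be non-empty and of equal length");
        }
        byte[] out = Arrays.copyOf(data, data.length);
        for (int i = 0; i + from.length <= out.length; i++) {
            if (Arrays.equals(out, i, i + from.length, from, 0, from.length)) {
                System.arraycopy(to, 0, out, i, to.length);
                i += from.length - 1; // continue after the replaced key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String original = "$cdc_asdjflasutopfhvcZLmcfl_";
        String renamed = "$xyz_asdjflasutopfhvcZLmcfl_"; // hypothetical replacement key
        // Stand-in for the bytes of the real binary, which would be read from disk.
        byte[] fakeBinary = ("var key = '" + original + "';").getBytes(StandardCharsets.US_ASCII);
        byte[] patched = replaceSameLength(fakeBinary, original, renamed);

        System.out.println(looksLikeDriverKey(original)); // true
        System.out.println(looksLikeDriverKey(renamed));  // false
        System.out.println(new String(patched, StandardCharsets.US_ASCII));
    }
}
```

Any replacement works as long as it no longer matches the pattern. Whether other detection rules exist is outside what this sketch covers.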
If you decided to use an older version of Chrome (I guess you did), you will need to use an appropriate version of ChromeDriver. You can find out which one works with your Chrome version on the ChromeDriver downloads webpage. For example, for Chrome v61 (which fits your needs), you could use ChromeDriver 2.34. Once done, just put the ChromeDriver binary in '/usr/bin/local'.
3. Take other precautions
- Pay attention to your user-agent.
- Don't perform too many repeated requests.
- Use a (random) delay between requests.
- Use the Chrome arguments used here to mimic a normal user profile.
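The random delay from the list above could be sketched in Java roughly as follows. The class name, the method name, and the 200 to 600 ms bounds are assumptions chosen for illustration; the answer does not recommend specific values.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

final class RandomDelaySketch {
    /** Picks a uniformly random pause between min and max, both inclusive. */
    static Duration randomDelay(Duration min, Duration max) {
        if (min.isNegative() || max.compareTo(min) < 0) {
            throw new IllegalArgumentException("need 0 <= min <= max");
        }
        // nextLong's upper bound is exclusive, hence the + 1.
        long millis = ThreadLocalRandom.current()
                .nextLong(min.toMillis(), max.toMillis() + 1);
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) throws InterruptedException {
        Duration min = Duration.ofMillis(200); // hypothetical lower bound
        Duration max = Duration.ofMillis(600); // hypothetical upper bound
        for (int i = 1; i <= 3; i++) {
            // A real script would issue its request here.
            Duration pause = randomDelay(min, max);
            System.out.println("request " + i + ", pausing " + pause.toMillis() + " ms");
            Thread.sleep(pause.toMillis());
        }
    }
}
```

A uniform distribution is the simplest choice here; it is only one possible way to vary the pauses.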
Answered by forresthopkinsa
You can't use Selenium's WebDriver itself to change UserAgent, which sounds like what you're really trying to do here.
However, that doesn't mean it can't be changed.
Enter PhantomJS.
Check out this answer. You can use that to disguise Selenium as a different browser, or pretty much anything else. Of course, if a website is determined to figure you out, there are plenty of clues that Selenium leaves behind (like clicking with perfect precision).