Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/10161413/
Headless browser for C# (.NET)?
Asked by Bo Milanovich
I am (was) a Python developer who is building a GUI web scraping application. Recently I've decided to migrate to the .NET Framework and write the same application in C# (this decision wasn't mine).
In Python, I've used the Mechanize library. However, I can't seem to find anything similar in .NET. What I need is a browser that runs in headless mode and can fill out forms, submit them, etc. A JavaScript parser is not a must, but it would be quite useful.
Accepted answer by Yahia
There are some options:
- WebKit.Net (free)
- Awesomium: it is based on Chrome/WebKit and works like a charm. There is a free license available but also a commercial one, and if need be you can buy the source code :-)
- HTML Agility Pack (free): this helps with extracting information from HTML etc. and might be useful in your case (possibly in combination with HttpWebRequest)
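For the HTML Agility Pack route, a minimal sketch of the Mechanize-style "read the form, fill it in, post it back" loop might look like the following. It is only an illustration under some assumptions: `FormScraper`, `ReadFormFields` and `SubmitAsync` are hypothetical names, the page and action URLs are supplied by the caller, and `HttpClient` is swapped in for the `HttpWebRequest` mentioned above because it is shorter to write. No JavaScript is executed with this approach.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class FormScraper // hypothetical helper, not part of any library
{
    // Collects name/value pairs from every named <input> in the document.
    // (A simplification: it does not separate multiple forms or handle
    // <select>/<textarea> elements.)
    public static Dictionary<string, string> ReadFormFields(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var fields = new Dictionary<string, string>();
        // SelectNodes returns null, not an empty list, when nothing matches.
        var inputs = doc.DocumentNode.SelectNodes("//input[@name]");
        if (inputs == null) return fields;

        foreach (var input in inputs)
        {
            string name = input.GetAttributeValue("name", "");
            fields[name] = input.GetAttributeValue("value", "");
        }
        return fields;
    }

    // Downloads the page, keeps its existing (e.g. hidden) fields,
    // applies the caller's values on top, and posts the result.
    public static async Task<string> SubmitAsync(
        HttpClient client, string pageUrl, string actionUrl,
        IDictionary<string, string> values)
    {
        string html = await client.GetStringAsync(pageUrl);
        var fields = ReadFormFields(html);
        foreach (var pair in values) fields[pair.Key] = pair.Value;

        using (var content = new FormUrlEncodedContent(fields))
        using (var response = await client.PostAsync(actionUrl, content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Reusing one `HttpClient` for both requests matters here: its default handler keeps cookies between calls, so a session cookie set by the first page is sent along with the form post.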
Answer by Steven de Salas
You may be after TrifleJS (currently in beta), or something similar using the .NET WebBrowser class, which communicates with IE via a windowless ActiveX/COM API.
You'll essentially be running a fully fledged browser (not an HTTP request wrapper) using Internet Explorer's Trident engine. If you are not interested in the JavaScript API (a port of PhantomJS), you may still be able to use some of the C# codebase to get around key concepts (custom headers, cookies, script execution, screenshot rendering etc).
Note that this can also emulate different versions of IE depending on what you have installed.
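To give an idea of what "something similar using the WebBrowser class" could look like, here is a rough sketch that hosts the control on a background thread without ever showing a window. This is not TrifleJS code. `HiddenBrowser` and `LoadRenderedHtml` are hypothetical names, and the sketch assumes Windows with Windows Forms available. The control needs an STA thread and a message loop, which is why the thread setup is there.

```csharp
using System;
using System.Threading;
using System.Windows.Forms; // Windows only

public static class HiddenBrowser // hypothetical helper for illustration
{
    // Loads a URL in an off-screen WebBrowser control and returns the
    // HTML of the body after the page (and its scripts) finished loading.
    public static string LoadRenderedHtml(string url)
    {
        string html = null;

        var worker = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            {
                browser.DocumentCompleted += (sender, e) =>
                {
                    // The event also fires for frames; wait for the whole page.
                    if (browser.ReadyState != WebBrowserReadyState.Complete) return;

                    html = browser.Document?.Body?.OuterHtml;
                    Application.ExitThread(); // stop the message loop below
                };

                browser.Navigate(url);
                Application.Run(); // message loop without any visible form
            }
        });

        // The underlying ActiveX control requires a single-threaded apartment.
        worker.SetApartmentState(ApartmentState.STA);
        worker.Start();
        worker.Join();
        return html;
    }
}
```

Note that this sketch has no timeout, so a page that never finishes loading would block the caller; a real application would want to add one.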
Answer by Knyaz
More solutions:
- PhantomJS: a full-featured headless web browser. Often used together with Selenium, which lets you drive the browser from a .NET application.
- Optimus (NuGet package): a lightweight headless web browser. It's in beta, but it is sufficient for some cases.
I used to use both for web testing, but they are also suitable for web scraping.
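As a rough illustration of the PhantomJS plus Selenium combination, filling in and submitting a form might look like this. It assumes the Selenium.WebDriver NuGet package in a 3.x or earlier version (the `PhantomJSDriver` class is not included in newer releases) and a PhantomJS executable that the driver can find. `PhantomExample`, `SubmitForm` and the parameter names are made up for the example.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.PhantomJS; // assumption: Selenium.WebDriver 3.x or earlier

public static class PhantomExample // hypothetical helper for illustration
{
    // Opens the page headlessly, types into the named field, submits the
    // enclosing form and returns the HTML of the resulting page.
    public static string SubmitForm(string url, string fieldName, string text)
    {
        // Disposing the driver also shuts down the PhantomJS process.
        using (IWebDriver driver = new PhantomJSDriver())
        {
            driver.Navigate().GoToUrl(url);

            IWebElement field = driver.FindElement(By.Name(fieldName));
            field.SendKeys(text);
            field.Submit(); // submits the form this element belongs to

            return driver.PageSource;
        }
    }
}
```

Because the page runs in a real browser engine, JavaScript on the page is executed, which the plain HTTP approach cannot offer.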


