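Load a DOM and Execute javascript, server side, with .Net
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/10886161/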
Question by Brook
I would like to load a DOM using a document (in string form) or a URL, and then execute JavaScript functions (including jQuery selectors) against it. This would be entirely server side, in process, with no client/browser.
Basically, I need to load the DOM and then use jQuery selectors and functions such as text() and val() to extract strings from it. I don't really need to manipulate the DOM.
I have looked at .Net JavaScript engines such as Jurassic and Jint, but neither supports loading a DOM, so neither can do what I need.
I would be willing to consider non-.Net solutions (node.js, ruby, etc.) if they exist, but would really prefer .Net.
Edit: The answer below is a good one, but I'm currently trying a different route: I'm attempting to port envjs to Jurassic. If I can get that working, I think it will do what I want. Stay tuned....
Answer by Jamie Treworgy
The answer depends on what you are trying to do. If your goal is basically a complete web browser simulation, or a "headless browser," there are a number of solutions, but none of them (that I know of) exist cleanly in .NET. To mimic a browser, you need a JavaScript engine and a DOM. You've identified a few engines; I've found Jurassic to be both the most robust and the fastest. The Google Chrome V8 engine is also very popular; the Noesis Javascript.NET project provides a .NET wrapper for it. It's not quite pure .NET, since you have a non-.NET dependency, but it integrates cleanly and is not much trouble to use.
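To make the engine-versus-DOM distinction concrete, here is a minimal C# sketch. It assumes the Jurassic NuGet package is referenced and uses its `ScriptEngine` class (`Execute`, `CallGlobalFunction<T>`, `Evaluate<T>`); the `JurassicDemo` class and its method names are hypothetical. Plain JavaScript runs in-process, but the engine exposes no `document` object, so jQuery would have nothing to select against.

```csharp
// Sketch: assumes the Jurassic NuGet package (namespace Jurassic) is referenced.
// JurassicDemo and its methods are hypothetical names for illustration.
using System;
using Jurassic;

public static class JurassicDemo
{
    // Defines a plain JavaScript function in the engine and calls it from .NET.
    public static int AddInJavaScript(int a, int b)
    {
        var engine = new ScriptEngine();
        engine.Execute("function add(a, b) { return a + b; }");
        return engine.CallGlobalFunction<int>("add", a, b);
    }

    // Asks the engine what `document` is. A bare ECMAScript engine defines
    // no browser globals, so there is no DOM here to query.
    public static string TypeOfDocument()
    {
        var engine = new ScriptEngine();
        return engine.Evaluate<string>("typeof document");
    }

    public static void Main()
    {
        Console.WriteLine(AddInJavaScript(2, 3));
        Console.WriteLine(TypeOfDocument());
    }
}
```

Getting jQuery selectors to work would mean first loading a DOM implementation into the engine, which is exactly the gap the rest of this answer addresses.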
But as you've noted, you still need a DOM. In pure C# there is XBrowser, but it looks a bit stale. There are also JavaScript-based representations of the entire browser DOM, like jsdom. You could probably run jsdom in Jurassic, giving you a DOM simulation without a browser, all in C# (though likely very slowly!). It would definitely run just fine in V8. If you step outside the .NET realm, there are other, better-supported solutions. This question discusses HtmlUnit. Then there's Selenium, for automating actual web browsers.
Also, bear in mind that a lot of the work done around these tools is for testing. While that doesn't mean you couldn't use them for something else, they may not perform or integrate well enough for stable use in inline production code. If you are basically trying to do real-time HTML manipulation, then a solution mixing a lot of technologies that aren't widely used except for testing might be a poor choice.
If your need is actually HTML manipulation, and it doesn't really need to use JavaScript but you are thinking more about the wealth of such tools available in JS, then I would look at C# tools designed for this purpose. For example, HTML Agility Pack, or my own project CsQuery, which is a C# jQuery port.
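For the original goal (load HTML, query it with jQuery selectors, read strings with text() and val()), a minimal C# sketch with CsQuery might look like the following. It assumes the CsQuery NuGet package is referenced; the sample HTML, the `CsQueryDemo` class, and the `ExtractTitle`/`ExtractRegion` helpers are hypothetical. No JavaScript engine or browser is involved: CsQuery parses the HTML into its own in-memory DOM.

```csharp
// Sketch: assumes the CsQuery NuGet package (namespace CsQuery) is referenced.
// CsQueryDemo, SampleHtml and the Extract* helpers are hypothetical names.
using System;
using CsQuery;

public static class CsQueryDemo
{
    // Hypothetical sample document; in practice the string could come from a file or HTTP response.
    public const string SampleHtml =
        "<html><body>" +
        "<h1 class='title'>Quarterly report</h1>" +
        "<form><input id='region' type='text' value='EMEA'></form>" +
        "</body></html>";

    // Parses the HTML into a DOM and reads the heading text via a jQuery-style selector.
    public static string ExtractTitle(string html)
    {
        CQ dom = CQ.Create(html);
        return dom["h1.title"].Text();
    }

    // Reads the value of the input element, analogous to jQuery's val().
    public static string ExtractRegion(string html)
    {
        CQ dom = CQ.Create(html);
        return dom["#region"].Val();
    }

    public static void Main()
    {
        Console.WriteLine(ExtractTitle(SampleHtml));
        Console.WriteLine(ExtractRegion(SampleHtml));
    }
}
```

For the URL case in the question, CsQuery also offers a URL-based factory (`CQ.CreateFromUrl`); check the documentation for the version you install.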
If you are basically trying to take some code that was written for the client but run it on a server (e.g. for sophisticated/accelerated web scraping), I'd search around using those terms. For example, this question discusses this, with answers including PhantomJS, a headless WebKit browser stack, as well as some of the testing tools I have already mentioned. For web scraping, I would imagine you can live without it all being in .NET, and that may be the only reasonable answer anyway.