Parsing HTML to get script variable value

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/18156795/

Time: 2020-10-27 10:54:34  Source: igfitidea

Tags: c#, javascript, html-agility-pack

Asked by James Jeffery

I'm trying to find a way to access the data between the <script> tags in a document returned by a server I'm making HTTP requests to. The document has multiple <script> tags, but only one of them has JavaScript code between its tags; the rest are included from files. I want to access the code inside that script tag.

An example of the code is:

<html>
    // Some HTML

    <script>
        var spect = [['temper', 'init', []],
                    ['fw\/lib', 'init', [{staticRoot: '//site.com/js/'}]],
                    ["cap","dm",[{"tackmod":"profile","xMod":"timed"}]]];

    </script>

    // More HTML
</html>
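Since the asker would rather not pull in a library, here is a minimal, library-free sketch (C# 9 or later, top-level statements) that picks out the one inline script, i.e. the first <script> element without a src attribute, using System.Text.RegularExpressions. It is a heuristic under stated assumptions rather than an HTML parser: it assumes the markup is reasonably well formed and that the text '</script>' never appears inside the script itself. FindInlineScript is a hypothetical helper name, and the sample page is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample page: one external script (has src) and one inline script.
var page = @"<html>
  <script src=""/js/lib.js""></script>
  <script>
    var spect = [['temper', 'init', []]];
  </script>
</html>";

Console.WriteLine(FindInlineScript(page)?.Trim());
// prints: var spect = [['temper', 'init', []]];

// Hypothetical helper: returns the body of the first <script> element that has
// no src attribute, or null if there is none. Regex heuristic, not an HTML
// parser: it assumes "</script>" never appears inside the script's own text.
static string? FindInlineScript(string html)
{
    var scriptTag = new Regex(
        @"<script\b(?<attrs>[^>]*)>(?<body>.*?)</script\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    foreach (Match tag in scriptTag.Matches(html))
    {
        // Scripts included from files carry a src attribute; skip them.
        bool isExternal = Regex.IsMatch(
            tag.Groups["attrs"].Value, @"\bsrc\s*=", RegexOptions.IgnoreCase);
        if (!isExternal)
            return tag.Groups["body"].Value;
    }
    return null;
}
```

The lazy (?<body>.*?) together with RegexOptions.Singleline lets the body span several lines while still stopping at the nearest closing tag.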

I'm looking for an ideal way to grab the data assigned to 'spect' and parse it. Sometimes there is a space between 'spect' and the '=' and sometimes there isn't. I have no idea why, but I have no control over the server.

I know this question may have been asked before, but the responses suggest using something like HTMLAgilityPack, and I'd rather avoid using a library for this task, as I only need to get the JavaScript from the DOM once.
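Once the inline script text is in hand, the value of 'spect' can be cut out without a library by a regular expression that accepts optional whitespace on both sides of the '=', which covers the inconsistent spacing described above. A minimal sketch; ExtractSpect is a hypothetical helper name, and it assumes the assignment ends at the first ';' after the '=' (a ';' inside one of the string values would cut the match short).

```csharp
using System;
using System.Text.RegularExpressions;

// Inline script text from the question (quotes doubled for the C# verbatim string).
var sample = @"
    var spect = [['temper', 'init', []],
                ['fw\/lib', 'init', [{staticRoot: '//site.com/js/'}]],
                [""cap"",""dm"",[{""tackmod"":""profile"",""xMod"":""timed""}]]];
";

Console.WriteLine(ExtractSpect(sample));
Console.WriteLine(ExtractSpect("var spect=[1, 2];")); // no spaces around '='

// Hypothetical helper: returns the raw JavaScript literal assigned to spect,
// or null if no assignment is found.
//   \bspect              the identifier itself (not e.g. "myspect")
//   \s*=(?!=)\s*         '=' with optional whitespace on either side, but not '=='
//   (?<value>.*?)\s*;    everything up to the first ';' (assumed to end the statement)
static string? ExtractSpect(string scriptText)
{
    Match m = Regex.Match(
        scriptText,
        @"\bspect\s*=(?!=)\s*(?<value>.*?)\s*;",
        RegexOptions.Singleline);
    return m.Success ? m.Groups["value"].Value : null;
}
```

What comes back is JavaScript source, not JSON: the sample uses single-quoted strings and an unquoted key (staticRoot), both of which a strict JSON parser rejects. Turning it into data therefore needs either a JavaScript engine, as in the answer below, or extra text rewriting that is easy to get wrong.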

Answer by Prix

Here is a very simple example of how easy this can be with HtmlAgilityPack, using the Jurassic library to evaluate the result:

using System;
using System.Linq; // for .Where() and .First()

var html = @"<html>
             // Some HTML
             <script>
               var spect = [['temper', 'init', []],
               ['fw\/lib', 'init', [{staticRoot: '//site.com/js/'}]],
               [""cap"",""dm"",[{""tackmod"":""profile"",""xMod"":""timed""}]]];
             </script>
             // More HTML
             </html>";

// Grab the content of the first script element
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var script = doc.DocumentNode.Descendants()
                             .Where(n => n.Name == "script")
                             .First().InnerText;

// Wrap the script in a function that returns spect, evaluate it with Jurassic,
// and serialize the resulting JavaScript value to a JSON string
var engine = new Jurassic.ScriptEngine();
var result = engine.Evaluate("(function() { " + script + " return spect; })()");
var json = Jurassic.Library.JSONObject.Stringify(engine, result);

Console.WriteLine(json);
Console.ReadKey();

Output:

[["temper","init",[]],["fw/lib","init",[{"staticRoot":"//site.com/js/"}]],["cap","dm",[{"tackmod":"profile","xMod":"timed"}]]]

Note: I am not accounting for errors or anything else; this merely serves as an example of how to grab the script and evaluate it for the value of spect.

There are a few other libraries for executing/evaluating JavaScript as well.
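Once the value has been stringified to JSON, as in the output above, it can be read with an ordinary JSON parser. A minimal sketch using System.Text.Json (part of .NET Core 3.0 and later) on the JSON string copied from that output; reading each entry as a [name, action, args] triple is an assumption drawn only from the shape of this sample.

```csharp
using System;
using System.Text.Json;

// JSON string as printed by the answer's code.
var json = "[[\"temper\",\"init\",[]],[\"fw/lib\",\"init\",[{\"staticRoot\":\"//site.com/js/\"}]],[\"cap\",\"dm\",[{\"tackmod\":\"profile\",\"xMod\":\"timed\"}]]]";

using JsonDocument doc = JsonDocument.Parse(json);

// Assumption: each entry looks like [name, action, args], judging by this sample only.
foreach (JsonElement entry in doc.RootElement.EnumerateArray())
{
    string? name = entry[0].GetString();
    string? action = entry[1].GetString();
    int argCount = entry[2].GetArrayLength();
    Console.WriteLine($"{name} / {action} / {argCount} arg(s)");
}

// Reach into the object nested in the second entry's argument list.
string? staticRoot = doc.RootElement[1][2][0].GetProperty("staticRoot").GetString();
Console.WriteLine(staticRoot);
```

For this input it prints temper / init / 0 arg(s), fw/lib / init / 1 arg(s) and cap / dm / 1 arg(s), followed by //site.com/js/.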
