javascript - How to extract the body content with a regular expression

Question

Asked by faressoft

I have this HTML in a string variable.

<html>

    <head>
        .
        .
        anything
        .
        .
    </head>

    <body anything="">
        content
    </body>

</html>

or

<html>

    <head>
        .
        .
        anything
        .
        .
    </head>

    <body>
        content
    </body>

</html>

The result should be:

content

Answer 1

Answer by Jeffrey Blake

Note that the string-based answers supplied above should work in most cases. The one major advantage offered by a regex solution is that you can more easily provide for a case-insensitive match on the open/close body tags. If that is not a concern to you, then there's no major reason to use regex here.
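
To make the case-insensitivity point concrete, here is a small sketch (the variable names and the upper-case input are hypothetical, not from the question). It compares a plain indexOf search, the same search on a lower-cased copy, and a regex with the i flag.

```javascript
// Hypothetical input: the same document shape, but with upper-case tags.
const upper = "<HTML><BODY class=x>content</BODY></HTML>";

// indexOf is case-sensitive, so searching for "<body" misses "<BODY".
const foundByIndexOf = upper.indexOf("<body") !== -1; // false

// A string-based search can be made case-insensitive by searching a lower-cased copy...
const foundInLowerCopy = upper.toLowerCase().indexOf("<body") !== -1; // true

// ...whereas a regex only needs the i flag and works on the original string,
// for both the opening and the closing tag.
const foundByRegex = /<body[^>]*>[\s\S]*<\/body>/i.test(upper); // true
```

Both routes work; the regex just keeps the case-insensitivity in one place (the flag) instead of in a second, lower-cased string that you then have to map back to the original when slicing out the content.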

And for the people who see HTML and regex together and throw a fit: since you are not actually trying to parse HTML with this, it is something you can do with regular expressions. If, for some reason, </body> also appeared elsewhere in the string (for example inside a script or an HTML comment), the match could come out wrong, but aside from that, your scenario is specific enough that a regular expression can do what you want:

const strVal = yourStringValue; //obviously, this line can be omitted - just assign your string to the name strVal or put your string var in the pattern.exec call below 
const pattern = /<body[^>]*>((.|[\n\r])*)<\/body>/im;
const array_matches = pattern.exec(strVal);

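After the above executes, array_matches[1] will hold whatever came between the opening <body ...> tag and the closing </body> tag. If the string contains no match, exec returns null, so check array_matches before reading index 1.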
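
A minimal sketch of the same idea wrapped in a helper, run against the two inputs from the question. Assumptions: the name extractBody is hypothetical; it uses [\s\S] in place of (.|[\n\r]), which also matches the line terminators U+2028/U+2029 that "." skips; and it trims the result, since the question's expected output is the bare text without surrounding whitespace.

```javascript
// Hypothetical helper: returns the markup between <body ...> and the last </body>,
// trimmed, or null when the pattern finds no body element.
function extractBody(html) {
  // [\s\S] matches every character, including line breaks.
  const match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match === null ? null : match[1].trim();
}

// The question's first sample (the "." placeholder lines omitted).
const withAttributes = `<html>
    <head>
        anything
    </head>
    <body anything="">
        content
    </body>
</html>`;

// The question's second sample: identical, but <body> has no attributes.
const withoutAttributes = withAttributes.replace(' anything=""', "");

console.log(extractBody(withAttributes));        // content
console.log(extractBody(withoutAttributes));     // content
console.log(extractBody("<p>no body here</p>")); // null
```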

Answer 2

Answer by Catalin Enache

// xhr: your XMLHttpRequest instance, read after the request has completed
var matched = xhr.responseText.match(/<body[^>]*>([\w\W]*)<\/body>/i);
if (matched) alert(matched[1]);

Answer 3

Answer by Doug

I believe you can load your HTML document into the .NET HTMLDocument object and then simply read HTMLDocument.body.innerHTML?

I am sure there is an even easier way with the newer XDocument as well.
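
The suggestion above is .NET-specific. In browser JavaScript, the closest equivalent I know of is the standard DOMParser Web API: parse the string as an HTML document and read document.body.innerHTML. This is a browser-only sketch (the function name bodyViaDomParser is hypothetical, and DOMParser is not built into plain Node.js). Keep in mind that innerHTML is re-serialized from the parsed tree, so it can differ textually from the original markup (for example in attribute quoting), and the parser may move stray text it finds in <head> into the body.

```javascript
// Browser-only sketch: DOMParser is a standard Web API; plain Node.js does not provide it.
function bodyViaDomParser(html) {
  // Parse the string as a full HTML document, the same way the browser would.
  const doc = new DOMParser().parseFromString(html, "text/html");
  // For "text/html" the parser always creates a body element, even if the markup lacks one.
  return doc.body.innerHTML.trim();
}

// Example (run in a browser console):
// bodyViaDomParser('<html><head><title>t</title></head><BODY class="x">content</BODY></html>');
// returns "content"
```

Because the browser's HTML parser decides where the body starts and ends, this route does not depend on where the literal text </body> happens to appear in the string.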

And just to echo some of the comments above: regex is not the best tool to use, as HTML is not a regular language and there are some edge cases that are difficult to solve for.

https://en.wikipedia.org/wiki/Regular_language
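
To make the edge-case point concrete, here is a small sketch with two hypothetical inputs. It compares the greedy quantifier used in Answer 1 (written with [\s\S] instead of (.|[\n\r])) with a lazy one (*?). Each pattern handles one input and gets the other wrong, which is the kind of case a real HTML parser avoids.

```javascript
// Greedy: runs to the LAST </body> in the string.
const greedy = /<body[^>]*>([\s\S]*)<\/body>/i;
// Lazy (*?): stops at the FIRST </body> after the opening tag.
const lazy = /<body[^>]*>([\s\S]*?)<\/body>/i;

// Hypothetical input A: a comment after the real closing tag mentions </body>.
const commentAfterBody = "<html><body>content</body><!-- old </body> --></html>";

// Hypothetical input B: a script inside the body contains the text "</body>".
const scriptInBody =
  '<html><body><script>var t = "</body>";</script>content</body></html>';

const greedyA = greedy.exec(commentAfterBody)[1]; // "content</body><!-- old " (too much)
const lazyA = lazy.exec(commentAfterBody)[1];     // "content" (correct)
const greedyB = greedy.exec(scriptInBody)[1];     // '<script>var t = "</body>";</script>content' (correct)
const lazyB = lazy.exec(scriptInBody)[1];         // '<script>var t = "' (too little)
```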

Enjoy!
