JavaScript: How can I get the text only (no tags) from an HTML document?

Disclaimer: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/5321739/

Date: 2020-08-23 16:39:46  Source: igfitidea

javascript, parsing

Asked by Anusha

I have an HTML page, and I want only its text (all text nodes).

Example HTML

<span>hello <strong>sir</strong></span>

Desired Output

hello sir
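
For reference, "all text nodes" can be made concrete. The following is a minimal sketch that walks a node tree depth-first and concatenates the nodeValue of every text node (nodeType 3). For an element, this is what the standard textContent property returns. The sketch relies only on the standard DOM properties nodeType, nodeValue and childNodes, so it should also work on real DOM nodes in a browser. The helper name collectText is hypothetical, and the usage builds a plain-object stand-in for the example HTML instead of a real DOM.

```javascript
// Node type constant for text nodes (same value as Node.TEXT_NODE in browsers)
const TEXT_NODE = 3;

// Depth-first walk that concatenates the text of every text node under `node`.
// `collectText` is a hypothetical helper, not a built-in API.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  let text = "";
  // childNodes is array-like (a NodeList in a browser), so use an index loop
  for (let i = 0; i < node.childNodes.length; i++) {
    text += collectText(node.childNodes[i]);
  }
  return text;
}

// Hand-built stand-in for <span>hello <strong>sir</strong></span>
const span = {
  nodeType: 1,
  childNodes: [
    { nodeType: 3, nodeValue: "hello ", childNodes: [] },
    {
      nodeType: 1,
      childNodes: [{ nodeType: 3, nodeValue: "sir", childNodes: [] }],
    },
  ],
};

console.log(collectText(span)); // hello sir
```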

Answer by alex

Assuming you only want the text inside the body element:

Example HTML

<html><head>
  <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  <title> Example</title>
</head>
<body>
  a <div>b<span>c</span></div>
</body></html>

JavaScript

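var body = document.body;
// textContent is the standard property; innerText is the fallback for old IE
var textContent = body.textContent || body.innerText;

console.log(textContent);  // "\n  a bc\n" for the example above (whitespace is kept)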

You need to check for textContent because our good friend IE (before IE 9) uses innerText instead.
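
The fallback can be wrapped in a small helper. This is a sketch, and getText is a hypothetical name. It tests whether textContent is a string rather than whether it is truthy, so an element whose text is legitimately empty ("") does not fall through to innerText. The two properties are not identical: innerText is layout-aware (it skips text hidden by CSS), while textContent returns the raw text node contents.

```javascript
// Return an element's text, preferring the standard textContent property
// and falling back to innerText for old Internet Explorer (before IE 9).
// `getText` is a hypothetical helper name.
function getText(el) {
  // Check the type, not truthiness: "" is a valid textContent value
  if (typeof el.textContent === "string") {
    return el.textContent;
  }
  return typeof el.innerText === "string" ? el.innerText : "";
}

// Plain objects standing in for a modern element and an old-IE element
console.log(getText({ textContent: "a bc" })); // a bc
console.log(getText({ innerText: "a bc" }));   // a bc
console.log(getText({ textContent: "", innerText: "x" }) === ""); // true
```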

It is even easier if you have a library such as jQuery, e.g. $('body').text().
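
jQuery's .text() returns the combined text of every matched element, including descendants. Without jQuery, a rough equivalent is to join the textContent of each match. The following is a minimal sketch that assumes a standards-compliant browser. The name combinedText is hypothetical, and the usage passes plain objects in place of the NodeList that document.querySelectorAll(...) would return.

```javascript
// Approximate jQuery's $(selector).text(): concatenate the textContent of
// every element in an array-like list (e.g. a NodeList).
// `combinedText` is a hypothetical helper, not part of any library.
function combinedText(elements) {
  return Array.from(elements, (el) => el.textContent).join("");
}

// In a browser: combinedText(document.querySelectorAll("body"))
// Here, plain objects stand in for two matched elements:
console.log(combinedText([{ textContent: "hello " }, { textContent: "sir" }])); // hello sir
```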

It can also be done on the server side, e.g. with strip_tags() in PHP. However, if you only want the body element, you'd need to drill down to it using a DOM parser such as DOMDocument.
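
For comparison, a naive JavaScript counterpart of strip_tags() can delete anything that looks like a tag with a regular expression. This could be used, for example, in Node.js, where there is no DOM. The sketch assumes simple, well-formed markup. It does not drop the contents of script or style elements, does not decode entities such as &amp;, and is fooled by a > inside an attribute value (shown in the last line). A real HTML parser is the safer choice. The name stripTags is hypothetical.

```javascript
// Naive tag stripper: deletes every "<...>" run and keeps the text between.
// `stripTags` is a hypothetical name; it is NOT a full HTML parser.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

console.log(stripTags("<span>hello <strong>sir</strong></span>")); // hello sir

// Limitation: the ">" inside the attribute ends the first match early
console.log(stripTags('<a title="1 > 0">x</a>')); //  0">x
```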

Answer by moe

Assuming you are trying to get the HTML of the page your JS is running on:

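// Collect every element in the document
var elems = document.getElementsByTagName('*');
var result = '';
// Index loop: for...in over an HTMLCollection also visits keys such as "length" and "item"
for (var k = 0; k < elems.length; k++)
    result += elems[k].innerHTML || '';
// Note: innerHTML keeps the tags, and nested markup is repeated once per ancestor
alert(result);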
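
The loop above concatenates innerHTML, so the result still contains tags. Because every element is visited, a nested element's markup also appears once for each of its ancestors. If the goal is the question's text only, it is closer to visit each text node exactly once. The sketch below also skips script and style elements, whose source code textContent would otherwise include. The helper name textWithoutScripts and the set of skipped tags are assumptions. Plain objects stand in for DOM nodes; in a browser you would pass document.body.

```javascript
// Elements whose contents are code rather than page text (an assumed list)
const SKIP = new Set(["SCRIPT", "STYLE"]);

// Depth-first walk over a DOM-like tree that keeps text nodes (nodeType 3)
// and ignores the whole subtree of a skipped element (nodeType 1).
// `textWithoutScripts` is a hypothetical helper name.
function textWithoutScripts(node) {
  if (node.nodeType === 3) return node.nodeValue;
  if (node.nodeType === 1 && SKIP.has(node.nodeName)) return "";
  let text = "";
  for (const child of Array.from(node.childNodes)) {
    text += textWithoutScripts(child);
  }
  return text;
}

// Stand-in for <body>a <script>var x;</script><div>b</div></body>
const body = {
  nodeType: 1,
  nodeName: "BODY",
  childNodes: [
    { nodeType: 3, nodeValue: "a ", childNodes: [] },
    {
      nodeType: 1,
      nodeName: "SCRIPT",
      childNodes: [{ nodeType: 3, nodeValue: "var x;", childNodes: [] }],
    },
    {
      nodeType: 1,
      nodeName: "DIV",
      childNodes: [{ nodeType: 3, nodeValue: "b", childNodes: [] }],
    },
  ],
};

console.log(textWithoutScripts(body)); // a b
```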

Answer by Adam Ayres

I am not sure I completely understand, but if you want the markup of the current page, I guess you could make an Ajax request for the current page and use the response:

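// Request the current page again; replace the path with the page's own URL
$.get("/current-page-name", function(data) {
   // data is the raw HTML markup of the page, tags included
   console.log(data);
});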

http://jsfiddle.net/magicaj/CAWkx/
