What is the most convenient way to convert HTML to plain text while preserving line breaks (with JavaScript)?

Question

Asked by Danylo Mysak

Basically I just need the effect of copying that HTML from browser window and pasting it in a textarea element.

For example I want this:

<p>Some</p>
<div>text<br />Some</div>
<div>text</div>

to become this:

Some
text
Some
text

Answer 1

Answered by Tim Down

If that HTML is visible within your web page, you could do it with the user selection (or just a TextRange in IE). This preserves line breaks, though not necessarily leading and trailing white space.

UPDATE 10 December 2012

However, the toString() method of Selection objects is not yet standardized and works inconsistently between browsers, so this approach rests on shaky ground and I don't recommend using it now. I would delete this answer if it weren't accepted.

Demo: http://jsfiddle.net/wv49v/

Code:

function getInnerText(el) {
    var sel, range, innerText = "";
    if (typeof document.selection != "undefined" && typeof document.body.createTextRange != "undefined") {
        range = document.body.createTextRange();
        range.moveToElementText(el);
        innerText = range.text;
    } else if (typeof window.getSelection != "undefined" && typeof document.createRange != "undefined") {
        sel = window.getSelection();
        sel.selectAllChildren(el);
        innerText = "" + sel;
        sel.removeAllRanges();
    }
    return innerText;
}

Answer 2

Answered by Kevin Wiskia

I tried to find some code I wrote and used for this a while back. It worked nicely. Let me outline what it did, and hopefully you can duplicate its behavior.

Replace images with alt or title text.
Replace links with "text[link]"
Replace things that generally produce vertical white space. h1-h6, div, p, br, hr, etc. (I know, I know. These could actually be inline elements, but it works out well.)
Strip out the rest of the tags and replace with an empty string.
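
Since the question asks for JavaScript, the four steps above could be sketched roughly as follows. This is only an illustration: the function name htmlToPlainText is hypothetical, the regular expressions handle quoted attribute values only, and regex-based tag handling is not a substitute for a real HTML parser.

```javascript
// Hypothetical helper; a regex-based sketch of the four steps, not a full HTML parser.
function htmlToPlainText(html) {
    return html
        // 1. Replace images with their alt text (quoted attribute values only).
        .replace(/<img\b[^>]*?\balt=(["'])(.*?)\1[^>]*>/gi, "$2")
        // 2. Replace links with "text [link]".
        .replace(/<a\b[^>]*?\bhref=(["'])(.*?)\1[^>]*>([\s\S]*?)<\/a>/gi, "$3 [$2]")
        // 3. Turn tags that usually produce vertical white space into newlines.
        .replace(/<(?:\/p|\/div|\/h[1-6]|br|hr)\b[^>]*>/gi, "\n")
        // 4. Strip the remaining tags.
        .replace(/<[A-Za-z\/][^<>]*>/g, "");
}
```

Applied to the question's markup written on one line, htmlToPlainText('<p>Some</p><div>text<br />Some</div><div>text</div>') gives the four words on separate lines, with a trailing newline. Character entities such as &amp;amp; are not decoded by this sketch.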

You could even expand this more to format things like ordered and unordered lists. It really just depends on how far you'll want to go.

EDIT

Found the code!

// Requires: using System.Text.RegularExpressions;
public static string Convert(string template)
{
    template = Regex.Replace(template, "<img .*?alt=[\"']?([^\"']*)[\"']?.*?/?>", "$1"); /* Use image alt text. */
    template = Regex.Replace(template, "<a .*?href=[\"']?([^\"']*)[\"']?.*?>(.*?)</a>", "$2 [$1]"); /* Convert links to "text [link]". */
    template = Regex.Replace(template, "<(/p|/div|/h\\d|br)\\s*/?>", "\n"); /* Let's try to keep vertical whitespace intact. */
    template = Regex.Replace(template, "<[A-Za-z/][^<>]*>", ""); /* Remove the rest of the tags. */

    return template;
}

Answer 3

Answered by chrmcpn

I made a function based on this answer: https://stackoverflow.com/a/42254787/3626940

function htmlToText(html){
    // remove line breaks and tabs that come from the source formatting
    html = html.replace(/\n/g, "");
    html = html.replace(/\t/g, "");

    // keep HTML line breaks and table cell separators
    html = html.replace(/<\/td>/g, "\t");
    html = html.replace(/<\/table>/g, "\n");
    html = html.replace(/<\/tr>/g, "\n");
    html = html.replace(/<\/p>/g, "\n");
    html = html.replace(/<\/div>/g, "\n");
    html = html.replace(/<\/h[1-6]>/g, "\n");
    html = html.replace(/<br>/g, "\n");
    html = html.replace(/<br( )*\/>/g, "\n");

    // parse the HTML and read its text content (this also decodes entities)
    var dom = (new DOMParser()).parseFromString('<!doctype html><body>' + html, 'text/html');
    return dom.body.textContent;
}

Answer 4

Answered by holm50

Based on chrmcpn's answer: I had to convert a basic HTML email template into a plain text version as part of a build script in node.js. I had to use JSDOM to make it work, but here's my code:

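// JSDOM comes from the "jsdom" npm package
const { JSDOM } = require("jsdom");

const htmlToText = (html) => {
    // remove line breaks and tabs that come from the source formatting
    html = html.replace(/\n/g, "");
    html = html.replace(/\t/g, "");

    // paragraphs and h1 headings end with a blank line, <br> with a newline
    html = html.replace(/<\/p>/g, "\n\n");
    html = html.replace(/<\/h1>/g, "\n\n");
    html = html.replace(/<br>/g, "\n");
    html = html.replace(/<br( )*\/>/g, "\n");

    // parse the HTML and read its text content
    const dom = new JSDOM(html);
    let text = dom.window.document.body.textContent;

    // remove runs of two spaces and spaces at the start of a line
    text = text.replace(/  /g, "");
    text = text.replace(/\n /g, "\n");
    text = text.trim();
    return text;
}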

Answer 5

Answered by Serapth

Three steps.

First get the html as a string.
Second, replace all <BR /> and <BR> with \r\n.
Third, use the regular expression "<(.|\n)*?>" to replace all markup with "".
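
A minimal JavaScript sketch of these three steps, assuming the HTML is already available as a string (step one). The function name stripHtmlKeepBreaks is hypothetical, and the br pattern is an assumption that also accepts lowercase tags and <br/> without a space.

```javascript
// Hypothetical function name; follows the three steps literally.
function stripHtmlKeepBreaks(html) {
    // Step 2: replace <br>, <br/> and <br /> (any letter case) with \r\n.
    const withBreaks = html.replace(/<br\s*\/?>/gi, "\r\n");
    // Step 3: remove every remaining tag using the pattern from the answer.
    return withBreaks.replace(/<(.|\n)*?>/g, "");
}
```

Note that only br elements produce line breaks here: for the question's markup written on one line, the result is "Sometext\r\nSometext", because the p and div boundaries are simply removed. Character entities are left undecoded as well.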
