JavaScript: extracting text from a contentEditable div

Notice: this page is based on a Chinese-English parallel translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3455931/

Date: 2020-08-23 04:42:10  Source: igfitidea

Extracting text from a contentEditable div

Tags: javascript, jquery, html, css, contenteditable

Asked by Shaggy Frog

I have a div set to contentEditable and styled with "white-space:pre" so it keeps things like linebreaks. In Safari, FF and IE, the div pretty much looks and works the same. All is well. What I want to do is extract the text from this div, but in such a way that will not lose the formatting -- specifically, the line breaks.

We are using jQuery, whose text() function basically does a pre-order DFS and glues together all the content in that branch of the DOM into a single lump. This loses the formatting.

I had a look at the html() function, but it seems that all three browsers do different things with the actual HTML that gets generated behind the scenes in my contentEditable div. Assuming I type this into my div:

1
2
3

These are the results:

Safari 4:

1
<div>2</div>
<div>3</div>

Firefox 3.6:

1
<br _moz_dirty="">
2
<br _moz_dirty="">
3
<br _moz_dirty="">
<br _moz_dirty="" type="_moz">

IE 8:

<P>1</P><P>2</P><P>3</P>

Ugh. Nothing very consistent here. The surprising thing is that MSIE looks the most sane! (Capitalized P tag and all)
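
To make the problem concrete, here is a simplified stand-in for what text() does (not jQuery's actual source): a pre-order walk that concatenates text nodes. The name flattenText is hypothetical, and the nodes are hand-built plain objects mimicking the Safari markup above, so the sketch runs without a browser.

```javascript
// Simplified stand-in for a text()-style flatten: pre-order walk that
// concatenates text nodes only. Hypothetical helper, not jQuery's source.
function flattenText(nodes) {
    var ret = "";
    for (var i = 0; i < nodes.length; i++) {
        var node = nodes[i];
        if (node.nodeType === 3) {            // text node
            ret += node.nodeValue;
        } else if (node.nodeType === 1) {     // element: descend into children
            ret += flattenText(node.childNodes || []);
        }
    }
    return ret;
}

// Hand-built objects mimicking the Safari markup: 1<div>2</div><div>3</div>
var safariLikeNodes = [
    { nodeType: 3, nodeValue: "1" },
    { nodeType: 1, nodeName: "DIV", childNodes: [{ nodeType: 3, nodeValue: "2" }] },
    { nodeType: 1, nodeName: "DIV", childNodes: [{ nodeType: 3, nodeValue: "3" }] }
];

// The element boundaries carry the line breaks, and the walk ignores them.
var flattened = flattenText(safariLikeNodes);
```

The three lines come out glued together because nothing in the walk turns an element boundary into a newline.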

The div will have dynamically set styling (font face, colour, size and alignment) which is done using CSS, so I'm not sure if I can use a pre tag (which was alluded to on some pages I found using Google).

Does anyone know of any JavaScript code and/or jQuery plugin or something that will extract text from a contentEditable div in such a way as to preserve linebreaks? I'd prefer not to reinvent a parsing wheel if I don't have to.

Update: I cribbed the getText function from jQuery 1.4.2 and modified it to extract the text with whitespace mostly intact (I only changed one line, where I add a newline):

function extractTextWithWhitespace( elems ) {
    var ret = "", elem;

    for ( var i = 0; elems[i]; i++ ) {
        elem = elems[i];

        // Get the text from text nodes and CDATA nodes
        if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
            ret += elem.nodeValue + "\n";

        // Traverse everything else, except comment nodes
        } else if ( elem.nodeType !== 8 ) {
            ret += extractTextWithWhitespace( elem.childNodes );
        }
    }

    return ret;
}

I call this function and use its output to assign it to an XML node with jQuery, something like:

var extractedText = extractTextWithWhitespace($(this));
var $someXmlNode = $('<someXmlNode/>');
$someXmlNode.text(extractedText);

The resulting XML is eventually sent to a server via an AJAX call.

This works well in Safari and Firefox.

On IE, only the first '\n' seems to get retained somehow. Looking into it more, it looks like jQuery is setting the text like so (line 4004 of jQuery-1.4.2.js):

return this.empty().append( (this[0] && this[0].ownerDocument || document).createTextNode( text ) );

Reading up on createTextNode, it appears that IE's implementation may mash up the whitespace. Is this true or am I doing something wrong?

Accepted answer by Shaggy Frog

I forgot about this question until now, when Nico slapped a bounty on it.

I solved the problem by writing the function I needed myself, cribbing a function from the existing jQuery codebase and modifying it to work as I needed.

I've tested this function with Safari (WebKit), IE, Firefox and Opera. I didn't bother checking any other browsers since the whole contentEditable thing is non-standard. It is also possible that an update to any browser could break this function if they change how they implement contentEditable. So programmer beware.

function extractTextWithWhitespace(elems)
{
    var lineBreakNodeName = "BR"; // Use <br> as a default
    if ($.browser.webkit)
    {
        lineBreakNodeName = "DIV";
    }
    else if ($.browser.msie)
    {
        lineBreakNodeName = "P";
    }
    else if ($.browser.mozilla)
    {
        lineBreakNodeName = "BR";
    }
    else if ($.browser.opera)
    {
        lineBreakNodeName = "P";
    }
    var extractedText = extractTextWithWhitespaceWorker(elems, lineBreakNodeName);

    return extractedText;
}

// Cribbed from jQuery 1.4.2 (getText) and modified to retain whitespace
function extractTextWithWhitespaceWorker(elems, lineBreakNodeName)
{
    var ret = "";
    var elem;

    for (var i = 0; elems[i]; i++)
    {
        elem = elems[i];

        if (elem.nodeType === 3     // text node
            || elem.nodeType === 4) // CDATA node
        {
            ret += elem.nodeValue;
        }

        if (elem.nodeName === lineBreakNodeName)
        {
            ret += "\n";
        }

        if (elem.nodeType !== 8) // comment node
        {
            ret += extractTextWithWhitespaceWorker(elem.childNodes, lineBreakNodeName);
        }
    }

    return ret;
}
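
Since $.browser was removed in jQuery 1.9, the browser switch above will not run against newer jQuery versions. Below is a minimal sketch of one alternative, assuming it is acceptable to treat BR, DIV and P as line boundaries in every browser. The name extractTextAnyBrowser and the BLOCK_NAMES table are hypothetical and not part of this answer, and the sketch has only been reasoned through against the three markups shown in the question, not tested in the browsers listed above.

```javascript
// Hypothetical table of element names treated as line containers.
// Assumption: DIV and P cover the markups shown in the question.
var BLOCK_NAMES = { DIV: true, P: true };

// Hypothetical browser-agnostic variant of the worker above. It accepts any
// array-like of nodes exposing nodeType, nodeName, nodeValue and childNodes.
function extractTextAnyBrowser(nodes) {
    var ret = "";
    for (var i = 0; i < nodes.length; i++) {
        var node = nodes[i];
        if (node.nodeType === 3 || node.nodeType === 4) {
            // text node or CDATA node: keep the text verbatim
            ret += node.nodeValue;
        } else if (node.nodeType === 1) {
            var name = String(node.nodeName).toUpperCase();
            if (name === "BR") {
                ret += "\n";
                continue;
            }
            var inner = extractTextAnyBrowser(node.childNodes || []);
            if (BLOCK_NAMES[name] === true) {
                // start the block on a fresh line unless we are already on one
                if (ret !== "" && ret.charAt(ret.length - 1) !== "\n") {
                    ret += "\n";
                }
                ret += inner;
                // close the block with a newline unless it already ends in one
                if (inner.charAt(inner.length - 1) !== "\n") {
                    ret += "\n";
                }
            } else {
                ret += inner;
            }
        }
        // comment nodes (nodeType 8) and anything else are skipped
    }
    return ret;
}
```

In a page this could be called as extractTextAnyBrowser(document.getElementById("edit").childNodes), where "edit" is an assumed element ID. Trailing newlines differ between the markups (the Firefox-style input ends with an extra one), so callers may want to trim the end of the result.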

Answer by Nick Craver

Unfortunately you do still have to handle this for the pre case individually per browser (I don't condone browser detection in many cases; use feature detection... but in this case it's necessary), but luckily you can take care of them all pretty concisely, like this:

var ce = $("<pre />").html($("#edit").html());
if($.browser.webkit) 
  ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });    
if($.browser.msie) 
  ce.find("p").replaceWith(function() { return this.innerHTML  +  "<br>"; });
if($.browser.mozilla || $.browser.opera ||$.browser.msie )
  ce.find("br").replaceWith("\n");

var textWithWhiteSpaceIntact = ce.text();

You can test it out here. IE in particular is a hassle because of the way it handles &nbsp; and new lines in text conversion; that's why it gets the <br> treatment above to make it consistent, so it needs two passes to be handled correctly.

In the above, #edit is the ID of the contentEditable component, so just change that out, or make this a function, for example:

function getContentEditableText(id) {
    var ce = $("<pre />").html($("#" + id).html());
    if ($.browser.webkit)
      ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
    if ($.browser.msie)
      ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
    if ($.browser.mozilla || $.browser.opera || $.browser.msie)
      ce.find("br").replaceWith("\n");

    return ce.text();
}

You can test that here. Or, since this is built on jQuery methods anyway, make it a plugin, like this:

$.fn.getPreText = function () {
    var ce = $("<pre />").html(this.html());
    if ($.browser.webkit)
      ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
    if ($.browser.msie)
      ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
    if ($.browser.mozilla || $.browser.opera || $.browser.msie)
      ce.find("br").replaceWith("\n");

    return ce.text();
};

Then you can just call it with $("#edit").getPreText(); you can test that version here.

Answer by user10

see this fiddle

Or this post

How to parse editable DIV's text with browser compatibility

Created after a lot of effort.

Answer by alfadog67

I discovered this today in Firefox:

I pass a contenteditable div whose white-space is set to "pre" to this function, and it works well.

I added a line to show how many nodes there are, and a button that puts the output into another PRE, just to prove that the linebreaks are intact.

It basically says this:

For each child node of the DIV,
   if it contains the 'data' property,
      add the data value to the output
   otherwise
      add an LF (or a CRLF for Windows)
and return the result.

There is an issue, though. When you hit enter at the end of any line of the original text, instead of putting an LF in, it puts a "?" in. You can hit enter again and it puts an LF in there, but not the first time. And you have to delete the "?" (it looks like a space). Go figure - I guess that's a bug.

This doesn't occur in IE8 (change textContent to innerText). There is a different bug there, though. When you hit enter, it splits the node into 2 nodes, as it does in Firefox, but the "data" property of each one of those nodes then becomes "undefined".

I'm sure there's much more going on here than meets the eye, so any input on the matter will be enlightening.

<!DOCTYPE html>
<html>
<HEAD>
<SCRIPT type="text/javascript">
    function htmlToText(elem) {
        var outText="";
        for(var x=0; x<elem.childNodes.length; x++){
            if(elem.childNodes[x].data){
                outText+=elem.childNodes[x].data;
            }else{
                outText+="\n";
            }
        }
        alert(elem.childNodes.length + " Nodes: \r\n\r\n" + outText);
        return(outText);
    }
</SCRIPT>
</HEAD>
<body>

<div style="white-space:pre;" contenteditable=true id=test>Text in a pre element
is displayed in a fixed-width
font, and it preserves
both      spaces and
line breaks
</DIV>
<INPUT type=button value="submit" onclick="document.getElementById('test2').textContent=htmlToText(document.getElementById('test'))">
<PRE id=test2>
</PRE>
</body>
</html>

Answer by Jon z

Here's a solution (using Underscore and jQuery) that seems to work in iOS Safari (iOS 7 and 8), Safari 8, Chrome 43, and Firefox 36 in OS X, and IE6-11 on Windows:

_.reduce($editable.contents(), function(text, node) {
    return text + (node.nodeValue || '\n' +
        (_.isString(node.textContent) ? node.textContent : node.innerHTML));
}, '')

See the test page here: http://brokendisk.com/code/contenteditable.html

Although I think the real answer is that if you're not interested in the markup provided by the browser, you shouldn't be using the contenteditable attribute - a textarea would be the proper tool for the job.
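
For readers without Underscore, the reduce above can be written as a plain loop. This is a sketch under assumptions: contentsToText is a hypothetical name, it takes any array-like of child nodes (for example element.childNodes), and it uses typeof in place of _.isString, which differs only for String wrapper objects.

```javascript
// Hypothetical plain-JavaScript equivalent of the reduce above.
// Text nodes contribute their nodeValue; anything without a truthy nodeValue
// (elements, empty text nodes) contributes a newline followed by its
// textContent, or its innerHTML where textContent is unavailable (older IE).
function contentsToText(nodes) {
    var text = "";
    for (var i = 0; i < nodes.length; i++) {
        var node = nodes[i];
        if (node.nodeValue) {
            text += node.nodeValue;
        } else {
            text += "\n" + (typeof node.textContent === "string"
                ? node.textContent
                : node.innerHTML);
        }
    }
    return text;
}
```

As in the original one-liner, only the direct children are inspected, so nested block elements inside a line are flattened by textContent rather than split into further lines.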

Answer by Artur Vanesyan

this.editableVal = function(cont, opts) 
{
  if (!cont) return '';
  var el = cont.firstChild;
  var v = '';
  var contTag = new RegExp('^(DIV|P|LI|OL|TR|TD|BLOCKQUOTE)$');
  while (el) {
    switch (el.nodeType) {
      case 3:
        var str = el.data.replace(/^\n|\n$/g, ' ').replace(/[\n\xa0]/g, ' ').replace(/[ ]+/g, ' ');
        v += str;
        break;
      case 1:
        var str = this.editableVal(el);
        if (el.tagName && el.tagName.match(contTag) && str) {
          if (str.substr(-1) != '\n') {
            str += '\n';
          }

          var prev = el.previousSibling;
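          // The original called PHP.trim(), a helper not defined in this snippet.
          // Assumption: a whitespace-only regex test is a close enough substitute.
          while (prev && prev.nodeType == 3 && /^\s*$/.test(prev.nodeValue)) {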
            prev = prev.previousSibling;
          }
          if (prev && !(prev.tagName && (prev.tagName.match(contTag) || prev.tagName == 'BR'))) {
            str = '\n' + str;
          }

        }else if (el.tagName == 'BR') {
          str += '\n';
        }
        v += str;
        break;
    }
    el = el.nextSibling;
  }
  return v;
}