Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/5007574/

Date: 2020-10-25 15:38:14  Source: igfitidea

Rendering Plaintext as HTML maintaining whitespace – without <pre>

Tags: javascript, python, html, algorithm, plaintext

Asked by Alan H.

Given any arbitrary text file full of printable characters, how can this be converted to HTML that would be rendered exactly the same (with the following requirements)?

  • Does not rely on any but the default HTML whitespace rules
    • No <pre> tag
    • No CSS white-space rules
  • <p> tags are fine, but not required (<br />s and/or <div>s are fine)
  • Whitespace is maintained exactly.

    Given the following lines of input (ignore erroneous auto syntax highlighting):

    Line one
        Line two, indented    four spaces
    

    A browser should render the output exactly the same, maintaining the indentation of the second line and the gap between "indented" and "spaces". Of course, I am not actually looking for monospaced output, and the font is orthogonal to the algorithm/markup.

    Given the two lines as a complete input file, example correct output would be:

    Line one<br />&nbsp;&nbsp;&nbsp;&nbsp;Line two, 
    indented&nbsp;&nbsp;&nbsp; four spaces
    
  • Soft wrapping in the browser is desirable. That is, the resulting HTML should not force the user to scroll horizontally, even when input lines are wider than the viewport (assuming individual words are still narrower than the viewport).

I'm looking for a fully defined algorithm. Bonus points for an implementation in Python or JavaScript.

(Please do not just answer that I should be using <pre> tags or a CSS white-space rule, as my requirements render those options untenable. Please also don't post untested and/or naïve suggestions such as "replace all spaces with &nbsp;." After all, I'm positive a solution is technically possible. It's an interesting problem, don't you think?)

Answered by Arnaud Le Blanc

The way to do that while still allowing the browser to wrap long lines is to replace each sequence of two spaces with a space and a non-breaking space.

The browser will correctly render all spaces (normal and non-breaking ones), while still wrapping long lines (thanks to the normal spaces).

Javascript:

text = html_escape(text); // dummy function
text = text.replace(/\t/g, '    ')
           .replace(/  /g, '&nbsp; ')
           .replace(/  /g, ' &nbsp;') // second pass
                                      // handles odd number of spaces, where we 
                                      // end up with "&nbsp;" + " " + " "
           .replace(/\r\n|\n|\r/g, '<br />');
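
For reference, here is a self-contained sketch of the same approach. The names escapeHtml and plainTextToHtml are illustrative stand-ins (html_escape above is only a placeholder). The extra replacement of a leading space is an addition that is not part of the answer: under the default whitespace rules a regular space at the start of a line is dropped, so a line indented by exactly one space would otherwise lose its indent.

```javascript
// Hypothetical helper standing in for the undefined html_escape().
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')   // must run first so later entities are not re-escaped
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Sketch of the "space + non-breaking space" technique.
function plainTextToHtml(text) {
    return escapeHtml(String(text))
        .replace(/\t/g, '    ')      // assumption: one tab equals four spaces
        .replace(/^ /gm, '&nbsp;')   // addition: protect a space at the start of a line
        .replace(/  /g, '&nbsp; ')   // first pass: each pair of spaces
        .replace(/  /g, ' &nbsp;')   // second pass: leftover pair from odd-length runs
        .replace(/\r\n|\n|\r/g, '<br />');
}

console.log(plainTextToHtml('Line one\n    Line two, indented    four spaces'));
// Line one<br />&nbsp;&nbsp; &nbsp;Line two, indented&nbsp; &nbsp; four spaces
```

Every run keeps at least one regular space, so the browser can still wrap there. Spaces at the very end of a line are not protected by this sketch.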

Answered by gilly3

Use a zero-width space (&#8203;) to preserve whitespace and allow the text to wrap. The basic idea is to pair each space or sequence of spaces with a zero-width space, then replace each space with a non-breaking space. You'll also want to encode HTML special characters and add line breaks.

If you don't care about unicode characters, it's trivial. You can just use string.replace():

function textToHTML(text)
{
    return ((text || "") + "")  // make sure it is a string;
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/\t/g, "    ")
        .replace(/ /g, "&#8203;&nbsp;&#8203;")
        .replace(/\r\n|\r|\n/g, "<br />");
}

If it's OK for the white space itself to wrap, pair each space with a zero-width space as above. Otherwise, to keep white space together, pair each sequence of spaces with a zero-width space:

    .replace(/ /g, "&nbsp;")
    .replace(/((&nbsp;)+)/g, "&#8203;$1&#8203;")  // $1 keeps the matched run of &nbsp;
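
Put together as a complete function, the "keep each run of spaces together" variant could look like the sketch below. The name textToHtmlKeepRuns is hypothetical, and the two replacements are merged into one callback here, which is a small restructuring rather than the answer's exact code.

```javascript
// Sketch: wrap each run of spaces in zero-width spaces so the browser may
// break before or after the run, but never inside it.
function textToHtmlKeepRuns(text) {
    return String(text == null ? "" : text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/\t/g, "    ")  // assumption: one tab equals four spaces
        .replace(/ +/g, function (run) {
            // one &nbsp; per original space, framed by zero-width spaces
            return "&#8203;" + "&nbsp;".repeat(run.length) + "&#8203;";
        })
        .replace(/\r\n|\r|\n/g, "<br />");
}

console.log(textToHtmlKeepRuns("a  b"));
// a&#8203;&nbsp;&nbsp;&#8203;b
```

Note that even a single space between two words expands to three entities, so the output is considerably larger than the input.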

Encoding Unicode characters is a little more complex. You need to iterate over the string:

var charEncodings = {
    "\t": "&nbsp;&nbsp;&nbsp;&nbsp;",
    " ": "&nbsp;",
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\n": "<br />",
    "\r": "<br />"
};
var space = /[\t ]/;
var noWidthSpace = "&#8203;";
function textToHTML(text)
{
    text = (text || "") + "";  // make sure it is a string;
    text = text.replace(/\r\n/g, "\n");  // avoid adding two <br /> tags
    var html = "";
    var lastChar = "";
    for (var i = 0; i < text.length; i++)  // numeric index: for...in yields string keys, so i + 1 would concatenate
    {
        var char = text[i];
        var charCode = text.charCodeAt(i);
        if (space.test(char) && !space.test(lastChar) && space.test(text[i + 1] || ""))
        {
            html += noWidthSpace;
        }
        html += char in charEncodings ? charEncodings[char] :
        charCode > 127 ? "&#" + charCode + ";" : char;
        lastChar = char;
    }
    return html;
}  
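
One caveat with iterating by UTF-16 index is that charCodeAt() sees characters outside the Basic Multilingual Plane (emoji, for example) as two surrogate halves, which are not valid as separate numeric character references. Below is a hedged variant that walks whole code points instead. The names entityFor and textToHtmlByCodePoint are hypothetical, and this sketch deliberately places a zero-width space before every run of spaces, including single spaces, so that wrapping between words stays possible. That differs from the condition used in the loop above.

```javascript
var entityFor = {
    "\t": "&nbsp;&nbsp;&nbsp;&nbsp;",
    " ": "&nbsp;",
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\n": "<br />"
};

function textToHtmlByCodePoint(text) {
    // Normalize \r\n and lone \r to \n, then split into whole code points.
    var chars = Array.from(String(text == null ? "" : text).replace(/\r\n?/g, "\n"));
    var isSpace = function (ch) { return ch === " " || ch === "\t"; };
    var html = "";
    for (var i = 0; i < chars.length; i++) {
        var ch = chars[i];
        // Zero-width space at the start of each run of spaces/tabs.
        if (isSpace(ch) && !isSpace(chars[i - 1])) {
            html += "&#8203;";
        }
        if (Object.prototype.hasOwnProperty.call(entityFor, ch)) {
            html += entityFor[ch];
        } else if (ch.codePointAt(0) > 127) {
            html += "&#" + ch.codePointAt(0) + ";";  // full code point, not a surrogate half
        } else {
            html += ch;
        }
    }
    return html;
}

console.log(textToHtmlByCodePoint("a  b"));  // a&#8203;&nbsp;&nbsp;b
```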

Now, just a comment. Without using monospace fonts, you'll lose some formatting. Consider how these lines of text with a monospace font form columns:

ten       seven spaces
eleven    four spaces

Without the monospaced font, you will lose the columns:

 ten       seven spaces
 eleven    four spaces

It seems that the algorithm to fix that would be very complex.

Answered by martineau

While this doesn't quite meet all your requirements (for one thing, it doesn't handle tabs), I've used the following gem, which adds a wordWrap() method to JavaScript Strings, on a couple of occasions to do something similar to what you're describing. It might be a good starting point for coming up with something that also does the additional things you want.

//+ Jonas Raoni Soares Silva
//@ http://jsfromhell.com/string/wordwrap [rev. #2]

// String.wordWrap(maxLength: Integer,
//                 [breakWith: String = "\n"],
//                 [cutType: Integer = 0]): String
//
//   Returns an string with the extra characters/words "broken".
//
//     maxLength  maximum amount of characters per line
//     breakWith  string that will be added whenever one is needed to
//                break the line
//     cutType    0 = words longer than "maxLength" will not be broken
//                1 = words will be broken when needed
//                2 = any word that trespasses the limit will be broken

String.prototype.wordWrap = function(m, b, c){
    var i, j, l, s, r;
    if(m < 1)
        return this;
    for(i = -1, l = (r = this.split("\n")).length; ++i < l; r[i] += s)
        for(s = r[i], r[i] = ""; s.length > m; r[i] += s.slice(0, j) + ((s = s.slice(j)).length ? b : ""))
            j = c == 2 || (j = s.slice(0, m + 1).match(/\S*(\s)?$/))[1] ? m : j.input.length - j[0].length
            || c == 1 && m || j.input.length + (j = s.slice(m).match(/^\S*/)).input.length;
    return r.join("\n");
};

I'd also like to comment that, in general, you'd want to use a monospaced font if tabs are involved, because the width of words varies with the proportional font used (making the results of using tab stops very font-dependent).
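
To illustrate the point about tab stops: replacing every tab with a fixed number of spaces is not the same as advancing to the next tab stop, which depends on the current column. A minimal sketch, assuming \n line endings and a monospaced grid where each character occupies one column; expandTabs and its tabWidth parameter are hypothetical names:

```javascript
// Sketch: expand tabs to the next multiple of tabWidth, line by line.
function expandTabs(text, tabWidth) {
    tabWidth = tabWidth || 4;  // assumption: default tab stops every 4 columns
    return String(text).split("\n").map(function (line) {
        var out = "";
        var column = 0;
        Array.from(line).forEach(function (ch) {
            if (ch === "\t") {
                var pad = tabWidth - (column % tabWidth);  // distance to the next stop
                out += " ".repeat(pad);
                column += pad;
            } else {
                out += ch;
                column += 1;
            }
        });
        return out;
    }).join("\n");
}

console.log(JSON.stringify(expandTabs("a\tb", 4)));  // "a   b"
```

The expanded text can then be passed to any space-preserving conversion. In a proportional font the columns will still not line up, as noted above.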

Update: Here's a slightly more readable version courtesy of an online javascript beautifier:

String.prototype.wordWrap = function(m, b, c) {
    var i, j, l, s, r;
    if (m < 1)
        return this;
    for (i = -1, l = (r = this.split("\n")).length; ++i < l; r[i] += s)
        for (s = r[i], r[i] = ""; s.length > m; r[i] += s.slice(0, j) + ((s =
                s.slice(j)).length ? b : ""))
            j = c == 2 || (j = s.slice(0, m + 1).match(/\S*(\s)?$/))[1] ? m :
            j.input.length - j[0].length || c == 1 && m || j.input.length +
            (j = s.slice(m).match(/^\S*/)).input.length;
    return r.join("\n");
};

Answered by Abdennour TOUMI

It is very simple if you use the jQuery library in your project.

It takes just one line: add an asHtml extension to the String class, and then:

var plain='&lt;a&gt; i am text plain &lt;/a&gt;'
plain.asHtml();
/* '<a> i am text plain </a>' */

Demo: http://jsfiddle.net/abdennour/B6vGG/3/

Note: you do not have to access the DOM. Just use jQuery's builder pattern, $('<tagName />').