Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/6003271/

Time: 2020-08-23 19:50:09  Source: igfitidea

Substring text with HTML tags in Javascript

Tags: javascript, html, tags, substring

Asked by honzahommer

Do you have a solution for taking a substring of text that contains HTML tags in Javascript?

For example:

var str = 'Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, consectetur adipiscing elit.'

html_substr(str, 20)
// return Lorem ipsum <a href="#">dolor <strong>si</strong></a>

html_substr(str, 30)
// return Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, co

Accepted answer by Dan Manastireanu

Taking into consideration that parsing html with regex is a bad idea, here is a solution that does just that :)

EDIT: Just to be clear: This is not a valid solution, it was meant as an exercise that made very lenient assumptions about the input string, and as such should be taken with a grain of salt. Read the link above and see why parsing html with regex can never be done.

function htmlSubstring(s, n) {
    var m, r = /<([^>\s]*)[^>]*>/g,
        stack = [],
        lasti = 0,
        result = '';

    //for each tag, while we don't have enough characters
    while ((m = r.exec(s)) && n) {
        //get the text substring between the last tag and this one
        var temp = s.substring(lasti, m.index).substr(0, n);
        //append to the result and count the number of characters added
        result += temp;
        n -= temp.length;
        lasti = r.lastIndex;

        if (n) {
            result += m[0];
            if (m[1].indexOf('/') === 0) {
                //if this is a closing tag, then pop the stack (does not account for bad html)
                stack.pop();
            } else if (m[1].lastIndexOf('/') !== m[1].length - 1) {
                //if this is not a self-closing tag, then push it onto the stack
                stack.push(m[1]);
            }
        }
    }

    //add the remainder of the string, if needed (there are no more tags in here)
    result += s.substr(lasti, n);

    //fix the unclosed tags
    while (stack.length) {
        result += '</' + stack.pop() + '>';
    }

    return result;

}
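
As a small illustration of the lenient assumptions this answer relies on, the sketch below runs the same tag pattern against a hypothetical input whose attribute value contains a ">" character. The variable names and the input string are assumptions made for this example only. The pattern ends the tag at the first ">", so the rest of the attribute would be treated as text:

```javascript
// Same tag pattern as in htmlSubstring above
var tagPattern = /<([^>\s]*)[^>]*>/g;

// Hypothetical input: the attribute value itself contains a ">" character
var tricky = '<a title="1 > 0">link</a>';

var firstMatch = tagPattern.exec(tricky);

// The match stops at the ">" inside the attribute value
console.log(firstMatch[0]); // <a title="1 >
// The captured tag name is still "a"
console.log(firstMatch[1]); // a
```

Input like this is one reason the answer should be taken with a grain of salt for arbitrary html.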

Example: http://jsfiddle.net/danmana/5mNNU/

Note: patrick dw's solution may be safer regarding bad html, but I'm not sure how well it handles whitespace.

Answer by mishanon

This is a solution for single tags.

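function subStrWithoutBreakingTags(str, start, length) {
    var countTags = 0;      // number of "<" seen that are not yet closed by ">"
    var returnString = "";
    var writeLetters = 0;   // characters copied so far (tag characters included)
    // copy at least `length` characters, and keep going while inside a tag;
    // the bounds check stops at the end of the string, so an unmatched "<" cannot loop forever
    while (start + writeLetters < str.length &&
           !((writeLetters >= length) && (countTags == 0))) {
        var letter = str.charAt(start + writeLetters);
        if (letter == "<") {
            countTags++;
        }
        if (letter == ">") {
            countTags--;
        }
        returnString += letter;
        writeLetters++;
    }
    return returnString;
}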
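
The idea in this answer (never stop in the middle of a tag) can also be sketched with index lookups instead of a character loop. The helper name `cutOutsideTags` is hypothetical and not part of the answer. Like the function above, it counts tag characters toward the length and leaves any opened tags unclosed:

```javascript
// Hypothetical helper: cut at `length`, but if the cut falls inside a tag,
// extend the result to the end of that tag.
function cutOutsideTags(str, length) {
    var cut = str.slice(0, length);
    var lastOpen = cut.lastIndexOf('<');
    var lastClose = cut.lastIndexOf('>');
    if (lastOpen > lastClose) {
        // the cut landed inside a tag: look for the ">" that ends it
        var end = str.indexOf('>', length);
        // if the tag never closes, drop the partial tag instead
        return end === -1 ? cut.slice(0, lastOpen) : str.slice(0, end + 1);
    }
    return cut;
}

var sample = 'Lorem ipsum <a href="#">dolor';
console.log(cutOutsideTags(sample, 5));  // Lorem
console.log(cutOutsideTags(sample, 15)); // Lorem ipsum <a href="#">
```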

Answer by user113716

Usage:

var str = 'Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, consectetur adipiscing elit.';

var res1 = html_substr( str, 20 );
var res2 = html_substr( str, 30 );

alert( res1 ); // Lorem ipsum <a href="#">dolor <strong>si</strong></a>
alert( res2 ); // Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, co

Example: http://jsfiddle.net/2ULbK/4/



Function:

function html_substr( str, count ) {

    var div = document.createElement('div');
    div.innerHTML = str;

    walk( div, track );

    function track( el ) {
        if( count > 0 ) {
            var len = el.data.length;
            count -= len;
            if( count <= 0 ) {
                el.data = el.substringData( 0, el.data.length + count );
            }
        } else {
            el.data = '';
        }
    }

    function walk( el, fn ) {
        var node = el.firstChild;
        do {
            if( node.nodeType === 3 ) {
                fn(node);
                // only recurse into element nodes that have at least one child
            } else if( node.nodeType === 1 && node.childNodes && node.childNodes[0] ) {
                walk( node, fn );
            }
        } while( node = node.nextSibling );
    }
    return div.innerHTML;
}

Answer by Mubeen Khan

let str = 'Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, consectetur adipiscing elit.';
let plainText = str.replace(/<[^>]+>/g, '');

Extract the plain text with the regular expression above, then use the string method ".substr()" to get the desired result.
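
A minimal sketch of this approach end to end, using the string from the question. The variable name `excerpt` and the length of 20 are assumptions for illustration. Note that the tags are discarded, so the result is plain text rather than truncated HTML:

```javascript
const html = 'Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, consectetur adipiscing elit.';

// Remove everything that looks like a tag, leaving only the text
const plainText = html.replace(/<[^>]+>/g, '');

// Take the first 20 characters of the plain text
const excerpt = plainText.substr(0, 20);

console.log(excerpt); // Lorem ipsum dolor si
```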

Answer by Shaz

Use something similar to: str.replace(/<[^>]*>?/gi, '').substr(0, 20);
I've created an example at: http://fiddle.jshell.net/xpW9j/1/

Answer by herostwist

Javascript has a substring method. It makes no difference whether the string contains html.

See http://www.w3schools.com/jsref/jsref_substr.asp
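
For reference, a minimal sketch of what a plain substring call returns on the string from the question. The variable names are assumptions for illustration. Markup characters are counted like any other character, so the cut can land inside a tag:

```javascript
const source = 'Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, consectetur adipiscing elit.';

// Plain substring: no awareness of tags at all
const firstTwenty = source.substring(0, 20);

console.log(firstTwenty); // Lorem ipsum <a href=
```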