How to remove only HTML tags from a string using JavaScript
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17164335/
Asked by cp100
I want to remove HTML tags from a given string using JavaScript. I looked into the current approaches, but each of them leaves a problem unsolved.
Current solutions
(1) Using JavaScript: create a temporary div element and read its text
function remove_tags(html)
{
    var tmp = document.createElement("DIV");
    tmp.innerHTML = html;
    return tmp.textContent || tmp.innerText;
}
(2) Using a regex
function remove_tags(html)
{
    return html.replace(/<(?:.|\n)*?>/gm, '');
}
(3) Using jQuery
function remove_tags(html)
{
    return jQuery(html).text();
}
These three solutions work correctly, but if the string looks like this:
<div> hello <hi all !> </div>
the stripped string is just hello. I need to remove only the HTML tags, so the result should be hello <hi all !>.
Edited: The background is that I want to remove all HTML tags from user input in a particular text area, but I still want to allow users to enter text such as <hi all>. The current approaches remove any content enclosed in <>.
Answered by Andy E
Using a regex might not be a problem if you consider a different approach. For instance, you can look for all tags and then check whether each tag name matches a list of defined, valid HTML tag names:
var protos = document.body.constructor === window.HTMLBodyElement,
validHTMLTags =/^(?:a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|bgsound|big|blink|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|data|datalist|dd|del|details|dfn|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|header|hgroup|hr|html|i|iframe|img|input|ins|isindex|kbd|keygen|label|legend|li|link|listing|main|map|mark|marquee|menu|menuitem|meta|meter|nav|nobr|noframes|noscript|object|ol|optgroup|option|output|p|param|plaintext|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|spacer|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr|xmp)$/i;
function sanitize(txt) {
    var // This regex normalises anything between quotes
        normaliseQuotes = /=(["'])(?=[^]*[<>])[^]*?\1/g,
        normaliseFn = function ($0, q, sym) {
            return $0.replace(/</g, '&lt;').replace(/>/g, '&gt;');
        },
        replaceInvalid = function ($0, tag, off, txt) {
            var
                // Is it a valid tag?
                invalidTag = protos &&
                    document.createElement(tag) instanceof HTMLUnknownElement
                    || !validHTMLTags.test(tag),
                // Is the tag complete?
                isComplete = txt.slice(off+1).search(/^[^<]+>/) > -1;
            // Escape the "<" of invalid or incomplete tags; leave valid tags untouched
            return invalidTag || !isComplete ? '&lt;' + tag : $0;
        };
    txt = txt.replace(normaliseQuotes, normaliseFn)
             .replace(/<(\w+)/g, replaceInvalid);
    // Let the browser parse the remaining valid tags, then read back the text
    var tmp = document.createElement("DIV");
    tmp.innerHTML = txt;
    return "textContent" in tmp ? tmp.textContent : tmp.innerHTML;
}
Working demo: http://jsfiddle.net/m9vZg/3/
This works because browsers parse '>' as text if it isn't part of a matching '<' opening tag. It doesn't suffer from the same problems as trying to parse HTML tags with a regular expression, because you're only looking for the opening delimiter and the tag name; everything else is irrelevant.
It's also future-proof: the WebIDL specification tells vendors how to implement prototypes for HTML elements, so we try to create an HTML element from the currently matched tag name. If the element is an instance of HTMLUnknownElement, we know that it's not a valid HTML tag. The validHTMLTags regular expression defines a list of HTML tags for older browsers, such as IE 6 and 7, that do not implement these prototypes.
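The escaping step described above can be sketched without a DOM, for instance to try it in Node. This is only an illustration under stated assumptions: escapeUnknownTagOpeners and KNOWN_TAGS are hypothetical names, the tag set is a small subset standing in for validHTMLTags, and the HTMLUnknownElement check is left out because it needs a browser.

```javascript
// Illustrative subset only; the answer's validHTMLTags regex lists many more names.
const KNOWN_TAGS = new Set(["a", "b", "div", "em", "i", "p", "span", "strong"]);

// Hypothetical helper: escape "<name" when the name is not a known tag,
// or when the tag is never closed by ">" before the next "<".
function escapeUnknownTagOpeners(txt, knownTags = KNOWN_TAGS) {
  return txt.replace(/<(\w+)/g, (match, tag, offset, whole) => {
    const isKnown = knownTags.has(tag.toLowerCase());
    // "Complete" means a ">" follows before any further "<".
    const isComplete = /^[^<]+>/.test(whole.slice(offset + 1));
    return isKnown && isComplete ? match : "&lt;" + tag;
  });
}

console.log(escapeUnknownTagOpeners("<div> hello <hi all !> </div>"));
// prints: <div> hello &lt;hi all !> </div>
```

Once the unknown opener is escaped, assigning the result to innerHTML and reading textContent, as sanitize does, leaves the text hello <hi all !> intact while the real div is dropped.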
Answered by georg
If you want to keep invalid markup untouched, regular expressions are your best bet. Something like this might work:
text = html.replace(/<\/?(span|div|img|p...)\b[^<>]*>/g, "")
Expand (span|div|img|p...) into a list of all tags (or only those you want to remove). NB: the list must be sorted by length, longer tags first!
This may give incorrect results in some edge cases (such as attributes containing < or > characters), but the only real alternative would be to write a complete HTML parser yourself. Not that it would be extremely complicated, but it might be overkill here. Let us know.
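As a rough sketch of what the expanded pattern might look like, the snippet below builds the alternation from an array. The names stripKnownTags and TAGS_TO_STRIP are hypothetical, and the tag list is a deliberately short assumption rather than a complete list of HTML tags.

```javascript
// Illustrative subset; extend it with every tag you want removed.
const TAGS_TO_STRIP = ["span", "div", "img", "p", "a", "b", "i", "br", "strong", "em"];

// Hypothetical helper: remove opening and closing tags whose name is in the list.
function stripKnownTags(html, tags = TAGS_TO_STRIP) {
  // Sort longer names first, as the answer advises.
  const alternation = [...tags].sort((x, y) => y.length - x.length).join("|");
  const tagPattern = new RegExp("<\\/?(?:" + alternation + ")\\b[^<>]*>", "gi");
  return html.replace(tagPattern, "");
}

console.log(JSON.stringify(stripKnownTags("<div> hello <hi all !> </div>")));
// prints: " hello <hi all !> "
```

Anything that does not start with a listed tag name, such as <hi all !>, is left as it is.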
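Answered by Prashobh
var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
Answered by Human Being
Here is my solution:
function removeTags(){
    var txt = document.getElementById('myString').value;
    var rex = /(<([^>]+)>)/ig;
    alert(txt.replace(rex , ""));
}
Answered by Purvik Dhorajiya
I use a regular expression to prevent HTML tags in my textarea:
<form>
    <textarea class="box"></textarea>
    <button>Submit</button>
</form>
<script>
    $(".box").focusout( function(e) {
        var reg =/<(.|\n)*?>/g;
        if (reg.test($('.box').val()) == true) {
            alert('HTML Tag are not allowed');
        }
        e.preventDefault();
    });
</script>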
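The check in that handler can be isolated into a small function to see what it accepts and rejects. This is a minimal sketch; containsHtmlLikeTag is a hypothetical name, and the "g" flag is dropped here on the assumption that the regex might be reused across calls.

```javascript
// Hypothetical helper: report whether the value contains anything shaped like <...>.
function containsHtmlLikeTag(value) {
  // No "g" flag: a global regex remembers lastIndex between test() calls.
  return /<(.|\n)*?>/.test(value);
}

console.log(containsHtmlLikeTag("<b>bold</b>")); // true
console.log(containsHtmlLikeTag("1 < 2"));       // false
console.log(containsHtmlLikeTag("<hi all>"));    // true
```

Note that this pattern also flags <hi all>, so on its own it does not allow the kind of input the question wants to keep.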