Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but if you do, you must likewise follow the CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me): StackOverflow
Original address: http://stackoverflow.com/questions/7339157/
JavaScript RegExp match text ignoring HTML
Question by Francisc
Is it possible to match "the dog is really really fat" in "The <strong>dog</strong> is really <em>really</em> fat!" and add "<span class="highlight">WHAT WAS MATCHED</span>" around it?
I don't mean this example specifically; in general, is it possible to search text while ignoring the HTML, keep the HTML in the end result, and just add the span above around the whole match?
EDIT:
Considering the HTML tag overlapping problem, would it be possible to match a phrase and just add the span around each of the matched words? The problem here is that I don't want the word "dog" matched when it's not in the searched context, in this case, "the dog is really really fat."
Answer by Briguy37
Update:
Here is a working fiddle that does what you want. However, you will need to update htmlTagRegEx to handle any HTML tag: it only performs a simple match (bare tags such as <strong> or </em>, without attributes) and will not handle all cases.
http://jsfiddle.net/briguy37/JyL4J/
Below is the code. It strips the HTML tags out one by one (remembering where each one was), wraps the matched text in the highlight span, and then puts the tags back in one by one. It's ugly, but it's the easiest way I could think of to get it to work...
function highlightInElement(elementId, text) {
    var element = document.getElementById(elementId);
    var elementHtml = element.innerHTML;
    var tags = [];
    var tagLocations = [];
    // Only matches bare tags such as <strong> or </em>; tags with attributes are not handled
    var htmlTagRegEx = /<\/?\w+>/;

    // Strip the tags from elementHtml one by one and keep track of them
    var htmlTag;
    while ((htmlTag = elementHtml.match(htmlTagRegEx))) {
        tagLocations.push(htmlTag.index);
        tags.push(htmlTag[0]);
        elementHtml = elementHtml.replace(htmlTag[0], '');
    }

    // Search for the text in the stripped HTML (plain substring search, not a regex).
    // search() returns -1 when nothing is found, so it must not be tested for truthiness.
    var textLocation = elementHtml.indexOf(text);
    if (textLocation === -1) {
        return; // not found: leave the element untouched
    }

    // Add the highlight
    var highlightHTMLStart = '<span class="highlight">';
    var highlightHTMLEnd = '</span>';
    var textEndLocation = textLocation + text.length;
    elementHtml = elementHtml.substring(0, textLocation) +
        highlightHTMLStart + text + highlightHTMLEnd +
        elementHtml.substring(textEndLocation);

    // Plug the HTML tags back in, last-removed first, shifting the positions
    // that now lie inside or after the highlight span
    for (var i = tagLocations.length - 1; i >= 0; i--) {
        var tagLocation = tagLocations[i];
        if (tagLocation >= textEndLocation) {
            tagLocation += highlightHTMLStart.length + highlightHTMLEnd.length;
        } else if (tagLocation > textLocation) {
            tagLocation += highlightHTMLStart.length;
        }
        elementHtml = elementHtml.substring(0, tagLocation) + tags[i] + elementHtml.substring(tagLocation);
    }

    // Update the innerHTML of the element
    element.innerHTML = elementHtml;
}
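The code above wraps the whole match in one span, so when the match starts or ends between a pair of tags the result can contain overlapping tags. The question's edit asks instead for a span around each matched piece. Here is a minimal DOM-free sketch of that idea, working on an HTML string and assuming simple markup (no comments, no `<` or `>` inside attribute values; entities such as `&amp;` are counted as their literal characters). `highlightPhrase` is a hypothetical helper name, not part of the answer.

```javascript
// Hypothetical helper: highlight the first case-insensitive occurrence of
// `phrase` in the *text* of `html`, wrapping each text run it covers in its
// own span so that no span ever crosses a tag boundary.
// Assumes simple markup: no comments, no "<" or ">" inside attribute values.
function highlightPhrase(html, phrase, className = "highlight") {
  if (!phrase) return html;

  // Tokens are either a whole tag ("<...>") or a run of text between tags.
  const tokens = html.match(/<[^>]*>|[^<]+/g) || [];
  const isTag = (token) => token[0] === "<";

  // The visible text is just the text runs joined together.
  const text = tokens.filter((t) => !isTag(t)).join("");
  // toLowerCase() keeps string lengths unchanged for ordinary (e.g. ASCII) text.
  const start = text.toLowerCase().indexOf(phrase.toLowerCase());
  if (start === -1) return html;
  const end = start + phrase.length;

  let offset = 0; // where the current text run begins within `text`
  return tokens
    .map((token) => {
      if (isTag(token)) return token; // tags pass through unchanged
      const runStart = offset;
      offset += token.length;
      // Slice of this run that falls inside [start, end), relative to the run
      const from = Math.max(start, runStart) - runStart;
      const to = Math.min(end, offset) - runStart;
      if (from >= to) return token; // this run lies outside the match
      return (
        token.slice(0, from) +
        '<span class="' + className + '">' + token.slice(from, to) + "</span>" +
        token.slice(to)
      );
    })
    .join("");
}
```

Because the phrase is located in the joined text first, a "dog" outside the searched phrase (for example in `A dog. The <b>dog</b> is fat` when searching for "the dog is fat") is left alone, while every piece of the phrase gets its own span.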
Answer by Ivan Nikolchov
Naah... just use the good old RegExp ;)
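// Removes every tag, then wraps the remaining plain text in the highlight span
// (the original <strong>/<em> markup is not kept)
var htmlString = "The <strong>dog</strong> is really <em>really</em> fat!";
var regexp = /<\/?\w+((\s+\w+(\s*=\s*(?:\".*?"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/gi;
var result = '<span class="highlight">' + htmlString.replace(regexp, '') + '</span>';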
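Running this answer's code on its own shows the trade-off: the result is one span around plain text, so the `<strong>`/`<em>` markup is gone. Unlike a bare `<\/?\w*>` pattern, the longer expression also strips tags that carry attributes; the link markup below is a made-up input for illustration.

```javascript
// The pattern from the answer, unchanged
var regexp = /<\/?\w+((\s+\w+(\s*=\s*(?:\".*?"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/gi;

var htmlString = "The <strong>dog</strong> is really <em>really</em> fat!";
var result = '<span class="highlight">' + htmlString.replace(regexp, '') + '</span>';
console.log(result); // <span class="highlight">The dog is really really fat!</span>

// Tags with attributes are stripped as well (made-up input)
var linkText = '<a href="/x" title=\'y\'>link</a>'.replace(regexp, '');
console.log(linkText); // link
```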
Answer by Eliecer Chicott
A simpler way, with jQuery:
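// keyword is treated as a regex pattern (escape it first if it is plain text).
// The lookahead (?![^<>]*>) skips matches inside a tag's own markup, e.g. in an attribute value.
var originalHtml = $("#div").html();
var newHtml = originalHtml.replace(new RegExp(keyword + "(?![^<>]*>)", "g"), function (match) {
    return "<span class='highlight'>" + match + "</span>";
});
$("#div").html(newHtml);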
This works just fine for me.
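To see what the lookahead does without jQuery or a page, here is a sketch of the same replace applied to a plain HTML string, with the keyword escaped first. `escapeRegExp` and `highlightOutsideTags` are hypothetical helper names. The lookahead rejects a match that is followed by `>` with no `<` or `>` in between, i.e. one that sits inside a tag's own markup; it does not let a keyword match across tags.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a RegExp
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same technique as the jQuery answer, on a string: highlight `keyword`
// everywhere except inside tag markup such as attribute values.
function highlightOutsideTags(html, keyword) {
  const re = new RegExp(escapeRegExp(keyword) + "(?![^<>]*>)", "g");
  return html.replace(re, (match) => "<span class='highlight'>" + match + "</span>");
}

console.log(highlightOutsideTags('<a title="dog">dog</a>', "dog"));
// <a title="dog"><span class='highlight'>dog</span></a>
```

Note that a phrase interrupted by a tag, such as "The dog" in `The <strong>dog</strong>`, is not found at all by this approach.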
Answer by Roy van Arem
Here is a working regex example that excludes matches inside HTML tags as well as inside <script> blocks:
Use this regex in a replace() call; (a) is the text being searched for:
/(a)(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)/gi
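A sketch that builds the same pattern for an arbitrary search term in place of `(a)`; `escapeRegExp` and `highlightOutsideTagsAndScripts` are hypothetical helper names. It keeps the `|$` alternative from the answer unchanged, so a match at the very end of the string is also skipped, and the script check only works when the rest of the script after the match contains no `<`.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a RegExp
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// The answer's regex with the literal "a" replaced by an escaped term.
function highlightOutsideTagsAndScripts(html, term) {
  const re = new RegExp(
    "(" + escapeRegExp(term) + ")" +
      "(?!([^<])*?>)" +          // not inside a tag: no ">" before the next "<"
      "(?!<script[^>]*?>)" +     // not immediately followed by an opening <script> tag
      "(?![^<]*?<\\/script>|$)", // not followed by </script> with no "<" in between, nor at end of string
    "gi"
  );
  return html.replace(re, '<span class="highlight">$1</span>');
}

console.log(highlightOutsideTagsAndScripts("<p title='a'>a cat</p><script>var a=1;</script>", "a"));
// <p title='a'><span class="highlight">a</span> c<span class="highlight">a</span>t</p><script>var a=1;</script>
```

The end-of-string quirk is easy to see with `"banana"`: the first two `a`s are highlighted but the last one is not.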
Answer by bluesman
You can use string replace with this expression, </?\w*>, and you'll get your string.
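A quick check of that expression on the question's example. `replace()` needs the `g` flag to remove every tag (without it only the first tag is removed), and because the pattern is `\w*` followed directly by `>`, tags with attributes such as `<a href="...">` are not matched.

```javascript
const sample = "The <strong>dog</strong> is really <em>really</em> fat!";

// With the g flag every simple tag is removed
const plain = sample.replace(/<\/?\w*>/g, "");
console.log(plain); // The dog is really really fat!

// Without it, only the first tag is removed
const firstOnly = sample.replace(/<\/?\w*>/, "");
console.log(firstOnly); // The dog</strong> is really <em>really</em> fat!
```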
Answer by Jacob
If you use jQuery, you can call the text() method on the element containing the text you're searching for. Given this markup:
<p id="the-text">
The <strong>dog</strong> is really <em>really</em> fat!
</p>
This would yield "The dog is really really fat!" (surrounded by the whitespace and line breaks from the markup):
$('#the-text').text();
You could do your regex search on that text instead of trying to do so in the markup.
Without jQuery, I'm unsure of an easy way to extract and concatenate the text nodes from all child elements.
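In browsers, the standard DOM property `textContent` returns the concatenated text of all descendant text nodes, e.g. `document.getElementById("the-text").textContent`. To show what that involves, here is a sketch of the recursion. `collectText` is a hypothetical helper that reads only the standard `nodeType`, `childNodes` and `nodeValue` properties, so it accepts a real element, and it can also be tried outside a browser on plain objects shaped like nodes.

```javascript
// Same value as the standard Node.TEXT_NODE constant
const TEXT_NODE = 3;

// Hypothetical helper: depth-first concatenation of every descendant text
// node's value. Other node types contribute only their children's text,
// which for comment nodes is nothing.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue;
  let text = "";
  const children = node.childNodes || [];
  for (let i = 0; i < children.length; i++) {
    text += collectText(children[i]);
  }
  return text;
}

// In a browser: collectText(document.getElementById("the-text"))
```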