Javascript Regex to replace text NOT in html attributes

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5904914/

Date: 2020-10-25 18:45:22  Source: igfitidea

Tags: javascript, regex

Asked by m14t

I'd like a Javascript Regex to wrap a given list of words in a given start tag (i.e. <span>) and end tag (i.e. </span>), but only if the word is actually "visible text" on the page, and not inside of an html attribute (such as a link's title attribute) or inside of a <script></script> block.

I've created a JS Fiddle with the basic setup: http://jsfiddle.net/4YCR6/1/

Answered by T.J. Crowder

HTML is too complex to reliably parse with a regular expression.
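
As a quick illustration of the problem (a minimal sketch; the sample markup is made up for this example), a plain string replace cannot tell an attribute value from visible text:

```javascript
// Hypothetical sample markup: the word appears both in an attribute
// and in the visible link text.
var sampleHtml = "<a href='#' title='test'>test</a>";

// Naive approach: wrap every occurrence of the word, wherever it is.
var naive = sampleHtml.replace(/test/g, "<span>$&</span>");

console.log(naive);
// → <a href='#' title='<span>test</span>'><span>test</span></a>
```

The title attribute now contains markup, which is exactly the kind of damage the question wants to avoid.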

If you're looking to do this client-side, you can create a document fragment and/or disconnected DOM node (neither of which is displayed anywhere) and initialize it with your HTML string, then walk through the resulting DOM tree and process the text nodes. (Or use a library to help you do that, although it's actually quite simple.)

Here's a DOM walking example. This example is slightly simpler than your problem because it just updates the text; it doesn't add new elements to the structure (wrapping parts of the text in spans involves updating the structure), but it should get you going. Notes on what you'll need to change are at the end.

var html =
    "<p>This is a test.</p>" +
    "<form><input type='text' value='test value'></form>" +
    "<p class='testing test'>Testing here too</p>";
var frag = document.createDocumentFragment();
var body = document.createElement('body');
var node, next;

// Turn the HTML string into a DOM tree
body.innerHTML = html;

// Walk the dom looking for the given text in text nodes
walk(body);

// Insert the result into the current document via a fragment
node = body.firstChild;
while (node) {
  next = node.nextSibling;
  frag.appendChild(node);
  node = next;
}
document.body.appendChild(frag);

// Our walker function
function walk(node) {
  var child, next;

  switch (node.nodeType) {
    case 1:  // Element
    case 9:  // Document
    case 11: // Document fragment
      child = node.firstChild;
      while (child) {
        next = child.nextSibling;
        walk(child);
        child = next;
      }
      break;
    case 3: // Text node
      handleText(node);
      break;
  }
}

function handleText(textNode) {
  textNode.nodeValue = textNode.nodeValue.replace(/test/gi, "TEST");
}

The changes you'll need to make will be in handleText. Specifically, rather than updating nodeValue, you'll need to:

  • Find the index of the beginning of each word within the nodeValue string.
  • Use Text#splitText to split the text node into up to three text nodes (the part before your matching text, the part that is your matching text, and the part following your matching text).
  • Use document.createElement to create the new span (this is literally just span = document.createElement('span')).
  • Use Node#insertBefore to insert the new span in front of the third text node (the one containing the text following your matched text). If you didn't need to create a third node because your matched text was at the end of the text node, pass the matched node's nextSibling as the refChild instead; that is null when the text node was the last child, which appends the span at the end.
  • Use Node#appendChild to move the second text node (the one with the matching text) into the span. (No need to remove it from its parent first; appendChild does that for you.)
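
The steps above can be sketched roughly as follows. This is one possible reading of them, not the answer's own code: the function name `wrapMatchesInSpans` and its `regex` and `doc` parameters are assumptions made for this illustration (`doc` is passed in so the function does not depend on a global `document`).

```javascript
// Sketch only: wraps every match of `regex` inside `textNode` in a <span>.
// `wrapMatchesInSpans`, `regex` and `doc` are illustrative names.
function wrapMatchesInSpans(textNode, regex, doc) {
  var node = textNode;
  var result, match, after, span;

  while (node) {
    regex.lastIndex = 0; // always search from the start of this node
    result = regex.exec(node.nodeValue);
    if (!result || result[0].length === 0) {
      break; // no match, or an empty match that would loop forever
    }

    // Split off the matching part; `node` keeps the text before it
    match = node.splitText(result.index);

    // Split off whatever follows the match, if anything
    after = match.nodeValue.length > result[0].length
      ? match.splitText(result[0].length)
      : null;

    // Create the span and put it where the match is; nextSibling is the
    // "after" node, the next original sibling, or null at the end
    span = doc.createElement('span');
    match.parentNode.insertBefore(span, match.nextSibling);

    // Moving the match into the span also removes it from its old parent
    span.appendChild(match);

    // Keep looking in the remaining text
    node = after;
  }
}
```

In the walker above, the call in the text node case would then become something like wrapMatchesInSpans(node, /test/i, document). Because walk saves next before handling each child, the newly created spans are not visited again.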

Answered by Tim Down

T.J. Crowder's answer is correct. I've gone a little further code-wise: here's a fully-formed example that works in all major browsers. I've posted variations of this code on Stack Overflow before (here and here, for example), and made it nice and generic so I (or anyone else) don't have to change it much to reuse it.

jsFiddle example: http://jsfiddle.net/7Vf5J/38/

Code:

// Reusable generic function
function surroundInElement(el, regex, surrounderCreateFunc) {
    // script and style elements are left alone
    // (tagName is uppercase in HTML documents, hence the "i" flag)
    if (!/^(script|style)$/i.test(el.tagName)) {
        var child = el.lastChild;
        while (child) {
            if (child.nodeType == 1) {
                surroundInElement(child, regex, surrounderCreateFunc);
            } else if (child.nodeType == 3) {
                surroundMatchingText(child, regex, surrounderCreateFunc);
            }
            child = child.previousSibling;
        }
    }
}

// Reusable generic function
function surroundMatchingText(textNode, regex, surrounderCreateFunc) {
    var parent = textNode.parentNode;
    var result, surroundingNode, matchedTextNode, matchLength, matchedText;
    while ( textNode && (result = regex.exec(textNode.data)) ) {
        matchedTextNode = textNode.splitText(result.index);
        matchedText = result[0];
        matchLength = matchedText.length;
        textNode = (matchedTextNode.length > matchLength) ?
            matchedTextNode.splitText(matchLength) : null;
        // Ensure searching starts at the beginning of the text node
        regex.lastIndex = 0;
        surroundingNode = surrounderCreateFunc(matchedTextNode.cloneNode(true));
        parent.insertBefore(surroundingNode, matchedTextNode);
        parent.removeChild(matchedTextNode);
    }
}

// This function does the surrounding for every matched piece of text
// and can be customized  to do what you like
function createSpan(matchedTextNode) {
    var el = document.createElement("span");
    el.style.color = "red";
    el.appendChild(matchedTextNode);
    return el;
}

// The main function
function wrapWords(container, words) {
    // Replace the words one at a time to ensure "test2" gets matched
    for (var i = 0, len = words.length; i < len; ++i) {
        surroundInElement(container, new RegExp(words[i]), createSpan);
    }
}

wrapWords(document.getElementById("container"), ["test2", "test"]);
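
One caveat with wrapWords as written: each word is passed to new RegExp unescaped, and every pass also walks the spans created by the previous pass, so "test" gets wrapped a second time inside an already wrapped "test2". A possible variation, sketched here under the assumption that the words are plain strings, builds a single escaped alternation so the container only needs to be walked once. `escapeRegExp` and `buildWordRegex` are hypothetical helper names, not part of the answer above.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one regex matching any of the words. Longer words come first so
// that "test2" is preferred over "test" at the same position.
function buildWordRegex(words) {
  var sorted = words
    .filter(function (w) { return w.length > 0; })
    .sort(function (a, b) { return b.length - a.length; });
  if (sorted.length === 0) {
    // An empty pattern would match everywhere with zero length
    throw new Error('buildWordRegex needs at least one non-empty word');
  }
  return new RegExp(sorted.map(escapeRegExp).join('|'));
}

// Intended use with the functions above (browser only):
// surroundInElement(container, buildWordRegex(["test2", "test"]), createSpan);
```

With a single pass, surroundInElement never revisits the spans it has just inserted, so each occurrence ends up in exactly one span.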