Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/34491459/

Time: 2020-08-23 16:25:21  Source: igfitidea

Split a string of HTML into an array by particular tags

Tags: javascript, regex

Asked by Don P

Given this HTML as a string "html", how can I split it into an array where each header tag <h marks the start of an element?

Begin with this:

<h1>A</h1>
<h2>B</h2>
<p>Foobar</p>
<h3>C</h3>

Result:

["<h1>A</h1>", "<h2>B</h2><p>Foobar</p>", "<h3>C</h3>"]

What I've tried:

I wanted to use String.split() with a regex, but the result splits each <h into its own element. I need to figure out how to capture from the start of one <h until the next <h, including the first one but excluding the second one.

var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
var foo = html.split(/(<h)/);

Edit: Regex is not a requirement in any way; it's just the only solution that I thought would work for generally splitting HTML strings in this way.

Answered by andlrc

In your example you can use:

/
  <h   // Match literal <h
  (.)  // Match any character and save it in a group
  >    // Match literal >
  .*?  // Match any character zero or more times, non-greedy
  <\/h // Match literal </h
  \1   // Match what was previously captured by (.)
  >    // Match literal >
/g
var str = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>'
str.match(/<h(.)>.*?<\/h\1>/g); // ["<h1>A</h1>", "<h2>B</h2>", "<h3>C</h3>"]
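
Note that match() returns only the heading elements, so <p>Foobar</p> is dropped. As a rough sketch of how to get the grouping the question asks for, one option is to split on a zero-width lookahead, so that the string is cut in front of each heading without consuming it. This is not part of the original answer; the function name splitAtHeadings is made up for illustration, and it assumes headings are written as <h1> to <h6> and that the markup is simple enough for a regex.

```javascript
// Hypothetical helper (not from the original answer): cuts the string in
// front of every <h1>..<h6> opening tag and keeps the tag in the next piece.
function splitAtHeadings(html) {
    // (?=...) is a lookahead: it matches a position, not characters,
    // so split() removes nothing from the string.
    // \b stops <h1 from matching longer names such as <h10.
    return html.split(/(?=<h[1-6]\b)/i);
}

var str = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
console.log(splitAtHeadings(str));
// ["<h1>A</h1>", "<h2>B</h2><p>Foobar</p>", "<h3>C</h3>"]
```

Any content before the first heading ends up as its own first element, and the usual caveat about parsing HTML with regexp still applies.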

But please don't parse HTML with regexp; read the StackOverflow question "RegEx match open tags except XHTML self-contained tags".

Answered by Tomalak

From the comments to the question, this seems to be the task:

I'm taking dynamic markdown that I'm scraping from GitHub. Then I want to render it to HTML, but wrap every title element in a ReactJS <WayPoint> component.

The following is a completely library-agnostic, DOM-API based solution.

function waypointify(html) {
    var div = document.createElement("div"), nodes;

    // parse HTML and convert into an array (instead of NodeList)
    div.innerHTML = html;
    nodes = [].slice.call(div.childNodes);

    // add <waypoint> elements and distribute nodes by headings
    div.innerHTML = "";
    nodes.forEach(function (node) {
        if (!div.lastChild || /^h[1-6]$/i.test(node.nodeName)) {
            div.appendChild( document.createElement("waypoint") );
        }
        div.lastChild.appendChild(node);
    });

    return div.innerHTML;
}

Doing the same with a modern library in fewer lines of code is absolutely possible; see it as a challenge.

This is what it produces with your sample input:

<waypoint><h1>A</h1></waypoint>
<waypoint><h2>B</h2><p>Foobar</p></waypoint>
<waypoint><h3>C</h3></waypoint>
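
If an array of strings is wanted, as in the original question, rather than wrapped markup, the same grouping rule can be applied to a list of elements. The sketch below is an assumption-laden variant, not Tomalak's code: groupByHeading is a hypothetical name, and it only relies on each item having nodeName and outerHTML properties, so plain objects stand in for DOM elements here.

```javascript
// Hypothetical helper: groups element-like objects into HTML strings,
// starting a new group at every h1..h6 (and at the very first node).
function groupByHeading(nodes) {
    var groups = [];
    nodes.forEach(function (node) {
        if (groups.length === 0 || /^h[1-6]$/i.test(node.nodeName)) {
            groups.push("");
        }
        // append the node's markup to the group that is currently open
        groups[groups.length - 1] += node.outerHTML;
    });
    return groups;
}

// Plain objects standing in for DOM elements (assumption for this sketch)
var sample = [
    { nodeName: "H1", outerHTML: "<h1>A</h1>" },
    { nodeName: "H2", outerHTML: "<h2>B</h2>" },
    { nodeName: "P",  outerHTML: "<p>Foobar</p>" },
    { nodeName: "H3", outerHTML: "<h3>C</h3>" }
];
console.log(groupByHeading(sample));
// ["<h1>A</h1>", "<h2>B</h2><p>Foobar</p>", "<h3>C</h3>"]
```

In a browser, the input could be [].slice.call(div.children) after assigning div.innerHTML, since elements expose both properties; text nodes do not have outerHTML and would need separate handling.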

Answered by Donnie D'Amato

I'm sure someone could reduce the for loop to put the angle brackets back in but this is how I'd do it.

var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';

//split on ><
var arr = html.split(/></g);

//split removes the >< so we need to determine where to put them back in.
for(var i = 0; i < arr.length; i++){
  if(arr[i].substring(0, 1) != '<'){
    arr[i] = '<' + arr[i];
  }

  if(arr[i].slice(-1) != '>'){
    arr[i] = arr[i] + '>';
  }
}

Alternatively, we could remove the first and last angle brackets, do the split, and then add the angle brackets back to every piece.

var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';

//remove first and last characters
html = html.substring(1, html.length-1);

//do the split on ><
var arr = html.split(/></g);

//add the brackets back in
for(var i = 0; i < arr.length; i++){
    arr[i] = '<' + arr[i] + '>';
}

Oh, of course this will fail with elements that have no content.
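
As a possible way to avoid the loop that restores the brackets, the split can be made zero-width with a lookbehind and a lookahead, so nothing is removed in the first place. This is only a sketch: splitBetweenTags is a made-up name, and it assumes a JavaScript engine that supports ES2018 lookbehind assertions.

```javascript
// Hypothetical helper: splits between a ">" and the "<" that directly follows it.
function splitBetweenTags(html) {
    // (?<=>) requires ">" just before the split point,
    // (?=<)  requires "<" just after it.
    // Both are zero-width, so no characters are consumed by split().
    return html.split(/(?<=>)(?=<)/);
}

var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
console.log(splitBetweenTags(html));
// ["<h1>A</h1>", "<h2>B</h2>", "<p>Foobar</p>", "<h3>C</h3>"]
```

It has the same limitation: an empty element such as <p></p> is cut into "<p>" and "</p>".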

Answered by HalleyRios

Hi, I used this function to convert an HTML string into an array:

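  // Splits an HTML string after every ">" so that each array item ends with a tag.
  // Written as a plain function; prefix it with "static" to use it inside a class.
  function getArrayTagsHtmlString(str){
    let htmlSplit = str.split(">")
    let arrayElements = []
    htmlSplit.forEach((element)=>{
      if (element === "") return // skip the empty piece left after the final ">"
      // split() removed the ">", so put it back on pieces that contain a tag
      let nodeElement = element.includes("<") ? element+">" : element
      arrayElements.push(nodeElement)
    })
    return arrayElements
  }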

Happy coding