Java: Removing DOM nodes while traversing a NodeList

Question

Asked by skiphoppy

I'm about to delete certain elements in an XML document, using code like the following:

NodeList nodes = ...;
for (int i = 0; i < nodes.getLength(); i++) {
  Element e = (Element)nodes.item(i);
  if (certain criteria involving Element e) {
    e.getParentNode().removeChild(e);
  }
}

Will this interfere with proper traversal of the NodeList? Any other caveats with this approach? If this is totally wrong, what's the proper way to do it?

Answer 1

Answered by Dirk

According to the DOM specification, the result of a call to node.getElementsByTagName("...") is supposed to be "live"; that is, any modification made to the DOM tree will be reflected in the NodeList object. Well, for conforming implementations, that is...

NodeList and NamedNodeMap objects in the DOM are live; that is, changes to the underlying document structure are reflected in all relevant NodeList and NamedNodeMap objects.

(DOM Specification)

So, when you modify the tree structure, a conforming implementation will change the NodeList to reflect these changes.
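
The "live" behaviour described above can be observed with a small experiment. The following is only a sketch: the class name LiveNodeListDemo, the helper methods, and the sample XML are made up for illustration, and it assumes the DOM implementation bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class LiveNodeListDemo {

    // Parses a small, made-up document used only for this illustration.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample XML", ex);
        }
    }

    // Returns the length of the same NodeList before and after one removal.
    static int[] lengthsAroundRemoval() {
        Document doc = parse("<root><item/><item/><item/></root>");
        NodeList items = doc.getElementsByTagName("item");
        int before = items.getLength();

        Node first = items.item(0);
        first.getParentNode().removeChild(first);

        // The list is not fetched again; a live list reflects the removal.
        int after = items.getLength();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] lengths = lengthsAroundRemoval();
        System.out.println(lengths[0] + " -> " + lengths[1]);
    }
}
```

With a conforming implementation the second length should be one less than the first, even though the NodeList was obtained only once.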

Answer 2

Answered by skiphoppy

So, given that removing nodes while traversing the NodeList will cause the NodeList to be updated to reflect the new reality, I assume that my indices will become invalid and this will not work.

So, it seems the solution is to keep track of the elements to delete during the traversal, and delete them all afterward, once the NodeList is no longer used.

NodeList nodes = ...;
Set<Element> targetElements = new HashSet<Element>();
for (int i = 0; i < nodes.getLength(); i++) {
  Element e = (Element)nodes.item(i);
  if (certain criteria involving Element e) {
    targetElements.add(e);
  }
}
for (Element e: targetElements) {
  e.getParentNode().removeChild(e);
}
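
For reference, here is one way the collect-then-remove idea above might look as a complete program. It is a sketch under stated assumptions: the sample XML, the obsolete attribute, and the shouldRemove helper are hypothetical stand-ins for the "certain criteria", and an ArrayList is swapped in for the HashSet so that removals happen in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CollectThenRemoveDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample XML", ex);
        }
    }

    // Hypothetical criterion: the element carries obsolete="true".
    static boolean shouldRemove(Element e) {
        return "true".equals(e.getAttribute("obsolete"));
    }

    // Pass 1 only reads the live list; pass 2 mutates the tree afterwards.
    static int removeMatching(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<Element> targets = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (shouldRemove(e)) {
                targets.add(e);
            }
        }
        for (Element e : targets) {
            e.getParentNode().removeChild(e);
        }
        return targets.size();
    }

    // Runs the removal on a made-up document and counts what is left.
    static int remainingAfterRemoval() {
        Document doc = parse(
                "<root><item obsolete='true'/><item obsolete='true'/>"
                + "<item/><item obsolete='true'/></root>");
        removeMatching(doc, "item");
        return doc.getElementsByTagName("item").getLength();
    }

    public static void main(String[] args) {
        System.out.println("remaining items: " + remainingAfterRemoval());
    }
}
```

Because the tree is untouched during the first loop, the indices stay valid even when adjacent elements match.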

Answer 3

Answered by kdgregory

The Practical XML library now contains NodeListIterator, which wraps a NodeList and provides full Iterator support (this seemed like a better choice than posting the code that we discussed in the comments). If you don't want to use the full library, feel free to copy that one class: http://practicalxml.svn.sourceforge.net/viewvc/practicalxml/trunk/src/main/java/net/sf/practicalxml/util/NodeListIterator.java?revision=125&view=markup

Answer 4

Answered by Algok

Removing nodes while looping will cause undesirable results, e.g. missed or duplicated elements. This is not a synchronization or thread-safety issue; it happens because the nodes are modified by the loop itself. Most of Java's Iterators will throw a ConcurrentModificationException in such a case, something that NodeList does not account for.

It can be fixed by decrementing the loop index as the NodeList shrinks, for example by iterating from the end of the list toward the start. This solution works only if each loop iteration performs at most one remove action.

NodeList nodes = ...;
for (int i = nodes.getLength() - 1; i >= 0; i--) {
  Element e = (Element)nodes.item(i);
  if (certain criteria involving Element e) {
    e.getParentNode().removeChild(e);
  }
}
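
A runnable version of the backward loop might look like the sketch below. The sample XML, the obsolete attribute, and the shouldRemove helper are assumptions made for illustration; they stand in for whatever criteria the real code applies.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BackwardRemovalDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample XML", ex);
        }
    }

    // Hypothetical criterion: the element carries obsolete="true".
    static boolean shouldRemove(Element e) {
        return "true".equals(e.getAttribute("obsolete"));
    }

    // Walks the live list from the last index down to 0, removing matches.
    static int remainingAfterBackwardLoop() {
        Document doc = parse(
                "<root><item obsolete='true'/><item obsolete='true'/>"
                + "<item/><item obsolete='true'/></root>");
        NodeList nodes = doc.getElementsByTagName("item");
        for (int i = nodes.getLength() - 1; i >= 0; i--) {
            Element e = (Element) nodes.item(i);
            if (shouldRemove(e)) {
                // Only entries at index >= i disappear, and those were
                // already visited, so the lower indices remain valid.
                e.getParentNode().removeChild(e);
            }
        }
        return nodes.getLength();
    }

    public static void main(String[] args) {
        System.out.println("remaining items: " + remainingAfterBackwardLoop());
    }
}
```

The sample deliberately places two matching elements next to each other, which is the case a forward loop with i++ tends to get wrong.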

Answer 5

Answered by Brett Caswell

According to the DOM Level 3 Core specification, the result of a call to the method node.getElementsByTagName("...") will be a reference to a "live" NodeList type.

NodeList and NamedNodeMap objects in the DOM are live; that is, changes to the underlying document structure are reflected in all relevant NodeList and NamedNodeMap objects. ... changes are automatically reflected in the NodeList, without further action on the user's part.
1.1.1 The DOM Structure Model, para. 2

Java SE 7 conforms to the DOM Level 3 specification: it implements the live NodeList interface and defines it as a type; it defines and exposes the getElementsByTagName method on the Element interface, which returns the live NodeList type.

References

W3C - Document Object Model (DOM) Level 3 Core Specification - getElementsByTagName

JavaSE 7 - Interface Element

JavaSE 7 - NodeList Type

Answer 6

Answered by Simon

Old post, but nothing is marked as the answer. My approach is to iterate from the end, i.e.:

for (int i = nodes.getLength() - 1; i >= 0; i--) {
    Node e = nodes.item(i);
    // do processing, and then
    e.getParentNode().removeChild(e);
}

With this, you needn't worry about the NodeList getting shorter while you delete.

Answer 7

Answered by Mikhaylo Plotnikov

As already mentioned, removing an element reduces the size of the list but the counter is still increasing (i++):

[element 1] <- Delete 
[element 2]
[element 3]
[element 4]
[element 5]

[element 2]  
[element 3] <- Delete
[element 4]
[element 5]
--

[element 2]  
[element 4] 
[element 5] <- Delete
--
--

[element 2]  
[element 4] 
--
--
--

The simplest solution, in my opinion, would be to remove the i++ from the loop header and increment i only when the current element was not deleted.

NodeList nodes = ...;
for (int i = 0; i < nodes.getLength();) {
  Element e = (Element)nodes.item(i);
  if (certain criteria involving Element e) {
    e.getParentNode().removeChild(e);        
  } else {
    i++;
  }
}

The pointer stays in the same place when the iterated element is deleted. The list shifts by itself.

[element 1] <- Delete 
[element 2]
[element 3]
[element 4]
[element 5]

[element 2] <- Leave
[element 3]
[element 4]
[element 5]
--

[element 2] 
[element 3] <- Leave
[element 4]
[element 5]
--

[element 2] 
[element 3] 
[element 4] <- Delete
[element 5]
--

[element 2] 
[element 3] 
[element 5] <- Delete
--
--

[element 2] 
[element 3] 
--
--
--
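
The two traces above can be reproduced side by side with a sketch like the following. Everything named here (the class, the SAMPLE document, the obsolete attribute, the shouldRemove helper) is made up for illustration and assumes the DOM implementation bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ConditionalIncrementDemo {

    // Made-up sample: a, b and d match the hypothetical criterion; c does not.
    static final String SAMPLE =
            "<root><item id='a' obsolete='true'/><item id='b' obsolete='true'/>"
            + "<item id='c'/><item id='d' obsolete='true'/></root>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample XML", ex);
        }
    }

    // Hypothetical criterion: the element carries obsolete="true".
    static boolean shouldRemove(Element e) {
        return "true".equals(e.getAttribute("obsolete"));
    }

    // Loop from the question: i advances even after a removal.
    static int remainingAfterNaiveLoop() {
        NodeList nodes = parse(SAMPLE).getElementsByTagName("item");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (shouldRemove(e)) {
                e.getParentNode().removeChild(e);
            }
        }
        return nodes.getLength();
    }

    // Loop from this answer: i advances only when nothing was removed.
    static int remainingAfterConditionalIncrement() {
        NodeList nodes = parse(SAMPLE).getElementsByTagName("item");
        for (int i = 0; i < nodes.getLength();) {
            Element e = (Element) nodes.item(i);
            if (shouldRemove(e)) {
                e.getParentNode().removeChild(e); // list shifts; keep i
            } else {
                i++;
            }
        }
        return nodes.getLength();
    }

    public static void main(String[] args) {
        System.out.println("naive loop leaves: " + remainingAfterNaiveLoop());
        System.out.println("conditional increment leaves: "
                + remainingAfterConditionalIncrement());
    }
}
```

In this sample the naive loop skips element b, because b slides into index 0 right after a is removed and the counter has already moved on to index 1.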
