Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/3423430/

Java XML DOM: how are id Attributes special?

Tags: java, dom

Asked by bmargulies

The javadoc for the Document class has the following note under getElementById:

Note: Attributes with the name "ID" or "id" are not of type ID unless so defined

So, I read an XHTML doc into the DOM (using Xerces 2.9.1).

The doc has a plain old <p id='fribble'> in it.

I call getElementById("fribble"), and it returns null.

I use XPath to get "//*[@id='fribble']", and all is well.

So, the question is: what causes the DocumentBuilder to actually mark ID attributes as "so defined"?

Accepted answer by Tom Tresansky

For the getElementById() call to work, the Document has to know the types of its nodes, and the target node must have an attribute of the XML ID type for the method to find it. The Document learns about these types via an associated schema. If the schema is not set, or does not declare the id attribute to be of the XML ID type, getElementById() will never find it.

My guess is that your document doesn't know the p element's id attribute is of the XML ID type (is it?). You can navigate to the node in the DOM using getChildNodes() and other DOM-traversal functions, and try calling Attr.isId() on the id attribute to tell for sure.
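
A minimal sketch of that check, assuming a default JDK DocumentBuilder with no DTD or schema involved. The class name IsIdCheck, the parse helper and the sample markup are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class IsIdCheck {
    // Hypothetical helper: parses a string with a default, non-validating builder.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<body><p id='fribble'>text</p></body>");

        // Walk to the <p> element by plain DOM traversal.
        Element p = (Element) doc.getDocumentElement().getFirstChild();
        Attr idAttr = p.getAttributeNode("id");

        // Nothing has declared the attribute to be of type ID,
        // so isId() reports false and the lookup finds nothing.
        System.out.println("isId: " + idAttr.isId());
        System.out.println("getElementById: " + doc.getElementById("fribble"));
    }
}
```

If isId() reports false here, the null from getElementById() is expected behaviour rather than a parser bug.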

From the getElementById javadoc:

The DOM implementation is expected to use the attribute Attr.isId to determine if an attribute is of type ID.

Note: Attributes with the name "ID" or "id" are not of type ID unless so defined.

If you are using a DocumentBuilder to parse your XML into a DOM, be sure to call setSchema(schema) on the DocumentBuilderFactory before calling newDocumentBuilder(), to ensure that the builder you get from the factory is aware of element types.
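
A rough sketch of that setup, not a definitive recipe. The inline schema, the class name SchemaIdLookup and the lookup helper are invented for this example, and the call to setNamespaceAware(true) is an assumption: the parser bundled with the JDK appears to need it before schema type information reaches the DOM.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SchemaIdLookup {
    // Illustrative schema: element foo has an attribute bar of type xs:ID.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='foo'><xs:complexType>"
        + "<xs:attribute name='bar' type='xs:ID'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    // Hypothetical helper: parses xml against XSD and looks up an element by ID.
    static Element lookup(String xml, String id) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // assumption: needed for schema types to be applied
        factory.setSchema(schema);       // must happen before newDocumentBuilder()

        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementById(id);
    }

    public static void main(String[] args) throws Exception {
        Element found = lookup("<foo bar='xyz'/>", "xyz");
        System.out.println(found == null ? "not found" : "found <" + found.getTagName() + ">");
    }
}
```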

Answer by Sergii Pozharov

An ID attribute isn't an attribute whose name is "ID"; it's an attribute that is declared to be an ID attribute by a DTD or a schema. For example, the HTML 4 DTD declares it:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

Answer by Jörn Horstmann

The corresponding XPath expression would actually be id('fribble'), which should return the same result as getElementById. For this to work, the DTD or schema associated with your document has to declare the attribute as being of type ID.

If you are in control of the queried XML, you could also try renaming the attribute to xml:id as per http://www.w3.org/TR/xml-id/.

Answer by Hoylen

These attributes are special because of their type and not because of their name.

IDs in XML

Although it is easy to think of attributes as name="value" with the value being a simple string, that is not the full story: there is also an attribute type associated with attributes.

This is easy to appreciate when there is an XML Schema involved, since XML Schema supports datatypes for both XML elements and XML attributes. The XML attributes are defined to be of a simple type (e.g. xs:string, xs:integer, xs:dateTime, xs:anyURI). The attributes being discussed here are defined with the xs:ID built-in datatype (see section 3.3.8 of XML Schema Part 2: Datatypes).

<xs:element name="foo">
  <xs:complexType>
   ...
   <xs:attribute name="bar" type="xs:ID"/>
   ...
  </xs:complexType>
</xs:element>

Although DTDs don't support the rich datatypes of XML Schema, they do support a limited set of attribute types (defined in section 3.3.1 of XML 1.0). The attributes being discussed here are defined with an attribute type of ID.

<!ATTLIST foo  bar ID #IMPLIED>

With either the above XML Schema or DTD, the following element will be identified by the ID value of "xyz".

<foo bar="xyz"/>

Without knowing the XML Schema or DTD, there is no way to tell what is an ID and what is not:

  • Attributes with the name "id" do not necessarily have an attribute type of ID; and
  • Attributes with names that are not "id" might have an attribute type of ID!

To improve this situation, the xml:id attribute was subsequently invented (see the xml:id W3C Recommendation). This is an attribute that always has the same prefix and name, and is intended to be treated as an attribute with an attribute type of ID. However, whether it actually is treated that way depends on whether the parser being used is aware of xml:id. Since many parsers were written before xml:id was defined, it might not be supported.

IDs in Java

In Java, getElementById() finds elements by looking for attributes of type ID, not for attributes with the name "id".

In the above example, getElementById("xyz") will return that foo element, even though the name of the attribute on it is not "id" (assuming the DOM knows that bar has an attribute type of ID).

So how does the DOM know what attribute type an attribute has? There are three ways:

  1. Provide an XML Schema to the parser.
  2. Provide a DTD to the parser.
  3. Explicitly indicate to the DOM that the attribute is to be treated as having an attribute type of ID.

The third option is done using the setIdAttribute(), setIdAttributeNS() or setIdAttributeNode() methods on the org.w3c.dom.Element interface.

Document doc;
Element fooElem;

doc = ...; // load XML document instance
fooElem = ...; // locate the element node "foo" in doc

fooElem.setIdAttribute("bar", true); // without this, 'found' would be null

Element found = doc.getElementById("xyz");

This has to be done for each element node that carries one of these attributes. There is no simple built-in method to make all occurrences of attributes with a given name (e.g. "id") be of attribute type ID.
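
Since nothing built in does this, one possible workaround is to walk the tree yourself. The following is only a sketch: the class name MarkIdAttributes and the helpers markAll and parse are hypothetical, and it assumes the values of the named attribute really are unique in the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MarkIdAttributes {
    // Recursively flags every attribute called attrName as an ID attribute.
    static void markAll(Node node, String attrName) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            // setIdAttribute throws if the attribute is absent, so check first.
            if (el.hasAttribute(attrName)) {
                el.setIdAttribute(attrName, true);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            markAll(child, attrName);
        }
    }

    // Hypothetical helper: parses a string with a default builder (no DTD, no schema).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<body><div><p id='fribble'>text</p></div></body>");
        System.out.println("before: " + doc.getElementById("fribble"));
        markAll(doc, "id");
        System.out.println("after:  " + doc.getElementById("fribble"));
    }
}
```

The walk visits every node once, so on large documents it may be cheaper to supply a DTD or schema instead.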

This third approach is only useful in situations where the code calling getElementById() is separate from the code creating the DOM. If it were the same code, it would already have found the element in order to set the ID attribute, so it would be unlikely to need to call getElementById().

Also, be aware that those methods were not in the original DOM specification. The getElementById method was introduced in DOM Level 2.

IDs in XPath

The XPath in the original question gave a result because it was only matching the attribute name.

To match on attribute type ID values, the XPath id function needs to be used (it is one of the Node Set Functions from XPath 1.0):

id("xyz")

If that had been used, the XPath would have given the same result as getElementById() (i.e. no match found).
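
To see the difference side by side, here is a small sketch using the JDK's javax.xml.xpath API on a document parsed without any DTD or schema. The class name XPathIdDemo, the count and parse helpers and the sample markup are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathIdDemo {
    // Hypothetical helper: parses a string with a default builder (no DTD, no schema).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates expr against doc and returns how many nodes it selects.
    static int count(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<body><p id='fribble'>text</p></body>");

        // Matches on the attribute *name*, regardless of its type.
        System.out.println("by name: " + count(doc, "//*[@id='fribble']"));

        // Matches on the attribute *type*; nothing is typed as ID in this document.
        System.out.println("by type: " + count(doc, "id('fribble')"));
    }
}
```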

IDs in XML continued

Two important features of ID should be highlighted.

Firstly, the values of all attributes of attribute type ID must be unique within the whole XML document. In the following example, if personId and companyId both have an attribute type of ID, it would be an error to add another company with a companyId of id24601, because it would duplicate an existing ID value. Even though the attribute names are different, it is the attribute type that matters.

<test1>
 <person personId="id24600">...</person>
 <person personId="id24601">...</person>
 <company companyId="id12345">...</company>
 <company companyId="id12346">...</company>
</test1>
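
Uniqueness of ID values is a validity constraint, so it is a validating parser that reports it. The sketch below assumes a DTD that declares both attributes as ID. The DTD text, the class name IdUniquenessCheck and the validationErrors helper are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class IdUniquenessCheck {
    // Illustrative DTD: personId and companyId are both of attribute type ID.
    static final String DTD =
          "<!DOCTYPE test1 ["
        + "<!ELEMENT test1 (person|company)*>"
        + "<!ELEMENT person (#PCDATA)>"
        + "<!ELEMENT company (#PCDATA)>"
        + "<!ATTLIST person personId ID #REQUIRED>"
        + "<!ATTLIST company companyId ID #REQUIRED>"
        + "]>";

    // Parses DTD + body with DTD validation on and collects validation error messages.
    static List<String> validationErrors(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(DTD + body)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String duplicate = "<test1><person personId='id24601'>a</person>"
                         + "<company companyId='id24601'>b</company></test1>";
        for (String message : validationErrors(duplicate)) {
            System.out.println(message);
        }
    }
}
```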

Secondly, the attributes are defined on elements rather than on the entire XML document. So attributes with the same attribute name on different elements might have different attribute types. In the following example XML document, if only alpha/@bar has an attribute type of ID (and no other attribute does), getElementById("xyz") will return an element, but getElementById("abc") will not (since beta/@bar is not of attribute type ID). Also, it is not an error for the attribute gamma/@bar to have the same value as alpha/@bar: that value is not considered in the uniqueness of IDs in the XML document because it is not of attribute type ID.

<test2>
  <alpha bar="xyz"/>
  <beta bar="abc"/>
  <gamma bar="xyz"/>
</test2>
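
The scenario above can be tried out with an internal DTD subset that types only alpha/@bar as ID. This is a sketch under that assumption. The DTD text, the class name PerElementIdType and the tagOf helper are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class PerElementIdType {
    // Illustrative document: only the bar attribute on alpha is declared as ID.
    static final String XML =
          "<!DOCTYPE test2 ["
        + "<!ATTLIST alpha bar ID #IMPLIED>"
        + "<!ATTLIST beta bar CDATA #IMPLIED>"
        + "<!ATTLIST gamma bar CDATA #IMPLIED>"
        + "]>"
        + "<test2><alpha bar='xyz'/><beta bar='abc'/><gamma bar='xyz'/></test2>";

    // Returns the tag name of the element found for id, or null if there is none.
    static String tagOf(String id) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        Element found = doc.getElementById(id);
        return found == null ? null : found.getTagName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("xyz -> " + tagOf("xyz"));
        System.out.println("abc -> " + tagOf("abc"));
    }
}
```

The builder here is non-validating; the attribute declarations in the internal subset are still read, which is what gives alpha/@bar its type.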

Answer by Brad Parks

The following will allow you to get an element by id:

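import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Finds the first element with an attribute *named* id (in any letter case).
// This matches on the attribute name, not on the XML ID attribute type.
public static Element getElementById(Element rootElement, String id)
{
    try 
    {
        // The leading "//" searches the whole document that contains rootElement.
        // The id is inserted unescaped, so it must not contain a single quote.
        String path = String.format("//*[@id = '%1$s' or @Id = '%1$s' or @ID = '%1$s' or @iD = '%1$s' ]", id);
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList)xPath.evaluate(path, rootElement, XPathConstants.NODESET);

        // item(0) is null when nothing matched.
        return (Element) nodes.item(0);
    } 
    catch (Exception e) 
    {
        // Any XPath failure is reported as "not found".
        return null;
    }
}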