Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute the content to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/726395/

Time: 2020-08-11 18:45:23  Source: igfitidea

Order of XML attributes after DOM processing

java xml dom

Asked by Fernando Miguélez

When processing XML by means of the standard DOM, attribute order is not guaranteed after you serialize it back. At least that is what I just realized when using the standard Java XML Transform API to serialize the output.

However, I do need to keep an order. I would like to know if there is any possibility in Java to keep the original order of attributes of an XML file processed by means of the DOM API, or any way to force the order (maybe by using an alternative serialization API that lets you set this kind of property). In my case the processing boils down to altering the value of some attributes (not all) of a sequence of identical elements with a bunch of attributes, and maybe inserting a few more elements.

Is there any "easy" way, or do I have to define my own XSLT transformation stylesheet to specify the output and alter the whole input XML file?

Update: I must thank you all for your answers. The answer now seems more obvious than I expected. I never paid any attention to attribute order, since I had never needed it before.

The main reason to require an attribute order is that the resulting XML file just looks different. The target is a configuration file that holds hundreds of alarms (every alarm is defined by a set of attributes). This file usually sees few modifications over time, but it is convenient to keep it ordered, since it is edited by hand whenever we need to modify something. Now and then some projects need light modifications of this file, such as setting one of the attributes to a customer-specific code.

I just developed a little application to merge the original file (common to all projects) with the specific parts of each project (modifying the value of some attributes), so that the project-specific file gets the updates of the base one (new alarm definitions or bugfixes to some attribute values). My main motivation for requiring ordered attributes is to be able to check the output of the application against the original file by means of a text comparison tool (such as WinMerge). If the format (mainly attribute order) remains the same, the differences can be easily spotted.

I really thought this was possible, since XML handling programs, such as XML Spy, let you edit XML files and apply some ordering (grid mode). Maybe my only choice is to use one of these programs to manually modify the output file.

Accepted answer by Alain Pannetier

Sorry to say, but the answer is more subtle than "No you can't" or "Why do you need to do this in the first place?".

The short answer is "DOM will not allow you to do that, but SAX will".

This is because DOM does not care about attribute order, since it's meaningless as far as the standard is concerned, so when the input is a DOM the information is already lost by the time the XSL engine gets hold of it. Fed directly from the input stream, most XSL engines will actually gracefully preserve the input attribute order (e.g. Xalan-C (except in one case) or Xalan-J (always)), especially if you use <xsl:copy*>.

To the best of my knowledge, the cases where the attribute order is not kept are:
- If the input stream is a DOM
- Xalan-C: if you insert your result-tree tags literally (e.g. <elem att1="{@att1}" .../>)

Here is one example with SAX, for the record (inhibiting DTD nagging as well).

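import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

// "input" (a File) and COOKER_XSL (the stylesheet path) are defined elsewhere.
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setValidating(false);
spf.setFeature("http://xml.org/sax/features/validation", false);
// Xerces-specific features: do not load the DTD at all.
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
SAXParser sp = spf.newSAXParser();
// Feed the transformer from SAX events rather than from a DOM tree.
Source src = new SAXSource(sp.getXMLReader(), new InputSource(input.toURI().toString()));
String resultFileName = input.getAbsolutePath().replaceAll("\\.xml$", ".cooked.xml");
Result result = new StreamResult(new File(resultFileName));
TransformerFactory tf = TransformerFactory.newInstance();
Source xsltSource = new StreamSource(new File(COOKER_XSL));
Transformer xsl = tf.newTransformer(xsltSource);
xsl.setParameter("srcDocumentName", input.getName());
xsl.setParameter("srcDocumentPath", input.getAbsolutePath());

xsl.transform(src, result);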
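
For readers who want something they can run without a stylesheet or input file, here is a minimal sketch of the same idea. It is not the code from this answer: the class name `SaxIdentityCopy`, the method `copy`, and the sample `<alarm>` element are made up for illustration, and it uses the stylesheet-free identity transformer from `TransformerFactory.newTransformer()` instead of COOKER_XSL. It assumes the XSLT implementation bundled with the JDK; other implementations may behave differently.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class SaxIdentityCopy {

    /** Copies the XML text through an identity transform fed by a SAX source. */
    static String copy(String xml) {
        try {
            // newTransformer() without a stylesheet yields an identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // The source is a SAXSource, so the input is not parsed into a DOM first.
            identity.transform(new SAXSource(new InputSource(new StringReader(xml))),
                               new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("identity copy failed", e);
        }
    }

    public static void main(String[] args) {
        // The attributes are deliberately not in alphabetical order.
        System.out.println(copy("<alarm zeta=\"1\" alpha=\"2\" mid=\"3\"/>"));
    }
}
```

Swapping the `SAXSource` for a `DOMSource` built from the same text is an easy way to compare the two behaviours on your own setup.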

I'd also like to point out, for the benefit of the many naysayers, that there are cases where attribute order does matter.

Regression testing is an obvious case. Whoever has been called to optimise not-so-well-written XSL knows that you usually want to make sure that the "new" result trees are similar or identical to the "old" ones. And when the result trees are around one million lines, XML diff tools prove too unwieldy... In these cases, preserving attribute order is of great help.

Hope this helps ;-)

Answer by Soviut

You really shouldn't need to keep any sort of order. As far as I know, no schema takes attribute order into account when validating an XML document either. It sounds like whatever is processing XML on the other end isn't using a proper DOM to parse the results.

I suppose one option would be to manually build up the document using string building, but I strongly recommend against that.

Answer by Robert Rossney

Look at section 3.1 of the XML recommendation. It says, "Note that the order of attribute specifications in a start-tag or empty-element tag is not significant."

If a piece of software requires attributes on an XML element to appear in a specific order, that software is not processing XML, it's processing text that looks superficially like XML. It needs to be fixed.

If it can't be fixed, and you have to produce files that conform to its requirements, you can't reliably use standard XML tools to produce those files. For instance, you might try (as you suggest) to use XSLT to produce attributes in a defined order, e.g.:

<test>
   <xsl:attribute name="foo"/>
   <xsl:attribute name="bar"/>
   <xsl:attribute name="baz"/>
</test>

only to find that the XSLT processor emits this:

<test bar="" baz="" foo=""/>

because the DOM that the processor is using orders attributes alphabetically by attribute name. (That's common but not universal behavior among XML DOMs.)
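
To see what a particular DOM implementation does, a round trip through the standard Java APIs is enough. The following is only a sketch under stated assumptions: the class name `DomRoundTrip` and the method `roundTrip` are hypothetical, and the resulting attribute order depends on whichever DOM implementation is on the classpath. With the parser bundled in the JDK the attributes commonly come back in alphabetical order, but nothing in the DOM specification promises that.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class DomRoundTrip {

    /** Parses the XML text into a DOM and serializes it straight back to text. */
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // Attributes are written in whatever order the DOM's NamedNodeMap reports them.
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("DOM round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // Compare the printed attribute order with the foo, bar, baz order of the input.
        System.out.println(roundTrip("<test foo=\"\" bar=\"\" baz=\"\"/>"));
    }
}
```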

But I want to emphasize something. If a piece of software violates the XML recommendation in one respect, it probably violates it in other respects. If it breaks when you feed it attributes in the wrong order, it probably also breaks if you delimit attributes with single quotes, or if the attribute values contain character entities, or any of a dozen other things that the XML recommendation says that an XML document can do that the author of this software probably didn't think about.

Answer by John Saunders

It's not possible to over-emphasize what Robert Rossney just said, but I'll try. ;-)

The benefit of International Standards is that, when everybody follows them, life is good. All our software gets along peacefully.

XML has to be one of the most important standards we have. It's the basis of "old web" stuff like SOAP, and also of "web 2.0" stuff like RSS and Atom. It's because of clear standards that XML is able to interoperate between different platforms.

If we give up on XML, little by little, we'll get into a situation where a producer of XML will not be able to assume that a consumer of XML will be able to consume their content. This would have a disastrous effect on the industry.

We should push back very forcefully on anyone who writes code that does not process XML according to the standard. I understand that, in these economic times, there is a reluctance to offend customers and business partners by saying "no". But in this case, I think it's worth it. We would be in much worse financial shape if we had to hand-craft XML for each business partner.

So, don't "enable" companies who do not understand XML. Send them the standard, with the appropriate lines highlighted. They need to stop thinking that XML is just text with angle brackets in it. It simply does not behave like text with angle brackets in it.

It's not like there's an excuse for this. Even the smallest embedded devices can have full-featured XML parser implementations in them. I have not yet heard a good reason for not being able to parse standard XML, even if one can't afford a fully-featured DOM implementation.

Answer by Dan Breslau

Robert Rossney said it well: if you're relying on the ordering of attributes, you're not really processing XML, but rather, something that looks like XML.

I can think of at least two reasons why you might care about attribute ordering. There may be others, but at least for these two I can suggest alternatives:

  1. You're using multiple instances of attributes with the same name:

    <foo myAttribute="a" myAttribute="b" myAttribute="c"/>
    

    This is just plain invalid XML; a DOM processor will probably drop all but one of these values – if it processes the document at all. Instead of this, you want to use child elements:

    <foo>
        <myChild>a</myChild>
        <myChild>b</myChild>
        <myChild>c</myChild>
    </foo>
    
  2. You're assuming that some sort of distinction applies to the attribute(s) that come first. Make this explicit, either through other attributes or through child elements. For example:

    <foo attr1="a" attr2="b" attr3="c" theMostImportantAttribute="attr1" />
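
On the first point, a standard Java parser can confirm that repeated attribute names are rejected outright. The sketch below is an illustration rather than part of the original answer: the class name `DuplicateAttributeCheck` and the method `isWellFormed` are hypothetical, and the sample documents are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class DuplicateAttributeCheck {

    /** Returns true if the text parses as XML, false if the parser rejects it. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of logging them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Repeating an attribute name in one tag is a fatal parse error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser setup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<foo myAttribute=\"a\" myAttribute=\"b\"/>"));
        System.out.println(isWellFormed("<foo><myChild>a</myChild><myChild>b</myChild></foo>"));
    }
}
```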
    
Answer by Jon Hanna

XML Canonicalisation results in a consistent attribute ordering, primarily to allow one to check a signature over some or all of the XML, though there are other potential uses. This may suit your purposes.

Answer by Haroldo_OK

I think I can find some valid justifications for caring about attribute order:

  • You may be expecting humans to have to manually read, diagnose or edit the XML data at one time or another; readability would be important in that instance, and a consistent and logical ordering of the attributes helps with that;
  • You may have to communicate with some tool or service that (admittedly erroneously) cares about the order; asking the provider to correct its code may not be an option: try to ask that from a government agency while your user's deadline for electronically delivering a bunch of fiscal documents looms closer and closer!

It seems like Alain Pannetier's solution is the way to go.

Also, you may want to take a look at DecentXML; it gives you full control of how the XML is formatted, even though it's not DOM-compatible. It is especially useful if you want to modify some hand-edited XML without losing the formatting.

Answer by Bashir

I had the exact same problem. I wanted to modify XML attributes but wanted to keep the order because of diff. I used StAX to achieve this. You have to use XMLStreamReader and XMLStreamWriter (the cursor-based solution). When you get a START_ELEMENT event type, the cursor keeps the index of the attributes. Hence, you can make appropriate modifications and write them to the output file "in order".

Look at this article/discussion. You can see how to read the attributes of the start elements in order.
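
As a rough illustration of the cursor-based approach, the sketch below copies a document event by event and rewrites one attribute on the way. It is not the code from the linked article: the class name `StaxAttributeEditor`, the method `setAttribute`, and the sample `<alarm>` element are hypothetical. It is also deliberately simplified and assumes a document without namespaces, processing instructions or a DTD, all of which a real tool would have to copy as well.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, for illustration only.
final class StaxAttributeEditor {

    /** Copies the XML text, replacing the value of attrName wherever it occurs. */
    static String setAttribute(String xml, String attrName, String newValue) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        writer.writeStartElement(reader.getLocalName());
                        // Attributes are addressed by index; writing them in index
                        // order keeps the order in which the reader reports them.
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            String name = reader.getAttributeLocalName(i);
                            String value = name.equals(attrName)
                                    ? newValue
                                    : reader.getAttributeValue(i);
                            writer.writeAttribute(name, value);
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        writer.writeEndElement();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.SPACE:
                        writer.writeCharacters(reader.getText());
                        break;
                    case XMLStreamConstants.CDATA:
                        writer.writeCData(reader.getText());
                        break;
                    case XMLStreamConstants.COMMENT:
                        writer.writeComment(reader.getText());
                        break;
                    default:
                        break; // other event types are skipped in this sketch
                }
            }
            writer.close();
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX copy failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<alarms><alarm zeta=\"1\" alpha=\"2\" mid=\"3\"/></alarms>";
        System.out.println(setAttribute(xml, "alpha", "X"));
    }
}
```

Note that this sketch writes empty elements as a start tag followed by an end tag, so the output is not byte-for-byte identical to the input even though the attribute order is kept.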

Answer by Roberto Taschetto

I have a quite similar problem. I always need the same attribute to come first. Example:

<h50row a="1" xidx="1" c="1"></h50row>
<h50row a="2" b="2" xidx="2"></h50row>

must become

<h50row xidx="1" a="1" c="1"></h50row>
<h50row xidx="2" a="2" b="2"></h50row>

I found a solution with a regex:

String test = "<h50row a=\"1\" xidx=\"1\" c=\"1\"></h50row>";
// Move the xidx attribute (group 3) directly after the tag name (group 1).
test = test.replaceAll("(<h5.*row)(.*)(.xidx=\"\\w*\")([^>]*)(>)", "$1$3$2$4$5");

Hope you find this useful.

Answer by Radu Simionescu

You can still do this using the standard DOM and Transformation API by using a quick and dirty solution like the one I am describing:

We know that the transformation API solution orders the attributes alphabetically. You can prefix the attribute names with some easy-to-strip-later strings so that they will be output in the order you want. Simple prefixes such as "a_", "b_", etc. should suffice in most situations and can be easily stripped from the output XML using a one-line regex.
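
A minimal sketch of this prefix trick could look as follows. Everything in it is an assumption made for illustration: the class name `PrefixOrderTrick`, the method `build`, the `h50row` element borrowed from another answer, and the stripping regex. It relies on the DOM implementation ordering attributes alphabetically, which is implementation-specific behaviour, and the regex would also rewrite text content that happens to look like a prefixed attribute.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, for illustration only.
final class PrefixOrderTrick {

    /** Builds one element with prefixed attribute names and strips the prefixes after serializing. */
    static String build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element row = doc.createElement("h50row");
            // The prefixes encode the wanted position: a_ first, b_ second, c_ third.
            row.setAttribute("b_a", "1");
            row.setAttribute("a_xidx", "1");
            row.setAttribute("c_c", "1");
            doc.appendChild(row);

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));

            // Strip the one-letter prefix from every attribute name in the serialized text.
            return out.toString().replaceAll(" [a-z]_([\\w.-]+=\")", " $1");
        } catch (Exception e) {
            throw new IllegalStateException("prefix trick failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```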

If you are loading an XML file and re-saving it and want to preserve the attribute order, you can use the same principle: first modify the attribute names in the input XML text, then parse it into a Document object. Again, make this modification based on textual processing of the XML. This can be tricky, but it can be done by detecting elements and their attribute strings, again using regex. Note that this is a dirty solution. There are many pitfalls when parsing XML on your own, even for something as simple as this, so be careful if you decide to implement this.
