Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1682131/

Time: 2020-09-06 12:49:45  Source: igfitidea

Can I enforce the order of XML attributes using a schema?

xml performance xsd expat-parser

Asked by Mike Willekes

Our C++ application reads configuration data from XML files that look something like this:

<data>
 <value id="FOO1" name="foo1" size="10" description="the foo" ... />
 <value id="FOO2" name="foo2" size="10" description="the other foo" ... />
 ...
 <value id="FOO300" name="foo300" size="10" description="the last foo" ... />
</data>

The complete application configuration consists of ~2500 of these XML files (which translates into more than 1.5 million key/value attribute pairs). The XML files come from many different sources/teams and are validated against a schema. However, sometimes the <value/> nodes look like this:

<value name="bar1" id="BAR1" description="the bar" size="20" ... />

or this:

<value id="BAT1" description="the bat" name="bat1"  size="25" ... />

To make this process fast, we are using Expat to parse the XML documents. Expat exposes the attributes as an array, like this:

void ExpatParser::StartElement(const XML_Char* name, const XML_Char** atts)
{
 // The attributes are stored in an array of XML_Char* where:
 //  the nth element is the 'key'
 //  the n+1 element is the value
 //  the final element is NULL
 for (int i = 0; atts[i]; i += 2) 
 {
  std::string key = atts[i];
  std::string value = atts[i + 1];
  ProcessAttribute (key, value);
 }
}

This puts all the responsibility onto our ProcessAttribute() function to read the 'key' and decide what to do with the value. Profiling the app has shown that ~40% of the total XML parsing time is spent dealing with these attributes by name/string.

The overall process could be sped up dramatically if I could guarantee/enforce the order of the attributes (for starters, no string comparisons in ProcessAttribute()). For example, if the 'id' attribute was always the 1st attribute we could deal with it directly:

void ExpatParser::StartElement(const XML_Char* name, const XML_Char** atts)
{
 // The attributes are stored in an array of XML_Char* where:
 //  the nth element is the 'key'
 //  the n+1 element is the value
 //  the final element is NULL
 ProcessID (atts[1]);
 ProcessName (atts[3]);
 //etc.
}

According to the W3C schema specs, I can use <xs:sequence> in an XML schema to enforce the order of elements, but it doesn't seem to work for attributes (or perhaps I'm using it incorrectly):

<xs:element name="data">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="value" type="value_type" minOccurs="1" maxOccurs="unbounded" />
  </xs:sequence>
 </xs:complexType>
</xs:element>

<xs:complexType name="value_type">
 <!-- This doesn't work -->
 <xs:sequence>
  <xs:attribute name="id" type="xs:string" />
  <xs:attribute name="name" type="xs:string" />
  <xs:attribute name="description" type="xs:string" />
 </xs:sequence>
</xs:complexType>

Is there a way to enforce attribute order in an XML document? If the answer is "no", could anyone perhaps suggest an alternative that wouldn't carry a huge runtime performance penalty?

Answered by Stanislav Stoyanov

According to the XML specification,

the order of attribute specifications in a start-tag or empty-element tag is not significant

You can check it at section 3.1

Answered by Jörg W Mittag

XML attributes don't have an order, therefore there is no order to enforce.

If you want something ordered, you need XML elements. Or something different from XML: JSON, YAML and bEncode, for example, have both maps (which are unordered) and sequences (which are ordered).

Answered by Robert Rossney

As others have pointed out, no, you can't rely on attribute ordering.

If I had any process at all involving 2,500 XML files and 1.5 million key/value pairs, I would get that data out of XML and into a more usable form as soon as I possibly could. A database, a binary serialization format, whatever. You're not getting any advantage out of using XML (other than schema validation). I'd update my store every time I got a new XML file, and take parsing 1.5 million XML elements out of the main flow of my process.

Answered by Gary McGill

The answer is no, alas. I'm shocked by your 40% figure. I find it hard to believe that turning "foo" into ProcessFoo takes that long. Are you sure the 40% doesn't include the time taken to execute ProcessFoo?

Is it possible to access the attributes by name using this Expat thing? That's the more traditional way to access attributes. I'm not saying it's going to be faster, but it might be worth a try.
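
As far as I know, Expat only hands the handler the flat name/value array shown in the question, so access by name would have to be a small helper on top of it. A minimal sketch, assuming the default UTF-8 build where XML_Char is plain char (char is used directly so the snippet compiles without expat.h); FindAttribute is a made-up name, not an Expat function:

```cpp
#include <cstring>

// atts is a NULL-terminated array of alternating name/value C strings,
// in the same layout Expat passes to a start-element handler.
// Returns the value for `name`, or nullptr if the attribute is absent.
const char* FindAttribute(const char** atts, const char* name)
{
    for (int i = 0; atts[i]; i += 2)
    {
        if (std::strcmp(atts[i], name) == 0)
            return atts[i + 1];   // the value follows its name
    }
    return nullptr;
}
```

This is a linear scan with one string comparison per attribute, so it is unlikely to be faster than what the question already does. It just makes the lookup independent of attribute order.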

Answered by Chris McCall

I'm pretty sure there's no way to enforce attribute order in an XML document. I'm going to assume that you can insist on it via a business process or other human factors, such as a contract or other document.

What if you just assumed that the first attribute was "id", and tested the name to be sure? If yes, use the value, if not, then you can try to get the attribute by name or throw out the document.

While not as efficient as calling out the attribute by its ordinal, some non-zero number of times you'll be able to guess that your data providers have delivered XML to spec. The rest of the time, you can take other action.
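
The "assume it is first, but check" idea might look something like the sketch below. GetId is a made-up helper, and plain char stands in for XML_Char (which is char in Expat's default UTF-8 build) so the snippet compiles without expat.h. The fallback here is a scan by name; rejecting the document instead would be the other option mentioned above.

```cpp
#include <cstring>

// atts is a NULL-terminated array of alternating name/value C strings.
// Returns the value of the "id" attribute, or nullptr if there is none.
const char* GetId(const char** atts)
{
    // Fast path: the provider followed the agreed order, "id" comes first.
    if (atts[0] && std::strcmp(atts[0], "id") == 0)
        return atts[1];

    // Slow path: the order was not respected, so search by name.
    for (int i = 0; atts[i]; i += 2)
    {
        if (std::strcmp(atts[i], "id") == 0)
            return atts[i + 1];
    }
    return nullptr;
}
```

Note that the fast path still costs one string comparison, so it only saves the scan, not the comparison itself.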

Answered by marc_s

I don't think XML Schema supports that - attributes are just defined and restricted by name, e.g. they have to match a particular name - but I don't see how you could define an order for those attributes in XSD.

I don't know of any other way to make sure attributes on a XML node come in a particular order - not sure if any of the other XML schema mechanisms like Schematron or Relax NG would support that....

Answered by James Cronen

Just a guess, but can you try adding use="required" to each of your attribute specifications?

<xs:complexType name="value_type">
 <!-- This doesn't work -->
 <xs:sequence>
  <xs:attribute name="id" type="xs:string" use="required" />
  <xs:attribute name="name" type="xs:string" use="required" />
  <xs:attribute name="description" type="xs:string" use="required" />
 </xs:sequence>
</xs:complexType>

I'm wondering if the parser is being slowed down by allowing optional attributes, when it appears your attributes will always be there.

Again, just a guess.

EDIT: The XML 1.0 spec says that attribute order is not significant. http://www.w3.org/TR/REC-xml/#sec-starttags

Therefore, XSD won't enforce any order. But that doesn't mean that parsers can't be fooled into working quickly, so I'm keeping the above answer published in case it actually works.

Answered by rama-jka toti

From what I recall, Expat is a non-validating parser, and better for it, so you can probably scrap that XSD idea. Nor is depending on order a good idea in many XML approaches (XSD got criticised a heck of a lot for element order back in the day, for example, by pro- or anti-sellers of XML Web Services at MSFT).

Do your own custom encoding and either extend your logic for more efficient lookup or dig into the parser source. It is trivial to write the tooling around encoding an efficient replacement whilst shielding the software agents and users from it. You want to do this so that it is easily migrated while preserving backward compatibility and reversibility. Also, go for fixed-size constraints/attribute-name translation.
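
One possible reading of "attribute-name translation" is to turn each name into a small fixed-size ID as early as possible, so the rest of the code switches on an enum instead of comparing strings. A minimal sketch under that assumption; AttrId and TranslateAttributeName are made-up names, and the attribute set is just the one visible in the question:

```cpp
#include <cstring>

// Hypothetical fixed-size IDs for the attributes shown in the question.
enum class AttrId { Id, Name, Size, Description, Unknown };

// Dispatch on the first character, then confirm with one comparison.
// This works here because the known names all start with different letters.
AttrId TranslateAttributeName(const char* name)
{
    switch (name[0])
    {
    case 'i':
        if (std::strcmp(name, "id") == 0) return AttrId::Id;
        break;
    case 'n':
        if (std::strcmp(name, "name") == 0) return AttrId::Name;
        break;
    case 's':
        if (std::strcmp(name, "size") == 0) return AttrId::Size;
        break;
    case 'd':
        if (std::strcmp(name, "description") == 0) return AttrId::Description;
        break;
    default:
        break;
    }
    return AttrId::Unknown;
}
```

A handler would then call this once per attribute and switch on the result. Whether it beats the existing ProcessAttribute() depends on how that function is written, which the question doesn't show, so it would need to be profiled.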

[ Consider yourself lucky with Expat :) and its raw speed. Imagine how CLR devs love XML scaling facilities, they routinely send 200MB on the wire in process of 'just querying the database' .. ]
