Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2087108/

Time: 2020-09-06 12:56:03  Source: igfitidea

Encoding XML element name beginning with a number?

xml

Asked by Anonym

I'm looking at the output of a tool that dumps a database table to XML. One of the columns is named 64kbit; the tool encodes it as follows, and I need to replicate that:

 <_x0036_4kbit>0</_x0036_4kbit>

Is this some sort of standard encoding? Where can I learn more about it?

Accepted answer by Joey

Well, it doesn't seem to be too standard, but XML explicitly disallows numbers (and some other things) as the first character of an element name:

NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
                  [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] |
                  [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
                  [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] |
                  [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

This encoding just escapes the first character if it doesn't meet those requirements, using the hexadecimal value of that character. _x0036_ obviously corresponds to hexadecimal 0x36, which is 54 in decimal and represents the digit 6.
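The first-character escape described above can be sketched in Python. This is only an illustration: the function names is_name_start_char and escape_first_char are made up for this example, the range table is copied from the NameStartChar production quoted above, and the four-digit uppercase hex format is an assumption inferred from the single _x0036_ sample in the question.

```python
# Code point ranges copied from the NameStartChar production quoted above.
NAME_START_RANGES = [
    (0x3A, 0x3A),  # ":"
    (0x41, 0x5A),  # A-Z
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]


def is_name_start_char(ch: str) -> bool:
    """Return True if ch may legally start an XML name."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)


def escape_first_char(name: str) -> str:
    """Hypothetical helper: escape only an invalid first character as _xHHHH_."""
    if not name or is_name_start_char(name[0]):
        return name
    # Assumed format: at least four uppercase hex digits between "_x" and "_".
    return f"_x{ord(name[0]):04X}_{name[1:]}"


print(escape_first_char("64kbit"))  # _x0036_4kbit
print(escape_first_char("kbit"))    # kbit (already a valid name start)
```

Because the escaped form begins with an underscore, which is itself a NameStartChar, the result is always a legal start for an element name.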

Answer by WonderWorker

The official word is that the restrictions imposed on XML names are inherited from XML's parent, SGML, with only one exception: in XML, as an additional option, names may begin with an underscore '_' character.

SGML grew out of GML, which was developed at IBM in the 1960s by a group of minds that were thinking '1960s style'.

As a result, the brainstorm that led to the creation of SGML was likely to have been distracted by the overwhelming notion that space-ships, time-travel and flares made of kitchen foil to protect against 'them aliens' and their foolhardy attempts at thought-provocation and mind-control were justified thought processes.

So, the question still remains: why doesn't SGML allow numbers? Furthermore, why would there be any sort of restriction on the use of any character other than the control characters <, >, & and empty space? It would surely be madness to present the computer geek with so many keys for so many different characters, only to prevent him or her from using them.

The most significant reason is the 1960s-thinking parser, and its following of the complexity rule to a degree of outright pedantry:

'The simpler the parser is, the faster it will perform'

The alphabet is 26 uppercase + 26 lowercase characters, which is 52 in total. Allowing digits adds another ten, which is nearly a fifth more (about a sixth of the resulting 62)!

In human terms, this would be like having to wash six hideously filth-encrusted pots, each one taking an hour to clean, and then hidden underneath the last pot is an extra bonus pot to wash, and you must wash it! You have to repeat this routine every single day for the rest of your life, and that's exactly what it's like. Precisely!

Mark-up language documents have a tendency to bulge in content, so fewer jobs for the parser mean a direct increase in performance. The benefits then trickle down through the ranks until they metamorphose into pure lucrative performance.

In the 'Ye olde days of horse, carriage and a Commodore 64' it was far more the user's responsibility to count their bits and bytes manually, in order for the kilobytes to take care of themselves. However, as the modern CPU is more able to cope than its ancient predecessor, the restrictions imposed by the parser have become more significant than the performance issues.

If it's any consolation, if I were to design a mark-up language myself (which, for argument's sake, we will call NAM-LIT-MAML, because Nicholas' awesome mark-up language is the most awesome mark-up language (ever!)), then it would allow you to use any number of all the characters in the entire history of the world, and indeed universe, without exception. I would also work really hard to create some never-before-used characters for the language's own use, which could still be used within the document by means of its own escape character that looks nothing like any other character that's ever been used before by anyone ever.

The restrictions imposed by Xml are inherited from SGML, and we can all agree that in this day and age of space-ship camels and other useful robotic mammals, they are unnecessary, stupid and go against the grain of Object Oriented programming.

Further reading at http://www.w3.org/TR/REC-xml/

Although the simplest way that I have found to make a name XML compatible is to add a prefix of '_', there is no standard, and as such other methods are in use.

In your example, the first character has been converted into a hex value. This hex value represents the '6' character in ASCII, Unicode and undoubtedly others.

A good thing about using hex values is that all characters in a code-set e.g. Unicode may be represented.

A bad thing is that they aren't as readable at a glance.

Answer by Mark Byers

An XML name cannot start with a digit, so some other representation must be used that can be understood to mean '6'.

The tool has chosen to write the hexadecimal representation of the character instead, surrounded by underscores. The code \x0036 is the hexadecimal code for the character '6', which is 54 in decimal. Underscores are valid characters at the start of an XML name, so this works.

This same technique could be used to escape other characters which are invalid in XML names. This technique is used for example by Microsoft's XmlConvert, as described here, but I'm sure there are other tools which use the same technique too.
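As a rough illustration of extending the technique to every invalid character, here is a minimal Python sketch with a matching decoder. It is not a reimplementation of XmlConvert: the names encode_name and decode_name are hypothetical, and the sketch deliberately simplifies by treating only ASCII letters and underscore as valid at the start, plus digits, hyphen and period afterwards. Everything else is escaped, including non-ASCII characters that XML would actually accept.

```python
import re
import string

# Simplifying assumption: ASCII-only subsets of the XML name rules.
START_OK = set(string.ascii_letters + "_")
REST_OK = START_OK | set(string.digits + "-.")

# Matches one escape of the form _xHHHH_ (exactly four hex digits).
ESCAPED = re.compile(r"_x([0-9A-Fa-f]{4})_")


def encode_name(name: str) -> str:
    """Hypothetical encoder: replace each disallowed character with _xHHHH_."""
    out = []
    for i, ch in enumerate(name):
        allowed = START_OK if i == 0 else REST_OK
        out.append(ch if ch in allowed else f"_x{ord(ch):04X}_")
    return "".join(out)


def decode_name(encoded: str) -> str:
    """Hypothetical decoder: turn every _xHHHH_ escape back into its character."""
    return ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), encoded)


print(encode_name("64kbit"))             # _x0036_4kbit
print(encode_name("unit price"))         # unit_x0020_price
print(decode_name("_x0036_4kbit"))       # 64kbit
```

Two limitations are worth noting. A name that already contains literal text shaped like an escape (for example a column really called a_x0036_b) would be decoded incorrectly, so a real tool needs some rule for that case. Characters above U+FFFF produce more than four hex digits here and would not be matched by the decoder.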

Answer by Tim Bray

IIRC (I was there, but it was a long time ago) the thinking was that it would be very common to map XML elements & attributes to programming-language constructs, which are represented by variables, and very few (any?) programming languages allow variable names that begin with numbers. So, the idea is that XML element/attribute names should fit nicely into most languages' variable-naming rules. Do I still believe this? If we were doing XML again, would I be OK with this? Dunno; it'd be an interesting discussion though.

Answer by Rubens Farias

That encoding isn't part of XML itself, but your tool seems to need it, since element names must start with a character from a restricted set.

That _x0036_ sequence represents hexadecimal number 36 (decimal 54), which is the code for your 6 character in the ASCII table.
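The arithmetic can be checked directly; this small Python snippet only verifies the conversion and makes no claim about how the tool itself works. The variable names are illustrative.

```python
hex_digits = "0036"           # the part between "_x" and "_"
value = int(hex_digits, 16)   # 3 * 16 + 6

print(value)                  # 54
print(chr(value))             # 6
print(hex(ord("6")))          # 0x36
```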
