PHP: How to check if a string is a valid XML element name?
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/2519845/
Asked by Mike Starov
I need a regex or a function in PHP that will validate a string to be a good XML element name.
From w3schools:
XML elements must follow these naming rules:
- Names can contain letters, numbers, and other characters
- Names cannot start with a number or punctuation character
- Names cannot start with the letters xml (or XML, or Xml, etc)
- Names cannot contain spaces
I can write a basic regex that checks rules 1, 2 and 4, but it won't account for all allowed punctuation and won't account for the 3rd rule:
\w[\w0-9-]
Friendly Update
Here is the more authoritative source for well-formed XML Element names:
Names and Tokens
NameStartChar ::=
":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] |
[#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
[#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
[#x10000-#xEFFFF]
NameChar ::=
NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
Name ::=
NameStartChar (NameChar)*
Also a separate non-tokenized rule is specified:
Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.
Accepted answer by Leo
How about
/\A(?!XML)[a-z][\w0-9-]*/i
Usage:
if (preg_match('/\A(?!XML)[a-z][\w0-9-]*/i', $subject)) {
# valid name
} else {
# invalid name
}
Explanation:
\A Beginning of the string
(?!XML) Negative lookahead (assert that it is impossible to match "XML")
[a-z] Match a non-digit, non-punctuation character
[\w0-9-]* Match an arbitrary number of allowed characters
/i make the whole thing case-insensitive
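One thing worth noting about the pattern above: it has no end anchor, so it only checks how the string begins. A minimal sketch of an anchored variant follows. The function name isSimpleXmlName is hypothetical, and adding \z is my own change, not part of the original answer. The character set is left as is, so this still rejects some names the XML spec allows, such as names starting with an underscore or containing a dot.

```php
<?php
// Sketch: the accepted pattern plus an end anchor (\z), so the whole
// string has to match. isSimpleXmlName is a hypothetical helper name.
function isSimpleXmlName(string $subject): bool
{
    return preg_match('/\A(?!XML)[a-z][\w0-9-]*\z/i', $subject) === 1;
}

// Unanchored original: matches because the leading "a" alone satisfies it.
var_dump(preg_match('/\A(?!XML)[a-z][\w0-9-]*/i', 'a b')); // int(1)

var_dump(isSimpleXmlName('a b'));    // bool(false), the space is rejected
var_dump(isSimpleXmlName('item-1')); // bool(true)
var_dump(isSimpleXmlName('xmlFoo')); // bool(false), blocked by the lookahead
```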
Answer by Gordon
If you want to create valid XML, use the DOM extension. That way you don't have to bother with any regex. If you try to pass an invalid name to a DOMElement, you'll get an error.
function isValidXmlName($name)
{
try {
new DOMElement($name);
return TRUE;
} catch(DOMException $e) {
return FALSE;
}
}
This will give
var_dump( isValidXmlName('foo') ); // true valid localName
var_dump( isValidXmlName(':foo') ); // true valid localName
var_dump( isValidXmlName(':b:c') ); // true valid localName
var_dump( isValidXmlName('b:c') ); // false assumes QName
and is likely good enough for what you want to do.
Pedantic note 1
Note the distinction between localName and QName. ext/dom assumes you are using a namespaced element if there is a prefix before the colon, which adds constraints to how the name may be formed. Technically, b:b is a valid local name though because NameStartChar is part of NameChar. If you want to include these, change the function to
function isValidXmlName($name)
{
try {
new DOMElement(
$name,
null,
strpos($name, ':') >= 1 ? 'http://example.com' : null
);
return TRUE;
} catch(DOMException $e) {
return FALSE;
}
}
Pedantic note 2
Note that elements may start with "xml". W3schools (which is not affiliated with the W3C) apparently got this part wrong (it wouldn't be the first time). If you really want to exclude elements starting with xml, add
if(stripos($name, 'xml') === 0) return false;
before the try/catch.
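Put together, the DOM check and the optional prefix rule could look like the sketch below. The function name isValidUnreservedXmlName and the $rejectReserved flag are hypothetical, introduced only for illustration. It assumes ext/dom is available.

```php
<?php
// Sketch: DOM-based name check with an optional "xml" prefix exclusion.
// Function name and $rejectReserved flag are hypothetical.
function isValidUnreservedXmlName(string $name, bool $rejectReserved = true): bool
{
    // Case-insensitive test for a leading "xml" (reserved, not malformed).
    if ($rejectReserved && stripos($name, 'xml') === 0) {
        return false;
    }
    try {
        new DOMElement($name); // throws DOMException for an invalid name
        return true;
    } catch (DOMException $e) {
        return false;
    }
}

var_dump(isValidUnreservedXmlName('foo'));
var_dump(isValidUnreservedXmlName('XmlFoo'));
var_dump(isValidUnreservedXmlName('XmlFoo', false));
```

Keeping the prefix rule behind a flag reflects the point made above: such names are reserved, not invalid.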
Answer by hakre
One approach has been missed so far, even though the question is this old: name validation via PHP's PCRE functions, aligned with the XML specification.
The XML specification is pretty clear about element names (Extensible Markup Language (XML) 1.0 (Fifth Edition)):
[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5] Name ::= NameStartChar (NameChar)*
This notation can be transposed into a UTF-8 compatible regular expression for use with preg_match, here as a single-quoted PHP string to be copied verbatim:
'~^[:A-Z_a-z\xC0-\xD6\xD8-\xF6\xF8-\x{2FF}\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}][:A-Z_a-z\xC0-\xD6\xD8-\xF6\xF8-\x{2FF}\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}.\-0-9\xB7\x{0300}-\x{036F}\x{203F}-\x{2040}]*$~u'
Or as another variant with named subpatterns in a more readable fashion:
'~
# XML 1.0 Name symbol PHP PCRE regex <http://www.w3.org/TR/REC-xml/#NT-Name>
(?(DEFINE)
(?<NameStartChar> [:A-Z_a-z\xC0-\xD6\xD8-\xF6\xF8-\x{2FF}\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}])
(?<NameChar> (?&NameStartChar) | [.\-0-9\xB7\x{0300}-\x{036F}\x{203F}-\x{2040}])
(?<Name> (?&NameStartChar) (?&NameChar)*)
)
^(?&Name)$
~ux'
Note that this pattern contains the colon : which you might want to exclude (two appearances in the first pattern, one in the second) for XML Namespace validation reasons (e.g. a test for NCName).
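As a rough illustration of the colon note, here is a sketch of an NCName check built from the named-subpattern variant with the colon dropped from NameStartChar. The function name isValidNcName and the subpattern name NCName are my own labels. The sketch also uses \A and \z instead of ^ and $, since $ would still accept a name followed by a trailing newline.

```php
<?php
// Sketch: XML 1.0 Name pattern without the colon, i.e. an NCName test.
// isValidNcName is a hypothetical helper name.
function isValidNcName(string $name): bool
{
    $pattern = '~
        (?(DEFINE)
            (?<NameStartChar> [A-Z_a-z\xC0-\xD6\xD8-\xF6\xF8-\x{2FF}\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}])
            (?<NameChar>      (?&NameStartChar) | [.\-0-9\xB7\x{0300}-\x{036F}\x{203F}-\x{2040}])
            (?<NCName>        (?&NameStartChar) (?&NameChar)*)
        )
        \A(?&NCName)\z
    ~ux';

    return preg_match($pattern, $name) === 1;
}

var_dump(isValidNcName('foo.bar-1')); // bool(true)
var_dump(isValidNcName('b:c'));       // bool(false), colon no longer allowed
var_dump(isValidNcName('1foo'));      // bool(false), digit cannot start a name
var_dump(isValidNcName("foo\n"));     // bool(false), rejected because of \z
```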
Usage Example:
$name = '::...';
$pattern = '~
# XML 1.0 Name symbol PHP PCRE regex <http://www.w3.org/TR/REC-xml/#NT-Name>
(?(DEFINE)
(?<NameStartChar> [:A-Z_a-z\xC0-\xD6\xD8-\xF6\xF8-\x{2FF}\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}])
(?<NameChar> (?&NameStartChar) | [.\-0-9\xB7\x{0300}-\x{036F}\x{203F}-\x{2040}])
(?<Name> (?&NameStartChar) (?&NameChar)*)
)
^(?&Name)$
~ux';
$valid = 1 === preg_match($pattern, $name); # bool(true)
The claim that an element name starting with XML (in lower or uppercase letters) is not possible is incorrect. <XML/> is perfectly well-formed XML and XML is a perfectly well-formed element name.
It is just that such names are in the subset of well-formed element names that are reserved for standardization (XML version 1.0 and above). It is easy to test whether a (well-formed) element name is reserved with a string comparison:
$reserved = $valid && 0 === stripos($name, 'xml');
or alternatively another regular expression:
$reserved = $valid && 1 === preg_match('~^[Xx][Mm][Ll]~', $name);
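To see that the two reserved-name checks agree, a small sketch can wrap each in a function and compare them on a few inputs. The helper names are hypothetical, and both assume the name has already been validated as a well-formed Name.

```php
<?php
// Hypothetical helpers; both assume $name is already a well-formed Name.
function isReservedByStripos(string $name): bool
{
    return stripos($name, 'xml') === 0;
}

function isReservedByRegex(string $name): bool
{
    return preg_match('~^[Xx][Mm][Ll]~', $name) === 1;
}

// Each line prints bool(true) when the two checks give the same answer.
foreach (['xml', 'XmlFoo', 'xm', 'fooxml', 'x:ml'] as $candidate) {
    var_dump(isReservedByStripos($candidate) === isReservedByRegex($candidate));
}
```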
PHP's DOMDocument cannot test for reserved names; at least I don't know of any way to do that, and I've looked a lot.
A valid element name needs a Unique Element Type Declaration, which seems to be out of the scope of the question here as no such declaration has been provided. Therefore the answer does not take care of that. If there were an element type declaration, you would only need to validate against a whitelist of all (case-sensitive) names, so this would be a simple case-sensitive string comparison.
Excursion: What does DOMDocument do differently from the regular expression?
Compared with DOMDocument / DOMElement, there are some differences in what qualifies as a valid element name. The DOM extension operates in a kind of mixed mode, which makes it less predictable what it validates. The following excursion illustrates the behavior and shows how to control it.
Let's take $name and instantiate an element:
$element = new DOMElement($name);
The outcome depends:
- if the first character is a colon, it just validates the XML 1.0 Name symbol.
- if the first character is not a colon, it validates the XMLNS 1.0 QName symbol.
So the first character decides about the comparison mode.
A regular expression states explicitly what to check for, here the XML 1.0 Name symbol.
You can achieve the same with DOMElement by prefixing the name with a colon:
function isValidXmlName($name)
{
try {
new DOMElement(":$name");
return TRUE;
} catch (DOMException $e) {
return FALSE;
}
}
To explicitly check for the QName, turn the name into a PrefixedName in case it is an UnprefixedName:
function isValidXmlnsQname($qname)
{
$prefixedName = (!strpos($qname, ':') ? 'prefix:' : '') . $qname;
try {
new DOMElement($prefixedName, NULL, 'uri:ns');
return TRUE;
} catch (DOMException $e) {
return FALSE;
}
}
Answer by Timo Zimmermann
Use this regex:
^_?(?!(xml|[_\d\W]))([\w.-]+)$
This matches all your four points and allows unicode characters.
Answer by Keith Vinson
If you are using the .NET Framework, try XmlConvert.VerifyName. It will tell you if the name is valid. Alternatively, use XmlConvert.EncodeName to convert an invalid name into a valid one.
Answer by JamieSee
The expression below should match valid unicode element names excepting xml. Names that start or end with xml will still be allowed. This passes @toscho's ??? test. The one thing I could not figure out a regex for was extenders. The xml element name spec says:
[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5] Name ::= (Letter | '_' | ':') (NameChar)*
But there's no clear definition for a unicode category or class containing extenders.
^[\p{L}_:][\p{N}\p{L}\p{Mc}.\-|:]*((?<!xml)|xml)$
Answer by Viktor Stolbovoy
XML, xml, etc. are valid tags; they are just "reserved for standardization in this or future versions of this specification", which likely will never happen. Please check the real standard at https://www.w3.org/TR/REC-xml/. The w3schools article is inaccurate.
Answer by Sean Vieira
This should give you roughly what you need [Assuming you are using Unicode]:
(Note:This is completely untested.)
[^\p{P}xX0-9][^mMlL\s]{2}[\w\p{P}0-9-]
\p{P} is the syntax for Unicode punctuation marks in PHP's regular expression syntax.
Answer by Amy B
if ((substr(strtolower($text), 0, 3) != 'xml') && (1 === preg_match('/^\w[^<>]+$/', $text)))
{
// valid;
}

