Original question: http://stackoverflow.com/questions/5396164/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
java - How to check if string is a valid XML element name?
Asked by ekeren
Do you know of a function in Java that will validate whether a string is a valid XML element name?
From w3schools:
XML elements must follow these naming rules:
- Names can contain letters, numbers, and other characters
- Names cannot start with a number or punctuation character
- Names cannot start with the letters xml (or XML, or Xml, etc)
- Names cannot contain spaces
I found other questions that offered regex solutions. Isn't there a function that already does that?
Answered by lavinio
If you are using the Xerces XML parser, you can use the isValidName() method of the XMLChar (or XML11Char) class, like this:
org.apache.xerces.util.XMLChar.isValidName(String name)
There is also sample code available here for isValidName.
Answered by Mike Samuel
The relevant production from the spec is http://www.w3.org/TR/xml/#NT-Name
Name ::= NameStartChar (NameChar)*
NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
So a regex to match it, written as a Java string literal, is
"^[:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02ff\u0370-\u037d"
+ "\u037f-\u1fff\u200c-\u200d\u2070-\u218f\u2c00-\u2fef\u3001-\ud7ff"
+ "\uf900-\ufdcf\ufdf0-\ufffd\\x{10000}-\\x{EFFFF}]"
+ "[:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6"
+ "\u00F8-\u02ff\u0370-\u037d\u037f-\u1fff\u200c-\u200d\u2070-\u218f"
+ "\u2c00-\u2fef\u3001-\ud7ff\uf900-\ufdcf\ufdf0-\ufffd\\x{10000}-\\x{EFFFF}"
+ "\\-.0-9\u00b7\u0300-\u036f\u203f-\u2040]*\\z"
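As a rough illustration of how a pattern for the Name production might be used from Java, the sketch below compiles it once and wraps it in a helper. The class name XmlNameRegex and the method isValidName are hypothetical names chosen for this example, and the supplementary range uses the \x{...} syntax, which assumes Java 7 or later.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of any library.
final class XmlNameRegex {
    // Contents of the NameStartChar character class (XML 1.0 production).
    private static final String NAME_START =
          ":A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF\\u0370-\\u037D"
        + "\\u037F-\\u1FFF\\u200C-\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF"
        + "\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD\\x{10000}-\\x{EFFFF}";

    // NameChar adds "-", ".", digits, #xB7 and two combining ranges.
    private static final String NAME_CHAR =
        NAME_START + "\\-.0-9\\u00B7\\u0300-\\u036F\\u203F-\\u2040";

    private static final Pattern NAME =
        Pattern.compile("[" + NAME_START + "][" + NAME_CHAR + "]*");

    static boolean isValidName(String s) {
        // matches() anchors at both ends, so no ^ or \z is needed here.
        return s != null && NAME.matcher(s).matches();
    }
}
```

With this sketch, isValidName("item") returns true, while "1item", "my item" and the empty string are rejected.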
If you want to deal with namespaced names, you need to make sure that there is at most one colon and that it separates two non-empty name parts, so the pattern becomes
"^[A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02ff\u0370-\u037d"
+ "\u037f-\u1fff\u200c-\u200d\u2070-\u218f\u2c00-\u2fef\u3001-\ud7ff"
+ "\uf900-\ufdcf\ufdf0-\ufffd\\x{10000}-\\x{EFFFF}]"
+ "[A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02ff\u0370-\u037d"
+ "\u037f-\u1fff\u200c-\u200d\u2070-\u218f\u2c00-\u2fef\u3001-\ud7ff"
+ "\uf900-\ufdcf\ufdf0-\ufffd\\x{10000}-\\x{EFFFF}\\-.0-9\u00b7\u0300-\u036f\u203f-\u2040]*"
+ "(?::[A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02ff\u0370-\u037d"
+ "\u037f-\u1fff\u200c-\u200d\u2070-\u218f\u2c00-\u2fef\u3001-\ud7ff"
+ "\uf900-\ufdcf\ufdf0-\ufffd\\x{10000}-\\x{EFFFF}]"
+ "[A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02ff\u0370-\u037d"
+ "\u037f-\u1fff\u200c-\u200d\u2070-\u218f\u2c00-\u2fef\u3001-\ud7ff"
+ "\uf900-\ufdcf\ufdf0-\ufffd\\x{10000}-\\x{EFFFF}\\-.0-9\u00b7\u0300-\u036f\u203f-\u2040]*)?\\z"
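The namespaced form repeats the same character classes four times, so one way to keep it readable is to build it from a single colon-free name fragment. The following is only a sketch under that assumption; XmlQNameRegex, isValidQName and the constant names are hypothetical and chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of any library.
final class XmlQNameRegex {
    // NameStartChar without ":" so the colon can only act as a separator.
    private static final String NC_START =
          "A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF\\u0370-\\u037D"
        + "\\u037F-\\u1FFF\\u200C-\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF"
        + "\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD\\x{10000}-\\x{EFFFF}";

    private static final String NC_CHAR =
        NC_START + "\\-.0-9\\u00B7\\u0300-\\u036F\\u203F-\\u2040";

    // One colon-free name: a start character followed by name characters.
    private static final String NC_NAME = "[" + NC_START + "][" + NC_CHAR + "]*";

    // Either "local" or "prefix:local", never more than one colon.
    private static final Pattern QNAME =
        Pattern.compile(NC_NAME + "(?::" + NC_NAME + ")?");

    static boolean isValidQName(String s) {
        return s != null && QNAME.matcher(s).matches();
    }
}
```

Here isValidQName("soap:Envelope") and isValidQName("Envelope") return true, while "a:b:c", ":a" and "a:" are rejected.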
Answered by roghughe
Using the org.apache.xerces utilities is a good way to go; however, if you need to stick to Java code that's part of the standard Java API then the following code will do it:
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
public void parse(String xml) throws Exception {
    XMLReader parser = XMLReaderFactory.createXMLReader();
    parser.setContentHandler(new DefaultHandler());
    InputSource source = new InputSource(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    parser.parse(source);
}
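The parse method above checks a whole document rather than a single name. A different technique that also stays within the standard Java API (a variation, not what the answer proposes) is to ask a DOM Document to create an element: per the DOM specification, createElement raises a DOMException with INVALID_CHARACTER_ERR when the name is not a legal XML name. The class and method names below are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical helper; not part of any library.
final class DomNameCheck {
    static boolean isValidElementName(String name) {
        if (name == null) {
            return false;
        }
        try {
            // An empty in-memory document; nothing is parsed or written.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            doc.createElement(name); // throws DOMException for an illegal name
            return true;
        } catch (DOMException e) {
            return false;
        } catch (ParserConfigurationException e) {
            // Assumption: treat a misconfigured JAXP setup as a programming error.
            throw new IllegalStateException(e);
        }
    }
}
```

Note that createElement is not namespace-aware, so a name with several colons is still accepted by this check.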
Answered by pemistahl
As a current addition to the accepted answer:
At least Oracle's JDK 1.8 (and probably older versions as well) uses the Xerces parser internally in the non-public com.sun.* packages. You should never directly use any implementations from those classes, as they may change without notice in future versions of the JDK! However, the code required for the XML element name validity check is well encapsulated and can be copied into your own code. This way, you avoid a dependency on an external library.
This is the required code, taken from the internal class com.sun.org.apache.xerces.internal.util.XMLChar:
public class XMLChar {

    /** Character flags. */
    private static final byte[] CHARS = new byte[1 << 16];

    /** Name start character mask. */
    public static final int MASK_NAME_START = 0x04;

    /** Name character mask. */
    public static final int MASK_NAME = 0x08;

    static {
        // Initializing the Character Flag Array
        // Code generated by: XMLCharGenerator.
        CHARS[9] = 35;
        CHARS[10] = 19;
        CHARS[13] = 19;
        // ...
        // the entire static block must be copied
    }

    /**
     * Check to see if a string is a valid Name according to [5]
     * in the XML 1.0 Recommendation
     *
     * @param name string to check
     * @return true if name is a valid Name
     */
    public static boolean isValidName(String name) {
        final int length = name.length();
        if (length == 0) {
            return false;
        }
        char ch = name.charAt(0);
        if (!isNameStart(ch)) {
            return false;
        }
        for (int i = 1; i < length; ++i) {
            ch = name.charAt(i);
            if (!isName(ch)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns true if the specified character is a valid name start
     * character as defined by production [5] in the XML 1.0
     * specification.
     *
     * @param c The character to check.
     */
    public static boolean isNameStart(int c) {
        return c < 0x10000 && (CHARS[c] & MASK_NAME_START) != 0;
    }

    /**
     * Returns true if the specified character is a valid name
     * character as defined by production [4] in the XML 1.0
     * specification.
     *
     * @param c The character to check.
     */
    public static boolean isName(int c) {
        return c < 0x10000 && (CHARS[c] & MASK_NAME) != 0;
    }
}
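If copying the generated CHARS table is undesirable, a table-free alternative is to test code points directly against the NameStartChar and NameChar ranges quoted from the specification earlier on this page. The sketch below is an illustration only: XmlNameRanges and its methods are hypothetical names, and because it follows those quoted ranges rather than the Xerces table, it is not a drop-in replacement and may disagree with XMLChar for some characters. Unlike the code above, it iterates by code point, so characters above U+FFFF are handled.

```java
// Hypothetical helper; not part of any library.
final class XmlNameRanges {

    // NameStartChar ranges as quoted from the XML 1.0 specification.
    static boolean isNameStart(int c) {
        return c == ':' || c == '_'
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // NameChar adds "-", ".", digits, #xB7 and two combining ranges.
    static boolean isNameChar(int c) {
        return isNameStart(c) || c == '-' || c == '.'
            || (c >= '0' && c <= '9') || c == 0xB7
            || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    static boolean isValidName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!isNameStart(first)) {
            return false;
        }
        // Step by code point so surrogate pairs count as one character.
        for (int i = Character.charCount(first); i < name.length(); ) {
            int c = name.codePointAt(i);
            if (!isNameChar(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }
}
```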