How do I encode characters from Oracle as XML?
Date: 2020-03-06 14:57:59 Source: igfitidea
In my environment I use Java to serialize a result set to XML.
It basically works like this:
// for each column of each row
xmlHandler.startElement(uri, lname, "column", attributes);
String chars = rs.getString(i);
xmlHandler.characters(chars.toCharArray(), 0, chars.length());
xmlHandler.endElement(uri, lname, "column");
In Firefox the XML looks like this:
<row num="69004">
  <column num="1">10069</column>
  <column num="2">sd</column>
  <column num="3">FCVolume </column>
</row>
But when I parse the XML, I get a
org.xml.sax.SAXParseException: Character reference "" is an invalid XML character.
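So my question is: which characters do I have to replace, or how do I have to encode my characters, so that they become valid XML?
Solution
Extensible Markup Language (XML) 1.0 says: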
The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
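If you use CDATA, you can skip escaping & and < (note that CDATA does not make characters that are illegal in XML 1.0, such as #x1A, legal):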
<column num="1"><![CDATA[10069]]></column>
<column num="2"><![CDATA[sd&]]></column>
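The escaping rules quoted from the spec can be sketched in a few lines of Java. This is only a minimal illustration for text that is assembled by hand; the class and method names (`XmlEscape`, `escapeText`, `toCdata`) are hypothetical and not part of the original code. Neither method removes characters that XML 1.0 forbids outright.

```java
final class XmlEscape {
    private XmlEscape() {}

    /** Escapes &, < and > so the text can be placed in element content. */
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps text in CDATA, splitting any "]]>" so it cannot end the section early. */
    static String toCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(escapeText("sd&<x>")); // sd&amp;&lt;x&gt;
        System.out.println(toCdata("sd&"));       // <![CDATA[sd&]]>
    }
}
```

If the XML is written through a serializing SAX handler, that handler may already perform this escaping, so the sketch mainly applies when strings are concatenated manually.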
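I found an interesting list in the XML spec:
According to that list, using character #26 (hex: #x1A) is discouraged. (Strictly speaking, in XML 1.0 #x1A lies outside the allowed Char range altogether, so it is not just discouraged but invalid.)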
The characters defined in the following ranges are also discouraged. They are either control characters or permanently undefined Unicode characters
See the full ranges in the spec.
This code removes all characters that are not valid in XML from a string:
public String stripNonValidXMLCharacters(String in) {
    if (in == null || in.isEmpty()) return ""; // vacancy test.
    StringBuilder out = new StringBuilder(); // Used to hold the output.
    // Iterate by code point: a plain char can never be >= 0x10000,
    // so a char-based loop would drop every supplementary character.
    for (int i = 0; i < in.length(); ) {
        int current = in.codePointAt(i); // Used to reference the current code point.
        i += Character.charCount(current);
        if ((current == 0x9) || (current == 0xA) || (current == 0xD)
                || ((current >= 0x20) && (current <= 0xD7FF))
                || ((current >= 0xE000) && (current <= 0xFFFD))
                || ((current >= 0x10000) && (current <= 0x10FFFF)))
            out.appendCodePoint(current);
    }
    return out.toString();
}
It is taken from "Invalid XML Characters: when valid UTF8 does not mean valid XML".
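As a compact alternative, the same filter can be written as a regular expression and checked against the JDK's own parser. This is a sketch under assumptions: `XmlCharFilter`, `strip`, and `parses` are hypothetical names, and the character class simply mirrors the ranges used in the loop above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

final class XmlCharFilter {
    // Matches every code point outside the ranges allowed by XML 1.0.
    private static final String INVALID =
        "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]";

    /** Removes all characters that an XML 1.0 parser would reject. */
    static String strip(String in) {
        return in == null ? "" : in.replaceAll(INVALID, "");
    }

    /** Returns true if the JDK's default parser accepts the document. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false; // parse error (the parser may also log it to stderr)
        }
    }

    public static void main(String[] args) {
        String dirty = "FCVolume" + (char) 0x1A; // value ending in the SUB control character
        System.out.println(parses("<column>" + dirty + "</column>"));        // false
        System.out.println(parses("<column>" + strip(dirty) + "</column>")); // true
    }
}
```

Stripping silently discards data, so it may be worth logging which values were changed.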
At the same time, though, I still had a UTF-8 compatibility problem:
org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence
After reading about returning XML as UTF-8 from a servlet, I simply tried what happens when I set the content type like this:
response.setContentType("text/xml;charset=utf-8");
It worked.
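The "Invalid byte 1 of 1-byte UTF-8 sequence" error typically means the bytes were written in a different encoding than the one the parser assumes. Declaring the charset in the content type before obtaining the response writer makes the servlet encode its output as UTF-8. The following sketch reproduces the mismatch without a servlet container; `EncodingMismatchDemo` and `parses` are hypothetical names, and the sample document is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

final class EncodingMismatchDemo {
    // The document claims to be UTF-8 and contains one non-ASCII character (u with umlaut).
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><column>gr\u00FCn</column>";

    /** Encodes XML with the given charset and reports whether the parser accepts the bytes. */
    static boolean parses(Charset actualEncoding) {
        byte[] bytes = XML.getBytes(actualEncoding);
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) {
            return false; // declared encoding and actual bytes disagree
        }
    }

    public static void main(String[] args) {
        System.out.println(parses(StandardCharsets.ISO_8859_1)); // false: 0xFC is not valid UTF-8
        System.out.println(parses(StandardCharsets.UTF_8));      // true
    }
}
```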
Which version of the JRE are you running? The SAX project says:
J2SE 1.4 bundles an old version of SAX2. How do I make SAX2 r2 or later available?
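To answer the version question, a small check like the following can print the running Java version and the SAX parser factory that is actually picked up. It is only a sketch; `ParserInfo` and its method names are hypothetical, and the output depends on the installation.

```java
import javax.xml.parsers.SAXParserFactory;

final class ParserInfo {
    /** Version string of the running JRE. */
    static String javaVersion() {
        return System.getProperty("java.version");
    }

    /** Class name of the SAX parser factory selected by JAXP. */
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + javaVersion());
        System.out.println("SAX factory:  " + saxFactoryClass());
    }
}
```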