Java: parsing XML with DOM, DOCTYPE gets erased

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/6637076/

Date: 2020-10-30 16:44:18  Source: igfitidea

Parsing xml with DOM, DOCTYPE gets erased

Tags: java, xml, dom, doctype

Asked by KitAndKat

Why does DOM in Java erase the DOCTYPE when editing XML?

I have this XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE map[ <!ELEMENT map (station*) >
                <!ATTLIST station  id   ID    #REQUIRED> ]>
<favoris>
<station id="5">test1</station>
<station id="6">test1</station>
<station id="8">test1</station>
</favoris> 

My function is very basic:

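import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

public static void EditStationName(int id, InputStream is, String path, String name)
        throws ParserConfigurationException, SAXException, IOException,
               TransformerFactoryConfigurationError, TransformerException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document dom = builder.parse(is);

    // Lookup by id relies on the DTD declaring station/@id as type ID
    Element e = dom.getElementById(String.valueOf(id));
    e.setTextContent(name);

    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    xformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
    Source source = new DOMSource(dom);
    // try-with-resources closes the file even if the transformation fails
    try (FileOutputStream fos = new FileOutputStream(path)) {
        xformer.transform(source, new StreamResult(fos));
    }
}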

It works, but the doctype gets erased! I get the whole document back without the doctype part, which is important to me because it is what allows me to retrieve elements by id. How can we keep the doctype? Why is it erased? I tried many solutions, for example with OutputKeys or omImpl.createDocumentType, but none of them worked...

Thank you!

Accepted answer by jasso

(This response is really only a supplement to @Grzegorz Szpetkowski's answer, explaining why it works.)

You lose the doctype definition because you use the Transformer class, which performs an XSL transformation. There is no DOCTYPE declaration or doctype definition object/node in the XSLT tree model. When a parser hands the document over to an XSLT processor, the doctype info is lost and therefore cannot be retained or duplicated. XSLT offers some control over the serialization of the output tree, including adding a <!DOCTYPE ... > declaration with a public or system identifier. The values for these identifiers need to be known beforehand and cannot be read from the input tree. Creating or retaining an embedded DTD or entity declarations is also not supported (although one workaround for this obstacle is to output it as text with disable-output-escaping="yes").

In order to preserve the DTD you need to output your document with an XML serializer instead of an XSL transformation, as Grzegorz already suggested.
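The difference described above can be checked with a small experiment. The following is only a sketch, assuming the XML implementation bundled with the JDK; the class name DoctypeComparison and its helper methods are made up for illustration. It parses a document with an inline DTD, then serializes it once through the identity XSL transformation and once through the DOM Level 3 LSSerializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question or answers.
class DoctypeComparison {

    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE favoris [\n"
        + "  <!ELEMENT favoris (station)+>\n"
        + "  <!ELEMENT station (#PCDATA)>\n"
        + "  <!ATTLIST station id ID #REQUIRED>\n"
        + "]>\n"
        + "<favoris><station id=\"i5\">test1</station></favoris>";

    /** Parses the given XML text into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse XML", ex);
        }
    }

    /** Serializes through the identity XSL transformation. */
    static String viaTransformer(Document dom) {
        try {
            Transformer xformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            xformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("transformation failed", ex);
        }
    }

    /** Serializes through the DOM Level 3 Load and Save serializer. */
    static String viaLsSerializer(Document dom) {
        DOMImplementationLS ls =
            (DOMImplementationLS) dom.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        StringWriter out = new StringWriter();
        output.setCharacterStream(out);
        serializer.write(dom, output);
        return out.toString();
    }

    public static void main(String[] args) {
        Document dom = parse(XML);
        System.out.println("Transformer keeps inline DTD:  "
                + viaTransformer(dom).contains("<!ATTLIST"));
        System.out.println("LSSerializer keeps inline DTD: "
                + viaLsSerializer(dom).contains("<!ATTLIST"));
    }
}
```

Writing to a StringWriter instead of a file is only for convenience here; the same two code paths apply when the target is a FileOutputStream.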

Answer by Grzegorz Szpetkowski

Your input XML is not valid. It should be:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [
    <!ELEMENT favoris (station)+>
    <!ELEMENT station (#PCDATA)>
    <!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">test1</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

As @DevNull wrote, to be fully valid you can't write <station id="5">test1</station> (although in Java it works anyway, even with that issue).



The DOCTYPE is erased in the output XML document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

I haven't found a solution to the missing DTD yet, but as a workaround you can set an external DTD:

xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "favoris.dtd");

Resulting (example) document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris SYSTEM "favoris.dtd">
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>


EDIT:


I don't think it's possible to save an inline DTD using the Transformer class. If you can't use an external DTD reference, you can use the DOM Level 3 LSSerializer class instead:

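import java.io.FileOutputStream;
import java.io.OutputStream;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
...

// Ask the DOM implementation for its Load and Save (LS) feature
DOMImplementationLS domImplementationLS =
    (DOMImplementationLS) dom.getImplementation().getFeature("LS", "3.0");
LSSerializer lsSerializer = domImplementationLS.createLSSerializer();
LSOutput lsOutput = domImplementationLS.createLSOutput();
// try-with-resources closes the stream even if write() throws
try (OutputStream outputStream = new FileOutputStream("output.xml")) {
    lsOutput.setByteStream(outputStream);
    lsSerializer.write(dom, lsOutput);
}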

Output with the wanted DTD (I can't see any option to add standalone="yes" using LSSerializer...):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE favoris [<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris> 

Another approach is to use the Apache Xerces2-J XMLSerializer class:

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
...

XMLSerializer serializer = new XMLSerializer();
serializer.setOutputCharStream(new java.io.FileWriter("output.xml"));
OutputFormat format = new OutputFormat();
format.setStandalone(true);
serializer.setOutputFormat(format);
serializer.serialize(dom);

Result:


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

Answer by Daniel Haley

@Grzegorz Szpetkowski has a good idea in using an external DTD. However, the XML is still invalid if you keep those station/@id values.

Any attribute with the type "ID" can't have a value that starts with a digit. You'll have to add something to it, like "s" for station:


<!DOCTYPE favoris [
<!ELEMENT favoris (station*)      > 
<!ELEMENT station (#PCDATA)       > 
<!ATTLIST station 
          id       ID   #REQUIRED > 
]>
<favoris>
  <station id="s5">test1</station>
  <station id="s6">test1</station>
  <station id="s8">test1</station>
</favoris>
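One way to see this rule in action is to parse with DTD validation switched on and collect the reported validity errors. This is only a sketch, assuming the parser bundled with the JDK; the class IdValidityCheck and the method validityErrors are hypothetical names introduced for illustration. Without validation the parser only checks well-formedness, which is why the question's id="5" appeared to work.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the original answers.
class IdValidityCheck {

    /** Parses a one-station document with DTD validation on and returns the validity errors. */
    static List<String> validityErrors(String stationId) {
        String xml =
            "<!DOCTYPE favoris [\n"
            + "<!ELEMENT favoris (station*)>\n"
            + "<!ELEMENT station (#PCDATA)>\n"
            + "<!ATTLIST station id ID #REQUIRED>\n"
            + "]>\n"
            + "<favoris><station id=\"" + stationId + "\">test1</station></favoris>";
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check against the DTD, not just well-formedness
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* not a validity error */ }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("document is not well-formed", ex);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("id=\"5\"  -> " + validityErrors("5"));
        System.out.println("id=\"s5\" -> " + validityErrors("s5"));
    }
}
```

The exact error text depends on the parser, so the sketch only collects the messages rather than matching on them.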

Answer by mmarinero

I had almost the same problem and found this approach, which works with the Transformer. It is limited, since it only allows referencing the DTD, and it will require some work if the doctype of the document can vary. It was enough in my case, though: I only needed to hardcode the XHTML doctype after a transformation.

xformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "publicId");
xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "systemId");
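If the doctype can vary between documents, one possible way to avoid hardcoding is to read the identifiers from the parsed DOM and pass them to the Transformer. The following is a sketch under that assumption; the class DoctypeIdCopier and the method copyDoctypeIds are hypothetical names, and "favoris.dtd" is just the example system identifier used earlier. It carries over only the public and system identifiers, not an inline DTD.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

// Hypothetical helper class, not part of the original answers.
class DoctypeIdCopier {

    /** Copies the DOCTYPE's public/system identifiers, if present, to the transformer. */
    static void copyDoctypeIds(Document dom, Transformer xformer) {
        DocumentType doctype = dom.getDoctype();
        if (doctype == null) {
            return; // nothing to copy
        }
        if (doctype.getPublicId() != null) {
            xformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
        }
        if (doctype.getSystemId() != null) {
            xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
        }
    }

    /** Builds a tiny document with an external DTD reference and serializes it. */
    static String demo() {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            DocumentType doctype = impl.createDocumentType("favoris", null, "favoris.dtd");
            Document dom = impl.createDocument(null, "favoris", doctype);

            Transformer xformer = TransformerFactory.newInstance().newTransformer();
            copyDoctypeIds(dom, xformer);

            StringWriter out = new StringWriter();
            xformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("serialization failed", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```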