Disclaimer: This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow original: http://stackoverflow.com/questions/2342808/

Date: 2020-08-13 06:23:05 Source: igfitidea

How to validate an XML file using Java with an XSD having an include?

Tags: java, xml, validation, xsd

Question by Melanie

I'm using the Java 5 javax.xml.validation.Validator to validate an XML file. I've done it for one schema that uses only imports, and everything works fine. Now I'm trying to validate against another schema that uses an import and one include. The problem is that the elements in the main schema are ignored: validation says it cannot find their declarations.

Here is how I build the Schema:

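SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
InputStream includeInputStream = getClass().getClassLoader().getResource("include.xsd").openStream();
InputStream importInputStream = getClass().getClassLoader().getResource("import.xsd").openStream();
InputStream mainInputStream = getClass().getClassLoader().getResource("main.xsd").openStream();
// Each InputStream has to be wrapped in a Source; these StreamSources carry no system ID
Source[] sourceSchema = new Source[]{new StreamSource(includeInputStream),
        new StreamSource(importInputStream), new StreamSource(mainInputStream)};
Schema schema = factory.newSchema(sourceSchema);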

Here is an extract of the declarations in main.xsd:

<xsd:schema xmlns="http://schema.omg.org/spec/BPMN/2.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:import="http://www.foo.com/import" targetNamespace="http://main/namespace" elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xsd:import namespace="http://www.foo.com/import" schemaLocation="import.xsd"/>
    <xsd:include schemaLocation="include.xsd"/>
    <xsd:element name="element" type="tElement"/>
    <...>
</xsd:schema>

If I copy the content of my included XSD into main.xsd, it works fine. If I don't, validation doesn't find the declaration of "Element".

Accepted answer by Stefan De Boey

You need to use an LSResourceResolver for this to work. Please take a look at the sample code below.

A validate method:

import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Note that if your XML already declares the XSD to which it has to conform, then there's no need to declare the schemaName here
void validate(String xml, String schemaName) throws Exception {

    DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
    builderFactory.setNamespaceAware(true);

    DocumentBuilder parser = builderFactory
            .newDocumentBuilder();

    // Parse the XML string into a document object (InputSource + StringReader are standard JDK classes)
    Document document = parser.parse(new InputSource(new StringReader(xml)));

    SchemaFactory factory = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

    // Associate the schema factory with the resource resolver, which is responsible for resolving the imported and included XSDs
    factory.setResourceResolver(new ResourceResolver());

    // Note that if your XML already declares the XSD to which it has to conform, then there's no need to create a validator from a Schema object
    Source schemaFile = new StreamSource(getClass().getClassLoader()
            .getResourceAsStream(schemaName));
    Schema schema = factory.newSchema(schemaFile);

    Validator validator = schema.newValidator();
    validator.validate(new DOMSource(document));
}

The resource resolver implementation:

import java.io.InputStream;

import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

public class ResourceResolver implements LSResourceResolver {

    public LSInput resolveResource(String type, String namespaceURI,
            String publicId, String systemId, String baseURI) {

        // Note: in this sample, the XSDs are expected to be in the root of the classpath
        InputStream resourceAsStream = this.getClass().getClassLoader()
                .getResourceAsStream(systemId);
        return new Input(publicId, systemId, resourceAsStream);
    }

}

The Input implementation returned by the resource resolver:

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

import org.w3c.dom.ls.LSInput;

public class Input implements LSInput {

    private String publicId;

    private String systemId;

    private BufferedInputStream inputStream;

    public Input(String publicId, String sysId, InputStream input) {
        this.publicId = publicId;
        this.systemId = sysId;
        this.inputStream = new BufferedInputStream(input);
    }

    public String getPublicId() {
        return publicId;
    }

    public void setPublicId(String publicId) {
        this.publicId = publicId;
    }

    public String getBaseURI() {
        return null;
    }

    public InputStream getByteStream() {
        return null;
    }

    public boolean getCertifiedText() {
        return false;
    }

    public Reader getCharacterStream() {
        return null;
    }

    public String getEncoding() {
        return null;
    }

    public String getStringData() {
        synchronized (inputStream) {
            try {
                // Read until end of stream: available() is only an estimate, and a single
                // read() call may return fewer bytes than requested
                ByteArrayOutputStream contents = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int count;
                while ((count = inputStream.read(buffer)) != -1) {
                    contents.write(buffer, 0, count);
                }
                // Decoded with the platform default charset
                return new String(contents.toByteArray());
            } catch (IOException e) {
                e.printStackTrace();
                return null;
            }
        }
    }

    public void setBaseURI(String baseURI) {
    }

    public void setByteStream(InputStream byteStream) {
    }

    public void setCertifiedText(boolean certifiedText) {
    }

    public void setCharacterStream(Reader characterStream) {
    }

    public void setEncoding(String encoding) {
    }

    public void setStringData(String stringData) {
    }

    public String getSystemId() {
        return systemId;
    }

    public void setSystemId(String systemId) {
        this.systemId = systemId;
    }

    public BufferedInputStream getInputStream() {
        return inputStream;
    }

    public void setInputStream(BufferedInputStream inputStream) {
        this.inputStream = inputStream;
    }
}
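The resolver idea from this answer can be tried without any files on the classpath. The sketch below is a hedged, self-contained illustration rather than the answer's own code: it serves hypothetical main.xsd and include.xsd contents from a Map, builds each LSInput through the standard DOMImplementationLS.createLSInput() instead of a hand-written class, and hands the parser a byte stream so it can detect the encoding itself. It also validates directly from a StreamSource, without building a DOM first. The class name and schema contents are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.xml.sax.SAXException;

public class InMemoryIncludeDemo {

    // Hypothetical schemas: main.xsd includes include.xsd, which defines the type tElement
    static final String HEADER = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
            + " xmlns='http://main/namespace' targetNamespace='http://main/namespace'"
            + " elementFormDefault='qualified'>";
    static final String MAIN_XSD = HEADER
            + "<xsd:include schemaLocation='include.xsd'/>"
            + "<xsd:element name='element' type='tElement'/></xsd:schema>";
    static final String INCLUDE_XSD = HEADER
            + "<xsd:complexType name='tElement'><xsd:sequence>"
            + "<xsd:element name='name' type='xsd:string'/>"
            + "</xsd:sequence></xsd:complexType></xsd:schema>";

    static Schema compile(Map<String, String> schemas, String mainName) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS) DOMImplementationRegistry
                .newInstance().getDOMImplementation("LS");
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Called for each include/import; with the JDK's built-in parser, systemId is the
        // schemaLocation value as written in the including schema
        factory.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
            String content = schemas.get(systemId);
            if (content == null) {
                return null; // fall back to the parser's default resolution
            }
            LSInput input = ls.createLSInput();
            input.setSystemId(systemId);
            input.setByteStream(new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
            return input;
        });
        return factory.newSchema(new StreamSource(new StringReader(schemas.get(mainName))));
    }

    static boolean isValid(Schema schema, String xml) {
        try {
            // Validates straight from the string, without building a DOM first
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> schemas = new HashMap<>();
        schemas.put("main.xsd", MAIN_XSD);
        schemas.put("include.xsd", INCLUDE_XSD);
        Schema schema = compile(schemas, "main.xsd");
        System.out.println(isValid(schema, "<element xmlns='http://main/namespace'><name>x</name></element>"));
        System.out.println(isValid(schema, "<element xmlns='http://main/namespace'><other/></element>"));
    }
}
```

Returning a byte stream instead of pre-decoded string data leaves encoding detection, including any byte order mark, to the XML parser.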

Answer by valerian

SchemaFactory schemaFactory = SchemaFactory
        .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Source schemaFile = new StreamSource(getClass().getClassLoader()
        .getResourceAsStream("cars-fleet.xsd"));
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
// StreamSource(String) treats its argument as a system ID (a URI or file path), not as XML content;
// to validate an XML string, use new StreamSource(new StringReader(xml)) instead
StreamSource source = new StreamSource(xml);
validator.validate(source);

Answer by AMegmondoEmber

For us, resolveResource looked like this. We got there after a "prolog" exception and some odd errors: Element type "xs:schema" must be followed by either attribute specifications, ">" or "/>". Element type "xs:element" must be followed by either attribute specifications, ">" or "/>". (These were caused by elements broken across multiple lines.)

The path history was needed because of the structure of our includes:

main.xsd (this has include "includes/subPart.xsd")
/includes/subPart.xsd (this has include "./subSubPart.xsd")
/includes/subSubPart.xsd

So the code looks like:

String pathHistory = "";

@Override
public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
    systemId = systemId.replace("./", ""); // we don't need this since getResourceAsStream cannot understand it
    // Message is a class from our own project, used only to reach the class loader
    InputStream resourceAsStream = Message.class.getClassLoader().getResourceAsStream(systemId);
    if (resourceAsStream == null) {
        resourceAsStream = Message.class.getClassLoader().getResourceAsStream(pathHistory + systemId);
    } else {
        pathHistory = getNormalizedPath(systemId);
    }
    // The delimiter regex \A matches only at the start of input, so next() returns the whole stream
    Scanner s = new Scanner(resourceAsStream).useDelimiter("\\A");
    String s1 = s.next()
            .replaceAll("\n", " ") // the parser cannot understand elements broken across multiple lines e.g. (<xs:element \n name="buxing">)
            .replace("\t", " ") // these two whitespace replacements are only cosmetic
            .replaceAll("\\s+", " ")
            .replaceAll("[^\\x20-\\x7e]", ""); // some files have a special character (a UTF-8 byte order mark) first; this also strips all other non-ASCII characters
    InputStream is = new ByteArrayInputStream(s1.getBytes());

    return new LSInputImpl(publicId, systemId, is); // LSInputImpl: an LSInput implementation like the Input class above
}

private String getNormalizedPath(String baseURI) {
    // Classpath resource names always use "/", regardless of the platform's file separator
    return baseURI.substring(0, baseURI.lastIndexOf("/") + 1);
}
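The pathHistory field above is shared mutable state that only remembers the last directory. An alternative technique, sketched here under stated assumptions, is to use the baseURI argument the parser already passes to the resolver: when the including schema has a system ID, a relative schemaLocation can be resolved with java.net.URL's context constructor, which also normalizes ./ and ../ segments. The class name is hypothetical, and the sketch assumes that schemas arriving without a baseURI live at the classpath root.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

public class BaseUriResourceResolver implements LSResourceResolver {

    private final DOMImplementationLS ls;

    public BaseUriResourceResolver() {
        try {
            ls = (DOMImplementationLS) DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (Exception e) {
            throw new IllegalStateException("No DOM Level 3 Load and Save implementation available", e);
        }
    }

    // Resolves schemaLocation against the URL of the including schema (e.g. a file: or jar: URL)
    static URL resolve(String baseURI, String systemId) throws MalformedURLException {
        if (baseURI == null) {
            // Assumption: schemas loaded without a system ID live at the classpath root
            return BaseUriResourceResolver.class.getClassLoader().getResource(systemId);
        }
        return new URL(new URL(baseURI), systemId);
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI, String publicId,
                                   String systemId, String baseURI) {
        try {
            URL url = resolve(baseURI, systemId);
            if (url == null) {
                return null; // fall back to the parser's default resolution
            }
            LSInput input = ls.createLSInput();
            input.setPublicId(publicId);
            // An absolute system ID gives nested includes a correct base to resolve against
            input.setSystemId(url.toExternalForm());
            input.setByteStream(url.openStream()); // the parser detects the encoding (and any BOM) itself
            return input;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the returned LSInput carries an absolute system ID, a nested include such as ./subSubPart.xsd should arrive with that URL as its baseURI, so no path history needs to be kept between calls.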

Answer by Ramakrishna

If an element cannot be found in the XML, you will get an xml:lang exception. Element names are case-sensitive.

Answer by burcakulug

I had to make some modifications to this post by AMegmondoEmber.

My main schema file had some includes from sibling folders, and the included files also had some includes from their local folders. I also had to track the base resource path and the relative path of the current resource. This code works for me now, but please keep in mind that it assumes all XSD files have unique names. If you have XSD files with the same name but different content at different paths, it will probably give you problems.

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

/**
 * The Class ResourceResolver.
 */
public class ResourceResolver implements LSResourceResolver {

    /** The logger. */
    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    /** The schema base path. */
    private final String schemaBasePath;

    /** The path map. */
    private Map<String, String> pathMap = new HashMap<String, String>();

    /**
     * Instantiates a new resource resolver.
     *
     * @param schemaBasePath the schema base path
     */
    public ResourceResolver(String schemaBasePath) {
        this.schemaBasePath = schemaBasePath;
        logger.warn("This LSResourceResolver implementation assumes that all XSD files have a unique name. "
                + "If you have some XSD files with same name but different content (at different paths) in your schema structure, "
                + "this resolver will fail to include the other XSD files except the first one found.");
    }

    /* (non-Javadoc)
     * @see org.w3c.dom.ls.LSResourceResolver#resolveResource(java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String)
     */
    @Override
    public LSInput resolveResource(String type, String namespaceURI,
            String publicId, String systemId, String baseURI) {
        // The base resource that includes this current resource
        String baseResourceName = null;
        String baseResourcePath = null;
        // Extract the current resource name
        String currentResourceName = systemId.substring(systemId
                .lastIndexOf("/") + 1);

        // If this resource hasn't been added yet
        if (!pathMap.containsKey(currentResourceName)) {
            if (baseURI != null) {
                baseResourceName = baseURI
                        .substring(baseURI.lastIndexOf("/") + 1);
            }

            // We don't need a leading "./" since getResourceAsStream cannot resolve it
            if (systemId.startsWith("./")) {
                systemId = systemId.substring(2, systemId.length());
            }

            // If the baseResourcePath has already been discovered, get that
            // from pathMap
            if (pathMap.containsKey(baseResourceName)) {
                baseResourcePath = pathMap.get(baseResourceName);
            } else {
                // The baseResourcePath should be the schemaBasePath
                baseResourcePath = schemaBasePath;
            }

            // Read the resource as input stream
            String normalizedPath = getNormalizedPath(baseResourcePath, systemId);
            InputStream resourceAsStream = this.getClass().getClassLoader()
                    .getResourceAsStream(normalizedPath);

            // if the current resource is not in the same path with base
            // resource, add current resource's path to pathMap
            if (systemId.contains("/")) {
                pathMap.put(currentResourceName, normalizedPath.substring(0,normalizedPath.lastIndexOf("/")+1));
            } else {
                // The current resource should be at the same path as the base
                // resource
                pathMap.put(systemId, baseResourcePath);
            }
            // The delimiter regex \A matches only at the start of input, so next() returns the whole stream
            Scanner s = new Scanner(resourceAsStream).useDelimiter("\\A");
            String s1 = s.next().replaceAll("\n", " ") // the parser cannot understand elements broken across multiple lines e.g. (<xs:element \n name="buxing">)
                    .replace("\t", " ") // these two whitespace replacements are only cosmetic
                    .replaceAll("\\s+", " ").replaceAll("[^\\x20-\\x7e]", ""); // some files have a special character (a UTF-8 byte order mark) first; this also strips all other non-ASCII characters
            InputStream is = new ByteArrayInputStream(s1.getBytes());

            return new LSInputImpl(publicId, systemId, is); // same as Input class
        }

        // If this resource has already been added, do not add the same resource again. It throws
        // "org.xml.sax.SAXParseException: sch-props-correct.2: A schema cannot contain two global components with the same name; this schema contains two occurrences of ..."
        // return null instead.
        return null;
    }

    /**
     * Gets the normalized path.
     *
     * @param basePath the base path
     * @param relativePath the relative path
     * @return the normalized path
     */
    private String getNormalizedPath(String basePath, String relativePath){
        if(!relativePath.startsWith("../")){
            return basePath + relativePath;
        }
        else{
            while(relativePath.startsWith("../")){
                basePath = basePath.substring(0,basePath.substring(0, basePath.length()-1).lastIndexOf("/")+1);
                relativePath = relativePath.substring(3);
            }
            return basePath+relativePath;
        }
    }
}
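The manual "../" handling in getNormalizedPath can also be delegated to java.net.URI: resolve() drops the last segment of the base path, appends the relative reference and removes "." and ".." segments, and it works on relative, scheme-less classpath paths as well. A minimal sketch; the helper class and method names are hypothetical, and it assumes resource paths contain no characters that are illegal in a URI (such as spaces).

```java
import java.net.URI;

public final class ClasspathPaths {

    private ClasspathPaths() {
    }

    /**
     * Resolves a schemaLocation against the classpath path of the schema that contains it,
     * e.g. ("xsd/includes/subPart.xsd", "../common/types.xsd") gives "xsd/common/types.xsd".
     */
    public static String resolve(String includingPath, String schemaLocation) {
        // Relative-reference resolution plus dot-segment removal; an absolute
        // schemaLocation (e.g. http://...) would be returned unchanged by resolve()
        return URI.create(includingPath).resolve(schemaLocation).getPath();
    }
}
```

With such a helper, pathMap could store each resource's full classpath path, and the result could be passed directly to getResourceAsStream.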

Answer by teknopaul

The accepted answer is very verbose and builds a DOM in memory first. For me, includes seem to work out of the box, including relative references:

    SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    Schema schema = schemaFactory.newSchema(new File("../foo.xsd"));
    Validator validator = schema.newValidator();
    validator.validate(new StreamSource(new File("./foo.xml")));
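This works because newSchema(File) gives the schema factory a system ID (the file's location), so relative schemaLocation values in xsd:include and xsd:import are resolved against the including file, with no resolver needed. A minimal self-contained sketch that writes a hypothetical two-file schema (the include sits in a subdirectory) to a temporary directory and validates against it; the class name, file names and schema content are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class FileIncludeDemo {

    static Schema compileFromDirectory(Path dir) throws IOException, SAXException {
        String ns = "http://main/namespace";
        String header = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns='" + ns
                + "' targetNamespace='" + ns + "' elementFormDefault='qualified'>";
        // The included file sits in a subdirectory to exercise relative resolution
        Files.createDirectories(dir.resolve("common"));
        Files.write(dir.resolve("common/types.xsd"), (header
                + "<xsd:simpleType name='tElement'><xsd:restriction base='xsd:string'/></xsd:simpleType>"
                + "</xsd:schema>").getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("main.xsd"), (header
                + "<xsd:include schemaLocation='common/types.xsd'/>"
                + "<xsd:element name='element' type='tElement'/>"
                + "</xsd:schema>").getBytes(StandardCharsets.UTF_8));
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Passing a File (not a stream) lets the factory resolve 'common/types.xsd' relative to main.xsd
        return factory.newSchema(dir.resolve("main.xsd").toFile());
    }

    static boolean isValid(Schema schema, String xml) throws IOException {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("xsd-demo");
        Schema schema = compileFromDirectory(dir);
        System.out.println(isValid(schema, "<element xmlns='http://main/namespace'>text</element>"));
        System.out.println(isValid(schema, "<other xmlns='http://main/namespace'/>"));
    }
}
```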

Answer by gil.fernandes

The accepted answer is perfectly OK, but it does not work with Java 8 without some modifications. It would also be nice to be able to specify a base path from which the imported schemas are read.

In Java 8 I used the following code, which lets you specify an embedded schema path other than the classpath root:

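// JDK-internal Xerces class: present in Java 8, but not a public API and may be inaccessible on newer JDKs
import com.sun.org.apache.xerces.internal.dom.DOMInputImpl;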
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

import java.io.InputStream;
import java.util.Objects;

public class ResourceResolver implements LSResourceResolver {

    private String basePath;

    public ResourceResolver(String basePath) {
        this.basePath = basePath;
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
        // Note: the XSDs are looked up under basePath on the classpath (or at the root if basePath is null)
        InputStream resourceAsStream = this.getClass().getClassLoader()
                .getResourceAsStream(buildPath(systemId));
        Objects.requireNonNull(resourceAsStream, String.format("Could not find the specified xsd file: %s", systemId));
        return new DOMInputImpl(publicId, systemId, baseURI, resourceAsStream, "UTF-8");
    }

    private String buildPath(String systemId) {
        return basePath == null ? systemId : String.format("%s/%s", basePath, systemId);
    }
}

This implementation also gives the user a meaningful message in case the schema cannot be read.
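The DOMInputImpl import above is a JDK-internal class. A sketch of the same resolver built only on public APIs, assuming a DOM Level 3 Load and Save implementation is reachable through DOMImplementationRegistry (the JDK ships one); the class name PublicApiResourceResolver is hypothetical, and the UTF-8 encoding assumption is carried over from the code above.

```java
import java.io.InputStream;
import java.util.Objects;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

public class PublicApiResourceResolver implements LSResourceResolver {

    private final String basePath;
    private final DOMImplementationLS domLs;

    public PublicApiResourceResolver(String basePath) {
        this.basePath = basePath;
        try {
            // Standard DOM Level 3 bootstrap instead of the JDK-internal DOMInputImpl
            this.domLs = (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("No DOM Level 3 Load and Save implementation found", e);
        }
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI, String publicId,
                                   String systemId, String baseURI) {
        // XSDs are looked up under basePath on the classpath (or at the root if basePath is null)
        InputStream resourceAsStream = getClass().getClassLoader().getResourceAsStream(buildPath(systemId));
        Objects.requireNonNull(resourceAsStream, String.format("Could not find the specified xsd file: %s", systemId));
        LSInput input = domLs.createLSInput();
        input.setPublicId(publicId);
        input.setSystemId(systemId);
        input.setBaseURI(baseURI);
        input.setByteStream(resourceAsStream);
        input.setEncoding("UTF-8"); // assumes UTF-8 schema files
        return input;
    }

    String buildPath(String systemId) {
        return basePath == null ? systemId : String.format("%s/%s", basePath, systemId);
    }
}
```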

Answer by Gordon Daugherty

As user "ulab" points out in a comment on another answer, the solution described in this answer (to a separate Stack Overflow question) will work for many people. Here's the rough outline of that approach:

SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
URL xsdURL = this.getClass().getResource("/xsd/my-schema.xsd");
Schema schema = schemaFactory.newSchema(xsdURL);

The key to this approach is avoiding handing the schema factory a stream and instead giving it a URL. This way it gets information about the location of the XSD file.

One thing to keep in mind: when the "schemaLocation" attribute on include and/or import elements is a simple relative path such as "my-common.xsd" or "common/some-concept.xsd", it is treated as relative to the classpath location of the XSD file whose URL you handed to the schema factory.

Notes:
- In the example above I've placed the schema file into a jar file under an "xsd" folder.
- The leading slash in the "getResource" argument tells Java to start at the root of the classloader instead of at the "this" object's package name.
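If some API forces you to hand over a stream, the same effect can be had by giving the StreamSource the URL as its system ID: relative schemaLocation values are then resolved against it, just as with newSchema(URL). A minimal sketch; the class and method names are hypothetical, and the URL would typically come from a call such as getClass().getResource("/xsd/my-schema.xsd").

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class UrlSchemaLoader {

    /** Loads a schema from a URL, for example one returned by Class.getResource. */
    static Schema load(URL xsdUrl) throws IOException, SAXException {
        if (xsdUrl == null) {
            throw new IllegalArgumentException("Schema resource not found");
        }
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try (InputStream in = xsdUrl.openStream()) {
            // The second argument is the system ID: relative schemaLocation values in
            // xsd:include/xsd:import are resolved against it
            return factory.newSchema(new StreamSource(in, xsdUrl.toExternalForm()));
        }
    }
}
```

Without the second argument the factory sees only bytes, which is exactly the situation in the question, where the include could not be located.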
