Note: This page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/18704586/


Testing whether or not something is parseable XML in C#

Tags: c#, .net, xml

Asked by user978122

Does anyone know of a quick way to check if a string is parseable as XML in C#? Preferably something quick and low on resources that returns a boolean indicating whether or not it will parse.

I'm working on a database app which deals with errors that are sometimes stored as XML, and sometimes not. Hence, I'd like to just be able to test the string I grab from the database (contained in a DataTable) very quickly...and not have to resort to any try / catch {} statements or other kludges...unless those are the only way to make it happen.

Accepted answer by D Stanley

It sounds like you sometimes get back XML and sometimes get back "plain" (non-XML) text.

If that's the case you could just check that the text starts with <:

if (!string.IsNullOrEmpty(str) && str.TrimStart().StartsWith("<"))
{
    // A declaration cannot be the unbraced body of an if statement in C#.
    var doc = XDocument.Parse(str);
}

Since "plain" messages seem unlikely to start with <, this may be reasonable. The only thing you need to decide is what to do in the edge case where non-XML text starts with a <.
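To make the edge case concrete, here is a minimal sketch of the prefix check as a standalone helper. The class and method names (XmlHeuristics, LooksLikeXml) are hypothetical and not from the original answer. It only reports whether the first non-whitespace character is <, so it is a cheap filter, not a well-formedness test.

```csharp
using System;

public static class XmlHeuristics
{
    // Hypothetical helper: a cheap pre-check only.
    // Returns true when the first non-whitespace character is '<'.
    // It does NOT prove that the text is well-formed XML.
    public static bool LooksLikeXml(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return false;
        }

        string trimmed = text.TrimStart();
        return trimmed.Length > 0 && trimmed[0] == '<';
    }
}
```

A message such as "<3 disk full" passes this check even though it is not XML, which is exactly the edge case that still has to be handled by actually parsing.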

If it were me I would default to trying to parse it and catching the exception:

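if (!string.IsNullOrEmpty(str) && str.TrimStart().StartsWith("<"))
{
    try
    {
        var doc = XDocument.Parse(str);
        return /* ??? whatever you want to do with the parsed XML */;
    }
    catch (Exception)
    {
        // Starts with '<' but is not well-formed XML: fall back to the raw text.
        return str;
    }
}
else
{
    return str;
}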

That way the only time you have the overhead of a thrown exception is when you have a message that starts with < but is not valid XML.
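For readers who want a compilable version of this pattern, the following is a sketch in the Try-pattern style. The names XmlProbe and TryParseXml are hypothetical, and returning the parsed document through an out parameter is an assumption about what the caller wants, since the original leaves the success value open. Catching only XmlException is also a narrowing of the original's catch-all.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlProbe
{
    // Hypothetical helper: returns true and the parsed document when the text
    // is well-formed XML; otherwise returns false and sets document to null.
    public static bool TryParseXml(string text, out XDocument document)
    {
        document = null;

        // Cheap prefix check first, so plain messages never reach the parser.
        if (string.IsNullOrEmpty(text) ||
            !text.TrimStart().StartsWith("<", StringComparison.Ordinal))
        {
            return false;
        }

        try
        {
            document = XDocument.Parse(text);
            return true;
        }
        catch (XmlException)
        {
            // Starts with '<' but is not well-formed XML.
            return false;
        }
    }
}
```

A caller could then write if (XmlProbe.TryParseXml(str, out var doc)) { ... } else { /* treat str as plain text */ }, and the exception cost is paid only for text that starts with < but does not parse.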

Answer by John Kraft

You could try to parse the string into an XDocument. If it fails to parse, then you know that it is not valid.

string xml = ""; // the string to test
XDocument document = XDocument.Parse(xml); // throws XmlException if xml is not well-formed

And if you don't want to have the ugly try/catch visible, you can throw it into an extension method on the string class...

public static bool IsValidXml(this string xml)
{
    try
    {
        XDocument.Parse(xml);
        return true;
    }
    catch
    {
        return false;
    }
}

Then your code simply looks like if (mystring.IsValidXml()) {

Answer by Gary Walker

The best answer I've seen for testing for well-formed XML is the question "What is the fastest way to programatically check the well-formedness of XML files in C#?" It covers using an XmlReader to do this efficiently.
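The linked answer is not reproduced here, so the following is only a sketch of what an XmlReader-based check might look like for a string input. The class and method names are hypothetical, and the settings (ignoring DTDs, no resolver) are assumptions that may need adjusting for your data. The reader streams through the text without building a document tree.

```csharp
using System.IO;
using System.Xml;

public static class XmlWellFormedness
{
    // Hypothetical helper: reads the text to the end with a forward-only
    // XmlReader and reports whether any XmlException was raised.
    public static bool IsWellFormed(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return false;
        }

        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Document, // require a single root element
            DtdProcessing = DtdProcessing.Ignore,         // assumption: skip any DOCTYPE
            XmlResolver = null                            // assumption: never fetch external resources
        };

        try
        {
            using (var stringReader = new StringReader(text))
            using (var xmlReader = XmlReader.Create(stringReader, settings))
            {
                while (xmlReader.Read())
                {
                    // Nothing to do: reading is the check.
                }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

This still relies on try/catch internally, but it avoids allocating an XDocument when all you need is a yes/no answer.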

Answer by Nicholas Carey

The only way you can really find out if something will actually parse is to...try and parse it.

An XML document should (but may not) have an XML declaration at the head of the file, following the BOM (if present). It should look something like this:

<?xml version="1.0" encoding="UTF-8" ?>

Though the encoding attribute is, I believe, optional (defaulting to UTF-8). It might also have a standalone attribute whose value is yes or no. If that is present, that's a pretty good indicator that the document is supposed to be valid XML.
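If you want to look at the declaration from code, one option is to parse the text and inspect XDocument.Declaration, which is null when the document has no declaration. This is a sketch only: the class and method names are hypothetical, and it parses the whole document, so it is not a cheap pre-check.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class XmlDeclarationInfo
{
    // Hypothetical helper: returns the XML declaration of the text, or null
    // when the text is null, is not well-formed XML, or has no declaration.
    public static XDeclaration ReadDeclaration(string text)
    {
        if (text == null)
        {
            return null;
        }

        try
        {
            XDocument document = XDocument.Parse(text);
            return document.Declaration;
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```

The returned XDeclaration exposes Version, Encoding, and Standalone as strings, so a caller can check, for example, whether Standalone is "yes".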

Riffing on @GaryWalker's excellent answer, I think something like this is about as good as it gets (though the settings might need some tweaking, perhaps a custom no-op resolver). Just for kicks, I generated a 300 MB random XML file using XMark xmlgen (http://www.xml-benchmark.org/): validating it with the code below takes 1.7–1.8 seconds of elapsed time on my desktop machine.

// Requires: using System.IO; using System.Text; using System.Xml; using System.Xml.Schema;
public static bool IsMinimallyValidXml( Stream stream )
{
  XmlReaderSettings settings = new XmlReaderSettings
    {
      CheckCharacters              = true                          ,
      ConformanceLevel             = ConformanceLevel.Document     ,
      DtdProcessing                = DtdProcessing.Ignore          ,
      IgnoreComments               = true                          ,
      IgnoreProcessingInstructions = true                          ,
      IgnoreWhitespace             = true                          ,
      ValidationFlags              = XmlSchemaValidationFlags.None ,
      ValidationType               = ValidationType.None           ,
    } ;
  bool isValid ;

  using ( XmlReader xmlReader = XmlReader.Create( stream , settings ) )
  {
    try
    {
      while ( xmlReader.Read() )
      {
        ; // This space intentionally left blank
      }
      isValid = true ;
    }
    catch (XmlException)
    {
      isValid = false ;
    }
  }
  return isValid ;
}

static void Main( string[] args )
{
  string text = "<foo>This &SomeEntity; is about as simple as it gets.</foo>" ;
  Stream stream = new MemoryStream( Encoding.UTF8.GetBytes(text) ) ;
  bool isValid = IsMinimallyValidXml( stream ) ;
  return ;
}