Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/16901828/

Time: 2020-08-10 08:06:06  Source: igfitidea

Parsing XML file using C#?

c# xml xml-parsing tags

Asked by jerryh91

I'm new to both XML and C#. I'm trying to find a way to efficiently parse a given XML file and retrieve the relevant numerical values based on the "proj_title" value (heat_run or any other possible value). For example, calculating the duration of a particular test run (proj_end value minus proj_start value).

ex.xml:

<proj ID="2">
      <proj_title>heat_run</proj_title>
      <proj_start>100</proj_start>
      <proj_end>200</proj_end>
</proj>

... We can't search by proj ID since this value is not fixed from test run to test run. The above file is huge: ~8 MB, with ~2000 tags named proj_title. Is there an efficient way to first find all proj_title tags whose value is "heat_run", and then retrieve the proj_start and proj_end values for that particular proj_title using C#?

Here's my current C# code:

using System.Xml;

public class parser
{
     public static void Main()
     {
         XmlDocument xmlDoc = new XmlDocument();
         xmlDoc.Load("ex.xml");

         // ~2000 tags w/ proj_title
         // any more efficient way to just look for proj_title="heat_run" specifically?
         XmlNodeList heat_run_nodes = xmlDoc.GetElementsByTagName("proj_title");
     }
}

Accepted answer by wgraham

You can use XPath to find all nodes that match, for example:

// "//proj" matches proj elements at any depth; a bare "proj" evaluated on the
// document would only match a root element named proj.
XmlNodeList matches = xmlDoc.SelectNodes("//proj[proj_title='heat_run']");

matches will contain all proj nodes that match the criteria. Learn more about XPath: http://www.w3schools.com/xsl/xpath_syntax.asp

MSDN Documentation on SelectNodes
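
As a minimal sketch, the XPath approach could be extended to compute the duration the question asks about. The wrapping projects root element, the XPathDurations class, the Durations helper, and the inline sample data are assumptions for illustration; the question only shows a single proj element.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathDurations
{
    // Hypothetical helper: returns proj_end - proj_start for every <proj>
    // whose <proj_title> equals the given title.
    // Assumes the title contains no single quote, since it is spliced
    // directly into the XPath expression.
    public static List<int> Durations(string xml, string title)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<int>();
        foreach (XmlNode proj in doc.SelectNodes("//proj[proj_title='" + title + "']"))
        {
            // Relative XPath: looks only at direct children of this <proj>.
            int start = int.Parse(proj.SelectSingleNode("proj_start").InnerText);
            int end = int.Parse(proj.SelectSingleNode("proj_end").InnerText);
            result.Add(end - start);
        }
        return result;
    }

    public static void Main()
    {
        // Assumed sample document; the real file would be loaded with doc.Load("ex.xml").
        const string xml = @"<projects>
  <proj ID=""1""><proj_title>cold_run</proj_title><proj_start>5</proj_start><proj_end>50</proj_end></proj>
  <proj ID=""2""><proj_title>heat_run</proj_title><proj_start>100</proj_start><proj_end>200</proj_end></proj>
</projects>";

        foreach (int duration in Durations(xml, "heat_run"))
        {
            Console.WriteLine("Duration: {0}", duration);
        }
    }
}
```

This sketch assumes every matching proj has both a proj_start and a proj_end child; a missing child would make SelectSingleNode return null and the lookup throw.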

Answer by Philip Stuyck

Use XDocument and the LINQ API. http://msdn.microsoft.com/en-us/library/bb387098.aspx

If the performance is not what you expect after trying it, look for a SAX parser. A SAX parser does not load the whole document into memory and then apply an XPath expression to everything in memory. It works in an event-driven fashion, which in some cases can be a lot faster and use much less memory.

There are probably SAX parsers for .NET out there; I haven't used them myself for .NET, but I have for C++.
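
As a rough sketch of the streaming idea, .NET's built-in XmlReader can be used. Note that XmlReader is a forward-only pull parser rather than a true event-driven SAX parser, so this is a swapped-in technique and not the one the answer names; it does share the key property of never holding the whole document in memory. The StreamingDurations class, the Durations helper, the projects root element, and the sample data are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingDurations
{
    // Hypothetical helper: streams through the XML once and returns
    // proj_end - proj_start for every <proj> whose <proj_title> equals title.
    public static List<int> Durations(TextReader input, string title)
    {
        var result = new List<int>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            string currentElement = null; // name of the element whose text we are inside
            string currentTitle = null;
            int start = 0, end = 0;

            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        currentElement = reader.Name;
                        if (reader.Name == "proj")
                        {
                            // Reset per-project state at the start of each <proj>.
                            currentTitle = null;
                            start = 0;
                            end = 0;
                        }
                        break;

                    case XmlNodeType.Text:
                        if (currentElement == "proj_title") currentTitle = reader.Value;
                        else if (currentElement == "proj_start") start = int.Parse(reader.Value);
                        else if (currentElement == "proj_end") end = int.Parse(reader.Value);
                        break;

                    case XmlNodeType.EndElement:
                        if (reader.Name == "proj" && currentTitle == title)
                        {
                            result.Add(end - start);
                        }
                        currentElement = null;
                        break;
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        // Assumed sample document; a real run would pass new StreamReader("ex.xml").
        const string xml = @"<projects>
  <proj ID=""1""><proj_title>cold_run</proj_title><proj_start>5</proj_start><proj_end>50</proj_end></proj>
  <proj ID=""2""><proj_title>heat_run</proj_title><proj_start>100</proj_start><proj_end>200</proj_end></proj>
</projects>";

        foreach (int duration in Durations(new StringReader(xml), "heat_run"))
        {
            Console.WriteLine("Duration: {0}", duration);
        }
    }
}
```

The trade-off is more bookkeeping code in exchange for constant memory use; for a file of about 8 MB the in-memory approaches are likely simpler to maintain.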

Answer by Jon Skeet

8MB really isn't very large at all by modern standards. Personally I'd use LINQ to XML:

XDocument doc = XDocument.Load("ex.xml");
var projects = doc.Descendants("proj_title")
                  .Where(x => (string) x == "heat_run")
                  .Select(x => x.Parent) // Just for simplicity
                  .Select(x => new {
                              Start = (int) x.Element("proj_start"),
                              End = (int) x.Element("proj_end")
                          });

foreach (var project in projects)
{
    Console.WriteLine("Start: {0}; End: {1}", project.Start, project.End);
}

(Obviously adjust this to your own requirements - it's not really clear what you need to do based on the question.)

Alternative query:

var projects = doc.Descendants("proj")
                  .Where(x => (string) x.Element("proj_title") == "heat_run")
                  .Select(x => new {
                              Start = (int) x.Element("proj_start"),
                              End = (int) x.Element("proj_end")
                          });
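
To tie the query back to the duration mentioned in the question, here is a self-contained sketch built on the alternative query. The LinqDurations class, the Durations helper, the projects root element, and the inline sample data are assumptions for illustration and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LinqDurations
{
    // Hypothetical helper: returns proj_end - proj_start for every <proj>
    // whose <proj_title> equals the given title.
    public static List<int> Durations(XDocument doc, string title)
    {
        return doc.Descendants("proj")
                  .Where(p => (string) p.Element("proj_title") == title)
                  // The (int) cast throws if proj_start or proj_end is missing.
                  .Select(p => (int) p.Element("proj_end") - (int) p.Element("proj_start"))
                  .ToList();
    }

    public static void Main()
    {
        // Assumed sample document; the real file would be loaded with XDocument.Load("ex.xml").
        XDocument doc = XDocument.Parse(@"<projects>
  <proj ID=""1""><proj_title>cold_run</proj_title><proj_start>5</proj_start><proj_end>50</proj_end></proj>
  <proj ID=""2""><proj_title>heat_run</proj_title><proj_start>100</proj_start><proj_end>200</proj_end></proj>
</projects>");

        foreach (int duration in Durations(doc, "heat_run"))
        {
            Console.WriteLine("Duration: {0}", duration);
        }
    }
}
```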