How to deserialize only part of an XML document in C#
Original URL: http://stackoverflow.com/questions/369792/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Mike
Here's a fictitious example of the problem I'm trying to solve. If I'm working in C#, and have XML like this:
<?xml version="1.0" encoding="utf-8"?>
<Cars>
  <Car>
    <StockNumber>1020</StockNumber>
    <Make>Nissan</Make>
    <Model>Sentra</Model>
  </Car>
  <Car>
    <StockNumber>1010</StockNumber>
    <Make>Toyota</Make>
    <Model>Corolla</Model>
  </Car>
  <SalesPerson>
    <Company>Acme Sales</Company>
    <Position>
       <Salary>
          <Amount>1000</Amount>
          <Unit>Dollars</Unit>
    ... and on... and on....
  </SalesPerson>
</Cars>
The XML inside SalesPerson can be very long, megabytes in size. I want to deserialize the <Car> elements but not the SalesPerson XML element, instead keeping it in raw form "for later on".
Essentially, I would like to be able to use this as an object representation of the XML:
using System.IO;
using System.Xml.Serialization;

[XmlRoot("Cars", Namespace = "", IsNullable = false)]
public class Cars
{
    // Each <Car> sits directly under <Cars>, so map them as repeated elements
    [XmlElement("Car")]
    public Car[] Car { get; set; }

    // Desired: the raw, undeserialized XML inside <SalesPerson>
    public Stream SalesPerson { get; set; }
}

public class Car
{
    [XmlElement("StockNumber")]
    public string StockNumber { get; set; }

    [XmlElement("Make")]
    public string Make { get; set; }

    [XmlElement("Model")]
    public string Model { get; set; }
}
Here, after the document has been run through an XmlSerializer, the SalesPerson property on the Cars object would contain a stream with the raw XML that is within the <SalesPerson> XML element.
Can this be done? Can I choose to only deserialize "part of" an xml document?
Thanks! -Mike
P.S. Example XML stolen from "How to Deserialize XML document".
Accepted answer by user271807
This might be a bit of an old thread, but I will post anyway. I had the same problem (I needed to deserialize about 10 KB of data from a file that was larger than 1 MB). In the main object (which has an InnerObject that needs to be deserialized) I implemented the IXmlSerializable interface, then changed the ReadXml method.
We have an XmlTextReader as input; the first line reads forward until the XML tag we want:
reader.ReadToDescendant("InnerObjectTag"); //tag which matches the InnerObject
Then create an XmlSerializer for the type of the object we want to deserialize, and deserialize it:
XmlSerializer serializer = new XmlSerializer(typeof(InnerObject));
// ReadSubtree() gives the serializer only the part of the XML that holds the InnerObject data
this.innerObject = (InnerObject)serializer.Deserialize(reader.ReadSubtree());
reader.Close(); // now skip the rest
This saved me a lot of deserialization time and allows me to read just part of the XML (only some details that describe the file, which can help the user decide whether it is the file they want to load).
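A self-contained sketch of the same ReadToDescendant + ReadSubtree idea, applied to the Cars document from the question and shown outside of IXmlSerializable for brevity. The PartialReader.ReadFirstCar helper is a hypothetical name; it deserializes only the first <Car> and stops there, so <SalesPerson> is never deserialized.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Same shape as the question's Car class; element names match property names
public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

// Hypothetical helper, for illustration only
public static class PartialReader
{
    // Deserializes only the first <Car> element and then stops reading,
    // so the (possibly huge) <SalesPerson> element is never deserialized.
    public static Car ReadFirstCar(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Walk forward to the first <Car>; false means there is none
            if (!reader.ReadToDescendant("Car"))
                return null;

            var serializer = new XmlSerializer(typeof(Car));

            // ReadSubtree() exposes only <Car>...</Car> to the serializer
            using (XmlReader subtree = reader.ReadSubtree())
            {
                return (Car)serializer.Deserialize(subtree);
            }
        }
    }
}
```

Inside ReadXml, as in the answer, the same two calls apply; the reader passed in is simply already open.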
Answer by Tim Jarvis
You can control how your serialization is done by implementing the ISerializable interface in your class. Note that this also implies a constructor with the signature (SerializationInfo info, StreamingContext context), and you can certainly do what you are asking with that.
However, take a close look at whether you really need to do this with streaming, because if you don't have to use the streaming mechanism, achieving the same thing with LINQ to XML will be easier and simpler to maintain in the long term (IMO).
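A minimal LINQ to XML sketch of that suggestion, assuming the Cars layout from the question; LinqToXmlReader and Read are hypothetical names. XDocument.Parse loads the whole document, SalesPerson included, into memory, so this gives up streaming in exchange for simpler code, which is the trade-off described above.

```csharp
using System.Linq;
using System.Xml.Linq;

public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

// Hypothetical helper, for illustration only
public static class LinqToXmlReader
{
    // Maps each <Car> by hand and hands back the <SalesPerson> element
    // as raw XML text (null if the document has no such element).
    public static Car[] Read(string xml, out string salesPersonXml)
    {
        XDocument doc = XDocument.Parse(xml);

        Car[] cars = doc.Root.Elements("Car")
            .Select(e => new Car
            {
                // The explicit string conversion yields null for a missing child
                StockNumber = (string)e.Element("StockNumber"),
                Make = (string)e.Element("Make"),
                Model = (string)e.Element("Model"),
            })
            .ToArray();

        // XElement.ToString() returns the element's markup (indented by default)
        XElement salesPerson = doc.Root.Element("SalesPerson");
        salesPersonXml = salesPerson == null ? null : salesPerson.ToString();
        return cars;
    }
}
```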
Answer by MikeD
Typically, XML deserialization is an all-or-nothing proposition out of the box, so you'll probably need to customize it. If you don't do a full deserialization, you run the risk that the XML within the SalesPerson element is malformed, and therefore that the document is invalid.
If you are willing to accept that risk, you'll probably want to do some basic text parsing to break out the SalesPerson elements into a different document using plain text processing facilities, then process the XML.
This is a good example of why XML is not always the correct answer.
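A rough sketch of the plain-text approach, using a hypothetical TextSplitter.RemoveElementText helper. It assumes a single, non-nested <SalesPerson> start tag without attributes and performs no XML validation at all, which is exactly the risk described above; the remaining text can then be deserialized normally.

```csharp
using System;

// Hypothetical helper, for illustration only
public static class TextSplitter
{
    // Cuts the raw "<Name>...</Name>" text out of the document with plain
    // string searches. Returns the remaining XML; the cut-out part goes to
    // elementXml (null if the start tag is not found).
    public static string RemoveElementText(string xml, string elementName, out string elementXml)
    {
        string startTag = "<" + elementName + ">";
        string endTag = "</" + elementName + ">";

        int start = xml.IndexOf(startTag, StringComparison.Ordinal);
        if (start < 0)
        {
            elementXml = null;
            return xml;
        }

        int end = xml.IndexOf(endTag, start, StringComparison.Ordinal);
        if (end < 0)
            throw new FormatException("No closing tag found for " + elementName);
        end += endTag.Length;

        elementXml = xml.Substring(start, end - start);
        return xml.Remove(start, end - start);
    }
}
```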
Answer by Anderson Imes
I think the previous commenter is correct in his comment that XML might not be the best choice of a backing store here.
If you are having issues of scale and aren't taking advantage of some of the other niceties you get with XML, like transforms, you might be better off using a database for your data. The operations you are doing really seem to fit more into that model.
I know this doesn't really answer your question, but I thought I would highlight an alternate solution you might use. A good database and an appropriate OR mapper like .netTiers, NHibernate, or more recently LINQ to SQL / Entity Framework would probably get you back up and running with minimal changes to the rest of your codebase.
Answer by Oppositional
You may control which parts of the Cars class are deserialized by implementing the IXmlSerializable interface on the Cars class. Within the ReadXml(XmlReader) method you would then read and deserialize the Car elements, but when you reach the SalesPerson element you would read its subtree as a string and then construct a Stream over that textual content using a StreamWriter.
If you never want the XmlSerializer to write out the SalesPerson element, use the [XmlIgnore] attribute. I am not sure what you want to happen when you serialize the Cars class to its XML representation. Are you trying only to prevent deserialization of the SalesPerson element while still being able to serialize the XML representation of the SalesPerson held in the Stream?
I could probably provide a code example of this if you want a concrete implementation.
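A sketch of what such an implementation might look like, assuming the Cars/Car shape from the question. Instead of a StreamWriter it captures the raw element with XmlReader.ReadOuterXml() and wraps its UTF-8 bytes in a MemoryStream. WriteXml is deliberately left unimplemented, since the answer leaves the serialization direction open.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

[XmlRoot("Cars")]
public class Cars : IXmlSerializable
{
    private static readonly XmlSerializer CarSerializer = new XmlSerializer(typeof(Car));

    public Car[] Car { get; set; }

    // Raw markup of the <SalesPerson> element, never deserialized
    public Stream SalesPerson { get; set; }

    public XmlSchema GetSchema() { return null; }

    public void ReadXml(XmlReader reader)
    {
        var cars = new List<Car>();

        // The reader starts on the <Cars> wrapper; <Cars/> has no content
        if (reader.IsEmptyElement)
        {
            reader.Read();
            Car = cars.ToArray();
            return;
        }

        reader.ReadStartElement("Cars");
        reader.MoveToContent();
        while (reader.NodeType == XmlNodeType.Element)
        {
            if (reader.LocalName == "Car")
            {
                // Consumes one <Car> element and leaves the reader after it
                cars.Add((Car)CarSerializer.Deserialize(reader));
            }
            else if (reader.LocalName == "SalesPerson")
            {
                // Returns the element's markup as a string and moves past it
                string raw = reader.ReadOuterXml();
                SalesPerson = new MemoryStream(Encoding.UTF8.GetBytes(raw));
            }
            else
            {
                reader.Skip();
            }
            reader.MoveToContent();
        }
        reader.ReadEndElement(); // </Cars>
        Car = cars.ToArray();
    }

    public void WriteXml(XmlWriter writer)
    {
        // Left open: how SalesPerson should be written back is undecided
        throw new NotSupportedException("This sketch only covers reading.");
    }
}
```

Because Cars implements IXmlSerializable, XmlSerializer does not try to map the Stream property itself, which is what would otherwise fail.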
Answer by devlord
If all you want to do is parse out the SalesPerson element but keep it as a string, you should use an XSL transform rather than "deserialization". If, on the other hand, you want to parse out the SalesPerson element and populate an in-memory object only from the other, non-SalesPerson elements, an XSL transform might also be the way to go. If the files are very large, you may consider splitting them up and using XSL to combine the different XML files, so that the SalesPerson I/O only occurs when you need it.
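A minimal sketch of the XSL route, assuming an identity stylesheet that copies everything except <SalesPerson>; SalesPersonStripper and Strip are hypothetical names. As far as I know, XslCompiledTransform builds an in-memory tree of its input, so this keeps SalesPerson out of the result (which can then be deserialized normally) rather than avoiding the cost of reading it.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, for illustration only
public static class SalesPersonStripper
{
    // Identity transform plus an empty template that drops <SalesPerson>
    private const string StylesheetText =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>" +
        "<xsl:template match='SalesPerson'/>" +
        "</xsl:stylesheet>";

    public static string Strip(string xml)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(StylesheetText)))
        {
            transform.Load(styleReader);
        }

        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            // Disposing the XmlWriter flushes everything into the StringWriter
            using (XmlWriter writer = XmlWriter.Create(output, settings))
            {
                transform.Transform(input, writer);
            }
            return output.ToString();
        }
    }
}
```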
Answer by John Saunders
Please try defining the SalesPerson property as type XmlElement. This works for output from ASMX web services, which use XML serialization. I would think it would work on input as well. I would expect the entire <SalesPerson> element to wind up in the XmlElement.
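A sketch of that idea for the question's document. One assumption to flag: as far as I understand XmlSerializer, a plain XmlElement property is treated as a wrapper element named after the property (so only its first child would be captured), whereas adding [XmlAnyElement("SalesPerson")] hands the whole <SalesPerson> element, tags included, to the XmlElement. The sketch therefore uses that attribute.

```csharp
using System.Xml;
using System.Xml.Serialization;

public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

[XmlRoot("Cars")]
public class Cars
{
    // Each <Car> sits directly under <Cars>
    [XmlElement("Car")]
    public Car[] Car { get; set; }

    // Captures the whole <SalesPerson> element as an XmlElement
    // instead of mapping its content onto a class
    [XmlAnyElement("SalesPerson")]
    public XmlElement SalesPerson { get; set; }
}
```

The raw markup is then available as cars.SalesPerson.OuterXml whenever it is needed.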
Answer by Bogdan_Ch
I would suggest reading from the XML manually, using any of the lightweight methods, such as XmlReader, XPathDocument or LINQ to XML.
When you only need to read 3 properties, I suppose you can write code that reads them manually from that node and gives you full control over how it is executed, instead of relying on serialization/deserialization.
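A sketch of the manual XmlReader approach for the question's document; ManualCarReader.ReadCars is a hypothetical name. Each <Car> is read field by field, and any other element, including <SalesPerson>, is passed over with XmlReader.Skip(), which still reads through the element's text but never builds objects for it.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

// Hypothetical helper, for illustration only
public static class ManualCarReader
{
    public static List<Car> ReadCars(TextReader input)
    {
        var cars = new List<Car>();
        var settings = new XmlReaderSettings { IgnoreWhitespace = true, IgnoreComments = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();              // skip the declaration, land on <Cars>
            if (reader.IsEmptyElement)
                return cars;                     // <Cars/>
            reader.ReadStartElement("Cars");

            while (reader.NodeType == XmlNodeType.Element)
            {
                if (reader.LocalName != "Car")
                {
                    reader.Skip();               // e.g. <SalesPerson>: moves past its end tag
                    continue;
                }

                var car = new Car();
                if (reader.IsEmptyElement)
                {
                    reader.Read();               // <Car/> has no fields
                    cars.Add(car);
                    continue;
                }

                reader.ReadStartElement("Car");
                while (reader.NodeType == XmlNodeType.Element)
                {
                    string name = reader.LocalName;
                    // Reads the text content and moves past the end tag
                    string value = reader.ReadElementContentAsString();
                    if (name == "StockNumber") car.StockNumber = value;
                    else if (name == "Make") car.Make = value;
                    else if (name == "Model") car.Model = value;
                }
                reader.ReadEndElement();         // </Car>
                cars.Add(car);
            }
        }
        return cars;
    }
}
```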
Answer by Stig Schmidt Nielsson
The accepted answer from user271807 is a great solution, but I found that I also needed to set the XML root of the fragment to avoid an exception whose inner exception said something like this:
...xmlns=''> was not expected
This exception was thrown when I tried to deserialize only the inner Authentication element of this XML document:
<?xml version="1.0" encoding="UTF-8"?>
<Api>
  <Authentication>
    <sessionid>xxx</sessionid>
    <errormessage>xxx</errormessage>
  </Authentication>
</Api>
So I ended up creating this extension method as a reusable solution. Warning: it contains a memory leak; see the update below:
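using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Extension methods must live in a non-generic static class; the class name is arbitrary
public static class XmlStringExtensions
{
    public static T DeserializeXml<T>(this string @this, string innerStartTag = null)
    {
        using (var stringReader = new StringReader(@this))
        using (var xmlReader = XmlReader.Create(stringReader)) {
            if (innerStartTag != null) {
                xmlReader.ReadToDescendant(innerStartTag);
                // Name the root after the fragment to avoid "<... xmlns=''> was not expected"
                var xmlSerializer = new XmlSerializer(typeof(T), new XmlRootAttribute(innerStartTag));
                return (T)xmlSerializer.Deserialize(xmlReader.ReadSubtree());
            }
            return (T)new XmlSerializer(typeof(T)).Deserialize(xmlReader);
        }
    }
}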
Update 20 March 2017: As a comment on this answer pointed out, there is a memory leak problem when using one of the constructors of XmlSerializer (the one taking an XmlRootAttribute), so I ended up using a caching solution as shown below:
using System;
using System.Collections.Concurrent;
using System.Globalization;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public static class XmlStringExtensions {
    /// <summary>
    /// Deserialize XML string, optionally only an inner fragment of the XML, as specified by the innerStartTag parameter.
    /// </summary>
    public static T DeserializeXml<T>(this string @this, string innerStartTag = null) {
        using (var stringReader = new StringReader(@this)) {
            using (var xmlReader = XmlReader.Create(stringReader)) {
                if (innerStartTag != null) {
                    xmlReader.ReadToDescendant(innerStartTag);
                    // XmlSerializer(Type, XmlRootAttribute) is not cached by .NET, so go through the cache
                    var xmlSerializer = CachingXmlSerializerFactory.Create(typeof (T), new XmlRootAttribute(innerStartTag));
                    return (T) xmlSerializer.Deserialize(xmlReader.ReadSubtree());
                }
                // Whole document: take the root name from T instead of hard-coding one
                return (T) CachingXmlSerializerFactory.Create<T>().Deserialize(xmlReader);
            }
        }
    }
}

/// <summary>
/// A caching factory to avoid memory leaks in the XmlSerializer class.
/// See http://dotnetcodebox.blogspot.dk/2013/01/xmlserializer-class-may-result-in.html
/// </summary>
public static class CachingXmlSerializerFactory {
    private static readonly ConcurrentDictionary<string, XmlSerializer> Cache = new ConcurrentDictionary<string, XmlSerializer>();

    public static XmlSerializer Create(Type type, XmlRootAttribute root) {
        if (type == null) {
            throw new ArgumentNullException(nameof(type));
        }
        if (root == null) {
            throw new ArgumentNullException(nameof(root));
        }
        // One serializer per (type, root element name) pair
        var key = string.Format(CultureInfo.InvariantCulture, "{0}:{1}", type, root.ElementName);
        return Cache.GetOrAdd(key, _ => new XmlSerializer(type, root));
    }

    public static XmlSerializer Create<T>(XmlRootAttribute root) {
        return Create(typeof (T), root);
    }

    public static XmlSerializer Create<T>() {
        return Create(typeof (T));
    }

    public static XmlSerializer Create<T>(string defaultNamespace) {
        return Create(typeof (T), defaultNamespace);
    }

    // These two constructors are cached by .NET itself, so no extra caching is needed
    public static XmlSerializer Create(Type type) {
        return new XmlSerializer(type);
    }

    public static XmlSerializer Create(Type type, string defaultNamespace) {
        return new XmlSerializer(type, defaultNamespace);
    }
}
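If you control the target type, one alternative sketch is to declare the root name on the type with [XmlRoot] instead of passing an XmlRootAttribute to the constructor. The XmlSerializer(Type) constructor is one of the two that .NET caches internally, so the leak described above should not apply and no custom cache is needed. The Authentication class and FragmentReader.ReadFragment are hypothetical names based on the example XML above.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type for the <Authentication> fragment; the root name lives
// on the type, so the plain XmlSerializer(Type) constructor is enough
[XmlRoot("Authentication")]
public class Authentication
{
    [XmlElement("sessionid")]
    public string SessionId { get; set; }

    [XmlElement("errormessage")]
    public string ErrorMessage { get; set; }
}

// Hypothetical helper, for illustration only
public static class FragmentReader
{
    public static T ReadFragment<T>(string xml, string elementName) where T : class
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Returns null when the fragment is not present at all
            if (!reader.ReadToDescendant(elementName))
                return null;

            using (XmlReader subtree = reader.ReadSubtree())
            {
                return (T)new XmlSerializer(typeof(T)).Deserialize(subtree);
            }
        }
    }
}
```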