Problems Reading RSS with C# and .net 3.5
Original URL: http://stackoverflow.com/questions/210375/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by dan90266
I have been attempting to write some routines to read RSS and ATOM feeds using the new routines available in System.ServiceModel.Syndication, but unfortunately the Rss20FeedFormatter bombs out on about half the feeds I try with the following exception:
An error was encountered when parsing a DateTime value in the XML.
This seems to occur whenever the RSS feed expresses the publish date in the following format:
Thu, 16 Oct 08 14:23:26 -0700
If the feed expresses the publish date as GMT, things go fine:
Thu, 16 Oct 08 21:23:26 GMT
If there's some way to work around this with XmlReaderSettings, I have not found it. Can anyone assist?
Accepted answer by Oppositional
RSS 2.0 syndication feeds use the RFC 822 date-time specification when serializing elements like pubDate and lastBuildDate. Unfortunately, RFC 822 uses a very 'flexible' syntax for expressing the time-zone component of a DateTime.
Time zone may be indicated in several ways. "UT" is Universal Time (formerly called "Greenwich Mean Time"); "GMT" is permitted as a reference to Universal Time. The military standard uses a single character for each zone. "Z" is Universal Time. "A" indicates one hour earlier, and "M" indicates 12 hours earlier; "N" is one hour later, and "Y" is 12 hours later. The letter "J" is not used. The other remaining two forms are taken from ANSI standard X3.51-1975. One allows explicit indication of the amount of offset from UT; the other uses common 3-character strings for indicating time zones in North America.
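To make the zone forms in the quoted RFC 822 text concrete, here is a minimal C# sketch that maps each of them to an offset from UT. `Rfc822Zone.TryGetOffset` is a hypothetical helper, not a .NET API. It follows RFC 822's literal signs for the single-letter military zones; RFC 1123 later noted that those signs are reversed in practice, so a real parser may prefer to treat them as unknown.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of .NET): maps the zone token of an RFC 822
// date-time to an offset from Universal Time.
public static class Rfc822Zone
{
    public static bool TryGetOffset(string zone, out TimeSpan offset)
    {
        offset = TimeSpan.Zero;
        if (string.IsNullOrEmpty(zone)) return false;

        switch (zone.ToUpperInvariant())
        {
            // Universal Time.
            case "UT": case "GMT": case "Z": return true;
            // Three-letter North American zones (ANSI X3.51-1975).
            case "EST": offset = new TimeSpan(-5, 0, 0); return true;
            case "EDT": offset = new TimeSpan(-4, 0, 0); return true;
            case "CST": offset = new TimeSpan(-6, 0, 0); return true;
            case "CDT": offset = new TimeSpan(-5, 0, 0); return true;
            case "MST": offset = new TimeSpan(-7, 0, 0); return true;
            case "MDT": offset = new TimeSpan(-6, 0, 0); return true;
            case "PST": offset = new TimeSpan(-8, 0, 0); return true;
            case "PDT": offset = new TimeSpan(-7, 0, 0); return true;
        }

        // Explicit local differential: +HHMM or -HHMM (the form that trips the formatter).
        if (zone.Length == 5 && (zone[0] == '+' || zone[0] == '-'))
        {
            int hh, mm;
            if (!int.TryParse(zone.Substring(1, 2), NumberStyles.None, CultureInfo.InvariantCulture, out hh) ||
                !int.TryParse(zone.Substring(3, 2), NumberStyles.None, CultureInfo.InvariantCulture, out mm) ||
                hh > 23 || mm > 59)
                return false;
            offset = new TimeSpan(hh, mm, 0);
            if (zone[0] == '-') offset = offset.Negate();
            return true;
        }

        // Military letters, signed as RFC 822 lists them: A..I and K..M = -1..-12,
        // N..Y = +1..+12, J unused. RFC 1123 notes these signs are reversed in practice.
        if (zone.Length == 1)
        {
            char c = char.ToUpperInvariant(zone[0]);
            if (c >= 'A' && c <= 'I') { offset = new TimeSpan(-(c - 'A' + 1), 0, 0); return true; }
            if (c >= 'K' && c <= 'M') { offset = new TimeSpan(-(c - 'A'), 0, 0); return true; }
            if (c >= 'N' && c <= 'Y') { offset = new TimeSpan(c - 'N' + 1, 0, 0); return true; }
        }
        return false;
    }
}
```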
I believe the issue involves how the zone component of the RFC 822 date-time value is being processed. The feed formatter appears not to handle date-times that use a local differential (such as -0700) to indicate the time zone.
As RFC 1123 extends the RFC 822 specification, you could try using DateTimeFormatInfo.RFC1123Pattern (the "r" format specifier) to handle converting problematic date-times, or write your own parsing code for RFC 822 formatted dates. Another option would be to use a third-party framework instead of the System.ServiceModel.Syndication namespace classes.
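A rough sketch of the "write your own parsing code" option, assuming feeds only use GMT, UT, Z or a numeric differential like the question's -0700 (named zones are rejected here). `Rfc822Date.TryParse` is a hypothetical name, not a .NET API. The parsed `DateTimeOffset` can then be re-emitted with the "r" (RFC 1123) pattern, which always renders the value in GMT.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not a .NET API): parses an RFC 822 date-time whose zone is
// GMT, UT, Z or a numeric differential such as -0700. Named North American and
// military zones are deliberately not handled in this sketch.
public static class Rfc822Date
{
    // Day name optional, 2- or 4-digit year, seconds optional. Two-digit-year
    // formats come first so "08" is never read as year 0008.
    static readonly string[] Formats =
    {
        "ddd, d MMM yy HH:mm:ss", "ddd, d MMM yyyy HH:mm:ss",
        "ddd, d MMM yy HH:mm",    "ddd, d MMM yyyy HH:mm",
        "d MMM yy HH:mm:ss",      "d MMM yyyy HH:mm:ss",
        "d MMM yy HH:mm",         "d MMM yyyy HH:mm",
    };

    public static bool TryParse(string value, out DateTimeOffset result)
    {
        result = default(DateTimeOffset);
        if (value == null) return false;
        value = value.Trim();

        // The zone is the last space-separated token; everything before it is date and time.
        int split = value.LastIndexOf(' ');
        if (split < 0) return false;
        string zone = value.Substring(split + 1);
        string dateTimePart = value.Substring(0, split);

        TimeSpan offset;
        if (zone == "GMT" || zone == "UT" || zone == "Z")
        {
            offset = TimeSpan.Zero;
        }
        else if (zone.Length == 5 && (zone[0] == '+' || zone[0] == '-'))
        {
            int hh, mm;
            if (!int.TryParse(zone.Substring(1, 2), NumberStyles.None, CultureInfo.InvariantCulture, out hh) ||
                !int.TryParse(zone.Substring(3, 2), NumberStyles.None, CultureInfo.InvariantCulture, out mm) ||
                mm > 59)
                return false;
            offset = new TimeSpan(hh, mm, 0);
            if (zone[0] == '-') offset = offset.Negate();
        }
        else
        {
            return false;
        }

        // DateTimeOffset rejects offsets beyond +/-14:00, so check before constructing one.
        if (offset.Duration() > new TimeSpan(14, 0, 0)) return false;

        DateTime local;
        if (!DateTime.TryParseExact(dateTimePart, Formats, CultureInfo.InvariantCulture,
                                    DateTimeStyles.AllowWhiteSpaces, out local))
            return false;

        result = new DateTimeOffset(local, offset);
        return true;
    }
}
```

For example, parsing `Thu, 16 Oct 08 14:23:26 -0700` and formatting the result with `ToString("r", CultureInfo.InvariantCulture)` gives `Thu, 16 Oct 2008 21:23:26 GMT`, the same instant as the question's GMT example.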
It appears there are some known issues with date-time parsing and the Rss20FeedFormatter that are in the process of being addressed by Microsoft.
Answer by smaclell
Interesting. It looks like this datetime format is not one of the formats the datetime parser expects. After looking at the feed classes, it does not look like you can inject your own formatting convention into the parser, and it likely uses a specific scheme for validating the feed.
You may be able to change how the datetime parser behaves by modifying the culture. I have never done it before, so I can't say for sure it would work.
Another solution might be to first transform the feed you are trying to read. Likely not the greatest, but it could get you around the issue.
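One way to "transform the feed first" is to load it into an XDocument, rewrite the date elements, and only then hand it to the syndication classes. A minimal C# sketch; `FeedDateRewriter` and the caller-supplied `normalize` function are hypothetical, and the choice of `pubDate`/`lastBuildDate` assumes a plain RSS 2.0 feed.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical pre-processing step (not from the original answer): rewrite the text of
// RSS 2.0 date elements before the document reaches the syndication classes.
public static class FeedDateRewriter
{
    // RSS 2.0 elements have no XML namespace, so local names are enough to find them.
    static readonly string[] DateElements = { "pubDate", "lastBuildDate" };

    // Applies 'normalize' to every date element; returns how many values changed.
    public static int Rewrite(XDocument doc, Func<string, string> normalize)
    {
        int changed = 0;
        // ToList() snapshots the matches so editing values cannot disturb the enumeration.
        foreach (XElement e in doc.Descendants()
                                  .Where(x => x.Name.Namespace == XNamespace.None &&
                                              DateElements.Contains(x.Name.LocalName))
                                  .ToList())
        {
            string newValue = normalize(e.Value);
            if (newValue != null && newValue != e.Value)
            {
                e.Value = newValue;
                changed++;
            }
        }
        return changed;
    }
}
```

The rewritten document could then be read with `SyndicationFeed.Load(doc.CreateReader())`. Keeping the normalizer as a parameter leaves the actual date-fixing policy (a lenient RFC 822 parser, or simply substituting a fallback date) up to the caller.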
Good luck.
Answer by CleverPatrick
Based on the workaround posted in the bug report to Microsoft about this, I made an XmlReader specifically for reading SyndicationFeeds that have non-standard dates.
The code below is slightly different from the code in the workaround at Microsoft's site. It also takes Oppositional's advice on using the RFC 1123 pattern.
Instead of simply calling XmlReader.Create(), you need to construct the SyndicationFeedXmlReader from a Stream. I use the WebClient class to get that stream:
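```csharp
// Requires: using System.Net; using System.ServiceModel.Syndication; using System.Xml;
// feedUrl is the address of the feed to read.
using (WebClient client = new WebClient())
using (XmlReader reader = new SyndicationFeedXmlReader(client.OpenRead(feedUrl)))
{
    SyndicationFeed feed = SyndicationFeed.Load(reader);
    // ... do things with the feed ...
}
```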
Below is the code for the SyndicationFeedXmlReader:
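```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Reflection;
using System.ServiceModel.Syndication;
using System.Xml;

public class SyndicationFeedXmlReader : XmlTextReader
{
    // Elements parsed as RFC 822 dates by Rss20FeedFormatter.
    readonly string[] Rss20DateTimeHints = { "pubDate", "lastBuildDate" };
    // Elements parsed as RFC 3339 dates by Atom10FeedFormatter.
    readonly string[] Atom10DateTimeHints = { "updated", "published" };
    private bool isRss2DateTime = false;
    private bool isAtomDateTime = false;

    public SyndicationFeedXmlReader(Stream stream) : base(stream) { }

    public override bool IsStartElement(string localname, string ns)
    {
        bool isStart = base.IsStartElement(localname, ns);
        // Remember whether the element the formatter is about to read holds a date.
        isRss2DateTime = isStart && Rss20DateTimeHints.Contains(localname);
        isAtomDateTime = isStart && Atom10DateTimeHints.Contains(localname);
        return isStart;
    }

    public override string ReadString()
    {
        string dateVal = base.ReadString();
        try
        {
            // Run the formatter's own private DateFromString (as found in .NET 3.5) on the value;
            // reflection on private members may break in other framework versions.
            if (isRss2DateTime)
            {
                MethodInfo objMethod = typeof(Rss20FeedFormatter).GetMethod("DateFromString", BindingFlags.NonPublic | BindingFlags.Static);
                if (objMethod != null)
                    objMethod.Invoke(null, new object[] { dateVal, this });
            }
            if (isAtomDateTime)
            {
                MethodInfo objMethod = typeof(Atom10FeedFormatter).GetMethod("DateFromString", BindingFlags.NonPublic | BindingFlags.Instance);
                if (objMethod != null)
                    objMethod.Invoke(new Atom10FeedFormatter(), new object[] { dateVal, this });
            }
        }
        catch (TargetInvocationException)
        {
            // The formatter rejected the date: substitute the current UTC time in a format that
            // formatter accepts, using the invariant culture so day and month names stay English.
            if (isAtomDateTime)
                return DateTimeOffset.UtcNow.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
            return DateTimeOffset.UtcNow.ToString("r", CultureInfo.InvariantCulture); // RFC 1123
        }
        return dateVal;
    }
}
```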
Again, this is copied almost exactly from the workaround posted on the Microsoft site linked above, except that this one works for me and the one posted at Microsoft did not.
NOTE: One bit of customization you may need to do is in the two arrays at the start of the class. Depending on any extraneous fields your non-standard feed might add, you may need to add more items to those arrays.
Answer by Den Delimarsky
A similar problem still persists in .NET 4.0, and I decided to work with XDocument instead of directly invoking SyndicationFeed. I described the approach I used (specific to my project) here. I can't say it is the best solution, but it can certainly be considered a "backup plan" in case SyndicationFeed fails.
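The write-up mentioned in this answer is project-specific and not reproduced here. As a rough, hypothetical illustration of the XDocument route, the C# sketch below reads RSS 2.0 items with LINQ to XML and keeps `pubDate` as raw text, so a lenient date parser of your choice can run afterwards instead of SyndicationFeed's strict one. `FeedEntry` and `RssXDocumentReader` are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical minimal item type; SyndicationItem is not used here.
public sealed class FeedEntry
{
    public string Title { get; set; }
    public string Link { get; set; }
    public string PubDate { get; set; } // raw text, parsed later by your own code
}

public static class RssXDocumentReader
{
    public static List<FeedEntry> ReadItems(XDocument doc)
    {
        // RSS 2.0 items live at rss/channel/item with no XML namespace.
        // The (string) cast on XElement yields null when the element is missing.
        return doc.Descendants("item")
                  .Select(item => new FeedEntry
                  {
                      Title = (string)item.Element("title"),
                      Link = (string)item.Element("link"),
                      PubDate = (string)item.Element("pubDate"),
                  })
                  .ToList();
    }
}
```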