Original URL: http://stackoverflow.com/questions/13963/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Best method of Textfile Parsing in C#?
Question by Bernard
I want to parse something like a config file, like so:
[KEY:Value]
    [SUBKEY:SubValue]
Now, I started with a StreamReader, converting lines into character arrays, when I figured there's gotta be a better way. So I ask you, humble reader, to help me.
One restriction is that it has to work in a Linux/Mono environment (1.2.6 to be exact). I don't have the latest 2.0 release (of Mono), so try to restrict language features to C# 2.0 or C# 1.0.
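For comparison with the character-array approach, here is a minimal sketch that reads line by line and slices each line with IndexOf/Substring instead. It assumes the first ':' separates key from value, treats leading whitespace as nesting depth (a tab and a space both count as one), and skips blank lines; ConfigEntry and LineConfigReader are hypothetical names. It is intended to stay within C# 2.0 features per the restriction above, but it has not been tried on Mono 1.2.6.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// One parsed line: indentation depth plus key and value (hypothetical type).
public class ConfigEntry
{
    public int Depth;
    public string Key;
    public string Value;

    public ConfigEntry(int depth, string key, string value)
    {
        Depth = depth;
        Key = key;
        Value = value;
    }
}

public static class LineConfigReader
{
    // Reads lines of the form "[KEY:Value]", optionally indented.
    // Accepts any TextReader, so a StreamReader over the config file works.
    public static List<ConfigEntry> Read(TextReader reader)
    {
        List<ConfigEntry> entries = new List<ConfigEntry>();
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            string trimmed = line.Trim();
            if (trimmed.Length == 0)
                continue; // skip blank lines

            // Number of leading whitespace characters = nesting depth.
            int depth = line.Length - line.TrimStart().Length;

            if (trimmed[0] != '[' || trimmed[trimmed.Length - 1] != ']')
                throw new FormatException("Line " + lineNumber + ": expected [KEY:Value]");

            // Assumption: the first ':' splits key from value.
            int colon = trimmed.IndexOf(':');
            if (colon < 0)
                throw new FormatException("Line " + lineNumber + ": missing ':'");

            string key = trimmed.Substring(1, colon - 1);
            string value = trimmed.Substring(colon + 1, trimmed.Length - colon - 2);
            entries.Add(new ConfigEntry(depth, key, value));
        }
        return entries;
    }
}
```

Usage would be along the lines of `LineConfigReader.Read(new StreamReader("app.cfg"))`, where "app.cfg" is a placeholder file name. The result is flat; turning depths into a parent/child tree is a separate step.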
Accepted answer by Orion Edwards
"I considered it, but I'm not going to use XML. I am going to be writing this stuff by hand, and hand editing XML makes my brain hurt. :')"
Have you looked at YAML?
You get the benefits of XML without all the pain and suffering. It's used extensively in the Ruby community for things like config files, pre-prepared database data, etc.
Here's an example:
customer:
  name: Orion
  age: 26
  addresses:
    - type: Work
      number: 12
      street: Bob Street
    - type: Home
      number: 15
      street: Secret Road
There appears to be a C# library here, which I haven't used personally, but yaml is pretty simple, so "how hard can it be?" :-)
I'd say it's preferable to inventing your own ad-hoc format (and dealing with parser bugs).
Answer by Ed S.
It looks to me that you would be better off using an XML based config file as there are already .NET classes which can read and store the information for you relatively easily. Is there a reason that this is not possible?
@Bernard: It is true that hand editing XML is tedious, but the structure that you are presenting already looks very similar to XML.
Then yes, has a good method there.
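To make the XML route concrete, here is a minimal sketch using System.Xml.XmlDocument, which ships with .NET and Mono. The XML layout (a "value" attribute on elements named after the keys, nested to mirror SUBKEY under KEY) is an assumption for illustration, as are the XmlConfig name and its helpers.

```csharp
using System.Xml;

public static class XmlConfig
{
    // Hypothetical layout mirroring [KEY:Value] with [SUBKEY:SubValue] nested inside.
    public const string Sample =
        "<config>" +
        "  <KEY value=\"Value\">" +
        "    <SUBKEY value=\"SubValue\" />" +
        "  </KEY>" +
        "</config>";

    // Parses XML text; for a file on disk, XmlDocument.Load(path) works instead.
    public static XmlDocument Load(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }

    // Returns the "value" attribute of the element at the XPath, or null if absent.
    public static string GetValue(XmlDocument doc, string xpath)
    {
        XmlElement element = doc.SelectSingleNode(xpath) as XmlElement;
        if (element == null || !element.HasAttribute("value"))
            return null;
        return element.GetAttribute("value");
    }
}
```

With this layout, nesting comes for free from the XML tree and lookups are plain XPath such as "/config/KEY/SUBKEY"; the trade-off is the hand-editing burden Bernard objects to.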
Answer by Ed S.
You can also use a stack with a push/pop algorithm. This one matches opening and closing tags.
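using System.Collections;          // ArrayList
using System.Collections.Generic;  // Stack<T>

public string check()
{
    // getTags() is not shown; it is assumed to return every tag in
    // document order, e.g. "<a>", "<b>", "</b>", "</a>".
    ArrayList tags = getTags();
    Stack<string> stack = new Stack<string>(tags.Count);
    foreach (string tag in tags)
    {
        // string.Contains(char) is not available in .NET 2.0, so use IndexOf.
        if (tag.IndexOf('/') < 0)
        {
            // Opening tag: remember it until its closing tag shows up.
            stack.Push(tag);
        }
        else
        {
            if (stack.Count > 0)
            {
                // Strip "<" from "<name>" and "</" from "</name>";
                // both become "name>" and can be compared directly.
                string startTag = stack.Pop();
                startTag = startTag.Substring(1, startTag.Length - 1);
                string endTag = tag.Substring(2, tag.Length - 2);
                if (!startTag.Equals(endTag))
                {
                    return "Error: no matching end tag";
                }
            }
            else
            {
                // Closing tag with nothing open.
                return "Error: no matching opening tag";
            }
        }
    }
    // Anything left on the stack was never closed.
    if (stack.Count > 0)
    {
        return "Error: no matching end tag";
    }
    return "Xml is valid";
}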
You can probably adapt this to read the contents of your file. Regular expressions are also a good idea.
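One way to adapt the push/pop idea to the indented [KEY:Value] format is to keep a stack of the current line's ancestors: pop until the top is indented less than the new line, attach the new line to whatever remains on top, then push it. This is a minimal sketch under the assumptions that lines are well-formed and that depth is the count of leading whitespace characters; ConfigNode and IndentTreeBuilder are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// A node in the config tree (hypothetical type for this sketch).
public class ConfigNode
{
    public int Depth;
    public string Key;
    public string Value;
    public List<ConfigNode> Children = new List<ConfigNode>();

    public ConfigNode(int depth, string key, string value)
    {
        Depth = depth;
        Key = key;
        Value = value;
    }
}

public static class IndentTreeBuilder
{
    public static List<ConfigNode> Build(string[] lines)
    {
        List<ConfigNode> roots = new List<ConfigNode>();
        // Holds the chain of ancestors of the line being processed.
        Stack<ConfigNode> open = new Stack<ConfigNode>();

        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0)
                continue;
            if (trimmed.Length < 2 || trimmed[0] != '[' || trimmed[trimmed.Length - 1] != ']')
                throw new FormatException("Expected [KEY:Value]: " + trimmed);

            int depth = line.Length - line.TrimStart().Length;
            string inner = trimmed.Substring(1, trimmed.Length - 2); // drop '[' and ']'
            int colon = inner.IndexOf(':');
            if (colon < 0)
                throw new FormatException("Missing ':' in " + trimmed);
            ConfigNode node = new ConfigNode(depth, inner.Substring(0, colon), inner.Substring(colon + 1));

            // Pop siblings and finished subtrees: anything not indented less than this line.
            while (open.Count > 0 && open.Peek().Depth >= depth)
                open.Pop();

            if (open.Count == 0)
                roots.Add(node);                 // top-level entry
            else
                open.Peek().Children.Add(node);  // child of nearest shallower line

            open.Push(node);
        }
        return roots;
    }
}
```

Because only depth comparisons matter, any consistent indentation width works; mixing tabs and spaces would still confuse it, since each counts as one character.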
Answer by eplawless
I was looking at almost this exact problem the other day: this article on string tokenizing is exactly what you need. You'll want to define your tokens as something like:
@"(?<level>\s) | " +
@"(?<term>[^:\s]) | " +
@"(?<separator>:)"
The article does a pretty good job of explaining it. From there you just start eating up tokens as you see fit.
Protip: for an LL(1) parser (read: easy), tokens cannot share a prefix. If you have abc as a token, you cannot have ace as a token.
Note: the article's examples are missing the | characters; just add them in.
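The named-group idea can be sketched with System.Text.RegularExpressions: join one named group per token type with |, match at the current offset, and check which group succeeded. This is a hedged adaptation, not the article's code: the "+" quantifiers and the newline/open/close tokens are additions, and the spaces around | are dropped (keeping them would require RegexOptions.IgnorePatternWhitespace). Token and ConfigTokenizer are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Token
{
    public string Type;
    public string Text;
    public Token(string type, string text) { Type = type; Text = text; }
}

public static class ConfigTokenizer
{
    // One named group per token type, joined with '|'.
    static readonly Regex TokenPattern = new Regex(
        @"(?<newline>\r?\n)|" +
        @"(?<level>[ \t]+)|" +
        @"(?<open>\[)|" +
        @"(?<close>\])|" +
        @"(?<separator>:)|" +
        @"(?<term>[^:\[\]\s]+)");

    static readonly string[] TokenTypes = { "newline", "level", "open", "close", "separator", "term" };

    public static List<Token> Tokenize(string input)
    {
        List<Token> tokens = new List<Token>();
        int pos = 0;
        while (pos < input.Length)
        {
            Match m = TokenPattern.Match(input, pos);
            // The next token must start exactly where the previous one ended;
            // otherwise there is a character no token accepts (e.g. a lone '\r').
            if (!m.Success || m.Index != pos)
                throw new FormatException("Unexpected character at offset " + pos);
            foreach (string type in TokenTypes)
            {
                if (m.Groups[type].Success)
                {
                    tokens.Add(new Token(type, m.Value));
                    break;
                }
            }
            pos += m.Length; // every alternative consumes at least one character
        }
        return tokens;
    }
}
```

In line with the LL(1) protip, each token type here can be identified from its first character alone ('[', ']', ':', a line break, a space or tab, or anything else for a term). A side effect of excluding whitespace from term is that a value containing spaces would come out as several tokens.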
Answer by ICR
Using a library is almost always preferable to rolling your own. Here's a quick list of "Oh, I'll never need that" / "I didn't think about that" points that will end up biting you later down the line:
- Escaping characters. What if you want a : in the key or a ] in the value?
- Escaping the escape character.
- Unicode
- Mix of tabs and spaces (see the problems with Python's whitespace-sensitive syntax)
- Handling different line-ending formats (\n vs. \r\n)
- Handling syntax error reporting
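The first two points on the list can be handled with a small scanning helper. The sketch below assumes a backslash escape rule (not part of the original question): a backslash makes the next character literal, so "\:", "\]" and "\\" are all allowed. EscapedEntryParser, ReadUntil and ParseEntry are hypothetical names.

```csharp
using System;
using System.Text;

public static class EscapedEntryParser
{
    // Reads characters until an unescaped terminator, unescaping as it goes.
    // On return, pos points just past the terminator.
    public static string ReadUntil(string text, ref int pos, char terminator)
    {
        StringBuilder sb = new StringBuilder();
        while (pos < text.Length)
        {
            char c = text[pos++];
            if (c == '\\')
            {
                if (pos >= text.Length)
                    throw new FormatException("Dangling escape at end of input");
                sb.Append(text[pos++]); // take the next character literally
            }
            else if (c == terminator)
            {
                return sb.ToString();
            }
            else
            {
                sb.Append(c);
            }
        }
        throw new FormatException("Missing '" + terminator + "'");
    }

    // Parses one "[key:value]" entry and returns { key, value }.
    public static string[] ParseEntry(string line)
    {
        string trimmed = line.Trim();
        if (trimmed.Length == 0 || trimmed[0] != '[')
            throw new FormatException("Expected '['");
        int pos = 1;
        string key = ReadUntil(trimmed, ref pos, ':');
        string value = ReadUntil(trimmed, ref pos, ']');
        if (pos != trimmed.Length)
            throw new FormatException("Unexpected text after ']'");
        return new string[] { key, value };
    }
}
```

For example, `[a\:b:c\]d]` yields key "a:b" and value "c]d", and `[path:C\\temp]` yields a value with a single backslash. The exceptions give a crude form of the error reporting mentioned in the last bullet.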
Like others have suggested, YAML looks like your best bet.
Answer by Gishu
Regardless of the persisted format, using a Regex would be the fastest way of parsing. In Ruby it'd probably be a few lines of code.
\[KEY:(.*)\]
\[SUBKEY:(.*)\]
These two would get you the Value and SubValue in the first group. Check out MSDN on how to match a regex against a string.
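In C#, matching those two patterns and reading group 1 looks roughly like the sketch below (System.Text.RegularExpressions). The generic EntryPattern, which captures any key in group 1 and its value in group 2, is an addition for illustration, as are the RegexConfig name and its methods.

```csharp
using System.Text.RegularExpressions;

public static class RegexConfig
{
    // The two patterns from the answer; group 1 captures the value.
    public static readonly Regex KeyPattern = new Regex(@"\[KEY:(.*)\]");
    public static readonly Regex SubKeyPattern = new Regex(@"\[SUBKEY:(.*)\]");

    // Generic variant (an addition): group 1 = key, group 2 = value.
    static readonly Regex EntryPattern = new Regex(@"^\s*\[([^:\]]+):(.*)\]\s*$");

    // Returns group 1 of the first match, or null when the line does not match.
    public static string MatchValue(Regex pattern, string line)
    {
        Match m = pattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Splits any "[key:value]" line; returns false if the line does not match.
    public static bool TryParseEntry(string line, out string key, out string value)
    {
        Match m = EntryPattern.Match(line);
        key = m.Success ? m.Groups[1].Value : null;
        value = m.Success ? m.Groups[2].Value : null;
        return m.Success;
    }
}
```

One caveat: (.*) is greedy, so a line such as "[KEY:a][KEY:b]" would capture "a][KEY:b"; a value pattern like ([^\]]*) stops at the first ']' instead.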
This is something everyone should have in their kitty. Pre-Regex days would seem like the Ice Age.
Answer by ICR
@Gishu
Actually, once I'd accounted for escaped characters, my regex ran slightly slower than my hand-written top-down recursive parser, and that's without the nesting (linking sub-items to their parents) and error reporting that the hand-written parser had.
The regex was slightly faster to write (though I do have a bit of experience with hand-written parsers), but that's without good error reporting. Once you add that, it becomes harder and takes longer.
I also find the intent of the hand-written parser easier to understand. For instance, here is a snippet of the code:
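private static Node ParseNode(TextReader reader)
{
    Node node = new Node();
    // Leading whitespace gives the nesting level (presumably used
    // elsewhere to link sub-items to their parents; not shown).
    int indentation = ParseWhitespace(reader);
    Expect(reader, '[');
    node.Key = ParseTerminatedString(reader, ':');
    node.Value = ParseTerminatedString(reader, ']');
    return node;
}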
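The snippet relies on a Node type and three helpers that the answer does not show. Below is a minimal sketch that fills them in with TextReader.Peek/Read; the helper bodies, the Indentation field, and ParseAll are all hypothetical. It parses every line into a flat list; linking children to parents could then be done from Indentation, for example with a stack of open parents.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical Node type; the original snippet does not show it.
public class Node
{
    public int Indentation;
    public string Key;
    public string Value;
}

public static class HandParser
{
    // Consumes leading spaces/tabs and returns how many there were.
    static int ParseWhitespace(TextReader reader)
    {
        int count = 0;
        while (reader.Peek() == ' ' || reader.Peek() == '\t')
        {
            reader.Read();
            count++;
        }
        return count;
    }

    // Consumes one character and fails with a readable message if it is wrong.
    static void Expect(TextReader reader, char expected)
    {
        int c = reader.Read();
        if (c != expected)
        {
            string found = c == -1 ? "end of input" : "'" + (char)c + "'";
            throw new FormatException("Expected '" + expected + "' but found " + found);
        }
    }

    // Reads up to and consumes the terminator; a line break or EOF first is an error.
    static string ParseTerminatedString(TextReader reader, char terminator)
    {
        StringBuilder sb = new StringBuilder();
        while (true)
        {
            int c = reader.Read();
            if (c == terminator)
                return sb.ToString();
            if (c == -1 || c == '\n' || c == '\r')
                throw new FormatException("Missing '" + terminator + "'");
            sb.Append((char)c);
        }
    }

    static Node ParseNode(TextReader reader)
    {
        Node node = new Node();
        node.Indentation = ParseWhitespace(reader);
        Expect(reader, '[');
        node.Key = ParseTerminatedString(reader, ':');
        node.Value = ParseTerminatedString(reader, ']');
        return node;
    }

    // Parses all entries into a flat list. Empty lines are skipped;
    // whitespace-only lines are reported as errors by Expect.
    public static List<Node> ParseAll(TextReader reader)
    {
        List<Node> nodes = new List<Node>();
        while (reader.Peek() != -1)
        {
            if (reader.Peek() == '\r' || reader.Peek() == '\n')
            {
                reader.Read(); // skip line breaks (covers both \n and \r\n)
                continue;
            }
            nodes.Add(ParseNode(reader));
            ParseWhitespace(reader); // allow trailing spaces after ']'
            int next = reader.Peek();
            if (next != -1 && next != '\r' && next != '\n')
                throw new FormatException("Unexpected text after ']'");
        }
        return nodes;
    }
}
```

Each helper does one small job and fails with a specific message, which is where the readability and error-reporting advantages over a single regex come from.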
Answer by Antoine Aubry
There is another YAML library for .NET which is under development. Right now it supports reading YAML streams and has been tested on Windows and Mono. Write support is currently being implemented.