C#: Best way to replace tokens in a large text template
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/20267/
Asked by jeremcc
I have a large text template which needs tokenized sections replaced by other text. The tokens look something like this: ##USERNAME##. My first instinct is just to use String.Replace(), but is there a better, more efficient way or is Replace() already optimized for this?
Accepted answer by Greg Hurlman
System.Text.RegularExpressions.Regex.Replace() is what you seek - IF your tokens are odd enough that you need a regex to find them.
Some kind soul did some performance testing, and between Regex.Replace(), String.Replace(), and StringBuilder.Replace(), String.Replace() actually came out on top.
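To make the comparison concrete, here is a minimal sketch of both options. The class and method names (TokenReplaceSketch, FillWithStringReplace, FillWithRegex) are made up for this example, and it assumes tokens look like ##NAME## with no '#' inside the name and that values arrive in a dictionary keyed by the bare token name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TokenReplaceSketch
{
    // Assumed token format: "##NAME##", where NAME contains no '#'.
    private static readonly Regex TokenPattern = new Regex("##([^#]+)##");

    // One String.Replace() call per known token.
    public static string FillWithStringReplace(string template, IDictionary<string, string> values)
    {
        string result = template;
        foreach (KeyValuePair<string, string> pair in values)
        {
            result = result.Replace("##" + pair.Key + "##", pair.Value);
        }
        return result;
    }

    // A single Regex.Replace() pass; the callback looks up each token it finds.
    public static string FillWithRegex(string template, IDictionary<string, string> values)
    {
        return TokenPattern.Replace(template, match =>
        {
            string replacement;
            // Unknown tokens are left in the text unchanged.
            return values.TryGetValue(match.Groups[1].Value, out replacement)
                ? replacement
                : match.Value;
        });
    }
}
```

Which of the two is faster will depend on template size and token count, so it is worth measuring with your own data.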
Answer by Greg Hurlman
string.Replace is fine. I'd prefer using a Regex, but I'm *** for regular expressions.
The thing to keep in mind is how big these templates are. If it's really big, and memory is an issue, you might want to create a custom tokenizer that acts on a stream. That way you only hold a small part of the file in memory while you manipulate it.
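A stream-based tokenizer could look roughly like the sketch below. It is only one possible shape: the StreamingTokenReplacer name is hypothetical, and it assumes ##NAME## tokens, a dictionary keyed by the bare name, and that unknown or unterminated tokens should be written back unchanged.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class StreamingTokenReplacer
{
    // Copies input to output, replacing ##NAME## tokens as they are read.
    // Only the name of the token currently being read is buffered.
    public static void Fill(TextReader input, TextWriter output, IDictionary<string, string> values)
    {
        var name = new StringBuilder(); // characters of the token being read
        bool inToken = false;           // true after an opening "##"
        bool pendingHash = false;       // true when a single '#' was just read
        int c;
        while ((c = input.Read()) != -1)
        {
            char ch = (char)c;
            if (pendingHash)
            {
                pendingHash = false;
                if (ch == '#')
                {
                    if (!inToken)
                    {
                        inToken = true; // opening delimiter
                    }
                    else
                    {
                        // closing delimiter: write the value, or the raw token if unknown
                        string value;
                        if (values.TryGetValue(name.ToString(), out value)) output.Write(value);
                        else output.Write("##" + name + "##");
                        name.Clear();
                        inToken = false;
                    }
                    continue;
                }
                Emit('#', inToken, name, output); // a lone '#' is ordinary text
            }
            if (ch == '#') pendingHash = true;
            else Emit(ch, inToken, name, output);
        }
        if (pendingHash) Emit('#', inToken, name, output);
        if (inToken) output.Write("##" + name); // unterminated token: keep as-is
    }

    // Routes a character to the token buffer or straight to the output.
    private static void Emit(char ch, bool inToken, StringBuilder name, TextWriter output)
    {
        if (inToken) name.Append(ch);
        else output.Write(ch);
    }
}
```

In practice you would pass a StreamReader over the template file and a StreamWriter for the result, so neither the template nor the output ever has to be held in memory as a whole.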
But, for the naive implementation, string.Replace should be fine.
Answer by samjudson
If you are doing multiple replaces on large strings then it might be better to use StringBuilder.Replace(), since strings are immutable and every String.Replace() call allocates a new copy of the whole string.
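A minimal sketch of the StringBuilder.Replace() approach, assuming ##NAME## tokens and a dictionary keyed by the bare token name (the StringBuilderTokenReplacer name is invented for the example):

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class StringBuilderTokenReplacer
{
    public static string Fill(string template, IDictionary<string, string> values)
    {
        // Load the template once, then replace each token in the same builder.
        var sb = new StringBuilder(template);
        foreach (KeyValuePair<string, string> pair in values)
        {
            sb.Replace("##" + pair.Key + "##", pair.Value);
        }
        return sb.ToString();
    }
}
```

Note that each Replace() call still scans the whole buffer, and a replacement value that itself contains a token may be replaced again by a later pass.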
Answer by Erik van Brakel
Had to do something similar recently. What I did was:
- make a method that takes a dictionary (key = token name, value = the text you need to insert)
- Get all matches to your token format (##.+?## in your case I guess, not that good at regular expressions :P) using Regex.Matches(input, regular expression)
- foreach over the results, using the dictionary to find the insert value for your token.
- return result.
Done ;-)
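The steps above can be sketched as follows. This is an approximation of the described approach rather than the author's actual code: the MatchLoopTokenReplacer name is made up, the pattern adds a capture group to the suggested ##.+?## so the token name can be read directly, and unknown tokens are simply left in place.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not the answer author's code.
public static class MatchLoopTokenReplacer
{
    public static string Fill(string input, IDictionary<string, string> values)
    {
        var result = new StringBuilder(input.Length);
        int position = 0; // index of the first character not yet copied
        foreach (Match match in Regex.Matches(input, "##(.+?)##"))
        {
            // Copy the text between the previous token and this one.
            result.Append(input, position, match.Index - position);

            // Use the dictionary to find the insert value for this token.
            string value;
            result.Append(values.TryGetValue(match.Groups[1].Value, out value) ? value : match.Value);
            position = match.Index + match.Length;
        }
        // Copy whatever follows the last token.
        result.Append(input, position, input.Length - position);
        return result.ToString();
    }
}
```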
If you want to test your regexes, I can suggest The Regulator.
Answer by Factor Mystic
This is an ideal use of Regular Expressions. Check out this helpful website, the .Net Regular Expressions class, and this very helpful book Mastering Regular Expressions.
Answer by Slavo
The only situation in which I've had to do this is sending a templated e-mail. In .NET this is provided out of the box by the MailDefinition class. So this is how you create a templated message:
// Requires System.Web.UI.WebControls (MailDefinition), System.Collections.Specialized
// (ListDictionary) and System.Net.Mail (MailMessage); "this" must be a Control.
MailDefinition md = new MailDefinition();
md.BodyFileName = pathToTemplate;
md.From = "[email protected]";
ListDictionary replacements = new ListDictionary();
replacements.Add("<%To%>", someValue);
// continue adding replacements
MailMessage msg = md.CreateMailMessage("[email protected]", replacements, this);
After this, msg.Body would be created by substituting the values in the template. I guess you can take a look at MailDefinition.CreateMailMessage() with Reflector :). Sorry for being a little off-topic, but if this is your scenario I think it's the easiest way.
Answer by Gareth Farrington
Regular expressions would be the quickest solution to code up but if you have many different tokens then it will get slower. If performance is not an issue then use this option.
A better approach would be to define a token delimiter, like your "##", that you can scan for in the text. Then look up what to replace it with in a hash table, using the text that follows the delimiter as the key.
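Here is one way the scan-and-lookup idea might be written without regular expressions. It is a sketch under assumptions: the ScanningTokenReplacer name is invented, tokens are taken to be closed by a second "##", and text between delimiters that is not a known key is copied through untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class ScanningTokenReplacer
{
    private const string Delimiter = "##";

    public static string Fill(string template, IDictionary<string, string> values)
    {
        var result = new StringBuilder(template.Length);
        int position = 0; // index of the first character not yet copied
        while (true)
        {
            // Scan for an opening delimiter and the closing one after it.
            int open = template.IndexOf(Delimiter, position, StringComparison.Ordinal);
            int close = open < 0
                ? -1
                : template.IndexOf(Delimiter, open + Delimiter.Length, StringComparison.Ordinal);
            if (close < 0) break; // no complete token left

            string key = template.Substring(open + Delimiter.Length, close - open - Delimiter.Length);
            string value;
            if (values.TryGetValue(key, out value))
            {
                // Known token: copy the text before it, then the replacement.
                result.Append(template, position, open - position).Append(value);
                position = close + Delimiter.Length;
            }
            else
            {
                // Unknown key: keep the text, and let the closing delimiter
                // be considered as the start of the next token.
                result.Append(template, position, close - position);
                position = close;
            }
        }
        result.Append(template, position, template.Length - position);
        return result.ToString();
    }
}
```

Each character of the template is copied once and each key costs one hash lookup, which is the point of preferring this over one pass per token.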
If this is part of a build script, then NAnt has a great feature for doing this called Filter Chains. The code for that is open source, so you could look at how it's done for a fast implementation.
Answer by dguaraglia
Well, depending on how many variables you have in your template, how many templates you have, etc., this might be a job for a full template processor. The only one I've ever used for .NET is NVelocity, but I'm sure there must be scores of others out there, most of them linked to some web framework or another.
Answer by ThisGuy
If your template is large and you have lots of tokens, you probably don't want to walk it and replace the tokens in the template one by one, as that would result in an O(N * M) operation, where N is the size of the template and M is the number of tokens to replace.
The following method accepts a template and a dictionary of the key/value pairs you wish to replace. By initializing the StringBuilder to slightly larger than the size of the template, it should result in an O(N) operation (i.e. it shouldn't have to grow itself log N times).
Finally, you can move the building of the tokens into a Singleton, as it only needs to be generated once.
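// Requires: using System.Collections.Generic; using System.Text; using System.Text.RegularExpressions;
static string SimpleTemplate(string template, Dictionary<string, string> replacements)
{
    // parse the message into an array of tokens; the capture group keeps the
    // ##TOKEN## pieces in the array, so dictionary keys must include the ## delimiters
    Regex regex = new Regex("(##[^#]+##)");
    string[] tokens = regex.Split(template);

    // the new message from the tokens
    var sb = new StringBuilder((int)((double)template.Length * 1.1));
    string value;
    foreach (string token in tokens)
    {
        // pieces with no dictionary entry (plain text, unknown tokens) are copied as-is
        sb.Append(replacements.TryGetValue(token, out value) ? value : token);
    }
    return sb.ToString();
}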
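One possible reading of the Singleton remark is that the split template can be cached and reused for every render. The sketch below assumes that reading; ParsedTemplate is a hypothetical name, and as in the method above the dictionary keys are expected to include the ## delimiters.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical wrapper: splits the template once, renders it many times.
public sealed class ParsedTemplate
{
    private static readonly Regex TokenPattern = new Regex("(##[^#]+##)", RegexOptions.Compiled);

    private readonly string[] pieces; // alternating plain text and ##TOKEN## pieces
    private readonly int length;      // original template length, used to size the builder

    public ParsedTemplate(string template)
    {
        pieces = TokenPattern.Split(template);
        length = template.Length;
    }

    // Keys must include the delimiters, e.g. "##USERNAME##".
    public string Render(IDictionary<string, string> replacements)
    {
        var sb = new StringBuilder((int)(length * 1.1));
        foreach (string piece in pieces)
        {
            string value;
            sb.Append(replacements.TryGetValue(piece, out value) ? value : piece);
        }
        return sb.ToString();
    }
}
```

A single instance per template could then be kept in a static field or a cache and shared by all callers, since rendering does not modify it.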
Answer by Eric J.
FastReplacer implements token replacement in O(n*log(n) + m) time and uses 3x the memory of the original string.
FastReplacer is good for executing many Replace operations on a large string when performance is important.
The main idea is to avoid modifying existing text or allocating new memory every time a string is replaced.
We have designed FastReplacer to help us on a project where we had to generate a large text with a large number of append and replace operations. The first version of the application took 20 seconds to generate the text using StringBuilder. The second improved version that used the String class took 10 seconds. Then we implemented FastReplacer and the duration dropped to 0.1 seconds.