Memory Efficiency and Performance of String.Replace .NET Framework

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/399798/

Date: 2020-08-04 01:58:00  Source: igfitidea

c# .net string

Asked by nick_alot

 string str1 = "12345ABC...\\...ABC100000";
 // Hypothetically huge string of 100000 + Unicode Chars
 str1 = str1.Replace("1", string.Empty);
 str1 = str1.Replace("22", string.Empty);
 str1 = str1.Replace("656", string.Empty);
 str1 = str1.Replace("77ABC", string.Empty);

 // ...  this replace anti-pattern might happen with up to 50 consecutive lines of code.

 str1 = str1.Replace("ABCDEFGHIJD", string.Empty);

I have inherited some code that does the same as the snippet above. It takes a huge string and replaces (removes) constant smaller strings from the large string.

I believe this is a very memory intensive process given that new large immutable strings are being allocated in memory for each replace, awaiting death via the GC.

1. What is the fastest way of replacing these values, ignoring memory concerns?

2. What is the most memory efficient way of achieving the same result?

I am hoping that these are the same answer!

Practical solutions that fit somewhere in between these goals are also appreciated.

Assumptions:

  • All replacements are constant and known in advance
  • Underlying characters do contain some unicode [non-ascii] chars

Accepted answer by Jon Skeet

All characters in a .NET string are "unicode chars". Do you mean they're non-ASCII? That shouldn't make any odds - unless you run into composition issues, e.g. an "e + acute accent" not being replaced when you try to replace an "e acute".
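
The composition issue can be sketched as follows. This is only a minimal illustration, assuming the input may contain a decomposed "e" followed by the combining acute accent (U+0301). The names CompositionDemo and RemoveEAcute are made up for this example. Normalizing to Form C first lets an ordinal Replace for the precomposed character (U+00E9) match.

```csharp
using System;
using System.Text;

public static class CompositionDemo
{
    // Hypothetical helper: removes "e acute" regardless of whether the
    // input stores it precomposed (U+00E9) or as 'e' + U+0301.
    public static string RemoveEAcute(string input)
    {
        // Form C composes 'e' + combining acute into the single char U+00E9.
        string composed = input.Normalize(NormalizationForm.FormC);

        // String.Replace(string, string) compares ordinally, so it only
        // finds the exact code unit sequence it is given.
        return composed.Replace("\u00E9", string.Empty);
    }
}
```

Note that Normalize allocates another copy of the string, which may matter for the very large inputs discussed in the question.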

You could try using a regular expression with Regex.Replace, or StringBuilder.Replace. Here's sample code doing the same thing with both:

using System;
using System.Text;
using System.Text.RegularExpressions;

class Test
{
    static void Main(string[] args)
    {
        string original = "abcdefghijkl";

        Regex regex = new Regex("a|c|e|g|i|k", RegexOptions.Compiled);

        string removedByRegex = regex.Replace(original, "");
        string removedByStringBuilder = new StringBuilder(original)
            .Replace("a", "")
            .Replace("c", "")
            .Replace("e", "")
            .Replace("g", "")
            .Replace("i", "")
            .Replace("k", "")
            .ToString();

        Console.WriteLine(removedByRegex);
        Console.WriteLine(removedByStringBuilder);
    }
}

I wouldn't like to guess which is more efficient - you'd have to benchmark with your specific application. The regex way may be able to do it all in one pass, but that pass will be relatively CPU-intensive compared with each of the many replaces in StringBuilder.
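
For the constant, known-in-advance tokens from the question, the single-pass regex could be built from a list rather than written by hand. The following is a rough sketch under that assumption. RegexRemoval, Build and RemoveAll are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexRemoval
{
    // Builds one alternation (e.g. "77ABC|656|22|1") from literal tokens.
    public static Regex Build(IEnumerable<string> tokens)
    {
        string pattern = string.Join("|",
            tokens.Where(t => !string.IsNullOrEmpty(t))
                  .OrderByDescending(t => t.Length)  // try longer tokens first at each position
                  .Select(t => Regex.Escape(t)));     // treat tokens as literals, not regex syntax
        return new Regex(pattern, RegexOptions.Compiled | RegexOptions.CultureInvariant);
    }

    // Removes every token occurrence in one left-to-right pass.
    public static string RemoveAll(string input, IEnumerable<string> tokens)
    {
        return Build(tokens).Replace(input, string.Empty);
    }
}
```

One design caveat: a single pass is not always equivalent to chained Replace calls. With tokens "1" and "22", the input "212" becomes "22" here, whereas removing "1" first and then "22" would leave an empty string. Whether that difference matters depends on the data.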

Answer by nick_alot

StringBuilder: http://msdn.microsoft.com/en-us/library/2839d5h5.aspx

The performance of the Replace operation itself should be roughly the same as string.Replace, and according to Microsoft no garbage should be produced.

Answer by Ed S.

StringBuilder sb = new StringBuilder("Hello string");
sb.Replace("string", String.Empty);
Console.WriteLine(sb);  

StringBuilder, a mutable string.

Answer by Ahmed Said

If you want a built-in class in .NET, I think StringBuilder is the best. To do it manually, you can use unsafe code with char* and iterate through your string, replacing based on your criteria.

Answer by dmajkic

Since you have multiple replaces on one string, I would recommend using Regex over StringBuilder.

Answer by John Leidegren

If you want to be really fast, and I mean really fast, you'll have to look beyond StringBuilder and just write well-optimized code.

One thing your computer doesn't like to do is branching. If you can write a replace method that operates on a fixed array (char *) and doesn't branch, you'll get great performance.

The replace operation searches for a sequence of characters, and if it finds such a substring it replaces it. In effect you copy the string and perform the find and replace while doing so.

You'll rely on these functions for picking the index of some buffer to read/write. The goal is to perform the replace in such a way that, when nothing has to change, you write junk instead of branching.

You should be able to complete this without a single if statement and remember to use unsafe code. Otherwise you'll be paying for index checking for every element access.

unsafe
{
    fixed( char * p = myStringBuffer )
    {
        // Do fancy string manipulation here
    }
}
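
The "write junk instead of branching" idea can be sketched in safe code for the simplest case, removing a single character. This is only an illustration of the pattern, not the author's implementation: it uses a char[] rather than unsafe pointers, the names BranchlessCopy and RemoveChar are made up, and whether the JIT actually emits branch-free machine code for the conditional expression is not guaranteed.

```csharp
using System;

public static class BranchlessCopy
{
    // Hypothetical helper: copies s into a buffer while dropping 'target'.
    public static string RemoveChar(string s, char target)
    {
        var buffer = new char[s.Length];
        int write = 0;

        foreach (char c in s)
        {
            // Always write, even when this character is unwanted. An unwanted
            // character is "junk" that the next write simply overwrites.
            buffer[write] = c;

            // Advance the write index only when the character is kept.
            write += (c != target) ? 1 : 0;
        }

        // Everything at or beyond 'write' is leftover junk and is ignored.
        return new string(buffer, 0, write);
    }
}
```

Extending this to multi-character tokens is considerably more work, which is presumably the "substantial amount of work" the answer refers to.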

I've written code like this in C# for fun and seen significant performance improvements, almost a 300% speed-up for find and replace. While the .NET BCL (base class library) performs quite well, it is riddled with branching constructs and exception handling, which will slow down your code if you use the built-in stuff. Also, these optimizations, while perfectly sound, are not performed by the JIT compiler, and you'll have to run the code as a release build without any debugger attached to observe the massive performance gain.

I could provide you with more complete code but it is a substantial amount of work. However, I can guarantee you that it will be faster than anything else suggested so far.

Answer by yano

Here's a quick benchmark...

        Stopwatch s = new Stopwatch();
        s.Start();
        string replace = source;
        replace = replace.Replace("$TS$", tsValue);
        replace = replace.Replace("$DOC$", docValue);
        s.Stop();

        Console.WriteLine("String.Replace:\t\t" + s.ElapsedMilliseconds);

        s.Reset();

        s.Start();
        StringBuilder sb = new StringBuilder(source);
        sb = sb.Replace("$TS$", tsValue);
        sb = sb.Replace("$DOC$", docValue);
        string output = sb.ToString();
        s.Stop();

        Console.WriteLine("StringBuilder.Replace:\t\t" + s.ElapsedMilliseconds);

I didn't see much difference on my machine (String.Replace took 85 ms and StringBuilder.Replace took 80 ms), and that was against about 8 MB of text in "source"...

Answer by Robear

1. What is the fastest way of replacing these values, ignoring memory concerns?

The fastest way is to build a custom component that's specific to your use case. As of .NET 4.6, there's no class in the BCL designed for multiple string replacements.

If you NEED something fast out of the BCL, StringBuilder is the fastest BCL component for simple string replacement (the original answer links to its source code at this point). It's pretty efficient for replacing a single string. Only use Regex if you really need the pattern-matching power of regular expressions. It's slower and a little more cumbersome, even when compiled.

2. What is the most memory efficient way of achieving the same result?

The most memory-efficient way is to perform a filtered stream copy from the source to the destination (explained below). Memory consumption will be limited to your buffer; however, this will be more CPU-intensive. As a rule of thumb, you're going to trade CPU performance for memory consumption.

Technical Details

String replacements are tricky. Even when performing a string replacement in a mutable memory space (such as with StringBuilder), it's expensive. If the replacement string is a different length than the original string, you're going to be relocating every character following the replacement string to keep the whole string contiguous. This results in a LOT of memory writes and, even in the case of StringBuilder, causes you to rewrite most of the string in memory on every call to Replace.

So what is the fastest way to do string replacements? Write the new string using a single-pass: Don't let your code go back and have to re-write anything. Writes are more expensive than reads. You're going to have to code this yourself for best results.
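
A single-pass rewrite for the removal case in the question might look like the sketch below. It assumes the whole source fits in memory and that ordinal, longest-token-first matching is acceptable. SinglePassRemoval, RemoveAll and MatchLength are hypothetical names. Each output character is written exactly once, and untouched runs are copied in bulk.

```csharp
using System;
using System.Linq;
using System.Text;

public static class SinglePassRemoval
{
    public static string RemoveAll(string source, params string[] tokens)
    {
        // Longest tokens first, so "77ABC" wins over a shorter token at the same position.
        string[] ordered = tokens.Where(t => !string.IsNullOrEmpty(t))
                                 .OrderByDescending(t => t.Length)
                                 .ToArray();

        var result = new StringBuilder(source.Length); // the result can never be longer than the source
        int runStart = 0; // start of the current run of characters to keep
        int i = 0;

        while (i < source.Length)
        {
            int skip = MatchLength(source, i, ordered);
            if (skip == 0) { i++; continue; }

            result.Append(source, runStart, i - runStart); // copy the untouched run before the match
            i += skip;                                     // jump over the removed token
            runStart = i;
        }

        result.Append(source, runStart, source.Length - runStart); // copy the tail
        return result.ToString();
    }

    // Returns the length of the first token that matches at 'index', or 0 if none does.
    private static int MatchLength(string source, int index, string[] ordered)
    {
        foreach (string token in ordered)
        {
            if (string.CompareOrdinal(source, index, token, 0, token.Length) == 0)
                return token.Length;
        }
        return 0;
    }
}
```

As with any single-pass approach, text that only becomes a match after an earlier removal is not removed, which differs from chaining Replace calls.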

High-Memory Solution

The class I've written generates strings based on templates. I place tokens ($ReplaceMe$) in a template which marks places where I want to insert a string later. I use it in cases where XmlWriter is too onerous for XML that's largely static and repetitive, and I need to produce large XML (or JSON) data streams.

The class works by slicing the template up into parts and placing each part into a numbered dictionary. Parameters are also enumerated. The order in which the parts and parameters are inserted into a new string is stored in an integer array. When a new string is generated, the parts and parameters are picked from the dictionary and used to create a new string.

It's neither fully-optimized nor is it bulletproof, but it works great for generating very large data streams from templates.

Low-Memory Solution

You'll need to read small chunks from the source string into a buffer, search the buffer using an optimized search algorithm, and then write the new string to the destination stream / string. There are a lot of potential caveats here, but it would be memory efficient and a better solution for source data that's dynamic and can't be cached, such as whole-page translations or source data that's too large to reasonably cache. I don't have a sample solution for this handy.
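
The answer gives no sample for the low-memory approach, so the following is only a rough sketch of one way the chunked copy could work, not the author's solution. It assumes removal (empty replacement), ordinal matching and a naive per-position search rather than an optimized algorithm. StreamingRemoval, CopyRemoving and MatchAt are made-up names. A small window carries over the last few characters of each chunk so that tokens spanning a chunk boundary are still found.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class StreamingRemoval
{
    public static void CopyRemoving(TextReader source, TextWriter destination,
                                    string[] tokens, int bufferSize = 4096)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException("bufferSize");

        string[] ordered = tokens.Where(t => !string.IsNullOrEmpty(t))
                                 .OrderByDescending(t => t.Length)
                                 .ToArray();
        int maxLen = ordered.Length == 0 ? 1 : ordered[0].Length;

        var window = new StringBuilder(bufferSize + maxLen);
        var chunk = new char[bufferSize];
        bool endOfInput = false;

        while (!endOfInput)
        {
            int read = source.Read(chunk, 0, chunk.Length);
            if (read == 0) endOfInput = true;
            else window.Append(chunk, 0, read);

            // Until the input ends, leave the last maxLen - 1 characters unprocessed:
            // a token starting there might continue in the next chunk.
            int limit = endOfInput ? window.Length : window.Length - maxLen + 1;
            int i = 0;
            while (i < limit)
            {
                int matched = MatchAt(window, i, ordered);
                if (matched > 0) i += matched;          // skip the token
                else destination.Write(window[i++]);    // keep the character
            }

            window.Remove(0, i); // drop what has been consumed, keep the carry-over
        }
    }

    // Returns the length of the first token found at 'start', or 0 if none matches.
    private static int MatchAt(StringBuilder window, int start, string[] ordered)
    {
        foreach (string token in ordered)
        {
            if (start + token.Length > window.Length) continue;
            int k = 0;
            while (k < token.Length && window[start + k] == token[k]) k++;
            if (k == token.Length) return token.Length;
        }
        return 0;
    }
}
```

Memory use is bounded by roughly bufferSize plus the longest token length, independent of the source size. It could be called with, for example, a StreamReader over the input file and a StreamWriter over the output file.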

Sample Code

Desired Results

<DataTable source='Users'>
  <Rows>
    <Row id='25' name='Administrator' />
    <Row id='29' name='Robert' />
    <Row id='55' name='Amanda' />
  </Rows>
</DataTable>

Template

<DataTable source='$TableName$'>
  <Rows>
    <Row id='$0$' name='$1$'/>
  </Rows>
</DataTable>

Test Case

using System;
using System.IO;

class Program
{
  static string[,] _users =
  {
    { "25", "Administrator" },
    { "29", "Robert" },
    { "55", "Amanda" },
  };

  static StringTemplate _documentTemplate = new StringTemplate(@"<DataTable source='$TableName$'><Rows>$Rows$</Rows></DataTable>");
  static StringTemplate _rowTemplate = new StringTemplate(@"<Row id='$0$' name='$1$' />");

  static void Main(string[] args)
  {
    _documentTemplate.SetParameter("TableName", "Users");
    _documentTemplate.SetParameter("Rows", GenerateRows);

    Console.WriteLine(_documentTemplate.GenerateString(4096));
    Console.ReadLine();
  }

  private static void GenerateRows(StreamWriter writer)
  {
    for (int i = 0; i <= _users.GetUpperBound(0); i++)
      _rowTemplate.GenerateString(writer, _users[i, 0], _users[i, 1]);
  }
}

StringTemplate Source

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public class StringTemplate
{
  private string _template;
  private string[] _parts;
  private int[] _tokens;
  private string[] _parameters;
  private Dictionary<string, int> _parameterIndices;
  private string[] _replaceGraph;
  private Action<StreamWriter>[] _callbackGraph;
  private bool[] _graphTypeIsReplace;

  public string[] Parameters
  {
    get { return _parameters; }
  }

  public StringTemplate(string template)
  {
    _template = template;
    Prepare();
  }

  public void SetParameter(string name, string replacement)
  {
    int index = _parameterIndices[name] + _parts.Length;
    _replaceGraph[index] = replacement;
    _graphTypeIsReplace[index] = true;
  }

  public void SetParameter(string name, Action<StreamWriter> callback)
  {
    int index = _parameterIndices[name] + _parts.Length;
    _callbackGraph[index] = callback;
    _graphTypeIsReplace[index] = false;
  }

  //Matches tokens such as $TableName$; the dollar signs must be escaped in the pattern.
  private static Regex _parser = new Regex(@"\$(\w{1,64})\$", RegexOptions.Compiled);
  private void Prepare()
  {
    _parameterIndices = new Dictionary<string, int>(64);
    List<string> parts = new List<string>(64);
    List<object> tokens = new List<object>(64);
    int param_index = 0;
    int part_start = 0;

    foreach (Match match in _parser.Matches(_template))
    {
      if (match.Index > part_start)
      {
        //Add Part
        tokens.Add(parts.Count);
        parts.Add(_template.Substring(part_start, match.Index - part_start));
      }

      //Add Parameter
      var param = _template.Substring(match.Index + 1, match.Length - 2);
      if (!_parameterIndices.TryGetValue(param, out param_index))
        _parameterIndices[param] = param_index = _parameterIndices.Count;
      tokens.Add(param);

      part_start = match.Index + match.Length;
    }

    //Add last part, if it exists.
    if (part_start < _template.Length)
    {
      tokens.Add(parts.Count);
      parts.Add(_template.Substring(part_start, _template.Length - part_start));
    }

    //Set State
    _parts = parts.ToArray();
    _tokens = new int[tokens.Count];

    int index = 0;
    foreach (var token in tokens)
    {
      var parameter = token as string;
      if (parameter == null)
        _tokens[index++] = (int)token;
      else
        _tokens[index++] = _parameterIndices[parameter] + _parts.Length;
    }

    _parameters = _parameterIndices.Keys.ToArray();
    int graphlen = _parts.Length + _parameters.Length;
    _callbackGraph = new Action<StreamWriter>[graphlen];
    _replaceGraph = new string[graphlen];
    _graphTypeIsReplace = new bool[graphlen];

    for (int i = 0; i < _parts.Length; i++)
    {
      _graphTypeIsReplace[i] = true;
      _replaceGraph[i] = _parts[i];
    }
  }

  public void GenerateString(Stream output)
  {
    var writer = new StreamWriter(output);
    GenerateString(writer);
    writer.Flush();
  }

  public void GenerateString(StreamWriter writer)
  {
    //Resolve graph
    foreach(var token in _tokens)
    {
      if (_graphTypeIsReplace[token])
        writer.Write(_replaceGraph[token]);
      else
        _callbackGraph[token](writer);
    }
  }

  public void SetReplacements(params string[] parameters)
  {
    int index;
    for (int i = 0; i < _parameters.Length; i++)
    {
      if (!Int32.TryParse(_parameters[i], out index))
        continue;
      else
        SetParameter(index.ToString(), parameters[i]);
    }
  }

  public string GenerateString(int bufferSize = 1024)
  {
    using (var ms = new MemoryStream(bufferSize))
    {
      GenerateString(ms);
      ms.Position = 0;
      using (var reader = new StreamReader(ms))
        return reader.ReadToEnd();
    }
  }

  public string GenerateString(params string[] parameters)
  {
    SetReplacements(parameters);
    return GenerateString();
  }

  public void GenerateString(StreamWriter writer, params string[] parameters)
  {
    SetReplacements(parameters);
    GenerateString(writer);
  }
}

Answer by Ramil Shavaleev

Here is my benchmark:

using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

internal static class MeasureTime
{
    internal static TimeSpan Run(Action func, uint count = 1)
    {
        if (count <= 0)
        {
            throw new ArgumentOutOfRangeException("count", "Must be greater than zero");
        }

        long[] arr_time = new long[count];
        Stopwatch sw = new Stopwatch();
        for (uint i = 0; i < count; i++)
        {
            sw.Start();
            func();
            sw.Stop();
            arr_time[i] = sw.ElapsedTicks;
            sw.Reset();
        }

        return new TimeSpan(count == 1 ? arr_time.Sum() : Convert.ToInt64(Math.Round(arr_time.Sum() / (double)count)));
    }
}

public class Program
{
    public static string RandomString(int length)
    {
        Random random = new Random();
        // Note: only uppercase letters and digits are generated, so the
        // lowercase patterns used in Main never match this string.
        const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        return new String(Enumerable.Range(1, length).Select(_ => chars[random.Next(chars.Length)]).ToArray());
    }

    public static void Main()
    {
        string rnd_str = RandomString(500000);
        Regex regex = new Regex("a|c|e|g|i|k", RegexOptions.Compiled);
        TimeSpan ts1 = MeasureTime.Run(() => regex.Replace(rnd_str, "!!!"), 10);
        Console.WriteLine("Regex time: {0:hh\\:mm\\:ss\\:fff}", ts1);

        StringBuilder sb_str = new StringBuilder(rnd_str);
        TimeSpan ts2 = MeasureTime.Run(() => sb_str.Replace("a", "").Replace("c", "").Replace("e", "").Replace("g", "").Replace("i", "").Replace("k", ""), 10);
        Console.WriteLine("StringBuilder time: {0:hh\\:mm\\:ss\\:fff}", ts2);

        TimeSpan ts3 = MeasureTime.Run(() => rnd_str.Replace("a", "").Replace("c", "").Replace("e", "").Replace("g", "").Replace("i", "").Replace("k", ""), 10);
        Console.WriteLine("String time: {0:hh\\:mm\\:ss\\:fff}", ts3);

        char[] ch_arr = {'a', 'c', 'e', 'g', 'i', 'k'};
        TimeSpan ts4 = MeasureTime.Run(() => new String((from c in rnd_str where !ch_arr.Contains(c) select c).ToArray()), 10);
        Console.WriteLine("LINQ time: {0:hh\\:mm\\:ss\\:fff}", ts4);
    }

}

Regex time: 00:00:00:008

StringBuilder time: 00:00:00:015

String time: 00:00:00:005

LINQ can't process rnd_str (Fatal Error: Memory usage limit was exceeded)

String.Replace is fastest
