C#: Fastest way to convert a possibly-null-terminated ASCII byte[] to a string?

Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/144176/

Time: 2020-08-03 15:27:13  Source: igfitidea

Fastest way to convert a possibly-null-terminated ascii byte[] to a string?

Asked by Wayne Bloss

I need to convert a (possibly) null-terminated array of ASCII bytes to a string in C#, and the fastest way I've found to do it is my UnsafeAsciiBytesToString method shown below. This method uses the String.String(sbyte*) constructor, which contains a warning in its remarks:

"The value parameter is assumed to point to an array representing a string encoded using the default ANSI code page (that is, the encoding method specified by Encoding.Default).

Note: * Because the default ANSI code page is system-dependent, the string created by this constructor from identical signed byte arrays may differ on different systems. *...

* If the specified array is not null-terminated, the behavior of this constructor is system dependent. For example, such a situation might cause an access violation. *"

Now, I'm positive that the way the string is encoded will never change... but the default codepage on the system that my app is running on might change. So, is there any reason that I shouldn't run screaming from using String.String(sbyte*) for this purpose?

using System;
using System.Text;

namespace FastAsciiBytesToString
{
    static class StringEx
    {
        public static string AsciiBytesToString(this byte[] buffer, int offset, int maxLength)
        {
            int maxIndex = offset + maxLength;

            for( int i = offset; i < maxIndex; i++ )
            {
                // Skip non-nulls.
                if( buffer[i] != 0 ) continue;
                // First null we find, return the string.
                return Encoding.ASCII.GetString(buffer, offset, i - offset);
            }
            // Terminating null not found. Convert the entire section from offset to maxLength.
            return Encoding.ASCII.GetString(buffer, offset, maxLength);
        }

        public static string UnsafeAsciiBytesToString(this byte[] buffer, int offset)
        {
            string result = null;

            unsafe
            {
                fixed( byte* pAscii = &buffer[offset] )
                { 
                    result = new String((sbyte*)pAscii);
                }
            }

            return result;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            byte[] asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c', 0, 0, 0 };

            string result = asciiBytes.AsciiBytesToString(3, 6);

            Console.WriteLine("AsciiBytesToString Result: \"{0}\"", result);

            result = asciiBytes.UnsafeAsciiBytesToString(3);

            Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

            // Non-null terminated test.
            asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c' };

            result = asciiBytes.UnsafeAsciiBytesToString(3);

            Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

            Console.ReadLine();
        }
    }
}

Accepted answer by Jon Skeet

Any reason not to use the String(sbyte*, int, int) constructor? If you've worked out which portion of the buffer you need, the rest should be simple:

public static string UnsafeAsciiBytesToString(byte[] buffer, int offset, int length)
{
    unsafe
    {
       fixed (byte* pAscii = buffer)
       { 
           return new String((sbyte*)pAscii, offset, length);
       }
    }
}

If you need to look first:

public static string UnsafeAsciiBytesToString(byte[] buffer, int offset)
{
    int end = offset;
    while (end < buffer.Length && buffer[end] != 0)
    {
        end++;
    }
    unsafe
    {
       fixed (byte* pAscii = buffer)
       { 
           return new String((sbyte*)pAscii, offset, end - offset);
       }
    }
}

If this truly is an ASCII string (i.e. all bytes are less than 128) then the codepage problem shouldn't be an issue unless you've got a particularly strange default codepage which isn't based on ASCII.
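
The "all bytes are less than 128" condition can be checked directly before trusting a code-page-dependent conversion. The following is a minimal sketch; AsciiCheck and IsPureAscii are hypothetical names used for illustration and are not part of the original answer.

```csharp
using System;

static class AsciiCheck
{
    // Hypothetical helper: true when every byte in
    // [offset, offset + count) is a 7-bit ASCII value (0..127).
    public static bool IsPureAscii(byte[] buffer, int offset, int count)
    {
        for (int i = offset; i < offset + count; i++)
        {
            if (buffer[i] >= 128) return false;
        }
        return true;
    }
}
```

If the check fails, the bytes above 127 are the ones whose meaning depends on the code page in use.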

Out of interest, have you actually profiled your application to make sure that this is really the bottleneck? Do you definitely need the absolute fastest conversion, instead of one which is more readable (e.g. using Encoding.GetString for the appropriate encoding)?
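
For comparison, a readable version without unsafe code might look like the sketch below. It assumes the encoding is ASCII, as in the question; SafeAscii is a hypothetical class name, and the clamping of maxLength to the array bounds is an addition that is not in the question's code.

```csharp
using System;
using System.Text;

static class SafeAscii
{
    // Converts bytes starting at offset, stopping at the first 0 byte
    // or after maxLength bytes, whichever comes first.
    public static string AsciiBytesToString(byte[] buffer, int offset, int maxLength)
    {
        // Never read past the end of the array, even if maxLength is too large.
        int count = Math.Min(maxLength, buffer.Length - offset);

        // Array.IndexOf returns -1 when no terminator is found in the range.
        int nul = Array.IndexOf(buffer, (byte)0, offset, count);
        int length = nul < 0 ? count : nul - offset;

        return Encoding.ASCII.GetString(buffer, offset, length);
    }
}
```

Because it uses Encoding.ASCII explicitly, the result does not depend on the system's default code page.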

Answer by Jeffrey L Whitledge

One possibility to consider: check that the default code-page is acceptable and use that information to select the conversion mechanism at run-time.

This could also take into account whether the string is in fact null-terminated, but once you've done that, of course, the speed gains may vanish.
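
One way to sketch this idea: probe the code-page-dependent path once at startup with every non-zero ASCII value, and fall back to Encoding.ASCII if the result differs. All names here (AsciiConverterSelector, ViaAnsi, ViaEncoding, Convert) are hypothetical, and Marshal.PtrToStringAnsi stands in for the question's unsafe constructor so that the sketch compiles without unsafe code. No claim is made about which path is faster.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class AsciiConverterSelector
{
    // Length of the string at offset: up to the first 0 byte or the end of the buffer.
    static int TerminatedLength(byte[] buffer, int offset)
    {
        int nul = Array.IndexOf(buffer, (byte)0, offset);
        return (nul < 0 ? buffer.Length : nul) - offset;
    }

    // Code-page-independent path.
    static string ViaEncoding(byte[] buffer, int offset)
    {
        return Encoding.ASCII.GetString(buffer, offset, TerminatedLength(buffer, offset));
    }

    // Code-page-dependent path: decodes with the platform's ANSI conversion.
    static string ViaAnsi(byte[] buffer, int offset)
    {
        int length = TerminatedLength(buffer, offset);
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr start = IntPtr.Add(handle.AddrOfPinnedObject(), offset);
            return Marshal.PtrToStringAnsi(start, length);
        }
        finally
        {
            handle.Free();
        }
    }

    // Probe: does the ANSI path agree with ASCII for bytes 1..127?
    static bool AnsiPathMatchesAscii()
    {
        byte[] probe = new byte[127];
        for (int i = 0; i < probe.Length; i++) probe[i] = (byte)(i + 1);
        return ViaAnsi(probe, 0) == Encoding.ASCII.GetString(probe);
    }

    // Chosen once, when the class is first used.
    public static readonly Func<byte[], int, string> Convert =
        AnsiPathMatchesAscii() ? (Func<byte[], int, string>)ViaAnsi : ViaEncoding;
}
```

Callers then use AsciiConverterSelector.Convert(buffer, offset) without caring which path was selected. Note that both paths already scan for the terminator, which is the cost the answer warns about.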

Answer by Pat

I'm not sure of the speed, but I found it easiest to use LINQ to remove the nulls before encoding:

string s = myEncoding.GetString(bytes.TakeWhile(b => !b.Equals(0)).ToArray());

Answer by Adam Pierce

This is a bit ugly but you don't have to use unsafe code:

string result = "";
for (int i = 0; i < data.Length && data[i] != 0; i++)
   result += (char)data[i];
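
The loop above creates a new string on every iteration. A variant of the same idea using StringBuilder avoids that; this is a sketch, and LoopConversion and BytesToString are hypothetical names. As in the original loop, the (char) cast maps each byte to the character with the same numeric value, so bytes above 127 are not treated as ASCII.

```csharp
using System.Text;

static class LoopConversion
{
    // Appends characters until the first 0 byte or the end of the array.
    public static string BytesToString(byte[] data)
    {
        var result = new StringBuilder(data.Length);
        for (int i = 0; i < data.Length && data[i] != 0; i++)
        {
            result.Append((char)data[i]);
        }
        return result.ToString();
    }
}
```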

Answer by Vladimir Poslavskiy

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace TestProject1
{
    class Class1
    {
        // Decodes a C-style (null-terminated) byte string using the given code page.
        public static string cstr_to_string(byte[] data, int code_page)
        {
            Encoding Enc = Encoding.GetEncoding(code_page);
            int inx = Array.FindIndex(data, 0, (x) => x == 0); // search for the first 0 byte
            if (inx >= 0)
                return Enc.GetString(data, 0, inx);
            else
                return Enc.GetString(data);
        }
    }
}

Answer by euwe

int nul = s.IndexOf((char) 0); // -1 when s contains no null character
if (nul >= 0) s = s.Substring(0, nul);

Answer by Harald Coppoolse

An easy, safe, and fast way to convert byte[] objects to strings containing their ASCII equivalent, and vice versa, is the .NET class System.Text.Encoding. The class has a static property, Encoding.ASCII, that returns an ASCII encoding:

From String to byte[]:

string s = "Hello World!";
byte[] b = System.Text.Encoding.ASCII.GetBytes(s);

From byte[] to string:

byte[] byteArray = new byte[] {0x41, 0x42, 0x09, 0x00, 0xFF}; // 0xFF is not ASCII and decodes as '?'
string s = System.Text.Encoding.ASCII.GetString(byteArray);

Answer by user3042599

One-liner (assuming the buffer actually contains ONE well-formed null-terminated string):

String MyString = Encoding.ASCII.GetString(MyByteBuffer).TrimEnd((Char)0);
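
The "ONE well-formed string" assumption matters because TrimEnd only strips trailing nulls. The sketch below contrasts it with cutting at the first null; TrimEndDemo, ViaTrimEnd, and ViaFirstNull are hypothetical names used for illustration.

```csharp
using System;
using System.Text;

static class TrimEndDemo
{
    // The one-liner: decode everything, then strip trailing nulls only.
    public static string ViaTrimEnd(byte[] buffer)
    {
        return Encoding.ASCII.GetString(buffer).TrimEnd('\0');
    }

    // Alternative: decode only the bytes before the first null.
    public static string ViaFirstNull(byte[] buffer)
    {
        int nul = Array.IndexOf(buffer, (byte)0);
        return Encoding.ASCII.GetString(buffer, 0, nul < 0 ? buffer.Length : nul);
    }
}
```

For a buffer holding the bytes of "abc", a null, "xyz", and a final null, ViaTrimEnd keeps the embedded null and "xyz", while ViaFirstNull returns "abc".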

Answer by Heinzi

Just for completeness, you can also use built-in methods of the .NET framework to do this:

var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
try
{
    return Marshal.PtrToStringAnsi(handle.AddrOfPinnedObject());
}
finally
{
    handle.Free();
}

Advantages:

  • It doesn't require unsafe code (i.e., you can also use this method for VB.NET) and
  • it also works for "wide" (UTF-16) strings, if you use Marshal.PtrToStringUni instead.
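
The wide-string variant follows the same pattern. This is a sketch that assumes the buffer holds UTF-16 text in the platform's byte order and ends with a two-byte null terminator; WideStringDemo and FromUtf16Bytes are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class WideStringDemo
{
    // Reads a null-terminated UTF-16 string from a pinned byte[].
    // The buffer must contain a terminating 16-bit zero, otherwise
    // the call reads past the end of the array.
    public static string FromUtf16Bytes(byte[] buffer)
    {
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            return Marshal.PtrToStringUni(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free();
        }
    }
}
```

A terminated test buffer can be produced with Encoding.Unicode.GetBytes("abc\0") on a little-endian platform.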