Removing trailing nulls from a byte array in C#

Question

Asked by Kevin

Ok, I am reading in dat files into a byte array. For some reason, the people who generate these files put about a half meg's worth of useless null bytes at the end of the file. Anybody know a quick way to trim these off the end?

First thought was to start at the end of the array and iterate backwards until I found something other than a null, then copy everything up to that point, but I wonder if there isn't a better way.

To answer some questions: Are you sure the 0 bytes are definitely in the file, rather than there being a bug in the file reading code? Yes, I am certain of that.

Can you definitely trim all trailing 0s? Yes.

Can there be any 0s in the rest of the file? Yes, there can be 0's other places, so, no, I can't start at the beginning and stop at the first 0.

Answer 1

Accepted answer by Jon Skeet

Given the extra questions now answered, it sounds like you're fundamentally doing the right thing. In particular, you have to touch every byte of the file from the last 0 onwards, to check that it only has 0s.

Now, whether you have to copy everything or not depends on what you're then doing with the data.

You could perhaps remember the index and keep it with the data or filename.
You could copy the data into a new byte array.
If you want to "fix" the file, you could call FileStream.SetLength to truncate the file.

The "you have to read every byte between the truncation point and the end of the file" is the critical part, though.
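The options above can be sketched roughly as follows. This is only a minimal illustration, not code from the answer: the class and method names (TrailingZeroTrimmer, TrimmedLength, TruncateFile) are made up for the example, and it assumes the whole file fits in memory.

```csharp
using System;
using System.IO;

static class TrailingZeroTrimmer
{
    // Hypothetical helper: number of bytes left once trailing zeros are ignored.
    // Every byte from the end back to the last non-zero byte is inspected.
    public static int TrimmedLength(byte[] data)
    {
        int i = data.Length - 1;
        while (i >= 0 && data[i] == 0)
            i--;
        return i + 1; // 0 when the array is empty or all zeros
    }

    // Hypothetical helper: "fixes" the file in place by cutting off the padding.
    public static void TruncateFile(string path)
    {
        byte[] data = File.ReadAllBytes(path);
        int length = TrimmedLength(data);
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            stream.SetLength(length); // drops everything after the last non-zero byte
        }
    }
}
```

If you only need the index, call TrimmedLength and keep the result alongside the array; TruncateFile is for the case where the file itself should be repaired.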

Answer 2

Answered by Marc Gravell

Assuming 0=null, that is probably your best bet. As a minor tweak, you might want to use Buffer.BlockCopy when you finally copy the useful data.
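For illustration, the tweak might look something like this. It is a sketch only; the class and method names are invented for the example, and the backward scan is the same one discussed in the question.

```csharp
using System;

static class BlockCopyTrim
{
    // Hypothetical helper: returns a copy of data without its trailing zero bytes.
    public static byte[] TrimTrailingZeros(byte[] data)
    {
        int length = data.Length;
        while (length > 0 && data[length - 1] == 0)
            length--;

        byte[] trimmed = new byte[length];
        // For byte arrays the count argument is simply the number of elements.
        Buffer.BlockCopy(data, 0, trimmed, 0, length);
        return trimmed;
    }
}
```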

Answer 3

Answered by Rob

How about this:

[Test]
public void Test()
{
   var chars = new [] {'a', 'b', '\0', 'c', '\0', '\0'};

   File.WriteAllBytes("test.dat", Encoding.ASCII.GetBytes(chars));

   // Note: this reads the file back as text, so it only suits data that is plain text.
   var content = File.ReadAllText("test.dat");

   Assert.AreEqual(6, content.Length); // includes the null bytes at the end

   content = content.Trim('\0');

   Assert.AreEqual(4, content.Length); // no more null bytes at the end
                                       // but still has the one in the middle
}

Answer 4

Answered by Factor Mystic

There is always a LINQ answer:

    byte[] data = new byte[] { 0x01, 0x02, 0x00, 0x03, 0x04, 0x00, 0x00, 0x00, 0x00 };
    bool data_found = false;
    // Walk the array from the end, skip zeros until the first non-zero byte,
    // then reverse again to restore the original order.
    byte[] new_data = data.Reverse().SkipWhile(point =>
    {
      if (data_found) return false;
      if (point == 0x00) return true; else { data_found = true; return false; }
    }).Reverse().ToArray();

Answer 5

Answered by Greg Dean

You could just count the number of zeros at the end of the array and use Length minus that count instead of .Length when iterating the array later on. You could encapsulate this however you like. The main point is that you don't really need to copy the data into a new structure. If the arrays are big, avoiding the copy may be worth it.
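One possible way to encapsulate this, shown purely as a sketch, is to wrap the array in an ArraySegment<byte> that covers only the useful part. The class and method names below are invented for the example; nothing is copied, so the segment shares storage with the original array.

```csharp
using System;

static class TrimmedView
{
    // Hypothetical helper: a view over data that stops before the trailing zeros.
    public static ArraySegment<byte> WithoutTrailingZeros(byte[] data)
    {
        int length = data.Length;
        while (length > 0 && data[length - 1] == 0)
            length--;
        return new ArraySegment<byte>(data, 0, length);
    }
}
```

Later code can then loop from view.Offset to view.Offset + view.Count over view.Array rather than using the full .Length of the original array.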

Answer 6

Answered by Coderer

I agree with Jon. The critical bit is that you must "touch" every byte from the last one until the first non-zero byte. Something like this:

    // sample data; populate foo from your file instead
    byte[] foo = { 0x01, 0x02, 0x00, 0x03, 0x04, 0x00, 0x00, 0x00, 0x00 };
    int i = foo.Length - 1;
    // the i >= 0 check guards against an array that is empty or all zeros
    while (i >= 0 && foo[i] == 0)
        --i;
    // now foo[i] is the last non-zero byte (i is -1 if there is none)
    byte[] bar = new byte[i + 1];
    Array.Copy(foo, bar, i + 1);

I'm pretty sure that's about as efficient as you're going to be able to make it.

Answer 7

Answered by Brian J Cardiff

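@Factor Mystic,

I think there is a shorter way:

    var data = new byte[] { 0x01, 0x02, 0x00, 0x03, 0x04, 0x00, 0x00, 0x00, 0x00 };
    // keeps each byte while some non-zero byte still follows at or after it;
    // note this rescans the rest of the array for every element
    var new_data = data.TakeWhile((v, index) => data.Skip(index).Any(w => w != 0x00)).ToArray();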

Answer 8

Answered by luke

If null bytes can be valid values in the file, do you know that the last byte of the real data cannot be null? If so, iterating backwards and looking for the first non-null entry is probably best. If not, there is no way to tell where the actual end of the data is.

If you know more about the data format, for example that there can be no sequence of null bytes longer than two bytes (or some similar constraint), then you may be able to do a binary search for the 'transition point'. This should be much faster than the linear search (assuming that you can read in the whole file).

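The basic idea (using my earlier assumption about no consecutive null bytes) would be:

    // Returns the length of the data without its trailing zero padding.
    // Only valid under the stated assumption: the real data never contains
    // two consecutive zero bytes and does not end in a zero byte.
    static int FindTrimmedLength(byte[] data)
    {
        int lo = 0, hi = data.Length; // the answer lies in [lo, hi]
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2; // integer division
            // mid is in the padding if it and its right neighbour are both zero
            bool inPadding = data[mid] == 0 &&
                             (mid + 1 == data.Length || data[mid + 1] == 0);
            if (inPadding)
                hi = mid;      // too close to the end, go left
            else
                lo = mid + 1;  // still inside the data, go right
        }
        return lo;
    }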

Answer 9

Answered by A.Yaqin

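Test this:

    private byte[] trimByte(byte[] input)
    {
        int byteCounter = input.Length - 1;
        // the byteCounter >= 0 check guards against empty or all-zero input
        while (byteCounter >= 0 && input[byteCounter] == 0x00)
        {
            byteCounter--;
        }
        byte[] rv = new byte[byteCounter + 1];
        for (int byteCounter1 = 0; byteCounter1 < rv.Length; byteCounter1++)
        {
            rv[byteCounter1] = input[byteCounter1];
        }
        return rv;
    }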

Answer 10

Answered by Kirill

In my case the LINQ approach never finished ^))) It's too slow for working with byte arrays!

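Guys, why won't you use the Array.Copy() method?

    /// <summary>
    /// Gets array of bytes from memory stream.
    /// </summary>
    /// <param name="stream">Memory stream.</param>
    public static byte[] GetAllBytes(this MemoryStream stream)
    {
        byte[] result = new byte[stream.Length];
        Array.Copy(stream.GetBuffer(), result, stream.Length);

        return result;
    }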
