Note: this page is derived from a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/968935/

Date: 2020-08-06 04:29:10  Source: igfitidea

Compare binary files in C#

Tags: c#, file, compare

Asked by Simon Farrow

I want to compare two binary files. One of them is already stored on the server with a pre-calculated CRC32 in the database from when I stored it originally.

I know that if the CRCs are different, then the files are definitely different. However, if the CRCs are the same, I don't know that the files are the same. So I'm looking for a nice, efficient way of comparing the two streams: one from the posted file and one from the file system.

I'm not an expert on streams, but I'm well aware that I could easily shoot myself in the foot here as far as memory usage is concerned.

Accepted answer by Mehrdad Afshari

    using System.IO;
    using System.Linq;

    static bool FileEquals(string fileName1, string fileName2)
    {
        // Check the file size and CRC equality here; if they are equal...
        using (var file1 = new FileStream(fileName1, FileMode.Open))
        using (var file2 = new FileStream(fileName2, FileMode.Open))
            return FileStreamEquals(file1, file2);
    }

    static bool FileStreamEquals(Stream stream1, Stream stream2)
    {
        const int bufferSize = 2048;
        byte[] buffer1 = new byte[bufferSize];
        byte[] buffer2 = new byte[bufferSize];
        while (true) {
            int count1 = stream1.Read(buffer1, 0, bufferSize);
            int count2 = stream2.Read(buffer2, 0, bufferSize);

            // Different chunk sizes are treated as different content.
            if (count1 != count2)
                return false;

            // Both streams ended at the same point without a mismatch.
            if (count1 == 0)
                return true;

            // You might replace the following with an efficient "memcmp"
            if (!buffer1.Take(count1).SequenceEqual(buffer2.Take(count2)))
                return false;
        }
    }
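
As a rough illustration of the "efficient memcmp" mentioned in the comment above: on runtimes that provide `ReadOnlySpan<T>` (.NET Core 2.1 or later, or the System.Memory package, which is an assumption about your target), the per-chunk check could be sketched like this. `BufferCompare` and `PrefixEquals` are hypothetical names, not part of the original answer.

```csharp
using System;

static class BufferCompare
{
    // Compares the first `count` bytes of two buffers without LINQ.
    // Assumes count <= a.Length and count <= b.Length (hypothetical helper).
    public static bool PrefixEquals(byte[] a, byte[] b, int count)
    {
        var left = new ReadOnlySpan<byte>(a, 0, count);
        var right = new ReadOnlySpan<byte>(b, 0, count);
        return left.SequenceEqual(right);
    }
}
```

In the loop above, the `Take(...).SequenceEqual(...)` line would then become `if (!BufferCompare.PrefixEquals(buffer1, buffer2, count1)) return false;`.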

Answer by albertjan

If you change that CRC to a SHA-1 signature, the chances of two different files having the same signature are astronomically small.
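
A minimal sketch of that idea, assuming the signature is stored in the database as an uppercase hex string (the storage format and the names `HashSignature` and `Sha1Hex` are assumptions for illustration):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class HashSignature
{
    // Computes the SHA-1 hash of a stream and returns it as 40 uppercase hex characters.
    // The stream is read from its current position to the end.
    public static string Sha1Hex(Stream stream)
    {
        using (var sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(stream);
            // BitConverter.ToString yields "DA-39-A3-..."; strip the dashes.
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

The posted file's signature can then be compared with the stored one using an ordinary string comparison. Note that SHA-1 is no longer considered collision-resistant against deliberately crafted inputs, so `SHA256.Create()` may be a better fit if the uploads are untrusted.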

Answer by Josh

You can check the length and dates of the two files even before checking the CRC to possibly avoid the CRC check.

But if you have to compare the entire file contents, one neat trick I've seen is reading the bytes in strides equal to the bitness of the CPU. For example, on a 32-bit PC, read 4 bytes at a time and compare them as Int32s. On a 64-bit PC you can read 8 bytes at a time. This can be roughly 4 or 8 times as fast as doing it byte by byte. You would also probably want to use an unsafe code block so that you can use pointers instead of doing a bunch of bit shifting and OR'ing to get the bytes into the native int sizes.

You can use IntPtr.Size to determine the ideal size for the current processor architecture.
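
The stride idea can be sketched without an unsafe block by reinterpreting the byte buffers as 64-bit words with `MemoryMarshal.Cast`. This swaps in a different technique from the pointer approach described above, and it assumes a runtime with `Span<T>` support (.NET Core 2.1 or later, or the System.Memory package). `StrideCompare` and `BytesEqual` are hypothetical names, and the 8-byte stride is fixed here rather than chosen from `IntPtr.Size`.

```csharp
using System;
using System.Runtime.InteropServices;

static class StrideCompare
{
    // Compares two byte spans 8 bytes at a time, then checks any
    // leftover tail (fewer than 8 bytes) one byte at a time.
    public static bool BytesEqual(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        if (a.Length != b.Length)
            return false;

        // Reinterpret the bytes as 64-bit words; a trailing partial word is dropped.
        ReadOnlySpan<long> wordsA = MemoryMarshal.Cast<byte, long>(a);
        ReadOnlySpan<long> wordsB = MemoryMarshal.Cast<byte, long>(b);

        for (int i = 0; i < wordsA.Length; i++)
        {
            if (wordsA[i] != wordsB[i])
                return false;
        }

        // Compare the bytes that did not fit into a whole word.
        for (int i = wordsA.Length * sizeof(long); i < a.Length; i++)
        {
            if (a[i] != b[i])
                return false;
        }

        return true;
    }
}
```

Whether this actually beats a built-in comparison such as `Span<T>.SequenceEqual` depends on the runtime, so it is worth measuring before adopting it.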

Answer by Lars

I sped up the "memcmp" by using an Int64 comparison in a loop over the chunks read from the streams. This reduced the time to about 1/4.

    private static bool StreamsContentsAreEqual(Stream stream1, Stream stream2)
    {
        const int bufferSize = 2048 * 2;
        var buffer1 = new byte[bufferSize];
        var buffer2 = new byte[bufferSize];

        while (true)
        {
            int count1 = stream1.Read(buffer1, 0, bufferSize);
            int count2 = stream2.Read(buffer2, 0, bufferSize);

            if (count1 != count2)
            {
                return false;
            }

            if (count1 == 0)
            {
                return true;
            }

            int iterations = (int)Math.Ceiling((double)count1 / sizeof(Int64));
            for (int i = 0; i < iterations; i++)
            {
                if (BitConverter.ToInt64(buffer1, i * sizeof(Int64)) != BitConverter.ToInt64(buffer2, i * sizeof(Int64)))
                {
                    return false;
                }
            }
        }
    }

Answer by JonPen

This is how I would do it if you didn't want to rely on the CRC:

    /// <summary>
    /// Binary comparison of two files
    /// </summary>
    /// <param name="fileName1">the file to compare</param>
    /// <param name="fileName2">the other file to compare</param>
    /// <returns>a value indicating whether the files are identical</returns>
    public static bool CompareFiles(string fileName1, string fileName2)
    {
        FileInfo info1 = new FileInfo(fileName1);
        FileInfo info2 = new FileInfo(fileName2);
        bool same = info1.Length == info2.Length;
        if (same)
        {
            using (FileStream fs1 = info1.OpenRead())
            using (FileStream fs2 = info2.OpenRead())
            using (BufferedStream bs1 = new BufferedStream(fs1))
            using (BufferedStream bs2 = new BufferedStream(fs2))
            {
                for (long i = 0; i < info1.Length; i++)
                {
                    if (bs1.ReadByte() != bs2.ReadByte())
                    {
                        same = false;
                        break;
                    }
                }
            }
        }

        return same;
    }

Answer by Larry

The accepted answer had an error that was pointed out, but never corrected: stream read calls are not guaranteed to return all bytes requested.

BinaryReader.ReadBytes calls are guaranteed to return as many bytes as requested unless the end of the stream is reached first.

The following code takes advantage of BinaryReader to do the comparison:

    static private bool FileEquals(string file1, string file2)
    {
        using (FileStream s1 = new FileStream(file1, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (FileStream s2 = new FileStream(file2, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (BinaryReader b1 = new BinaryReader(s1))
        using (BinaryReader b2 = new BinaryReader(s2))
        {
            while (true)
            {
                byte[] data1 = b1.ReadBytes(64 * 1024);
                byte[] data2 = b2.ReadBytes(64 * 1024);
                if (data1.Length != data2.Length)
                    return false;
                if (data1.Length == 0)
                    return true;
                if (!data1.SequenceEqual(data2))
                    return false;
            }
        }
    }
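
If you would rather keep reusable buffers than allocate a new array on every `ReadBytes` call, the same guarantee can be approximated by looping over `Stream.Read` until the buffer is full or the stream ends. This is only a sketch; `StreamHelpers` and `ReadFully` are hypothetical names, not part of the original answer.

```csharp
using System.IO;

static class StreamHelpers
{
    // Keeps reading until `buffer` is full or the stream ends.
    // Returns the number of bytes actually placed in the buffer.
    public static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
                break; // end of stream reached
            total += read;
        }
        return total;
    }
}
```

With such a helper, two streams with identical content always yield the same count per chunk, so a count mismatch really does mean the lengths differ.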

Answer by Mohammad Nikravan

It is SLOW, but so clean!

    static bool StreamEquals(Stream stream1, Stream stream2)
    {
        using (var md5 = MD5.Create())
            return md5.ComputeHash(stream1).SequenceEqual(md5.ComputeHash(stream2));
    }