C语言 如何计算 CRC32 校验和?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2587766/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-02 05:04:19  来源:igfitidea点击:

How is a CRC32 checksum calculated?

cchecksumcrc32

提问by aquanar

Maybe I'm just not seeing it, but CRC32 seems either needlessly complicated, or insufficiently explained anywhere I could find on the web.

也许我只是没有看到它,但 CRC32 似乎要么不必要地复杂,要么在我可以在网上找到的任何地方都没有得到充分解释。

I understand that it is the remainder from a non-carry-based arithmetic division of the message value, divided by the (generator) polynomial, but the actual implementation of it escapes me.

我知道它是消息值的非基于进位的算术除法的余数,除以(生成器)多项式,但它的实际实现让我无法理解。

I've read A Painless Guide To CRC Error Detection Algorithms, and I must say it was not painless. It goes over the theory rather well, but the author never gets to a simple "this is it." He does say what the parameters are for the standard CRC32 algorithm, but he neglects to lay out clearly how you get to it.

我读过A Painless Guide To CRC Error Detection Algorithms,我必须说它不是无痛的。它很好地解释了理论,但作者从来没有得到一个简单的“就是这样”。他确实说明了标准 CRC32 算法的参数是什么,但他没有明确说明如何实现它。

The part that gets me is when he says "this is it" and then adds on, "oh by the way, it can be reversed or started with different initial conditions," and doesn't give a clear answer of what the final way of calculating a CRC32 checksum given all of the changes he just added.

让我感动的部分是当他说“就是这样”然后补充说,“哦,顺便说一句,它可以颠倒或以不同的初始条件开始”,并且没有给出最终方式的明确答案考虑到他刚刚添加的所有更改,计算 CRC32 校验和。

  • Is there a simpler explanation of how CRC32 is calculated?
  • 是否有关于如何计算 CRC32 的更简单的解释?

I attempted to code in C how the table is formed:

我试图在 C 中编码表的形成方式:

for (i = 0; i < 256; i++)
{
    temp = i;

    for (j = 0; j < 8; j++)
    {
        if (temp & 1)
        {
            temp >>= 1;
            temp ^= 0xEDB88320;
        }
        else {temp >>= 1;}
    }
    testcrc[i] = temp;
}

but this seems to generate values inconsistent with values I have found elsewhere on the Internet. I coulduse the values I found online, but I want to understand how they were created.

但这似乎产生了与我在 Internet 上其他地方找到的值不一致的值。我可以使用我在网上找到的值,但我想了解它们是如何创建的。

Any help in clearing up these incredibly confusing numbers would be veryappreciated.

在清理这些难以置信的混乱号任何帮助将非常感激。

回答by

The polynomial for CRC32 is:

CRC32 的多项式为:

x32+ x26+ x23+ x22+ x16+ x12+ x11+ x10+ x8+ x7+ x5+ x4+ x2+ x + 1

x 32+ x 26+ x 23+ x 22+ x 16+ x 12+ x 11+ x 10+ x 8+ x 7+ x 5+ x 4+ x 2+ x + 1

Or in hex and binary:

或十六进制和二进制:

0x 01 04 C1 1D B7
1 0000 0100 1100 0001 0001 1101 1011 0111

0x 01 04 C1 1D B7
1 0000 0100 1100 0001 0001 1101 1011 0111

The highest term (x32) is usually not explicitly written, so it can instead be represented in hex just as

最高项 (x 32) 通常没有明确写出,所以它可以用十六进制表示,就像

0x 04 C1 1D B7

0x 04 C1 1D B7

Feel free to count the 1s and 0s, but you'll find they match up with the polynomial, where 1is bit 0 (or the first bit) and xis bit 1 (or the second bit).

随意计算 1 和 0,但您会发现它们与多项式匹配,其中1位 0(或第一位)和x位 1(或第二位)。

Why this polynomial? Because there needs to be a standard given polynomial and the standard was set by IEEE 802.3. Also it is extremely difficult to find a polynomial that detects different bit errors effectively.

为什么是这个多项式?因为需要有一个给定多项式的标准,而该标准是由 IEEE 802.3 制定的。此外,找到有效检测不同位错误的多项式也极其困难。

You can think of the CRC-32 as a series of "Binary Arithmetic with No Carries", or basically "XOR and shift operations". This is technically called Polynomial Arithmetic.

您可以将 CRC-32 视为一系列“无进位二进制算术”,或者基本上是“XOR 和移位运算”。这在技术上称为多项式算术。

To better understand it, think of this multiplication:

为了更好地理解它,想想这个乘法:

(x^3 + x^2 + x^0)(x^3 + x^1 + x^0)
= (x^6 + x^4 + x^3
 + x^5 + x^3 + x^2
 + x^3 + x^1 + x^0)
= x^6 + x^5 + x^4 + 3*x^3 + x^2 + x^1 + x^0

If we assume x is base 2 then we get:

如果我们假设 x 是基数 2,那么我们得到:

x^7 + x^3 + x^2 + x^1 + x^0

Why? Because 3x^3 is 11x^11 (but we need only 1 or 0 pre digit) so we carry over:

为什么?因为 3x^3 是 11x^11(但我们只需要 1 或 0 前位数字)所以我们结转:

=1x^110 + 1x^101 + 1x^100          + 11x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^101 + 1x^100 + 1x^100 + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^101 + 1x^101          + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^110                   + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^111                            + 1x^11 + 1x^10 + 1x^1 + x^0

But mathematicians changed the rules so that it is mod 2. So basically any binary polynomial mod 2 is just addition without carry or XORs. So our original equation looks like:

但是数学家改变了规则,使其成为模 2。所以基本上任何二元多项式模 2 都只是没有进位或异或的加法。所以我们的原始方程看起来像:

=( 1x^110 + 1x^101 + 1x^100 + 11x^11 + 1x^10 + 1x^1 + x^0 ) MOD 2
=( 1x^110 + 1x^101 + 1x^100 +  1x^11 + 1x^10 + 1x^1 + x^0 )
= x^6 + x^5 + x^4 + 3*x^3 + x^2 + x^1 + x^0 (or that original number we had)

I know this is a leap of faith but this is beyond my capability as a line-programmer. If you are a hard-core CS-student or engineer I challenge to break this down. Everyone will benefit from this analysis.

我知道这是信仰的飞跃,但这超出了我作为一名程序员的能力。如果您是一名硬核 CS 学生或工程师,我会挑战将其分解。每个人都会从这种分析中受益。

So to work out a full example:

所以要找出一个完整的例子:

   Original message                : 1101011011
   Polynomial of (W)idth 4         :      10011
   Message after appending W zeros : 11010110110000

Now we divide the augmented Message by the Poly using CRC arithmetic. This is the same division as before:

现在我们使用 CRC 算法将增强消息除以 Poly。这是与以前相同的划分:

            1100001010 = Quotient (nobody cares about the quotient)
       _______________
10011 ) 11010110110000 = Augmented message (1101011011 + 0000)
=Poly   10011,,.,,....
        -----,,.,,....
         10011,.,,....
         10011,.,,....
         -----,.,,....
          00001.,,....
          00000.,,....
          -----.,,....
           00010,,....
           00000,,....
           -----,,....
            00101,....
            00000,....
            -----,....
             01011....
             00000....
             -----....
              10110...
              10011...
              -----...
               01010..
               00000..
               -----..
                10100.
                10011.
                -----.
                 01110
                 00000
                 -----
                  1110 = Remainder = THE CHECKSUM!!!!

The division yields a quotient, which we throw away, and a remainder, which is the calculated checksum. This ends the calculation. Usually, the checksum is then appended to the message and the result transmitted. In this case the transmission would be: 11010110111110.

除法产生一个商,我们扔掉它,还有一个余数,即计算出的校验和。计算到此结束。通常,然后将校验和附加到消息并传输结果。在这种情况下,传输将是:11010110111110。

Only use a 32-bit number as your divisor and use your entire stream as your dividend. Throw out the quotient and keep the remainder. Tack the remainder on the end of your message and you have a CRC32.

仅使用 32 位数字作为除数,并使用整个流作为股息。去掉商,保留余数。在您的消息末尾添加剩余部分,您将获得一个 CRC32。

Average guy review:

普通人评论:

         QUOTIENT
        ----------
DIVISOR ) DIVIDEND
                 = REMAINDER
  1. Take the first 32 bits.
  2. Shift bits
  3. If 32 bits are less than DIVISOR, go to step 2.
  4. XOR 32 bits by DIVISOR. Go to step 2.
  1. 取前 32 位。
  2. 移位位
  3. 如果 32 位小于 DIVISOR,则转到步骤 2。
  4. 由 DIVISOR 进行 32 位异或。转到步骤 2。

(Note that the stream has to be dividable by 32 bits or it should be padded. For example, an 8-bit ANSI stream would have to be padded. Also at the end of the stream, the division is halted.)

(请注意,该流必须可被 32 位整除,否则应进行填充。例如,必须填充 8 位 ANSI 流。同样在流的末尾,除法会停止。)

回答by Pavlo Bobrek

For IEEE802.3, CRC-32. Think of the entire message as a serial bit stream, append 32 zeros to the end of the message. Next, you MUST reverse the bits of EVERY byte of the message and do a 1's complement the first 32 bits. Now divide by the CRC-32 polynomial, 0x104C11DB7. Finally, you must 1's complement the 32-bit remainder of this division bit-reverse each of the 4 bytes of the remainder. This becomes the 32-bit CRC that is appended to the end of the message.

对于 IEEE802.3,CRC-32。将整个消息视为串行位流,在消息末尾附加 32 个零。接下来,您必须反转消息的每个字节的位,并对前 32 位进行 1 的补码。现在除以 CRC-32 多项式 0x104C11DB7。最后,您必须对该除法的 32 位余数进行 1 的补码 - 反转余数的 4 个字节中的每一个。这成为附加到消息末尾的 32 位 CRC。

The reason for this strange procedure is that the first Ethernet implementations would serialize the message one byte at a time and transmit the least significant bit of every byte first. The serial bit stream then went through a serial CRC-32 shift register computation, which was simply complemented and sent out on the wire after the message was completed. The reason for complementing the first 32 bits of the message is so that you don't get an all zero CRC even if the message was all zeros.

这个奇怪过程的原因是第一个以太网实现将一次一个字节地序列化消息,并首先传输每个字节的最低有效位。串行位流然后通过串行 CRC-32 移位寄存器计算,在消息完成后,它被简单地补充并通过线路发送出去。对消息的前 32 位进行补码的原因是,即使消息全为零,也不会得到全零的 CRC。

回答by WhirlWind

A CRC is pretty simple; you take a polynomial represented as bits and the data, and divide the polynomial into the data (or you represent the data as a polynomial and do the same thing). The remainder, which is between 0 and the polynomial is the CRC. Your code is a bit hard to understand, partly because it's incomplete: temp and testcrc are not declared, so it's unclear what's being indexed, and how much data is running through the algorithm.

CRC 非常简单;您将多项式表示为位和数据,并将多项式划分为数据(或者您将数据表示为多项式并执行相同的操作)。0 和多项式之间的余数是 CRC。您的代码有点难以理解,部分原因是它不完整:没有声明 temp 和 testcrc,因此不清楚正在索引什么,以及通过算法运行了多少数据。

The way to understand CRCs is to try to compute a few using a short piece of data (16 bits or so) with a short polynomial -- 4 bits, perhaps. If you practice this way, you'll really understand how you might go about coding it.

理解 CRC 的方法是尝试使用一小段数据(16 位左右)和一个短多项式 - 可能是 4 位来计算一些。如果您以这种方式练习,您将真正了解如何进行编码。

If you're doing it frequently, a CRC is quite slow to compute in software. Hardware computation is much more efficient, and requires just a few gates.

如果您经常这样做,那么在软件中计算 CRC 会很慢。硬件计算效率更高,只需要几个门。

回答by jschmier

In addition to the Wikipedia Cyclic redundancy checkand Computation of CRCarticles, I found a paper entitled Reversing CRC - Theory and Practice*to be a good reference.

除了维基百科循环冗余校验CRC计算的文章,我发现一篇题为Reversing CRC - Theory and Practice*的论文是很好的参考。

There are essentially three approaches for computing a CRC: an algebraic approach, a bit-oriented approach, and a table-driven approach. In Reversing CRC - Theory and Practice*, each of these three algorithms/approaches is explained in theory accompanied in the APPENDIX by an implementation for the CRC32 in the C programming language.

基本上有三种计算 CRC 的方法:代数方法、面向位的方法和表驱动的方法。在Reversing CRC - Theory and Practice* 中,这三种算法/方法中的每一种都在理论上解释,附录中附有 C 编程语言中 CRC32 的实现。

* PDF Link
Reversing CRC – Theory and Practice.
HU Berlin Public Report
SAR-PR-2006-05
May 2006
Authors:
Martin Stigge, Henryk Pl?tz, Wolf Müller, Jens-Peter Redlich

* PDF Link
Reversing CRC – 理论与实践。
HU Berlin 公开报告
SAR-PR- 2006-05
May 2006
作者:
Martin Stigge、Henryk Pl?tz、Wolf Müller、Jens-Peter Redlich

回答by vafylec

I spent a while trying to uncover the answer to this question, and I finally published a tutorial on CRC-32 today: CRC-32 hash tutorial - AutoHotkey Community

我花了一段时间试图揭开这个问题的答案,今天终于发布了一个关于 CRC-32 的教程: CRC-32 哈希教程 - AutoHotkey 社区

In this example from it, I demonstrate how to calculate the CRC-32 hash for the ASCII string 'abc':

在这个例子中,我演示了如何计算 ASCII 字符串 'abc' 的 CRC-32 哈希:

calculate the CRC-32 hash for the ASCII string 'abc':

inputs:
dividend: binary for 'abc': 0b011000010110001001100011 = 0x616263
polynomial: 0b100000100110000010001110110110111 = 0x104C11DB7

011000010110001001100011
reverse bits in each byte:
100001100100011011000110
append 32 0 bits:
10000110010001101100011000000000000000000000000000000000
XOR the first 4 bytes with 0xFFFFFFFF:
01111001101110010011100111111111000000000000000000000000

'CRC division':
01111001101110010011100111111111000000000000000000000000
 100000100110000010001110110110111
 ---------------------------------
  111000100010010111111010010010110
  100000100110000010001110110110111
  ---------------------------------
   110000001000101011101001001000010
   100000100110000010001110110110111
   ---------------------------------
    100001011101010011001111111101010
    100000100110000010001110110110111
    ---------------------------------
         111101101000100000100101110100000
         100000100110000010001110110110111
         ---------------------------------
          111010011101000101010110000101110
          100000100110000010001110110110111
          ---------------------------------
           110101110110001110110001100110010
           100000100110000010001110110110111
           ---------------------------------
            101010100000011001111110100001010
            100000100110000010001110110110111
            ---------------------------------
              101000011001101111000001011110100
              100000100110000010001110110110111
              ---------------------------------
                100011111110110100111110100001100
                100000100110000010001110110110111
                ---------------------------------
                    110110001101101100000101110110000
                    100000100110000010001110110110111
                    ---------------------------------
                     101101010111011100010110000001110
                     100000100110000010001110110110111
                     ---------------------------------
                       110111000101111001100011011100100
                       100000100110000010001110110110111
                       ---------------------------------
                        10111100011111011101101101010011

remainder: 0b10111100011111011101101101010011 = 0xBC7DDB53
XOR the remainder with 0xFFFFFFFF:
0b01000011100000100010010010101100 = 0x438224AC
reverse bits:
0b00110101001001000100000111000010 = 0x352441C2

thus the CRC-32 hash for the ASCII string 'abc' is 0x352441C2

回答by don bright

Then there is always Rosetta Code, which shows crc32 implemented in dozens of computer languages. https://rosettacode.org/wiki/CRC-32and has links to many explanations and implementations.

然后总是有 Rosetta Code,它显示了用几十种计算机语言实现的 crc32。https://rosettacode.org/wiki/CRC-32并有许多解释和实现的链接。

回答by Gabriel Furstenheim

In order to reduce crc32 to taking the reminder you need to:

为了减少 crc32 的提醒,您需要:

  1. Invert bits on each byte
  2. xor first four bytes with 0xFF (this is to avoid errors on the leading 0s)
  3. Add padding at the end (this is to make the last 4 bytes take part in the hash)
  4. Compute the reminder
  5. Reverse the bits again
  6. xor the result again.
  1. 反转每个字节的位
  2. 与 0xFF 异或前四个字节(这是为了避免前导 0 上的错误)
  3. 在末尾添加填充(这是为了让最后 4 个字节参与哈希)
  4. 计算提醒
  5. 再次反转位
  6. 再次对结果进行异或。

In code this is:

在代码中,这是:


func CRC32 (file []byte) uint32 {
    for i , v := range(file) {
        file[i] = bits.Reverse8(v)
    }
    for i := 0; i < 4; i++ {
        file[i] ^= 0xFF
    }

    // Add padding
    file = append(file, []byte{0, 0, 0, 0}...)
    newReminder := bits.Reverse32(reminderIEEE(file))

    return newReminder ^ 0xFFFFFFFF
}

where reminderIEEE is the pure reminder on GF(2)[x]

其中,remoteIEEE 是 GF(2)[x] 上的纯提醒