C++ 在位流中扫描位模式的最快方法

Question

提问by

I need to scan for a 16 bit word in a bit stream. It is not guaranteed to be aligned on byte or word boundaries.

我需要在位流中扫描 16 位字。不能保证在字节或字边界上对齐。

What is the fastest way of achieving this? There are various brute force methods; using tables and/or shifts but are there any "bit twiddling shortcuts" that can cut down the number of calculations by giving yes/no/maybe contains the flag results for each byte or word as it arrives?

实现这一目标的最快方法是什么？有各种蛮力方法；使用表格和/或移位，但是否有任何“位操作快捷方式”可以通过给出是/否/可能包含每个字节或单词到达时的标志结果来减少计算次数？

C code, intrinsics, x86 machine code would all be interesting.

C 代码、内部函数、x86 机器代码都会很有趣。

Answer 1

采纳答案by Toad

Using simple brute force is sometimes good.

使用简单的蛮力有时是好的。

I think precalc all shifted values of the word and put them in 16 ints so you got an array like this (assuming intis twice as wide as short)

我认为预先计算单词的所有移位值并将它们放入 16 个整数中，这样您就会得到一个这样的数组（假设int宽度是的两倍short）

 unsigned short pattern = 1234;
 unsigned int preShifts[16];
 unsigned int masks[16];
 int i;
 for(i=0; i<16; i++)
 {
      preShifts[i] = (unsigned int)(pattern<<i);  //gets promoted to int
      masks[i] = (unsigned int) (0xffff<<i);
 }

and then for every unsigned short you get out of the stream, make an int of that short and the previous short and compare that unsigned int to the 16 unsigned int's. If any of them match, you got one.

然后对于每个离开流的无符号短整型，制作该短整型和前一个短整型的 int 并将该无符号整型与 16 个无符号整型进行比较。如果其中任何一个匹配，你就会得到一个。

So basically like this:

所以基本上是这样的：

  int numMatch(unsigned short curWord, unsigned short prevWord)
  {
       int numHits = 0;
       int combinedWords = (prevWord<<16) + curWord;

       int i=0;
       for(i=0; i<16; i++)
       {
             if((combinedWords & masks[i]) == preShifsts[i]) numHits++;
       }
       return numHits;
  }

Do note that this could potentially mean multiple hits when the patterns is detected more than once on the same bits:

请注意，当在同一位上多次检测到模式时，这可能意味着多次命中：

e.g. 32 bits of 0's and the pattern you want to detect is 16 0's, then it would mean the pattern is detected 16 times!

例如 32 位的 0 并且您要检测的模式是 16 个 0，那么这意味着该模式被检测到 16 次！

The time cost of this, assuming it compiles approximately as written, is 16 checks per input word. Per input bit, this does one &and ==, and branch or other conditional increment. And also a table lookup for the mask for every bit.

假设它按照编写的方式编译，其时间成本是每个输入字 16 次检查。每个输入位，这会执行一个&和==，以及分支或其他条件增量。以及每个位掩码的表查找。

The table lookup is unnecessary; by instead right-shifting combinedwe get significantly more efficient asm, as shown in another answerwhich also shows how to vectorize this with SIMD on x86.

不需要查表；通过改为右移，combined我们获得了更高效的 asm，如另一个答案所示，该答案也显示了如何在 x86 上使用 SIMD 对其进行矢量化。

Answer 2

回答by Whoever

Here is a trick to speed up the search by a factor of 32, if neither the Knuth-Morris-Pratt algorithm on the alphabet of two characters {0, 1} nor reinier's idea are fast enough.

如果两个字符 {0, 1} 的字母表上的 Knuth-Morris-Pratt 算法和 reinier 的想法都不够快，那么这里有一个将搜索速度提高 32 倍的技巧。

You can first use a table with 256 entries to check for each byte in your bit stream if it is contained in the 16-bit word you are looking for. The table you get with

您可以首先使用具有 256 个条目的表来检查位流中的每个字节是否包含在您要查找的 16 位字中。你得到的桌子

unsigned char table[256];
for (int i=0; i<256; i++)
  table[i] = 0; // initialize with false
for (i=0; i<8; i++)
  table[(word >> i) & 0xff] = 1; // mark contained bytes with true

You can then find possible positions for matches in the bit stream using

然后，您可以使用以下命令在位流中找到匹配项的可能位置

for (i=0; i<length; i++) {
  if (table[bitstream[i]]) {
    // here comes the code which checks if there is really a match
  }
}

As at most 8 of the 256 table entries are not zero, in average you have to take a closer look only at every 32th position. Only for this byte (combined with the bytes one before and one after) you have then to use bit operations or some masking techniques as suggested by reinier to see if there is a match.

由于 256 个表条目中最多有 8 个不为零，因此平均而言，您只需仔细查看每 32 个位置。仅对于这个字节（与前一个和后一个字节相结合），您必须使用位操作或 reinier 建议的一些掩码技术来查看是否存在匹配。

The code assumes that you use little endian byte order. The order of the bits in a byte can also be an issue (known to everyone who already implemented a CRC32 checksum).

该代码假定您使用小端字节序。字节中位的顺序也可能是一个问题（已经实现 CRC32 校验和的每个人都知道）。

Answer 3

回答by Vadakkumpadath

I would like to suggest a solution using 3 lookup tables of size 256. This would be efficient for large bit streams. This solution takes 3 bytes in a sample for comparison. Following figure shows all possible arrangements of a 16 bit data in 3 bytes. Each byte region has shown in different color.

我想建议使用 3 个大小为 256 的查找表的解决方案。这对于大比特流是有效的。此解决方案在示例中占用 3 个字节进行比较。下图显示了 3 个字节的 16 位数据的所有可能排列。每个字节区域都以不同的颜色显示。

alt text http://img70.imageshack.us/img70/8711/80541519.jpg

替代文字 http://img70.imageshack.us/img70/8711/80541519.jpg

Here checking for 1 to 8 will be taken care in first sample and 9 to 16 in next sample and so on. Now when we are searching for a Pattern, we will find all the 8 possible arrangements (as below) of this Patternand will store in 3 lookup tables (Left, Middle and Right).

这里将在第一个样本中检查 1 到 8，在下一个样本中检查 9 到 16，依此类推。现在，当我们正在寻找一个模式，我们会发现所有的8个可能的排列（如下）本的模式，将在3个查找表（左，中，右）存储。

Initializing Lookup Tables:

初始化查找表：

Lets take an example 0111011101110111as a Patternto find. Now consider 4th arrangement. Left part would be XXX01110. Fill all raws of Left lookup table pointing by left part (XXX01110) with 00010000. 1 indicates starting position of arrangement of input Pattern. Thus following 8 raws of Left look up table would be filled by 16 (00010000).

让我们举一个例子0111011101110111作为寻找模式。现在考虑第四种安排。左边的部分是XXX01110. 用 . 填充由左部分 ( XXX01110)指向的左查找表的所有原始数据00010000。1 表示输入Pattern排列的起始位置。因此，Left 查找表的 8 个原始数据将被 16 ( 00010000)填充。

Middle part of arrangement would be 11101110. Raw pointing by this index (238) in Middle look up table will be filled by 16 (00010000).

安排的中间部分是11101110。中间查找表中此索引 (238) 的原始指向将被 16 ( 00010000)填充。

Now Right part of arrangement would be 111XXXXX. All raws (32 raws) with index 111XXXXXwill be filled by 16 (00010000).

现在安排的正确部分是111XXXXX。所有带有索引的原始数据（32 个原始数据）111XXXXX将被 16 ( 00010000)填充。

We should not overwrite elements in look up table while filling. Instead do a bitwise OR operation to update an already filled raw. In above example, all raws written by 3rd arrangement would be updated by 7th arrangement as follows.

我们不应该在填充时覆盖查找表中的元素。而是执行按位 OR 操作来更新已填充的原始数据。在上面的例子中，所有由第 3 排列写入的原始数据将被第 7 排列更新，如下所示。

Thus raws with index XX011101in Left lookup table and 11101110in Middle lookup table and 111XXXXXin Right lookup table will be updated to 00100010by 7th arrangement.

因此，XX011101在左查找表和11101110中间查找表以及111XXXXX右查找表中带有索引的原始数据将被更新为00100010第 7 次排列。

Searching Pattern:

搜索模式：

Take a sample of three bytes. Find Countas follows where Leftis left lookup table, Middleis middle lookup table and Rightis right lookup table.

取三个字节的样本。按如下方式查找计数，Left是左查找表，Middle是中间查找表，Right是右查找表。

Count = Left[Byte0] & Middle[Byte1] & Right[Byte2];

Number of 1 in Countgives the number of matching Patternin taken sample.

Count 中的数字1给出了采样中匹配模式的数量。

I can give some sample code which is tested.

我可以给出一些经过测试的示例代码。

Initializing lookup table:

初始化查找表：

    for( RightShift = 0; RightShift < 8; RightShift++ )
    {
        LeftShift = 8 - RightShift;

        Starting = 128 >> RightShift;

        Byte = MSB >> RightShift;

        Count = 0xFF >> LeftShift;

        for( i = 0; i <= Count; i++ )
        {
            Index = ( i << LeftShift ) | Byte;

            Left[Index] |= Starting;
        }

        Byte = LSB << LeftShift;

        Count = 0xFF >> RightShift;

        for( i = 0; i <= Count; i++ )
        {
            Index = i | Byte;

            Right[Index] |= Starting;
        }

        Index = ( unsigned char )(( Pattern >> RightShift ) & 0xFF );

        Middle[Index] |= Starting;
    }

Searching Pattern:

搜索模式：

Datais stream buffer, Leftis left lookup table, Middleis middle lookup table and Rightis right lookup table.

数据是流缓冲区，Left是左查找表，Middle是中间查找表，Right是右查找表。

for( int Index = 1; Index < ( StreamLength - 1); Index++ )
{
    Count = Left[Data[Index - 1]] & Middle[Data[Index]] & Right[Data[Index + 1]];

    if( Count )
    {
        TotalCount += GetNumberOfOnes( Count );
    }
}

Limitation:

局限性：

Above loop cannot detect a Patternif it is placed at the very end of stream buffer. Following code need to add after loop to overcome this limitation.

如果将模式放置在流缓冲区的最末尾，则上述循环无法检测到模式。下面的代码需要添加 after 循环来克服这个限制。

Count = Left[Data[StreamLength - 2]] & Middle[Data[StreamLength - 1]] & 128;

if( Count )
{
    TotalCount += GetNumberOfOnes( Count );
}

Advantage:

优势：

This algorithm takes only N-1logical steps to find a Patternin an array of Nbytes. Only overhead is to fill the lookup tables initially which is constant in all the cases. So this will be very effective for searching huge byte streams.

该算法只需要N-1 个逻辑步骤就可以在N个字节的数组中找到一个模式。唯一的开销是最初填充查找表，这在所有情况下都是恒定的。所以这对于搜索巨大的字节流将非常有效。

Answer 4

回答by Vadakkumpadath

My money's on Knuth-Morris-Prattwith an alphabet of two characters.

我的钱在Knuth-Morris-Pratt 上，有两个字符的字母表。

Answer 5

回答by mouviciel

I would implement a state machine with 16 states.

我会实现一个有 16 个状态的状态机。

Each state represents how many received bits conform to the pattern. If the next received bit conform to the next bit of the pattern, the machine steps to the next state. If this is not the case, the machine steps back to the first state (or to another state if the beginning of the pattern can be matched with a smaller number of received bits).

每个状态表示有多少接收到的位符合模式。如果下一个接收到的位符合模式的下一位，机器就会进入下一个状态。如果不是这种情况，机器将返回到第一个状态（或者如果模式的开头可以与较少数量的接收位匹配，则返回到另一个状态）。

When the machine reaches the last state, this indicates that the pattern has been identified in the bit stream.

当机器到达最后一个状态时，这表明该模式已经在比特流中被识别。

Answer 6

回答by Ewan Todd

atomice's

原子的

Knuth-Morris-Pratt

克努斯-莫里斯-普拉特

looked good until I considered Luke and MSalter's requests for more information about the particulars.

看起来不错，直到我考虑了 Luke 和 MSalter 关于细节的更多信息的要求。

Turns out the particulars might indicate a quicker approach than KMP. The KMP article links to

事实证明，这些细节可能表明一种比 KMP 更快的方法。KMP 文章链接到

Boyer Moore

博耶摩尔

for a particular case when the search pattern is 'AAAAAA'. For a multiple pattern search, the

对于搜索模式为“AAAAAA”的特定情况。对于多模式搜索，

Rabin-Karp

拉宾-卡普

might be most suitable.

可能是最合适的。

You can find further introductory discussion here.

您可以在此处找到进一步的介绍性讨论。

Answer 7

回答by MSalters

What I would do is create 16 prefixes and 16 suffixes. Then for each 16 bit input chunk determine the longest suffix match. You've got a match if the next chunk has a prefix match of length (16-N)

我要做的是创建 16 个前缀和 16 个后缀。然后为每个 16 位输入块确定最长的后缀匹配。如果下一个块具有长度的前缀匹配，则您已匹配(16-N)

A suffix match doesn't actually 16 comparisons. However, this takes pre-calculation based upon the pattern word. For example, if the patternword is 101010101010101010, you can first test the last bit of your 16 bit input chunk. If that bit is 0, you only need to test the ...10101010 suffices. If the last bit is 1, you need to test the ...1010101 suffices. You've got 8 of each, for a total of 1+8 comparisons. If the patternword is 1111111111110000, you'd still test the last bit of your input for a suffix match. If that bit is 1, you have to do 12 suffix matches (regex: 1{1,12}) but if it's 0 you have only 4 possible matches (regex 1111 1111 1111 0{1,4}), again for an average of 9 tests. Add the 16-Nprefix match, and you see that you only need 10 checks per 16 bit chunk.

后缀匹配实际上不是 16 次比较。但是，这需要基于模式词进行预先计算。例如，如果模式字是 101010101010101010，您可以先测试 16 位输入块的最后一位。如果该位为 0，则只需测试 ...10101010 就足够了。如果最后一位为 1，则需要测试 ...1010101 就足够了。每个都有 8 个，总共有 1+8 个比较。如果模式字是 1111111111110000，您仍然需要测试输入的最后一位是否匹配后缀。如果该位为 1，则必须进行 12 次后缀匹配（正则表达式：1{1,12}），但如果为 0，则只有 4 次可能的匹配（正则表达式 1111 1111 1111 0{1,4}），再次求平均值9 个测试。添加16-N前缀匹配，您会看到每个 16 位块只需要 10 次检查。

Answer 8

回答by moonshadow

For a general-purpose, non-SIMD algorithm you are unlikely to be able to do much better than something like this:

对于通用的非 SIMD 算法，您不太可能比这样的算法做得更好：

unsigned int const pattern = pattern to search for
unsigned int accumulator = first three input bytes

do
{
  bool const found = ( ((accumulator   ) & ((1<<16)-1)) == pattern )
                   | ( ((accumulator>>1) & ((1<<16)-1)) == pattern );
                   | ( ((accumulator>>2) & ((1<<16)-1)) == pattern );
                   | ( ((accumulator>>3) & ((1<<16)-1)) == pattern );
                   | ( ((accumulator>>4) & ((1<<16)-1)) == pattern );
                   | ( ((accumulator>>5) & ((1<<16)-1)) == pattern );
                   | ( ((accumulator>>6) & ((1<<16)-1)) == pattern );
                   | ( ((accumulator>>7) & ((1<<16)-1)) == pattern );
  if( found ) { /* pattern found */ }
  accumulator >>= 8;

  unsigned int const data = next input byte
  accumulator |= (data<<8);
} while( there is input data left );

Answer 9

回答by ajs410

Seems like a good use for SIMD instructions. SSE2 added a bunch of integer instructions for crunching multiple integers at the same time, but I can't imagine many solutions for this that don't involve a lot of bit shifts since your data isn't going to be aligned. This actually sounds like something an FPGA should be doing.

似乎很好地使用了 SIMD 指令。SSE2 添加了一堆用于同时处理多个整数的整数指令，但我无法想象很多解决方案不涉及大量位移，因为您的数据不会对齐。这实际上听起来像是 FPGA 应该做的事情。

Answer 10

回答by ldog

You can use the fast fourier transform for extremely large input (value of n) to find anybit pattern in O(n log n ) time. Compute the cross-correlation of a bit mask with the input. Cross -correlation of a sequence x and a mask y with a size n and n' respectively is defined by

您可以对极大输入（n 的值）使用快速傅立叶变换以在 O(n log n ) 时间内找到任何位模式。计算位掩码与输入的互相关。序列 x 和分别具有大小 n 和 n' 的掩码 y 的互相关定义为

R(m) = sum  _ k = 0 ^ n' x_{k+m} y_k

then occurences of your bit pattern that match the mask exactly where R(m) = Y where Y is the sum of one's in your bit mask.

然后出现与掩码完全匹配的位模式，其中 R(m) = Y 其中 Y 是位掩码中一个的总和。

So if you are trying to match for the bit pattern

所以如果你想匹配位模式

[0 0 1 0 1 0]

in

在

[ 1 1 0 0 1 0 1 0 0 0 1 0 1 0 1]

then you must use the mask

那么你必须使用面具

[-1 -1  1 -1  1 -1]

the -1's in the mask guarantee that those places must be 0.

掩码中的 -1 保证这些位置必须为 0。

You can implement cross-correlation, using the FFT in O(n log n ) time.

您可以在 O(n log n ) 时间内使用 FFT 实现互相关。

I think KMP has O(n + k) runtime, so it beats this out.

我认为 KMP 有 O(n + k) 运行时，所以它击败了这一点。

C++ 在位流中扫描位模式的最快方法

提问by

采纳答案by Toad

回答by Whoever

回答by Vadakkumpadath

回答by Vadakkumpadath

回答by mouviciel

回答by Ewan Todd

回答by MSalters

回答by moonshadow

回答by ajs410

回答by ldog

相关推荐

最近更新

标签

C++ 在位流中扫描位模式的最快方法

提问by

采纳答案by Toad

回答by Whoever

回答by Vadakkumpadath

回答by Vadakkumpadath

回答by mouviciel

回答by Ewan Todd

回答by MSalters

回答by moonshadow

回答by ajs410

回答by ldog

相关推荐

C++中是否有二进制内存流

C++ 运行C++程序时“系统找不到指定的文件”

在 C/C++ 中实现导数

C++ 中的数值积分

相关推荐

最近更新

标签