C语言 用一次乘法提取位

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/14547087/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-02 05:10:28  来源:igfitidea点击:

Extracting bits with a single multiplication

cmultiplicationbit-manipulation

提问by NPE

I saw an interesting technique used in an answerto another question, and would like to understand it a little better.

我看到在使用了一个有趣的技术,答案另一个问题,并想好一点理解。

We're given an unsigned 64-bit integer, and we are interested in the following bits:

我们得到一个无符号的 64 位整数,我们对以下位感兴趣:

1.......2.......3.......4.......5.......6.......7.......8.......

Specifically, we'd like to move them to the top eight positions, like so:

具体来说,我们想将它们移动到前八位,如下所示:

12345678........................................................

We don't care about the value of the bits indicated by ., and they don't have to be preserved.

我们不关心由 指示的位的值.,并且不必保留它们。

The solutionwas to mask out the unwanted bits, and multiply the result by 0x2040810204081. This, as it turns out, does the trick.

溶液是屏蔽掉不需要的位,并且乘以结果0x2040810204081。事实证明,这可以解决问题。

How general is this method? Can this technique be used to extract any subset of bits? If not, how does one figure out whether or not the method works for a particular set of bits?

这种方法有多普遍?这种技术可以用来提取比特的任何子集吗?如果不是,如何确定该方法是否适用于一组特定的位?

Finally, how would one go about finding the (a?) correct multiplier to extract the given bits?

最后,如何找到(a?)正确的乘数来提取给定的位?

采纳答案by Floris

Very interesting question, and clever trick.

非常有趣的问题,和巧妙的技巧。

Let's look at a simple example of getting a single byte manipulated. Using unsigned 8 bit for simplicity. Imagine your number is xxaxxbxxand you want ab000000.

让我们看一个操作单个字节的简单示例。为简单起见,使用无符号 8 位。想象一下您的号码是xxaxxbxx并且您想要ab000000

The solution consisted of two steps: a bit masking, followed by multiplication. The bit mask is a simple AND operation that turns uninteresting bits to zeros. In the above case, your mask would be 00100100and the result 00a00b00.

该解决方案包括两个步骤:位掩码,然后是乘法。位掩码是一个简单的 AND 运算,它将无趣的位变为零。在上述情况下,您的掩码将是00100100和结果00a00b00

Now the hard part: turning that into ab.......

现在是困难的部分:把它变成ab.......

A multiplication is a bunch of shift-and-add operations. The key is to allow overflow to "shift away" the bits we don't need and put the ones we want in the right place.

乘法是一堆移位和加法运算。关键是允许溢出“移走”我们不需要的位,并将我们想要的位放在正确的位置。

Multiplication by 4 (00000100) would shift everything left by 2 and get you to a00b0000. To get the bto move up we need to multiply by 1 (to keep the a in the right place) + 4 (to move the b up). This sum is 5, and combined with the earlier 4 we get a magic number of 20, or 00010100. The original was 00a00b00after masking; the multiplication gives:

乘以 4 ( 00000100) 会将所有内容左移 2 并让您得到a00b0000. 为了让b向上移动,我们需要乘以 1(将 a 保持在正确的位置)+ 4(将 b 向上移动)。这个和是 5,加上前面的 4,我们得到了一个神奇的数字 20,或者00010100。原件是00a00b00经过遮罩后的;乘法给出:

000000a00b000000
00000000a00b0000 +
----------------
000000a0ab0b0000
xxxxxxxxab......

From this approach you can extend to larger numbers and more bits.

通过这种方法,您可以扩展到更大的数字和更多的位。

One of the questions you asked was "can this be done with any number of bits?" I think the answer is "no", unless you allow several masking operations, or several multiplications. The problem is the issue of "collisions" - for example, the "stray b" in the problem above. Imagine we need to do this to a number like xaxxbxxcx. Following the earlier approach, you would think we need {x 2, x {1 + 4 + 16}} = x 42 (oooh - the answer to everything!). Result:

你问的问题之一是“这可以用任意数量的位来完成吗?” 我认为答案是否定的,除非您允许多次屏蔽操作或多次乘法。问题是“碰撞”的问题——例如上面问题中的“杂散b”。想象一下,我们需要对像xaxxbxxcx. 按照之前的方法,您会认为我们需要 {x 2, x {1 + 4 + 16}} = x 42(哦哦 - 一切的答案!)。结果:

00000000a00b00c00
000000a00b00c0000
0000a00b00c000000
-----------------
0000a0ababcbc0c00
xxxxxxxxabc......

As you can see, it still works, but "only just". They key here is that there is "enough space" between the bits we want that we can squeeze everything up. I could not add a fourth bit d right after c, because I would get instances where I get c+d, bits might carry, ...

如您所见,它仍然有效,但“只是”。这里的关键是在我们想要的位之间有“足够的空间”,我们可以把所有的东西都挤起来。我不能在 c 之后添加第四个位 d,因为我会得到我得到 c+d 的实例,位可能携带,...

So without formal proof, I would answer the more interesting parts of your question as follows: "No, this will not work for any number of bits. To extract N bits, you need (N-1) spaces between the bits you want to extract, or have additional mask-multiply steps."

因此,如果没有正式的证明,我将回答您问题中更有趣的部分,如下所示:“不,这对任何数量的位都不起作用。要提取 N 位,您需要在要提取的位之间使用 (N-1) 个空格提取,或有额外的掩码乘法步骤。”

The only exception I can think of for the "must have (N-1) zeros between bits" rule is this: if you want to extract two bits that are adjacent to each other in the original, AND you want to keep them in the same order, then you can still do it. And for the purpose of the (N-1) rule they count as two bits.

对于“位之间必须有 (N-1) 个零”规则,我能想到的唯一例外是:如果您想提取原始中彼此相邻的两个位,并且您想将它们保留在同样的顺序,那么你仍然可以这样做。并且出于 (N-1) 规则的目的,它们算作两位。

There is another insight - inspired by the answer of @Ternary below (see my comment there). For each interesting bit, you only need as many zeros to the right of it as you need space for bits that need to go there. But also, it needs as many bits to the left as it has result-bits to the left. So if a bit b ends up in position m of n, then it needs to have m-1 zeros to its left, and n-m zeros to its right. Especially when the bits are not in the same order in the original number as they will be after the re-ordering, this is an important improvement to the original criteria. This means, for example, that a 16 bit word

还有另一种见解 - 灵感来自下面@Ternary 的回答(见我的评论)。对于每个有趣的位,您只需要它右侧的零与需要去那里的位所需的空间一样多。而且,它需要的左边位数与左边的结果位一样多。因此,如果位 b 最终位于 n 的位置 m,那么它的左侧需要有 m-1 个零,而其右侧需要有 nm 个零。特别是当这些位在原始数字中的顺序与重新排序后的顺序不同时,这是对原始标准的重要改进。这意味着,例如,一个 16 位的字

a...e.b...d..c..

Can be shifted into

可以转成

abcde...........

even though there is only one space between e and b, two between d and c, three between the others. Whatever happened to N-1?? In this case, a...ebecomes "one block" - they are multiplied by 1 to end up in the right place, and so "we got e for free". The same is true for b and d (b needs three spaces to the right, d needs the same three to its left). So when we compute the magic number, we find there are duplicates:

即使 e 和 b 之间只有一个空格,d 和 c 之间只有两个空格,其他之间也只有三个空格。N-1怎么了??在这种情况下,a...e变成了“一个块”——它们乘以 1 到正确的位置,所以“我们免费得到了 e”。b 和 d 也是如此(b 需要右边三个空格,d 需要左边三个空格)。所以当我们计算幻数时,我们发现有重复:

a: << 0  ( x 1    )
b: << 5  ( x 32   )
c: << 11 ( x 2048 )
d: << 5  ( x 32   )  !! duplicate
e: << 0  ( x 1    )  !! duplicate

Clearly, if you wanted these numbers in a different order, you would have to space them further. We can reformulate the (N-1)rule: "It will always work if there are at least (N-1) spaces between bits; or, if the order of bits in the final result is known, then if a bit b ends up in position m of n, it needs to have m-1 zeros to its left, and n-m zeros to its right."

显然,如果您希望这些数字以不同的顺序排列,则必须将它们隔得更远。我们可以重新制定(N-1)规则:“如果位之间至少有 (N-1) 个空格,它将始终有效;或者,如果最终结果中的位顺序已知,那么如果位 b 最终位于n,它的左边需要有 m-1 个零,它的右边需要有 nm 个零。”

@Ternary pointed out that this rule doesn't quite work, as there can be a carry from bits adding "just to the right of the target area" - namely, when the bits we're looking for are all ones. Continuing the example I gave above with the five tightly packed bits in a 16 bit word: if we start with

@Ternary 指出这条规则并不完全有效,因为可能有来自位的进位添加“刚好在目标区域的右侧”——即,当我们正在寻找的位都是 1 时。继续我上面给出的例子,用一个 16 位字中的五个紧密包装的位:如果我们从

a...e.b...d..c..

For simplicity, I will name the bit positions ABCDEFGHIJKLMNOP

为简单起见,我将命名位位置 ABCDEFGHIJKLMNOP

The math we were going to do was

我们要做的数学是

ABCDEFGHIJKLMNOP

a000e0b000d00c00
0b000d00c0000000
000d00c000000000
00c0000000000000 +
----------------
abcded(b+c)0c0d00c00

Until now, we thought anything below abcde(positions ABCDE) would not matter, but in fact, as @Ternary pointed out, if b=1, c=1, d=1then (b+c)in position Gwill cause a bit to carry to position F, which means that (d+1)in position Fwill carry a bit into E- and our result is spoilt. Note that space to the right of the least significant bit of interest (cin this example) doesn't matter, since the multiplication will cause padding with zeros from beyone the least significant bit.

到现在为止,我们认为下面的任何abcde(位置ABCDE)都无关紧要,但实际上,正如@Ternary 指出的那样,如果b=1, c=1, d=1然后(b+c)在位置G将导致位进位到位置F,这意味着(d+1)在位置F将进位E- 和我们的结果被宠坏了。请注意,感兴趣的最低有效位右侧的空间(c在此示例中)无关紧要,因为乘法将导致从最低有效位之外填充零。

So we need to modify our (m-1)/(n-m) rule. If there is more than one bit that has "exactly (n-m) unused bits to the right (not counting the last bit in the pattern - "c" in the example above), then we need to strengthen the rule - and we have to do so iteratively!

所以我们需要修改我们的 (m-1)/(nm) 规则。如果有多个位具有“恰好 (nm) 右侧的未使用位(不计算模式中的最后一位 - 上例中的“c”),那么我们需要加强规则 - 我们必须这样做迭代!

We have to look not only at the number of bits that meet the (n-m) criterion, but also the ones that are at (n-m+1), etc. Let's call their number Q0 (exactly n-mto next bit), Q1 (n-m+1), up to Q(N-1) (n-1). Then we risk carry if

我们不仅要查看满足 (nm) 标准的位数,还要查看位于 (n-m+1) 处的位数,等等。让我们称它们的数量为 Q0(正好n-m到下一位),Q1( n-m+1),直到 Q(N-1) (n-1)。那么我们风险进位如果

Q0 > 1
Q0 == 1 && Q1 >= 2
Q0 == 0 && Q1 >= 4
Q0 == 1 && Q1 > 1 && Q2 >=2
... 

If you look at this, you can see that if you write a simple mathematical expression

如果你看这个,你会发现如果你写一个简单的数学表达式

W = N * Q0 + (N - 1) * Q1 + ... + Q(N-1)

and the result is W > 2 * N, then you need to increase the RHS criterion by one bit to (n-m+1). At this point, the operation is safe as long as W < 4; if that doesn't work, increase the criterion one more, etc.

结果是W > 2 * N,那么您需要将 RHS 标准增加一位到(n-m+1)。此时,只要操作是安全的W < 4;如果这不起作用,请再增加一个标准,等等。

I think that following the above will get you a long way to your answer...

我认为遵循上述内容将使您的答案有很长的路要走...

回答by Syzygy

Very interesting question indeed. I'm chiming in with my two cents, which is that, if you can manage to state problems like this in terms of first-order logic over the bitvector theory, then theorem provers are your friend, and can potentially provide you with very fast answers to your questions. Let's re-state the problem being asked as a theorem:

确实很有趣的问题。我用我的两分钱表示赞同,也就是说,如果您能够根据位向量理论的一阶逻辑来解决这样的问题,那么定理证明者就是您的朋友,并且可能会非常快速地为您提供回答您的问题。让我们将问题重新表述为定理:

"There exists some 64-bit constants 'mask' and 'multiplicand' such that, for all 64-bit bitvectors x, in the expression y = (x & mask) * multiplicand, we have that y.63 == x.63, y.62 == x.55, y.61 == x.47, etc."

“存在一些 64 位常量‘掩码’和‘被乘数’,对于所有 64 位位向量 x,在表达式 y = (x & mask) * 被乘数中,我们有 y.63 == x.63 , y.62 == x.55, y.61 == x.47, 等等"

If this sentence is in fact a theorem, then it is true that some values of the constants 'mask' and 'multiplicand' satisfy this property. So let's phrase this in terms of something that a theorem prover can understand, namely SMT-LIB 2 input:

如果这句话实际上是一个定理,那么常量“掩码”和“被乘数”的某些值确实满足这个性质。因此,让我们用定理证明者可以理解的东西来表达这一点,即 SMT-LIB 2 输入:

(set-logic BV)

(declare-const mask         (_ BitVec 64))
(declare-const multiplicand (_ BitVec 64))

(assert
  (forall ((x (_ BitVec 64)))
    (let ((y (bvmul (bvand mask x) multiplicand)))
      (and
        (= ((_ extract 63 63) x) ((_ extract 63 63) y))
        (= ((_ extract 55 55) x) ((_ extract 62 62) y))
        (= ((_ extract 47 47) x) ((_ extract 61 61) y))
        (= ((_ extract 39 39) x) ((_ extract 60 60) y))
        (= ((_ extract 31 31) x) ((_ extract 59 59) y))
        (= ((_ extract 23 23) x) ((_ extract 58 58) y))
        (= ((_ extract 15 15) x) ((_ extract 57 57) y))
        (= ((_ extract  7  7) x) ((_ extract 56 56) y))
      )
    )
  )
)

(check-sat)
(get-model)

And now let's ask the theorem prover Z3 whether this is a theorem:

现在让我们问定理证明者 Z3 这是否是一个定理:

z3.exe /m /smt2 ExtractBitsThroughAndWithMultiplication.smt2

The result is:

结果是:

sat
(model
  (define-fun mask () (_ BitVec 64)
    #x8080808080808080)
  (define-fun multiplicand () (_ BitVec 64)
    #x0002040810204081)
)

Bingo! It reproduces the result given in the original post in 0.06 seconds.

答对了!它在 0.06 秒内重现了原始帖子中给出的结果。

Looking at this from a more general perspective, we can view this as being an instance of a first-order program synthesis problem, which is a nascent area of research about which few papers have been published. A search for "program synthesis" filetype:pdfshould get you started.

从更一般的角度来看,我们可以将其视为一阶程序综合问题的一个实例,这是一个新兴的研究领域,很少有论文发表。搜索"program synthesis" filetype:pdf应该让你开始。

回答by starblue

Every 1-bit in the multiplier is used to copy one of the bits into its correct position:

乘法器中的每一位用于将其中一位复制到其正确位置:

  • 1is already in the correct position, so multiply by 0x0000000000000001.
  • 2must be shifted 7 bit positions to the left, so we multiply by 0x0000000000000080(bit 7 is set).
  • 3must be shifted 14 bit positions to the left, so we multiply by 0x0000000000000400(bit 14 is set).
  • and so on until
  • 8must be shifted 49 bit positions to the left, so we multiply by 0x0002000000000000(bit 49 is set).
  • 1已经在正确的位置,所以乘以0x0000000000000001
  • 2必须向左移动 7 位位置,因此我们乘以0x0000000000000080(设置第 7 位)。
  • 3必须向左移动 14 位位置,因此我们乘以0x0000000000000400(设置第 14 位)。
  • 依此类推直到
  • 8必须向左移动 49 位位置,因此我们乘以0x0002000000000000(设置第 49 位)。

The multiplier is the sum of the multipliers for the individual bits.

乘数是各个位的乘数之和。

This only works because the bits to be collected are not too close together, so that the multiplication of bits which do not belong together in our scheme either fall beyond the 64 bit or in the lower don't-care part.

这只是因为要收集的位不太靠近,所以在我们的方案中不属于一起的位的乘法要么超出 64 位,要么在较低的不关心部分。

Note that the other bits in the original number must be 0. This can be achieved by masking them with an AND operation.

请注意,原始数字中的其他位必须是0。这可以通过使用 AND 操作屏蔽它们来实现。

回答by Ternary

(I'd never seen it before. This trick is great!)

(我以前从未见过。这个技巧很棒!)

I'll expand a bit on Floris's assertion that when extracting nbits you need n-1space between any non-consecutive bits:

我将对 Floris 的断言进行一些扩展,即在提取n位时,您需要n-1在任何非连续位之间留出空间:

My initial thought (we'll see in a minute how this doesn't quite work) was that you could do better: If you want to extract nbits, you'll have a collision when extracting/shifting bit iif you have anyone (non-consecutive with bit i) in the i-1bits preceding or n-ibits subsequent.

我最初的想法(我们将在一分钟内看到它是如何不起作用的)是你可以做得更好:如果你想提取n位,i如果你有任何人(非-ii-1前面的位或n-i后面的位中与位)连续。

I'll give a few examples to illustrate:

我举几个例子来说明:

...a..b...c...Works (nobody in the 2 bits after a, the bit before and the bit after b, and nobody is in the 2 bits before c):

...a..b...c...有效(在 之后的 2 位中a,在 之前和之后的位中b没有人,并且在 之前的 2 位中没有人c):

  a00b000c
+ 0b000c00
+ 00c00000
= abc.....

...a.b....c...Fails because bis in the 2 bits after a(and gets pulled into someone else's spot when we shift a):

...a.b....c...失败,因为b在后面的 2 位a(当我们移动时被拉到其他人的位置a):

  a0b0000c
+ 0b0000c0
+ 00c00000
= abX.....

...a...b.c...Fails because bis in the 2 bits preceding c(and gets pushed into someone else's spot when we shift c):

...a...b.c...失败,因为b在前面的 2 位中c(并且在我们 shift 时被推到其他人的位置c):

  a000b0c0
+ 0b0c0000
+ b0c00000
= Xbc.....

...a...bc...d...Works because consecutive bits shift together:

...a...bc...d...工作是因为连续的位一起移位:

  a000bc000d
+ 0bc000d000
+ 000d000000
= abcd000000

But we have a problem.If we use n-iinstead of n-1we could have the following scenario: what if we have a collision outside of the part that we care about, something we would mask away at the end, but whose carry bits end up interfering in the important un-masked range? (and note: the n-1requirement makes sure this doesn't happen by making sure the i-1bits after our un-masked range are clear when we shift the the ith bit)

但是我们有一个问题。如果我们使用n-i而不是n-1我们可能有以下情况:如果我们在我们关心的部分之外发生碰撞怎么办,我们会在最后屏蔽掉一些东西,但其进位最终会干扰重要的未屏蔽范围? (并注意:当我们移动第 th 位时,n-1要求通过确保i-1我们未屏蔽范围之后的位是清晰的来确保不会发生这种情况i

...a...b..c...d...Potential failure on carry-bits, cis in n-1after b, but satisfies n-icriteria:

...a...b..c...d...进位位的潜在故障cn-1after 中b,但满足n-i条件:

  a000b00c000d
+ 0b00c000d000
+ 00c000d00000
+ 000d00000000
= abcdX.......

So why don't we just go back to that "n-1bits of space" requirement? Because we can do better:

那么为什么我们不回到那个“n-1位空间”要求呢? 因为我们可以做得更好

...a....b..c...d..Failsthe "n-1bits of space" test, but worksfor our bit-extracting trick:

...a....b..c...d..n-1位空间”测试失败,但适用于我们的位提取技巧:

+ a0000b00c000d00
+ 0b00c000d000000
+ 00c000d00000000
+ 000d00000000000
= abcd...0X......

I can't come up with a good way to characterize these fields that don'thave n-1space between important bits, but still would work for our operation. However, since we know ahead of timewhich bits we're interested in we can check our filter to make sure we don't experience carry-bit collisions:

我不能想出一个好办法来描述这些字段具有n-1重要的位之间的空间,但依然会为我们的运营工作。然而,由于我们提前知道我们对哪些位感兴趣,我们可以检查我们的过滤器以确保我们不会遇到进位冲突:

Compare (-1 AND mask) * shiftagainst the expected all-ones result, -1 << (64-n)(for 64-bit unsigned)

(-1 AND mask) * shift与预期的全 1 结果进行比较-1 << (64-n)(对于 64 位无符号)

The magic shift/multiply to extract our bits works if and only if the two are equal.

当且仅当两者相等时,提取我们位的神奇移位/乘法才有效。

回答by TemplateRex

In addition to the already excellent answers to this very interesting question, it might be useful to know that this bitwise multiplication trick has been known in the computer chess community since 2007, where it goes under the name of Magic BitBoards.

除了对这个非常有趣的问题已经给出了很好的答案之外,知道这种按位乘法技巧自 2007 年以来就在计算机国际象棋社区中为人所知,在那里它以Magic BitBoards的名称命名。

Many computer chess engines use several 64-bit integers (called bitboards) to represent the various piece sets (1 bit per occupied square). Suppose a sliding piece (rook, bishop, queen) on a certain origin square can move to at most Ksquares if no blocking pieces were present. Using bitwise-and of those scattered Kbits with the bitboard of occupied squares gives a specific K-bit word embedded within a 64-bit integer.

许多计算机国际象棋引擎使用几个 64 位整数(称为位板)来表示各种棋子集(每个占据的方块 1 位)。假设K如果不存在阻挡件,则某个起源方格上的滑动件(车、象、后)最多可以移动到方格。使用这些分散K位的按位和与被占用的方块的K位板给出嵌入在 64 位整数中的特定位字。

Magic multiplication can be used to map these scattered Kbits to the lower Kbits of a 64-bit integer. These lower Kbits can then be used to index a table of pre-computed bitboards that representst the allowed squares that the piece on its origin square can actually move to (taking care of blocking pieces etc.)

魔术乘法可用于将这些分散的K位映射到K64 位整数的低位。K然后,这些较低的位可用于索引一个预先计算的位板表,该表表示其原始方块上的棋子实际上可以移动到的允许方块(处理阻塞的棋子等)

A typical chess engine using this approach has 2 tables (one for rooks, one for bishops, queens using the combination of both) of 64 entries (one per origin square) that contain such pre-computed results. Both the highest rated closed source (Houdini) and open source chess engine (Stockfish) currently use this approach for its very high performance.

使用这种方法的典型国际象棋引擎有 2 个表(一个用于车,一个用于主教,皇后使用两者的组合)包含此类预先计算的结果的 64 个条目(每个原始方块一个)。评价最高的封闭源代码 ( Houdini) 和开源国际象棋引擎 ( Stockfish) 目前都使用这种方法,因为它具有非常高的性能。

Finding these magic multipliers is done either using an exhaustive search(optimized with early cutoffs) or with trial and erorr(e.g. trying lots of random 64-bit integers). There have been no bit patterns used during move generation for which no magic constant could be found. However, bitwise carry effects are typically necessary when the to-be-mapped bits have (almost) adjacent indices.

使用穷举搜索(使用早期截止值进行优化)或使用试错法(例如尝试大量随机 64 位整数)来找到这些神奇的乘数。在移动生成期间没有使用任何位模式,无法找到魔术常数。然而,当要映射的位具有(几乎)相邻索引时,按位进位效应通常是必要的。

AFAIK, the very general SAT-solver approachy by @Syzygy has not been used in computer chess, and neither does there appear to be any formal theory regarding existence and uniqueness of such magic constants.

AFAIK,@Syzygy 使用的非常通用的 SAT 求解器方法尚未用于计算机国际象棋,而且似乎也没有任何关于此类魔术常数的存在性和唯一性的正式理论。