C/C++ 位数组或位向量
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4604130/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
C/C++ Bit Array or Bit Vector
提问by Srikar Appalaraju
I am learning C/C++ programming & have encountered the usage of 'Bit arrays' or 'Bit Vectors'. Am not able to understand their purpose? here are my doubts -
我正在学习 C/C++ 编程并遇到过“位数组”或“位向量”的用法。是不是无法理解他们的目的?这是我的疑问 -
- Are they used as boolean flags?
- Can one use
int
arrays instead? (more memory of course, but..) - What's this concept of Bit-Masking?
- If bit-masking is simple bit operations to get an appropriate flag, how do one program for them? is it not difficult to do this operation in head to see what the flag would be, as apposed to decimal numbers?
- 它们是否用作布尔标志?
- 可以用
int
数组代替吗?(当然更多的内存,但是..) - 位掩码的概念是什么?
- 如果位掩码是简单的位操作以获得适当的标志,那么如何为它们编程?在 head 中执行此操作以查看与十进制数相关的标志是什么并不困难吗?
I am looking for applications, so that I can understand better. for Eg -
我正在寻找应用程序,以便我可以更好地理解。例如 -
Q.You are given a file containing integers in the range (1 to 1 million). There are some duplicates and hence some numbers are missing. Find the fastest way of finding missing numbers?
问:给你一个包含范围(1 到 100 万)整数的文件。有一些重复,因此缺少一些数字。找到查找丢失数字的最快方法?
For the above question, I have read solutions telling me to use bit arrays. How would one store each integer in a bit?
对于上述问题,我已经阅读了告诉我使用位数组的解决方案。如何将每个整数存储在位中?
回答by
I think you've got yourself confused between arrays and numbers, specifically what it means to manipulate binary numbers.
我认为您对数组和数字感到困惑,特别是操作二进制数意味着什么。
I'll go about this by example. Say you have a number of error messages and you want to return them in a return value from a function. Now, you might label your errors 1,2,3,4... which makes sense to your mind, but then how do you, given just one number, work out which errors have occured?
我将通过示例来解决这个问题。假设您有许多错误消息,并且您希望在函数的返回值中返回它们。现在,您可能会将您的错误标记为 1,2,3,4... 这对您的想法很有意义,但是您如何在仅给定一个数字的情况下找出发生了哪些错误?
Now, try labelling the errors 1,2,4,8,16... increasing powers of two, basically. Why does this work? Well, when you work base 2 you are manipulating a number like 00000000
where each digit corresponds to a power of 2 multiplied by its position from the right. So let's say errors 1, 4 and 8 occur. Well, then that could be represented as 00001101
. In reverse, the first digit = 1*2^0, the third digit 1*2^2 and the fourth digit 1*2^3. Adding them all up gives you 13.
现在,尝试标记错误 1,2,4,8,16... 基本上是增加 2 的幂。为什么这样做?好吧,当你以 2 为基数工作时,你正在操纵一个数字,比如00000000
每个数字对应于 2 的幂乘以它从右边的位置。因此,假设发生了错误 1、4 和 8。那么,这可以表示为00001101
. 反过来,第一位=1*2^0,第三位1*2^2,第四位1*2^3。将它们全部加起来就是 13。
Now, we are able to test if such an error has occured by applying a bitmask. By example, if you wanted to work out if error 8
has occured, use the bit representation of 8 = 00001000
. Now, in order to extract whether or not that error has occured, use a binary and like so:
现在,我们可以通过应用位掩码来测试是否发生了此类错误。例如,如果您想计算8
是否发生了错误,请使用 8 = 的位表示00001000
。现在,为了提取是否发生了该错误,请使用二进制文件,如下所示:
00001101
& 00001000
= 00001000
I'm sure you know how an and works or can deduce it from the above - working digit-wise, if any two digits are both 1, the result is 1, else it is 0.
我相信你知道 and 是如何工作的,或者可以从上面推断出来 - 以数字方式工作,如果任何两位数字都是 1,则结果为 1,否则为 0。
Now, in C:
现在,在 C 中:
int func(...)
{
int retval = 0;
if ( sometestthatmeans an error )
{
retval += 1;
}
if ( sometestthatmeans an error )
{
retval += 2;
}
return retval
}
int anotherfunc(...)
{
uint8_t x = func(...)
/* binary and with 8 and shift 3 plaes to the right
* so that the resultant expression is either 1 or 0 */
if ( ( ( x & 0x08 ) >> 3 ) == 1 )
{
/* that error occurred */
}
}
Now, to practicalities. When memory was sparse and protocols didn't have the luxury of verbose xml etc, it was common to delimit a field as being so many bits wide. In that field, you assign various bits (flags, powers of 2) to a certain meaning and apply binary operations to deduce if they are set, then operate on these.
现在,实用。当内存稀疏并且协议没有冗长的 xml 等时,通常将字段分隔为如此多的位宽。在该字段中,您将各种位(标志、2 的幂)分配给某个含义并应用二元运算来推断它们是否已设置,然后对这些位进行运算。
I should also add that binary operations are close in idea to the underlying electronics of a computer. Imagine if the bit fields corresponded to the output of various circuits (carrying current or not). By using enough combinations of said circuits, you make... a computer.
我还应该补充一点,二进制操作与计算机的底层电子设备的想法很接近。想象一下,如果位字段对应于各种电路的输出(载流与否)。通过使用足够多的上述电路组合,您可以制造出……一台计算机。
回答by SivGo
regarding the usage the bits array :
关于位数组的用法:
if you know there are "only" 1 million numbers - you use an array of 1 million bits. in the beginning all bits will be zero and every time you read a number - use this number as index and change the bit in this index to be one (if it's not one already).
如果你知道“只有”100 万个数字——你使用一个 100 万位的数组。一开始所有位都为零,每次读取数字时 - 使用此数字作为索引并将此索引中的位更改为 1(如果它不是 1)。
after reading all numbers - the missing numbers are the indices of the zeros in the array.
读取所有数字后 - 缺失的数字是数组中零的索引。
for example, if we had only numbers between 0 - 4 the array would look like this in the beginning: 0 0 0 0 0. if we read the numbers : 3, 2, 2 the array would look like this: read 3 --> 0 0 0 1 0. read 3 (again) --> 0 0 0 1 0. read 2 --> 0 0 1 1 0. check the indices of the zeroes: 0,1,4 - those are the missing numbers
例如,如果我们只有 0 - 4 之间的数字,则数组在开始时将如下所示:0 0 0 0 0。如果我们读取数字:3, 2, 2,则数组将如下所示: read 3 -- > 0 0 0 1 0. 再次读取 3 --> 0 0 0 1 0. 读取 2 --> 0 0 1 1 0. 检查零的索引:0,1,4 - 这些是缺失的数字
BTW, of course you can use integers instead of bits but it may take (depends on the system) 32 times memory
顺便说一句,当然你可以使用整数而不是位,但它可能需要(取决于系统)32 倍的内存
Sivan
西万
回答by kriss
Bit Arrays of Bit Vectors are used as a mapping from position to some bit value. Yes it's basically the same thing as an array of Bool, but typical Bool implementation is one to four bytes long and it uses too much space.
位向量的位阵列用作从位置到某个位值的映射。是的,它与 Bool 数组基本相同,但典型的 Bool 实现是一到四个字节长,并且使用太多空间。
We can store the same amount of data much more efficiently by using arrays of words and binary masking operations and shifts to store and retrieve them (less overall memory used, less accesses to memory, less cache miss, less memory page swap). The code to access individual bits is still quite straightforward.
通过使用字数组和二进制掩码操作以及移位来存储和检索它们,我们可以更有效地存储相同数量的数据(使用更少的总内存、更少的内存访问、更少的缓存未命中、更少的内存页面交换)。访问单个位的代码仍然非常简单。
There is also some bit field support builtin in C language (you write things like int i:1;
to say "only consume one bit") , but it is not available for arrays and you have less control of the overall result (details of implementation depends on compiler and alignment issues).
在 C 语言中也有一些内置的位域支持(你写的东西像是int i:1;
说“只消耗一位”),但它不适用于数组,并且你对整体结果的控制较少(实现的细节取决于编译器和对齐问题)。
Below is a possible way to answer to your "search missing numbers" question. I fixed int size to 32 bits to keep things simple, but it could be written using sizeof(int) to make it portable. And (depending on the compiler and target processor) the code could only be made faster using >> 5
instead of / 32
and & 31
instead of % 32
, but that is just to give the idea.
以下是回答“搜索缺失号码”问题的一种可能方法。我将 int 大小固定为 32 位以保持简单,但可以使用 sizeof(int) 编写它以使其可移植。并且(取决于编译器和目标处理器)只能使用>> 5
代替/ 32
和& 31
代替使代码更快% 32
,但这只是为了提供想法。
#include <stdio.h>
#include <errno.h>
#include <stdint.h>
int main(){
/* put all numbers from 1 to 1000000 in a file, except 765 and 777777 */
{
printf("writing test file\n");
int x = 0;
FILE * f = fopen("testfile.txt", "w");
for (x=0; x < 1000000; ++x){
if (x == 765 || x == 777760 || x == 777791){
continue;
}
fprintf(f, "%d\n", x);
}
fprintf(f, "%d\n", 57768); /* this one is a duplicate */
fclose(f);
}
uint32_t bitarray[1000000 / 32];
/* read file containing integers in the range [1,1000000] */
/* any non number is considered as separator */
/* the goal is to find missing numbers */
printf("Reading test file\n");
{
unsigned int x = 0;
FILE * f = fopen("testfile.txt", "r");
while (1 == fscanf(f, " %u",&x)){
bitarray[x / 32] |= 1 << (x % 32);
}
fclose(f);
}
/* find missing number in bitarray */
{
int x = 0;
for (x=0; x < (1000000 / 32) ; ++x){
int n = bitarray[x];
if (n != (uint32_t)-1){
printf("Missing number(s) between %d and %d [%x]\n",
x * 32, (x+1) * 32, bitarray[x]);
int b;
for (b = 0 ; b < 32 ; ++b){
if (0 == (n & (1 << b))){
printf("missing number is %d\n", x*32+b);
}
}
}
}
}
}
回答by codymanix
Bit Arrays or Bit Vectors can be though as an array of boolean values. Normally a boolean variable needs at least one byte storage, but in a bit array/vector only one bit is needed. This gets handy if you have lots of such data so you save memory at large.
位数组或位向量可以作为布尔值数组。通常一个布尔变量至少需要一个字节存储,但在位数组/向量中只需要一位。如果您有大量此类数据,这会很方便,因此您可以节省大量内存。
Another usage is if you have numbers which do not exactly fit in standard variableswhich are 8,16,32 or 64 bit in size. You could this way store into a bit vector of 16 bit a number which consists of 4 bit, one that is 2 bit and one that is 10 bits in size. Normally you would have to use 3 variables with sizes of 8,8 and 16 bit, so you only have 50% of storage wasted.
另一种用法是,如果您的数字不完全适合8、16、32 或 64 位大小的标准变量。您可以通过这种方式将一个数字存储到 16 位的位向量中,该数字由 4 位组成,一个是 2 位,一个是 10 位。通常,您必须使用 3 个大小分别为 8,8 和 16 位的变量,因此您只会浪费 50% 的存储空间。
But all these uses are very rarely used in business aplications, the come to use often when interfacing drivers through pinvoke/interopfunctions and doing low level programming.
但是所有这些用途都很少用于业务应用程序,在通过pinvoke/interop功能连接驱动程序和进行低级编程时经常使用。
回答by Nickolay Olshevsky
That is used for bit flags storage, as well as for parsing different binary protocols fields, where 1 byte is divided into a number of bit-fields. This is widely used, in protocols like TCP/IP, up to ASN.1 encodings, OpenPGP packets, and so on.
这用于位标志存储,以及解析不同的二进制协议字段,其中 1 个字节分为多个位字段。这在 TCP/IP 等协议中被广泛使用,最高可达 ASN.1 编码、OpenPGP 数据包等。