C++: Why is address zero used for the null pointer?
Notice: this page is based on a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/2759845/
Asked by Joel
In C (or C++ for that matter), pointers are special if they have the value zero: I am advised to set pointers to zero after freeing their memory, because it means freeing the pointer again isn't dangerous; when I call malloc it returns a pointer with the value zero if it can't get me memory; I use if (p != 0) all the time to make sure passed pointers are valid, etc.
But since memory addressing starts at 0, isn't 0 just as valid an address as any other? How can 0 be used for handling null pointers if that is the case? Why isn't a negative number null instead?
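The three habits described in the question can be sketched in C++ as follows. This is only an illustration: the helper names `allocate` and `release` are hypothetical and do not come from the thread.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: tries to allocate `count` ints.
// malloc reports failure by returning a null pointer.
bool allocate(int*& p, std::size_t count) {
    p = static_cast<int*>(std::malloc(count * sizeof(int)));
    return p != 0;  // same test as the question's `if (p != 0)`
}

// Hypothetical helper: frees the buffer and resets the caller's pointer.
// Calling it again is harmless, because free() on a null pointer does nothing.
void release(int*& p) {
    std::free(p);
    p = 0;
}
```

Calling `release` twice on the same variable is safe only because the first call leaves the pointer null. Any other copies of the old pointer value remain dangling.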
Edit:
A bunch of good answers. I'll summarize what has been said in the answers expressed as my own mind interprets it and hope that the community will correct me if I misunderstand.
Like everything else in programming, it's an abstraction. Just a constant, not really related to the address 0. C++0x emphasizes this by adding the keyword nullptr.
It's not even an address abstraction; it's the constant specified by the C standard. If 0 is not the best value to use for the platform, the compiler can translate it to some other number, as long as it makes sure that number never equals a "real" address and compares equal to other null pointers.
In case it's not an abstraction, which was the case in the early days, the address 0 is used by the system and off limits to the programmer.
My negative number suggestion was a little wild brainstorming, I admit. Using a signed integer for addresses is a little wasteful if it means that apart from the null pointer (-1 or whatever) the value space is split evenly between positive integers that make valid addresses and negative numbers that are just wasted.
If any number is always representable by a datatype, it's 0. (Probably 1 is too. I think of the one-bit integer, which would be 0 or 1 if unsigned, or just the sign bit if signed, or the two-bit integer, which would be [-2, 1]. But then you could just go for 0 being null and 1 being the only accessible byte in memory.)
Still, there is something that is unresolved in my mind. The Stack Overflow question "Pointer to a specific fixed address" tells me that even if 0 for the null pointer is an abstraction, other pointer values aren't necessarily. This leads me to post another Stack Overflow question, "Could I ever want to access the address zero?".
Accepted answer by Michael Burr
Two points:
- Only the constant value 0 in the source code is the null pointer. The compiler implementation can use whatever value it wants or needs in the running code. Some platforms have a special pointer value that's 'invalid' that the implementation might use as the null pointer. The C FAQ has a question, "Seriously, have any actual machines really used nonzero null pointers, or different representations for pointers to different types?", that lists several platforms where 0 was the null pointer in C source but was represented differently at runtime. The C++ standard has a note that makes clear that converting "an integral constant expression with value zero always yields a null pointer, but converting other expressions that happen to have value zero need not yield a null pointer".
- A negative value might be just as usable by the platform as an address. The C standard simply had to choose something to indicate a null pointer, and zero was chosen. I'm honestly not sure if other sentinel values were considered.
The only requirements for a null pointer are:
- it's guaranteed to compare unequal to a pointer to an actual object
- any two null pointers will compare equal (C++ refines this such that this only needs to hold for pointers to the same type)
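The two requirements above can be checked directly in C++. A minimal sketch follows; the function name `null_pointer_requirements_hold` is made up for this illustration.

```cpp
#include <cassert>

// Returns true if both requirements hold for int pointers
// on the implementation this is compiled with.
bool null_pointer_requirements_hold() {
    int object = 42;
    int* to_object = &object;   // pointer to an actual object

    int* null_a = 0;            // null pointer constant 0
    int* null_b = nullptr;      // C++11 keyword
    int* null_c{};              // value-initialized pointer is null

    bool unequal_to_object = (to_object != null_a);
    bool nulls_equal = (null_a == null_b) && (null_b == null_c);
    return unequal_to_object && nulls_equal;
}
```

Nothing here inspects the bits stored in the pointers. The guarantees are stated purely in terms of comparisons.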
Answer by Aviad P.
Historically, the address space starting at 0 was always ROM, used for some operating system or low-level interrupt handling routines. Nowadays, since everything is virtual (including address space), the operating system can map any allocation to any address, so it can specifically NOT allocate anything at address 0.
Answer by rmeador
IIRC, the "null pointer" value isn't guaranteed to be zero. The compiler translates 0 into whatever "null" value is appropriate for the system (which in practice is probably always zero, but not necessarily). The same translation is applied whenever you compare a pointer against zero. Because you can only compare pointers against each other and against this special-value-0, it insulates the programmer from knowing anything about the memory representation of the system. As for why they chose 0 instead of 42 or somesuch, I'm going to guess it's because most programmers start counting at 0 :) (Also, on most systems 0 is the first memory address and they wanted it to be convenient, since in practice translations like I'm describing rarely actually take place; the language just allows for them).
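As a rough illustration of the point that every comparison against zero goes through the same translation, the three spellings below all ask the same question. The function names are hypothetical.

```cpp
#include <cassert>

// Implicit conversion to bool: true for any non-null pointer.
bool check_implicit(const char* p) {
    return p ? true : false;
}

// Comparison against the constant 0, which the compiler treats as
// the null pointer, whatever its run-time representation is.
bool check_zero(const char* p) {
    return p != 0;
}

// Comparison against the C++11 keyword.
bool check_nullptr(const char* p) {
    return p != nullptr;
}
```

None of these compares the pointer against the integer zero at run time in any way the program can observe. They all test for the null pointer value.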
Answer by AnT
You must be misunderstanding the meaning of constant zero in pointer context.
In neither C nor C++ can pointers "have value zero". Pointers are not arithmetic objects. They cannot have numerical values like "zero" or "negative" or anything of that nature. So your statement about "pointers ... have the value zero" simply makes no sense.
In C & C++ pointers can have the reserved null-pointer value. The actual representation of the null-pointer value has nothing to do with any "zeros". It can be absolutely anything appropriate for a given platform. It is true that on most platforms the null-pointer value is represented physically by an actual zero address value. However, if on some platform address 0 is actually used for some purpose (i.e. you might need to create objects at address 0), the null-pointer value on such a platform will most likely be different. It could be physically represented as a 0xFFFFFFFF address value or as a 0xBAADBAAD address value, for example.
Nevertheless, regardless of how the null-pointer value is represented on a given platform, in your code you will still continue to designate null pointers by the constant 0. In order to assign a null-pointer value to a given pointer, you will continue to use expressions like p = 0. It is the compiler's responsibility to realize what you want and translate it into the proper null-pointer value representation, i.e. to translate it into code that will put the address value 0xFFFFFFFF into the pointer p, for example.
In short, the fact that you use 0 in your source code to generate null-pointer values does not mean that the null-pointer value is somehow tied to address 0. The 0 that you use in your source code is just "syntactic sugar" that has absolutely no relation to the actual physical address the null-pointer value is "pointing" to.
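To see the difference between the source-level constant and the stored representation, one can copy the bytes of a null pointer and inspect them. The sketch below uses made-up function names. On most common platforms the second function is expected to return true, but as explained above the language does not promise that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Source-level test: this is what the language guarantees to work.
bool is_null(const int* p) {
    return p == 0;
}

// Representation-level test: reports whether the bytes stored in a
// null pointer happen to be all zero on this implementation.
// The result is implementation-specific, not a language guarantee.
bool null_is_all_zero_bits() {
    const int* p = 0;
    unsigned char bytes[sizeof p];
    std::memcpy(bytes, &p, sizeof p);
    for (std::size_t i = 0; i < sizeof p; ++i) {
        if (bytes[i] != 0) {
            return false;
        }
    }
    return true;
}
```

Portable code should rely only on the first kind of test. For the same reason, filling a pointer with zero bytes (for example with memset) is not a portable way to make it null.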
Answer by ChrisW
But since memory addressing starts at 0, isn't 0 just as valid an address as any other?
On some/many/all operating systems, memory address 0 is special in some way. For example, it's often mapped to invalid/non-existent memory, which causes an exception if you try to access it.
Why isn't a negative number null instead?
I think that pointer values are typically treated as unsigned numbers: otherwise, for example, a 32-bit pointer would only be able to address 2 GB of memory, instead of 4 GB.
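The 2 GB versus 4 GB figure is plain arithmetic on the number of usable address bits. A small sketch, where `addressable_bytes` and `GiB` are names invented for this example:

```cpp
#include <cassert>
#include <cstdint>

// Number of distinct byte addresses expressible with `bits` address bits.
// Only meaningful for bits < 64.
constexpr std::uint64_t addressable_bytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

constexpr std::uint64_t GiB = std::uint64_t{1} << 30;

// All 32 bits used as an unsigned address: 4 GiB.
static_assert(addressable_bytes(32) == 4 * GiB, "unsigned 32-bit range");
// One bit given up as a sign bit, with negative values unused: 2 GiB.
static_assert(addressable_bytes(31) == 2 * GiB, "non-negative signed range");
```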
Answer by KPexEA
My guess would be that the magic value 0 was picked to define an invalid pointer since it could be tested for with fewer instructions. Some machine languages automatically set the zero and sign flags according to the data when loading registers, so you could test for a null pointer with a simple load followed by a branch instruction, without doing a separate compare instruction.
(Most ISAs only set flags on ALU instructions, not loads, though. And usually you aren't producing pointers via computation, except in the compiler when parsing C source. But at least you don't need an arbitrary pointer-width constant to compare against.)
On the Commodore Pet, Vic20, and C64, which were the first machines I worked on, RAM started at location 0, so it was totally valid to read and write using a null pointer if you really wanted to.
Answer by George Phillips
Although C uses 0 to represent the null pointer, do keep in mind that the value of the pointer itself may not be a zero. However, most programmers will only ever use systems where the null pointer is, in fact, 0.
But why zero? Well, it's one address that every system shares. And oftentimes the low addresses are reserved for operating system purposes, so the value works well because it is already off-limits to application programs. Accidental assignment of an integer value to a pointer is as likely to end up zero as anything else.
Answer by Axel Gneiting
I think it's just a convention. There must be some value to mark an invalid pointer.
You just lose one byte of address space, which should rarely be a problem.
There are no negative pointers. Pointers are always unsigned. Also, if they could be negative, your convention would mean that you lose half the address space.
Answer by Fred Haslam
Historically the low memory of an application was occupied by system resources. It was in those days that zero became the default null value.
While this is not necessarily true for modern systems, it is still a bad idea to set pointer values to anything but what memory allocation has handed you.
Answer by Edward Strange
Regarding the argument about not setting a pointer to null after deleting it so that future deletes "expose errors"...
If you're really, really worried about this then a better approach, one that is guaranteed to work, is to leverage assert():
...
assert(ptr && "You're deleting this pointer twice, look for a bug?");
delete ptr;
ptr = 0;
...
This requires some extra typing, and one extra check during debug builds, but it is certain to give you what you want: notice when ptr is deleted 'twice'. The alternative given in the comment discussion, not setting the pointer to null so you'll get a crash, is simply not guaranteed to be successful. Worse, unlike the above, it can cause a crash (or much worse!) on a user if one of these "bugs" gets through to the shelf. Finally, this version lets you continue to run the program to see what actually happens.
I realize this does not answer the question asked, but I was worried that someone reading the comments might come to the conclusion that it is considered 'good practice' to NOT set pointers to 0 if it is possible they get sent to free() or delete twice. Even in those few cases where it is possible, it is NEVER a good practice to use Undefined Behavior as a debugging tool. Nobody who has ever had to hunt down a bug that was ultimately caused by deleting an invalid pointer would propose this. These kinds of errors take hours to hunt down and nearly always affect the program in a totally unexpected way that is hard or impossible to trace back to the original problem.
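The assert-then-delete pattern above can be wrapped in a small helper so it is applied consistently. This is only a sketch; the name `delete_once` is hypothetical and not part of the answer.

```cpp
#include <cassert>

// Hypothetical helper wrapping the pattern from this answer.
// Debug builds: the assert fires if the pointer is already null,
// flagging a second delete of the same variable.
// Release builds (NDEBUG defined): the assert disappears and deleting
// a null pointer is a well-defined no-op.
template <typename T>
void delete_once(T*& ptr) {
    assert(ptr && "You're deleting this pointer twice, look for a bug?");
    delete ptr;
    ptr = nullptr;
}
```

As with the original snippet, this only protects the one pointer variable that is passed in. Other copies of the same address are not reset.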