C: Why does the arrow (->) operator exist in C?

Question

Asked by Askaga

The dot (.) operator is used to access a member of a struct, while the arrow operator (->) in C is used to access a member of a struct that is referenced by a pointer.

The pointer itself does not have any members that could be accessed with the dot operator (it is really just a number describing a location in virtual memory, so it has no members). So there would be no ambiguity if we simply defined the dot operator to automatically dereference the pointer when it is applied to a pointer (information that, as far as I know, is available to the compiler at compile time).

So why did the language creators decide to make things more complicated by adding this seemingly unnecessary operator? What was the design decision behind it?

Answer 1

Answered by AnT

I'll interpret your question as two questions: 1) why -> even exists, and 2) why . does not automatically dereference the pointer. The answers to both questions have historical roots.

Why does -> even exist?

In one of the very first versions of the C language (which I will refer to as CRM, for "C Reference Manual", which came with 6th Edition Unix in May 1975), the -> operator had a very exclusive meaning, not synonymous with the * and . combination.

The C language described by CRM was very different from modern C in many respects. In CRM, struct members implemented the global concept of a byte offset, which could be added to any address value with no type restrictions. That is, all names of all struct members had independent global meaning (and therefore had to be unique). For example, you could declare

struct S {
  int a;
  int b;
};

and the name a would stand for offset 0, while the name b would stand for offset 2 (assuming an int type of size 2 and no padding). The language required all members of all structs in the translation unit to either have unique names or stand for the same offset value. E.g. in the same translation unit you could additionally declare

struct X {
  int a;
  int x;
};

and that would be OK, since the name a would consistently stand for offset 0. But this additional declaration

struct Y {
  int b;
  int a;
};

would be formally invalid, since it attempted to "redefine" a as offset 2 and b as offset 0.

And this is where the -> operator comes in. Since every struct member name had its own self-sufficient global meaning, the language supported expressions like these

int i = 5;
i->b = 42;  /* Write 42 into `int` at address 7 */
100->a = 0; /* Write 0 into `int` at address 100 */

The first assignment was interpreted by the compiler as "take address 5, add offset 2 to it, and assign 42 to the int value at the resultant address". That is, the above would assign 42 to the int value at address 7. Note that this use of -> did not care about the type of the expression on the left-hand side. The left-hand side was interpreted as an rvalue numerical address (be it a pointer or an integer).

This sort of trickery was not possible with the * and . combination. You could not do

(*i).b = 42;

since *i is already an invalid expression. The * operator, since it is separate from ., imposes stricter type requirements on its operand. To provide a way to work around this limitation, CRM introduced the -> operator, which is independent of the type of the left-hand operand.

As Keith noted in the comments, this difference between -> and the *+. combination is what CRM refers to as "relaxation of the requirement" in 7.1.8: Except for the relaxation of the requirement that E1 be of pointer type, the expression E1->MOS is exactly equivalent to (*E1).MOS

Later, in K&R C, many features originally described in CRM were significantly reworked. The idea of "struct member as global offset identifier" was completely removed, and the functionality of the -> operator became fully identical to that of the * and . combination.

Why can't . dereference the pointer automatically?

Again, in the CRM version of the language the left operand of the . operator was required to be an lvalue. That was the only requirement imposed on that operand (and that's what made it different from ->, as explained above). Note that CRM did not require the left operand of . to have a struct type. It just required it to be an lvalue, any lvalue. This means that in the CRM version of C you could write code like this

struct S { int a, b; };
struct T { float x, y, z; };

struct T c;
c.b = 55;

In this case the compiler would write 55 into an int value positioned at byte offset 2 in the contiguous memory block known as c, even though type struct T had no field named b. The compiler would not care about the actual type of c at all. All it cared about was that c was an lvalue: some sort of writable memory block.

Now note that if you did this

struct S *s;
...
s.b = 42;

the code would be considered valid (since s is also an lvalue) and the compiler would simply attempt to write data into the pointer s itself, at byte offset 2. Needless to say, things like this could easily result in a memory overrun, but the language did not concern itself with such matters.

That is, in that version of the language your proposed idea of overloading the . operator for pointer types would not work: the . operator already had a very specific meaning when used with pointers (with lvalue pointers, or with any lvalues at all). It was very weird functionality, no doubt. But it was there at the time.

Of course, this weird functionality is not a very strong reason against introducing an overloaded . operator for pointers (as you suggested) in the reworked version of C, K&R C. But it was not done. Maybe at that time there was some legacy code written in the CRM version of C that had to be supported.

(The URL for the 1975 C Reference Manual may not be stable. Another copy, possibly with some subtle differences, is here.)

Answer 2

Answered by effeffe

Beyond the historical reasons (good, and already reported), there is also a small problem with operator precedence: the dot operator has higher priority than the star operator, so if you have a struct containing a pointer to a struct containing a pointer to a struct... These two are equivalent:

(*(*(*a).b).c).d

a->b->c->d

But the second is clearly more readable. The arrow operator has the highest priority (just like dot) and associates left to right. I think this is clearer than using the dot operator both for pointers to structs and for structs, because we know the type from the expression without having to look at the declaration, which could even be in another file.

Answer 3

Answered by mukunda

C also does a good job of not making anything ambiguous.

Sure, the dot could be overloaded to mean both things, but the arrow makes sure that the programmer knows that he is operating on a pointer, just as the compiler won't let you mix two incompatible types.
