Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not to this site). Original Stack Overflow question: http://stackoverflow.com/questions/4429398/

Date: 2020-09-09 07:52:25  Source: igfitidea

Why does Windows64 use a different calling convention from all other OSes on x86-64?

Tags: windows, x86-64, calling-convention

Asked by JanKanis

AMD has an ABI specification that describes the calling convention to use on x86-64. All OSes follow it, except for Windows, which has its own x86-64 calling convention. Why?

Does anyone know the technical, historical, or political reasons for this difference, or is it purely a matter of NIH syndrome?

I understand that different OSes may have different needs for higher level things, but that doesn't explain why, for example, the register parameter passing order on Windows is rcx - rdx - r8 - r9 - rest on stack, while everyone else uses rdi - rsi - rdx - rcx - r8 - r9 - rest on stack.

P.S. I am aware of how these calling conventions differ generally and I know where to find details if I need to. What I want to know is why.

Edit: for the how, see e.g. the Wikipedia entry and links from there.

Answered by FrankH.

Choosing four argument registers on x64 - common to UN*X / Win64

One of the things to keep in mind about x86 is that the mapping from register name to register number is not obvious; in terms of instruction encoding (the MOD R/M byte, see http://www.c-jump.com/CIS77/CPU/x86/X77_0060_mod_reg_r_m_byte.htm), register numbers 0...7 are, in that order, ?AX, ?CX, ?DX, ?BX, ?SP, ?BP, ?SI, ?DI.
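
As a small illustration of that ordering, here is a sketch in C of a lookup table indexed by the 3-bit register number. The names legacy_reg_names and legacy_reg_name are made up for this example and do not come from any real toolchain.

```c
#include <stddef.h>

/* The eight classic x86 general-purpose registers, indexed by the
 * 3-bit register number used in instruction encodings.
 * Only the suffix is stored; the "?" prefix (none, E or R) depends
 * on the operand size. */
static const char *const legacy_reg_names[8] = {
    "AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"
};

/* Hypothetical helper: returns the name suffix for register number n,
 * or NULL if n is not one of the eight classic registers. */
const char *legacy_reg_name(unsigned n)
{
    return n < 8 ? legacy_reg_names[n] : NULL;
}
```

Note that numbers 0, 1 and 2 map to A, C and D rather than A, B and C, which is the point the paragraph above relies on.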

Hence choosing A/C/D (regs 0..2) for the return value and the first two arguments (which is the "classical" 32-bit __fastcall convention) is a logical choice. As far as going to 64-bit is concerned, the "higher" regs are ordered, and both Microsoft and UN*X/Linux went for R8/R9 as the first ones.

Keeping that in mind, Microsoft's choice of RAX (return value) and RCX, RDX, R8, R9 (arg[0..3]) is an understandable selection if you choose four registers for arguments.

I don't know why the AMD64 UN*X ABI chose RDX before RCX.

Choosing six argument registers on x64 - UN*X specific

UN*X, on RISC architectures, has traditionally done argument passing in registers, specifically for the first six arguments (that's so on PPC, SPARC, MIPS at least). That might be one of the major reasons why the AMD64 (UN*X) ABI designers chose to use six registers on that architecture as well.

So if you want six registers to pass arguments in, and it's logical to choose RCX, RDX, R8 and R9 for four of them, which other two should you pick?

The "higher" regs require an additional instruction prefix byte to select them and therefore have a bigger instruction size footprint, so you wouldn't want to choose any of those if you have options. Of the classical registers, RBP and RSP aren't available due to their implicit meaning, and RBX traditionally has a special use on UN*X (global offset table) which seemingly the AMD64 ABI designers didn't want to needlessly become incompatible with.
Ergo, the only choices were RSI / RDI.

So if you have to take RSI / RDI as argument registers, which arguments should they be?

Making them arg[0] and arg[1] has some advantages. See cHao's comment.
?SI and ?DI are string instruction source / destination operands, and as cHao mentioned, their use as argument registers means that with the AMD64 UN*X calling conventions, the simplest possible strcpy() function, for example, only consists of the two CPU instructions repz movsb; ret because the source/target addresses have been put into the correct registers by the caller. There is, particularly in low-level and compiler-generated "glue" code (think, for example, some C++ heap allocators zero-filling objects on construction, or the kernel zero-filling heap pages on sbrk(), or copy-on-write pagefaults), an enormous amount of block copy/fill, hence it'll be useful for code so frequently used to save the two or three CPU instructions that'd otherwise load such source/target address arguments into the "correct" registers.
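
To make the RSI/RDI point concrete, here is a minimal sketch of a byte copy built on rep movsb. It assumes GCC or Clang extended inline assembly on x86-64 and falls back to a plain loop elsewhere; copy_bytes is a hypothetical name for this example, not a libc function.

```c
#include <stddef.h>

/* Copies n bytes from src to dst and returns dst.
 * Like memcpy, it assumes the buffers do not overlap. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__GNUC__) && defined(__x86_64__)
    /* "rep movsb" copies RCX bytes from [RSI] to [RDI].
     * The constraints pin dst to RDI ("D"), src to RSI ("S") and
     * n to RCX ("c"); the instruction modifies all three, hence "+".
     * The direction flag is assumed clear, as both ABIs require. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable fallback for other compilers and architectures. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

With the UN*X convention dst and src already arrive in RDI and RSI, so the compiler only has to move the count from RDX into RCX. With the Win64 convention the same function would have to shuffle all three arguments and also preserve the caller's RSI and RDI, which are callee-saved there.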

So in a way, UN*X and Win64 are only different in that UN*X "prepends" two additional arguments, in purposefully chosen RSI/RDI registers, to the natural choice of four arguments in RCX, RDX, R8 and R9.
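
The two orderings can be put side by side in a short C sketch. It is a simplified model that only covers integer and pointer arguments; the enum, the tables and int_arg_location are names invented for this example.

```c
#include <stddef.h>

enum abi { ABI_WIN64, ABI_SYSV };

/* Integer/pointer argument registers, in passing order. */
static const char *const win64_int_args[4] = { "rcx", "rdx", "r8", "r9" };
static const char *const sysv_int_args[6]  = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Hypothetical helper: where the integer argument at 0-based position
 * `index` is passed. Floating-point and aggregate arguments follow
 * additional rules that this model ignores. */
const char *int_arg_location(enum abi abi, size_t index)
{
    if (abi == ABI_WIN64)
        return index < 4 ? win64_int_args[index] : "stack";
    return index < 6 ? sysv_int_args[index] : "stack";
}
```

For example, int_arg_location(ABI_SYSV, 3) and int_arg_location(ABI_WIN64, 0) both name rcx, while the fifth integer argument (index 4) is in r8 on UN*X but already on the stack on Win64.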

Beyond that ...

There are more differences between the UN*X and Windows x64 ABIs than just the mapping of arguments to specific registers. For the overview on Win64, check:

http://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx

Win64 and AMD64 UN*X also strikingly differ in the way stack space is used; on Win64, for example, the caller must allocate stack space for function arguments even though args 0...3 are passed in registers. On UN*X on the other hand, a leaf function (i.e. one that doesn't call other functions) is not even required to allocate stack space at all if it needs no more than 128 bytes of it (yes, you own and can use a certain amount of stack without allocating it ... well, unless you're kernel code, a source of nifty bugs). All these are particular optimization choices; most of the rationale for them is explained in the full ABI references that the original poster's Wikipedia reference points to.
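
The stack rules can be turned into rough arithmetic. The sketch below is a simplified model in C: it counts only 8-byte integer or pointer arguments and ignores 16-byte alignment padding, floating-point arguments and large aggregates. All identifiers are made up for the example.

```c
#include <stddef.h>

static const size_t slot_bytes          = 8;   /* one stack slot per argument */
static const size_t win64_home_slots    = 4;   /* always reserved by a Win64 caller */
static const size_t sysv_int_arg_regs   = 6;   /* integer args passed in registers */
static const size_t sysv_red_zone_bytes = 128; /* usable below RSP without allocation */

/* Bytes a Win64 caller reserves for arguments at a call site:
 * the four home slots even for register arguments, plus one slot
 * for each argument beyond the fourth. */
size_t win64_outgoing_arg_bytes(size_t nargs)
{
    size_t slots = nargs > win64_home_slots ? nargs : win64_home_slots;
    return slots * slot_bytes;
}

/* Bytes a UN*X caller needs for arguments: only those integer
 * arguments that do not fit in the six registers. */
size_t sysv_outgoing_arg_bytes(size_t n_int_args)
{
    if (n_int_args <= sysv_int_arg_regs)
        return 0;
    return (n_int_args - sysv_int_arg_regs) * slot_bytes;
}

/* Nonzero if a UN*X leaf function can keep its locals in the
 * red zone without adjusting the stack pointer. */
int sysv_leaf_fits_red_zone(size_t local_bytes)
{
    return local_bytes <= sysv_red_zone_bytes;
}
```

Under these assumptions a two-argument call costs 32 bytes of argument area on Win64 and none on UN*X, and a six-argument call costs 48 bytes versus none.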

Answered by Peter Cordes

IDK why Windows did what they did. See the end of this answer for a guess. I was curious about how the SysV calling convention was decided on, so I dug into the mailing list archive and found some neat stuff.

It's interesting reading some of those old threads on the AMD64 mailing list, since AMD architects were active on it. e.g. Choosing register names was one of the hard parts: AMD considered renaming the original 8 registers r0-r7, or calling the new registers stuff like UAX.

Also, feedback from kernel devs identified things that made the original design of syscall and swapgs unusable. That's how AMD updated the instruction to get this sorted out before releasing any actual chips. It's also interesting that in late 2000, the assumption was that Intel probably wouldn't adopt AMD64.

The SysV (Linux) calling convention, and the decision on how many registers should be callee-preserved vs. caller-save, was made initially in Nov 2000, by Jan Hubicka (a gcc developer). He compiled SPEC2000 and looked at code size and number of instructions. That discussion thread bounces around some of the same ideas as answers and comments on this SO question. In a 2nd thread, he proposed the current sequence as optimal and hopefully final, generating smaller code than some alternatives.

He's using the term "global" to mean call-preserved registers, that have to be push/popped if used.

The choice of rdi, rsi, rdx as the first three args was motivated by:

  • minor code-size saving in functions that call memset or other C string function on their args (where gcc inlines a rep string operation?)
  • rbx is call-preserved because having two call-preserved regs accessible without REX prefixes (rbx and rbp) is a win. Presumably chosen because it's the only other reg that isn't implicitly used by any instruction. (rep string, shift count, and mul/div outputs/inputs touch everything else).
  • None of the registers with special purposes are call-preserved (see prev point), so a function that wants to use rep string instructions or a variable-count shift might have to move function args somewhere else, but doesn't have to save/restore the caller's value.
  • We are trying to avoid RCX early in the sequence, since it is register used commonly for special purposes, like EAX, so it has same purpose to be missing in the sequence. Also it can't be used for syscalls and we would like to make syscall sequence to match function call sequence as much as possible.

    (background: syscall / sysret unavoidably destroy rcx (with rip) and r11 (with RFLAGS), so the kernel can't see what was originally in rcx when syscall ran.)

The kernel system-call ABI was chosen to match the function call ABI, except for r10 instead of rcx, so libc wrapper functions like mmap(2) can just mov %rcx, %r10 / mov $0x9, %eax / syscall.
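
A small C sketch can show why a single mov is enough. It compares the function-call argument registers with the Linux x86-64 system-call argument registers position by position; the table names and wrapper_must_move_arg are hypothetical and only model the six register arguments.

```c
#include <stddef.h>
#include <string.h>

#define N_ARG_REGS 6

/* Integer argument registers of the function-call ABI ... */
static const char *const sysv_call_regs[N_ARG_REGS] =
    { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
/* ... and of the Linux x86-64 system-call ABI. */
static const char *const linux_syscall_regs[N_ARG_REGS] =
    { "rdi", "rsi", "rdx", "r10", "r8", "r9" };

/* Hypothetical helper: returns 1 if a libc wrapper has to move the
 * argument at 0-based position `index` into another register before
 * executing syscall, 0 otherwise (including out-of-range positions). */
int wrapper_must_move_arg(size_t index)
{
    if (index >= N_ARG_REGS)
        return 0;
    return strcmp(sysv_call_regs[index], linux_syscall_regs[index]) != 0;
}
```

Only position 3 (the fourth argument) differs, which corresponds to the mov %rcx, %r10 in the wrapper sequence; the remaining work is loading the system-call number into eax.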

Note that the SysV calling convention used by i386 Linux sucks compared to Windows' 32-bit __vectorcall. It passes everything on the stack, and only returns in edx:eax for int64, not for small structs. It's no surprise little effort was made to maintain compatibility with it. When there's no reason not to, they did things like keeping rbx call-preserved, since they decided that having another call-preserved register in the original 8 (that don't need a REX prefix) was good.

Making the ABI optimal is much more important long-term than any other consideration. I think they did a pretty good job. I'm not totally sure about returning structs packed into registers, instead of different fields in different regs. I guess code that passes them around by value without actually operating on the fields wins this way, but the extra work of unpacking seems silly. They could have had more integer return registers, more than just rdx:rax, so returning a struct with 4 members could return them in rdi, rsi, rdx, rax or something.

They considered passing integers in vector regs, because SSE2 can operate on integers. Fortunately they didn't do that. Integers are used as pointer offsets very often, and a round-trip to stack memory is pretty cheap. Also SSE2 instructions take more code bytes than integer instructions.

I suspect Windows ABI designers might have been aiming to minimize differences between 32 and 64bit for the benefit of people that have to port asm from one to the other, or that can use a couple #ifdefs in some ASM so the same source can more easily build a 32 or 64bit version of a function.

Minimizing changes in the toolchain seems unlikely. An x86-64 compiler needs a separate table of which register is used for what, and what the calling convention is. Having a small overlap with 32bit is unlikely to produce significant savings in toolchain code size / complexity.

Answered by Michael Burr

Remember that Microsoft was initially "officially noncommittal toward the early AMD64 effort" (from "A History of Modern 64-bit Computing" by Matthew Kerner and Neil Padgett) because they were strong partners with Intel on the IA64 architecture. I think that this meant that even if they would have otherwise been open to working with GCC engineers on an ABI to use both on Unix and Windows, they wouldn't have done so as it would mean publicly supporting the AMD64 effort when they hadn't yet officially done so (and would have probably upset Intel).

On top of that, back in those days Microsoft had absolutely no leanings toward being friendly with open source projects. Certainly not Linux or GCC.

So why would they have cooperated on an ABI? I'd guess that the ABIs are different simply because they were designed at more or less the same time and in isolation.

Another quote from "A History of Modern 64-bit Computing":

In parallel with the Microsoft collaboration, AMD also engaged the open source community to prepare for the chip. AMD contracted with both Code Sorcery and SuSE for tool chain work (Red Hat was already engaged by Intel on the IA64 tool chain port). Russell explained that SuSE produced C and FORTRAN compilers, and Code Sorcery produced a Pascal compiler. Weber explained that the company also engaged with the Linux community to prepare a Linux port. This effort was very important: it acted as an incentive for Microsoft to continue to invest in the AMD64 Windows effort, and also ensured that Linux, which was becoming an important OS at the time, would be available once the chips were released.

Weber goes so far as to say that the Linux work was absolutely crucial to AMD64's success, because it enabled AMD to produce an end-to-end system without the help of any other companies if necessary. This possibility ensured that AMD had a worst-case survival strategy even if other partners backed out, which in turn kept the other partners engaged for fear of being left behind themselves.

This indicates that even AMD didn't feel that cooperation was necessarily the most important thing between MS and Unix, but that having Unix/Linux support was very important. Maybe even trying to convince one or both sides to compromise or cooperate wasn't worth the effort or risk(?) of irritating either of them? Perhaps AMD thought that even suggesting a common ABI might delay or derail the more important objective of simply having software support ready when the chip was ready.

Speculation on my part, but I think the major reason the ABIs are different was the political reason that MS and the Unix/Linux sides just didn't work together on it, and AMD didn't see that as a problem.

Answered by cHao

Win32 has its own uses for ESI and EDI, and requires that they not be modified (or at least that they be restored before calling into the API). I'd imagine 64-bit code does the same with RSI and RDI, which would explain why they're not used to pass function arguments around.

I couldn't tell you why RCX and RDX are switched, though.
