How does strcpy work behind the scenes in C?

Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/14723381/

Time: 2020-09-02 05:16:11  Source: igfitidea

How strcpy works behind the scenes?

c pointers

Asked by SandBag_1996

This may be a very basic question for some. I was trying to understand how strcpy actually works behind the scenes. For example, in this code:

#include <stdio.h>
#include <string.h>
int main ()
{
  char s[6] = "Hello";
  char a[20] = "world isnsadsdas";
  strcpy(s,a);

  printf("%s\n",s);
  printf("%zu\n", sizeof(s));
  return 0;
}

I am declaring s to be a static array with a size smaller than that of the source, so I thought it wouldn't print the whole string, but it did print world isnsadsdas. So I thought that strcpy might be allocating more space if the destination is smaller than the source. But when I check sizeof(s), it is still 6, even though more than that is printed. How does that actually work?

Answered by Carl Norum

You've just caused undefined behaviour, so anything can happen. In your case, you're getting lucky and it's not crashing, but you shouldn't rely on that happening. Here's a simplified strcpy implementation (but it's not too far off from many real ones):

char *strcpy(char *d, const char *s)
{
   char *saved = d;
   while (*s)
   {
       *d++ = *s++;
   }
   *d = 0;
   return saved;
}

sizeof is just returning the size of your array, which is fixed at compile time. If you use strlen, I think you'll see what you expect. But as I mentioned above, relying on undefined behaviour is a bad idea.

Answered by Jessy Diamond Exum

http://natashenka.ca/wp-content/uploads/2014/01/strcpy8x11.png

strcpy is considered dangerous for reasons like the one you are demonstrating. The two buffers you created are local variables stored in the stack frame of the function. Here is roughly what the stack frame looks like: http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Call_stack_layout.svg/342px-Call_stack_layout.svg.png

FYI, new things are put on top of the stack, and on most common platforms the stack grows backwards (toward lower addresses) through memory. This does not mean the variables in memory are read backwards, just that newer ones are put 'behind' older ones. So if you write far enough past the end of a buffer in the locals section of your function's stack frame, you will write forward over every other stack variable that sits after it, break into other sections, and eventually overwrite the return pointer. The result is that, if you are clever, you have full control of where the function returns. You could make it do anything really, but it isn't YOU that is the concern.

As you seem to know, since you made your first buffer 6 chars long for a 5-character string, C strings end in a null byte \x00. The strcpy function copies bytes until the source byte is 0, but it does not check that the destination is that long, which is why it can copy past the boundary of the array. This is also why your print reads the buffer past its size: it reads until \x00. Interestingly, the strcpy may have written into the data of a, depending on the order the compiler gave the arrays on the stack. A fun exercise could be to also print a and see if you get something like 'snsadsdas', but I can't be sure what it would look like even if it is polluting a, because there are sometimes bytes in between the stack entries for various reasons.

Say this buffer holds a password that the code checks with a hashing function, and you copy it into a buffer on the stack from wherever you get it (a network packet if it is a server, a text box, etc.). You may very well copy more data from the source than the destination buffer can hold, and so give control of where your program returns to whatever user was able to send you a packet or try a password. They just have to type the right number of characters, followed by the characters that represent an address somewhere in RAM to jump to.

You can use strcpy if you check the bounds and maybe trim the source string, but it is considered bad practice. There are functions that take a maximum length, like strncpy (http://www.cplusplus.com/reference/cstring/strncpy/). Note that strncpy does not add a terminating null byte if the source is at least as long as that maximum, so you have to terminate the destination yourself.

Oh, and lastly, this is all called a buffer overflow. Some compilers add a small blob of randomly chosen bytes (often called a stack canary) next to the data they want to protect on the stack. Before the function returns, these bytes are checked against a saved copy, and the program is terminated if they differ. This solves a lot of security problems, but on some platforms it is still possible to copy bytes far enough into the stack to overwrite the pointer to the function that handles what happens when those bytes have been changed, thus letting you do the same thing. It just becomes a lot harder to do right.

Answered by AndersK

In C there is no bounds checking of arrays. It's a trade-off: better performance at the risk of shooting yourself in the foot.

strcpy() doesn't care whether the target buffer is big enough, so copying too many bytes will cause undefined behavior.

That is one of the reasons a new version of strcpy was introduced in which you specify the target buffer size: strcpy_s().

Answered by AndersK

Note that sizeof(s) is determined at compile time. Use strlen() to find the number of characters s holds. When you perform strcpy(), the contents of the destination are replaced by the source string, so your output won't be "Helloworld isnsadsdas".

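#include <stdio.h>
#include <string.h>
int main (void)
{
  char s[20] = "Hello";   /* large enough to hold a, including its '\0' */
  char a[20] = "world isnsadsdas";
  strcpy(s,a);

  printf("%s\n",s);
  printf("%zu\n", strlen(s));   /* strlen returns size_t, so use %zu */
  return 0;
}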

Answered by Ed Heal

You are relying on undefined behaviour, inasmuch as the compiler has chosen to place the two arrays where your code happens to work. This may not work in future.

As to the sizeof operator, this is figured out at compile time.

Once you use adequate array sizes, you need to use strlen to fetch the length of the strings.

Answered by Pedro Rodrigues

The best way to understand how strcpy works behind the scenes is... reading its source code! You can read the source for GLibC: http://fossies.org/dox/glibc-2.17/strcpy_8c_source.html. I hope it helps!

Answered by vishal

A better solution is:

char *strcpy(char *p, char const *q)
{
   char *saved = p;

   /* Copy each byte, including the terminating '\0', which ends the loop. */
   while ((*p++ = *q++))
      ;

   return saved;
}

Answered by intersomnium

At the end of every string/character array there is a null terminator character '\0' which marks the end of the string/character array.

strcpy() performs its task until it sees the '\0' character.

printf() (with %s) also performs its task until it sees the '\0' character.

sizeof, on the other hand, is not interested in the content of the array, only its allocated size (how big it is supposed to be), and so does not take into consideration where the string/character array actually ends (how big it actually is).

As opposed to sizeof, strlen() is interested in how long the string actually is (not how long it was supposed to be), and so counts the number of characters until it reaches the end (the '\0' character), where it stops (it doesn't include the '\0' character in the count).