C++ char array null terminator location

Question

Question by John Mahoney

I am a student learning C++, and I am trying to understand how null-terminated character arrays work. Suppose I define a char array like so:

char* str1 = "hello world";

As expected, strlen(str1) is equal to 11, and it is null-terminated.

Where does C++ put the null terminator, if all 11 elements of the above char array are filled with the characters "hello world"? Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'? CPlusPlus.com seems to suggest that one of the 11 would need to be '\0', unless it is indeed allocating 12.

Suppose I do the following:

// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );

// Copy the first one to the second one
strncpy( str2, str1, strlen(str1) );

// Output the second one
cout << "Str2: " << str2 << endl;

This outputs Str2: hello worldatcomY╗°g??, which I assume is C++ reading the memory at the location pointed to by char* str2 until it encounters what it interprets to be a null character.

However, if I then do this:

// Null-terminate the second one
str2[strlen(str1)] = '\0';

// Output the second one again
cout << "Terminated Str2: " << str2 << endl;

It outputs Terminated Str2: hello world as expected.

But doesn't writing to str2[11] imply that we are writing outside of the allocated memory space of str2, since str2[11] is the 12th byte, but we only allocated 11 bytes?

Running this code does not seem to cause any compiler warnings or run-time errors. Is this safe to do in practice? Would it be better to use malloc( strlen(str1) + 1 ) instead of malloc( strlen(str1) )?

运行此代码似乎不会导致任何编译器警告或运行时错误。这在实践中安全吗？使用malloc( strlen(str1) + 1 )而不是更好malloc( strlen(str1) )吗？

Answer 1

Accepted answer by JaredPar

In the case of a string literal the compiler is actually reserving an extra char element for the \0 element.

// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );

This is a common mistake new C programmers make. When allocating the storage for a char* string, you need to allocate the number of characters plus one more byte to store the \0.

Not allocating the extra storage here means this line is also illegal:

// Null-terminate the second one
str2[strlen(str1)] = '\0';

Here you're actually writing past the end of the memory you allocated. When allocating X elements, the last byte you can legally access is at offset X - 1 from the start of the allocation. Writing to offset X causes undefined behavior. It will often appear to work, but it is a ticking time bomb.

The proper way to write this is as follows:

size_t size = strlen(str1) + sizeof(char);
char* str2 = (char*) malloc(size);
strncpy( str2, str1, size);

// Output the second one
cout << "Str2: " << str2 << endl;

In this example the statement str2[size - 1] = '\0' isn't actually needed. The strncpy function fills any remaining space in the destination with null characters. Here str1 contains only size - 1 characters, so strncpy writes \0 into the final element of str2.
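
The answer's approach can be wrapped in a small helper. This is only a sketch: the names copy_c_string and strncpy_pads_with_zeros are illustrative, not from the answer, and it uses const char* because string literals are read-only in modern C++. The second function checks the padding behaviour described above, where strncpy fills the remaining bytes with '\0' when the source is shorter than the count.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper following the answer: allocate strlen + 1 bytes and let
// strncpy copy the characters and the terminator. The caller must free() it.
char* copy_c_string(const char* str1) {
    assert(str1 != nullptr);
    std::size_t size = std::strlen(str1) + 1;            // characters + '\0'
    char* str2 = static_cast<char*>(std::malloc(size));
    if (str2 == nullptr) return nullptr;                 // allocation failed
    std::strncpy(str2, str1, size);                      // copies the '\0' as well
    return str2;
}

// strncpy pads the destination with '\0' when the source is shorter than n.
bool strncpy_pads_with_zeros() {
    char buf[8];
    std::memset(buf, 'x', sizeof buf);                   // start with non-zero bytes
    std::strncpy(buf, "hi", sizeof buf);                 // 'h', 'i', then six '\0'
    for (std::size_t i = 2; i < sizeof buf; ++i) {
        if (buf[i] != '\0') return false;
    }
    return true;
}
```

Usage: char* s = copy_c_string("hello world"); gives a 12-byte, properly terminated copy, which must later be released with free(s).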

Answer 2

Answer by Oliver Charlesworth

Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'?

Yes.

But doesn't writing to str2[11] imply that we are writing outside of the allocated memory space of str2, since str2[11] is the 12th byte, but we only allocated 11 bytes?

Yes.

Would it be better to use malloc( strlen(str1) + 1 ) instead of malloc( strlen(str1) )?

Yes, because the second form is not long enough to hold the string and its terminator.

Running this code does not seem to cause any compiler warnings or run-time errors.

Detecting this in all but the simplest cases is a very difficult problem. So the compiler authors simply don't bother.

This sort of complexity is exactly why you should be using std::string rather than raw C-style strings if you are writing C++. It's as simple as this:

std::string str1 = "hello world";
std::string str2 = str1;
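
A slightly fuller std::string sketch; copy_string is an illustrative name. It shows that the string tracks its own length, so the copy needs no size arithmetic, while c_str() still exposes a null-terminated buffer for code that expects a C string.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: std::string stores its own length and manages its
// buffer, so a copy needs no "+ 1" and no manual terminator.
std::string copy_string(const std::string& str1) {
    std::string str2 = str1;                     // allocates and copies as needed
    assert(str2.size() == str1.size());          // size() excludes the terminator
    assert(str2.c_str()[str2.size()] == '\0');   // c_str() is still terminated
    return str2;
}
```

Passing a literal works directly, for example copy_string("hello world"), because the literal converts to a temporary std::string.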

Answer 3

Answer by AusCBloke

The literal "hello world" is a char array that looks like:

{ 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' }

So, yes, the literal is 12 chars in size.
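
A minimal C++ sketch of the same point: sizeof applied to the literal measures the whole array, terminator included, while strlen counts only the characters before it. The function name literal_length is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// sizeof on a string literal measures the whole array: 11 characters + '\0'.
constexpr std::size_t kLiteralSize = sizeof("hello world");
static_assert(kLiteralSize == 12, "the literal carries a hidden terminator");

// Hypothetical helper: strlen stops at the terminator and does not count it.
std::size_t literal_length() {
    const char* str1 = "hello world";  // const: literals are read-only in C++
    assert(str1[11] == '\0');          // the 12th element is the terminator
    return std::strlen(str1);
}
```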

Also, malloc( strlen(str1) ) is allocating memory for 1 less byte than is needed, since strlen returns the length of the string, not including the NUL terminator. Writing to str2[strlen(str1)] is writing 1 byte past the amount of memory that you've allocated.

Your compiler won't tell you that, but if you run your program through valgrind or a similar tool available on your system, it'll tell you if you're accessing memory you shouldn't be.
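
For comparison, here is the question's copy with the allocation fixed, as a sketch (copy_with_terminator is a hypothetical name). Allocating strlen(str1) + 1 bytes makes str2[strlen(str1)] the last valid byte, so the explicit terminator write stays in bounds. Using memcpy for the characters is my choice here since the length is already known; the question used strncpy.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical fixed version of the question's code. The caller must free().
char* copy_with_terminator(const char* str1) {
    assert(str1 != nullptr);
    std::size_t len = std::strlen(str1);                     // 11 for "hello world"
    char* str2 = static_cast<char*>(std::malloc(len + 1));   // one extra byte for '\0'
    if (str2 == nullptr) return nullptr;                     // allocation failed
    std::memcpy(str2, str1, len);                            // copy the characters only
    str2[len] = '\0';                                        // last valid index is len
    return str2;
}
```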

Answer 4

Answer by Dalmas

I think you are confused by the return value of strlen. It returns the length of the string, which should not be confused with the size of the array that holds the string. Consider this example:

char* str = "Hello\0 world";

I added a null character in the middle of the string, which is perfectly valid. Here the array will have a length of 13 (12 characters + the final null character), but strlen(str) will return 5, because there are 5 characters before the first null character. strlen just counts the characters until a null character is found.
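
A small sketch that checks these numbers. It declares the string as an array rather than a char*, because sizeof on a pointer would report the pointer's size instead of the array's. The names kStr and embedded_null_length are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// 12 characters ("Hello", '\0', " world") plus the final terminator.
constexpr char kStr[] = "Hello\0 world";
static_assert(sizeof(kStr) == 13, "array size includes the final terminator");

// Hypothetical helper: strlen stops at the first '\0', which is at index 5.
std::size_t embedded_null_length() {
    assert(kStr[5] == '\0');
    return std::strlen(kStr);
}
```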

So if I use your code:

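char* str1 = "Hello\0 world";
char* str2 = (char*) malloc(strlen(str1)); // strlen(str1) will return 5
strncpy(str2, str1, strlen(str1));
cout << "Str2: " << str2 << endl;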

The str2 array will have a length of 5, and won't be terminated by a null character (because strlen doesn't count it). Is this what you expected?
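
If the goal is to keep everything after the embedded '\0' as well, the copy has to be sized from the array, not from strlen. A sketch, assuming the source is an actual array so that sizeof is meaningful; kSource and copy_whole_array are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr char kSource[] = "Hello\0 world";   // 13 bytes

// Hypothetical helper: copies all sizeof(kSource) bytes, so the embedded '\0',
// " world" and the final terminator come along, unlike a strlen-sized copy.
void copy_whole_array(char (&dst)[sizeof(kSource)]) {
    std::memcpy(dst, kSource, sizeof(kSource));
    assert(dst[sizeof(kSource) - 1] == '\0');  // final terminator is present
}
```

Note that C string functions applied to the copy still stop at index 5; only code that knows the array size can see " world".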

Answer 5

Answer by Nitram

For a standard C string, the array that stores the string must be at least one character longer than the length of the string in characters. So your "hello world" string has a string length of 11 but requires a backing array with 12 entries.

The reason for this is simply the way these strings are read. The functions handling them read the characters one by one until they find the termination character '\0' and stop at that point. If this character is missing, those functions just keep reading memory past the end of the array (undefined behavior) until they either find a zero byte or hit a protected memory area that causes the host operating system to kill your application.
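
To make this reading process concrete, here is a minimal sketch of a strlen-like loop. my_strlen is a hypothetical name; the library's strlen may be implemented differently, but it follows the same stopping rule.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical strlen-like loop: walk the characters until the first '\0'.
// If the array has no terminator, this keeps reading past its end, which is
// undefined behavior.
std::size_t my_strlen(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```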

Also, if you create a character array of length 11 and write the string "hello world" into it, you will run into serious problems, because the array needs to hold at least 12 characters. The byte that follows the array in memory gets overwritten, with unpredictable side effects.
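
One way to avoid that overflow, as a sketch: std::snprintf from <cstdio> never writes more than the buffer size it is given (terminator included) and returns the length it would have needed. With an 11-byte buffer the result is truncated but still terminated; sizing the buffer from the literal gives the 12 bytes the string needs. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: with only 11 bytes, snprintf writes 10 characters plus
// '\0' instead of overflowing, and returns the full length it wanted (11).
int write_into_11_bytes(char (&buf)[11]) {
    return std::snprintf(buf, sizeof buf, "%s", "hello world");
}

// Hypothetical helper: sizing the buffer from the literal gives 12 bytes.
bool fits_with_terminator() {
    char buf[sizeof "hello world"];
    int needed = std::snprintf(buf, sizeof buf, "%s", "hello world");
    assert(needed == 11);
    return needed < static_cast<int>(sizeof buf);   // true: nothing was cut off
}
```

Comparing the return value with the buffer size is the usual way to detect truncation.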

Since you are working with C++, you might also want to look into std::string, the standard library's string class, which handles strings far more conveniently and safely than raw character arrays.

Answer 6

Answer by George B

I think what you need to know is that the characters of a string occupy indices 0 through length - 1, and the terminator ('\0') sits at index length, where length is strlen(str1).
In your case:

str1[0] == 'h';
str1[10] == 'd';
str1[11] == '\0';

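This is why str2[strlen(str1)] = '\0'; puts the terminator in the right place.
The problem with the output after the strncpy call is that it copies only 11 elements (indices 0..10), so you need to add the terminator manually (str2[11] = '\0'), which in turn requires str2 to have room for 12 bytes.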
