Defining a string with no null terminating char (\0) at the end

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/3828307/

Date: 2020-08-28 13:51:07  Source: igfitidea

Tags: c++, c

Asked by Ravi Gupta

What are various ways in C/C++ to define a string with no null terminating char(\0) at the end?

EDIT: I am interested in character arrays only, not in the STL string.

Answered by kriss

Typically as another poster wrote:

char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};

or if your current C charset is ASCII, which is usually true (not much EBCDIC around today)

char s[6] = {115, 116, 114, 105, 110, 103};

There is also a largely ignored way that works only in C (not C++)

char s[6] = "string";

If the array size is too small to hold the final 0 (but large enough to hold all the other characters of the constant string), the final zero won't be copied, but it's still valid C (but invalid C++).
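
As a rough illustration of that C-only form, the sketch below assumes a C compiler (not C++). The helper name format_unterminated is made up for this example; it relies on the "%.*s" precision so that snprintf never reads past the six characters.

```c
#include <stdio.h>

/* Exactly six chars: the array has no room for the literal's trailing
   '\0', so C silently drops it. A C++ compiler rejects this line. */
static const char s[6] = "string";

/* Hypothetical helper: copies the unterminated array into out as an
   ordinary terminated string. The precision given to "%.*s" caps how
   many bytes snprintf reads from s. Returns the number of characters
   the full output needs (6 here). */
int format_unterminated(char *out, size_t out_size)
{
    return snprintf(out, out_size, "%.*s", (int)sizeof s, s);
}
```

Calling format_unterminated(buf, sizeof buf) with a 16-byte buf should leave the text "string" in buf. Some compilers may warn about the dropped terminator in the initializer.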

Obviously you can also do it at run time:

char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';

or (same remark on ASCII charset as above)

char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;

Or using memcpy (or memmove, or bcopy, but in this case there is no benefit in doing that).

memcpy(s, "string", 6);

or strncpy

strncpy(s, "string", 6);
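
To make it visible that neither call stores a terminator, here is a small sketch in C. The function names and the '#' sentinel byte are assumptions made for the example: it fills a 7-byte buffer and reports whether the byte after the six characters was left untouched.

```c
#include <string.h>

/* Hypothetical helpers: each copies the 6 characters of "string" into
   dst and writes no terminator. dst must hold at least 6 bytes. */
void fill_with_memcpy(char *dst)
{
    memcpy(dst, "string", 6);
}

void fill_with_strncpy(char *dst)
{
    /* strncpy stops after 6 bytes; the source is 6 characters long,
       so there is no room left for it to add a '\0'. */
    strncpy(dst, "string", 6);
}

/* Returns 1 if the byte right after the 6 copied characters still
   holds the sentinel, i.e. no terminator was stored there. */
int seventh_byte_untouched(void (*fill)(char *))
{
    char buf[7];
    memset(buf, '#', sizeof buf);
    fill(buf);
    return buf[6] == '#' && memcmp(buf, "string", 6) == 0;
}
```

Passing either fill function to seventh_byte_untouched should return 1. A compiler may warn that the strncpy call truncates the terminator, which is exactly the effect wanted here.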

What should be understood is that there is no such thing as a string in C (in C++ there are string objects, but that's another story entirely). So-called strings are just char arrays. Even the name char is misleading: it is not a character but just a kind of numerical type. We could probably have called it byte instead, but in the old days there was strange hardware around using 9-bit registers or the like, and byte implies 8 bits.

As char will very often be used to store a character code, the C designers thought of a simpler way than storing a number in a char. You can put a letter between single quotes and the compiler understands that it must store this character code in the char.

What I mean is (for example) that you don't have to do

char c = '\0';

To store a code 0 in a char, just do:

char c = 0;

As we very often have to work with a bunch of chars of variable length, the C designers also chose a convention for "strings": just put a code 0 where the text should end. By the way, there is a name for this kind of string representation, "zero terminated string", and if you see the two letters sz at the beginning of a variable name it usually means that its content is a zero terminated string.

"C sz strings" are not a type at all, just arrays of chars as normal as, say, an array of int, but string manipulation functions (strcmp, strcpy, strcat, printf, and many many others) understand and use the 0 ending convention. That also means that if you have a char array that is not zero terminated, you shouldn't call any of these functions, as they will likely do something wrong (or you must be extra careful and use functions with an n in their name, like strncpy).
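
One way to stay on the safe side is to use only the mem* functions with an explicit size. The sketch below assumes that approach; the helper names bounded_length and equals_literal are hypothetical and not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the text in buf, looking at no more
   than cap bytes. Returns cap when no 0 byte is found, so it is safe
   on arrays that are not zero terminated. */
size_t bounded_length(const char *buf, size_t cap)
{
    const char *end = memchr(buf, 0, cap);
    return end ? (size_t)(end - buf) : cap;
}

/* Hypothetical helper: compares exactly len bytes of buf against a
   normal zero-terminated literal. Returns 1 when both the length and
   the bytes match, 0 otherwise. */
int equals_literal(const char *buf, size_t len, const char *literal)
{
    return strlen(literal) == len && memcmp(buf, literal, len) == 0;
}
```

For the unterminated array char s[6] = {'s','t','r','i','n','g'}, bounded_length(s, sizeof s) gives 6 and equals_literal(s, sizeof s, "string") gives 1, without any function ever reading a seventh byte.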

The biggest problem with this convention is that there are many cases where it's inefficient. One typical example: you want to put something at the end of a 0 terminated string. If you had kept the size you could just jump to the end of the string; with the sz convention, you have to check it char by char. Other kinds of problems occur when dealing with encoded unicode or such. But at the time C was created this convention was very simple and did the job perfectly.
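
The append case can be sketched in C as follows. The counted_buf type, its 32-byte capacity, and both function names are assumptions made for illustration; the point is only that the counted version jumps to the end while the sz version has to scan for it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer: the length is stored next to the
   characters, so no terminator is needed. */
typedef struct {
    size_t len;
    char data[32];
} counted_buf;

/* Appends n bytes from src. Jumps straight to the end using b->len
   instead of scanning for a 0. Returns 0 on success, -1 if the bytes
   would not fit. */
int counted_append(counted_buf *b, const char *src, size_t n)
{
    if (n > sizeof b->data - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* The zero-terminated equivalent: strlen must walk the whole text to
   find the end first. dst must have room for the result plus '\0'. */
void sz_append(char *dst, const char *src)
{
    size_t end = strlen(dst);               /* char-by-char scan */
    memcpy(dst + end, src, strlen(src) + 1);
}
```

Starting from an empty counted_buf, appending "str" and then "ing" leaves len at 6 with no terminator stored, and an append that exceeds the remaining capacity is refused with -1.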

Nowadays, the letters between double quotes like "string" should be treated as constants rather than as plain modifiable char arrays: in C++ a string literal is an array of const char (usually seen through a const char *), and in C modifying one is undefined behavior. That means that what the pointer points to should not be modified (if you want to modify it you must first copy it), and that is a good thing because const helps to detect many programming errors at compile time.

Answered by Prasoon Saurav

C++ std::strings are not NUL terminated.

P.S.: NULL is a macro [1]. NUL is \0. Don't mix them up.

[1]: C.2.2.3 Macro NULL

The macro NULL, defined in any of <clocale>, <cstddef>, <cstdio>, <cstdlib>, <cstring>, <ctime>, or <cwchar>, is an implementation-defined C++ null pointer constant in this International Standard (18.1).

Answered by Seth

The terminating null is there to terminate the string. Without it, you need some other method to determine its length.

You can use a predefined length:

char s[6] = {'s','t','r','i','n','g'};

You can emulate pascal-style strings:

unsigned char s[7] = {6, 's','t','r','i','n','g'};

You could use std::string (in C++), though you've said you're not interested in std::string.

Preferably you would use some pre-existing technology that handles unicode, or at least understands string encoding (e.g., wchar.h).

And a comment: If you're putting this in a program intended to run on an actual computer, you might consider typedef-ing your own "string". This will encourage your compiler to barf if you ever accidentally try to pass it to a function expecting a C-style string.

typedef struct {
    char characters[10];
} ThisIsNotACString;
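
Combining the pascal-style layout with the typedef idea might look like the sketch below. The pascal_str type and both helper names are hypothetical; wrapping the bytes in a struct means a pascal_str cannot be handed to a function expecting a char * by accident.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical Pascal-style string: byte 0 holds the length, the text
   follows, and there is no terminator. One length byte limits the
   text to 255 characters. */
typedef struct {
    unsigned char bytes[256];
} pascal_str;

/* Builds a pascal_str from len bytes of text. Returns 0 on success,
   -1 if len does not fit in the single length byte. */
int pascal_set(pascal_str *p, const char *text, size_t len)
{
    if (len > 255)
        return -1;
    p->bytes[0] = (unsigned char)len;
    memcpy(p->bytes + 1, text, len);
    return 0;
}

/* Reading the length is a single array access, not a scan. */
size_t pascal_length(const pascal_str *p)
{
    return p->bytes[0];
}
```

After pascal_set(&p, "string", 6), pascal_length(&p) is 6 and the characters sit in p.bytes[1] through p.bytes[6].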

Answered by Chubsdad

Just for the sake of completeness, and to nail this down completely:

vector<char>

Answered by codaddict

In C++ you can use the string class and not deal with the null char at all.

Answered by JoshD

Use std::string.

There are dozens of other ways to store strings, but using a library is often better than making your own. I'm sure we could all come up with plenty of wacky ways of doing strings without null terminators :).

Answered by shuttle87

In C there generally won't be an easier solution. You could possibly do what pascal did and put the length of the string in the first character, but this is a bit of a pain and will limit your string length to the largest integer that fits in the first char. In C++ I'd definitely use the std::string class, which can be accessed with

#include <string>

Being a commonly used library this will almost certainly be more reliable than rolling your own string class.

Answered by Alexander Rafferty

The reason for the null termination is so that the handler of the string can determine its length. If you don't use a null terminator, you need to pass the string's length, either through a separate parameter/variable or as part of the string. Otherwise, you could use another delimiter, so long as it isn't used within the string itself.
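
Two of those options can be sketched in C. Both function names are made up for the example: count_char takes the length as a separate parameter, and length_until treats a caller-chosen delimiter as the end marker while still respecting an upper bound.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the caller passes the length separately, so
   buf does not need a terminator. Counts how often c occurs. */
size_t count_char(const char *buf, size_t len, char c)
{
    size_t hits = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == c)
            hits++;
    return hits;
}

/* Hypothetical helper for the "another delimiter" variant: returns
   the number of bytes before the first delim, or cap if delim does
   not appear within the first cap bytes. */
size_t length_until(const char *buf, size_t cap, char delim)
{
    const char *end = memchr(buf, delim, cap);
    return end ? (size_t)(end - buf) : cap;
}
```

With a 7-byte array holding the characters of "string" followed by ';', length_until(d, sizeof d, ';') gives 6. The delimiter approach only works if ';' can never occur inside the text itself.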

To be honest, I don't quite understand your question, or if it actually is a question.

Answered by Bryan

Even the string class will store it with a null. If for some reason you absolutely do not want a null character at the end of your string in memory, you'd have to manually create a block of characters, and fill it out yourself.

I can't personally think of any realistic scenario for why you'd want to do this, since the null character is what signals the end of the string. If you're storing the length of the string too, then I guess you've saved one byte at the cost of whatever the size of your variable is (likely 4 bytes), and gained faster access to the length of said string.
