C++: What is a null-terminated string?

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2037209/

Date: 2020-08-27 21:54:43  Source: igfitidea

What is a null-terminated string?

Tags: c++, null-terminated

Asked by lhj7362

How does it differ from std::string?

Answered by Ricket

A "string" is really just an array of chars; a null-terminated string is one where a null character '\0' marks the end of the string (not necessarily the end of the array). All string literals in code (delimited by double quotes "") are automatically null-terminated by the compiler.

So, for example, "hi" is the same as {'h', 'i', '\0'}.
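The following is a small sketch that checks this claim using only sizeof and std::memcmp. The names from_literal, from_braces and same_contents are hypothetical. It shows that the literal "hi" occupies three chars, and that an array initialised from it holds the same bytes as one initialised from {'h', 'i', '\0'}.

```cpp
#include <cassert>
#include <cstring>

// The literal "hi" is stored as three chars: 'h', 'i' and the terminator '\0'.
static_assert(sizeof("hi") == 3, "two characters plus the null terminator");

// Both arrays below have the same size and the same contents.
const char from_literal[] = "hi";
const char from_braces[]  = {'h', 'i', '\0'};
static_assert(sizeof(from_literal) == sizeof(from_braces), "same array size");

// Returns true if the two arrays hold identical bytes, terminator included.
bool same_contents() {
    return std::memcmp(from_literal, from_braces, sizeof(from_literal)) == 0;
}
```

Note that std::strlen(from_literal) is 2: the terminator marks the end but is not counted as part of the string.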

Answered by Ricket

A null-terminated string is a contiguous sequence of characters, the last one of which has the binary bit pattern all zeros. I'm not sure what you mean by a "usual string", but if you mean std::string, then before C++11 a std::string was not required to be contiguous or to have a terminator. Since C++11 its storage must be contiguous, and c_str() must return a terminated array without copying. Also, a std::string's character data is always allocated and managed by the std::string object that contains it. A null-terminated string has no such container, and you typically refer to and manage such strings through bare pointers.
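The following sketch makes the difference concrete; the helper names are hypothetical. A std::string stores its own length, so it can contain a '\0' in the middle. Code that only sees a const char* has no length to go on, so it stops at the first '\0'.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// A std::string knows its own length, so it can hold an embedded '\0'.
// The (const char*, size) constructor copies exactly 5 chars here.
std::string with_embedded_null() {
    return std::string("ab\0cd", 5);
}

// A bare pointer to a null-terminated string carries no length: std::strlen
// must scan for the first '\0', which in the string above is at index 2.
std::size_t c_length_of(const std::string& s) {
    return std::strlen(s.c_str());
}
```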

All of this should really be covered in any decent C++ textbook. I recommend getting hold of Accelerated C++, one of the best of them.

Answered by Steve Jessop

There are two main ways to represent a string:

1) A sequence of characters with an ASCII null (nul) character, 0, at the end. You can tell how long it is by searching for the terminator. This is called a null-terminated string, or sometimes nul-terminated.

2) A sequence of characters, plus a separate field (either an integer length, or a pointer to the end of the string), to tell you how long it is.
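The following sketch shows the two representations side by side. The names terminated_length, CountedString and make_counted are hypothetical. Their standard-library counterparts are std::strlen and std::string_view (C++17), which stores a pointer plus a length.

```cpp
#include <cassert>
#include <cstddef>

// Representation 1: null-terminated. The length is not stored anywhere;
// it has to be found by scanning for the terminator, which is O(n).
std::size_t terminated_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// Representation 2: characters plus a separate length field.
// A pointer to one-past-the-end would work equally well.
struct CountedString {
    const char* data;    // need not be terminated
    std::size_t length;  // read in O(1)
};

// Builds a counted view of the first len characters of a buffer.
CountedString make_counted(const char* data, std::size_t len) {
    return CountedString{data, len};
}
```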

I'm not sure about "usual string", but what quite often happens is that when talking about a particular language, the word "string" is used to mean the standard representation for that language. So in Java, java.lang.String is a type 2 string, so that's what "string" means. In C, "string" probably means a type 1 string. The standard is quite verbose in order to be precise, but people always want to leave out what's "obvious".

In C++, unfortunately, both types are standard. std::string is a type 2 string[*], but standard library functions inherited from C operate on type 1 strings.
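In practice you convert between the two kinds at API boundaries. The following minimal sketch uses hypothetical helper names. It uses c_str() to hand a std::string to a function inherited from C, and the const char* constructor to go the other way.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// std::string -> type 1 string: c_str() gives a terminated array that
// C-inherited functions such as std::strncmp and std::strlen accept.
bool starts_with_c(const std::string& s, const char* prefix) {
    return std::strncmp(s.c_str(), prefix, std::strlen(prefix)) == 0;
}

// Type 1 string -> std::string: the constructor scans for '\0' to find
// the length, then stores that length separately (type 2).
std::string from_c(const char* cstr) {
    return std::string(cstr);
}
```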

[*] Actually, std::string is often implemented as an array of characters with a separate length field and a nul terminator. That way the c_str() function can be implemented without ever needing to copy or re-allocate the string data. I can't remember off-hand whether it's legal to implement std::string without storing a length field: the question is what complexity guarantees the standard requires. For containers in general, size() is recommended to be O(1) in C++03, but isn't actually required to be. So even if it is legal, an implementation of std::string that just uses nul terminators would be surprising. (Since C++11 this is settled: size() must be constant-time, and c_str() must return a pointer into the string's own contiguous storage.)
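The following short check illustrates the C++11 guarantee mentioned in the footnote; c_str_is_in_place is a hypothetical name. c_str() and data() point at the string's own buffer, and the element at index size() is always '\0'. No copy is therefore needed to produce a C string.

```cpp
#include <cassert>
#include <string>

// Since C++11, c_str(), data() and &s[0] all refer to the same contiguous
// buffer, and that buffer always has a '\0' at index size().
bool c_str_is_in_place(const std::string& s) {
    return s.c_str() == s.data()
        && &s[0] == s.data()
        && s.c_str()[s.size()] == '\0';
}
```

This holds even for an empty string and for strings with embedded '\0' characters.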

Answered by 4pie0

'\0' is an ASCII character with code 0, also called the null terminator, null character or NUL. In the C language it is a reserved character that signifies the end of a string. Many standard functions, such as strcpy, strlen and strcmp, rely on this. Without NUL, some other way of signalling the end of a string would have been needed:

This allows the string to be any length with only the overhead of one byte; the alternative of storing a count requires either a string length limit of 255 or an overhead of more than one byte.

(from Wikipedia)

C++ std::string follows this other convention (an explicit length), and its data is represented by a structure called _Rep:

```cpp
// _Rep: string representation
//   Invariants:
//   1. String really contains _M_length + 1 characters: due to 21.3.4
//      must be kept null-terminated.
//   2. _M_capacity >= _M_length
//      Allocated memory is always (_M_capacity + 1) * sizeof(_CharT).
//   3. _M_refcount has three states:
//      -1: leaked, one reference, no ref-copies allowed, non-const.
//       0: one reference, non-const.
//     n>0: n + 1 references, operations require a lock, const.
//   4. All fields==0 is an empty string, given the extra storage
//      beyond-the-end for a null terminator; thus, the shared
//      empty string representation needs no constructor.

struct _Rep_base
{
  size_type     _M_length;
  size_type     _M_capacity;
  _Atomic_word  _M_refcount;
};

struct _Rep : _Rep_base
{
  // Types:
  typedef typename _Alloc::template rebind<char>::other _Raw_bytes_alloc;

  // (Public) Data members:

  // The maximum number of individual char_type elements of an
  // individual string is determined by _S_max_size. This is the
  // value that will be returned by max_size().  (Whereas npos
  // is the maximum number of bytes the allocator can allocate.)
  // If one was to divvy up the theoretical largest size string,
  // with a terminating character and m _CharT elements, it'd
  // look like this:
  // npos = sizeof(_Rep) + (m * sizeof(_CharT)) + sizeof(_CharT)
  // Solving for m:
  // m = ((npos - sizeof(_Rep))/sizeof(CharT)) - 1
  // In addition, this implementation quarters this amount.
  static const size_type  _S_max_size;
  static const _CharT     _S_terminal;

  // The following storage is init'd to 0 by the linker, resulting
  // (carefully) in an empty string with one reference.
  static size_type _S_empty_rep_storage[];

  static _Rep&
  _S_empty_rep()
  {
    // NB: Mild hack to avoid strict-aliasing warnings.  Note that
    // _S_empty_rep_storage is never modified and the punning should
    // be reasonably safe in this case.
    void* __p = reinterpret_cast<void*>(&_S_empty_rep_storage);
    return *reinterpret_cast<_Rep*>(__p);
  }

  bool
  _M_is_leaked() const
  { return this->_M_refcount < 0; }

  bool
  _M_is_shared() const
  { return this->_M_refcount > 0; }

  void
  _M_set_leaked()
  { this->_M_refcount = -1; }

  void
  _M_set_sharable()
  { this->_M_refcount = 0; }

  void
  _M_set_length_and_sharable(size_type __n)
  {
#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
    if (__builtin_expect(this != &_S_empty_rep(), false))
#endif
      {
        this->_M_set_sharable();  // One reference.
        this->_M_length = __n;
        traits_type::assign(this->_M_refdata()[__n], _S_terminal);
        // grrr. (per 21.3.4)
        // You cannot leave those LWG people alone for a second.
      }
  }

  _CharT*
  _M_refdata() throw()
  { return reinterpret_cast<_CharT*>(this + 1); }

  _CharT*
  _M_grab(const _Alloc& __alloc1, const _Alloc& __alloc2)
  {
    return (!_M_is_leaked() && __alloc1 == __alloc2)
            ? _M_refcopy() : _M_clone(__alloc1);
  }

  // Create & Destroy
  static _Rep*
  _S_create(size_type, size_type, const _Alloc&);

  void
  _M_dispose(const _Alloc& __a)
  {
#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
    if (__builtin_expect(this != &_S_empty_rep(), false))
#endif
      if (__gnu_cxx::__exchange_and_add_dispatch(&this->_M_refcount,
                                                 -1) <= 0)
        _M_destroy(__a);
  }  // XXX MT

  void
  _M_destroy(const _Alloc&) throw();

  _CharT*
  _M_refcopy() throw()
  {
#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
    if (__builtin_expect(this != &_S_empty_rep(), false))
#endif
      __gnu_cxx::__atomic_add_dispatch(&this->_M_refcount, 1);
    return _M_refdata();
  }  // XXX MT

  _CharT*
  _M_clone(const _Alloc&, size_type __res = 0);
};
```

The _Rep sits in memory immediately before the character data, and it can be obtained from that data with:

```cpp
_Rep* _M_rep() const
{ return &((reinterpret_cast<_Rep*>(_M_data()))[-1]); }
```

These snippets come from basic_string.h, which on my machine is located at /usr/include/c++/4.4/bits/basic_string.h. (This is the old reference-counted libstdc++ implementation. Since GCC 5, libstdc++ uses a different, non-reference-counted std::string by default.)

So as you can see, the difference is significant.
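The layout trick in _M_rep() places the header immediately before the characters; the character pointer then steps back over one header to find it. The following simplified sketch imitates that layout. Header, make_string, header_of and free_string are hypothetical names. The sketch also drops the reference count and allocator that libstdc++'s _Rep has.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical header stored directly in front of the characters,
// loosely modelled on the old libstdc++ _Rep (no refcount, no allocator).
struct Header {
    std::size_t length;
    std::size_t capacity;
};

// Allocates header + characters + terminator as one block and returns a
// pointer to the characters, which is also a valid C string.
char* make_string(const char* src) {
    std::size_t n = std::strlen(src);
    void* block = std::malloc(sizeof(Header) + n + 1);
    if (block == nullptr) {
        throw std::bad_alloc();
    }
    new (block) Header{n, n};
    char* data = static_cast<char*>(block) + sizeof(Header);
    std::memcpy(data, src, n + 1);  // copies the '\0' as well
    return data;
}

// Steps back over the header, like _M_rep() indexing [-1].
Header* header_of(char* data) {
    return static_cast<Header*>(static_cast<void*>(data - sizeof(Header)));
}

// Releases the whole block, header included.
void free_string(char* data) {
    std::free(header_of(data));
}
```

In this sketch, the O(1) length and the C-compatible character pointer come from the same allocation. That is roughly why c_str() could be cheap in the implementation quoted above.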

Answered by Dario

A null-terminated string means that the end of your string is defined through the occurrence of a null-char (all bits are zero).

"Other strings", for example, have to store their own length.

Answered by Seva Alekseyev

A null-terminated string is a native string format in C. String literals, for example, are implemented as null-terminated. As a result, a whole lot of code (C run-time library to begin with) assumes that strings are null-terminated.
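For example, the C library functions that produce strings are built around the terminator. The following small sketch uses std::snprintf, which always leaves its output null-terminated when the buffer size is nonzero; format_truncated is a hypothetical wrapper. std::snprintf writes at most size - 1 characters, then the '\0'. It returns the length the complete output would have had.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Writes text into buf, truncating if necessary; buf always ends up
// null-terminated (for size > 0). Returns the untruncated length.
int format_truncated(char* buf, std::size_t size, const char* text) {
    return std::snprintf(buf, size, "%s", text);
}
```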

Answered by phyrrus9

A null-terminated string (C string) is an array of chars whose last element is the value 0x0. std::string is essentially a vector, in that it is an auto-resizing container of characters. It does not need a null terminator to find its end, since it must keep track of its size anyway to know when a resize is needed. (Since C++11 it nevertheless keeps a terminator after the last character, so that c_str() needs no copy.)
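The following quick sketch shows the vector-like behaviour described above; grow is a hypothetical helper. size() is tracked separately from capacity(), and the string reallocates itself as it grows. As of C++11, a '\0' also follows the last character after every append.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends n characters one at a time. std::string reallocates when size()
// would exceed capacity(), much like std::vector; since C++11 it also keeps
// a '\0' after the last character, so c_str() stays valid throughout.
std::string grow(std::size_t n) {
    std::string s;
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
    }
    return s;
}
```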

Honestly, I prefer C strings over std::string: they simply have more uses in the basic libraries, the ones with minimal code and allocations, and they are harder to use because of that.