bash 为什么 $'\0' 或 $'\x0' 是空字符串?应该是空字符,不是吗?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/19227564/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 08:12:49  来源:igfitidea点击:

Why $'\0' or $'\x0' is an empty string? Should be the null-character, isn't it?

stringbashechoexpansiondollar-sign

提问by olibre

bashallows $'string'expansion. My man bashsays:

bash允许扩展。我的说:$'string'man bash

Words of the form $'string'are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard. Backslash escape sequences, if present, are decoded as follows:
\aalert (bell)
\bbackspace
\e
\Ean escape character
\fform feed
\nnew line
\rcarriage return
\thorizontal tab
\vvertical tab
\backslash
\'single quote
\"double quote
\nnnthe eight-bit character whose value is the octal value nnn(one to three digits)
\xHHthe eight-bit character whose value is the hexadecimal value HH(one or two hex digits)
\cxa control-xcharacter

The expanded result is single-quoted, as if the dollar sign had not been present.

表格中的词被特殊处理。该词扩展为,并按照 ANSI C 标准的规定替换反斜杠转义字符。反斜杠转义序列(如果存在)解码如下: alert (bell) backspace an escape character form feed 换 行 回车 水平制表符 垂直制表符 反斜杠 单引号 双引号 八位字符,其值为八进制值(一到三数字) 其值为十六进制值的八位字符$'string'string
\a
\b
\e
\E
\f
\n
\r
\t
\v
\
\'
\"
\nnnnnn
\xHHHH(一个或两个十六进制数字) 一个控制字符
\cxx

扩展结果是单引号的,就好像美元符号不存在一样。

But why does bashnot convert $'\0'and $'\x0'into a null character?
Is it documented? Is there a reason? (Is it a feature or a limitation or even a bug?)

为什么庆典不能转化$'\0'$'\x0'成空字符?
有记录吗?有原因吗?(它是功能还是限制甚至是错误?)

$ hexdump -c <<< _$'
> hexdump -c < <( echo -e '_\x0\x1\x2\x3_' )
0000000   _  
$ bash --version | head -n 1
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
001 002 003 _ \n 0000007
'$'\x1\x2\x3\x4_' 0000000 _ 001 002 003 004 _ \n 0000007

echogives the expected result:

echo给出了预期的结果:

$ # Prefer printf to echo -n
$ printf $'foo
$ echo $'hey##代码##you'
hey
bar' | wc -c 3 $ printf 'foo##代码##bar' | wc -c 7 $ # Bash extension which is better for strings which might contain % $ printf %b 'foo##代码##bar' | wc -c 7

My bash version

我的 bash 版本

##代码##

Why echo $'foo\0bar'does not behave as echo -e 'foo\0bar'?

为什么echo $'foo\0bar'不像echo -e 'foo\0bar'?

回答by rici

It's a limitation. bashdoes not allow string values to contain interior NUL bytes.

这是一个限制。bash不允许字符串值包含内部 NUL 字节。

Posix (and C) character strings cannot contain interior NULs. See, for example, the Posix definitionof character string (emphasis added):

Posix(和 C)字符串不能包含内部 NUL。参见,例如,字符串的Posix 定义(强调):

3.92 Character String

A contiguous sequence of characters terminated byand including the first null byte.

3.92 字符串

第一个空字节终止并包括第一个空字节的连续字符序列。

Similarly, standard C is reasonably explicit about the NUL character in character strings:

类似地,标准 C 对字符串中的 NUL 字符相当明确:

§5.2.1p2 …A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminatea character string.

§5.2.1p2 ...一个所有位都设置为0的字节,称为空字符,应存在于基本执行字符集中;它用于终止一个字符串。

Posix explicitly forbids the use of NUL (and /) in filenames (XBD 3.170) or in environment variables (XBD 8.1 "... are considered to end with a null byte."

Posix 明确禁止/在文件名 (XBD 3.170) 或环境变量 (XBD 8.1) 中使用 NUL(和),“...被认为以空字节结尾。”

In this context, shell command languages, including bash, tend to use the same definition of a character string, as a sequence of non-NUL characters terminated by a single NUL.

在这种情况下,包括 bash 在内的 shell 命令语言倾向于使用与字符串相同的定义,作为由单个 NUL 终止的非 NUL 字符序列。

You can pass NULs freely through bash pipes, of course, and nothing stops you from assigning a shell variable to the output of a program which outputs a NUL byte. However, the consequences are "unspecified" according to Posix (XSH 2.6.3 "If the output contains any null bytes, the behavior is unspecified."). In bash, the NULs are removed, unless you insert a NUL into a string using bash's C-escape syntax ($'\0'), in which case the NUL will end up terminating the value.

当然,您可以通过 bash 管道自由传递 NUL,并且没有什么能阻止您将 shell 变量分配给输出 NUL 字节的程序的输出。但是,根据 Posix(XSH 2.6.3“如果输出包含任何空字节,则行为未指定。”),结果是“未指定的”。在 bash 中,NUL 被删除,除非您使用 bash 的 C-escape 语法 ( $'\0')将 NUL 插入到字符串中,在这种情况下,NUL 将终止该值。

On a practical note, consider the difference between the two following ways of attempting to insert a NUL into the stdinof a utility:

实际上,请考虑以下两种尝试将 NUL 插入stdin实用程序的方法之间的区别:

##代码##

回答by devnull

But why does bash not convert $'\0'and $'\x0'into a null character?

但是,为什么不bash的转换$'\0'$'\x0'成空字符?

Because a null character terminates a string.

因为空字符会终止字符串。

##代码##

回答by cdarke

It is a null character, but it depends on what you mean by that.

它是一个空字符,但这取决于你的意思。

The null character represents an empty string, which is what you get when you expand it. It is a special case and I think that is implied by the documentation but not actually stated.

空字符代表一个空字符串,这是你展开它时得到的。这是一种特殊情况,我认为文档中暗示了这一点,但实际上并未说明。

In C binary zero '\0'terminates a string and on its own also represents an empty string. Bash is written in C, so it probably follows from that.

在 C 中,二进制零'\0'终止一个字符串,它本身也代表一个空字符串。Bash 是用 C 编写的,所以它可能是从这里得出的。

Edit: POSIX mentions a null string in a number of places. In the "Base definitions" it defines a null string as:

编辑:POSIX 在许多地方提到了一个空字符串。在“基本定义”中,它定义了一个空字符串:

3.146 Empty String (or Null String)
A string whose first byte is a null byte.

3.146 空字符串(或空字符串)
第一个字节是空字节的字符串。