Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/12916352/

Time: 2020-09-09 22:50:38  Source: igfitidea

Shell script read missing last line

Tags: bash, process

Asked by RHSeeger

I have an ... odd issue with a bash shell script that I was hoping to get some insight on.

My team is working on a script that iterates through lines in a file and checks for content in each one. We had a bug where, when run via the automated process that sequences different scripts together, the last line wasn't being seen.

The code used to iterate over the lines in the file (name stored in DATAFILE) was:

cat "$DATAFILE" | while read line 

We could run the script from the command line and it would see every line in the file, including the last one, just fine. However, when run by the automated process (which runs the script that generates the DATAFILE just prior to the script in question), the last line is never seen.

We updated the code to use the following to iterate over the lines, and the problem cleared up:

for line in `cat "$DATAFILE"` 

Note: DATAFILE has no newline ever written at the end of the file.

My question has two parts: why would the last line not be seen by the original code, and why would this change make a difference?

The only thought I could come up with as to why the last line would not be seen was:

  • The previous process, which writes the file, was relying on the process to end to close the file descriptor.
  • The problem script was starting up and opening the file fast enough that, while the previous process had "ended", it hadn't "shut down/cleaned up" enough for the system to close the file descriptor automatically for it.

That being said, it seems like, if you have 2 commands in a shell script, the first one should be completely shut down by the time the script runs the second one.

Any insight into the questions, especially the first one, would be very much appreciated.

Answered by Jonathan Leffler

The C standard says that text files must end with a newline or the data after the last newline may not be read properly.

ISO/IEC 9899:2011 §7.21.2 Streams

A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character. Whether space characters that are written out immediately before a new-line character appear when read in is implementation-defined.

I would not have expected a missing newline at the end of file to cause trouble in bash (or any Unix shell), but that does seem to be the problem, reproducibly ($ is the prompt in this output):

$ echo xxx\c
xxx$ { echo abc; echo def; echo ghi; echo xxx\c; } > y
$ cat y
abc
def
ghi
xxx$
$ while read line; do echo $line; done < y
abc
def
ghi
$ bash -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ ksh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ zsh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ for line in $(<y); do echo $line; done      # Preferred notation in bash
abc
def
ghi
xxx
$ for line in $(cat y); do echo $line; done   # UUOC Award pending
abc
def
ghi
xxx
$

It is also not limited to bash; Korn shell (ksh) and zsh behave like that too. I live, I learn; thanks for raising the issue.

As demonstrated in the code above, the cat command reads the whole file. The for line in `cat $DATAFILE` technique collects all the output and replaces arbitrary sequences of white space with a single blank (I conclude that each line in the file contains no blanks).
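
A minimal sketch of that splitting, assuming a made-up two-line sample file whose lines do contain blanks (the temporary file and the variable names are illustrative, not taken from the question):

```shell
# Illustrative sample: two lines, each containing a blank, no final newline.
tmp=$(mktemp)
printf 'one two\nthree four' > "$tmp"

# The unquoted command substitution is split on blanks, tabs and newlines
# (the default IFS), so the loop iterates over words rather than lines.
count=0
for word in $(cat "$tmp"); do
  count=$((count + 1))
  echo "item $count: $word"
done
rm -f "$tmp"
```

With blanks in the data the loop body runs once per word, not once per line, so the for form is only safe for single-word lines like the ones assumed above.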

Tested on Mac OS X 10.7.5.



What does POSIX say?

The POSIX read command specification says:

The read utility shall read a single line from standard input.

By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields. All other unescaped <backslash> characters shall be removed after splitting the input into fields.

If standard input is a terminal device and the invoking shell is interactive, read shall prompt for a continuation line when it reads an input line ending with a <backslash> <newline>, unless the -r option is specified.

The terminating <newline> (if any) shall be removed from the input and the results shall be split into fields as in the shell for the results of parameter expansion (see Field Splitting); [...]

Note that '(if any)' (emphasis added in quote)! It seems to me that if there is no newline, it should still read the result. On the other hand, it also says:

STDIN

The standard input shall be a text file.

and then you get back to the debate about whether a file that does not end with a newline is a text file or not.

However, the rationale on the same page documents:

Although the standard input is required to be a text file, and therefore will always end with a <newline> (unless it is an empty file), the processing of continuation lines when the -r option is not used can result in the input not ending with a <newline>. This occurs if the last line of the input file ends with a <backslash> <newline>. It is for this reason that "if any" is used in "The terminating <newline> (if any) shall be removed from the input" in the description. It is not a relaxation of the requirement for standard input to be a text file.

That rationale must mean that the text file is supposed to end with a newline.

The POSIX definition of a text file is:

3.395 Text File

A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

This does not stipulate 'ends with a <newline>' directly, but does defer to the C standard.



A solution to the 'no terminal newline' problem

Note Gordon Davisson's answer. A simple test shows that his observation is accurate:

$ while read line; do echo $line; done < y; echo $line
abc
def
ghi
xxx
$

Therefore, his technique of:

while read line || [ -n "$line" ]; do echo $line; done < y

or:

cat y | while read line || [ -n "$line" ]; do echo $line; done

will work for files without a newline at the end (at least on my machine).



I'm still surprised to find that the shells drop the last segment (it can't be called a line because it doesn't end with a newline) of the input, but there might be sufficient justification in POSIX to do so. And clearly it is best to ensure that your text files really are text files ending with a newline.

Answered by Gordon Davisson

According to the POSIX spec for the read command, it should return a nonzero status if "End-of-file was detected or an error occurred." Since EOF is detected as it reads the last "line", it sets $line and then returns an error status, and the error status prevents the loop from executing on that last "line". The solution is easy: make the loop execute if the read command succeeds OR if anything was read into $line.

while read line || [ -n "$line" ]; do
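
To see the behaviour described above in isolation, here is a small sketch that reads a two-line sample file with two separate read calls and records each exit status. The temporary file and the variable names (first, second, status1, status2) are made up for this illustration:

```shell
tmp=$(mktemp)
printf 'abc\nxxx' > "$tmp"      # "xxx" has no terminating newline

# Both reads share the same redirected stdin, so the second one
# continues where the first one stopped.
{
  read -r first;  status1=$?
  read -r second; status2=$?
} < "$tmp"

echo "first='$first' status=$status1"
echo "second='$second' status=$status2"
rm -f "$tmp"
```

The second read still fills its variable with xxx, but reports a nonzero status because it hit end-of-file before a newline; that is exactly the case the || [ -n "$line" ] test catches.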

Answered by Jahid

Adding some additional info:

  1. There's no need to use cat with a while loop. while ...;do something;done<file is enough.
  2. Don't read lines with for.

When using while loop to read lines:

  1. Set the IFS properly (you may lose indentation otherwise).
  2. You should almost always use the -r option with read.

Meeting the above requirements, a proper while loop will look like this:

while IFS= read -r line; do
  ...
done <file
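
A quick sketch of what IFS= and -r each protect against, using a made-up one-line sample with leading spaces and a backslash (the file and the variable names plain and exact are illustrative only):

```shell
tmp=$(mktemp)
printf '%s\n' '  C:\temp' > "$tmp"   # two leading spaces, one backslash

read plain < "$tmp"            # default IFS trims the indentation;
                               # without -r the backslash is consumed
IFS= read -r exact < "$tmp"    # keeps the line byte for byte

printf 'plain: [%s]\n' "$plain"
printf 'exact: [%s]\n' "$exact"
rm -f "$tmp"
```

Here plain ends up as C:temp, while exact keeps both the indentation and the backslash.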

And to make it work with files without a newline at end (reposting my solution from here):

while IFS= read -r line || [ -n "$line" ]; do
  echo "$line"
done <file

Or using grep with a while loop:

while IFS= read -r line; do
  echo "$line"
done < <(grep "" file)
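
A rough way to see why the grep variant helps, assuming a grep that terminates the final line of its output with a newline even when the input lacks one. As far as I know GNU and BSD grep do this, but POSIX only specifies grep for text files, so treat it as an assumption. The temporary file and the names raw and fixed are made up for the sketch:

```shell
tmp=$(mktemp)
printf 'abc\ndef\nxxx' > "$tmp"           # last line is unterminated

raw=$(( $(wc -l < "$tmp") ))               # newline characters in the file
fixed=$(( $(grep "" "$tmp" | wc -l) ))     # newline characters after grep ""

echo "newlines in file: $raw, after grep: $fixed"
rm -f "$tmp"
```

If grep behaves as assumed, the second count is one higher than the first, which means the plain while read loop now sees a properly terminated last line.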

Answered by Joel Bruner

Use sed to match the last line of the file and append a newline if one does not already exist, editing the file in place:

sed -i '' -e '$a\' file

The code is from this stackexchange link

Note: I have added empty single quotes to -i '' because, at least in OS X, -i was using -e as a file extension for the backup file. I would have gladly commented on the original post but lacked 50 points. Perhaps this will gain me a few in this thread, thanks.

Answered by Gulesbaron

I had a similar issue. I was doing a cat of a file, piping it to a sort and then piping the result to a 'while read var1 var2 var3', i.e.:

cat $FILE|sort -k3|while read Count IP Name do

The work under the "do" was an if statement that identified changing data in the $Name field and, based on change or no change, did sums of $Count or printed the summed line to the report. I also ran into the issue where I couldn't get the last line to print to the report. I went with the simple expedient of redirecting the cat/sort to a new file, echoing a newline to that new file and THEN ran my "while read Count IP Name" on the new file with successful results, i.e.:

cat $FILE|sort -k3 > NEWFILE
echo "\n" >> NEWFILE
cat NEWFILE |while read Count IP Name do

Sometimes the simple, inelegant way is the best way to go.

Answered by ArunGJ

As a workaround, before reading from the text file a newline can be appended to the file.

echo >> "$file_path"

This will ensure that all the lines that were previously in the file will be read.
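
Appending unconditionally adds an extra empty line when the file already ends with a newline. A hedged sketch of a conditional variant follows; ensure_trailing_newline is a hypothetical helper name, not a standard utility, and the sample file is made up:

```shell
# Hypothetical helper: append a newline only when the last byte is not one.
ensure_trailing_newline() {
  # $(...) strips a trailing newline, so a non-empty result means the
  # last byte of the file is something other than a newline.
  if [ -n "$(tail -c 1 "$1")" ]; then
    printf '\n' >> "$1"
  fi
}

tmp=$(mktemp)
printf 'abc\nxxx' > "$tmp"          # no final newline
ensure_trailing_newline "$tmp"      # appends one newline
ensure_trailing_newline "$tmp"      # second call changes nothing
lines=$(( $(wc -l < "$tmp") ))
echo "newline count: $lines"
rm -f "$tmp"
```

Because the helper checks first, it can be run before every read without growing the file each time.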

Answered by doubleDown

I tested this on the command line:

# create dummy file. last line doesn't end with newline
printf "%i\n%i\nNo-newline-here" >testing

Test with your first form (piping to while-loop)

cat testing | while read line; do echo $line; done

This misses the last line, which makes sense since read only gets input that ends with a newline.



Test with your second form (command substitution)

for line in `cat testing` ; do echo $line; done

This gets the last line as well



read only gets input when it is terminated by a newline, which is why you miss the last line.

On the other hand, in the second form

`cat testing` 

expands to the form of

line1\nline2\n...lineM 

which is separated by the shell into multiple fields using IFS, so you get

line1 line2 line3 ... lineM 

That's why you still get the last line.

p/s: What I don't understand is how you get the first form working...
