bash: How does "(head; tail) < file" work?

Question

Asked by zellyn

(via https://stackoverflow.com/a/8624829/23582)

How does (head; tail) < file work? Note that cat file | (head; tail) doesn't.

Also, why does (head; wc -l) < file give 0 for the output of wc?

Note: I understand how head and tail work. Just not the subtleties involved with these particular invocations.

Answer 1

Accepted answer by rob mayoff

OS X

For OS X, you can look at the source code for head and the source code for tail to figure out some of what's going on. In the case of tail, you'll want to look at forward.c.

So, it turns out that head doesn't do anything special. It just reads its input using the stdio library, so it reads a buffer at a time and might read too much. This means cat file | (head; tail) won't work for small files where head's buffering makes it read some (or all) of the last 10 lines.
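
Over-reading matters because every command in the group inherits the same open file, and with it one shared read offset. Here is a minimal sketch of that effect, with cat standing in for a reader that consumes all of its input (the temporary file and the variable name left are illustrative and not part of the original answer):

```shell
#!/bin/sh
# Illustrative only: cat stands in for a reader that consumes all input.
tmp=$(mktemp)
seq 1 12 > "$tmp"

# Both commands share one file offset. cat reads to end-of-file,
# so wc starts at end-of-file and has nothing left to count.
left=$( { cat > /dev/null; wc -l; } < "$tmp" | tr -d ' ' )
echo "lines left for wc: $left"

rm -f "$tmp"
```

This is also one way (head; wc -l) < file can end up printing 0: if the file fits inside head's first read and head does not seek back, wc starts at end-of-file.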

On the other hand, tail checks the type of its input file. If it's a regular file, tail seeks to the end and reads backwards until it finds enough lines to emit. This is why (head; tail) < file works on any regular file, regardless of size.
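
Whether a given tail looks backwards past the current read offset can be probed directly. The sketch below is only a probe under assumptions: it pushes the shared offset to end-of-file with cat and then checks whether tail still finds any lines. The file and the variable names out and verdict are illustrative, and the result depends on the tail implementation installed.

```shell
#!/bin/sh
# Probe: does tail look backwards past the current read offset?
tmp=$(mktemp)
seq 1 12 > "$tmp"

# cat moves the shared offset to end-of-file before tail runs.
out=$( { cat > /dev/null; tail -n 3; } < "$tmp" )

if [ -n "$out" ]; then
    verdict="tail read backwards from the end of the file"
else
    verdict="tail started at the current offset and found nothing"
fi
echo "$verdict"

rm -f "$tmp"
```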

Linux

You could look at the source for head and tail on Linux too, but it's easier to just use strace, like this:

(strace -o /tmp/head.trace head; strace -o /tmp/tail.trace tail) < file

Take a look at /tmp/head.trace. You'll see that the head command tries to fill a buffer (of 8192 bytes in my test) by reading from standard input (file descriptor 0). Depending on the size of file, it may or may not fill the buffer. Anyway, let's assume that it reads 10 lines in that first read. Then, it uses lseek to back up the file descriptor to the end of the 10th line, essentially "unreading" any extra bytes it read. This works because the file descriptor is open on a normal, seekable file. So (head; tail) < file will work for any seekable file, but it won't make cat file | (head; tail) work.
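
The "unreading" can be observed without strace by counting what is left on the descriptor after head exits. This is a rough probe, not a definitive test: the file, the line counts, and the variable name rest are illustrative, and the number printed depends on whether the local head repositions the offset.

```shell
#!/bin/sh
# Probe: how many lines are left on the descriptor after head -n 2?
tmp=$(mktemp)
seq 1 12 > "$tmp"

rest=$( { head -n 2 > /dev/null; wc -l; } < "$tmp" | tr -d ' ' )
echo "lines left after head -n 2: $rest"
# 10 means head moved the offset back to just after line 2;
# 0 means head consumed the whole (small) file and left the offset at the end.

rm -f "$tmp"
```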

On the other hand, tail does not (in my testing) seek to the end and read backwards, like it does on OS X. At least, it doesn't read all the way back to the beginning of the file.

Here's my test. Create a small, 12-line input file:

yes | head -12 | cat -n > /tmp/file

Then, try (head; tail) < /tmp/file on Linux. I get this with GNU coreutils 5.97:

But on OS X, I get this:

Answer 2

Answer by Samus_

The parentheses here create a subshell, which is another instance of the interpreter that runs the commands inside. What is interesting is that the subshell acts as a single stdin/stdout combo. In this case it first connects stdin to head, which echoes the first 10 lines and closes the pipe; then the subshell connects its stdin to tail, which consumes the rest and writes the last 10 lines to stdout. The subshell takes both outputs and writes them as its own stdout, which is why they appear combined.

It's worth mentioning that the same effect can be achieved with command grouping, like { head; tail; } < file, which is cheaper because it doesn't create another instance of bash.
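
The difference between the two forms is easiest to see with a variable assignment rather than with head and tail. This is a small sketch, and the variable names are illustrative. With a redirection, both forms give the inner commands the same standard input; what differs is whether they run in a child shell.

```shell
#!/bin/sh
x=original

( x=changed_in_subshell )   # runs in a child shell; the change is lost
after_subshell=$x

{ x=changed_in_group; }     # runs in the current shell; the change persists
after_group=$x

echo "after ( ): $after_subshell"   # prints: after ( ): original
echo "after { }: $after_group"      # prints: after { }: changed_in_group
```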

Answer 3

Answer by ashirley

All of these should work as expected if the file is sufficiently large. The head command will consume a certain amount of the input (not just what it needs, because it buffers its input), and if that doesn't leave enough input for the tail command, it won't work.

Another concern is that the pipe results in both sides executing in parallel and so the producing side might cause the consuming side's head command to read a different amount every time it is run.

Compare multiple runs of the following command:

for i in $(seq 1 10); do echo "foo"; done | (head -n1; wc -l)

The wc command should see a different amount of the file every time.
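
To make the comparison easier, the runs can be collected in a loop. This is a sketch only: the run count of 5 and the variable names results and n are arbitrary choices, and the numbers it prints are timing-dependent, so they may or may not vary on a given machine.

```shell
#!/bin/sh
# Run the pipeline several times and record what wc reports each time.
results=""
for run in 1 2 3 4 5; do
    n=$(for i in $(seq 1 10); do echo "foo"; done | (head -n 1 > /dev/null; wc -l) | tr -d ' ')
    results="$results $n"
done
echo "lines seen by wc per run:$results"
```

Each value is the number of lines head left unread in the pipe, so it can range from 0 to 9.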

When using < to provide input, this parallelism doesn't seem to exist (presumably bash reads the whole input and then passes it to the head command).

Answer 4

Answer by prabu

The head command displays the first 10 (default) lines of a file, and the tail command displays the last 10 (default) lines. If the file has only 3 lines, that's no problem: the commands will display those lines. If the file has more than 10 lines, both commands will display only the default 10 lines. The default number of lines can be changed with the -n, n, +n options (refer to the man page).
