Remove first N lines of a file in place in unix command line
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/17330188/
Asked by Mittenchops
I'm trying to remove the first 37 lines from a very, very large file. I started trying sed and awk, but they seem to require copying the data to a new file. I'm looking for a "remove lines in place" method that, unlike sed -i, does not make copies of any kind but just removes lines from the existing file.
Here's what I've done...
awk 'NR > 37' file.xml > 'f2.xml'
sed -i '1,37d' file.xml
Both of these seem to do a full copy. Is there any other simple CLI that can do this quickly without a full document traversal?
Accepted answer by Ed Morton
There's no simple way to do inplace editing using UNIX utilities, but here's one inplace file modification solution that you might be able to modify to work for you (courtesy of Robert Bonomi at https://groups.google.com/forum/#!topic/comp.unix.shell/5PRRZIP0v64):
bytes=$(head -37 "$file" |wc -c)
dd if="$file" bs="$bytes" skip=1 conv=notrunc of="$file"
The final file should be $bytes bytes smaller than the original (since the goal was to remove $bytes bytes from the beginning), so to finish we must remove the final $bytes bytes. We're using conv=notrunc above to make sure that the file doesn't get completely emptied (see below for an example of what happens without it). On a GNU system such as Linux, the truncation afterwards can be accomplished with:
truncate -s "-$bytes" "$file"
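Putting the two commands together for the original question (the first 37 lines of file.xml), a minimal sketch assuming GNU coreutils for truncate:

file=file.xml
bytes=$(head -37 "$file" | wc -c)                          # size of the first 37 lines in bytes
dd if="$file" bs="$bytes" skip=1 conv=notrunc of="$file"   # shift the rest of the file to the front
truncate -s "-$bytes" "$file"                              # drop the duplicated tail

The worked example below uses a small 12-line file so each step's effect is visible.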
For example, to delete the first 5 lines from this 12-line file:
$ wc -l file
12 file
$ cat file
When chapman billies leave the street,
And drouthy neibors, neibors, meet;
As market days are wearing late,
And folk begin to tak the gate,
While we sit bousing at the nappy,
An' getting fou and unco happy,
We think na on the lang Scots miles,
The mosses, waters, slaps and stiles,
That lie between us and our hame,
Where sits our sulky, sullen dame,
Gathering her brows like gathering storm,
Nursing her wrath to keep it warm.
First use dd to remove the target 5 lines (really "$bytes" bytes) from the start of the file and copy the rest from the end to the front, but leave the trailing "$bytes" bytes as-is:
$ bytes=$(head -5 file |wc -c)
$ dd if=file bs="$bytes" skip=1 conv=notrunc of=file
1+1 records in
1+1 records out
253 bytes copied, 0.0038458 s, 65.8 kB/s
$ wc -l file
12 file
$ cat file
An' getting fou and unco happy,
We think na on the lang Scots miles,
The mosses, waters, slaps and stiles,
That lie between us and our hame,
Where sits our sulky, sullen dame,
Gathering her brows like gathering storm,
Nursing her wrath to keep it warm.
s, waters, slaps and stiles,
That lie between us and our hame,
Where sits our sulky, sullen dame,
Gathering her brows like gathering storm,
Nursing her wrath to keep it warm.
and then use truncate to remove those leftover bytes from the end:
$ truncate -s "-$bytes" "file"
$ wc -l file
7 file
$ cat file
An' getting fou and unco happy,
We think na on the lang Scots miles,
The mosses, waters, slaps and stiles,
That lie between us and our hame,
Where sits our sulky, sullen dame,
Gathering her brows like gathering storm,
Nursing her wrath to keep it warm.
If we had tried the above dd without conv=notrunc (without it, dd truncates the output file when it opens it, and since the input and output are the same file there is nothing left to copy):
$ wc -l file
12 file
$ bytes=$(head -5 file |wc -c)
$ dd if=file bs="$bytes" skip=1 of=file
dd: file: cannot skip to specified offset
0+0 records in
0+0 records out
0 bytes copied, 0.0042254 s, 0.0 kB/s
$ wc -l file
0 file
See the google groups thread I referenced for other suggestions and info.
Answered by that other guy
Unix file semantics do not allow truncating the front part of a file.
All solutions will be based on either:
- Reading the file into memory and then writing it back (ed, ex, other editors). This should be fine if your file is <1GB or if you have plenty of RAM; see the sketch after this list.
- Writing a second copy and optionally replacing the original (sed -i, awk/tail > foo). This is fine as long as you have enough free disk space for a copy, and don't mind the wait.
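A minimal sketch of the first option (the one referenced in the list above): ed loads the whole file into memory and then writes it back over the original path, so it is only practical when the file fits in RAM.

printf '1,37d\nw\nq\n' | ed -s file.xml   # delete lines 1-37, write the buffer back, quit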
If the file is too large for any of these to work for you, you may be able to work around it depending on what's reading your file.
Perhaps your reader skips comments or blank lines? If so, you can then craft a message the reader ignores, make sure it has the same number of bytes as the first 37 lines of your file, and overwrite the start of the file with dd if=yourdata of=file conv=notrunc.
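A sketch of that idea for an XML file whose consumer ignores comments; the comment-skipping consumer and the filler.xml scratch file are assumptions for illustration, not part of the answer:

bytes=$(head -37 file.xml | wc -c)                 # size of the 37 lines to hide
# build a comment of exactly $bytes bytes: "<!--" (4) + padding + "-->" (3) + newline (1)
printf '<!--%*s-->\n' "$((bytes - 8))" '' > filler.xml
dd if=filler.xml of=file.xml conv=notrunc          # overwrite the start, leave the rest untouched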
Answered by Peteris
The copy will have to be created at some point - why not at the time of reading the "modified" file; streaming the altered copy instead of storing it?
What I'm thinking - create a named pipe "file2" that is the output of that same awk 'NR > 37' file.xml or whatever; then whoever reads file2 will not see the first 37 lines.
The drawback is that it will run awk each time the file is processed, so it's feasible only if it's read rarely.
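A rough sketch of the named-pipe idea; wc -l stands in for whatever actually consumes the file:

mkfifo file2.xml                      # named pipe; no second copy is stored on disk
awk 'NR > 37' file.xml > file2.xml &  # blocks until a reader opens the pipe
wc -l file2.xml                       # stand-in reader; substitute the real consumer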