bash: awk, tail, sed, or others - which one is faster for big files?

Question

Asked by onur

I have scripts that process big log files. I can read every line and act on it with either tail or awk.

Tail:

tail -n +$startline $LOG

Awk:

awk 'NR>='"$startline"' {print}' $LOG

Timing both, tail took 6 minutes 39 seconds and awk took 6 minutes 42 seconds, so the two commands do the same thing in about the same time.

I don't know how to do this with sed. Can sed be faster than tail and awk? Or maybe some other command?

Second question: I use $startline so that each run continues from where the previous one stopped. I run the script like this:

10:00AM -> ./script -> $startline=1 and do something -> write line number to save file(for ex. 25),
10:05AM -> ./script -> $startline=26(read save file +1) and do something -> write line number save file(55),
10:10AM -> ./script -> $startline=56(read save file +1) and do something ....

But each time the script runs, it still reads through all the lines from the beginning and only starts doing something once it reaches $startline. Because the files are huge, this is a little slow.

Any suggestions to make it faster?

Script example:

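# Last saved line number from the previous run
lastline=$(tail -n 1 "line.save")
startline=$((lastline + 1))
# Process only the lines from $startline onward
tail -n +"$startline" "$LOG" | while read -r
do
....
done
# Record the current line count for the next run
linecount=$(wc -l "$LOG" | awk '{print $1}')
echo "$linecount" >> line.save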

Answer 1

Accepted answer by fedorqui 'SO stop harming'

tail and head are tools created specifically for this purpose, so the intuitive expectation is that they are well optimized for it. awk and sed, on the other hand, can do it perfectly well because they are like a Swiss Army knife, but this is not supposed to be their strongest "skill" among the many they have.

The question "Efficient way to print lines from a massive file using awk, sed, or something else?" has a nice comparison of methods, and head / tail comes out as the best approach.

Hence, I would go for tail + head.
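
As a rough illustration of the tail + head combination, the sketch below extracts a range of lines from a file. The sample file, the variable names (log, first, last, range), and the line numbers are assumptions made for the example, not part of the original scripts.

```shell
# Hypothetical sample file: 100 numbered lines stand in for a large log.
log=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100; i++) print i }' > "$log"

first=10   # first line wanted (assumed value)
last=20    # last line wanted (assumed value)

# tail skips ahead to line $first; head stops after (last - first + 1) lines,
# so the pipeline ends without printing the rest of the file.
range=$(tail -n +"$first" "$log" | head -n "$((last - first + 1))")

printf '%s\n' "$range"
rm -f "$log"
```

When only the tail end of the file is needed, as in the question, the head stage can simply be dropped.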

Note also that if you want not just the last lines but a set of lines within the text, awk (or sed) lets you exit after the last line you wanted. This way, the script does not keep reading the file until the last line.

So this:

awk '{if (NR>=10 && NR<20) print} NR==20 {print; exit}'

is faster than

awk 'NR>=10 && NR<=20'

if your input happens to contain more than 20 lines.
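
Since the question also asks about sed, here is a minimal sketch of the same early-exit idea in both tools. The sample file and the variable names (log, awk_out, sed_out) are assumptions for the example; the awk form is a slightly condensed variant of the command above, not the exact one.

```shell
# Hypothetical sample file: 1000 lines stand in for a large log.
log=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 1000; i++) print "line " i }' > "$log"

# awk: print lines 10-20, then exit at line 20 so the rest is never read.
awk_out=$(awk 'NR >= 10 && NR <= 20 { print } NR == 20 { exit }' "$log")

# sed: -n suppresses default output, 10,20p prints the range,
# and 20q quits as soon as line 20 has been handled.
sed_out=$(sed -n '10,20p; 20q' "$log")

printf '%s\n' "$sed_out"
rm -f "$log"
```

Both commands should stop reading after line 20, which is where the time saving on a huge file would come from.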

Regarding your expression:

awk 'NR>='"$startline"' {print}' $LOG

note that it is more straightforward to write:

awk -v start="$startline" 'NR>=start' $LOG

There is no need to say print because it is the implicit default action.
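
To check that the -v form behaves like the tail command from the question, a small sketch along these lines could be used. The sample file and the value of startline are assumptions for the example.

```shell
# Hypothetical sample file: 100 numbered lines stand in for the log.
log=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100; i++) print i }' > "$log"

startline=96   # assumed value, as if read from the save file

# -v passes the shell variable into awk as the awk variable "start";
# a pattern with no action prints every matching line.
awk_out=$(awk -v start="$startline" 'NR >= start' "$log")

# The tail form from the question, for comparison.
tail_out=$(tail -n +"$startline" "$log")

printf '%s\n' "$awk_out"
rm -f "$log"
```

Passing the value with -v keeps the awk program in a single quoted string, which avoids the quote juggling of the original expression.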
