bash: How to process every other line in bash

Question

Asked by Perlnika

I would like to print odd lines (1, 3, 5, 7, ...) without any change, but process even lines (2, 4, 6, 8, ...) with a pipeline beginning with grep. I would like to write everything to a new file (odd lines unchanged, even lines replaced by their new values).

I know how to print every other line in awk:

awk ' NR % 2 == 1 { print; } NR % 2 ==0 {print; }' file.fasta

However, for even lines, I don't want to use {print; }; I want to use my grep pipeline instead.

Any advice would be appreciated. Thanks a lot.

Answer 1

Accepted answer by Shawn Chin

If you're planning to do a simple grep, you can do away with the additional step and do the filtering within awk itself, e.g.:

awk 'NR % 2 {print} !(NR % 2) && /pattern/ {print}' file.fasta
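
As a small illustration of the in-awk filter above, here is a sketch run against a made-up four-line file. The sample data, the function name filter_even_lines, and passing the pattern in through -v pat (instead of a literal /pattern/) are assumptions for the demo, not part of the original answer.

```shell
# Hypothetical sample input: headers on odd lines, sequences on even lines.
sample=$(mktemp)
printf '%s\n' '>seq1' 'ACGT' '>seq2' 'TTTT' > "$sample"

# filter_even_lines PATTERN FILE  (hypothetical helper)
# Odd lines are printed as-is; even lines are printed only if they match PATTERN.
filter_even_lines() {
    awk -v pat="$1" 'NR % 2 { print } !(NR % 2) && $0 ~ pat { print }' "$2"
}

filter_even_lines 'CG' "$sample"
```

With this sample, both header lines and ACGT are printed, while TTTT does not match CG and is dropped.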

However, if you intend to do a lot more, then, as chepner already pointed out, you can indeed pipe from inside awk. For example:

awk 'NR % 2 {print} !(NR % 2) {print | "grep pattern | rev" }' file.fasta

That opens a pipe to the command "grep pattern | rev" (note the surrounding quotes) and redirects the print output to it. Do note that the output in this case may not be what you expect; you will end up with all odd lines being output first, followed by the output of the piped command (which consumes the even lines).

(In response to your comments) to count the number of chars in each even line, try:

awk 'NR % 2 {print} !(NR % 2) {print length($0)}' file.fasta

Answer 2

Answer by chepner

You can pipe directly from inside awk:

awk 'BEGIN{ cmd = "grep -io \047[actgn]\047 | wc -l" } NR % 2 { print } NR % 2 == 0 { print | cmd; close(cmd) }' file.fasta

Be aware, however, that this will not preserve the order of your input file.

(The selected answer is better for the task at hand, but I'll leave this answer here as an example of piping the print statement to an external command.)

Answer 3

Answer by Paused until further notice.

In order to have your pipeline output appear in order with your AWK output, you need to close the pipeline at each iteration. This is, of course, very inefficient.
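
A minimal sketch of what closing the pipe after every even line might look like. It reuses the grep -o ... | wc -l pipeline that appears in this thread; the sample data, the function name count_with_pipeline, the \047 quoting of the bracket expression, and the fflush() call are assumptions added here. The fflush() is there so that each odd line leaves awk's output buffer before the external command writes its count.

```shell
# Hypothetical sample input: headers on odd lines, sequences on even lines.
sample=$(mktemp)
printf '%s\n' '>seq1' 'ACGT-NN' '>seq2' 'acgtXXa' > "$sample"

# count_with_pipeline FILE  (hypothetical helper)
count_with_pipeline() {
    # \047 is an octal escape for a single quote inside the awk string.
    awk 'BEGIN { cmd = "grep -o \047[actgnACTGN]\047 | wc -l" }
         # Odd line: print it and flush so it reaches the output right away.
         NR % 2      { print; fflush() }
         # Even line: feed it to the pipeline, then close the pipe so the
         # command finishes (and prints its count) before the next line is read.
         NR % 2 == 0 { print | cmd; close(cmd) }
        ' "$1"
}

count_with_pipeline "$sample"
```

Each even line starts a fresh grep and wc, which is where the inefficiency comes from. Some wc implementations pad the count with leading spaces.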

You apparently don't want to count characters that are not in the specified list, so length($0) won't work. This will work and should be a lot faster than the pipeline method:

awk 'NR % 2 { print } NR % 2 == 0 {n = split($0, a, /[^actgnACTGN]/); print length($0) - n + 1}' file.fasta

It works by splitting the line using the characters you don't want as delimiters, subtracting the number of resulting substrings from the length of the line, and adding 1. Since splitting on k unwanted characters yields k + 1 substrings, this subtracts the number of unwanted characters from the length of the line, leaving the number of wanted characters as the result.
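
To make the arithmetic concrete, here is a sketch that runs the split-based count on a made-up file. The sample data and the function name count_wanted are assumptions for illustration only.

```shell
# Hypothetical sample input: headers on odd lines, sequences on even lines.
sample=$(mktemp)
printf '%s\n' '>seq1' 'ACGT-NN' '>seq2' 'acgtXXa' > "$sample"

# count_wanted FILE  (hypothetical helper)
count_wanted() {
    awk 'NR % 2 { print }
         NR % 2 == 0 {
             # Every unwanted character acts as a separator, so n is
             # (number of unwanted characters) + 1 for a non-empty line.
             n = split($0, parts, /[^actgnACTGN]/)
             print length($0) - n + 1
         }' "$1"
}

count_wanted "$sample"
```

For ACGT-NN the line length is 7 and split returns 2, giving 7 - 2 + 1 = 6. For acgtXXa the length is 7 and split returns 3 (the middle piece is empty), giving 5. One caveat: split returns 0 for an empty string, so an empty even line would report 1 rather than 0.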
