bash: Select lines based on value in a column
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/23916082/
Asked by Tom
I have a tab delimited table for which I want to print all lines where column 'x' is greater than 'Y'. I have attempted using the code below but am new to using awk so am unsure how to use it based on columns.
awk '$X >= Y {print} ' Table.txt | cat > Wanted_lines
Y are values from 1 to 100.
If the input were like the one below, with column X being the second column:
1 30
2 50
3 100
4 100
5 80
6 79
7 90
The wanted output would be:
3 100
4 100
5 80
7 90
The first 2 lines of the file are:
1 OTU1 243622 208679 121420 265864 0 0 2 0 0 11 1 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 839604 OTU1 - Archaea 100% Euryarchaeota 100% Methanobacteria 100% Methanobacteriales 100% Methanobacteriaceae 100% Methanobrevibacter 100%
2 OTU2 84366 120817 15834 74737 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 295755 OTU2 - Archaea 100% Euryarchaeota 100% Methanobacteria 100% Methanobacteriales 100% Methanobacteriaceae 100% Methanobrevibacter 100%
Accepted answer by Anthony Rutledge
First
awk's default internal field separator (FS) will work on space or tab delimited files.
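One caveat may be worth sketching here: the default separator splits on any run of spaces or tabs, so a tab-delimited file that has spaces inside a cell can end up with shifted column numbers. Setting the separator to a tab with -F '\t' avoids that. The sample rows below are made up for illustration and are not from the question.

```shell
# Hypothetical sample: three tab-separated columns; the middle cell of row 2 contains a space.
tmp=$(mktemp)
printf 'a\tx\t90\nb\ty z\t85\nc\tw\t10\n' > "$tmp"

# Default FS: row 2 splits into 4 fields, so its $3 is "z" (0 after "+ 0"), not 85.
default_fs=$(awk '$3 + 0 >= 80' "$tmp")

# Tab FS: every row has exactly 3 fields, so $3 is always the last column.
tab_fs=$(awk -F '\t' '$3 + 0 >= 80' "$tmp")

printf 'default FS:\n%s\n\ntab FS:\n%s\n' "$default_fs" "$tab_fs"
rm -f "$tmp"
```

With the default separator only row "a" is selected; with the tab separator rows "a" and "b" are.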
Secondly
awk '$x > FLOOR' Table.txt
Where $x is the target column, and FLOOR is the actual numeric floor (e.g. 5000).
Example file: awktest
500 100
400 1100
1000 400
1200 500
awk '$1 > 1000' awktest
1200 500
awk '$1 >= 1000' awktest
1000 400
1200 500
Thus, you should be able to use a relational expression to print the lines where x > y, in the form:
awk '$x > $y' awktest
Where $x is a numeric column, such as $1.
Where $y is a numeric column, such as $2.
Example:
awk '$1 > $2' awktest
or ...
awk '$2 > $1' awktest
awk numbers are floating point numbers, so you can compare decimals, too.
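A minimal sketch of a decimal comparison, using made-up rows piped in with printf rather than the awktest file:

```shell
# Made-up sample: column 2 holds decimal and integer values.
out=$(printf 'a 0.5\nb 1.25\nc 0.75\nd 2\n' | awk '$2 > 0.7')
printf '%s\n' "$out"
```

Rows b, c and d are printed, since 1.25, 0.75 and 2 are all greater than 0.7.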
Answer by ghoti
So...
- In '$X >= Y {print}', the {print} is redundant, as the default action in awk is to print.
- | cat > file is UUOC (a useless use of cat).
- Your expected output shows lines where that value is 80 or above. This answer assumes the output is what you really want, despite the lack of code to handle it.
- I don't see how your last input example relates to things. Is there particular output you'd like from that input?
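Applied to the command from the question, the first two points could look like the sketch below. It recreates the question's sample table in a temporary directory (the directory is an assumption made only so the example is self-contained) and hard-codes column 2 and a floor of 80.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample table from the question (tab-separated).
printf '1\t30\n2\t50\n3\t100\n4\t100\n5\t80\n6\t79\n7\t90\n' > Table.txt

# No {print}: printing is the default action.
# No "| cat": awk's output is redirected straight into the file.
awk '$2 >= 80' Table.txt > Wanted_lines

cat Wanted_lines
```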
Consider:
$ awk '$X >= Y' X=2 Y=80 input.txt
3 100
4 100
5 80
7 90
$ awk '$X >= Y' X=2 Y=90 input.txt
3 100
4 100
7 90
The notation above relies on the following statement from man awk:
Any file of the form var=value is treated as an assignment, not a filename, and is executed at the time it would have been opened if it were a filename.
This is functionally equivalent to:
$ awk -v X=2 -v Y=80 '$X >= Y' input.txt
Either of these notations for getting shell variables into your awk script will do just fine. I believe any version of awk you come across (bsdawk, gawk, mawk) should handle both equally well.
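One difference between the two notations may matter in other scripts: a -v assignment is made before the BEGIN block runs, while a var=value operand is only processed when awk reaches it in the argument list, which is after BEGIN. For a pattern like '$X >= Y' this makes no difference. A small sketch:

```shell
# -v: Y is already set when BEGIN runs.
with_v=$(awk -v Y=80 'BEGIN { print "[" Y "]" }')

# Operand form: Y is not yet set in BEGIN, so an empty string is printed.
with_operand=$(awk 'BEGIN { print "[" Y "]" }' Y=80)

printf '%s\n%s\n' "$with_v" "$with_operand"
```

This prints [80] followed by [].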
Within a shell script, you might see something like this:
#!/usr/bin/env bash
# Usage: script COLUMN FLOOR
if [[ $# != 2 ]]; then
  printf 'Please supply column and floor values as parameters.\n'
  exit 1
# Reject any parameter that contains a non-digit character.
elif [[ $1 =~ [^0-9] ]] || [[ $2 =~ [^0-9] ]]; then
  printf 'Invalid parameters.\n'
  exit 1
fi
awk '$X >= Y' X="$1" Y="$2" input.txt
Answer by Juan Diego Godoy Robles
Try:
awk -v num_col=$X -v limit=$Y '$num_col + 0 >= limit + 0' Table.txt > Wanted_lines
Example:
$ cat Table.txt
1 30
2 50
3 100
4 100
5 80
6 79
7 90
$ X=2
$ Y=80
$ awk -v num_col=$X -v limit=$Y '$num_col + 0 >= limit + 0' Table.txt
3 100
4 100
5 80
7 90
Alternatively (hacky and NOT recommended), you could break out of the awk quoting this way:
$ awk '$'"${X}"' + 0 >= '"${Y}"' + 0' Table.txt
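The "+ 0" in these commands forces both sides to be treated as numbers. Values passed with -v that look numeric are normally compared numerically anyway, so it acts as a safeguard. To make the difference visible, the sketch below compares against a quoted string constant, which is an illustration and not something the commands above do:

```shell
# Three rows taken from the question's sample input.
data='3 100
5 80
7 90'

# String comparison: "100" sorts before "80", so row 3 is dropped.
string_cmp=$(printf '%s\n' "$data" | awk '$2 >= "80"')

# Numeric comparison: "+ 0" converts both operands to numbers.
numeric_cmp=$(printf '%s\n' "$data" | awk '$2 + 0 >= "80" + 0')

printf 'string:\n%s\n\nnumeric:\n%s\n' "$string_cmp" "$numeric_cmp"
```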
This is what you need to get rid of the % symbol in your actual file:
$ awk -v num_col=43 -v limit=80 '{sub(/%/,"",$num_col)}$num_col + 0 >= limit + 0 ' Table.txt
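Note that changing a field with sub() makes awk rebuild the line using the output separator (a single space by default), so printed lines lose their tabs and the % sign. If the original lines should be printed unchanged, one possible variant is to strip the % from a copy of the field. The sample rows, the column number 3 and the variable name value below are made up for illustration.

```shell
# Hypothetical tab-separated sample with a percentage in column 3.
tmp=$(mktemp)
printf 'OTU1\tArchaea\t100%%\nOTU2\tBacteria\t75%%\nOTU3\tArchaea\t80%%\n' > "$tmp"

kept=$(awk -F '\t' -v num_col=3 -v limit=80 '
  { value = $num_col; sub(/%/, "", value) }   # strip % from a copy; the line itself is untouched
  value + 0 >= limit + 0                      # default action: print the original line
' "$tmp")

printf '%s\n' "$kept"
rm -f "$tmp"
```

The OTU1 and OTU3 rows are printed with their tabs and % signs intact.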