bash: select lines based on the value in a column

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23916082/

Date: 2020-09-18 10:33:29  Source: igfitidea

Select lines based on value in a column

Tags: bash, awk

Asked by Tom

I have a tab delimited table for which I want to print all lines where column 'x' is greater than 'Y'. I have attempted using the code below but am new to using awk so am unsure how to use it based on columns.


awk '$X >= Y {print} ' Table.txt | cat > Wanted_lines 

Y are values from 1 to 100.


If the input were like the one below, with column X being the second column:

1    30
2    50
3    100
4    100
5    80
6    79
7    90

The wanted output would be:


3    100
4    100
5    80
7    90

The first 2 lines of the file are:

1   OTU1    243622  208679  121420  265864  0   0   2   0   0   11  1   5   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   839604  OTU1    -   Archaea 100%    Euryarchaeota   100%    Methanobacteria 100%    Methanobacteriales  100%    Methanobacteriaceae 100%    Methanobrevibacter  100%
2   OTU2    84366   120817  15834   74737   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   295755  OTU2    -   Archaea 100%    Euryarchaeota   100%    Methanobacteria 100%    Methanobacteriales  100%    Methanobacteriaceae 100%    Methanobrevibacter  100%

Accepted answer by Anthony Rutledge

First


awk's default internal field separator (FS) will work on space or tab delimited files.
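One caveat worth checking on real data: the default FS splits on any run of spaces or tabs, so a tab-delimited file whose fields contain spaces may need `-F'\t'`. Here is a minimal sketch, assuming a made-up two-column file (the data and variable names are hypothetical, not from the question):

```shell
# Hypothetical data: the first field of line 1 contains a space.
tmp=$(mktemp)
printf 'x y\t50\nz\t90\n' > "$tmp"

# Default FS: on line 1, "x y" is split in two, so $2 is "y" rather than 50.
default_fs=$(awk '$2 >= 80' "$tmp")

# Tab-only FS: $2 is the numeric column on every line.
tab_fs=$(awk -F'\t' '$2 >= 80' "$tmp")

printf 'default FS:\n%s\n\ntab FS:\n%s\n' "$default_fs" "$tab_fs"
rm -f "$tmp"
```

With `-F'\t'` only the line whose second column is 90 should be selected.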


Secondly


awk '$x > FLOOR' Table.txt

Where $x is the target column, and FLOOR is the actual numeric floor (e.g. 5000).

Example file: awktest


500  100
400  1100
1000 400
1200 500


awk '$1 > 1000' awktest

1200   500

awk '$1 >= 1000' awktest

1000   400
1200   500

Thus, you should be able to use a relational expression to print the lines where x > y, in the form:


awk '$x > $y' awktest

Where $x is a numeric column, such as $1.

Where $y is a numeric column, such as $2.

Example:

awk '$1 > $2' awktest

or ...

awk '$2 > $1' awktest

awk numbers are floating point numbers, so you can compare decimals, too.
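To illustrate the point about decimals, here is a small sketch on made-up data (the file contents and the threshold 2.75 are assumptions for the demo). Fields that look like numbers are compared numerically, so 10 counts as greater than 2.75 even though it would sort lower as a string:

```shell
# Hypothetical tab-separated data with decimal values in column 2.
tmp=$(mktemp)
printf '1\t0.5\n2\t2.75\n3\t10\n4\t2.8\n' > "$tmp"

# Print lines whose second column is strictly greater than 2.75.
result=$(awk '$2 > 2.75' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

This should print the lines for 10 and 2.8, and skip 0.5 and 2.75 itself.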


Answer by ghoti

So...


  • In '$X >= Y {print}', the {print} is redundant, as the default action in awk is to print.
  • | cat > file is UUOC (a useless use of cat); awk's output can be redirected directly with > file.
  • Your expected output shows lines where that value is 80 or above. This answer assumes the output is what you really want, despite the lack of code to handle it.
  • I don't see how your last input example relates to things. Is there particular output you'd like from that input?

Consider:


$ awk '$X >= Y' X=2 Y=80 input.txt
3    100
4    100
5    80
7    90
$ awk '$X >= Y' X=2 Y=90 input.txt
3    100
4    100
7    90

The notation above relies on the following statement from man awk:


Any file of the form var=value is treated as an assignment, not a filename, and is executed at the time it would have been opened if it were a filename.


This is functionally equivalent to:


$ awk -v X=2 -v Y=80 '$X >= Y' input.txt

Either of these notations for getting shell variables into your awk script will do just fine. I believe any version of awk you come across (bsdawk, gawk, mawk) should handle both equally well.
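One difference may matter if your script has a BEGIN block: per the man page excerpt quoted earlier, a trailing var=value operand is only processed when awk reaches it in the argument list, which is after BEGIN has run, whereas -v assignments are made before BEGIN. A minimal sketch, assuming made-up data and a hypothetical "floor=" label:

```shell
# Hypothetical two-line, tab-separated input.
tmp=$(mktemp)
printf '1\t30\n2\t90\n' > "$tmp"

# -v: Y is already set when BEGIN runs.
with_v=$(awk -v Y=80 'BEGIN { print "floor=" Y } $2 >= Y' "$tmp")

# Trailing assignment: Y is still empty in BEGIN, but set before the file is read.
with_arg=$(awk 'BEGIN { print "floor=" Y } $2 >= Y' Y=80 "$tmp")

printf -- '-v form:\n%s\n\noperand form:\n%s\n' "$with_v" "$with_arg"
rm -f "$tmp"
```

For a pattern-only program like '$X >= Y', the two forms should behave the same.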


Within a shell script, you might see something like this:


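#!/usr/bin/env bash

# Expect exactly two arguments: the column number and the floor value.
if [[ $# != 2 ]]; then
  printf 'Please supply column and floor values as parameters.\n'
  exit 1
# Reject any argument that contains a non-digit character.
elif [[ $1 =~ [^0-9] ]] || [[ $2 =~ [^0-9] ]]; then
  printf 'Invalid parameters.\n'
  exit 1
fi

awk '$X >= Y' X="$1" Y="$2" input.txt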

Answer by Juan Diego Godoy Robles

Try:


awk -v num_col="$X" -v limit="$Y" '$num_col + 0 >= limit + 0' Table.txt > Wanted_lines

Example:


$ cat Table.txt
1    30
2    50
3    100
4    100
5    80
6    79
7    90


$ X=2
$ Y=80
$ awk -v num_col=$X -v limit=$Y '$num_col + 0 >= limit + 0' Table.txt
3    100
4    100
5    80
7    90

Alternatively (hacky and NOT recommended), the awk quoting could be broken up this way so the shell expands the variables:

$  awk '$'"${X}"' + 0 >= '"${Y}"' + 0' Table.txt

This is what you need to get rid of the % symbol in your actual file:

$ awk -v num_col=43 -v limit=80 '{sub(/%/,"",$num_col)}$num_col + 0 >= limit + 0 ' Table.txt
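Note that assigning to a field with sub() makes awk rebuild the line using the output separator, so the printed lines may lose their tabs. Here is a variation as a rough sketch: it copies the field into a variable (v, a name made up here) and strips the % there, leaving the printed line untouched. The three-column sample data is hypothetical, and -F'\t' assumes the file is strictly tab-delimited:

```shell
# Hypothetical sample: tab-separated, with a percentage in column 3.
tmp=$(mktemp)
printf 'OTU1\tArchaea\t100%%\nOTU2\tBacteria\t75%%\nOTU3\tArchaea\t80%%\n' > "$tmp"

# Copy the field into v and strip "%" there, so $0 is printed unmodified.
result=$(awk -F'\t' -v num_col=3 -v limit=80 '
  { v = $num_col; sub(/%/, "", v) }
  v + 0 >= limit + 0
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

With this data, the OTU1 and OTU3 lines should come through with their tabs and % signs intact.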