Linux - Count occurrences of character per line/field on Unix

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/8629410/

Time: 2020-08-06 03:48:57  Source: igfitidea

Tags: linux, bash, shell, unix, scripting

Question by toop

Given a file with data like this (i.e. stores.dat):

sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200

What is the command that would return the number of occurrences of the 't' character per line?

For example, it would return:

count   lineNum
   4       1
   3       2
   6       3


Also, to count the occurrences within a single field, what command would return the following results?

For example, with column 2 and character 't' as input:

count   lineNum
   1       1
   0       2
   1       3

For example, with column 3 and character 't' as input:

count   lineNum
   2       1
   1       2
   4       3
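
To try the commands in the answers below, the sample data from the question can be written to disk with a here-document. A minimal sketch; it creates stores.dat in the current directory, and some answers refer to the same data as file or input.txt:

```shell
#!/bin/sh
# Write the question's sample data to stores.dat in the current directory.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > stores.dat <<'EOF'
sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200
EOF
```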

Accepted answer by jaypal singh

To count the occurrences of a character per line, you can do:

awk -F'|' 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"") "\t" NR}' file
count lineNum
4       1
3       2
6       3

To count the occurrences of a character per field/column, you can do:

Column 2:

awk -F'|' -v fld=2 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
1       1
0       2
1       3

Column 3:

awk -F'|' -v fld=3 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
2       1
1       2
4       3

  • The return value of gsub() is the number of substitutions made, so we print it as the count.
  • NR holds the current line number, so we print it as the line number.
  • To count occurrences in a particular field, we pass the field number in the variable fld and apply gsub() to $fld only.

Answer by Birei

One possible solution using perl:

Content of script.pl:

use warnings;
use strict;

## Check arguments:
## 1.- Input file
## 2.- Char to search.
## 3.- (Optional) field to search. If blank, zero or bigger than number
##     of columns, default to search char in all the line.
(@ARGV == 2 || @ARGV == 3) or die qq(Usage: perl $0 input-file char [column]\n);

my ($char,$column);

## Get values or arguments.
if ( @ARGV == 3 ) {
    ($char, $column) = splice @ARGV, -2;
} else {
    $char = pop @ARGV;
    $column = 0;
}

## Check that $char must be a non-white space character and $column
## only accept numbers.
die qq[Bad input\n] if $char !~ m/^\S$/ or $column !~ m/^\d+$/;

print qq[count\tlineNum\n];

while ( <> ) {
    ## Remove last '\n'
    chomp;

    ## Get fields.
    my @f = split /\|/;

    ## If column is a valid one, select it to the search.
    if ( $column > 0 and $column <= scalar @f ) {
        $_ = $f[ $column - 1];
    }

    ## Count.
    my $count = eval qq[tr/$char/$char/];

    ## Print result.
    printf qq[%d\t%d\n], $count, $.;
}

The script accepts three parameters:

  1. Input file
  2. Char to search
  3. Column to search (optional): if the column is 0 or greater than the number of fields, it searches the whole line.


Running the script without arguments:

perl script.pl
Usage: perl script.pl input-file char [column]

With arguments, and their output:

Here 0 is a bad column, so it searches the whole line.

perl script.pl stores.dat 't' 0
count   lineNum
4       1
3       2
6       3

Here it searches in column 1.

perl script.pl stores.dat 't' 1
count   lineNum
0       1
2       2
0       3

Here it searches in column 3.

perl script.pl stores.dat 't' 3
count   lineNum
2       1
1       2
4       3

th is not a single character:

perl script.pl stores.dat 'th' 3
Bad input

Answer by Jelena

cat stores.dat | awk 'BEGIN {FS = "|"}; {print $1}' | awk 'BEGIN {FS = "t"}; {print NF-1}'

Here $1 selects the column you want to count in.

Answer by jfg956

No need for awk or perl, only bash and standard Unix utilities:

cat file | tr -c -d "t\n" | cat -n |
  { echo "count   lineNum"
    while read num data; do
      test ${#data} -gt 0 && printf "%4d   %5d\n" ${#data} $num
    done; }
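
The tr -c -d "t\n" stage does the real work here: -c complements the set {t, newline} and -d deletes that complement, so each line is reduced to its t's and its length is the count. A minimal sketch of the same idea; count_t_per_line is a hypothetical name, the sample lines are the question's, and it assumes every input line ends with a newline. Unlike the answer's test ${#data} -gt 0 check, it also prints lines whose count is 0:

```shell
#!/bin/sh
# count_t_per_line: read text on stdin, print "count lineNum" per line.
count_t_per_line() {
  # Keep only 't' and newline characters; every other byte is deleted.
  tr -c -d 't\n' | {
    n=0
    while IFS= read -r t_only; do
      n=$((n + 1))
      # What is left of the line is all t's, so its length is the count.
      printf '%d %d\n' "${#t_only}" "$n"
    done
  }
}

printf '%s\n' 'sid|storeNo|latitude|longitude' \
  '2tt|1|-28.0372000t0|153.42921670' \
  '9|2t|-33tt.85t09t0000|15t1.03274200' | count_t_per_line
```

On the three sample lines it prints 4 1, 3 2 and 6 3.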

And for a particular column:

cut -d "|" -f 2 file | tr -c -d "t\n" | cat -n |
  { echo "count   lineNum"
    while read num data; do
      test ${#data} -gt 0 && printf "%4d   %5d\n" ${#data} $num
    done; }

And we can even avoid tr and the cats:

echo "count   lineNum"
num=1
while read data; do
  new_data=${data//t/}
  count=$((${#data}-${#new_data}))
  test $count -gt 0 && printf "%4d   %5d\n" $count $num
  num=$(($num+1))
done < file
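
The pure-bash loops rely on one trick: ${data//t/} deletes every t, so the drop in length is the number of t's. A bash-only sketch of that idea as two hypothetical helpers (count_in_string and count_per_line are illustration names, and treating field 0 as the whole line is an assumption of this sketch):

```bash
#!/usr/bin/env bash
# count_in_string STRING CHAR: delete every CHAR with ${var//pattern/}
# and compare the lengths before and after.
count_in_string() {
  local str=$1 ch=$2 stripped
  # Quoting "$ch" makes it match literally rather than as a glob.
  stripped=${str//"$ch"/}
  echo $(( ${#str} - ${#stripped} ))
}

# count_per_line FILE CHAR [FIELD]: FIELD 0 (the default) means the whole
# line; otherwise the line is split on '|' first, as in the answer.
count_per_line() {
  local file=$1 ch=$2 fld=${3:-0} line n=0
  local -a fields
  while IFS= read -r line; do
    n=$((n + 1))
    if (( fld > 0 )); then
      # IFS is set only for this read, so there is nothing to restore.
      IFS='|' read -r -a fields <<< "$line"
      line=${fields[fld-1]}
    fi
    printf '%d\t%d\n' "$(count_in_string "$line" "$ch")" "$n"
  done < "$file"
}
```

Scoping IFS to the single read call is a design choice that avoids saving and restoring IFS by hand, as the answer's last loop does.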

And even the cut:

echo "count   lineNum"
num=1; OLD_IFS=$IFS; IFS="|"
while read -a array_data; do
  data=${array_data[1]}
  new_data=${data//t/}
  count=$((${#data}-${#new_data}))
  test $count -gt 0 && printf "%4d   %5d\n" $count $num
  num=$(($num+1))
done < file
IFS=$OLD_IFS

Answer by Gabriel Burt

grep -n -o "t" stores.dat | sort -n | uniq -c | cut -d : -f 1

gives almost exactly the output you want:

  4 1
  3 2
  6 3

Thanks to @raghav-bhushan for the grep -o hint, what a useful flag. The -n flag includes the line number as well.

Answer by Haven Holmes

 $ cat -n test.txt
 1  test 1
 2  you want
 3  void
 4  you don't want
 5  ttttttttttt
 6  t t t t t t

 $ awk '{n=split($0,c,"t")-1;if (n!=0) print n,NR}' test.txt
 2 1
 1 2
 2 4
 11 5
 6 6

Answer by Cole Tierney

You could also split the line or field on "t" and take the length of the resulting array minus 1. Set the col variable to 0 for the whole line, or 1 through 3 for a column:

awk -F'|' -v col=0 -v OFS=$'\t' 'BEGIN {
    print "count", "lineNum"
}{
    split($col, a, "t"); print length(a) - 1, NR
}
' stores.dat

Answer by vulcan

awk '{gsub("[^t]",""); print length($0),NR;}' stores.dat

The call to gsub() deletes everything in the line that is not a t; then it just prints the length of what remains and the current line number.

Want to do it just for column 2?

awk 'BEGIN{FS="|"} {gsub("[^t]","",$2); print NR,length($2);}' stores.dat

Answer by artm

To count the occurrences of a character per line:

$ awk -F 't' '{print NF-1, NR}' input.txt
4 1
3 2
6 3

This sets the field separator to the character that needs to be counted, then uses the fact that the number of fields is one greater than the number of separators.

To count the occurrences in a particular column, cut out that column first:

$ cut -d '|' -f 2 input.txt | awk -F 't' '{print NF-1, NR}'
1 1
0 2
1 3

$ cut -d '|' -f 3 input.txt | awk -F 't' '{print NF-1, NR}'
2 1
1 2
4 3

Answer by Steve Thorn

perl -e 'while(<>) { $count = tr/t//; print "$count ".++$x."\n"; }' stores.dat

Another perl answer, yay! The tr/t// operator returns the number of times the translation occurred on that line, in other words the number of times tr found the character 't'. ++$x maintains the line number count.
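
artm's -F 't' / NF-1 trick has two edge cases worth knowing. An empty line has no fields, so NF-1 prints -1 for it. And, to my knowledge, some awks treat -F t as a special case meaning a tab (BWK awk, as shipped on macOS and the BSDs, and gawk in compatibility mode), which would make every count 0. A sketch that sidesteps both by setting FS in a BEGIN block and clamping empty lines to 0; count_by_fs is a hypothetical name:

```shell
#!/bin/sh
# count_by_fs FILE CHAR  (hypothetical helper, for illustration)
# FS is assigned inside the program instead of via -F, so the
# "-F t means tab" special case cannot apply.
count_by_fs() {
  # A single-character FS other than a space is matched literally, so a
  # record with k copies of CHAR has k+1 fields. An empty record has
  # NF == 0, so it is reported as 0 rather than -1.
  awk -v ch="$2" 'BEGIN { FS = ch } { n = NF ? NF - 1 : 0; print n, NR }' "$1"
}
```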