Linux: Count occurrences of a character per line/field on Unix

Question

Asked by toop

Given a file with data like this (i.e., a stores.dat file):

sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200

What is the command that would return the number of occurrences of the 't' character per line?

For example, it would return:

count   lineNum
   4       1
   3       2
   6       3

Also, to count occurrences per field, what command would return the following results?

For example, for column 2 and character 't':

count   lineNum
   1       1
   0       2
   1       3

For example, for column 3 and character 't':

count   lineNum
   2       1
   1       2
   4       3

Answer 1

Accepted answer by jaypal singh

To count occurrences of a character per line, you can do:

awk -F'|' 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"") "\t" NR}' file
count lineNum
4       1
3       2
6       3

To count occurrences of a character per field/column, you can do:

Column 2:

awk -F'|' -v fld=2 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
1       1
0       2
1       3

Column 3:

awk -F'|' -v fld=3 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
2       1
1       2
4       3

gsub() returns the number of substitutions it made, so we print that as the count.
NR holds the current line number, so we print it as the line number.
To count occurrences in a particular field, we pass the field number in the variable fld and run gsub() on $fld instead of the whole line.
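
As a sketch of the same idea packaged for reuse, here is a small shell function that takes the character and the field as arguments. The name `count_char` is hypothetical, and it swaps `gsub()` for an `index()` loop: `gsub()` treats its first argument as a regular expression, so a character such as `.` would match everything, whereas `index()` is a plain substring search. It assumes a single character without a backslash, since `awk -v` expands escape sequences.

```shell
#!/usr/bin/env bash
# count_char CHAR [FIELD]: hypothetical helper, reads '|'-separated data on stdin.
# FIELD defaults to 0, which awk treats as $0, i.e. the whole line.
# Assumes CHAR is a single character with no backslash (-v expands escapes).
count_char() {
  awk -F'|' -v c="$1" -v fld="${2:-0}" '
    BEGIN { print "count\tlineNum" }
    {
      s = $fld
      n = 0
      # index() is a plain substring search, so c is matched literally.
      while ((i = index(s, c)) > 0) {
        n++
        s = substr(s, i + 1)
      }
      print n "\t" NR
    }'
}
```

For example, `count_char t 3 < stores.dat` prints the column-3 counts from the question, and `count_char . 3 < stores.dat` counts literal dots instead of every character.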

Answer 2

Answer by Birei

One possible solution using perl:

Content of script.pl:

use warnings;
use strict;

## Check arguments:
## 1.- Input file
## 2.- Char to search.
## 3.- (Optional) field to search. If blank, zero or bigger than the number
##     of columns, search for the char in the whole line.
(@ARGV == 2 || @ARGV == 3) or die qq(Usage: perl $0 input-file char [column]\n);

my ($char,$column);

## Get values or arguments.
if ( @ARGV == 3 ) {
        ($char, $column) = splice @ARGV, -2;
} else {
        $char = pop @ARGV;
        $column = 0;
}

## Check that $char is a single non-whitespace character and that
## $column is a number.
die qq[Bad input\n] if $char !~ m/^\S$/ or $column !~ m/^\d+$/; 

print qq[count\tlineNum\n];

while ( <> ) {
        ## Remove last '\n'
        chomp;

        ## Get fields.
        my @f = split /\|/;

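        ## If the column is valid, restrict the search to that field.
        if ( $column > 0 and $column <= scalar @f ) {
                $_ = $f[ $column - 1];
        }

        ## Count. \Q...\E quotes regex metacharacters, so any single
        ## character (even '/' or '.') is matched literally.
        my $count = () = /\Q$char\E/g;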

        ## Print result.
        printf qq[%d\t%d\n], $count, $.;
}

The script accepts three parameters:

Input file
Char to search
Column to search: if the column is 0 or larger than the number of columns, it searches the whole line.

Running the script without arguments:

perl script.pl
Usage: perl script.pl input-file char [column]

With arguments and their output:

Here 0 is a bad column, so it searches the whole line.

perl script.pl stores.dat 't' 0
count   lineNum
4       1
3       2
6       3

Here it searches in column 1.

perl script.pl stores.dat 't' 1
count   lineNum
0       1
2       2
0       3

Here it searches in column 3.

perl script.pl stores.dat 't' 3
count   lineNum
2       1
1       2
4       3

th is not a single character.

perl script.pl stores.dat 'th' 3
Bad input

Answer 3

Answer by Jelena

cat stores.dat | awk 'BEGIN {FS = "|"}; {print $1}' | awk 'BEGIN {FS = "t"}; {print NF-1}'

Here $1 is the column you want to count in; use $2, $3 and so on for other columns.

Answer 4

Answer by jfg956

No need for awk or perl, only bash and standard Unix utilities:

cat file | tr -c -d "t\n" | cat -n |
  { echo "count   lineNum"
    while read num data; do
      test ${#data} -gt 0 && printf "%4d   %5d\n" ${#data} $num
    done; }
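
The key step is `tr -c -d "t\n"`: `-c` complements the set, so every character except `t` and the newline is deleted, and each remaining line is exactly as long as its count of `t`. Note that the loop skips lines whose count is 0 (`test ${#data} -gt 0`), so for column 2 line 2 would be missing, whereas the question expects a 0 there. A small variant that keeps those lines, sketched with the hypothetical function name `count_t_per_line` and reading from stdin:

```shell
#!/usr/bin/env bash
# count_t_per_line: hypothetical name; reads text on stdin and, unlike
# the answer's loop, also prints lines whose count is 0.
count_t_per_line() {
  echo "count   lineNum"
  # -c complements the set: delete every character except 't' and newline,
  # so the length of each remaining line is that line's number of 't's.
  tr -c -d 't\n' | {
    n=0
    while IFS= read -r line; do
      n=$((n + 1))
      printf '%4d   %5d\n' "${#line}" "$n"
    done
  }
}
```

For a single column, cut it out first, e.g. `cut -d '|' -f 2 stores.dat | count_t_per_line`.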

And for a particular column:

cut -d "|" -f 2 file | tr -c -d "t\n" | cat -n |
  { echo "count   lineNum"
    while read num data; do
      test ${#data} -gt 0 && printf "%4d   %5d\n" ${#data} $num
    done; }

And we can even avoid tr and the cats:

echo "count   lineNum"
num=1
while read -r data; do
  new_data=${data//t/}
  count=$((${#data}-${#new_data}))
  test $count -gt 0 && printf "%4d   %5d\n" $count $num
  num=$(($num+1))
done < file

And even the cut:

echo "count   lineNum"
num=1; OLD_IFS=$IFS; IFS="|"
while read -r -a array_data; do
  data=${array_data[1]}
  new_data=${data//t/}
  count=$((${#data}-${#new_data}))
  test $count -gt 0 && printf "%4d   %5d\n" $count $num
  num=$(($num+1))
done < file
IFS=$OLD_IFS
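
Both pure-bash versions hinge on `${data//t/}`, which deletes every `t`, so the length difference `${#data} - ${#new_data}` is the count. Below is a sketch that folds them into one function; `count_in_field` is a hypothetical name. It scopes `IFS='|'` to the `read` itself so the global IFS never changes, and, as an assumption about the desired output, it also prints zero counts.

```shell
#!/usr/bin/env bash
# count_in_field CHAR [FIELD]: hypothetical pure-bash helper reading stdin.
# FIELD is 1-based; 0 (the default) means the whole line.
count_in_field() {
  local char=$1 field=${2:-0} line data stripped n=0
  local -a cols
  echo "count   lineNum"
  while IFS= read -r line; do
    n=$((n + 1))
    if (( field > 0 )); then
      # IFS='|' applies to this read only; the global IFS is left alone.
      IFS='|' read -r -a cols <<< "$line"
      data=${cols[field - 1]}
    else
      data=$line
    fi
    # Quoting "$char" makes it a literal pattern; deleting every match and
    # comparing lengths gives the number of occurrences.
    stripped=${data//"$char"/}
    printf '%4d   %5d\n' $(( ${#data} - ${#stripped} )) "$n"
  done
}
```

Usage: `count_in_field t 3 < stores.dat` for column 3, or `count_in_field t < stores.dat` for whole lines.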

Answer 5

Answer by Gabriel Burt

grep -n -o "t" stores.dat | sort -n | uniq -c | cut -d : -f 1

gives almost exactly the output you want:

  4 1
  3 2
  6 3
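
The "almost" has two causes: uniq -c pads its counts with leading spaces, and a line with no match produces no grep -o output at all, so it is simply missing from the result. That is harmless for the whole-line case here, but it would drop the 0 on line 2 for column 2. A sketch that fills in the missing lines, assuming the input is a regular file (it counts lines with wc -l); `count_with_grep` is a hypothetical name, and `-F` is added so the character is matched literally rather than as a regular expression:

```shell
#!/usr/bin/env bash
# count_with_grep FILE CHAR: hypothetical helper; prints "count lineNum".
count_with_grep() {
  local file=$1 char=$2 total next=1 count num
  total=$(wc -l < "$file")
  # grep -n -o prints "lineNum:match" once per match; cut keeps lineNum,
  # and uniq -c turns each run of equal line numbers into "count lineNum".
  grep -n -o -F -- "$char" "$file" | cut -d : -f 1 | uniq -c | {
    while read -r count num; do
      # Emit a 0 for every line number that grep skipped.
      while (( next < num )); do echo "0 $next"; next=$((next + 1)); done
      echo "$count $num"
      next=$((num + 1))
    done
    # Lines after the last match.
    while (( next <= total )); do echo "0 $next"; next=$((next + 1)); done
  }
}
```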

Thanks to @raghav-bhushan for the grep -o hint; what a useful flag. The -n flag adds the line number as well.

Answer 6

Answer by Haven Holmes

 $ cat -n test.txt
 1  test 1
 2  you want
 3  void
 4  you don't want
 5  ttttttttttt
 6  t t t t t t

 $ awk '{n=split($0,c,"t")-1;if (n!=0) print n,NR}' test.txt
 2 1
 1 2
 2 4
 11 5
 6 6

Answer 7

Answer by Cole Tierney

You could also split the line or field on "t" and take the length of the resulting array minus 1. Set the col variable to 0 for the whole line, or to a column number for a single field:

awk -F'|' -v col=0 -v OFS=$'\t' 'BEGIN {
    print "count", "lineNum"
}{
    split($col, a, "t"); print length(a) - 1, NR
}
' stores.dat

Answer 8

Answer by vulcan

awk '{gsub("[^t]",""); print length($0),NR;}' stores.dat

The call to gsub() deletes everything in the line that is not a t; then we just print the length of what remains and the current line number.

Want to do it just for column 2?

awk 'BEGIN{FS="|"} {gsub("[^t]","",$2); print NR,length($2);}' stores.dat

Answer 9

Answer by artm

To count occurrences of a character per line:

$ awk -F 't' '{print NF-1, NR}' input.txt
4 1
3 2
6 3

This sets the field separator to the character being counted, then uses the fact that the number of fields is one greater than the number of separators.

To count occurrences in a particular column, cut out that column first:

$ cut -d '|' -f 2 input.txt | awk -F 't' '{print NF-1, NR}'
1 1
0 2
1 3

$ cut -d '|' -f 3 input.txt | awk -F 't' '{print NF-1, NR}'
2 1
1 2
4 3

Answer 10

Answer by Steve Thorn

perl -e 'while(<>) { $count = tr/t//; print "$count ".++$x."\n"; }' stores.dat

Another perl answer, yay! tr/t// returns the number of characters it translated on that line, in other words the number of times tr found the character 't'. ++$x keeps track of the line number.
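
One portability caveat about the `awk -F 't'` one-liners: BWK awk (the "one true awk", the default awk on macOS and the BSDs) treats a lone `t` given to `-F` as a tab, so there `-F 't'` splits on tabs instead. Setting FS with `-v` sidesteps that special case. A sketch, with the hypothetical name `count_nf`, that also guards empty lines, where NF is 0 and NF-1 would print -1:

```shell
#!/usr/bin/env bash
# count_nf CHAR: hypothetical helper; reads stdin, prints "count lineNum".
count_nf() {
  # -v FS=... avoids the "-F t means tab" special case in BWK awk.
  # A single-character FS (other than a space) splits on that character literally.
  awk -v FS="$1" '{ print (NF > 0 ? NF - 1 : 0), NR }'
}
```

For a column, combine it with cut as in the answer: `cut -d '|' -f 3 stores.dat | count_nf t`.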
