Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/16750911/

Count line lengths in file using command line tools

Tags: bash, shell, command-line, scripting

Asked by Pete Hamilton

Problem

If I have a long file with lots of lines of varying lengths, how can I count the occurrences of each line length?

Example:

file.txt

this
is
a
sample
file
with
several
lines
of
varying
length

Running count_line_lengths file.txt would give:

Length Occurrences
1      1
2      2
4      3
5      1
6      2
7      2

Ideas?

Answered by Ignacio Vazquez-Abrams

count.awk:

{
  print length($0);
}

...

$ awk -f count.awk input.txt | sort | uniq -c
      1 1
      2 2
      3 4
      1 5
      2 6
      2 7
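
The pipeline in this answer prints the count first and the length second, with no header. A minimal sketch of a wrapper that swaps the columns and adds the header from the question, assuming a hypothetical shell function named count_line_lengths after the command in the question:

```shell
# Hypothetical helper; the name comes from the question, not from any real tool.
count_line_lengths() {
    printf '%-6s %s\n' Length Occurrences
    # Print each line's length, sort numerically so equal lengths are adjacent,
    # then let uniq -c count each run.
    awk '{ print length($0) }' "$1" | sort -n | uniq -c |
        # uniq -c emits "count length"; print it back as "length count".
        awk '{ printf "%-6s %s\n", $2, $1 }'
}
```

Called as count_line_lengths file.txt, this should print the table in the layout shown in the question.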

Answered by iruvar

Pure awk

awk '{++a[length()]} END{for (i in a) print i, a[i]}' file.txt

4 3
5 1
6 2
7 2
1 1
2 2
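
The order in which awk's for (i in a) loop visits the array is unspecified, which is why the output above is not sorted by length. A sketch, assuming sorted output is wanted, that pipes the same program through sort -n (the function name line_length_histogram is hypothetical):

```shell
# Hypothetical wrapper around the awk one-liner above.
line_length_histogram() {
    # a[n] holds how many lines have length n; the END block dumps the array
    # in whatever order awk chooses, so sort -n orders the rows by length.
    awk '{ ++a[length($0)] } END { for (i in a) print i, a[i] }' "$1" | sort -n
}
```

For the sample file this should list the lengths in ascending order, each followed by its count.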

Answered by Adrian Frühwirth

Using bash arrays:

#!/bin/bash

# IFS= and -r keep leading blanks and backslashes, so ${#line} is the true length.
while IFS= read -r line; do
    ((histogram[${#line}]++))
done < file.txt

echo "Length Occurrence"
for length in "${!histogram[@]}"; do
    printf "%-6s %s\n" "${length}" "${histogram[$length]}"
done

Example run:

$ ./t.sh
Length Occurrence
1      1
2      2
4      3
5      1
6      2
7      2

Answered by jfs

$ perl -lne '$c{length($_)}++ }{ print qq($_ $c{$_}) for (keys %c);' file.txt

Output

6 2
1 1
4 3
7 2
2 2
5 1

Answered by Maksym Ganenko

You can accomplish this by using basic unix utilities only:

$ printf "%s %s\n" $(for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c | sed -E "s/([0-9]+)[^0-9]+([0-9]+)/\2 \1/")
1 1
2 2
4 3
5 1
6 2
7 2

How does it work?

  1. Here's the source file:
    $ cat file.txt
    this
    is
    a
    sample
    file
    with
    several
    lines
    of
    varying
    length

  2. Replace each line of the source file with its length:
    $ for line in $(cat file.txt); do printf $line | wc -c; done
    4
    2
    1
    6
    4
    4
    7
    5
    2
    7
    6

  3. Sort and count the number of length occurrences:
    $ for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c
          1 1
          2 2
          3 4
          1 5
          2 6
          2 7

  4. Swap and format the numbers:
    $ printf "%s %s\n" $(for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c | sed -E "s/([0-9]+)[^0-9]+([0-9]+)/\2 \1/")
    1 1
    2 2
    4 3
    5 1
    6 2
    7 2
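
One caveat with the steps above: for line in $(cat file.txt) splits the file on any whitespace, so a line containing spaces would be counted as several shorter lines. A minimal sketch of the length step that reads whole lines instead, assuming a hypothetical function name line_lengths:

```shell
# Hypothetical replacement for the "for line in $(cat ...)" loop.
line_lengths() {
    # IFS= preserves leading and trailing blanks; -r keeps backslashes literal.
    while IFS= read -r line; do
        # ${#line} is the number of characters in the line under the current locale.
        printf '%s\n' "${#line}"
    done < "$1"
}
```

Its output can be fed to the same sort -n | uniq -c stages used in step 3, for example line_lengths file.txt | sort -n | uniq -c.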
    

Answered by imrek

If you allow for the columns to be swapped and don't need the headers, something as easy as

while read -r line; do echo -n "$line" | wc -m; done < file | sort -n | uniq -c

(without any advanced tricks with sed or awk) will work. The output is:

      1 1
      2 2
      3 4
      1 5
      2 6
      2 7

One important thing to keep in mind: wc -c counts bytes, not characters, and will not give the correct length for strings containing multibyte characters. Hence the use of wc -m.
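
To make the difference concrete, here is a small sketch comparing the two counts on a string that contains a multibyte character. The helper names byte_len and char_len are hypothetical, and the character count depends on the locale in effect:

```shell
# Hypothetical helpers; tr -d ' ' strips the padding some wc implementations add.
byte_len() { printf '%s' "$1" | wc -c | tr -d ' '; }   # counts bytes
char_len() { printf '%s' "$1" | wc -m | tr -d ' '; }   # counts characters in the current locale

# "é" encoded as UTF-8 is the two bytes 0xC3 0xA9 (written in octal for printf).
e_acute=$(printf '\303\251')

byte_len "$e_acute"   # two bytes
char_len "$e_acute"   # one character when LC_CTYPE is a UTF-8 locale
```

For plain ASCII input the two helpers agree, so the distinction only matters for files with multibyte text.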

References:

man uniq(1)

man sort(1)

man wc(1)