bash: Count line lengths in a file using command-line tools

Question

Asked by Pete Hamilton

Problem

If I have a long file with lots of lines of varying lengths, how can I count the occurrences of each line length?

Example:

file.txt

this
is
a
sample
file
with
several
lines
of
varying
length

Running count_line_lengths file.txt would give:

Length Occurrences
1      1
2      2
4      3
5      1
6      2
7      2

Ideas?

Answer 1

Answered by Ignacio Vazquez-Abrams

count.awk:

{
  print length($0);
}

Then run it and count the lengths:

$ awk -f count.awk input.txt | sort | uniq -c
      1 1
      2 2
      3 4
      1 5
      2 6
      2 7
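Below is a minimal sketch of the same pipeline with the awk program inlined. It is wrapped in a hypothetical line_length_counts function that reads named files or standard input. One deliberate change is sort -n instead of plain sort. A lexical sort puts 10 before 2 once lengths reach two digits. uniq -c still counts correctly either way, and only the order differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same logic as count.awk, inlined.
# length($0) is the length of the current line, excluding its newline.
# sort -n orders the lengths numerically; uniq -c then prefixes each
# distinct length with the number of lines that have it ("count length").
line_length_counts() {
    awk '{ print length($0) }' "$@" | sort -n | uniq -c
}

# Usage: line_length_counts file.txt, or pipe text in as below.
printf 'ab\nabcdefghij\ncd\n' | line_length_counts
```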

Answer 2

Answered by iruvar

Pure awk:

awk '{++a[length()]} END{for (i in a) print i, a[i]}' file.txt

4 3
5 1
6 2
7 2
1 1
2 2
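The one-liner's output is unsorted because awk's for (i in a) loop visits array keys in no guaranteed order. The sketch below builds the count_line_lengths command the question imagines on the same associative-array counting. The name comes from the question's hypothetical and is not an existing tool. sort -n restores numeric order, and a second awk prints the header and column layout shown in the question.

```shell
#!/usr/bin/env bash
# Hypothetical count_line_lengths, named after the command in the question.
count_line_lengths() {
    # ++a[length($0)] counts lines per length; END prints "length count" pairs.
    # for (i in a) has no guaranteed order, so sort -n puts lengths in order,
    # and the last awk adds the header and pads the first column to 6 chars.
    awk '{ ++a[length($0)] } END { for (i in a) print i, a[i] }' "$@" |
        sort -n |
        awk 'BEGIN { print "Length Occurrences" } { printf "%-6s %s\n", $1, $2 }'
}

# Usage: count_line_lengths file.txt, or pipe text in:
printf 'this\nis\na\nsample\nis\n' | count_line_lengths
```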

Answer 3

Answered by Adrian Frühwirth

Using bash arrays:

#!/bin/bash

while read line; do
    ((histogram[${#line}]++))
done < file.txt

echo "Length Occurrence"
for length in "${!histogram[@]}"; do
    printf "%-6s %s\n" "${length}" "${histogram[$length]}"
done

Example run:

$ ./t.sh
Length Occurrence
1      1
2      2
4      3
5      1
6      2
7      2
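As written, read line strips leading and trailing whitespace and removes backslashes. It also skips a final line that has no trailing newline, because read returns non-zero there. The sketch below keeps the same bash-array histogram but avoids all three problems. It assumes the input arrives on standard input, and line_histogram is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical line_histogram: bash-array histogram of line lengths.
line_histogram() {
    local line length
    local -a histogram=()
    # IFS= keeps leading/trailing whitespace, -r keeps backslashes, and
    # "|| [[ -n $line ]]" still handles a last line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # Pre-increment evaluates to >= 1, so (( )) always succeeds.
        (( ++histogram[${#line}] ))
    done
    echo "Length Occurrence"
    # "${!histogram[@]}" lists an indexed array's indices in ascending order.
    for length in "${!histogram[@]}"; do
        printf "%-6s %s\n" "$length" "${histogram[$length]}"
    done
}

# Lines of length 3 ("  x"), 4 ("ab\c") and 4 ("last", no final newline).
printf '  x\nab\\c\nlast' | line_histogram
```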

Answer 4

Answered by jfs

$ perl -lne '$c{length($_)}++ }{ print qq($_ $c{$_}) for (keys %c);' file.txt

Output: one "length count" pair per line, in no particular order.

Answer 5

Answered by Maksym Ganenko

You can accomplish this by using basic Unix utilities only:

$ printf "%s %s\n" $(for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c | sed -E "s/([0-9]+)[^0-9]+([0-9]+)/\2 \1/")
1 1
2 2
4 3
5 1
6 2
7 2

How does it work?

Here's the source file:

$ cat file.txt
this
is
a
sample
file
with
several
lines
of
varying
length

Replace each line of the source file with its length:

$ for line in $(cat file.txt); do printf $line | wc -c; done
4
2
1
6
4
4
7
5
2
7
6
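Note that for line in $(cat file.txt) iterates over whitespace-separated words, with glob expansion, not over lines. printf $line also treats each word as a format string. The sample file has one plain word per line, so the result is the same there. The sketch below performs this step on actual lines and keeps wc -c (byte counts) as in the answer. lengths_per_line is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical lengths_per_line: the "replace each line with its length" step,
# reading real lines instead of whitespace-separated words.
lengths_per_line() {
    local line
    # IFS= and -r keep the line exactly as written; "|| [[ -n $line ]]"
    # also handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # '%s' prints the text literally, so a % in the data is not a format.
        printf '%s' "$line" | wc -c
    done
}

printf 'two words\n100%%\n' | lengths_per_line
```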

Sort and count the number of length occurrences:

$ for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c
      1 1
      2 2
      3 4
      1 5
      2 6
      2 7
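Sorting comes first because uniq -c only merges adjacent identical lines. The small sketch below contrasts the two on the same unsorted lengths. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# uniq -c collapses only adjacent duplicates, so unsorted input can list
# the same length more than once.
count_unsorted() { uniq -c; }
# Sorting first groups equal lengths, so each appears exactly once.
count_sorted() { sort -n | uniq -c; }

printf '4\n2\n4\n' | count_unsorted   # three entries: 4, 2 and 4 again
printf '4\n2\n4\n' | count_sorted     # two entries: 2 (once), 4 (twice)
```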

Swap and format the numbers:

$ printf "%s %s\n" $(for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c | sed -E "s/([0-9]+)[^0-9]+([0-9]+)/\2 \1/")
1 1
2 2
4 3
5 1
6 2
7 2
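The sed step swaps the two captured numbers (count, then length) so that the length comes first. The unquoted $( ) then splits everything into words, and printf "%s %s\n" pairs them up again. The sketch below does the same swap with awk, which splits on runs of blanks and so ignores the leading padding from uniq -c. swap_columns is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical swap_columns: turn uniq -c's "count length" lines into
# "length count". awk's default field splitting skips leading blanks.
swap_columns() {
    awk '{ print $2, $1 }'
}

printf '      1 1\n      2 2\n      3 4\n' | swap_columns
```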

Answer 6

Answered by imrek

If you allow for the columns to be swapped and don't need the headers, something as simple as

while read line; do echo -n $line | wc -m; done < file | sort | uniq -c

(without any advanced tricks with sed or awk) will work. The output is:

```
      1 1
      2 2
      3 4
      1 5
      2 6
      2 7
```

One important thing to keep in mind: wc -c counts bytes, not characters, so it gives the wrong length for strings containing multibyte characters. Hence the use of wc -m.
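The sketch below shows the difference on a word containing one two-byte character. It assumes a UTF-8 locale named C.UTF-8 is installed. That locale is common on recent glibc systems, but this is an assumption: substitute one listed by locale -a if needed. In the plain C locale, wc -m counts bytes too. byte_len and char_len are hypothetical names.

```shell
#!/usr/bin/env bash
# "café" spelled with octal escapes so the script's own encoding doesn't
# matter: c a f + é, where é is the two UTF-8 bytes 0xC3 0xA9.
word=$'caf\303\251'

byte_len() { printf '%s' "$1" | wc -c; }                  # counts bytes
char_len() { printf '%s' "$1" | LC_ALL=C.UTF-8 wc -m; }   # counts characters (assumes C.UTF-8 exists)

byte_len "$word"   # 5 bytes
char_len "$word"   # 4 characters, if the C.UTF-8 locale is available
```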
