bash: Count line lengths in file using command line tools
Source: http://stackoverflow.com/questions/16750911/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Pete Hamilton
Problem
If I have a long file with lots of lines of varying lengths, how can I count the occurrences of each line length?
Example:
file.txt
this
is
a
sample
file
with
several
lines
of
varying
length
Running count_line_lengths file.txt would give:
Length Occurrences
1 1
2 2
4 3
5 1
6 2
7 2
Ideas?
Answered by Ignacio Vazquez-Abrams
count.awk:
{
  print length($0);
}
...
$ awk -f count.awk input.txt | sort | uniq -c
1 1
2 2
3 4
1 5
2 6
2 7
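As a sketch of how the pipeline above could produce the exact layout asked for in the question, the function below wraps it and swaps the two columns. The name count_line_lengths comes from the question; the header text and the final awk step that swaps the columns are assumptions, not part of the original answer. sort -n is used so that lengths of 10 or more still sort numerically.

```shell
#!/bin/sh
# count_line_lengths FILE
# Hypothetical helper named after the command in the question.
count_line_lengths() {
    printf 'Length Occurrences\n'
    # 1. awk prints the length of every line.
    # 2. sort -n orders the lengths numerically.
    # 3. uniq -c prefixes each distinct length with its count.
    # 4. the last awk swaps the columns into "length count" order.
    awk '{ print length($0) }' "$1" | sort -n | uniq -c | awk '{ print $2, $1 }'
}
```

Calling count_line_lengths file.txt on the sample file from the question should print the header followed by the pairs 1 1, 2 2, 4 3, 5 1, 6 2 and 7 2, one per line.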
Answered by iruvar
Pure awk
awk '{++a[length()]} END{for (i in a) print i, a[i]}' file.txt
4 3
5 1
6 2
7 2
1 1
2 2
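The order in which awk's for (i in a) loop visits array keys is unspecified, which is why the pure awk output is not sorted by length. A minimal sketch that keeps the awk tally but pipes the result through sort -n is shown below; the function name line_length_histogram and the variable names are hypothetical.

```shell
#!/bin/sh
# line_length_histogram FILE (hypothetical helper name)
line_length_histogram() {
    # Tally line lengths in an awk array, then sort the "length count"
    # pairs numerically, since awk's for-in order is unspecified.
    awk '{ count[length($0)]++ }
         END { for (len in count) print len, count[len] }' "$1" | sort -n
}
```

With the sample file this should list the lengths in ascending order, starting with 1 1 and ending with 7 2.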
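Answered by Adrian Frühwirth
Using bash arrays:
#!/bin/bash
# Tally how many lines have each length in an indexed array.
# IFS= and -r keep leading blanks and backslashes intact.
while IFS= read -r line; do
    ((histogram[${#line}]++))
done < file.txt
echo "Length Occurrence"
# Indexed-array keys expand in ascending numeric order.
for length in "${!histogram[@]}"; do
    printf "%-6s %s\n" "${length}" "${histogram[$length]}"
done
Example run:
$ ./t.sh
Length Occurrence
1      1
2      2
4      3
5      1
6      2
7      2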
Answered by jfs
$ perl -lne '$c{length($_)}++ }{ print qq($_ $c{$_}) for (keys %c);' file.txt
Answered by Maksym Ganenko
You can accomplish this using only basic Unix utilities:
$ printf "%s %s\n" $(for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c | sed -E "s/([0-9]+)[^0-9]+([0-9]+)/\2 \1/")
How does it work?
- Here's the source file:
$ cat file.txt
this
is
a
sample
file
with
several
lines
of
varying
length
- Replace each line of the source file with its length:
$ for line in $(cat file.txt); do printf $line | wc -c; done
4
2
1
6
4
4
7
5
2
7
6
- Sort and count the number of length occurrences:
$ for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c
1 1
2 2
3 4
1 5
2 6
2 7
- Swap and format the numbers:
$ printf "%s %s\n" $(for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c | sed -E "s/([0-9]+)[^0-9]+([0-9]+)/\2 \1/")
1 1
2 2
4 3
5 1
6 2
7 2
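Note that for line in $(cat file.txt) iterates over whitespace-separated words rather than lines, so it only matches the line count when every line is a single word, as in the sample file. The sketch below keeps the same measure, sort, count, swap steps but reads whole lines; the function name lengths_by_line is hypothetical, and swapping the columns with awk instead of sed is a substitution of my own, not the answer's method.

```shell
#!/bin/sh
# lengths_by_line FILE (hypothetical helper name)
lengths_by_line() {
    # IFS= and -r keep leading blanks and backslashes intact, so each
    # line is measured exactly as written, spaces included.
    while IFS= read -r line; do
        printf '%s' "$line" | wc -c
    done < "$1" |
        sort -n |                  # order the lengths numerically
        uniq -c |                  # count each distinct length
        awk '{ print $2, $1 }'     # swap to "length count"
}
```

For a file containing the two lines "two words" and "ab", this should report one line of length 2 and one of length 9, whereas the word-splitting loop would see three separate words.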
Answered by imrek
If you allow for the columns to be swapped and don't need the headers, something as easy as
while read line; do echo -n $line | wc -m; done < file | sort | uniq -c
(without any advanced tricks with sed or awk) will work. The output is:
1 1
2 2
3 4
1 5
2 6
2 7
One important thing to keep in mind: wc -c counts bytes, not characters, and will not give the correct length for strings containing multibyte characters. Hence the use of wc -m.
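The difference can be seen with a single accented letter. The snippet below is a minimal sketch that assumes the text is UTF-8 encoded; the variable names are illustrative. The character count reported by wc -m depends on the current locale, so no fixed value is claimed for it here.

```shell
#!/bin/sh
# "é" written as its two UTF-8 bytes in octal, so this script stays ASCII.
s=$(printf '\303\251')

# wc -c counts bytes, whatever the locale is.
bytes=$(printf '%s' "$s" | wc -c | tr -d '[:space:]')

# wc -m counts characters according to the current locale; in a UTF-8
# locale it is expected to see one character, in the C locale it may see two.
chars=$(printf '%s' "$s" | wc -m | tr -d '[:space:]')

echo "bytes=$bytes chars=$chars"
```

If the two numbers differ on your system, the input contains multibyte characters and wc -m is the one that reflects the visible line length.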