Bash: Find file with max lines count

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/8488301/

Time: 2020-09-18 01:10:08  Source: igfitidea

Tags: bash, unix, sed, awk, wc

Asked by Marek Sebera

This is my attempt:

  • Find all *.java files
    find . -name '*.java'
  • Count lines
    wc -l
  • Delete the last line (the "total" summary printed by wc)
    sed '$d'
  • Use AWK to find the max line count in the wc output
    awk 'max=="" || data=="" || $1 > max {max=$1 ; data=$2} END{ print max " " data}'

Then merge it into a single line:

find . -name '*.java' | xargs wc -l | sed '$d' | awk 'max=="" || data=="" || $1 > max {max=$1 ; data=$2} END{ print max " " data}'

Can I somehow implement counting just non-blank lines?

Answered by Shawn Chin

find . -type f -name "*.java" -exec grep -H -c '[^[:space:]]' {} \; | \
    sort -nr -t":" -k2 | awk -F: '{print $1; exit;}'

Replace the awk command with head -n1 if you also want to see the number of non-blank lines.

Breakdown of the command:

find . -type f -name "*.java" -exec grep -H -c '[^[:space:]]' {} \; 
'---------------------------'       '-----------------------'
             |                                   |
   for each *.java file             Use grep to count non-empty lines
                                   -H includes filenames in the output
                                 (output = ./full/path/to/file.java:count)

| sort -nr -t":" -k2  | awk -F: '{print $1; exit;}'
  '----------------'    '---------------------------'
          |                            |
  Sort the output in         Print filename of the first entry (largest count)
reverse order using the         then exit immediately
  second column (count)
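
To try the pipeline above without touching a real source tree, here is a minimal sketch that wraps it in a function and runs it on a throwaway directory. The function name `largest_nonblank` and the sample files are hypothetical, made up for illustration; the sketch also assumes no path contains a colon, since `:` is the field separator.

```shell
# Hypothetical helper: print the *.java file under "$1" with the most non-blank lines.
largest_nonblank() {
    find "$1" -type f -name '*.java' -exec grep -H -c '[^[:space:]]' {} \; |
        sort -nr -t: -k2 | awk -F: '{ print $1; exit }'
}

# Throwaway sample data (assumed layout, not from the original question).
demo=$(mktemp -d)
printf 'class A {}\n\n\n\n' > "$demo/A.java"         # 4 lines, only 1 non-blank
printf 'class B {\n  int x;\n}\n' > "$demo/B.java"   # 3 lines, all non-blank

largest_nonblank "$demo"    # prints the path of B.java
rm -r "$demo"
```

A.java has more lines overall, but B.java wins because only non-blank lines are counted.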

Answered by Vijay

find . -name "*.java" -type f | xargs wc -l | sort -rn | grep -v ' total$' | head -1

Answered by holygeek

Something like this might work:

find . -name '*.java' | while IFS= read -r filename; do
    # count the lines that are neither empty nor whitespace-only
    nlines=$(grep -c -v -E '^[[:space:]]*$' "$filename")
    echo "$nlines $filename"
done | sort -nr | head -1

(edited as per Ed Morton's comment. I must have had too much coffee :-) )

Answered by Ed Morton

To get the size (line count) of every file using awk, just run:

$ find . -name '*.java' -print0 | xargs -0 awk '
BEGIN { for (i=1;i<ARGC;i++) size[ARGV[i]]=0 }
{ size[FILENAME]++ }
END { for (file in size) print size[file], file }
'

To get the count of the non-empty lines, simply make the line where you increment the size[] conditional:

$ find . -name '*.java' -print0 | xargs -0 awk '
BEGIN { for (i=1;i<ARGC;i++) size[ARGV[i]]=0 }
NF { size[FILENAME]++ }
END { for (file in size) print size[file], file }
'

(NF is zero for lines that contain only blanks, so such lines are already treated as "empty". If you want to count them as non-empty, replace NF with /^./.)
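
The difference between the two conditions is easiest to see on a tiny input. This is a small sketch, with hypothetical helper names `count_nf` and `count_any`, that feeds the same three lines (one with text, one with only spaces, one truly empty) to both patterns:

```shell
# Hypothetical helpers: count the lines on stdin that each awk condition accepts.
count_nf()  { awk 'NF   { n++ } END { print n+0 }'; }   # at least one non-blank field
count_any() { awk '/^./ { n++ } END { print n+0 }'; }   # at least one character of any kind

printf 'int x;\n   \n\n' | count_nf     # prints 1
printf 'int x;\n   \n\n' | count_any    # prints 2
```

With NF the whitespace-only line is skipped; with /^./ it is counted, and only the truly empty line is skipped.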

To get only the file with the most non-empty lines just tweak again:

$ find . -name '*.java' -print0 | xargs -0 awk '
BEGIN { for (i=1;i<ARGC;i++) size[ARGV[i]]=0 }
NF { size[FILENAME]++ }
END {
   for (file in size) {
      if (size[file] >= maxSize) {
         maxSize = size[file]
         maxFile = file
      }
   }
   print maxSize, maxFile
}
'
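
One caveat worth keeping in mind: when the file list is too long for a single command line, xargs runs awk more than once, and each run prints its own maximum. A possible way to handle that, sketched here under the assumption that a final sort is acceptable, is to reduce the per-run maxima to a single line. The wrapper name `max_nonblank` is hypothetical.

```shell
# Hypothetical wrapper: awk prints one maximum per xargs batch,
# then sort/head keep only the overall largest.
max_nonblank() {
    find "$1" -name '*.java' -print0 | xargs -0 awk '
    BEGIN { for (i=1;i<ARGC;i++) size[ARGV[i]]=0 }
    NF { size[FILENAME]++ }
    END {
       for (file in size) {
          if (size[file] >= maxSize) {
             maxSize = size[file]
             maxFile = file
          }
       }
       print maxSize, maxFile
    }' | sort -nr | head -n1
}

# Usage: max_nonblank .
```

When everything fits in one batch, the extra sort and head simply pass the single line through unchanged.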