Linux: How to find all of the distinct file extensions in a folder hierarchy

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1842254/

Date: 2020-08-03 17:57:13  Source: igfitidea

How can I find all of the distinct file extensions in a folder hierarchy?

Tags: linux, grep, filesystems, file-extension

Asked by GloryFish

On a Linux machine I would like to traverse a folder hierarchy and get a list of all of the distinct file extensions within it.

What would be the best way to achieve this from a shell?

Accepted answer by Ivan Nevostruev

Try this (not sure if it's the best way, but it works):

find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u

It works as follows:

  • Find all files under the current folder
  • Print the extension of each file, if it has one
  • Make a unique sorted list
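
For readers who prefer Python over Perl, the three steps above can be sketched as follows. This is only an illustration, not part of the original answer: the helper name `distinct_extensions` is made up, and it applies the same regular expression to each file name. Note that `os.walk` lists every non-directory entry (including symlinks), which is slightly broader than `find -type f`.

```python
import os
import re

# Same pattern as the Perl one-liner: a dot followed by characters
# that are neither a dot nor a slash, anchored at the end of the name.
EXT_RE = re.compile(r"\.([^./]+)$")

def distinct_extensions(root="."):
    """Return a sorted list of distinct extensions under root (hypothetical helper)."""
    found = set()
    # Step 1: visit every file below root.
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Step 2: keep the extension, if the name has one.
            match = EXT_RE.search(name)
            if match:
                found.add(match.group(1))
    # Step 3: unique (via the set) and sorted.
    return sorted(found)
```

Calling `print("\n".join(distinct_extensions(".")))` from the folder of interest should give output comparable to the shell pipeline.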

Answer by ChristopheD

Recursive version:

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u

If you want totals (how many times each extension was seen):

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort | uniq -c | sort -rn

Non-recursive (single folder):

for f in *.*; do printf "%s\n" "${f##*.}"; done | sort -u

I've based this upon this forum post, credit should go there.

Answer by user224243

Find everything with a dot and show only the suffix.

find . -type f -name "*.*" | awk -F. '{print $NF}' | sort -u

If you know all suffixes have 3 characters, then:

find . -type f -name "*.???" | awk -F. '{print $NF}' | sort -u

Or, with sed, show all suffixes of one to four characters. Change {1,4} to the range of lengths you expect in the suffix.

find . -type f | sed -n 's/.*\.\(.\{1,4\}\)$/\1/p' | sort -u
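
The length range idea can also be sketched in Python. This is an illustrative sketch, not the answer's method: the helper name `short_extensions` and its `min_len` and `max_len` parameters are hypothetical. Unlike the sed version, it looks only at base names, so a dot in a directory name cannot be mistaken for the start of a suffix.

```python
import os
import re

def short_extensions(root=".", min_len=1, max_len=4):
    """Collect extensions whose length lies in [min_len, max_len] (hypothetical helper)."""
    # Plays the role of the sed range \{1,4\}: a dot, then min_len..max_len
    # non-dot characters, anchored at the end of the file name.
    pattern = re.compile(r"\.([^.]{%d,%d})$" % (min_len, max_len))
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            match = pattern.search(name)
            if match:
                found.add(match.group(1))
    return sorted(found)
```

For example, `short_extensions(".", 1, 3)` would skip a file such as `page.html`, because its suffix has four characters.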

Answer by ChristopheD

Since there's already another solution which uses Perl:

If you have Python installed you could also do (from the shell):

python -c "import os;e=set();[[e.add(os.path.splitext(f)[-1]) for f in fn]for _,_,fn in os.walk('/home')];print('\n'.join(e))"

Answer by ChristopheD

None of the replies so far deal with filenames with newlines properly (except for ChristopheD's, which just came in as I was typing this). The following is not a shell one-liner, but works, and is reasonably fast.

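import os, sys

def names(roots):
    # Yield the base name of every file found under each root directory.
    for root in roots:
        for a, b, basenames in os.walk(root):
            for basename in basenames:
                yield basename

# Collect the distinct suffixes; roots are taken from the command line.
sufs = set(os.path.splitext(x)[1] for x in names(sys.argv[1:]))
for suf in sufs:
    if suf:  # skip files that have no extension
        print(suf)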
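
To illustrate why newlines in file names matter, here is a small sketch. It assumes a POSIX filesystem that allows newlines in names, and the file name `data.old\nfinal.csv` is made up for the demonstration. It compares what `os.walk` reports with what a line-based tool would see if the same name were split at the newline.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # A legal but awkward file name containing a newline (made up for the demo).
    tricky = "data.old\nfinal.csv"
    open(os.path.join(root, tricky), "w").close()

    names = [n for _, _, files in os.walk(root) for n in files]

    # os.walk yields the name as a single item, so splitext sees the whole name.
    walked = {os.path.splitext(n)[1] for n in names}

    # A line-based pipeline would treat the two halves as separate "files".
    line_based = {os.path.splitext(line)[1] for n in names for line in n.split("\n")}

print(sorted(walked))
print(sorted(line_based))
```

The line-based view reports an extra `.old` suffix that no file actually has, which is the failure mode the answer refers to.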

Answer by Simon R

PowerShell:

dir -recurse | select-object extension -unique

Thanks to http://kevin-berridge.blogspot.com/2007/11/windows-powershell.html

Answer by SiegeX

No need for the pipe to sort, awk can do it all:

find . -type f | awk -F. '!a[$NF]++{print $NF}'
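
The `!a[$NF]++` idiom prints a value only the first time it is seen, so the output is deduplicated but in first-seen order rather than sorted. A rough Python equivalent, as a sketch with a made-up helper name `first_seen` and made-up sample paths:

```python
def first_seen(items):
    """Yield each item the first time it appears, preserving input order."""
    seen = set()
    for item in items:
        if item not in seen:   # like !a[$NF] being true on the first sighting
            seen.add(item)     # like the ++ that marks the key as seen
            yield item

paths = ["./a.txt", "./b.py", "./c.txt", "./d.tar.gz"]  # made-up sample input
# Mimic awk -F. '{print $NF}': take the text after the last dot.
print(list(first_seen(p.rsplit(".", 1)[-1] for p in paths)))
# prints ['txt', 'py', 'gz']
```

If sorted output is wanted, the result can still be passed through `sorted()` (or `sort` in the shell version).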

Answer by Andres Restrepo

In Python using generators for very large directories, including blank extensions, and getting the number of times each extension shows up:

import json
import collections
import itertools
import os

root = '/home/andres'
files = itertools.chain.from_iterable((
    files for _,_,files in os.walk(root)
    ))
counter = collections.Counter(
    (os.path.splitext(file_)[1] for file_ in files)
)
print(json.dumps(counter, indent=2))

Answer by jrock2004

You could also do this:

find . -type f -name "*.php" -exec PATHTOAPP {} +

Answer by gkb0986

Adding my own variation to the mix. I think it's the simplest of the lot and can be useful when efficiency is not a big concern.

find . -type f | grep -o -E '\.[^\.]+$' | sort -u