bash: How to find duplicate filenames (recursively) in a given directory?

Question

Asked by yak

I need to find every duplicate filename in a given directory tree. I don't know which directory tree the user will pass as the script argument, so I don't know the directory hierarchy in advance. I tried this:

#!/bin/sh
find -type f | while IFS= read vo
do
echo `basename "$vo"`
done

But that's not really what I want. It finds only one duplicate and then ends, even if there are more duplicate filenames. It also doesn't print the whole path (only the filename) or the duplicate count. I wanted to do something similar to this command:

find DIRNAME | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 "

But it doesn't work for me, and I don't know why. Even when I have duplicates, it prints nothing. I use Xubuntu 12.04.

Answer 1

Answered by psibar

Here is another solution (based on the suggestion by @jim-mcnamara) without awk:

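Solution 1

#!/bin/sh
dirname=/path/to/directory
# List every basename that occurs more than once, then search again for each.
find "$dirname" -type f | sed 's_.*/__' | sort | uniq -d |
while IFS= read -r fileName
do
    # -F matches the name literally; the leading slash anchors it to the start of a path component.
    find "$dirname" -type f | grep -F -- "/$fileName"
done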

However, you have to do the same search twice. This can become very slow if you have to search a lot of data. Saving the find results in a temporary file might give better performance.

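Solution 2 (with temporary file)

#!/bin/sh
dirname=/path/to/directory
tempfile=myTempfileName
# Run find once and cache the list of paths.
find "$dirname" -type f > "$tempfile"
sed 's_.*/__' "$tempfile" | sort | uniq -d |
while IFS= read -r fileName
do
    # -F matches the name literally; the leading slash anchors it to the start of a path component.
    grep -F -- "/$fileName" "$tempfile"
done
#rm -f "$tempfile"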

Since you might not want to write a temp file to the hard drive in some cases, you can choose the method that fits your needs. Both examples print out the full path of the file.

Bonus question here: Is it possible to save the whole output of the find command as a list to a variable?
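On the bonus question: command substitution can hold the whole find output in a shell variable, so the file list is built once and no temporary file is written. The following is only a minimal sketch under the assumption that no filename contains a newline; the function name dupes_from_var is made up for illustration and is not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: dupes_from_var DIR prints the full path of every
# file whose basename occurs more than once under DIR.
dupes_from_var() {
    # Command substitution stores the newline-separated list of paths.
    list=$(find "$1" -type f)
    printf '%s\n' "$list" | sed 's_.*/__' | sort | uniq -d |
    while IFS= read -r name
    do
        # Reuse the stored list instead of running find a second time.
        # -F matches the name literally, but still as a substring.
        printf '%s\n' "$list" | grep -F -- "/$name"
    done
}
```

Called as dupes_from_var /path/to/directory, it prints the full path of each file whose name occurs more than once. Because grep -F still matches substrings, a file such as /x/a.txt.bak would also be listed for a.txt.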

Answer 2

Answered by jim mcnamara

#!/bin/sh
dirname=/path/to/check
# Print the basename of every file, then count how often each name occurs.
find "$dirname" -type f |
while IFS= read -r vo
do
  basename "$vo"
done | awk '{arr[$0]++; next} END{for (i in arr){if(arr[i]>1){print i}}}'

Answer 3

Answered by trs

Yes, this is a really old question. But all those loops and temporary files seem a bit cumbersome.

Here's my 1-line answer:

find /PATH/TO/FILES -type f -printf '%p/ %f\n' | sort -k2 | uniq -f1 --all-repeated=separate

It has its limitations due to uniq and sort:

no whitespace (space, tab) in the filename (it will be interpreted as a new field by uniq and sort)
needs the file name printed as the last field, delimited by a space (uniq doesn't support comparing only 1 field and is inflexible with field delimiters)

But it is quite flexible regarding its output thanks to find -printf, and it works well for me. It also seems to be what @yak tried to achieve originally.

Demonstrating some of the options you have with this:

find /PATH/TO/FILES -type f -printf 'size: %s bytes, modified at: %t, path: %h/, file name: %f\n' | sort -k15 | uniq -f14 --all-repeated=prepend

There are also options in sort and uniq to ignore case (as the topic opener intended to achieve by piping through tr). Look them up using man uniq or man sort.
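The whitespace limitation comes from splitting fields on blanks. One possible way around it, sketched here and not taken from the original answer, is to let awk split each path on "/" so that the last field is the complete filename, spaces included. The function name list_dupes is hypothetical, and filenames containing newlines are still assumed not to occur.

```shell
#!/bin/sh
# Hypothetical helper: list_dupes DIR prints each basename that occurs more
# than once under DIR, with its count, followed by the matching full paths.
list_dupes() {
    find "$1" -type f | awk -F/ '
        {
            count[$NF]++                      # $NF is the basename, spaces included
            paths[$NF] = paths[$NF] "\n" $0   # collect every full path per name
        }
        END {
            for (name in count)
                if (count[name] > 1)
                    printf "%s (%d):%s\n", name, count[name], paths[name]
        }'
}
```

list_dupes /PATH/TO/FILES prints each duplicated name with its count, followed by the matching paths. Using tolower($NF) as the array key instead of $NF would make the comparison case-insensitive.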

Answer 4

Answered by Fabien Bouleau

One "find" command only:

# Keep the whole file list in a variable so find runs only once.
lst=$( find . -type f )
echo "$lst" | rev | cut -f 1 -d/ | rev | sort -f | uniq -i | while IFS= read -r f; do
   names=$( echo "$lst" | grep -i -- "/$f$" )
   n=$( echo "$names" | wc -l )
   [ "$n" -gt 1 ] && echo -e "Duplicates found ($n):\n$names"
done

Answer 5

Answered by Elisiano Petrini

#!/bin/bash

file=`mktemp /tmp/duplicates.XXXXX` || { echo "Error creating tmp file"; exit 1; }
# The first argument is the directory to search.
find "$1" -type f | sort > "$file"
# uniq only counts adjacent lines, so the lowercased names are sorted first.
awk -F/ '{print tolower($NF)}' "$file" |
        sort |
        uniq -c |
        awk '$1>1 { sub(/^[[:space:]]+[[:digit:]]+[[:space:]]+/,""); print }' |
        while IFS= read -r line;
                do grep -i "$line" "$file";
        done

rm "$file"

It also works with spaces in filenames. Here's a simple test (the first argument is the directory):

./duplicates.sh ./test
./test/2/INC 255286
./test/INC 255286

Answer 6

Answered by Mike Finch

This solution writes one temporary file to a temporary directory for every unique filename found. In the temporary file, I write the path where I first found the unique filename, so that I can output it later. So, I create a lot more files than the other posted solutions. But it was something I could understand.

Following is the script, named fndupe.

#!/bin/bash

# Create a temp directory to contain placeholder files.
tmp_dir=`mktemp -d`

# Get paths of files to test from standard input.
while IFS= read -r p; do
  fname=$(basename "$p")
  tmp_path=$tmp_dir/$fname
  if [[ -e $tmp_path ]]; then
    q=`cat "$tmp_path"`
    echo "duplicate: $p"
    echo "    first: $q"
  else
    echo "$p" > "$tmp_path"
  fi
done

exit

Following is an example of using the script.

$ find . -name '*.tif' | fndupe

Following is example output when the script finds duplicate filenames.

duplicate: a/b/extra/gobble.tif
    first: a/b/gobble.tif

Tested with Bash version: GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
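The same kind of report can also be produced without placeholder files by remembering the first path seen for each filename in an awk array. This is a sketch of an alternative, not the author's script; the function name fndupe_awk is made up, and paths are read from standard input in the same way as with fndupe.

```shell
#!/bin/sh
# Hypothetical alternative to fndupe: reads paths on standard input and
# reports every path whose basename has already been seen.
fndupe_awk() {
    awk -F/ '
        ($NF in first) {
            # This basename was seen before: report both paths.
            print "duplicate: " $0
            print "    first: " first[$NF]
            next
        }
        { first[$NF] = $0 }   # remember where each basename first appeared
    '
}
```

It would be used in the same way, for example find . -name '*.tif' | fndupe_awk, and leaves no temporary directory behind.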

Answer 7

Answered by Benjamin Frazier

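Here is my contribution. It only searches for a specific file type (PDFs in this case), but it does so recursively:

#!/usr/bin/env bash

find . -type f | while IFS= read -r filename; do
    filename=$(basename -- "$filename")
    extension="${filename##*.}"
    if [[ $extension == "pdf" ]]; then
        # Count all files with this name, ignoring case.
        fileNameCount=`find . -iname "$filename" | wc -l`
        if [[ $fileNameCount -gt 1 ]]; then
            echo "File Name: $filename, count: $fileNameCount"
        fi
    fi
done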
