bash: how to find files of the same size?

Notice: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/7541616/

Time: 2020-09-18 00:52:20  Source: igfitidea

How to find files with same size?

Tags: linux, bash, awk

Asked by Sandra Schlichting

I have a file structure like this:

a/file1
a/file2
a/file3
a/...
b/file1
b/file2
b/file3
b/...
...

Within each directory, some files have the same size, and I would like to delete those.

I guess that if the problem can be solved for one directory, e.g. a, I could then wrap a for loop around it:

for f in *; do
???
done

But how do I find files with the same size?

Answered by Kent

 ls -l|grep '^-'|awk '{if(a[$5]){ a[$5]=a[$5]"\n"$NF; b[$5]++;} else a[$5]=$NF} END{for(x in b)print a[x];}'

This checks only regular files, not directories.

$5 is the file size column in the output of ls -l.

Test:

kent@ArchT60:/tmp/t$ ls -l
total 16
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 a
-rw-r--r-- 1 kent kent 153 Sep 24 22:24 all
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 b
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 c
kent@ArchT60:/tmp/t$ ls -l|grep '^-'|awk '{if(a[$5]){ a[$5]=a[$5]"\n"$NF; b[$5]++;} else a[$5]=$NF} END{for(x in b)print a[x];}'
a
b
c
kent@ArchT60:/tmp/t$ 

Update based on Michał Šrajer's comment:

Filenames with spaces are now supported as well.

Command:

 ls -l|grep '^-'|awk '{ f=""; if(NF>9)for(i=9;i<=NF;i++)f=f?f" "$i:$i; else f=$9;
        if(a[$5]){ a[$5]=a[$5]"\n"f; b[$5]++;} else a[$5]=f}END{for(x in b)print a[x];}'

Test:

kent@ArchT60:/tmp/t$ l
total 24
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 a
-rw-r--r-- 1 kent kent 153 Sep 24 22:24 all
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 b
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 c
-rw-r--r-- 1 kent kent  51 Sep 24 22:40 x y

kent@ArchT60:/tmp/t$ ls -l|grep '^-'|awk '{ f=""
        if(NF>9)for(i=9;i<=NF;i++)f=f?f" "$i:$i; else f=$9;
        if(a[$5]){ a[$5]=a[$5]"\n"f; b[$5]++;} else a[$5]=f} END{for(x in b)print a[x];}'
a
b
c
x y

kent@ArchT60:/tmp/t$

Answered by Michał Šrajer

A solution that works with file names containing spaces (based on the posts by Kent (+1) and awiebe (+1)):

for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print $2; else a[$1]=1}' | xargs echo rm

To make it actually remove the duplicates, remove echo from the xargs command.
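
The question asks about wrapping this in a loop over several directories. A minimal sketch of that idea follows, assuming GNU stat and file names without tabs or newlines. The function name list_same_size_extras is hypothetical and not taken from any answer. It uses a tab rather than / as the separator, because paths such as a/file1 contain slashes, and it only prints candidates.

```shell
# Hypothetical helper (the name is an assumption, not from the answers above).
# Prints each regular file in directory $1 whose size equals that of a file
# listed earlier in the same directory. Nothing is deleted.
list_same_size_extras() {
    for file in "$1"/*; do
        [ -f "$file" ] || continue                # skip directories and other non-regular entries
        # GNU stat: %s is the size in bytes; emit "size<TAB>path"
        printf '%s\t%s\n' "$(stat -c %s "$file")" "$file"
    done | awk -F'\t' 'seen[$1]++ { print $2 }'   # print only repeats of a size
}
```

Called as `for d in */; do list_same_size_extras "${d%/}"; done`, this prints, for each directory, every file whose size matches an earlier file in that directory. Plain xargs splits its input on whitespace, so with GNU xargs the reviewed list could be passed on with `xargs -d '\n' rm --`.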

Answered by awiebe

Here is the code if you need the size of a file:

FILESIZE=$(stat -c%s "$FILENAME")
echo "Size of $FILENAME = $FILESIZE bytes."

Then use a for loop to take the first item in your structure and store the size of that file in a variable.

Nest a second for loop inside that loop to compare every other item in your structure (excluding the current item) with the current item.

Write the names of all matching files to a text file instead of executing rm immediately, so you can check that your script is written correctly.

Then execute rm on the contents of that file.
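
The steps above could be sketched roughly as follows. This is only one possible reading of the description, assuming GNU stat and file names without newlines. The function name collect_same_size and its two arguments are hypothetical.

```shell
# Hypothetical helper (name and arguments are assumptions for illustration).
# $1: directory to scan; $2: list file to write, which should live outside $1.
# The body runs in a subshell, so its variables do not leak into the caller.
collect_same_size() (
    dir=$1
    list=$2
    : > "$list"                                   # start with an empty list
    for first in "$dir"/*; do
        [ -f "$first" ] || continue
        first_size=$(stat -c %s "$first")         # GNU stat: size in bytes
        past_first=no
        for second in "$dir"/*; do
            # Skip the current item and everything before it, so each pair is compared once
            if [ "$second" = "$first" ]; then past_first=yes; continue; fi
            [ "$past_first" = yes ] || continue
            [ -f "$second" ] || continue
            if [ "$(stat -c %s "$second")" -eq "$first_size" ]; then
                printf '%s\n' "$second" >> "$list"
            fi
        done
    done
    sort -u -o "$list" "$list"                    # drop names recorded more than once
)
```

After reviewing the list file, the removal step could be a loop such as `while IFS= read -r f; do rm -- "$f"; done < list.txt`, where list.txt stands for whatever file name was passed as the second argument.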

Answered by destenson

Based on the accepted answer, the following lists all files of the same size in the current directory (so you can choose which one to keep), sorted by size:

for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 ls -lS

To determine whether the files are actually the same, and do not merely contain the same number of bytes, run shasum or md5sum on each file:

for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 -n1 shasum

Answered by Nick Olszanski

Plain bash solution:

find -not -empty -type f -printf "%s\n" |
sort -rn | uniq -d |
xargs -I{} -n1 find -type f -size {}c -print0 |
xargs -0 du | sort
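
Since equal size does not guarantee equal content, the size-based listings above can be followed by a checksum comparison, as destenson suggests. A minimal sketch, assuming GNU coreutils sha256sum (swapped in here for shasum or md5sum) and GNU or BSD find; the function name list_identical is hypothetical. It prints only files whose checksum is shared with at least one other file in the same directory.

```shell
# Hypothetical helper (the name is an assumption for illustration).
# Lists regular files directly inside directory $1 that share a SHA-256
# checksum with another file there, one "checksum  path" line per file.
list_identical() {
    find "$1" -maxdepth 1 -type f -exec sha256sum {} + |
        sort |                                    # equal checksums become adjacent
        awk '{ hash = $1 }
             hash == prev { if (!shown) print held; print; shown = 1 }
             hash != prev { shown = 0 }
             { prev = hash; held = $0 }'
}
```

Files that merely have the same size but different content do not appear in the output, so whatever is listed together is a candidate for removal.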

Answered by w00t

It looks like what you really want is a duplicate file finder?

Answered by nick

It sounds like this has been answered several times and in several different ways, so I may be beating a dead horse, but here goes...

find DIR_TO_RUN_ON -size SIZE_OF_FILE_TO_MATCH -exec rm {} \;

find is an awesome command and I highly recommend reading its manpage.
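
As a concrete illustration of the find approach, assuming GNU or BSD find: the c suffix makes -size count bytes, whereas a bare number is counted in 512-byte blocks and would not match an exact byte size. The function name find_by_size is hypothetical, the search is limited to one directory level, and the sketch prints matches rather than removing them.

```shell
# Hypothetical helper (the name is an assumption for illustration).
# Prints regular files directly inside directory $1 that are exactly $2 bytes long.
find_by_size() {
    find "$1" -maxdepth 1 -type f -size "${2}c" -print
}
```

For the 51-byte files in Kent's test directory this would be called as `find_by_size /tmp/t 51`. Replacing -print with `-exec rm -- {} +` would delete the matches once the output looks right.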
