bash: How to find files of the same size?

Question

Asked by Sandra Schlichting

I have a file structure like so

a/file1
a/file2
a/file3
a/...
b/file1
b/file2
b/file3
b/...
...

where within each dir, some files have the same file size, and I would like to delete those.

I guess if the problem could be solved for one dir e.g. dir a, then I could wrap a for-loop around it?

for f in *; do
???
done

But how do I find files with same size?

Answer 1

Answered by Kent

 ls -l|grep '^-'|awk '{if(a[$5]){ a[$5]=a[$5]"\n"$NF; b[$5]++;} else a[$5]=$NF} END{for(x in b)print a[x];}'

This only checks regular files, not directories.

$5 is the size column in the ls -l output.

test:

kent@ArchT60:/tmp/t$ ls -l
total 16
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 a
-rw-r--r-- 1 kent kent 153 Sep 24 22:24 all
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 b
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 c
kent@ArchT60:/tmp/t$ ls -l|grep '^-'|awk '{if(a[$5]){ a[$5]=a[$5]"\n"$NF; b[$5]++;} else a[$5]=$NF} END{for(x in b)print a[x];}'
a
b
c
kent@ArchT60:/tmp/t$

Update based on Michał Šrajer's comment:

Filenames with spaces are now also supported.

command:

 ls -l|grep '^-'|awk '{ f=""; if(NF>9)for(i=9;i<=NF;i++)f=f?f" "$i:$i; else f=$9;
        if(a[$5]){ a[$5]=a[$5]"\n"f; b[$5]++;} else a[$5]=f}END{for(x in b)print a[x];}'

test:

kent@ArchT60:/tmp/t$ l
total 24
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 a
-rw-r--r-- 1 kent kent 153 Sep 24 22:24 all
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 b
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 c
-rw-r--r-- 1 kent kent  51 Sep 24 22:40 x y

kent@ArchT60:/tmp/t$ ls -l|grep '^-'|awk '{ f=""
        if(NF>9)for(i=9;i<=NF;i++)f=f?f" "$i:$i; else f=$9;
        if(a[$5]){ a[$5]=a[$5]"\n"f; b[$5]++;} else a[$5]=f} END{for(x in b)print a[x];}'
a
b
c
x y

kent@ArchT60:/tmp/t$
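
Parsing ls -l depends on its column layout and collapses runs of spaces inside file names. A minimal sketch of the same size grouping that reads each size with stat instead, assuming bash 4 or later (for associative arrays) and GNU coreutils stat; the function name list_same_size is hypothetical:

```shell
#!/usr/bin/env bash
# Print every regular file in a directory that has the same byte size as at
# least one other regular file there. Assumes bash 4+ and GNU stat.
# Hidden files are not matched by the glob and are therefore ignored.
list_same_size() {
  local dir=$1 f size
  local -A count=()   # size -> number of files with that size
  local -A names=()   # size -> newline-terminated file names
  for f in "$dir"/*; do
    [ -f "$f" ] || continue            # skip directories and special files
    size=$(stat -c '%s' -- "$f")
    count[$size]=$(( ${count[$size]:-0} + 1 ))
    names[$size]+="$f"$'\n'
  done
  for size in "${!count[@]}"; do
    if (( count[$size] > 1 )); then
      printf '%s' "${names[$size]}"
    fi
  done
}
```

For the a/, b/, ... layout in the question it can go inside the loop the asker proposed:
for d in */; do list_same_size "${d%/}"; done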

Answer 2

Answered by Michał Šrajer

A solution that works with file names containing spaces (based on the posts by Kent (+1) and awiebe (+1)):

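for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print $2; else a[$1]=1}' | xargs -d '\n' echo rm

To actually remove the duplicates, remove echo from the xargs command. (xargs -d '\n' is a GNU option that splits the input on newlines only, so file names containing spaces reach rm intact.)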
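
The question asks for this per directory. A sketch under the same assumptions (bash 4+, GNU stat) that keeps the first file of each size in glob order and reports the others; the function dedupe_by_size and the DELETE switch are hypothetical names, and nothing is removed unless DELETE=1 is set:

```shell
#!/usr/bin/env bash
# For each directory argument, keep the first file of every size (glob order)
# and report the rest. Dry run unless the hypothetical DELETE variable is 1.
# Assumes bash 4+ and GNU stat.
dedupe_by_size() {
  local dir f size
  local -A seen
  for dir in "$@"; do
    seen=()                              # sizes already kept in this directory
    for f in "$dir"/*; do
      [ -f "$f" ] || continue            # regular files only
      size=$(stat -c '%s' -- "$f")
      if [[ -n ${seen[$size]:-} ]]; then
        if [[ ${DELETE:-0} == 1 ]]; then
          rm -- "$f"
        else
          printf 'would remove: %s\n' "$f"
        fi
      else
        seen[$size]=1
      fi
    done
  done
}
```

Usage: run dedupe_by_size a b first, check the "would remove" lines, then run DELETE=1 dedupe_by_size a b.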

Answer 3

Answered by awiebe

Here is how to get the size of a file:

FILESIZE=$(stat -c%s "$FILENAME")
echo "Size of $FILENAME = $FILESIZE bytes."

Then use a for loop to get the first item in your structure and store the size of that file in a variable.

Inside that loop, nest a second for loop that compares each item in your structure (excluding the current item) to the current item.

Write the names of all files with matching sizes to a text file instead of executing rm immediately, so you can check that your script is correct.

Execute rm on the contents of this file.
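
The nested-loop approach described above can be sketched as follows. Assumptions: GNU stat, a hypothetical function name collect_same_size, and the rule that the first file of each size (in glob order) is kept, since the answer does not say which copy survives:

```shell
#!/usr/bin/env bash
# Nested-loop version: a file is listed if any earlier file (glob order) in
# the same directory has the same byte size, so the first copy is kept.
# Assumes GNU stat; collect_same_size is a hypothetical name.
collect_same_size() {
  local dir=$1 list=$2 f i j size_i
  local -a files=()
  for f in "$dir"/*; do
    [ -f "$f" ] && files+=("$f")       # regular files only
  done
  : > "$list"                          # start with an empty list file
  for (( i = 0; i < ${#files[@]}; i++ )); do
    size_i=$(stat -c%s -- "${files[i]}")
    for (( j = 0; j < i; j++ )); do    # compare against earlier files only
      if [ "$(stat -c%s -- "${files[j]}")" = "$size_i" ]; then
        printf '%s\n' "${files[i]}" >> "$list"
        break                          # one match is enough
      fi
    done
  done
}
```

After reviewing the list, GNU xargs can pass it to rm one name per line, e.g. xargs -d '\n' rm -- < dupes.txt (dupes.txt is whatever list file you chose; keep it outside the scanned directory). File names containing newlines would break this one-name-per-line format, and the inner loop makes O(n²) stat calls, which is fine for small directories.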

Answer 4

Answered by destenson

Based on the accepted answer, the following provides a list of all the files of the same size in the current directory (so you can choose which one to keep), sorted by size:

for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 ls -lS

To determine whether the files are actually identical, and do not just contain the same number of bytes, run shasum or md5sum on each file:

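for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 -n1 shasum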
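
Equal checksums are strong evidence of equal content, while cmp gives a definitive byte-by-byte answer. A sketch that compares sizes first (cheap) and only runs cmp -s on same-size pairs, assuming GNU stat and a hypothetical function name list_identical:

```shell
#!/usr/bin/env bash
# Report each file whose contents are byte-for-byte identical to an earlier
# file (glob order) in the same directory. Assumes GNU stat.
list_identical() {
  local dir=$1 f g i j
  local -a files=()
  for f in "$dir"/*; do
    [ -f "$f" ] && files+=("$f")       # regular files only
  done
  for (( i = 1; i < ${#files[@]}; i++ )); do
    for (( j = 0; j < i; j++ )); do
      f=${files[i]} g=${files[j]}
      # Size check first; cmp -s is silent and exits 0 only if contents match.
      if [ "$(stat -c%s -- "$f")" = "$(stat -c%s -- "$g")" ] && cmp -s -- "$f" "$g"; then
        printf '%s is identical to %s\n' "$f" "$g"
        break
      fi
    done
  done
}
```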

Answer 5

Answered by Nick Olszanski

Plain bash solution

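find -not -empty -type f -printf "%s\n" |
sort -rn | uniq -d |
xargs -I{} -n1 find -type f -size {}c -print0 |
xargs -0 du | sort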

Answer 6

Answered by w00t

Looks like what you really want is a duplicate file finder?

Answer 7

Answered by nick

It sounds like this has been answered several times and in several different ways, so I may be beating a dead horse, but here goes...

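find DIR_TO_RUN_ON -size SIZE_OF_FILE_TO_MATCH -exec rm {} \;

SIZE_OF_FILE_TO_MATCH should be the size in bytes followed by c (for example -size 51c); a bare number is read as a count of 512-byte blocks, rounded up.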

find is an awesome command and I highly recommend reading its manpage.
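
A sketch that lists, rather than deletes, the files in one directory (not its subdirectories) with exactly the same byte size as a reference file, excluding the reference itself. It assumes GNU find (-maxdepth, -samefile) and GNU stat; same_size_as is a hypothetical name:

```shell
#!/usr/bin/env bash
# List files in DIR (depth 1 only) whose byte size equals that of REF,
# excluding REF itself. Assumes GNU find and GNU stat.
same_size_as() {
  local dir=$1 ref=$2 size
  size=$(stat -c%s -- "$ref")
  # The "c" suffix makes -size compare bytes instead of 512-byte blocks.
  find "$dir" -maxdepth 1 -type f -size "${size}c" ! -samefile "$ref" -print
}
```

Once the list looks right, -print can be replaced with -delete to remove the matches.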
