Linux: Fastest Way To Calculate Directory Sizes

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/4307692/


Fastest Way To Calculate Directory Sizes

Tags: linux, file-io, filesystems

Asked by Justin

What is the best and fastest way to calculate directory sizes? For example, we will have the following structure:


/users
      /a
      /b
      /c
      /...

We need the output to be per user directory:


a = 1224KB
b = 3533KB
c = 3324KB
...

We plan on having tens, maybe even hundreds of thousands of directories under /users. The following shell command works:


du -cms /users/a | grep total | awk '{print $1}'

But we would have to call it N times. The whole point is the output: each user's directory size will be stored in our database. We would also love to have it update as frequently as possible, but without tying up all the resources on the server. Is it even possible to have it recalculate every user's directory size every minute? How about every 5 minutes?

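For the scheduling part, a cron entry would be the obvious approach. A minimal sketch, assuming the results are written to a flat file that a separate job then imports (the output path is made up):

# Recompute all per-user directory sizes every 5 minutes
*/5 * * * * du -sm /users/* > /var/tmp/user-sizes.txt 2>/dev/null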

Now that I am thinking about it some more, would it make sense to use node.js? That way, we could calculate the directory sizes and insert them into the database all in one transaction. We could also do that in PHP or Python, but I am not sure it would be as fast.

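For getting the numbers into the database, a plain shell loop over du's output might already be enough. A minimal sketch, assuming a MySQL database mydb with a table user_sizes(user, mb) (both names are hypothetical):

#!/bin/sh
# du -sm prints "<size in MB><TAB><path>" for each argument
du -sm /users/* | while read -r size path; do
  user=$(basename "$path")
  # NB: no quoting/escaping of $user here; fine for plain directory names only
  printf "REPLACE INTO user_sizes (user, mb) VALUES ('%s', %s);\n" "$user" "$size"
done | mysql mydb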

Thanks.


Answered by Roland Illig

What do you need this information for? If it's only for reminding users that their home directories are too big, you should add quota limits to the filesystem. You can set the quota to 1000 GB if you just want the numbers without actually limiting disk usage.


The numbers are usually accurate whenever you access anything on the disk. The only downside is that they tell you how large the files owned by a particular user are, instead of how large the files below his home directory are. But maybe you can live with that.

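For completeness, on a filesystem mounted with the usrquota option the accounted numbers can be read back with repquota, and setquota sets the limits. A short sketch, assuming /users is its own filesystem:

# Report per-user block (KB) and inode usage straight from the quota accounting
repquota /users

# setquota -u user block-soft block-hard inode-soft inode-hard filesystem
# Here: no soft limit, ~1000 GB hard limit (in 1 KB blocks), no inode limits
setquota -u a 0 1000000000 0 0 /users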

Answered by HaskellElephant

I think what you are looking for is:


du -cm --max-depth=1 /users | awk '{user = substr($2,7,300);
                                    ans = user ": " $1;
                                    print ans}'

The magic number 7 strips the leading substring /users/, and 300 is just an arbitrarily large number (awk is not one of my best languages =D, but I am guessing that part is not going to be written in awk anyway). It's faster because you don't grep for the total and the loop is contained inside du. I bet it can be done faster, but this should be fast enough.

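A variant of the same idea without the magic numbers: let awk split the path itself and keep only the last component (a sketch; like the original, it assumes directory names contain no whitespace):

# $1 is the size in MB, $2 the path; print only the last path component
du -cm --max-depth=1 /users | awk '{n = split($2, parts, "/"); print parts[n] ": " $1}'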

Answered by caf

Why not just:


du -sm /users/*

(The slowest part is still likely to be du traversing the filesystem to calculate the size, though.)


Answered by morpheus

Not that slow, but it will show you folder sizes: du -sh /* > total.size.files.txt

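One caveat with -h: human-readable sizes don't sort numerically. With GNU coreutils, sort -h understands the suffixes:

# Largest directories last; sort -h (--human-numeric-sort) parses the K/M/G suffixes
du -sh /* 2>/dev/null | sort -h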