How to split a huge folder on Windows?
Disclaimer: This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/4766047/
How to split a huge folder?
Asked by Kai Wang
We have a folder on Windows that's ... huge. I ran "dir > list.txt". The command stopped responding after 1.5 hours. The output file is about 200 MB, and it shows there are at least 2.8 million files. I know the situation is stupid, but let's focus on the problem itself. If I have such a folder, how can I split it into "manageable" sub-folders? Surprisingly, every solution I have come up with involves getting all the files in the folder at some point, which is a no-no in my case. Any suggestions?
Thanks to Keith Hill and Mehrdad. I accepted Keith's answer because that's exactly what I wanted to do, but I couldn't quite get PS to work quickly enough.
With Mehrdad's tip, I wrote this little program. It took 7+ hours to move 2.8 million files. So the initial dir command did finish, but somehow it never returned to the console.
using System;
using System.IO;

namespace SplitHugeFolder
{
    class Program
    {
        // args[0] = source folder, args[1] = destination folder, args[2] = batch size
        static void Main(string[] args)
        {
            var destination = args[1];
            if (!Directory.Exists(destination))
                Directory.CreateDirectory(destination);

            var di = new DirectoryInfo(args[0]);
            var batchCount = int.Parse(args[2]);

            int currentBatch = 0;
            string targetFolder = GetNewSubfolder(destination);

            // EnumerateFiles streams results instead of buffering all 2.8M FileInfos.
            foreach (var fileInfo in di.EnumerateFiles())
            {
                if (currentBatch == batchCount)
                {
                    Console.WriteLine("New Batch...");
                    currentBatch = 0;
                    targetFolder = GetNewSubfolder(destination);
                }

                var source = fileInfo.FullName;
                var target = Path.Combine(targetFolder, fileInfo.Name);
                File.Move(source, target);
                currentBatch++;
            }
        }

        private static string GetNewSubfolder(string parent)
        {
            // Pick a random folder name that doesn't collide with an existing one.
            string newFolder;
            do
            {
                newFolder = Path.Combine(parent, Path.GetRandomFileName());
            } while (Directory.Exists(newFolder));
            Directory.CreateDirectory(newFolder);
            return newFolder;
        }
    }
}
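For reference, the program takes three positional arguments: the source folder, the destination folder, and the batch size. A hypothetical invocation from a PowerShell prompt (the executable name and all paths below are example values, not from the original post):

# Example values only: split C:\hugedir into C:\newdir\<random> subfolders
# of at most 100000 files each.
.\SplitHugeFolder.exe C:\hugedir C:\newdir 100000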
Accepted answer by Keith Hill
I use Get-ChildItem to index my whole C: drive every night into c:\filelist.txt. That's about 580,000 files and the resulting file size is ~60MB. Admittedly I'm on Win7 x64 with 8 GB of RAM. That said, you might try something like this:
md c:\newdir
Get-ChildItem C:\hugedir -r |
    Foreach -Begin {$i = $j = 0} -Process {
        if ($i++ % 100000 -eq 0) {
            $dest = "C:\newdir\dir$j"
            md $dest
            $j++
        }
        Move-Item $_ $dest
    }
The key is to do the move in a streaming manner. That is, don't collect all the Get-ChildItem results into a single variable and then proceed; that would require all 2.8 million FileInfos to be in memory at once. Also, if you use the Name parameter on Get-ChildItem, it outputs a single string per file containing the file's path relative to the base dir. Even then, this size may simply overwhelm the memory available to you. And no doubt it will take quite a while to execute. IIRC, my indexing script takes several hours.
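To make the streaming point concrete, here is a minimal sketch (my addition, not from the original answer) contrasting the buffered anti-pattern with the pipeline form; C:\hugedir is a placeholder path:

# Anti-pattern: materializes every FileInfo in memory before the loop starts.
$all = Get-ChildItem C:\hugedir -r
foreach ($f in $all) { <# process $f #> }

# Streaming: each item flows through the pipeline as it is enumerated,
# so memory use stays roughly flat regardless of folder size.
Get-ChildItem C:\hugedir -r | ForEach-Object { <# process $_ #> }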
If it does work, you should wind up with c:\newdir\dir0 thru dir28, but then again, I haven't tested this script at all, so your mileage may vary. BTW, this approach assumes that your huge dir is a pretty flat dir.
Update: Using the Name parameter is almost twice as slow, so don't use that parameter.
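If you want to verify this on your own data, a quick comparison along these lines should do (my addition; the path is a placeholder):

# -Name makes Get-ChildItem emit relative-path strings instead of FileInfos.
Measure-Command { Get-ChildItem C:\hugedir -r | Out-Null }
Measure-Command { Get-ChildItem C:\hugedir -r -Name | Out-Null }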
Answered by stej
I found out that Get-ChildItem is the slowest option when working with many items in a directory.
Look at the results:
Measure-Command { Get-ChildItem C:\Windows -rec | Out-Null }
TotalSeconds : 77,3730275

Measure-Command { listdir C:\Windows | Out-Null }
TotalSeconds : 20,4077132

Measure-Command { cmd /c dir c:\windows /s /b | Out-Null }
TotalSeconds : 13,8357157
(with the listdir function defined like this:
function listdir($dir) {
    $dir
    [system.io.directory]::GetFiles($dir)
    foreach ($d in [system.io.directory]::GetDirectories($dir)) {
        listdir $d
    }
}
)
With this in mind, here is what I would do: I would stay in PowerShell but use a more low-level approach with .NET methods:
function DoForFirst($directory, $max, $action) {
    function go($dir, $options)
    {
        foreach ($f in [system.io.Directory]::EnumerateFiles($dir))
        {
            if ($options.Remaining -le 0) { return }
            & $action $f
            $options.Remaining--
        }
        foreach ($d in [system.io.directory]::EnumerateDirectories($dir))
        {
            if ($options.Remaining -le 0) { return }
            go $d $options
        }
    }
    go $directory (New-Object PsObject -Property @{Remaining=$max })
}
doForFirst c:\windows 100 {write-host File: $args }
# I use PsObject to avoid global variables and ref parameters.
To use the code, you have to switch to the .NET 4.0 runtime -- the enumerating methods are new in .NET 4.0.
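As a quick sanity check (my addition, not part of the original answer), you can inspect which CLR is hosting your PowerShell session; if it reports a 2.x version, the Enumerate* methods above won't be available:

# A CLRVersion starting with 4 means the .NET 4.0 Enumerate* methods exist.
$PSVersionTable.CLRVersion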
You can specify any scriptblock as the -action parameter, so in your case it would be something like {Move-Item -literalPath $args -dest c:\dir}.
Just try listing the first 1000 items; I hope it will finish very quickly:
doForFirst c:\yourdirectory 1000 {write-host '.' -nonew }
And of course you can process all items at once; just use
doForFirst c:\yourdirectory ([long]::MaxValue) {move-item ... }
Each item is processed immediately after it is returned, so the whole list is not read first and then processed; it is processed as it is read.
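To tie this back to the original batching requirement, here is a rough sketch (my addition, assuming DoForFirst is defined as above; C:\hugedir, C:\newdir, and the batch size of 100000 are placeholder values). A hashtable holds the counters because, like the PsObject above, it is a reference type the scriptblock can mutate through dynamic scoping:

$state = @{ Count = 0; Batch = 0; Dest = $null }
DoForFirst C:\hugedir ([long]::MaxValue) {
    # Start a new subfolder every 100000 files.
    if ($state.Count % 100000 -eq 0) {
        $state.Dest = "C:\newdir\dir$($state.Batch)"
        md $state.Dest | Out-Null
        $state.Batch++
    }
    # Moving files out of the folder while it is being enumerated worked for
    # the OP's C# version, but test on a copy first.
    Move-Item -LiteralPath $args[0] -Destination $state.Dest
    $state.Count++
}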
Answered by mjolinor
How about starting with this: cmd /c dir /b > list.txt
That should get you a list of all the file names.
If you're doing "dir > list.txt" from a PowerShell prompt, Get-ChildItem is aliased as "dir". Get-ChildItem has known issues enumerating large directories, and the object collections it returns can get huge.
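Following that idea through (my addition, not part of the answer): once the plain-text list exists, you can stream it line by line and move files in batches without enumerating the directory again. All paths and the batch size are placeholder values; write list.txt outside the huge folder so it doesn't get moved along with everything else:

# Build the list outside the huge folder, then consume it as a stream.
cmd /c "dir C:\hugedir /b > C:\temp\list.txt"

$i = 0; $j = 0; $dest = $null
Get-Content C:\temp\list.txt | ForEach-Object {
    if ($i % 100000 -eq 0) {
        $dest = "C:\newdir\dir$j"
        md $dest | Out-Null
        $j++
    }
    Move-Item -LiteralPath (Join-Path C:\hugedir $_) -Destination $dest
    $i++
}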