Java: file count in an HDFS directory

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/20381422/


File count in an HDFS directory

java hadoop hdfs

Asked by user1125953

In Java code, I want to connect to a directory in HDFS, learn the number of files in that directory, get their names, and read them. I can already read the files, but I couldn't figure out how to count the files in a directory and get their names the way I would for an ordinary directory.


To read them, I use DFSClient and open the files into an InputStream.

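For context, here is a minimal sketch of that kind of read using the public FileSystem API rather than DFSClient directly; the path and configuration below are placeholders, not taken from the original question:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Open an HDFS file and read it line by line; fs.open() returns an FSDataInputStream,
// which is a regular java.io.InputStream. The path below is just an example.
FileSystem fs = FileSystem.get(new Configuration());
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(new Path("/some/dir/part-00000"))))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
}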

Answered by user2486495

count


Usage: hadoop fs -count [-q] <paths>

Count the number of directories, files and bytes under the paths that match the specified file pattern. The output columns are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME.


The output columns with -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME.


Example:


hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hadoop fs -count -q hdfs://nn1.example.com/file1

Exit Code:


Returns 0 on success and -1 on error.


You can also just use the FileSystem and iterate over the files inside the path. Here is some example code:


import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// getConf() assumes this code lives in a class extending Configured (e.g. a Tool);
// otherwise pass a plain new Configuration() to FileSystem.get().
int count = 0;
FileSystem fs = FileSystem.get(getConf());
boolean recursive = false;
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()) {
    ri.next();
    count++;
}
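Since the question also asks for the file names, the same iterator can collect them while counting; a small extension of the loop above, reusing the fs variable and the same placeholder path:

import java.util.ArrayList;
import java.util.List;

// Collect the name of each file while counting; LocatedFileStatus exposes the full Path.
List<String> names = new ArrayList<>();
RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("hdfs://my/path"), false);
while (it.hasNext()) {
    names.add(it.next().getPath().getName());
}
int fileCount = names.size();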

Answered by user1125953

import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// conf is assumed to be a Configuration pointing at the cluster.
FileSystem fs = FileSystem.get(conf);
Path pt = new Path("/path");
ContentSummary cs = fs.getContentSummary(pt);
long fileCount = cs.getFileCount();  // counts files recursively under the path
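The same ContentSummary object also carries the other columns that hadoop fs -count reports, for example:

// Other values available on the same ContentSummary
long dirCount = cs.getDirectoryCount();  // DIR_COUNT
long contentSize = cs.getLength();       // CONTENT_SIZE, i.e. total bytes under the path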

Answered by Akarsh

On the command line, you can do it as below.


 # $8 is the path column of the -ls output; NR>1 skips the "Found N items" header line
 hdfs dfs -ls $parentdirectory | awk 'NR>1 {system("hdfs dfs -count " $8)}'

Answered by Eric

To do a quick and simple count, you can also try the following one-liner:


hdfs dfs -ls -R /path/to/your/directory/ | grep -E '^-' | wc -l

Quick explanation:


grep -E '^-' or egrep '^-': grep all files; file lines start with '-' whereas directory lines start with 'd'.


wc -l: line count.


Answered by Suraj Nagare

hadoop fs -du [-s] [-h] [-x] URI [URI ...]


Displays the sizes of the files and directories contained in the given directory, or the length of a file in case it's just a file (a Java equivalent is sketched after the option list below).


Options:


The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, calculation is done by going 1-level deep from the given path.
The -h option will format file sizes in a "human-readable" fashion (e.g. 64.0m instead of 67108864).
The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.
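Since the original question is about Java, roughly the same information as hadoop fs -du can be obtained through the FileSystem API; a minimal sketch, with the path and configuration as placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fs = FileSystem.get(new Configuration());
Path dir = new Path("/path");

// Per-entry sizes one level deep, roughly what -du prints without -s:
// plain files report their own length, directories report the total bytes beneath them.
for (FileStatus status : fs.listStatus(dir)) {
    long size = status.isDirectory()
            ? fs.getContentSummary(status.getPath()).getLength()
            : status.getLen();
    System.out.println(size + "\t" + status.getPath());
}

// Aggregate size of everything under the directory, roughly -du -s.
long total = fs.getContentSummary(dir).getLength();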