Java: file count in an HDFS directory

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/20381422/


File count in an HDFS directory

java hadoop hdfs

Asked by user1125953

In Java code, I want to connect to a directory in HDFS, learn the number of files in that directory, get their names, and read them. I can already read the files, but I couldn't figure out how to count the files in a directory and get their names the way I would for an ordinary directory.


To read them, I use DFSClient and open the files into an InputStream.

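For context, here is a minimal sketch of that kind of read using the public FileSystem API rather than DFSClient directly; the path and configuration below are placeholders, not taken from the original question:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Open an HDFS file and read it line by line; fs.open() returns an FSDataInputStream,
// which is a regular java.io.InputStream. The path below is just an example.
FileSystem fs = FileSystem.get(new Configuration());
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(new Path("/some/dir/part-00000"))))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
}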

Answered by user2486495

count


Usage: hadoop fs -count [-q] <paths>

Count the number of directories, files and bytes under the paths that match the specified file pattern. The output columns are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME.


The output columns with -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME.


Example:


hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hadoop fs -count -q hdfs://nn1.example.com/file1

Exit Code:


Returns 0 on success and -1 on error.


You can also just use the FileSystem and iterate over the files inside the path. Here is some example code:


import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// getConf() assumes this code lives in a class extending Configured (e.g. a Tool);
// otherwise pass a plain new Configuration() to FileSystem.get().
int count = 0;
FileSystem fs = FileSystem.get(getConf());
boolean recursive = false;
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()) {
    ri.next();
    count++;
}
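Since the question also asks for the file names, the same iterator can collect them while counting; a small extension of the loop above, reusing the fs variable and the same placeholder path:

import java.util.ArrayList;
import java.util.List;

// Collect the name of each file while counting; LocatedFileStatus exposes the full Path.
List<String> names = new ArrayList<>();
RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("hdfs://my/path"), false);
while (it.hasNext()) {
    names.add(it.next().getPath().getName());
}
int fileCount = names.size();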

Answered by user1125953

import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// conf is assumed to be a Configuration pointing at the cluster.
FileSystem fs = FileSystem.get(conf);
Path pt = new Path("/path");
ContentSummary cs = fs.getContentSummary(pt);
long fileCount = cs.getFileCount();  // counts files recursively under the path
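The same ContentSummary object also carries the other columns that hadoop fs -count reports, for example:

// Other values available on the same ContentSummary
long dirCount = cs.getDirectoryCount();  // DIR_COUNT
long contentSize = cs.getLength();       // CONTENT_SIZE, i.e. total bytes under the path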

Answered by Akarsh

On the command line, you can do it as below.


 # $8 is the path column of the -ls output; NR>1 skips the "Found N items" header line
 hdfs dfs -ls $parentdirectory | awk 'NR>1 {system("hdfs dfs -count " $8)}'

Answered by Eric

To do a quick and simple count, you can also try the following one-liner:


hdfs dfs -ls -R /path/to/your/directory/ | grep -E '^-' | wc -l

Quick explanation:


grep -E '^-' or egrep '^-': grep all files; file lines start with '-' whereas directory lines start with 'd'.


wc -l: line count.


Answered by Suraj Nagare

hadoop fs -du [-s] [-h] [-x] URI [URI ...]


Displays the sizes of the files and directories contained in the given directory, or the length of a file in case it's just a file (a Java equivalent is sketched after the option list below).


Options:


The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, calculation is done by going 1-level deep from the given path.
The -h option will format file sizes in a "human-readable" fashion (e.g. 64.0m instead of 67108864).
The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.
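Since the original question is about Java, roughly the same information as hadoop fs -du can be obtained through the FileSystem API; a minimal sketch, with the path and configuration as placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fs = FileSystem.get(new Configuration());
Path dir = new Path("/path");

// Per-entry sizes one level deep, roughly what -du prints without -s:
// plain files report their own length, directories report the total bytes beneath them.
for (FileStatus status : fs.listStatus(dir)) {
    long size = status.isDirectory()
            ? fs.getContentSummary(status.getPath()).getLength()
            : status.getLen();
    System.out.println(size + "\t" + status.getPath());
}

// Aggregate size of everything under the directory, roughly -du -s.
long total = fs.getContentSummary(dir).getLength();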