
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/33807394/

Date: 2020-11-02 22:08:58 | Source: igfitidea

List folder and files of HDFS using JAVA

Tags: java, hadoop, mapreduce, hdfs

Asked by Ajay

I am trying to list all the directory and files in the HDFS using JAVA.


Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://ip address"), configuration);
FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://ip address/user/uname/"));
Path[] paths = FileUtil.stat2Paths(fileStatus);
for(FileStatus status : fileStatus){
    System.out.println(status.getPath().toString());
}

My code is able to create the fs object, but it gets stuck on the third line, where it tries to read the folders and files. I am using AWS.

Please help me to resolve the issue.


Answered by Kishore

This is working for me:

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) throws IOException, URISyntaxException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000/"), conf);
    FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://localhost:9000/"));
    for(FileStatus status : fileStatus){
        System.out.println(status.getPath().toString());
    }
}

Output:

hdfs://localhost:9000/All.txt
hdfs://localhost:9000/department.txt
hdfs://localhost:9000/emp.tsv
hdfs://localhost:9000/employee.txt
hdfs://localhost:9000/hbase

I think you are giving an incorrect URI; try following the code above.
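On the incorrect-URI point: a valid HDFS URI needs the form scheme://host:port with no spaces. A quick self-contained sketch using the JDK's java.net.URI (the host and port below are placeholders, matching the answer's localhost setup):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriCheck {
    public static void main(String[] args) throws URISyntaxException {
        // A well-formed HDFS URI parses into a host and port.
        URI good = new URI("hdfs://localhost:9000/");
        System.out.println(good.getHost() + ":" + good.getPort()); // prints localhost:9000

        // A literal "ip address" with a space is not a valid URI authority
        // and fails to parse, before Hadoop is ever involved.
        try {
            new URI("hdfs://ip address");
        } catch (URISyntaxException e) {
            System.out.println("invalid URI: " + e.getReason());
        }
    }
}
```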

If the configuration files are not picked up from the classpath, then you have to add the resource files explicitly:

conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/hdfs-site.xml"));

Answered by CavaJ

Check the following method, which gets the list of files using either a recursive or a non-recursive approach. To get a list of directories instead, you can change the code so that it adds directory paths to the resulting list rather than file paths. See the fs.isDirectory() if-else clauses in the code for where directory paths are handled. The FileStatus class also has an isDirectory() method to check whether a FileStatus instance refers to a directory.

    //helper method to get the list of files from the HDFS path
    public static List<String> 
        listFilesFromHDFSPath(Configuration hadoopConfiguration,
                              String hdfsPath,
                              boolean recursive) throws IOException, 
                                            IllegalArgumentException
    {
        //resulting list of files
        List<String> filePaths = new ArrayList<String>();

        //get path from string and then the filesystem
        Path path = new Path(hdfsPath);  //throws IllegalArgumentException
        FileSystem fs = path.getFileSystem(hadoopConfiguration);

        //if recursive approach is requested
        if(recursive)
        {
            //(heap issues with recursive approach) => using a queue
            Queue<Path> fileQueue = new LinkedList<Path>();

            //add the obtained path to the queue
            fileQueue.add(path);

            //while the fileQueue is not empty
            while (!fileQueue.isEmpty())
            {
                //get the file path from queue
                Path filePath = fileQueue.remove();

                //filePath refers to a file
                if (fs.isFile(filePath))
                {
                    filePaths.add(filePath.toString());
                }
                else   //else filePath refers to a directory
                {
                    //list paths in the directory and add to the queue
                    FileStatus[] fileStatuses = fs.listStatus(filePath);
                    for (FileStatus fileStatus : fileStatuses)
                    {
                        fileQueue.add(fileStatus.getPath());
                    } // for
                } // else

            } // while

        } // if
        else        //non-recursive approach => no heap overhead
        {
            //if the given hdfsPath is actually directory
            if(fs.isDirectory(path))
            {
                FileStatus[] fileStatuses = fs.listStatus(path);

                //loop all file statuses
                for(FileStatus fileStatus : fileStatuses)
                {
                    //if the given status is a file, then update the resulting list
                    if(fileStatus.isFile())
                        filePaths.add(fileStatus.getPath().toString());
                } // for
            } // if
            else        //it is a file then
            {
                //return the one and only file path to the resulting list
                filePaths.add(path.toString());
            } // else

        } // else

        //close filesystem; note that FileSystem.get() caches instances,
        //so closing here also closes the instance for any other caller
        fs.close();

        //return the resulting list
        return filePaths;
    } // listFilesFromHDFSPath
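The queue-based traversal above avoids the deep call stack of a recursive walk. As a self-contained illustration of that same pattern (this sketch uses the local filesystem via java.nio so it runs without a Hadoop cluster; the class name LocalBfsList is invented here, not part of the answer):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class LocalBfsList {
    //same queue-based (non-recursive) traversal as the HDFS helper,
    //but against the local filesystem via java.nio
    public static List<String> listFiles(Path root) throws IOException {
        List<String> result = new ArrayList<>();
        Deque<Path> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Path current = queue.remove();
            if (Files.isDirectory(current)) {
                //enqueue children instead of recursing => no deep call stack
                try (DirectoryStream<Path> children = Files.newDirectoryStream(current)) {
                    for (Path child : children) {
                        queue.add(child);
                    }
                }
            } else {
                result.add(current.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        //build a small temp tree: a.txt and sub/b.txt
        Path dir = Files.createTempDirectory("bfs-demo");
        Files.createFile(dir.resolve("a.txt"));
        Files.createDirectories(dir.resolve("sub"));
        Files.createFile(dir.resolve("sub").resolve("b.txt"));

        List<String> files = listFiles(dir);
        System.out.println(files.size()); // prints 2
    }
}
```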