Original question: http://stackoverflow.com/questions/33807394/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
List folder and files of HDFS using JAVA
Asked by Ajay
I am trying to list all the directories and files in HDFS using Java.
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://ip address"), configuration);
FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://ip address/user/uname/"));
Path[] paths = FileUtil.stat2Paths(fileStatus);
for (FileStatus status : fileStatus) {
    System.out.println(status.getPath().toString());
}
My code is able to create the fs object, but it gets stuck on line 3 (the listStatus call), where it tries to read the folders and files. I am using AWS.
Please help me to resolve the issue.
Answered by Kishore
This works for me:
public static void main(String[] args) throws IOException, URISyntaxException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000/"), conf);
    FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://localhost:9000/"));
    for (FileStatus status : fileStatus) {
        System.out.println(status.getPath().toString());
    }
}
Output:
hdfs://localhost:9000/All.txt
hdfs://localhost:9000/department.txt
hdfs://localhost:9000/emp.tsv
hdfs://localhost:9000/employee.txt
hdfs://localhost:9000/hbase
I think you are giving an incorrect URI. Try following the code above.
If conf is not set, you have to add the resource files:
conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/hdfs-site.xml"));
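Since the answer points at the URI as the likely culprit, here is a minimal sketch of assembling it from a host and a port instead of concatenating strings by hand. The class name NameNodeUri and its method are hypothetical and only for illustration; the host and port are whatever your cluster's NameNode actually listens on (localhost and 9000 are simply the values used in this answer). It relies only on java.net.URI, so a malformed component fails early with a URISyntaxException rather than later inside FileSystem.get.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the original answer.
class NameNodeUri {

    // Builds "hdfs://host:port/" from its parts. Throws URISyntaxException
    // if the assembled string is not a valid URI.
    static URI of(String host, int port) throws URISyntaxException {
        return new URI("hdfs", null, host, port, "/", null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Values taken from the answer above; replace with your own.
        System.out.println(of("localhost", 9000)); // prints hdfs://localhost:9000/
    }
}
```

The resulting URI can then be passed to FileSystem.get(uri, conf) exactly as in the code above.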
Answered by CavaJ
Check the following method, which gets the list of files using either a recursive or a non-recursive approach. To get a list of directories instead, you can change the code so that it adds directory paths to the resulting list rather than file paths. See the fs.isDirectory() if-else clauses in the code for how directory paths are distinguished. The FileStatus class also has an isDirectory() method to check whether a FileStatus instance refers to a directory.
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

//helper method to get the list of files from the HDFS path
public static List<String>
    listFilesFromHDFSPath(Configuration hadoopConfiguration,
                          String hdfsPath,
                          boolean recursive) throws IOException,
                                                    IllegalArgumentException
{
    //resulting list of files
    List<String> filePaths = new ArrayList<String>();
    //get path from string and then the filesystem
    Path path = new Path(hdfsPath); //throws IllegalArgumentException
    FileSystem fs = path.getFileSystem(hadoopConfiguration);

    //if recursive approach is requested
    if(recursive)
    {
        //avoid actual recursion (deep trees could overflow the call stack)
        //=> walk the tree with a queue instead
        Queue<Path> fileQueue = new LinkedList<Path>();
        //add the obtained path to the queue
        fileQueue.add(path);
        //while the fileQueue is not empty
        while (!fileQueue.isEmpty())
        {
            //get the file path from queue
            Path filePath = fileQueue.remove();
            //filePath refers to a file
            if (fs.isFile(filePath))
            {
                filePaths.add(filePath.toString());
            }
            else //else filePath refers to a directory
            {
                //list paths in the directory and add to the queue
                FileStatus[] fileStatuses = fs.listStatus(filePath);
                for (FileStatus fileStatus : fileStatuses)
                {
                    fileQueue.add(fileStatus.getPath());
                } // for
            } // else
        } // while
    } // if
    else //non-recursive approach => no queue overhead
    {
        //if the given hdfsPath is actually a directory
        if(fs.isDirectory(path))
        {
            FileStatus[] fileStatuses = fs.listStatus(path);
            //loop over all file statuses
            for(FileStatus fileStatus : fileStatuses)
            {
                //if the given status is a file, then update the resulting list
                if(fileStatus.isFile())
                    filePaths.add(fileStatus.getPath().toString());
            } // for
        } // if
        else //it is a file then
        {
            //add the one and only file path to the resulting list
            filePaths.add(path.toString());
        } // else
    } // else

    //do not close fs here: getFileSystem() returns a cached instance that
    //may be shared, so closing it can break other code using the same FileSystem

    //return the resulting list
    return filePaths;
} // listFilesFromHDFSPath
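The directory variant suggested at the start of this answer could look roughly like the sketch below. To keep it runnable without a cluster, it walks an in-memory map instead of HDFS: DirectoryLister, listDirectories and the sample tree are all made up for illustration, and a path that is missing from the map is treated as a plain file. Against a real FileSystem you would presumably replace the map lookup with fs.listStatus(path) and the file check with FileStatus.isDirectory().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical helper, not part of the original answer: same queue-based
// traversal as above, but it collects directories instead of files.
class DirectoryLister {

    // 'children' maps each directory path to its direct entries.
    // A path that is absent from the map is treated as a file.
    static List<String> listDirectories(String root, Map<String, List<String>> children) {
        List<String> dirs = new ArrayList<>();
        Queue<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> entries = children.get(current);
            if (entries == null) {
                continue; // a file: nothing to record or descend into
            }
            dirs.add(current);      // record the directory itself
            queue.addAll(entries);  // visit its entries later
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Made-up sample tree, loosely modelled on the output shown earlier.
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/", Arrays.asList("/All.txt", "/hbase"));
        tree.put("/hbase", Arrays.asList("/hbase/data", "/hbase/hbase.id"));
        tree.put("/hbase/data", Collections.<String>emptyList());
        System.out.println(listDirectories("/", tree)); // prints [/, /hbase, /hbase/data]
    }
}
```

Because the queue is first-in, first-out, directories come back level by level, with the root first.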