原文地址: http://stackoverflow.com/questions/28767607/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow.
Delete HDFS folder from Java
Asked by Juh_
In a Java app running on an edge node, I need to delete an HDFS folder if it exists. I need to do that before running a MapReduce job (with Spark) that outputs into that folder.
I found I could use the method
org.apache.hadoop.fs.FileUtil.fullyDelete(new File(url))
However, I can only make it work with a local folder (i.e. a file URL on the machine the code runs on). I tried to use something like:
url = "hdfs://hdfshost:port/the/folder/to/delete";
with hdfs://hdfshost:port being the HDFS namenode IPC address. I use the same address for the MapReduce job, so it is correct. However, it doesn't do anything.
So, what URL should I use, or is there another method?
Note: here is the simple project in question.
Accepted answer by Tucker
I do it this way:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
FileSystem hdfs = FileSystem.get(URI.create("hdfs://<namenode-hostname>:<port>"), conf);
// delete takes a Path, not a String; the boolean flag makes the delete recursive
hdfs.delete(new Path("/path/to/your/file"), true);
You don't need hdfs://hdfshost:port/ in the file path itself; the FileSystem instance is already bound to that URI.
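For completeness, here is the same approach as a minimal self-contained program (a sketch: the hostname, port, and class name are placeholders, and the exists() check and try-with-resources close are my additions, not part of the original answer):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteHdfsFolder {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // try-with-resources closes the FileSystem handle on exit
        try (FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode.example.com:8020"), conf)) {
            Path folder = new Path("/the/folder/to/delete");
            if (hdfs.exists(folder)) {
                hdfs.delete(folder, true); // true = recursive
            }
        }
    }
}

Note that FileSystem.get() may return a cached, shared instance, so closing it is only safe here because this is a standalone program.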
Answered by Jun
This works for me.
Just adding the following code to my WordCount program will do:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
...
Configuration conf = new Configuration();
Path output = new Path("/the/folder/to/delete");
FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:port"), conf);
// delete the existing output directory, recursively
if (hdfs.exists(output)) {
    hdfs.delete(output, true);
}
Job job = Job.getInstance(conf, "word count");
...
You need to add hdfs://hdfshost:port explicitly to get the distributed file system; otherwise the code will work on the local file system only.
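As an aside (my addition, not from the original answer): if the cluster's core-site.xml is on the classpath, its fs.defaultFS property already points at the namenode, so FileSystem.get(conf) with no explicit URI also returns the distributed file system. A sketch under that assumption:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// fs.defaultFS (e.g. hdfs://namenode:port) is read from core-site.xml,
// so no URI has to be hard-coded here
FileSystem fs = FileSystem.get(conf);
fs.delete(new Path("/the/folder/to/delete"), true);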
Answered by Carlos Noé
If you need to delete all the files in the directory:
1) Check how many files there are in the directory.
2) Then delete all of them.
public void delete_archivos_dedirectorio() throws IOException {
    // namenode = "hdfs://" + ip + ":" + port; hdfsFileSystem is an already-open FileSystem
    Path directorio = new Path(namenode + "/test/"); // the directory we operate on
    // list the files currently in the directory before doing anything
    FileStatus[] fileStatus = hdfsFileSystem.listStatus(directorio);
    // count the files, then iterate up to that count and delete each numbered
    // file, so they can be recreated later by the write step
    int archivos_basura = fileStatus.length;
    for (int numero = 0; numero < archivos_basura; numero++) {
        Path archivo = new Path(namenode + "/test/" + numero + ".txt");
        try {
            if (hdfsFileSystem.exists(archivo)) {
                try {
                    hdfsFileSystem.delete(archivo, true);
                } catch (IOException ex) {
                    System.out.println(ex.getMessage());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
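A simpler variant (my sketch, not part of the original answer) deletes whatever listStatus() returns instead of reconstructing the numbered file names, so it works regardless of how the files are named:

// assumes hdfsFileSystem and directorio from the method above
for (FileStatus status : hdfsFileSystem.listStatus(directorio)) {
    hdfsFileSystem.delete(status.getPath(), true); // true = recursive, in case of subdirectories
}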
good luck :)