Delete hdfs folder from java

Warning: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/28767607/

Tags: java, hadoop, hdfs

Asked by Juh_

In a java app running on an edge node, I need to delete a hdfs folder, if it exists. I need to do that before running a mapreduce job (with spark) that outputs to that folder.

I found I could use the method

org.apache.hadoop.fs.FileUtil.fullyDelete(new File(url))

However, I can only make it work with a local folder (i.e. a file url on the running computer). I tried to use something like:

url = "hdfs://hdfshost:port/the/folder/to/delete";

with hdfs://hdfshost:port being the hdfs namenode IPC. I use it for the mapreduce, so it is correct. However it doesn't do anything.

So, what url should I use, or is there another method?

Note: here is the simple project in question.

Accepted answer by Tucker

I do it this way:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
    conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
    FileSystem hdfs = FileSystem.get(URI.create("hdfs://<namenode-hostname>:<port>"), conf);
    // delete() takes a Path, not a String; the boolean makes the delete recursive
    hdfs.delete(new Path("/path/to/your/file"), true);

You don't need hdfs://hdfshost:port/ in your file path.

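For example (a sketch, reusing the hdfs handle from the snippet above): an absolute path with no scheme resolves against the filesystem the handle is bound to, and delete() itself reports whether anything was removed:

    // The FileSystem is already bound to hdfs://<namenode-hostname>:<port>,
    // so a plain absolute path is enough. delete() returns false when the
    // path does not exist, so a separate exists() check is optional.
    boolean removed = hdfs.delete(new Path("/the/folder/to/delete"), true);
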
Answer by Jun

This works for me.

Just adding the following code to my WordCount program will do:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.mapreduce.Job;

...
Configuration conf = new Configuration();

Path output = new Path("/the/folder/to/delete");
FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:port"), conf);

// delete the existing output directory before submitting the job
if (hdfs.exists(output)) {
    hdfs.delete(output, true);
}

Job job = Job.getInstance(conf, "word count");
...

You need to add hdfs://hdfshost:port explicitly to get the distributed file system; otherwise the code will only work against the local file system.

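As a side note (an assumption about typical cluster setups, not something this answer states): if a core-site.xml on the classpath sets fs.defaultFS to the cluster's namenode, FileSystem.get(conf) without an explicit URI resolves to HDFS as well; the built-in default of file:/// is what makes URI-less code fall back to the local file system. A minimal sketch, using the same imports as above:

    // Rely on fs.defaultFS from core-site.xml instead of a hard-coded URI.
    // With no cluster config on the classpath, fs.defaultFS defaults to
    // file:///, which is why such code would only touch the local file system.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    fs.delete(new Path("/the/folder/to/delete"), true);
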
Answer by Carlos Noé

If you need to delete all the files in the directory:

1) Check how many files there are in your directory.

2) Then delete all of them.

    public void delete_archivos_dedirectorio() throws IOException {

        // namenode = "hdfs://" + ip + ":" + port

        Path directorio = new Path(namenode + "//test//"); // the directory we operate on
        // list the files currently in that directory before doing anything else
        FileStatus[] fileStatus = hdfsFileSystem.listStatus(directorio);
        // count how many stale files the directory holds; we then iterate up to that
        // count, deleting each file so the writer can recreate them afterwards
        int archivos_basura = fileStatus.length;

        for (int numero = 0; numero < archivos_basura; numero++) { // files are named 0.txt .. (n-1).txt

            Path archivo = new Path(namenode + "//test//" + numero + ".txt");

            try {
                if (hdfsFileSystem.exists(archivo)) {
                    try {
                        hdfsFileSystem.delete(archivo, true);
                    } catch (IOException ex) {
                        System.out.println(ex.getMessage());
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

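Since listStatus already returns the actual paths, a shorter variant (a sketch under the same assumptions: a namenode string and an hdfsFileSystem field) deletes whatever the listing reports instead of probing numbered file names:

    // Delete every entry that listStatus reports, rather than guessing "<n>.txt" names.
    Path directorio = new Path(namenode + "//test//");
    for (FileStatus status : hdfsFileSystem.listStatus(directorio)) {
        hdfsFileSystem.delete(status.getPath(), true); // true = recursive
    }
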
good luck :)
