Java MapReduce: Hadoop job exception "Output directory already exists"
Note: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share this content, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/18344554/
Mapreduce Hadoop job exception Output directory already exists
Asked by msadri
I'm running a MapReduce job with the run code below, and it keeps throwing the following exception. I made sure to remove the output folder before starting the job, but it doesn't work.
The code:
JobConf jobConf = new JobConf( getConf(), MPTU.class );
jobConf.setJobName( "MPTU" );
AvroJob.setMapperClass( jobConf, MPTUMapper.class );
AvroJob.setReducerClass( jobConf, MPTUReducer.class );
long milliSeconds = 1000 * 60 * 60;
jobConf.setLong( "mapred.task.timeout", milliSeconds );
Job job = new Job( jobConf );
job.setJarByClass( MPTU.class );
String paths = args[0] + "," + args[1];
FileInputFormat.setInputPaths( job, paths );
Path outputDir = new Path( args[2] );
outputDir.getFileSystem( jobConf ).delete( outputDir, true );
FileOutputFormat.setOutputPath( job, outputDir );
AvroJob.setInputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.LONG ), Schema.create( Type.STRING ) ) );
AvroJob.setMapOutputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.STRING ), Schema.create( Type.STRING ) ) );
AvroJob.setOutputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.STRING ), Schema.create( Type.STRING ) ) );
job.setNumReduceTasks( 400 );
job.submit();
JobClient.runJob( jobConf );
The Exception:
13:31:39,268 ERROR UserGroupInformation:1335 - PriviledgedActionException as:msadri (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/msadri/Documents/files/linkage_output already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/msadri/Documents/files/linkage_output already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
at org.apache.hadoop.mapred.JobClient.run(JobClient.java:937)
at org.apache.hadoop.mapred.JobClient.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1319)
at com.reunify.socialmedia.RecordLinkage.MatchProfileTwitterUserHandler.run(MatchProfileTwitterUserHandler.java:58)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.reunify.socialmedia.RecordLinkage.MatchProfileTwitterUserHandler.main(MatchProfileTwitterUserHandler.java:81)
Accepted answer by Srini
Correct me if my understanding is wrong: in the above code you are referring to "/Users/msadri/Documents/.....", which is on the local file system, isn't it? It looks like fs.defaultFS in core-site.xml is pointing to file:/// instead of to the HDFS address of your cluster.
1) If you really do need to point to the local file system, then try this:
FileSystem.getLocal(conf).delete(outputDir, true);
2) If the job is expected to point to HDFS, then check core-site.xml and make sure fs.defaultFS points to hdfs://<nameNode>:<port>/, and try again.
(The error message shows that you are pointing to the local file system; if it were pointing to HDFS, it would say "Output directory hdfs://<nameNode>:<port>/Users/msadri/... already exists".)
Rule this out if it does not apply. Please let me know how it goes.
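To quickly verify which file system the job configuration actually resolves to, a small check along the following lines can help. This is only a sketch: it assumes core-site.xml is on the classpath and that your Hadoop version uses the fs.defaultFS key (older releases use fs.default.name instead).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DefaultFsCheck {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml (and related files) from the classpath
        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        // Resolve the default FileSystem: file:/// means local, hdfs://... means the cluster
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Resolved file system URI: " + fs.getUri());
    }
}

If this prints file:/// rather than hdfs://<nameNode>:<port>/, the job is writing to the local file system, which matches the error message above.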
Answered by sethi
Can you try changing this:
outputDir.getFileSystem( jobConf ).delete( outputDir, true );
//to
FileSystem fs = FileSystem.get(jobConf);
fs.delete(outputDir, true);
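For completeness, here is a sketch of how that suggestion could look in the driver, with an existence check before the delete; the variable names (jobConf, outputDir, args[2]) follow the question's code and are assumptions:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// Delete the output directory, if present, on the file system that the job
// configuration resolves to, before the job is submitted.
JobConf jobConf = new JobConf();            // the job configuration from the question
Path outputDir = new Path(args[2]);         // the job's output path
FileSystem fs = FileSystem.get(jobConf);    // JobConf extends Configuration
if (fs.exists(outputDir)) {
    fs.delete(outputDir, true);             // true = recursive delete
}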
Answered by Rahul Wagh
You are getting the above exception because your output directory (/Users/msadri/Documents/files/linkage_output) already exists in the HDFS file system.
Just remember: when running a MapReduce job, do not point it at an output directory that is already there in HDFS. Please refer to the following instructions, which should help you resolve this exception.
To run a MapReduce job, you write a command similar to the one below:
$hadoop jar {name_of_the_jar_file.jar} {package_name_of_jar} {hdfs_file_path_on_which_you_want_to_perform_map_reduce} {output_directory_path}
Example: hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output
Just pay attention to the {output_directory_path}, i.e. /home/facebook/crawler-output. If you have already created this directory structure in HDFS, then the Hadoop ecosystem will throw the exception "org.apache.hadoop.mapred.FileAlreadyExistsException".
Solution: always specify an output directory at run time that does not already exist (Hadoop will create the directory automatically for you; you do not need to worry about creating it). As in the example above, the same command can be run in the following manner:
"hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output-1"
“hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output-1”
The output directory {crawler-output-1} will then be created at runtime by the Hadoop ecosystem.
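As an illustration of this advice only, the driver could derive a fresh output directory on every run, for example by appending a timestamp; the suffix scheme and the new-API FileOutputFormat shown here are assumptions, not part of the answer:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// job is the org.apache.hadoop.mapreduce.Job being configured in the driver.
// Append a timestamp so the output directory can never already exist.
String baseOutput = args[2];                 // e.g. /home/facebook/crawler-output
Path outputDir = new Path(baseOutput + "-" + System.currentTimeMillis());
FileOutputFormat.setOutputPath(job, outputDir);

Each run then writes to a directory such as /home/facebook/crawler-output-1692871234567, so the output path never collides with an earlier run.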
For more details, you can refer to: https://jhooq.com/hadoop-file-already-exists-exception/