Java MapReduce: Hadoop job exception "Output directory already exists"
Note: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share this content, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/18344554/
Mapreduce Hadoop job exception Output directory already exists
Asked by msadri
I'm running a MapReduce job with the run code below, and it keeps throwing the following exception. I made sure to remove the output folder before starting the job, but it doesn't work.
The code:
JobConf jobConf = new JobConf( getConf(), MPTU.class );
jobConf.setJobName( "MPTU" );
AvroJob.setMapperClass( jobConf, MPTUMapper.class );
AvroJob.setReducerClass( jobConf, MPTUReducer.class );
long milliSeconds = 1000 * 60 * 60;
jobConf.setLong( "mapred.task.timeout", milliSeconds );
Job job = new Job( jobConf );
job.setJarByClass( MPTU.class );
String paths = args[0] + "," + args[1];
FileInputFormat.setInputPaths( job, paths );
Path outputDir = new Path( args[2] );
outputDir.getFileSystem( jobConf ).delete( outputDir, true );
FileOutputFormat.setOutputPath( job, outputDir );
AvroJob.setInputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.LONG ), Schema.create( Type.STRING ) ) );
AvroJob.setMapOutputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.STRING ), Schema.create( Type.STRING ) ) );
AvroJob.setOutputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.STRING ), Schema.create( Type.STRING ) ) );
job.setNumReduceTasks( 400 );
job.submit();
JobClient.runJob( jobConf );
The Exception:
13:31:39,268 ERROR UserGroupInformation:1335 - PriviledgedActionException as:msadri (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/msadri/Documents/files/linkage_output already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/msadri/Documents/files/linkage_output already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
at org.apache.hadoop.mapred.JobClient.run(JobClient.java:937)
at org.apache.hadoop.mapred.JobClient.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1319)
at com.reunify.socialmedia.RecordLinkage.MatchProfileTwitterUserHandler.run(MatchProfileTwitterUserHandler.java:58)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.reunify.socialmedia.RecordLinkage.MatchProfileTwitterUserHandler.main(MatchProfileTwitterUserHandler.java:81)
Accepted answer by Srini
Correct me if my understanding is wrong: in the above code you are referring to "/Users/msadri/Documents/.....", which is on the local file system, isn't it? It looks like fs.defaultFS in core-site.xml is pointing to file:/// instead of to the HDFS address of your cluster.
1) If you really do need to point to the local file system, then try this:
FileSystem.getLocal(conf).delete(outputDir, true);
2) If the job is expected to point to HDFS, then check core-site.xml and make sure fs.defaultFS points to hdfs://<nameNode>:<port>/, and try again.
(The error message shows that you are pointing to the local file system; if it were pointing to HDFS, it would say "Output directory hdfs://<nameNode>:<port>/Users/msadri/... already exists".)
Rule this out if it does not apply. Please let me know how it goes.
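To quickly verify which file system the job configuration actually resolves to, a small check along the following lines can help. This is only a sketch: it assumes core-site.xml is on the classpath and that your Hadoop version uses the fs.defaultFS key (older releases use fs.default.name instead).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DefaultFsCheck {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml (and related files) from the classpath
        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        // Resolve the default FileSystem: file:/// means local, hdfs://... means the cluster
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Resolved file system URI: " + fs.getUri());
    }
}

If this prints file:/// rather than hdfs://<nameNode>:<port>/, the job is writing to the local file system, which matches the error message above.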
Answered by sethi
Can you try changing this:
outputDir.getFileSystem( jobConf ).delete( outputDir, true );
//to
FileSystem fs = FileSystem.get(jobConf);
fs.delete(outputDir, true);
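For completeness, here is a sketch of how that suggestion could look in the driver, with an existence check before the delete; the variable names (jobConf, outputDir, args[2]) follow the question's code and are assumptions:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// Delete the output directory, if present, on the file system that the job
// configuration resolves to, before the job is submitted.
JobConf jobConf = new JobConf();            // the job configuration from the question
Path outputDir = new Path(args[2]);         // the job's output path
FileSystem fs = FileSystem.get(jobConf);    // JobConf extends Configuration
if (fs.exists(outputDir)) {
    fs.delete(outputDir, true);             // true = recursive delete
}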
Answered by Rahul Wagh
You are getting the above exception because your output directory (/Users/msadri/Documents/files/linkage_output) already exists in the HDFS file system.
Just remember: when running a MapReduce job, do not point it at an output directory that is already there in HDFS. Please refer to the following instructions, which should help you resolve this exception.
To run a MapReduce job, you write a command similar to the one below:
$hadoop jar {name_of_the_jar_file.jar} {package_name_of_jar} {hdfs_file_path_on_which_you_want_to_perform_map_reduce} {output_directory_path}
Example: hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output
Just pay attention to the {output_directory_path}, i.e. /home/facebook/crawler-output. If you have already created this directory structure in HDFS, then the Hadoop ecosystem will throw the exception "org.apache.hadoop.mapred.FileAlreadyExistsException".
Solution: always specify an output directory at run time that does not already exist (Hadoop will create the directory automatically for you; you do not need to worry about creating it). As in the example above, the same command can be run in the following manner:
"hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output-1"
“hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output-1”
The output directory {crawler-output-1} will then be created at runtime by the Hadoop ecosystem.
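As an illustration of this advice only, the driver could derive a fresh output directory on every run, for example by appending a timestamp; the suffix scheme and the new-API FileOutputFormat shown here are assumptions, not part of the answer:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// job is the org.apache.hadoop.mapreduce.Job being configured in the driver.
// Append a timestamp so the output directory can never already exist.
String baseOutput = args[2];                 // e.g. /home/facebook/crawler-output
Path outputDir = new Path(baseOutput + "-" + System.currentTimeMillis());
FileOutputFormat.setOutputPath(job, outputDir);

Each run then writes to a directory such as /home/facebook/crawler-output-1692871234567, so the output path never collides with an earlier run.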
For more details, you can refer to: https://jhooq.com/hadoop-file-already-exists-exception/