
Disclaimer: this page is a Chinese/English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must distribute it under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/12911798/

Date: 2020-10-31 10:45:17  Source: igfitidea

Hadoop: How can i merge reducer outputs to a single file?

java, hadoop, merge, mapreduce, hdfs

Asked by thomaslee

I know that the "getmerge" command in the shell can do this work.

But what should I do if I want to merge these outputs after the job via the HDFS API for Java?

What I actually want is a single merged file on HDFS.

The only thing I can think of is to start an additional job after that.

Thanks!

Accepted answer by VoiceOfUnreason

> But what should I do if I want to merge these outputs after the job via the HDFS API for Java?

Guessing, because I haven't tried this myself, but I think the method you are looking for is FileUtil.copyMerge, which is the method that FsShell invokes when you run the -getmerge command. FileUtil.copyMerge takes two FileSystem objects as arguments - FsShell uses FileSystem.getLocal to retrieve the destination FileSystem, but I don't see any reason you couldn't instead use Path.getFileSystem on the destination to obtain an OutputStream.

That said, I don't think it wins you very much -- the merge still happens in the local JVM, so you aren't really saving much over -getmerge followed by -put.
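A minimal sketch of that approach might look like the following. This assumes Hadoop 2.x, where FileUtil.copyMerge still exists (it was removed in Hadoop 3); the paths are placeholders for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeOutputs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Directory containing the part-r-* files, and the desired merged file.
        // Both paths here are hypothetical examples.
        Path srcDir = new Path("/user/hadoop/job-output");
        Path dstFile = new Path("/user/hadoop/merged.txt");

        // Resolve each Path against its own FileSystem, so both source and
        // destination can live on HDFS (unlike -getmerge, whose destination
        // is always the local filesystem).
        FileSystem srcFs = srcDir.getFileSystem(conf);
        FileSystem dstFs = dstFile.getFileSystem(conf);

        // copyMerge concatenates every file under srcDir into dstFile.
        // deleteSource=false keeps the originals; addString=null inserts
        // no separator between the concatenated files.
        FileUtil.copyMerge(srcFs, srcDir, dstFs, dstFile, false, conf, null);
    }
}
```

Note that the bytes still stream through the client JVM, as the answer points out, so this is a convenience rather than a distributed merge.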

Answered by saurabh shashank

You can get a single output file by setting a single reducer in your code:

job.setNumReduceTasks(1);

This will work for your requirement, but it is costly.
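In context, that call belongs in the job driver. A sketch, where the class names and input/output paths are illustrative and the mapper/reducer classes are assumed to come from your own job:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SingleReducerDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "single-output-job");
        job.setJarByClass(SingleReducerDriver.class);

        // Your own mapper/reducer classes would be set here, e.g.:
        // job.setMapperClass(MyMapper.class);
        // job.setReducerClass(MyReducer.class);

        // One reduce task => exactly one output file (part-r-00000).
        job.setNumReduceTasks(1);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is costly because all reduce work funnels through a single task, losing the parallelism that multiple reducers would give you.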



OR




org.apache.hadoop.util.Shell.execCommand(String[])

Static method to execute a shell command.
Covers most of the simple cases without requiring the user to implement the Shell interface.

Parameters:
    env - the map of environment key=value
    cmd - shell command to execute.
Returns:
    the output of the executed command.
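Using that utility, you could shell out to getmerge from Java. A sketch, assuming the hadoop binary is on the PATH; the paths are illustrative:

```java
import org.apache.hadoop.util.Shell;

public class GetMergeViaShell {
    public static void main(String[] args) throws Exception {
        // Equivalent to running:
        //   hadoop fs -getmerge /user/hadoop/job-output /tmp/merged.txt
        String output = Shell.execCommand(
                "hadoop", "fs", "-getmerge",
                "/user/hadoop/job-output", "/tmp/merged.txt");
        System.out.println(output);
    }
}
```

Keep in mind that -getmerge writes the merged file to the local filesystem, so you would still need a -put afterwards if the merged result must end up on HDFS.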