How can I run Hadoop with a Java class?
Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/3606679/
Asked by Josh Morrison
I am following the book Hadoop: The Definitive Guide.
I am confused by example 3-1.
There is a Java source file, URLCat.java.
I use javac to compile it into URLCat.class, then use jar to wrap it into a jar.
The book said to use
% hadoop URLCat hdfs://localhost/user/tom/quangle.txt
to run it. I have tried a lot of different ways, such as
% hadoop jar URLCat.jar .......
but it didn't work. I got errors like this:
Exception in thread "main" java.lang.ClassNotFoundException: hdfs://localhost/user/username/quangle/txt
What is the reason for this, and how do I do it right?
Accepted answer by khmarbaise
Answer by SquareCog
It's quite simple:
[me@myhost ~]$ hadoop jar
RunJar jarFile [mainClass] args...
So, what you want is hadoop jar yourJar.jar your.class.with.Main [any args]
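As a rough illustration of why the command in the question failed, here is a minimal sketch of how the usage line above splits its arguments. This is not Hadoop's actual RunJar source: the class RunJarArgsSketch, its Invocation holder, and the parse method are hypothetical names made up for this example. The assumption is that the jar's manifest declares no main class, so the first argument after the jar is taken as the class name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hadoop's RunJar implementation.
public class RunJarArgsSketch {

    /** The three pieces described by "RunJar jarFile [mainClass] args...". */
    public static final class Invocation {
        public final String jarFile;
        public final String mainClass;
        public final List<String> programArgs;

        Invocation(String jarFile, String mainClass, List<String> programArgs) {
            this.jarFile = jarFile;
            this.mainClass = mainClass;
            this.programArgs = programArgs;
        }
    }

    /**
     * Splits the arguments that follow "hadoop jar".
     * manifestMainClass is the main class named in the jar's manifest,
     * or null when the manifest names none.
     */
    public static Invocation parse(String[] args, String manifestMainClass) {
        if (args.length < 1) {
            throw new IllegalArgumentException("RunJar jarFile [mainClass] args...");
        }
        int next = 0;
        String jarFile = args[next++];
        String mainClass = manifestMainClass;
        if (mainClass == null) {
            // No main class in the manifest: the next argument fills that slot.
            if (args.length < 2) {
                throw new IllegalArgumentException("RunJar jarFile [mainClass] args...");
            }
            mainClass = args[next++];
        }
        List<String> rest = Arrays.asList(Arrays.copyOfRange(args, next, args.length));
        return new Invocation(jarFile, mainClass, rest);
    }

    public static void main(String[] args) {
        // The failing call from the question: the URL lands in the class-name slot.
        Invocation bad = parse(
                new String[] {"URLCat.jar", "hdfs://localhost/user/tom/quangle.txt"}, null);
        System.out.println("main class = " + bad.mainClass);

        // Naming the class explicitly pushes the URL into the program arguments.
        Invocation good = parse(
                new String[] {"URLCat.jar", "URLCat", "hdfs://localhost/user/tom/quangle.txt"}, null);
        System.out.println("main class = " + good.mainClass + ", args = " + good.programArgs);
    }
}
```

In the first call the HDFS URL is treated as the class to load, which matches the ClassNotFoundException naming the URL in the question.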
Answer by Jean Gateau
Of course you could use cat, but that sort of isn't the point (i.e. you're learning, not just trying to get it to work).
As per the book, you need to set your HADOOP_CLASSPATH environment variable. In my case, using the build example in the book, all of my classes are at: /media/data/hadefguide/book/build/classes
Here's an example:
hduser@MuleBox ~ $ export HADOOP_CLASSPATH=
hduser@MuleBox ~ $ hadoop URLCat hdfs://localhost/user/hduser/quangle.txt
Exception in thread "main" java.lang.NoClassDefFoundError: URLCat
Caused by: java.lang.ClassNotFoundException: URLCat
at java.net.URLClassLoader.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: URLCat. Program will exit.
hduser@MuleBox ~ $ export HADOOP_CLASSPATH=/media/data/hadefguide/book/build/classes
hduser@MuleBox ~ $ hadoop URLCat hdfs://localhost/user/hduser/quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
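To make the effect of HADOOP_CLASSPATH a little more concrete, here is a minimal sketch of the kind of lookup a class loader does over directory entries: a class is only found if some classpath directory contains the matching .class file. The class ClasspathLookupSketch and the method findClassFile are hypothetical names for illustration. Real class loading also handles jars and wildcards, and the hadoop script adds its own entries, all of which this sketch ignores.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical sketch: directory entries only, no jars or wildcards.
public class ClasspathLookupSketch {

    /** Returns the first classpath directory entry that holds the class file, if any. */
    public static Optional<Path> findClassFile(String classpath, String className) {
        // A class named a.b.C is expected at a/b/C.class below a classpath directory.
        String relative = className.replace('.', '/') + ".class";
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue; // simplification: empty entries are skipped in this sketch
            }
            Path candidate = Paths.get(entry).resolve(relative);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String classpath = System.getenv().getOrDefault("HADOOP_CLASSPATH", "");
        String className = args.length > 0 ? args[0] : "URLCat";
        Optional<Path> hit = findClassFile(classpath, className);
        System.out.println(hit.map(p -> "found " + p).orElse("not found: " + className));
    }
}
```

Run with an empty HADOOP_CLASSPATH it reports the class as not found; once the variable points at the directory holding URLCat.class, the lookup succeeds, which mirrors the two runs in the session above.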
Answer by Punit Sachdev
Not sure how useful this answer is now. I faced the same issue today while working on an example from the same book (Hadoop: The Definitive Guide). I was able to execute an example program as follows:
1. Write your Java code and save it as a .java file.
2. Compile your Java program using:
javac -classpath <path to hadoop core and commons-cli jar file> <path to your java program file>
3. Create a jar file containing your class file:
jar cvf <jar file> <class files to add separated by space>
4. Execute the jar file using the hadoop command line:
hadoop jar <jar file name> <class name containing your main method> <argument to the main method>
e.g.
hadoop jar FileSystemCat.jar FileSystemCat hdfs://localhost/user/root/MyFiles/meet_a_seer.txt
Hope it helps
Answer by Kishore Bhosale
Step 1: Compile the Java program:
javac URLCat.java -classpath $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.0.jar
Step 2: Create the jar file:
jar cvf URLCat.jar URLCat.class
Step 3: Execute the program (specify your HDFS file location):
hadoop jar URLCat.jar URLCat hdfs://localhost:9000/pcode/wcinput.txt
Answer by Eric Na
Go to the directory where your compiled .class files reside.
Use the full class name, including the package name, when running hadoop URLCat hdfs://localhost/user/tom/quangle.txt (refer to "Receiving 'wrong name' NoClassDefFoundError when executing a Java program from the command-line" for the full class name or which directory to run the job in).
In my case URLCat.java was in com.tom.app, so the hadoop command was hadoop com.tom.app.URLCat hdfs://localhost/user/tom/quangle.txt.
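The link between the directory you run from and the full class name can be sketched as follows. This is only an illustration under assumptions: the class ClassNameSketch, the method toClassName, and the build/classes directory are hypothetical and not taken from the answer. Given the root of the compiled classes and the path of a .class file below it, the sketch derives the name you would pass to the hadoop command.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: derives a fully qualified class name from a file path.
public class ClassNameSketch {

    /** Maps <root>/com/tom/app/URLCat.class to com.tom.app.URLCat. */
    public static String toClassName(Path classesRoot, Path classFile) {
        Path root = classesRoot.toAbsolutePath().normalize();
        Path file = classFile.toAbsolutePath().normalize();
        if (!file.startsWith(root) || !file.getFileName().toString().endsWith(".class")) {
            throw new IllegalArgumentException("not a .class file under " + root);
        }
        // Each directory below the root is one package segment.
        StringBuilder name = new StringBuilder();
        for (Path part : root.relativize(file)) {
            if (name.length() > 0) {
                name.append('.');
            }
            name.append(part.toString());
        }
        // Drop the trailing ".class" suffix.
        name.setLength(name.length() - ".class".length());
        return name.toString();
    }

    public static void main(String[] args) {
        // build/classes is an assumed output directory for this example.
        String name = toClassName(
                Paths.get("build/classes"),
                Paths.get("build/classes/com/tom/app/URLCat.class"));
        System.out.println(name); // prints com.tom.app.URLCat
    }
}
```

The same reasoning explains the "wrong name" error: running from inside com/tom/app and asking for plain URLCat finds a file whose recorded name is com.tom.app.URLCat.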
Answer by kometen
I did this based on help found on this site and the hadoop tutorial.
mkdir urlcat_classes
javac -classpath /usr/lib/hadoop/hadoop-0.20.2-cdh3u1-core.jar -d urlcat_classes URLCat.java
jar -cvf urlcat.jar -C urlcat_classes .
hadoop jar urlcat.jar no.gnome.URLCat hdfs://localhost/user/claus/sample.txt
no.gnome is from 'package no.gnome;' in URLCat.java.
regards
Claus
Answer by Dmytro Molkov
To make the hadoop URLCat command work, you need to get the jar (URLCat.jar) onto your classpath. You can put it in the lib/ dir of hadoop for that.
For hadoop jar URLCat.jar to run, you need to create a jar that has a main class defined in it (the Main-Class entry of its manifest); otherwise it thinks that the next argument on the command line is the class name. What you can try is hadoop jar URLCat.jar URLCat hdfs://...
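For anyone wondering what "a main class defined in the jar" looks like in practice, here is a minimal sketch using the JDK's java.util.jar classes. It builds a tiny in-memory jar, optionally with a Main-Class entry in its manifest, and reads that entry back. The class ManifestMainClassSketch and its two methods are hypothetical names for this example; this is not how Hadoop itself is implemented, only a way to inspect the manifest attribute it relies on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical sketch: writes and reads the Main-Class manifest attribute.
public class ManifestMainClassSketch {

    /** Builds an in-memory jar; mainClass may be null to leave Main-Class out. */
    public static byte[] buildJar(String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the attributes are not written.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (mainClass != null) {
            attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // No class entries are needed to demonstrate the manifest.
        }
        return bytes.toByteArray();
    }

    /** Returns the Main-Class value from the jar's manifest, or null if absent. */
    public static String readMainClass(byte[] jarBytes) throws IOException {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readMainClass(buildJar("URLCat"))); // prints URLCat
        System.out.println(readMainClass(buildJar(null)));     // prints null
    }
}
```

A jar made with plain jar cvf, as in the other answers, has no such entry, which is why the class name has to follow the jar on the command line. The jar tool's e option (for example jar cfe) is one way to record an entry point when creating the archive.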
Answer by Apurv
We can access HDFS through the hdfs api. My understanding of it is that you can use the hdfs api to contact a hadoop cluster running the dfs and fetch data from it.
Why do we need to invoke the command as hadoop jar URLCat.jar? Why not just java URLCat?
Why does the client necessarily need to install hadoop and then contact the hadoop cluster?

