How can I run Hadoop with a Java class?

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3606679/

Time: 2020-10-30 02:37:42  Source: igfitidea

How can I run Hadoop with a Java class?

java, hadoop

Asked by Josh Morrison

I am following the book Hadoop: The Definitive Guide.

I am confused by Example 3-1.

There is a Java source file, URLCat.java. I use javac to compile it into URLCat.class, then use jar to wrap it into a jar.

The book says to use

% hadoop URLCat hdfs://localhost/user/tom/quangle.txt

to run it. I have tried a lot of different ways, such as

% hadoop jar URLCat.jar .......

but none of them worked. I got errors like this:

Exception in thread "main" java.lang.ClassNotFoundException: hdfs://localhost/user/username/quangle/txt

What is the reason for this, and how do I do it right?

Accepted answer by khmarbaise

The syntax of the command is a little bit different:

hadoop fs -cat hdfs:///user/tom/quangle.txt

Do you have the Hadoop home directory in your PATH? Can you run hadoop without any parameters?

Answer by SquareCog

It's quite simple:

[me@myhost ~]$ hadoop jar
RunJar jarFile [mainClass] args...

So, what you want is hadoop jar yourJar.jar your.class.with.Main [any args]

Answer by Jean Gateau

Of course you could use cat, but that sort of isn't the point (i.e. you're learning, not just trying to get it to work).

As per the book, you need to set your HADOOP_CLASSPATH environment variable. In my case, using the build example in the book, all of my classes are at: /media/data/hadefguide/book/build/classes

Here's an example:

hduser@MuleBox ~ $ export HADOOP_CLASSPATH=
hduser@MuleBox ~ $ hadoop URLCat hdfs://localhost/user/hduser/quangle.txt
Exception in thread "main" java.lang.NoClassDefFoundError: URLCat
Caused by: java.lang.ClassNotFoundException: URLCat
    at java.net.URLClassLoader.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: URLCat.  Program will exit.
hduser@MuleBox ~ $ export HADOOP_CLASSPATH=/media/data/hadefguide/book/build/classes
hduser@MuleBox ~ $ hadoop URLCat hdfs://localhost/user/hduser/quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
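
To illustrate why the second run succeeds, here is a minimal Java sketch. It assumes that the directory named in HADOOP_CLASSPATH simply ends up on the class path of the JVM that the hadoop script starts. ClasspathCheck and canLoad are hypothetical names used for illustration only; they are not part of Hadoop.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: checks whether a class can be
// loaded once a given directory is added to the class path.
final class ClasspathCheck {

    static boolean canLoad(Path classesDir, String className) {
        try {
            URL[] roots = { classesDir.toUri().toURL() };
            // The loader asks its parent (the normal class path) first,
            // then searches classesDir.
            try (URLClassLoader loader = new URLClassLoader(roots)) {
                loader.loadClass(className);
                return true;
            }
        } catch (ClassNotFoundException | IOException e) {
            // Same failure mode as the first run in the transcript above.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are placeholders; pass your own directory and class name.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String name = args.length > 1 ? args[1] : "URLCat";
        System.out.println(name + " loadable from " + dir + ": " + canLoad(dir, name));
    }
}
```

Running it with the build directory and URLCat as arguments should report whether URLCat.class is visible from that directory, which is the same question the JVM answers when hadoop URLCat is invoked.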

Answer by Punit Sachdev

I'm not sure how useful this answer is now, but I faced the same issue today while working on an example from the same book (Hadoop: The Definitive Guide). I was able to run an example program as follows:

  • Write your Java code and save it as a .java file

  • Compile your java program using:

    javac -classpath <path to hadoop core and commons-cli jar file> <path to your java program file>
    
  • Create a jar file containing your class file:

    jar cvf <jar file> <class files to add separated by space>
    
  • Execute the jar file using the hadoop command line:

    hadoop jar <jar file name> <class name containing your main method> <argument to the main method>
    

    e.g.

    hadoop jar FileSystemCat.jar FileSystemCat hdfs://localhost/user/root/MyFiles/meet_a_seer.txt
    
Hope it helps

Answer by Kishore Bhosale

Step 1: Compile the Java program:

javac URLCat.java -classpath $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.0.jar

Step 2: Create the jar file:

jar cvf URLCat.jar URLCat.class

Step 3: Run the program (specify your HDFS file location):

hadoop jar URLCat.jar URLCat hdfs://localhost:9000/pcode/wcinput.txt

Answer by Eric Na

Go to the directory where your compiled .class files reside.

Use the full class name, including the package name, when running hadoop URLCat hdfs://localhost/user/tom/quangle.txt. For how to determine the full class name and which directory to run the job from, refer to the question Receiving "wrong name" NoClassDefFoundError when executing a Java program from the command-line.

In my case, URLCat.java was in the package com.tom.app, so the hadoop command was hadoop com.tom.app.URLCat hdfs://localhost/user/tom/quangle.txt.
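
The relationship between the full class name and the directory to run from can be sketched in Java. This is a minimal illustration of the standard Java convention that package names map to directories under a class path root. ClassFileLocator and relativeClassFile are hypothetical names, and nested classes are not handled.

```java
// Hypothetical helper: shows where the JVM expects to find a class file
// relative to a class path root (the directory you run the command from,
// or the directory listed in HADOOP_CLASSPATH).
final class ClassFileLocator {

    static String relativeClassFile(String fullyQualifiedName) {
        // Each package segment becomes one directory level.
        return fullyQualifiedName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        // Class in a package: run from the directory that contains "com".
        System.out.println(relativeClassFile("com.tom.app.URLCat")); // com/tom/app/URLCat.class
        // Class in the default package: run from the directory holding the file.
        System.out.println(relativeClassFile("URLCat"));             // URLCat.class
    }
}
```

So if URLCat.class sits at com/tom/app/URLCat.class, the class path root is the directory containing com, and the name to pass on the command line is com.tom.app.URLCat.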

Answer by kometen

I did this based on help found on this site and the hadoop tutorial.

mkdir urlcat_classes
javac -classpath /usr/lib/hadoop/hadoop-0.20.2-cdh3u1-core.jar -d urlcat_classes URLCat.java
jar -cvf urlcat.jar -C urlcat_classes .
hadoop jar urlcat.jar no.gnome.URLCat hdfs://localhost/user/claus/sample.txt

no.gnome is from 'package no.gnome;' in URLCat.java.

Regards,
Claus

Answer by Dmytro Molkov

To make the hadoop URLCat command work, the jar (URLCat.jar) needs to be on your class path. You can put it in Hadoop's lib/ directory for that.

For hadoop jar URLCat.jar to run on its own, the jar needs a Main-Class defined in its manifest; otherwise Hadoop treats the next argument on the command line as the class name. That is why an HDFS URL placed directly after the jar name ends up in a ClassNotFoundException. What you can try is hadoop jar URLCat.jar URLCat hdfs://...
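
The argument rule can be sketched in plain Java. This is a simplified illustration of the handling implied by the usage line RunJar jarFile [mainClass] args..., not Hadoop's actual source. RunJarArgs and parse are hypothetical names, and the manifest value is passed in directly instead of being read from a jar.

```java
import java.util.Arrays;

// Hypothetical model of how "hadoop jar" splits its arguments.
final class RunJarArgs {
    final String mainClass;
    final String[] programArgs;

    private RunJarArgs(String mainClass, String[] programArgs) {
        this.mainClass = mainClass;
        this.programArgs = programArgs;
    }

    // manifestMainClass: the jar manifest's Main-Class value, or null if absent.
    // args: everything after "hadoop jar", starting with the jar file name.
    static RunJarArgs parse(String manifestMainClass, String[] args) {
        String usage = "RunJar jarFile [mainClass] args...";
        if (args.length < 1) throw new IllegalArgumentException(usage);
        int next = 1; // args[0] is the jar file itself
        String mainClass = manifestMainClass;
        if (mainClass == null) {
            // No Main-Class in the manifest: the next argument must be the class.
            if (args.length < 2) throw new IllegalArgumentException(usage);
            mainClass = args[next++];
        }
        return new RunJarArgs(mainClass, Arrays.copyOfRange(args, next, args.length));
    }

    public static void main(String[] ignored) {
        String url = "hdfs://localhost/user/tom/quangle.txt";
        // Class name omitted: the URL is mistaken for the class to load.
        System.out.println(parse(null, new String[] {"URLCat.jar", url}).mainClass);
        // Class name given: URLCat is loaded and the URL is passed to main().
        RunJarArgs ok = parse(null, new String[] {"URLCat.jar", "URLCat", url});
        System.out.println(ok.mainClass + " " + Arrays.toString(ok.programArgs));
    }
}
```

The first call prints the HDFS URL as the "class name", which mirrors the ClassNotFoundException in the question; the second prints URLCat followed by the URL as its only program argument.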

Answer by Apurv

We can access HDFS through the HDFS API. My understanding is that you can use the HDFS API to contact a Hadoop cluster running the DFS and fetch data from it.

Why do we need to invoke the command as hadoop jar URLCat.jar?

Why not just java URLCat?

Why does the client necessarily need to install Hadoop before it can contact the Hadoop cluster?