java: How do I build/run this simple Mahout program without getting an exception?

Question

Asked by dranxo

I would like to run this code, which I found in Mahout in Action:

package org.help;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;

public class SeqPrep {

    public static void main(String[] args) throws IOException {

        List<NamedVector> apples = new ArrayList<NamedVector>();

        NamedVector apple;

        apple = new NamedVector(new DenseVector(new double[]{0.11, 510, 1}), "small round green apple");        

        apples.add(apple);

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("appledata/apples");

        SequenceFile.Writer writer = new SequenceFile.Writer(fs,  conf, path, Text.class, VectorWritable.class);

        VectorWritable vec = new VectorWritable();
        for(NamedVector vector : apples){
            vec.set(vector);
            writer.append(new Text(vector.getName()), vec);
        }
        writer.close();

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("appledata/apples"), conf);

        Text key = new Text();
        VectorWritable value = new VectorWritable();
        while(reader.next(key, value)){
            System.out.println(key.toString() + " , " + value.get().asFormatString());
        }
        reader.close();

    }

}

I compile it with:

$ javac -classpath :/usr/local/hadoop-1.0.3/hadoop-core-1.0.3.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -d myjavac/ SeqPrep.java

I package it as a jar:

$ jar -cvf SeqPrep.jar -C myjavac/ .

Now I'd like to run it on my local Hadoop node. I've tried:

$ hadoop jar SeqPrep.jar org.help.SeqPrep

But I get:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
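
The trace shows the error is raised inside RunJar.main while it loads org.help.SeqPrep itself, before main runs: the client JVM started by hadoop jar cannot see org.apache.mahout.math.Vector. A small probe like the one below, run with the classpath you expect Hadoop to use, is one way to confirm which classes are missing. It is only a sketch: ClasspathCheck is a hypothetical helper, not part of Hadoop or Mahout, and the class names are simply the ones from the question.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical classpath probe: reports which of the given class names the
 * current class loader cannot find. Not part of Hadoop or Mahout.
 */
class ClasspathCheck {

    /** True if the class can be located; initialize=false avoids running static initializers. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: not on the classpath at all.
            // LinkageError (e.g. NoClassDefFoundError): found, but one of its own dependencies is missing.
            return false;
        }
    }

    /** Returns the subset of class names that cannot be loaded, in the order given. */
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isLoadable(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Class names taken from the question's imports and stack trace.
        List<String> absent = missing(
                "org.apache.mahout.math.Vector",
                "org.apache.mahout.math.VectorWritable",
                "org.apache.hadoop.io.SequenceFile");
        if (absent.isEmpty()) {
            System.out.println("All classes found");
        } else {
            System.out.println("Missing from classpath: " + absent);
        }
    }
}
```

Adding jars to the -cp value and rerunning until the list is empty tells you which jars the runtime classpath needs, which is not necessarily the same set that was enough for javac.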

So I tried using the -libjars option:

$ hadoop jar SeqPrep.jar org.help.SeqPrep -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT-sources.jar

and got the same problem. I don't know what else to try.
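
Two details may explain why this attempt changed nothing. -libjars is a Hadoop generic option that is only parsed when the driver goes through ToolRunner (or GenericOptionsParser); a plain main like SeqPrep.main just receives it as ordinary arguments. Its documented form is also one comma-separated list of jars rather than one flag per jar. More fundamentally, the stack trace fails while the main class is being loaded, before any argument handling, so no option placed after the class name can help here. The hypothetical helper below only illustrates building the comma-separated value; it is not Hadoop code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper: builds the single comma-separated value that the
 * -libjars generic option expects, instead of repeating the flag per jar.
 */
class LibJarsArg {

    static String libJarsValue(List<String> jarPaths) {
        StringJoiner joiner = new StringJoiner(",");
        for (String path : jarPaths) {
            if (path.contains(",")) {
                // A comma inside a path would be read as a separator.
                throw new IllegalArgumentException("Path contains a comma: " + path);
            }
            joiner.add(path);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
                "/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar",
                "/home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT.jar");
        // Prints the option and its value, to be placed after the main class name.
        System.out.println("-libjars " + libJarsValue(jars));
    }
}
```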

My eventual goal is to be able to read a .csv file on the Hadoop filesystem (HDFS) into a sparse matrix and then multiply it by a random vector.
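
As a rough, Hadoop-free illustration of that goal, the sketch below parses CSV lines into a sparse row map and multiplies the result by a random dense vector. It assumes a hypothetical coordinate layout of one row,col,value triple per line with 0-based indices, reads from an in-memory list rather than HDFS, and does not use Mahout's matrix classes; SparseCsvMultiply is an illustrative name only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Plain-Java sketch (no Hadoop/Mahout): parses "row,col,value" lines into a
 * sparse matrix and computes y = A x. The CSV layout is an assumption.
 */
class SparseCsvMultiply {

    /** Sparse matrix as row index -> (column index -> value); absent entries are zero. */
    static Map<Integer, Map<Integer, Double>> parse(Iterable<String> lines) {
        Map<Integer, Map<Integer, Double>> rows = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split(",");
            int row = Integer.parseInt(parts[0].trim());
            int col = Integer.parseInt(parts[1].trim());
            double value = Double.parseDouble(parts[2].trim());
            rows.computeIfAbsent(row, r -> new HashMap<>()).put(col, value);
        }
        return rows;
    }

    /** Computes y = A x for a matrix with numRows rows; rows with no stored entries stay 0. */
    static double[] times(Map<Integer, Map<Integer, Double>> a, double[] x, int numRows) {
        double[] y = new double[numRows];
        for (Map.Entry<Integer, Map<Integer, Double>> row : a.entrySet()) {
            double sum = 0.0;
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                sum += cell.getValue() * x[cell.getKey()];
            }
            y[row.getKey()] = sum;
        }
        return y;
    }

    /** Dense vector with entries uniform in [0, 1); a fixed seed makes runs repeatable. */
    static double[] randomVector(int size, long seed) {
        Random rng = new Random(seed);
        double[] v = new double[size];
        for (int i = 0; i < size; i++) {
            v[i] = rng.nextDouble();
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("0,0,2.0", "0,2,1.0", "1,1,3.0");
        Map<Integer, Map<Integer, Double>> a = parse(csv);
        double[] x = randomVector(3, 42L);
        System.out.println(Arrays.toString(times(a, x, 2)));
    }
}
```

For a local file, java.nio.file.Files.readAllLines(path) returns a List<String> that can be passed to parse directly.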

Edit: Looks like Razvan got it (note: see below for another way to do this that does not mess with your Hadoop installation). For reference, after copying the Mahout jars into Hadoop's lib/ directory:

$ find /usr/local/hadoop-1.0.3/. |grep mah
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-job.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT.jar

and then:

$ hadoop jar SeqPrep.jar org.help.SeqPrep

small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}

Edit: I'm trying to do this without copying the Mahout jars into the Hadoop lib/ directory:

$ rm /usr/local/hadoop-1.0.3/lib/mahout-*

and then of course:

hadoop jar SeqPrep.jar org.help.SeqPrep

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
    at java.net.URLClassLoader.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

and when I try the Mahout job jar:

$ hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep
    at java.net.URLClassLoader.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

If I try to include the .jar file I made:

$ hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: SeqPrep.jar

Edit: Apparently I can only send one jar at a time to Hadoop. This means I need to add the class I made to the Mahout core job jar:

~/mahout/trunk/core/target$ cp mahout-core-0.8-SNAPSHOT-job.jar mahout-core-0.8-SNAPSHOT-job.jar_backup

~/mahout/trunk/core/target$ cp ~/workspace/seqprep/bin/org/help/SeqPrep.class .

~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.class

And then:

~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep

Edit: OK, now I can do it without messing with my Hadoop installation. I was updating the .jar incorrectly in the previous edit. It should be:

~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar org/help/SeqPrep.class

then:

~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}
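
The difference between the two jar uf commands is the entry name. A class declared in package org.help has to be stored at org/help/SeqPrep.class inside the jar; the first attempt added it as SeqPrep.class at the jar root, so the class loader could not find org.help.SeqPrep. From the shell, jar tf mahout-core-0.8-SNAPSHOT-job.jar | grep SeqPrep shows where the entry ended up. The same check in Java might look like the sketch below; JarEntryCheck is a hypothetical helper, not part of Hadoop or Mahout.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Hypothetical helper: checks that a class sits at the path its package requires. */
class JarEntryCheck {

    /** Maps a fully qualified class name to its jar entry, e.g. org.help.SeqPrep -> org/help/SeqPrep.class. */
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** True only if the jar holds the class under its package directory, not merely at the root. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarEntryCheck mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep
        String jarPath = args[0];
        String className = args[1];
        System.out.println(entryNameFor(className) + " present: " + containsClass(jarPath, className));
    }
}
```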

Answer 1

Answer by Sean Owen

You need to use the "job" JAR file provided by Mahout. It packages up all the dependencies. You need to add your classes to it too. This is how all the Mahout examples work. You shouldn't put Mahout jars in the Hadoop lib since that sort of "installs" a program too deeply in Hadoop.
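
This works because Hadoop 1.x's RunJar (the code behind hadoop jar) unpacks the jar and puts its root, its classes/ directory and every jar directly under lib/ on the classpath. A job jar can therefore carry its dependencies either as nested jars under lib/ or as classes unpacked at the jar root. The sketch below lists the nested jars under lib/ in a given job jar, one way to see what a job jar bundles; JobJarLibs is a hypothetical helper.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: lists the jars nested directly under lib/ in a job jar. */
class JobJarLibs {

    static List<String> bundledJars(String jobJarPath) throws IOException {
        List<String> result = new ArrayList<>();
        try (JarFile jar = new JarFile(jobJarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // Only direct children of lib/ ending in .jar; deeper paths are skipped.
                boolean directChild = name.indexOf('/', "lib/".length()) < 0;
                if (name.startsWith("lib/") && name.endsWith(".jar") && directChild) {
                    result.add(name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JobJarLibs mahout-core-0.8-SNAPSHOT-job.jar
        for (String name : bundledJars(args[0])) {
            System.out.println(name);
        }
    }
}
```

An empty result suggests the dependencies were unpacked at the jar root instead; jar tf lists them there.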

Answer 2

Answer by Alex Ott

If you take the example code from the https://github.com/tdunning/MiA repository, it contains a ready-to-use pom.xml file for Maven. When you compile the code with mvn package, it creates mia-0.1-job.jar in the target directory. This archive contains all dependencies except Hadoop's, so you can run it on a Hadoop cluster without problems.

Answer 3

Answer by caoimhin

<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-math</artifactId>
    <version>0.7</version>
</dependency>

<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-collections</artifactId>
    <version>1.0</version>
</dependency>

Answer 4

Answer by user1456599

What I did is to set the HADOOP_CLASSPATH with my jar and all the mahout jar files as shown below.

export HADOOP_CLASSPATH=/home/xxx/my.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-core-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-core-0.7-cdh4.3.0-job.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-examples-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-examples-0.7-cdh4.3.0-job.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-integration-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-math-0.7-cdh4.3.0.jar

Then I was able to run:

hadoop com.mycompany.mahout.CSVtoVector iris/nb/iris1.csv iris/nb/data/iris.seq

So you have to include your jar and all the Mahout jars in HADOOP_CLASSPATH, and then you can run your class with:

hadoop <classname>
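
The hadoop launcher script adds HADOOP_CLASSPATH to the classpath of the client JVM it starts, which is enough for a program like SeqPrep that does all its work in that JVM. Programs that launch map or reduce tasks would typically still need the jars shipped to the tasks, for example through a job jar or -libjars. Rather than typing every path by hand, the value can be generated; the sketch below joins a user jar with every *.jar in one directory. HadoopClasspath and its arguments are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds a HADOOP_CLASSPATH value from one user jar plus a directory of jars. */
class HadoopClasspath {

    /** Joins userJar and every *.jar in libDir with File.pathSeparator (":" on Linux). */
    static String build(String userJar, File libDir) {
        List<String> parts = new ArrayList<>();
        parts.add(userJar);
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars != null) {
            Arrays.sort(jars); // deterministic, alphabetical order
            for (File jar : jars) {
                parts.add(jar.getPath());
            }
        }
        return String.join(File.pathSeparator, parts);
    }

    public static void main(String[] args) {
        // Usage: java HadoopClasspath /home/xxx/my.jar /path/to/mahout/lib
        System.out.println("export HADOOP_CLASSPATH=" + build(args[0], new File(args[1])));
    }
}
```

On Linux File.pathSeparator is ":", which matches the format of the export line above.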
