Java: Setting external jars to the hadoop classpath

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow CC BY-SA and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/26748811/

Date: 2020-08-11 03:13:11  Source: igfitidea

Setting external jars to hadoop classpath

Tags: java, hadoop, mapreduce, bigtop

Asked by mnm

I am trying to set external jars to hadoop classpath but no luck so far.

I have the following setup

$ hadoop version
Hadoop 2.0.6-alpha Subversion https://git-wip-us.apache.org/repos/asf/bigtop.git-r ca4c88898f95aaab3fd85b5e9c194ffd647c2109 Compiled by jenkins on 2013-10-31T07:55Z From source with checksum 95e88b2a9589fa69d6d5c1dbd48d4e This command was run using /usr/lib/hadoop/hadoop-common-2.0.6-alpha.jar

Classpath

$ echo $HADOOP_CLASSPATH
/home/tom/workspace/libs/opencsv-2.3.jar

I am able to see that the above HADOOP_CLASSPATH has been picked up by hadoop

$ hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/:/usr/lib/hadoop/.//:/home/tom/workspace/libs/opencsv-2.3.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/:/usr/lib/hadoop-hdfs/.//:/usr/lib/hadoop-yarn/lib/:/usr/lib/hadoop-yarn/.//:/usr/lib/hadoop-mapreduce/lib/:/usr/lib/hadoop-mapreduce/.//
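For reference, HADOOP_CLASSPATH is usually exported in the shell profile or in hadoop-env.sh before invoking hadoop. A minimal sketch, appending rather than overwriting so existing entries survive (the jar path matches the setup above):

```shell
# Prepend the external jar to HADOOP_CLASSPATH, preserving any existing entries.
export HADOOP_CLASSPATH=/home/tom/workspace/libs/opencsv-2.3.jar${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}
echo "$HADOOP_CLASSPATH"
```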

Command

$ sudo hadoop jar FlightsByCarrier.jar FlightsByCarrier /user/root/1987.csv /user/root/result

I tried the -libjars option as well

$ sudo hadoop jar FlightsByCarrier.jar FlightsByCarrier /user/root/1987.csv /user/root/result -libjars /home/tom/workspace/libs/opencsv-2.3.jar
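A note on ordering: -libjars is a generic option handled by GenericOptionsParser, so it is only honored when it appears before the application arguments (and when the driver implements the Tool interface). The intended command would presumably look like this (paths unchanged from above):

```shell
sudo hadoop jar FlightsByCarrier.jar FlightsByCarrier \
    -libjars /home/tom/workspace/libs/opencsv-2.3.jar \
    /user/root/1987.csv /user/root/result
```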

The stacktrace

14/11/04 16:43:23 INFO mapreduce.Job: Running job: job_1415115532989_0001
14/11/04 16:43:55 INFO mapreduce.Job: Job job_1415115532989_0001 running in uber mode : false
14/11/04 16:43:56 INFO mapreduce.Job:  map 0% reduce 0%
14/11/04 16:45:27 INFO mapreduce.Job:  map 50% reduce 0%
14/11/04 16:45:27 INFO mapreduce.Job: Task Id : attempt_1415115532989_0001_m_000001_0, Status : FAILED
Error: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVParser
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at FlightsByCarrierMapper.map(FlightsByCarrierMapper.java:19)
    at FlightsByCarrierMapper.map(FlightsByCarrierMapper.java:10)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

Any help is highly appreciated.

Answered by blackSmith

Your external jar is missing on the nodes running the map tasks. You have to add it to the distributed cache to make it available. Try:

DistributedCache.addFileToClassPath(new Path("pathToJar"), conf);

Not sure in which version DistributedCache was deprecated, but from Hadoop 2.2.0 onward you can use:

job.addFileToClassPath(new Path("pathToJar")); 

Answered by mnm

I found another workaround by implementing ToolRunner as shown below. With this approach hadoop accepts command-line options, so we can avoid hard-coding the files added to the DistributedCache.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class FlightsByCarrier extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // Configuration processed by ToolRunner; generic options such as
        // -libjars have already been parsed and stripped out
        Configuration conf = getConf();

        // Create a JobConf using the processed conf
        JobConf job = new JobConf(conf, FlightsByCarrier.class);

        // Process custom command-line options (indices as in the original example)
        Path in = new Path(args[1]);
        Path out = new Path(args[2]);

        // Specify various job-specific parameters
        job.setJobName("my-app");
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Submit the job, then poll for progress until the job is complete
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Let ToolRunner handle generic command-line options
        int res = ToolRunner.run(new Configuration(), new FlightsByCarrier(), args);
        System.exit(res);
    }
}

Answered by user5856557

I found a very easy solution to the problem. Log in as root:

cd /usr/lib
find . -name "opencsv*.jar"

Pick up the location of the file. In my case I found it under /usr/lib/hive/lib/opencsv*.jar

Now submit the command

hadoop classpath

The result shows the directories where hadoop searches for jar files. Pick one directory and copy opencsv*.jar to it.

In my case it worked.
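The find-and-copy approach above can be sketched as follows, using a throwaway directory tree in place of the real /usr/lib (all paths here are illustrative):

```shell
# Build a temporary tree that mimics /usr/lib with a jar under hive/lib.
root=$(mktemp -d)
mkdir -p "$root/hive/lib" "$root/hadoop/lib"
touch "$root/hive/lib/opencsv-2.3.jar"

# Locate the jar, then copy it into a directory on the hadoop classpath.
jar=$(find "$root" -name "opencsv*.jar" | head -n 1)
cp "$jar" "$root/hadoop/lib/"
ls "$root/hadoop/lib"
```

On a real node the same two commands (find, then cp) run against /usr/lib and a directory reported by hadoop classpath.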

Answered by Isaiah4110

If you are adding the external JAR to the Hadoop classpath, it is better to copy your JAR to one of the existing directories that hadoop already looks at. On the command line, run the command "hadoop classpath", find a suitable folder, and copy your jar file to that location; hadoop will pick up the dependencies from there. This won't work on Cloudera etc., as you may not have read/write rights to copy files to the hadoop classpath folders.

Looks like you tried the -libjars option as well. Did you edit your driver class to implement the Tool interface? First, make sure your driver class looks like this:

    public class myDriverClass extends Configured implements Tool {

      public static void main(String[] args) throws Exception {
         int res = ToolRunner.run(new Configuration(), new myDriverClass(), args);
         System.exit(res);
      }

      public int run(String[] args) throws Exception
      {

        // Configuration processed by ToolRunner 
        Configuration conf = getConf();
        Job job = new Job(conf, "My Job");

        ...
        ...

        return job.waitForCompletion(true) ? 0 : 1;
      }
    }

Now edit your "hadoop jar" command as shown below:

hadoop jar YourApplication.jar [myDriverClass] args -libjars path/to/jar/file

Now let's understand what happens underneath. Basically, we handle the new command-line arguments by implementing the Tool interface. ToolRunner is used to run classes implementing the Tool interface. It works in conjunction with GenericOptionsParser to parse the generic hadoop command-line arguments and modify the Configuration of the Tool.

Within our main() we call ToolRunner.run(new Configuration(), new myDriverClass(), args) - this runs the given Tool via Tool.run(String[]) after parsing the given generic arguments. It uses the given Configuration, or builds one if it is null, and then sets the Tool's configuration with the possibly modified version of the conf.

Now, within the run method, calling getConf() gives us the modified version of the Configuration. So make sure you have the line below in your code. If you implement everything else but still use Configuration conf = new Configuration(), nothing will work.

Configuration conf = getConf();

Answered by nitinm

I tried setting the opencsv jar in the hadoop classpath, but it didn't work. We need to explicitly copy the jar into the classpath for this to work. It did work for me. Below are the steps I followed:

I did this on an HDP cluster. I copied my opencsv jar into the hbase libs and exported it before running my jar.

Copy external jars to the HDP libs:

To run the OpenCSV jar:

1. Copy the opencsv jar into the directories /usr/hdp/2.2.9.1-11/hbase/lib/ and /usr/hdp/2.2.9.1-11/hadoop-yarn/lib

sudo cp /home/sshuser/Amedisys/lib/opencsv-3.7.jar /usr/hdp/2.2.9.1-11/hbase/lib/

2. Give the file permissions using sudo chmod 777 opencsv-3.7.jar
3. List the files: ls -lrt

4. Export the hadoop classpath: hbase classpath

5. Now run your jar. It will pick up the opencsv jar and execute properly.
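Step 4 above can be sketched as follows; the hbase binary is stubbed here purely for illustration (on a real HDP node the actual hbase command supplies the classpath):

```shell
# Stub standing in for the real `hbase` binary; illustrative only.
hbase() { echo "/usr/hdp/2.2.9.1-11/hbase/lib/opencsv-3.7.jar"; }

# Prepend the hbase classpath to HADOOP_CLASSPATH so hadoop picks up the jar.
export HADOOP_CLASSPATH="$(hbase classpath)${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
echo "$HADOOP_CLASSPATH"
```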
