Run Pig in Java without embedding the Pig script
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/11152068/
Asked by Logan
I am new to Pig, Hadoop, and HBase. I want to run a Pig script from my Java program without embedding the script in the Java code. Instead, I would like to call a Pig execution method and pass it the script and its parameters (possibly as a parameter file). Does the core Pig library, or any other library, provide a way to execute a Pig script like this? I already tried Java's Runtime exec method, but some of my parameters are strings that contain spaces, so I gave up on calling the pig command that way; it does not seem to be the proper way to execute Pig commands.
Answered by Riyaz
You can use org.apache.pig.PigServer to run Pig scripts from Java programs.
PigServer pigServer = new PigServer(ExecType.MAPREDUCE);
pigServer.registerScript("scripts/test.pig");
This requires a 'pig.properties' file on the classpath with the following entries:
fs.default.name=hdfs://<namenode-hostname>:<port>
mapred.job.tracker=<jobtracker-hostname>:<port>
Alternatively, pass an instance of java.util.Properties to the PigServer constructor:
Properties props = new Properties();
props.setProperty("fs.default.name", "hdfs://<namenode-hostname>:<port>");
props.setProperty("mapred.job.tracker", "<jobtracker-hostname>:<port>");
PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
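If the cluster settings should not be hard-coded, the Properties object can be filled from text in the same key=value syntax as pig.properties. The following is a minimal sketch using only the JDK; the class name ClusterProps and the example host names and ports are assumptions for illustration, and the PigServer call is left as a comment because it needs the Pig jar on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class ClusterProps {
    // Parses key=value lines (pig.properties syntax) into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical host names and ports; replace with your cluster's values.
        String text = "fs.default.name=hdfs://namenode.example:8020\n"
                    + "mapred.job.tracker=jobtracker.example:8021\n";
        Properties props = parse(text);
        System.out.println(props.getProperty("fs.default.name"));
        // With Pig on the classpath, the object would then be handed over:
        // PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
    }
}
```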
Answered by Joe23
I am not sure I understand what you are asking. Do you want to know how to run a Pig script from a Java program?
If so, we use the class org.apache.pig.PigRunner for this.
PigStats pigStats = PigRunner.run(args, null);
Its Javadoc states:
A utility to help run PIG scripts within a Java program.
However, in my experience Pig is not really intended to be used this way (at least in version 0.8). We have had problems such as FileStreams that are left open and temporary files that are not deleted.
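The args array passed to PigRunner.run carries the same options as the pig command line, so the script path and its parameters can stay outside the Java code. The sketch below only builds that array with the JDK; the class name PigArgs and the file names are assumptions, and the exact option handling should be checked against the Pig version in use. The PigRunner call itself is shown as a comment because it needs the Pig jar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PigArgs {
    // Builds the argument array: one "-param name=value" pair per entry,
    // an optional "-param_file <file>" pair, then the script path last.
    static String[] build(String script, Map<String, String> params, String paramFile) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            args.add("-param");
            args.add(e.getKey() + "=" + e.getValue());
        }
        if (paramFile != null) {
            args.add("-param_file");
            args.add(paramFile);
        }
        args.add(script);
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("in1", "file1.txt");
        params.put("outdirectory", "outdirectory");
        String[] args = build("scripts/test.pig", params, null);
        System.out.println(String.join(" ", args));
        // With Pig on the classpath:
        // PigStats pigStats = PigRunner.run(args, null);
    }
}
```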
Answered by Arun A K
Since others have already explained how to run Pig embedded in Java, let me just add how to run a parametrised Pig script without Java.
In this scenario, all you need is your Pig code saved as a Pig file, say myFirstPigScript.pig.
Next you need the parameters. Here is how to run myFirstPigScript.pig with three input parameters:
pig -p in1=file1.txt -p in2=file2.txt -p outdirectory=outdirectory myFirstPigScript.pig
Your Pig script will look like this:
A = load '$in1' USING PigStorage(',') AS (id_one:chararray,file1field1:chararray);
B = load '$in2' USING PigStorage(',') AS (id_two:chararray,file2field1:chararray);
C = join A by id_one, B by id_two;
store C into '$outdirectory' USING PigStorage(',');
Each sample input file is a two-column CSV file.
The output 'part' files will be written to the output directory.
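If this command line does have to be started from Java after all, one option is java.lang.ProcessBuilder with one list element per argument, which avoids the string splitting that a single command string would undergo. This is only a sketch: the class name PigCli is hypothetical, it assumes the pig launcher is on the PATH, and it does not cover any quoting that Pig itself may require inside a parameter value.

```java
import java.util.ArrayList;
import java.util.List;

final class PigCli {
    // Assembles the command as separate tokens; ProcessBuilder hands each
    // token to the new process as its own argument.
    static ProcessBuilder command(String script, String... assignments) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pig"); // assumption: the pig launcher is on the PATH
        for (String assignment : assignments) {
            cmd.add("-p");
            cmd.add(assignment);
        }
        cmd.add(script);
        // inheritIO() forwards Pig's console output to this process.
        return new ProcessBuilder(cmd).inheritIO();
    }

    public static void main(String[] args) {
        ProcessBuilder pb = command("myFirstPigScript.pig",
                "in1=file1.txt", "in2=file2.txt", "outdirectory=outdirectory");
        System.out.println(pb.command());
        // To actually launch the script:
        // int exitCode = pb.start().waitFor();
    }
}
```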