Call R from JAVA to get Chi-squared statistic and p-value

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/16019654/

Time: 2020-10-31 21:38:18. Source: igfitidea.

Tags: java, r, statistics, chi-squared

Asked by MadScone

I have two 4*4 matrices in JAVA, where one matrix holds observed counts and the other expected counts.

I need an automated way to calculate the p-value from the chi-square statistic between these two matrices; however, JAVA has no such function as far as I am aware.

I can calculate the chi-square and its p-value by reading the two matrices into R as .csv file formats, and then using the chisq.test function as follows:

obs<-read.csv("obs.csv")
exp<-read.csv("exp.csv")
chisq.test(obs,exp)

where the format of the .csv files would be as follows:

A, C, G, T
A, 197.136, 124.32, 63.492, 59.052
C, 124.32, 78.4, 40.04, 37.24
G, 63.492, 40.04, 20.449, 19.019
T, 59.052, 37.24, 19.019, 17.689

Given these commands, R will give an output of the format:

X-squared = 20.6236, df = 9, p-value = 0.01443

which includes the p-value I was looking for.

Does anyone know of an efficient way to automate the process of:

1) Outputting my matrices from JAVA into .csv files
2) Loading the .csv files into R
3) Calling chisq.test on the loaded .csv data in R
4) Returning the outputted p-value back into JAVA?

Thanks for any help....

Answered by MadScone

There are (at least) two ways of going about this.



Command Line & Scripts

You can execute R scripts from the command line with Rscript.exe. For example, your script would contain:

# Parse arguments.
# ...
# ...

chisq.test(obs, exp)

Rather than creating CSVs in Java and having R read them, you should be able to pass the values straight to R. I don't see the need to create CSVs and pass data that way, unless your matrices are quite big. There are limits on the size of the command line arguments you can pass (this varies across operating systems, I think).
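One way to pass a small matrix straight to Rscript is to flatten it into command line arguments. The following is a minimal sketch, not part of the original answer: the class and method names are made up, and it assumes a hypothetical my_script.R that reads the dimensions and values with commandArgs(trailingOnly = TRUE) and rebuilds the matrix with byrow = TRUE.

```java
import java.util.ArrayList;
import java.util.List;

class MatrixArgs {
    // Builds: Rscript <script> <nrow> <ncol> v11 v12 ... (values in row-major order).
    static List<String> buildCommand(String script, double[][] m) {
        List<String> cmd = new ArrayList<>();
        cmd.add("Rscript");
        cmd.add(script);
        cmd.add(Integer.toString(m.length));
        cmd.add(Integer.toString(m[0].length));
        for (double[] row : m) {
            for (double v : row) {
                cmd.add(Double.toString(v));
            }
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Made-up 2x2 data, only to show the shape of the resulting command.
        double[][] obs = {{1.5, 2.0}, {3.0, 4.25}};
        System.out.println(String.join(" ", buildCommand("my_script.R", obs)));
    }
}
```

Passing the dimensions first keeps the R side simple, since the script does not have to guess the shape of the matrix from the number of values.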

You can pass arguments into R scripts and parse them using the commandArgs() function or with various packages (e.g. optparse or getopt). See this thread for more information.

There are several ways of calling and reading from the command line in Java. I don't know enough about it to give you advice but a bit of googling will give you a result. Calling a script from the command line is done like this:

Rscript my_script.R
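On the Java side, one common standard-library route is ProcessBuilder. The sketch below is an illustration rather than the answerer's method: it assumes Rscript is on the PATH, that a hypothetical my_script.R prints the usual chisq.test summary, and that the p-value can be pulled out of that text with a regular expression. The class and method names are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RscriptRunner {
    // Matches "p-value = 0.01443" as well as "p-value < 2.2e-16".
    private static final Pattern P_VALUE =
            Pattern.compile("p-value\\s*[=<]\\s*([0-9.eE+-]+)");

    // Runs a command and returns everything it printed (stdout and stderr merged).
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        Process process = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("Command failed with exit code " + exit + ":\n" + out);
        }
        return out.toString();
    }

    // Extracts the number that follows "p-value" in R's printed test summary.
    static double parsePValue(String rOutput) {
        Matcher m = P_VALUE.matcher(rOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("No p-value found in: " + rOutput);
        }
        return Double.parseDouble(m.group(1));
    }

    public static void main(String[] args) throws Exception {
        // Assumption: Rscript is installed and my_script.R is in the working directory.
        String output = run(List.of("Rscript", "my_script.R"));
        System.out.println("p-value: " + parsePValue(output));
    }
}
```

Parsing the human-readable summary is fragile. If you control the R script, having it print only the p-value is likely to be easier to read back reliably.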


JRI

JRI lets you talk to R straight from Java. Here's an example of how you would pass a double array to R and have R sum it (this is Java now):

// Start R session.
Rengine re = new Rengine (new String [] {"--vanilla"}, false, null);

// Check if the session is working.
if (!re.waitForR()) {
    return;
}

re.assign("x", new double[] {1.5, 2.5, 3.5});
REXP result = re.eval("(sum(x))");
System.out.println(result.asDouble());
re.end();

The function assign() here is the same as doing this in R:

x <- c(1.5, 2.5, 3.5)

You should be able to work out how to extend this to work with a matrix.
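One possible extension to a matrix, sketched under assumptions: since the example above passes a plain double[] to assign(), the matrix can be flattened in column-major order (the order R's matrix() fills by default) and reshaped on the R side. The helper class and method are hypothetical, and the JRI calls appear only as comments because they need the native JRI library to run.

```java
import java.util.Arrays;

class JriMatrixHelper {
    // Flattens a rectangular matrix column by column, matching R's default fill order.
    static double[] toColumnMajor(double[][] m) {
        int rows = m.length;
        int cols = m[0].length;
        double[] flat = new double[rows * cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                flat[c * rows + r] = m[r][c];
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        // Made-up 2x2 data.
        double[][] obs = {{1.0, 2.0}, {3.0, 4.0}};
        double[] flat = toColumnMajor(obs);
        System.out.println(Arrays.toString(flat));

        // With a running Rengine named re (as in the example above), the idea would be:
        //   re.assign("obs", flat);
        //   re.eval("obs <- matrix(obs, nrow = 2)");
        //   double p = re.eval("chisq.test(obs)$p.value").asDouble();
    }
}
```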



I think JRI is quite difficult at the beginning. So if you want to get this done quickly the command line option is probably best. I would say the JRI approach is less messy once you get it set up though. And if you have situations where you have a lot of back and forth between R and Java it is definitely better than calling multiple scripts.

  1. Link to JRI.
  2. Recommended Eclipse plugin to set up JRI.

Answered by rarry

Check this page: JRI

Description from their site:

JRI is a Java/R Interface, which allows to run R inside Java applications as a single thread. Basically it loads R dynamic library into Java and provides a Java API to R functionality. It supports both simple calls to R functions and a full running REPL.

Answered by jbytecode

RCaller 2.2 can do what you want. Suppose the frequency matrix is given as in your question. The resulting p.value and df values can be calculated and returned using the code below:

double[][] data = new double[][]{
    {197.136, 124.32, 63.492, 59.052},
    {124.32, 78.4, 40.04, 37.24},
    {63.492, 40.04, 20.449, 19.019},
    {59.052, 37.24, 19.019, 17.689}
};
RCaller caller = new RCaller();
Globals.detect_current_rscript();
caller.setRscriptExecutable(Globals.Rscript_current);
RCode code = new RCode();

code.addDoubleMatrix("mydata", data);
code.addRCode("result <- chisq.test(mydata)");
code.addRCode("mylist <- list(pval = result$p.value, df=result$parameter)");

caller.setRCode(code);
caller.runAndReturnResult("mylist");

double pvalue = caller.getParser().getAsDoubleArray("pval")[0];
double df = caller.getParser().getAsDoubleArray("df")[0];
System.out.println("Pvalue is : "+pvalue);
System.out.println("Df is : "+df);

The output is:

Pvalue is : 1.0
Df is : 9.0

You can get the technical details here.

Answered by jbytecode

Rserve is another way to get your data from Java to R and back. It is a server which takes R scripts as string inputs. You can use some string parsing and conversion in Java to convert the matrices into strings that can be input into R.

import org.rosuda.REngine.REXP;
import org.rosuda.REngine.Rserve.RConnection;


public class RtestScript {

private String emailTestScript = "open <- c('O', 'O', 'N', 'N', 'O', 'O', 'N', 'N', 'N', 'O', " +
        " 'O', 'N', 'N', 'O', 'O', 'N', 'N', 'N', 'O');" +
        "testgroup <- c('A', 'A', 'A','A','A','A','A','A','A','A', 'B'," +
        "'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B');" +
        "emailTest <- data.frame(open, testgroup);" +
        "emailTable<- table(emailTest$open, emailTest$testgroup);" +
        "emailResults<- prop.test(emailTable, correct=FALSE);" +
        "print(emailResults$p.value);";

public void executeRscript() {
    try {
        //Make sure to type in library(Rserve); Rserve() in Rstudio before running this
        RConnection testConnection = new RConnection();

        REXP testExpression = testConnection.eval(emailTestScript);
        System.out.println("P value: " + testExpression.asString());
    } catch(Exception e) {
        e.printStackTrace();
    }
}
}
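The string conversion mentioned above could look roughly like this for a numeric matrix. It is a sketch with made-up names: the helper turns a Java matrix into an R matrix(...) expression that can be embedded in the script string sent to Rserve. The Rserve calls are shown only as comments, since they need a running server.

```java
import java.util.StringJoiner;

class RMatrixLiteral {
    // Renders a rectangular matrix as an R expression, listing values row by row.
    static String toRMatrix(double[][] m) {
        StringJoiner values = new StringJoiner(", ");
        for (double[] row : m) {
            for (double v : row) {
                values.add(Double.toString(v));
            }
        }
        return "matrix(c(" + values + "), nrow = " + m.length + ", byrow = TRUE)";
    }

    public static void main(String[] args) {
        // Made-up 2x2 data.
        double[][] obs = {{1.5, 2.0}, {3.0, 4.25}};
        String script = "chisq.test(" + toRMatrix(obs) + ")$p.value";
        System.out.println(script);

        // With an open RConnection named testConnection (as in the class above):
        //   double p = testConnection.eval(script).asDouble();
    }
}
```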

Here is some more information on Rserve. Incidentally, this is also how Tableau communicates with R through its R connection.

https://cran.r-project.org/web/packages/Rserve/index.html

Answered by Vasily

1) Outputting my matrices from JAVA into .csv files

Use any of the CSV libraries; I would recommend http://opencsv.sourceforge.net/
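For a matrix this small, the standard library may be enough. The sketch below deliberately does not use opencsv: it writes the layout shown in the question (a header of column labels, then one labelled row per line) with plain java.nio. Class and method names are made up, and the values are placeholder data.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class MatrixCsv {
    // Header line holds the labels; each data line starts with its row label.
    static String toCsv(String[] labels, double[][] m) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.join(",", labels)).append('\n');
        for (int r = 0; r < m.length; r++) {
            sb.append(labels[r]);
            for (double v : m[r]) {
                sb.append(',').append(v);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String[] labels = {"A", "C"};
        double[][] obs = {{1.5, 2.0}, {3.0, 4.25}};
        // A temp file gets a unique name, so concurrent runs do not collide.
        Path file = Files.createTempFile("obs", ".csv");
        Files.writeString(file, toCsv(labels, obs));
        System.out.println("Wrote " + file);
    }
}
```

Because the header has one field fewer than the data lines, read.csv treats the first column as row names.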

2) Uploading the .csv files into R 3) Calling the chisq.test on the .csv files into R

Steps 2 and 3 are pretty much the same. You are better off creating a parametrized script to be run in R:

# Read the two CSV paths passed on the command line.
args <- commandArgs(trailingOnly = TRUE)
obs<-read.csv(args[1])
exp<-read.csv(args[2])
chisq.test(obs,exp)

So you can run

Rscript your_script.r path_to_csv1 path_to_csv2

and use unique names for the csv files for example:

UUID.randomUUID().toString().replace("-","")

And then you use

Runtime.getRuntime().exec(command, environments, dataDir);

4) Returning the outputted p-value back into JAVA? If you invoke R with getRuntime().exec(), the only way to get the result back is to read R's output.

I would also recommend taking a look at Apache's Statistics Lib and at How to calculate PValue from ChiSquare. Maybe you can live without R at all :)

Answered by ziggystar

I recommend simply using a Java library that does a chi-square test for you. There are plenty of them:

This is not a complete list, but what I found in 5 minutes searching.
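To give a sense of what such a library does internally, here is a dependency-free sketch. It is an illustration under assumptions, not any particular library's implementation: the statistic is the usual sum of (observed - expected)^2 / expected, and the p-value is the upper tail of the chi-squared distribution, computed from a series expansion of the regularized incomplete gamma function. The series loses accuracy for extremely small p-values, so a tested library is the safer choice for real work. All names are made up.

```java
class ChiSquare {
    // Pearson statistic: sum over all cells of (observed - expected)^2 / expected.
    static double statistic(double[][] observed, double[][] expected) {
        double sum = 0.0;
        for (int r = 0; r < observed.length; r++) {
            for (int c = 0; c < observed[r].length; c++) {
                double d = observed[r][c] - expected[r][c];
                sum += d * d / expected[r][c];
            }
        }
        return sum;
    }

    // Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom.
    static double pValue(double x, int df) {
        if (df < 1) {
            throw new IllegalArgumentException("df must be >= 1");
        }
        if (x <= 0.0) {
            return 1.0;
        }
        double a = df / 2.0;
        double halfX = x / 2.0;
        // log Gamma(a + 1), built from Gamma(1) = 1 or Gamma(1/2) = sqrt(pi)
        // using the recurrence Gamma(t + 1) = t * Gamma(t).
        double start = (df % 2 == 0) ? 1.0 : 0.5;
        double logGamma = (df % 2 == 0) ? 0.0 : 0.5 * Math.log(Math.PI);
        for (double t = start; t <= a; t += 1.0) {
            logGamma += Math.log(t);
        }
        // Series for the lower regularized gamma function P(a, x/2).
        double term = 1.0;
        double sum = 1.0;
        for (int n = 1; n < 10_000 && term > sum * 1e-16; n++) {
            term *= halfX / (a + n);
            sum += term;
        }
        double lower = Math.exp(a * Math.log(halfX) - halfX - logGamma) * sum;
        return Math.min(1.0, Math.max(0.0, 1.0 - lower));
    }

    public static void main(String[] args) {
        // Made-up counts, only to exercise statistic().
        double[][] observed = {{12, 8}, {8, 12}};
        double[][] expected = {{10, 10}, {10, 10}};
        System.out.println("statistic = " + statistic(observed, expected));
        // Statistic and df as reported by R in the question.
        System.out.println("p-value   = " + pValue(20.6236, 9));
    }
}
```

Which degrees of freedom apply depends on how the expected counts were obtained, so pValue() takes df as an explicit argument rather than deriving it from the matrix shape.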
