Java object analogue to R data.frame

Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/20540831/

Date: 2020-08-13 02:38:42  Source: igfitidea

Tags: java, r, dataframe

Question by Michael

I really like data.frames in R because you can store different types of data in one data structure, you have a lot of different methods to modify the data (add a column, combine data.frames, ...), and it is really easy to extract a subset from the data, ...

Is there any Java library available which has the same functionality? I'm mostly interested in storing different types of data in a matrix-like fashion and being able to extract a subset of the data.

Using a two-dimensional array in Java can provide a similar structure, but it is much more difficult to add a column and afterwards extract the top k records.
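
To make the difficulty concrete, here is a minimal plain-Java sketch (Java 9+ for `List.of`) of a column-oriented frame. The class `MiniFrame` and its methods `addColumn` and `topK` are hypothetical names chosen for illustration, not any library's API. Keeping each column in its own list is what turns "add a column" into a single call, whereas a 2-D array would have to be reallocated and copied.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch, not a library API: a tiny column-oriented data frame.
final class MiniFrameDemo {

    static final class MiniFrame {
        // Column name -> values; LinkedHashMap keeps columns in insertion order.
        private final Map<String, List<?>> columns = new LinkedHashMap<>();
        private int rowCount = -1; // -1 until the first column is added

        /** Adds a column (same name replaces it); all columns must have the same length. */
        MiniFrame addColumn(String name, List<?> values) {
            if (rowCount >= 0 && values.size() != rowCount) {
                throw new IllegalArgumentException(
                        "column " + name + " has " + values.size() + " rows, expected " + rowCount);
            }
            rowCount = values.size();
            columns.put(name, new ArrayList<Object>(values));
            return this;
        }

        int rows() {
            return Math.max(rowCount, 0);
        }

        List<?> column(String name) {
            return columns.get(name);
        }

        /** New frame holding the k rows with the largest values in a numeric column. */
        MiniFrame topK(String numericColumn, int k) {
            List<?> key = columns.get(numericColumn);
            List<Integer> order = IntStream.range(0, rows()).boxed()
                    .sorted(Comparator.comparingDouble(
                            (Integer i) -> ((Number) key.get(i)).doubleValue()).reversed())
                    .limit(k)
                    .collect(Collectors.toList());
            MiniFrame result = new MiniFrame();
            for (Map.Entry<String, List<?>> e : columns.entrySet()) {
                List<Object> picked = new ArrayList<>();
                for (int row : order) {
                    picked.add(e.getValue().get(row));
                }
                result.addColumn(e.getKey(), picked);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        MiniFrame df = new MiniFrame()
                .addColumn("name", List.of("a", "b", "c", "d"))
                .addColumn("score", List.of(3.5, 9.0, 1.2, 7.7));
        // Adding a column later is one call; no 2-D array has to be resized.
        df.addColumn("passed", List.of(true, true, false, true));
        MiniFrame top2 = df.topK("score", 2);
        System.out.println(top2.column("name") + " " + top2.column("score"));
        // prints "[b, d] [9.0, 7.7]"
    }
}
```

Here `topK` sorts row indices by the key column and copies the selected rows of every column into a new frame; a real library would add typed columns, missing-value handling and indexing on top of this.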

Accepted answer by Rahel Lüthy

I have just open-sourced a first draft version of Paleo, a Java 8 library which offers data frames based on typed columns (including support for primitive values). Columns can be created programmatically (through a simple builder API), or imported from a text file.
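
The answer does not show Paleo's API, so the sketch below does not use it. It only illustrates, in plain Java 16+ (records), what "typed columns with support for primitive values" generally means; `Column`, `DoubleColumn` and `ObjectColumn` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (Java 16+ records), not Paleo's API: typed columns where
// primitive data is stored unboxed and everything else is stored as objects.
final class TypedColumnsDemo {

    interface Column {
        String name();
        int size();
    }

    /** Primitive-backed column: values sit in a double[], with no Double object per cell. */
    record DoubleColumn(String name, double[] values) implements Column {
        public int size() { return values.length; }
        double get(int row) { return values[row]; }
        double mean() { return Arrays.stream(values).average().orElse(Double.NaN); }
    }

    /** Column for any reference type, e.g. String or LocalDate. */
    record ObjectColumn<T>(String name, List<T> values) implements Column {
        public int size() { return values.size(); }
        T get(int row) { return values.get(row); }
    }

    public static void main(String[] args) {
        DoubleColumn price = new DoubleColumn("price", new double[] {2.0, 4.0, 6.0});
        ObjectColumn<String> item = new ObjectColumn<>("item", List.of("tea", "milk", "bread"));
        for (Column c : List.<Column>of(item, price)) {
            System.out.println(c.name() + ": " + c.size() + " rows");
        }
        System.out.println(item.get(1) + " costs " + price.get(1)); // milk costs 4.0
        System.out.println("mean price " + price.mean());           // mean price 4.0
    }
}
```

The design point is that numeric columns avoid one boxed object per cell, which matters for memory and speed on large tables.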

Please refer to the README for further details.

The project is still wet from birth – I am very interested in feedback / PRs, tia!

Answer by Ondrej Skopek

I'm not very proficient with R, but you should have a look at Guava, specifically Tables. They do not provide the exact functionality you want, but you could either extend them, or their specification could help you write your own Collection.
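
For reference, a Guava `Table<R, C, V>` can be created with `HashBasedTable.create()` and offers `put(rowKey, columnKey, value)`, `get(rowKey, columnKey)`, `row(rowKey)` and `column(columnKey)`. To stay self-contained without the Guava dependency, the hypothetical `SimpleTable` below mimics that shape with nested maps in plain Java; it is an illustration, not Guava's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java table keyed by (row, column), mimicking the shape of
// Guava's Table<R, C, V>; this is NOT Guava's code or API.
final class NestedMapTableDemo {

    static final class SimpleTable<R, C, V> {
        // Outer map: row key -> (column key -> cell value).
        private final Map<R, Map<C, V>> rows = new LinkedHashMap<>();

        void put(R row, C column, V value) {
            rows.computeIfAbsent(row, k -> new LinkedHashMap<>()).put(column, value);
        }

        /** Cell value, or null if the cell is absent. */
        V get(R row, C column) {
            Map<C, V> cells = rows.get(row);
            return cells == null ? null : cells.get(column);
        }

        /** All cells of one row, keyed by column (the live inner map, or an empty map). */
        Map<C, V> row(R row) {
            return rows.getOrDefault(row, Map.of());
        }

        /** All cells of one column, keyed by row; built on demand by scanning every row. */
        Map<R, V> column(C column) {
            Map<R, V> out = new LinkedHashMap<>();
            rows.forEach((rowKey, cells) -> {
                if (cells.containsKey(column)) {
                    out.put(rowKey, cells.get(column));
                }
            });
            return out;
        }
    }

    public static void main(String[] args) {
        SimpleTable<Integer, String, Object> t = new SimpleTable<>();
        t.put(1, "name", "alice");
        t.put(1, "age", 31);
        t.put(2, "name", "bob");
        t.put(2, "age", 27);
        System.out.println(t.row(1));        // {name=alice, age=31}
        System.out.println(t.column("age")); // {1=31, 2=27}
    }
}
```

In this sketch a row lookup is a single map access while a column lookup scans every row, which is one reason column-oriented stores are attractive for data-frame workloads.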

Answer by Bryan Cardillo

I also found myself in need of a data frame structure while working in Java recently. Fortunately, after writing a very basic implementation I was able to get approval to release it as open source. You can find my implementation here: Joinery -- Data frames for Java. Contributions and feature requests are welcome.

Answer by L. Blanc

Tablesaw (https://github.com/jtablesaw/tablesaw) is a Java dataframe library begun in 2015 and under active development (as of 2018). It's designed to be as scalable as possible without sacrificing ease of use. Features include filtering by rows and columns, descriptive stats, map/reduce functions, cross-tabs, plots, and machine learning. Apache license.
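
As a rough plain-Java picture of the row filtering, column selection and descriptive statistics listed above, standard streams already cover the basics. This is not Tablesaw's API; the `Row` record and the `stats` method are hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;

// Hypothetical sketch using only the JDK (Java 16+ records), not Tablesaw's API.
final class FilterStatsDemo {

    // One row of a small table: a group label, an id, and a numeric value.
    record Row(String group, int id, double value) {}

    /** Filters rows by group, selects the value column, and summarises it. */
    static DoubleSummaryStatistics stats(List<Row> rows, String group) {
        return rows.stream()
                .filter(r -> r.group().equals(group)) // row filter
                .mapToDouble(Row::value)              // column selection
                .summaryStatistics();                 // count, min, max, sum, average
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("a", 1, 1.0),
                new Row("a", 2, 2.0),
                new Row("b", 3, 10.0),
                new Row("a", 4, 3.0));
        DoubleSummaryStatistics s = stats(rows, "a");
        System.out.println("n=" + s.getCount() + " min=" + s.getMin() + " max=" + s.getMax()
                + " mean=" + s.getAverage());
        // prints "n=3 min=1.0 max=3.0 mean=2.0"
    }
}
```

A dedicated library adds what this lacks: column-oriented storage, indexes for fast queries, and many more statistics.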

In one query test, it returned 500+ records from a half-billion-record table in 2 ms.

Contributions, feature requests, and feedback are welcome.

Answer by Xavier Witdouck

Morpheus (http://www.zavtech.com/morpheus/docs/) provides a DataFrame analogous to that of R. It is a high-performance, column-store data structure that enables data to be sorted, sliced, grouped, and aggregated in either the row or column dimension. Internally, it also uses the Fork/Join framework to run many of these operations in parallel.
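
Morpheus's internals are not shown in this answer. As a hedged illustration of Fork/Join-backed parallel aggregation in general, the JDK's parallel streams (which run on `ForkJoinPool.commonPool()` by default) can group rows and sum a column. The `Sale` record and `totalByRegion` method are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch using only the JDK (Java 16+ records), not Morpheus's API.
final class ParallelGroupByDemo {

    record Sale(String region, double amount) {}

    /**
     * Groups rows by region and sums the amount column. A parallel stream splits
     * the work into tasks that run on the common ForkJoinPool by default.
     */
    static Map<String, Double> totalByRegion(List<Sale> sales) {
        return sales.parallelStream()
                .collect(Collectors.groupingByConcurrent(
                        Sale::region,
                        Collectors.summingDouble(Sale::amount)));
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("north", 100.0), new Sale("south", 50.0),
                new Sale("north", 25.0), new Sale("south", 75.0), new Sale("east", 10.0));
        Map<String, Double> totals = totalByRegion(sales);
        System.out.println(totals.get("north")); // 125.0
    }
}
```

For a handful of rows the parallel overhead outweighs any gain; it only pays off on large tables. Summing doubles in a nondeterministic order can also change the last bits of a result.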

You can easily read & write data to CSV files, databases and also a proprietary JSON format. Adapters to load data from Quandl, Google Finance and others are also available.
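
Loading CSV into columns is the first step these libraries automate. The deliberately naive plain-Java sketch below (hypothetical `readColumns`; it is not Morpheus's reader) shows the basic idea. It does not handle quoted fields, embedded commas, or type inference.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, deliberately naive CSV reader (Java 11+): no quoted fields,
// no embedded commas, no type inference. Not any library's API.
final class CsvColumnsDemo {

    /** Reads a header line plus data lines into column name -> list of raw string cells. */
    static Map<String, List<String>> readColumns(Reader source) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String header = in.readLine();
            if (header == null) {
                return columns; // empty input: no columns
            }
            String[] names = header.split(",", -1);
            for (int i = 0; i < names.length; i++) {
                names[i] = names[i].trim();
                columns.put(names[i], new ArrayList<>());
            }
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank()) {
                    continue; // tolerate empty lines
                }
                String[] cells = line.split(",", -1);
                if (cells.length != names.length) {
                    throw new IllegalArgumentException("expected " + names.length + " fields: " + line);
                }
                for (int i = 0; i < names.length; i++) {
                    columns.get(names[i]).add(cells[i].trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return columns;
    }

    public static void main(String[] args) {
        String csv = "id,value\n1,2.5\n2,4.0\n";
        Map<String, List<String>> cols = readColumns(new StringReader(csv));
        // Convert one column to a primitive array once its type is known.
        double[] values = cols.get("value").stream().mapToDouble(Double::parseDouble).toArray();
        System.out.println(cols.get("id") + " " + values.length); // [1, 2] 2
    }
}
```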

It has built-in support for various styles of linear regression, principal component analysis, linear algebra, and other types of analytics. The feature set is still growing, but it is already a very capable framework.

Answer by moldovean

In R we have the data.frame, in Python we have pandas, and in Java there is the Schema from deeplearning4j.
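
A schema in this sense is an ordered list of column names and their types. The hypothetical `SimpleSchema` below illustrates the idea in plain Java by validating rows against it; it is not deeplearning4j's `Schema` class, whose API is not shown in this answer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not deeplearning4j's Schema: ordered column names with
// expected Java types, used to check that a row fits the table.
final class SchemaDemo {

    static final class SimpleSchema {
        // Column name -> expected type; insertion order defines column order.
        private final Map<String, Class<?>> columns = new LinkedHashMap<>();

        SimpleSchema add(String name, Class<?> type) {
            columns.put(name, type);
            return this;
        }

        /** True if the row has one value per column and each non-null value has the declared type. */
        boolean accepts(List<?> row) {
            if (row.size() != columns.size()) {
                return false;
            }
            int i = 0;
            for (Class<?> type : columns.values()) {
                Object v = row.get(i++);
                if (v != null && !type.isInstance(v)) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        SimpleSchema iris = new SimpleSchema()
                .add("sepalLength", Double.class)
                .add("sepalWidth", Double.class)
                .add("species", String.class);
        System.out.println(iris.accepts(List.of(5.1, 3.5, "setosa")));   // true
        System.out.println(iris.accepts(List.of(5.1, "3.5", "setosa"))); // false
    }
}
```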

If you just want to get started, there is also an example analysing the ubiquitous iris dataset, here.

There are also other custom objects that are more or less the same (from Weka and from TensorFlow).

还有其他自定义对象(来自 Weka,来自 Tensorflow,或多或少相同)。