Python Pandas equivalent in JavaScript
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on StackOverflow.
Original question: http://stackoverflow.com/questions/30610675/
Asked by neversaint
With this CSV example:
Source,col1,col2,col3
foo,1,2,3
bar,3,4,5
The standard way I use Pandas is this:
- Parse CSV
- Select columns into a data frame (col1 and col3)
- Process the columns (e.g. average the values of col1 and col3)
Is there a JavaScript library that does that like Pandas?
Accepted answer by The Red Pea
All answers are good. I'm hoping my answer is comprehensive (i.e. it tries to list all options). I hope to return and revise this answer with criteria that help you make a choice.
I hope anyone coming here is familiar with d3. d3 is a very useful "swiss army knife" for handling data in JavaScript, like pandas is helpful for Python. You may see d3 used frequently like pandas, even if d3 is not exactly a DataFrame/Pandas replacement (i.e. d3 doesn't have the same API; d3 doesn't have Series / DataFrame objects which behave like those in pandas).
Ahmed's answer explains how d3 can be used to achieve some DataFrame functionality, and some of the libraries below were inspired by things like LearnJsData, which uses d3 and lodash.
As for DataFrame-focused features, I was overwhelmed by the number of JS libraries that help. Here's a quick list of some of the options you might have encountered. I haven't checked any of them in detail yet (most I found through a combination of Google and NPM searches).
Be careful to pick one that you can work with; some are Node.js (server-side JavaScript), some are browser-compatible (client-side JavaScript), and some are TypeScript.
- pandas-js
- dataframe-js
  - "DataFrame-js provides an immutable data structure for javascript and datascience, the DataFrame, which allows to work on rows and columns with a sql and functional programming inspired api."
- data-forge
  - Seen in Ashley Davis' answer
  - "JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ."
  - Note the old data-forge JS repository is no longer maintained; a new repository now uses TypeScript
- jsdataframe
  - "Jsdataframe is a JavaScript data wrangling library inspired by data frame functionality in R and Python Pandas."
- dataframe
  - "explore data by grouping and reducing."
Then after coming to this question, checking other answers here and doing more searching, I found options like:
- Apache Arrow in JS
  - Thanks to user Back2Basics' suggestion:
  - "Apache Arrow is a columnar memory layout specification for encoding vectors and table-like containers of flat and nested data. Apache Arrow is the emerging standard for large in-memory columnar data (Spark, Pandas, Drill, Graphistry, ...)"
- Observable
  - At first glance, seems like a JS alternative to the IPython/Jupyter "notebooks"
  - Observable's page promises: "Reactive programming", a "Community", on a "Web Platform"
  - See 5 minute intro here
- recline (from Rufus' answer)
  - I expected an emphasis on the DataFrame API; Pandas itself tries to document its replacement/improvement/correspondence for every R function.
  - Instead I find recline's example emphasizes its (awesome) Multiview (the UI), which doesn't require jQuery but does require a browser! More examples
  - ...or an emphasis on its MVC-ish architecture, including back-end stuff (i.e. database connections)
  - I am probably being too harsh; after all, one of the nice things about pandas is how it can create visualizations easily, out of the box.
- js-data
  - Really more of an ORM! Most of its modules correspond to different data storage questions (js-data-mongodb, js-data-redis, js-data-cloud-datastore), sorting, filtering, etc.
  - On the plus side, it does work on Node.js as a first priority; "Works in Node.js and in the Browser."
- miso (another suggestion from Rufus)
- AlaSQL
  - "AlaSQL is an open source SQL database for Javascript with a strong focus on query speed and data source flexibility for both relational data and schemaless data. It works in your browser, Node.js, and Cordova."
- Some thought experiments:
I hope this post can become a community wiki and evaluate (i.e. compare) the different options above against criteria like:
- Pandas' criteria in its R comparison
  - Performance
  - Functionality/flexibility
  - Ease-of-use
- My own suggestions
  - Similarity to the Pandas / DataFrame APIs
  - Specifically, coverage of their main features
  - Data-science emphasis > UI emphasis
  - Demonstrated integration with other tools like Jupyter (interactive notebooks), etc.
Some things a JS library may never do (but could it?)
- Use an underlying framework that is a best-in-class JavaScript numbers/math library? (i.e. an equivalent of NumPy)
- Use any optimizing compilers that might result in faster code (i.e. an equivalent of Pandas' use of Cython)
- Be sponsored by a data-science-flavored consortium, à la Pandas and NumFOCUS
Answer by Ahmed Fasih
Caveat: the following is applicable only to d3 v3, and not the latest d3 v4!
I am partial to d3.js, and while it won't be a total replacement for Pandas, if you spend some time learning its paradigm, it should be able to take care of all your data wrangling for you. (And if you wind up wanting to display results in the browser, it's ideally suited to that.)
Example. My CSV file data.csv:
name,age,color
Mickey,65,black
Donald,58,white
Pluto,64,orange
In the same directory, create an index.html containing the following:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<title>My D3 demo</title>
<script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>
</head>
<body>
<script charset="utf-8" src="demo.js"></script>
</body>
</html>
and also a demo.js file containing the following:
d3.csv('/data.csv',
  // How to format each row. Since the CSV file has a header, `row` will be
  // an object with keys derived from the header.
  function(row) {
    return {name : row.name, age : +row.age, color : row.color};
  },
  // Callback to run once all data's loaded and ready.
  function(data) {
    // Log the data to the JavaScript console
    console.log(data);

    // Compute some interesting results
    var averageAge = data.reduce(function(prev, curr) {
      return prev + curr.age;
    }, 0) / data.length;

    // Also, display it
    var ulSelection = d3.select('body').append('ul');
    var valuesSelection =
      ulSelection.selectAll('li').data(data).enter().append('li').text(
        function(d) { return d.age; });
    var totalSelection =
      ulSelection.append('li').text('Average: ' + averageAge);
  });
In the directory, run python -m SimpleHTTPServer 8181 (on Python 3, the equivalent is python -m http.server 8181), and open http://localhost:8181 in your browser to see a simple listing of the ages and their average.
This simple example shows a few relevant features of d3:
- Excellent support for ingesting online data (CSV, TSV, JSON, etc.)
- Data wrangling smarts baked in
- Data-driven DOM manipulation (maybe the hardest thing to wrap one's head around): your data gets transformed into DOM elements.
Answer by Steve K
It's pretty easy to parse CSV in JavaScript because each line is already essentially the body of a JavaScript array. If you load your CSV into an array of strings (one per line), it's pretty easy to load an array of arrays with the values:
var pivot = function(data){
  var result = [];
  for (var i = 0; i < data.length; i++){
    for (var j=0; j < data[i].length; j++){
      if (i === 0){
        result[j] = [];
      }
      result[j][i] = data[i][j];
    }
  }
  return result;
};

var getData = function() {
  var csvString = $(".myText").val();
  var csvLines = csvString.split(/\n?$/m);

  var dataTable = [];

  for (var i = 0; i < csvLines.length; i++){
    var values;
    eval("values = [" + csvLines[i] + "]");
    dataTable[i] = values;
  }

  return pivot(dataTable);
};
Then getData() returns a multidimensional array of values by column.
I've demonstrated this in a jsFiddle for you.
Of course, you can't do it quite this easily if you don't trust the input, since eval would execute any script embedded in your data.
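If the input can't be trusted, one option is to skip eval and split each line by hand. The following is only a minimal sketch and not part of the original answer: the name parseCsvColumns is hypothetical, and it assumes a simple CSV with no quoted fields and no commas or newlines inside cells. It converts numeric-looking cells to numbers and returns the values by column, like getData() does.

```javascript
// Hypothetical helper (not from the original answer): parse a simple CSV
// string into an array of columns without using eval.
// Assumption: no quoted fields, no embedded commas or newlines in cells.
function parseCsvColumns(csvString) {
  var rows = csvString
    .split(/\r?\n/)                                   // one string per line
    .filter(function (line) { return line.trim() !== ''; })
    .map(function (line) {
      return line.split(',').map(function (cell) {
        var text = cell.trim();
        var num = Number(text);
        // Keep the cell as a string unless it is a non-empty numeric value.
        return text !== '' && !isNaN(num) ? num : text;
      });
    });

  // Transpose rows into columns (the same idea as pivot()).
  var columns = [];
  rows.forEach(function (row, i) {
    row.forEach(function (value, j) {
      if (!columns[j]) { columns[j] = []; }
      columns[j][i] = value;
    });
  });
  return columns;
}

var columns = parseCsvColumns('Source,col1,col2,col3\nfoo,1,2,3\nbar,3,4,5\n');
console.log(columns[1]); // the col1 column, header included
```

Unlike the eval version, header cells and other non-numeric text simply stay as strings, so the header line does not need special handling. A real CSV parser is still the safer choice once quoting is involved.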
Answer by Rufus Pollock
I think the closest things are libraries like:
Recline in particular has a Dataset object with a structure somewhat similar to Pandas data frames. It then allows you to connect your data with "Views" such as a data grid, graphing, maps etc. Views are usually thin wrappers around existing best of breed visualization libraries such as D3, Flot, SlickGrid etc.
Here's an example for Recline:
// Load some data
var dataset = new recline.Model.Dataset({
  records: [
    { value: 1, date: '2012-08-07' },
    { value: 5, b: '2013-09-07' }
  ]
  // Load CSV data instead
  // (And Recline has support for many more data source types)
  // url: 'my-local-csv-file.csv',
  // backend: 'csv'
});

// get an element from your HTML for the viewer
var $el = $('#data-viewer');

var allInOneDataViewer = new recline.View.MultiView({
  model: dataset,
  el: $el
});
// Your new Data Viewer will be live!
Answer by Ashley Davis
I've been working on a data wrangling library for JavaScript called data-forge. It's inspired by LINQ and Pandas.
It can be installed like this:
npm install --save data-forge
Your example would work like this:
var csvData = "Source,col1,col2,col3\n" +
"foo,1,2,3\n" +
"bar,3,4,5\n";
var dataForge = require('data-forge');
var dataFrame =
dataForge.fromCSV(csvData)
.parseInts([ "col1", "col2", "col3" ])
;
If your data was in a CSV file you could load it like this:
var dataFrame = dataForge.readFileSync(fileName)
.parseCSV()
.parseInts([ "col1", "col2", "col3" ])
;
You can use the select method to transform rows.
You can extract a column using getSeries, then use the select method to transform values in that column.
You get your data back out of the data-frame like this:
var data = dataFrame.toArray();
To average a column:
var avg = dataFrame.getSeries("col1").average();
There is much more you can do with this.
You can find more documentation on npm.
Answer by Manuel
Here is a dynamic approach, assuming an existing header on line 1. The CSV is loaded with d3.js.
function csvToColumnArrays(csv) {
  // One empty array per column, keyed by the header names of the first row.
  var mainObj = {},
      header = Object.keys(csv[0]);

  for (var i = 0; i < header.length; i++) {
    mainObj[header[i]] = [];
  }

  // Append each row's value to the matching column array.
  csv.forEach(function(d) {
    for (var key in mainObj) {
      mainObj[key].push(d[key]);
    }
  });

  return mainObj;
}

d3.csv(path, function(csv) {
  var df = csvToColumnArrays(csv);
});
Then you are able to access each column of the data, similar to an R, Python or Matlab dataframe, with df.column_header[row_number].
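With the columns stored as plain arrays, column-wise processing becomes ordinary array code. Below is a small sketch, assuming rows shaped like the objects d3.csv produces (string values keyed by header name). The sample rows, toColumnArrays, and the mean helper are illustrative and not part of the original answer; the sketch averages col1 and col3 from the question's CSV.

```javascript
// Illustrative rows, shaped like d3.csv output for the question's CSV
// (values stay strings unless a row conversion function is supplied).
var rows = [
  { Source: 'foo', col1: '1', col2: '2', col3: '3' },
  { Source: 'bar', col1: '3', col2: '4', col3: '5' }
];

// Same idea as csvToColumnArrays: build one array per header name.
function toColumnArrays(records) {
  var columnsByName = {};
  Object.keys(records[0]).forEach(function (name) {
    columnsByName[name] = records.map(function (record) {
      return record[name];
    });
  });
  return columnsByName;
}

// Hypothetical helper: arithmetic mean of a column of numeric strings.
function mean(values) {
  var total = values.reduce(function (sum, v) { return sum + Number(v); }, 0);
  return total / values.length;
}

var df = toColumnArrays(rows);
console.log(df.col1[1]);                    // row 1 of the col1 column
console.log(mean(df.col1), mean(df.col3));  // column averages
```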
Answer by STEEL
Below is an example using Python's numpy and pandas:
```
import numpy as np
import pandas as pd
data_frame = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], [1, 2, 3, 4])
data_frame[5] = np.random.randint(1, 50, 5)
print(data_frame.loc[['C', 'D'], [2, 3]])
# axis 1 = Y | 0 = X
data_frame.drop(5, axis=1, inplace=True)
print(data_frame)
```
The same can be achieved in JavaScript (note: numjs works only with Node.js), but D3.js has much more advanced options for data files. Both numjs and pandas-js are still works in progress.
import np from 'numjs';
import { DataFrame } from 'pandas-js';
const df = new DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], [1, 2, 3, 4])
// df
/*
1 2 3 4
A 0.023126 1.078130 -0.521409 -1.480726
B 0.920194 -0.201019 0.028180 0.558041
C -0.650564 -0.505693 -0.533010 0.441858
D -0.973549 0.095626 -1.302843 1.109872
E -0.989123 -1.382969 -1.682573 -0.637132
*/
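For comparison, the two pandas operations used in the Python snippet (label-based selection with .loc and dropping a column) can be sketched in plain JavaScript with no library at all. This is only an illustration: the frame layout (index, columns, values) and the helpers loc and dropColumn are assumptions made for this sketch, not the API of pandas-js or numjs, and it assumes every requested label exists.

```javascript
// Assumed layout for this sketch: row labels, column labels, and a
// row-major matrix of values (fixed numbers instead of random ones).
var frame = {
  index: ['A', 'B', 'C', 'D', 'E'],
  columns: [1, 2, 3, 4],
  values: [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
    [17, 18, 19, 20]
  ]
};

// Hypothetical analogue of data_frame.loc[rowLabels, colLabels].
function loc(df, rowLabels, colLabels) {
  var rowIdx = rowLabels.map(function (label) { return df.index.indexOf(label); });
  var colIdx = colLabels.map(function (label) { return df.columns.indexOf(label); });
  return {
    index: rowLabels.slice(),
    columns: colLabels.slice(),
    values: rowIdx.map(function (r) {
      return colIdx.map(function (c) { return df.values[r][c]; });
    })
  };
}

// Hypothetical analogue of data_frame.drop(label, axis=1); returns a new
// frame instead of modifying the original in place.
function dropColumn(df, label) {
  var c = df.columns.indexOf(label);
  return {
    index: df.index.slice(),
    columns: df.columns.filter(function (_, i) { return i !== c; }),
    values: df.values.map(function (row) {
      return row.filter(function (_, i) { return i !== c; });
    })
  };
}

var subset = loc(frame, ['C', 'D'], [2, 3]);
var trimmed = dropColumn(frame, 4);
console.log(subset.values);
console.log(trimmed.columns);
```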