Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/30610675/

Python Pandas equivalent in JavaScript

Tags: javascript, python, pandas

Asked by neversaint

With this CSV example:

   Source,col1,col2,col3
   foo,1,2,3
   bar,3,4,5

My standard Pandas workflow is:

  1. Parse CSV

  2. Select columns into a data frame (col1 and col3)

  3. Process the columns (e.g. average the values of col1 and col3)

Is there a JavaScript library that does that like Pandas?

Accepted answer by The Red Pea

All the answers are good. I hope my answer is comprehensive (i.e. it tries to list all options). I also hope to return and revise this answer with criteria to help make a choice.

I hope anyone coming here is familiar with d3. d3 is a very useful "swiss army knife" for handling data in JavaScript, just as pandas is for Python. You may see d3 used frequently like pandas, even though d3 is not exactly a DataFrame/Pandas replacement (i.e. d3 doesn't have the same API; d3 doesn't have a Series/DataFrame that behaves like the ones in pandas).

Ahmed's answer explains how d3 can be used to achieve some DataFrame functionality, and some of the libraries below were inspired by things like LearnJsData, which uses d3 and lodash.

As for DataFrame-focused features, I was overwhelmed by the number of JS libraries that help. Here's a quick list of some of the options you might have encountered. I haven't checked any of them in detail yet (most I found through a combination of Google and NPM searches).

Be careful to pick one you can actually work with: some are for Node.js (server-side JavaScript), some are browser-compatible (client-side JavaScript), and some are TypeScript.

  • pandas-js
    • From STEEL's and Feras' answers
    • "pandas.js is an open source (experimental) library mimicking the Python pandas library. It relies on Immutable.js as the NumPy logical equivalent. The main data objects in pandas.js are, like in Python pandas, the Series and the DataFrame."
  • dataframe-js
    • "DataFrame-js provides an immutable data structure for javascript and datascience, the DataFrame, which allows to work on rows and columns with a sql and functional programming inspired api."
  • data-forge
  • jsdataframe
    • "Jsdataframe is a JavaScript data wrangling library inspired by data frame functionality in R and Python Pandas."
  • dataframe
    • "explore data by grouping and reducing."

Then after coming to this question, checking other answers here and doing more searching, I found options like:

  • Apache Arrow in JS
    • Thanks to user Back2Basics suggestion:
    • "Apache Arrow is a columnar memory layout specification for encoding vectors and table-like containers of flat and nested data. Apache Arrow is the emerging standard for large in-memory columnar data (Spark, Pandas, Drill, Graphistry, ...)"
  • Observable
    • At first glance, seems like a JS alternative to the IPython/Jupyter "notebooks"
    • Observable's page promises: "Reactive programming", a "Community", on a "Web Platform"
    • See 5 minute intro here
  • recline (from Rufus' answer)
    • I expected an emphasis on the DataFrame API, which pandas itself tries to preserve from R (it documents its replacement/improvement/correspondence to every R function).
    • Instead, recline's example emphasizes its (awesome) Multiview (the UI), which doesn't require jQuery but does require a browser!
    • ...or an emphasis on its MVC-ish architecture; including back-end stuff (i.e. database connections)
    • I am probably being too harsh; after all, one of the nice things about pandas is how it can create visualizations easily; out-of-the-box.
  • js-data
    • Really more of an ORM! Most of its modules correspond to different data storage questions (js-data-mongodb, js-data-redis, js-data-cloud-datastore), sorting, filtering, etc.
    • On plus-side does work on Node.js as a first-priority; "Works in Node.js and in the Browser."
  • miso (another suggestion from Rufus)
  • AlaSQL
    • "AlaSQL is an open source SQL database for Javascript with a strong focus on query speed and data source flexibility for both relational data and schemaless data. It works in your browser, Node.js, and Cordova."
  • Some thought experiments:

I hope this post can become a community wiki that evaluates (i.e. compares) the different options above against criteria like:

  • Pandas' criteria in its R comparison
    • Performance
    • Functionality/flexibility
    • Ease-of-use
  • My own suggestions
    • Similarity to Pandas / DataFrame APIs
    • Specifically hits on their main features
    • Data-science emphasis > UI emphasis
    • Demonstrated integration with other tools like Jupyter (interactive notebooks), etc.

Some things a JS library may never do (but could it?)

Answer by Ahmed Fasih

Caveat: The following is applicable only to d3 v3, and not the latest d3 v4!

I am partial to d3.js, and while it won't be a total replacement for Pandas, if you spend some time learning its paradigm, it should be able to take care of all your data wrangling for you. (And if you wind up wanting to display results in the browser, it's ideally suited to that.)

Example. My CSV file data.csv:

name,age,color
Mickey,65,black
Donald,58,white
Pluto,64,orange

In the same directory, create an index.html containing the following:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8"/>
    <title>My D3 demo</title>

    <script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>
  </head>
  <body>

      <script charset="utf-8" src="demo.js"></script>
  </body>
</html>

and also a demo.js file containing the following:

d3.csv('/data.csv',

       // How to format each row. Since the CSV file has a header, `row` will be
       // an object with keys derived from the header.
       function(row) {
         return {name : row.name, age : +row.age, color : row.color};
       },

       // Callback to run once all data's loaded and ready.
       function(data) {
         // Log the data to the JavaScript console
         console.log(data);

         // Compute some interesting results
         var averageAge = data.reduce(function(prev, curr) {
           return prev + curr.age;
         }, 0) / data.length;

         // Also, display it
         var ulSelection = d3.select('body').append('ul');
         var valuesSelection =
             ulSelection.selectAll('li').data(data).enter().append('li').text(
                 function(d) { return d.age; });
         var totalSelection =
             ulSelection.append('li').text('Average: ' + averageAge);
       });

In the directory, run python -m SimpleHTTPServer 8181 (with Python 3, python -m http.server 8181), and open http://localhost:8181 in your browser to see a simple listing of the ages and their average.

This simple example shows a few relevant features of d3:

  • Excellent support for ingesting online data (CSV, TSV, JSON, etc.)
  • Data wrangling smarts baked in
  • Data-driven DOM manipulation (maybe the hardest thing to wrap one's head around): your data gets transformed into DOM elements.

Answer by Steve K

It's pretty easy to parse CSV in javascript because each line's already essentially a javascript array. If you load your csv into an array of strings (one per line) it's pretty easy to load an array of arrays with the values:

var pivot = function(data){
    var result = [];
    for (var i = 0; i < data.length; i++){
        for (var j=0; j < data[i].length; j++){
            if (i === 0){
                result[j] = [];
            }
            result[j][i] = data[i][j];
        }
    }
    return result;
};

var getData = function() {
    var csvString = $(".myText").val();
    var csvLines = csvString.split(/\n?$/m);

    var dataTable = [];

    for (var i = 0; i < csvLines.length; i++){
        var values;
        eval("values = [" + csvLines[i] + "]");
        dataTable[i] = values;
    }

    return pivot(dataTable);
};

Then getData() returns a multidimensional array of values by column.

I've demonstrated this in a jsFiddle for you.

Of course, you can't do it quite this easily if you don't trust the input - if there could be script in your data which eval might pick up, etc.
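
For untrusted input, one eval-free alternative might look like the sketch below. It is not part of the original answer: parseCsvColumns is a hypothetical helper name, and the sketch assumes simple CSV with no quoted fields, embedded commas, or embedded newlines.

```javascript
// Hypothetical helper (not from the original answer): parse simple CSV text
// into an array of columns without calling eval.
// Assumes no quoted fields, embedded commas, or embedded newlines.
function parseCsvColumns(csvString) {
  // Split into lines and drop empty ones (e.g. after a trailing newline).
  var lines = csvString.split(/\r?\n/).filter(function (line) {
    return line.trim() !== '';
  });

  // Split each line into cells, converting numeric-looking cells to numbers.
  var rows = lines.map(function (line) {
    return line.split(',').map(function (cell) {
      var text = cell.trim();
      var num = Number(text);
      return text !== '' && !isNaN(num) ? num : text;
    });
  });

  // Transpose rows into columns, as the pivot function does.
  var columns = [];
  rows.forEach(function (row, i) {
    row.forEach(function (value, j) {
      if (!columns[j]) { columns[j] = []; }
      columns[j][i] = value;
    });
  });
  return columns;
}

var columns = parseCsvColumns('foo,1,2,3\nbar,3,4,5\n');
console.log(columns); // [ [ 'foo', 'bar' ], [ 1, 3 ], [ 2, 4 ], [ 3, 5 ] ]
```

Because nothing is evaluated as code, a malicious cell simply stays a string. Real-world CSV with quoting would still need a proper parser.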

Answer by Rufus Pollock

I think the closest things are libraries like:

  • Recline
  • Miso

Recline in particular has a Dataset object with a structure somewhat similar to Pandas data frames. It then allows you to connect your data with "Views" such as a data grid, graphing, maps etc. Views are usually thin wrappers around existing best of breed visualization libraries such as D3, Flot, SlickGrid etc.

Here's an example for Recline:

// Load some data
var dataset = new recline.Model.Dataset({
  records: [
    { value: 1, date: '2012-08-07' },
    { value: 5, date: '2013-09-07' }
  ]
  // Load CSV data instead
  // (And Recline has support for many more data source types)
  // url: 'my-local-csv-file.csv',
  // backend: 'csv'
});

// get an element from your HTML for the viewer
var $el = $('#data-viewer');

var allInOneDataViewer = new recline.View.MultiView({
  model: dataset,
  el: $el
});
// Your new Data Viewer will be live!

Answer by Ashley Davis

I've been working on a data wrangling library for JavaScript called data-forge. It's inspired by LINQ and Pandas.

It can be installed like this:

npm install --save data-forge

Your example would work like this:

var csvData = "Source,col1,col2,col3\n" +
    "foo,1,2,3\n" +
    "bar,3,4,5\n";

var dataForge = require('data-forge');
var dataFrame = 
    dataForge.fromCSV(csvData)
        .parseInts([ "col1", "col2", "col3" ])
        ;

If your data was in a CSV file you could load it like this:

var dataFrame = dataForge.readFileSync(fileName)
    .parseCSV()
    .parseInts([ "col1", "col2", "col3" ])
    ;

You can use the select method to transform rows.

You can extract a column using getSeries, then use the select method to transform values in that column.

You get your data back out of the data-frame like this:

var data = dataFrame.toArray();

To average a column:

 var avg = dataFrame.getSeries("col1").average();

There is much more you can do with this.

You can find more documentation on npm.

Answer by Manuel

Here is a dynamic approach, assuming an existing header on line 1. The CSV is loaded with d3.js.

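function csvToColumnArrays(csv) {

    // One empty array per column, keyed by the header names of the first row.
    var mainObj = {},
        header = Object.keys(csv[0]);

    for (var i = 0; i < header.length; i++) {
        mainObj[header[i]] = [];
    }

    // Append each row's values to the matching column array.
    csv.forEach(function(d) {
        for (var key in mainObj) {
            mainObj[key].push(d[key]);
        }
    });

    return mainObj;

}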


d3.csv(path, function(csv) {

    var df = csvToColumnArrays(csv);         

});

You can then access each column of the data much like in an R, Python or MATLAB data frame, with df.column_header[row_number].
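
As a rough illustration of that column access, here is a self-contained sketch that runs without d3. The rows array is made-up sample data in the shape d3.csv would hand to the callback (one object per row, values as strings), and toColumns and mean are hypothetical helper names, not part of d3 or of the answer above.

```javascript
// Illustrative data in the shape d3.csv would pass to the callback:
// one object per row, keyed by header name (values arrive as strings).
var rows = [
  { Source: 'foo', col1: '1', col2: '2', col3: '3' },
  { Source: 'bar', col1: '3', col2: '4', col3: '5' }
];

// Hypothetical helper: turn an array of row objects into an object of
// column arrays, keyed by the header names of the first row.
function toColumns(rows) {
  return Object.keys(rows[0]).reduce(function (columns, key) {
    columns[key] = rows.map(function (row) { return row[key]; });
    return columns;
  }, {});
}

// Hypothetical helper: mean of a column after coercing strings to numbers.
function mean(values) {
  var total = values.reduce(function (sum, v) { return sum + Number(v); }, 0);
  return total / values.length;
}

var df = toColumns(rows);
console.log(df.col1[1]);    // '3' (second row of col1, still a string)
console.log(mean(df.col1)); // 2
console.log(mean(df.col3)); // 4
```

Storing the data by column makes per-column operations such as averaging a one-liner, at the cost of making row-wise access less direct.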

Answer by STEEL

Below is Python numpy and pandas

```

import numpy as np
import pandas as pd

data_frame = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], [1, 2, 3, 4])

data_frame[5] = np.random.randint(1, 50, 5)

print(data_frame.loc[['C', 'D'], [2, 3]])

# axis=1 drops a column; axis=0 would drop a row
data_frame.drop(5, axis=1, inplace=True)

print(data_frame)

```

The same can be achieved in JavaScript (note: numjs works only with Node.js), but D3.js has far more advanced options for working with data files. Both numjs and pandas-js are still works in progress.

import np from 'numjs';
import { DataFrame } from 'pandas-js';

const df = new DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], [1, 2, 3, 4])

// df
/*

          1         2         3         4
A  0.023126  1.078130 -0.521409 -1.480726
B  0.920194 -0.201019  0.028180  0.558041
C -0.650564 -0.505693 -0.533010  0.441858
D -0.973549  0.095626 -1.302843  1.109872
E -0.989123 -1.382969 -1.682573 -0.637132

*/

Answer by Feras

Pandas.js is an experimental library at the moment, but it seems very promising. Under the hood it uses immutable.js and NumPy-like logic, and both data objects, Series and DataFrame, are there.
