Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow (not me). Original question: http://stackoverflow.com/questions/36235180/

Date: 2020-09-14 00:56:24  Source: igfitidea

Efficiently Creating A Pandas DataFrame From A Numpy 3d array

Tags: numpy, pandas, multidimensional-array, vectorization

Question by Ami Tavory

Suppose we start with

import numpy as np
a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

How can this efficiently be made into a pandas DataFrame equivalent to

>>> import pandas as pd
>>> pd.DataFrame({'a': [0, 0, 1, 1], 'b': [1, 3, 5, 7], 'c': [2, 4, 6, 8]})

   a  b  c
0  0  1  2
1  0  3  4
2  1  5  6
3  1  7  8

The idea is for the a column to hold the index along the first dimension of the original array, and for the remaining columns to be a vertical concatenation of the 2d arrays formed by the last two dimensions of the original array.

(This is easy to do with loops; the question is how to do it without them.)
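
For reference, the loop-based version the question alludes to might look like the sketch below. It is mainly useful as a baseline for checking vectorized answers; the column names 'a', 'b', 'c' assume the last dimension has length 2, as in this example.

```python
import numpy as np
import pandas as pd

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Straightforward loop version: one output row per (i, j) pair
rows = []
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        # outer index first, then the values of the inner row a[i, j]
        rows.append([i] + a[i, j].tolist())

loop_df = pd.DataFrame(rows, columns=['a', 'b', 'c'])
print(loop_df)
```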

Longer Example

Using @Divakar's excellent suggestion:

>>> np.random.randint(0,9,(4,3,2))
array([[[0, 6],
        [6, 4],
        [3, 4]],

       [[5, 1],
        [1, 3],
        [6, 4]],

       [[8, 0],
        [2, 3],
        [3, 1]],

       [[2, 2],
        [0, 0],
        [6, 3]]])

It should become something like:

>>> pd.DataFrame({
    'a': [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], 
    'b': [0, 6, 3, 5, 1, 6, 8, 2, 3, 2, 0, 6], 
    'c': [6, 4, 4, 1, 3, 4, 0, 3, 1, 2, 0, 3]})
    a  b  c
0   0  0  6
1   0  6  4
2   0  3  4
3   1  5  1
4   1  1  3
5   1  6  4
6   2  8  0
7   2  2  3
8   2  3  1
9   3  2  2
10  3  0  0
11  3  6  3

Answer by Divakar

Here's one approach that does most of the processing in NumPy and only converts the result to a DataFrame at the end, like so -

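import numpy as np
import pandas as pd
m, n, r = a.shape
# first column: outer index 0..m-1, each repeated n times; then the stacked 2d blocks
out_arr = np.column_stack((np.repeat(np.arange(m), n), a.reshape(m*n, -1)))
out_df = pd.DataFrame(out_arr)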
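
To see why this works, here are the two pieces evaluated on the small 2x2x2 array from the question. This is a sketch; the intermediate names idx and vals are illustrative and not part of the answer.

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
m, n, r = a.shape

# First column: each outer index 0..m-1 repeated n times (once per row of a[i])
idx = np.repeat(np.arange(m), n)
print(idx)  # [0 0 1 1]

# Remaining columns: a row-major reshape gives the same rows as stacking
# the 2d blocks a[0], a[1], ... vertically
vals = a.reshape(m * n, -1)
assert np.array_equal(vals, np.vstack(list(a)))

out_arr = np.column_stack((idx, vals))
print(out_arr.shape)  # (4, 3)
```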

If you know for certain that the last dimension has length 2, so that b and c are the last two columns and a is the first one, you can add column names like so -

out_df = pd.DataFrame(out_arr,columns=['a', 'b', 'c'])
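
If the last dimension is not fixed at 2, the column list can be generated from r instead of being hard-coded. Below is a sketch that wraps the two lines above in a hypothetical helper, frame_from_3d; its name and the default value names v0, v1, ... are assumptions for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd


def frame_from_3d(a, index_name='a', value_names=None):
    """Flatten a 3d array of shape (m, n, r) into an (m*n, 1 + r) DataFrame.

    Hypothetical helper: value_names defaults to 'v0', 'v1', ... when not given.
    """
    m, n, r = a.shape
    if value_names is None:
        value_names = [f'v{k}' for k in range(r)]
    if len(value_names) != r:
        raise ValueError(f'expected {r} value names, got {len(value_names)}')
    out_arr = np.column_stack((np.repeat(np.arange(m), n), a.reshape(m * n, -1)))
    return pd.DataFrame(out_arr, columns=[index_name] + list(value_names))


df = frame_from_3d(np.arange(12).reshape(2, 2, 3))
print(df.columns.tolist())  # ['a', 'v0', 'v1', 'v2']
```

Passing value_names=['b', 'c'] for a 2-column last dimension gives the same a, b, c layout as out_df above.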

Sample run -

>>> a
array([[[2, 0],
        [1, 7],
        [3, 8]],

       [[5, 0],
        [0, 7],
        [8, 0]],

       [[2, 5],
        [8, 2],
        [1, 2]],

       [[5, 3],
        [1, 6],
        [3, 2]]])
>>> out_df
    a  b  c
0   0  2  0
1   0  1  7
2   0  3  8
3   1  5  0
4   1  0  7
5   1  8  0
6   2  2  5
7   2  8  2
8   2  1  2
9   3  5  3
10  3  1  6
11  3  3  2
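
One property of this approach is worth knowing: np.column_stack returns a single homogeneous array, so the index column is upcast to a's dtype. With a float array, the a column comes out as 0.0, 1.0, ... instead of integers. Below is a sketch on hypothetical float data, assuming you want an integer a column and are fine casting it back afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical float-valued 3d array
a = np.array([[[1.5, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]])
m, n, r = a.shape

out_arr = np.column_stack((np.repeat(np.arange(m), n), a.reshape(m * n, -1)))
out_df = pd.DataFrame(out_arr, columns=['a', 'b', 'c'])
print(out_df.dtypes['a'])  # float64: the index was upcast by column_stack

# Restore an integer index column after the fact
out_df['a'] = out_df['a'].astype(np.int64)
```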

Answer by B. M.

Using Panel:

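import numpy as np
import pandas as pd  # pd.Panel was removed in pandas 0.25, so this needs an older pandas

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
b = pd.Panel(np.rollaxis(a, 2)).to_frame()
c = b.set_index(b.index.labels[0]).reset_index()
c.columns = list('abc')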

Then a is:

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

b is:

             0  1
major minor      
0     0      1  2
      1      3  4
1     0      5  6
      1      7  8

and c is:

   a  b  c
0  0  1  2
1  0  3  4
2  1  5  6
3  1  7  8
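
pd.Panel no longer exists in current pandas (it was removed in 0.25), so the Panel code only runs on older versions. Below is a sketch that reproduces the same b and c with a MultiIndex built by pd.MultiIndex.from_product. This is a substitute technique, not part of the original answer; the level names major and minor are chosen only to mirror the Panel output.

```python
import numpy as np
import pandas as pd

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
m, n, r = a.shape

# Equivalent of b: rows indexed by (major, minor) = (i, j), one column per k
index = pd.MultiIndex.from_product([np.arange(m), np.arange(n)], names=['major', 'minor'])
b = pd.DataFrame(a.reshape(m * n, r), index=index)

# Equivalent of c: move the 'major' level into a column, then drop 'minor'
c = b.reset_index(level='major').reset_index(drop=True)
c.columns = list('abc')
print(c)
```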