Efficiently creating a Pandas DataFrame from a NumPy 3d array

Question

Asked by Ami Tavory

Suppose we start with

import numpy as np
a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

How can this be efficiently made into a pandas DataFrame equivalent to

import pandas as pd
>>> pd.DataFrame({'a': [0, 0, 1, 1], 'b': [1, 3, 5, 7], 'c': [2, 4, 6, 8]})

   a  b  c
0  0  1  2
1  0  3  4
2  1  5  6
3  1  7  8

The idea is for the `a` column to hold the index along the first dimension of the original array, and for the remaining columns to be a vertical concatenation of the 2d arrays spanned by the last two dimensions of the original array.

(This is easy to do with loops; the question is how to do it without them.)
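For reference, the loop-based version mentioned above might look like the following minimal sketch. The names `rows` and `loop_df` are illustrative only and do not come from the original question; it serves as a baseline to compare vectorized answers against.

```python
import numpy as np
import pandas as pd

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Loop-based baseline: one output row per row of each 2d sub-array,
# prefixed with that sub-array's position along the first axis.
rows = []
for i, block in enumerate(a):   # i indexes the first dimension
    for row in block:           # each row of the 2d sub-array
        rows.append([i, *row])

loop_df = pd.DataFrame(rows, columns=['a', 'b', 'c'])
print(loop_df)
```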

Longer Example

Using @Divakar's excellent suggestion:

>>> np.random.randint(0,9,(4,3,2))
array([[[0, 6],
        [6, 4],
        [3, 4]],

       [[5, 1],
        [1, 3],
        [6, 4]],

       [[8, 0],
        [2, 3],
        [3, 1]],

       [[2, 2],
        [0, 0],
        [6, 3]]])

This should be turned into something like:

>>> pd.DataFrame({
    'a': [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], 
    'b': [0, 6, 3, 5, 1, 6, 8, 2, 3, 2, 0, 6], 
    'c': [6, 4, 4, 1, 3, 4, 0, 3, 1, 2, 0, 3]})
    a  b  c
0   0  0  6
1   0  6  4
2   0  3  4
3   1  5  1
4   1  1  3
5   1  6  4
6   2  8  0
7   2  2  3
8   2  3  1
9   3  2  2
10  3  0  0
11  3  6  3

Answer 1

Answered by Divakar

Here's one approach that does most of the processing in NumPy before finally wrapping the result in a DataFrame:

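m, n, r = a.shape
# First column: each first-axis index repeated n times; then the 2d blocks stacked vertically
out_arr = np.column_stack((np.repeat(np.arange(m), n), a.reshape(m * n, -1)))
out_df = pd.DataFrame(out_arr)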

If you know for certain that the last dimension has length 2, so that `b` and `c` are the last two columns and `a` is the first, you can add column names like so:

out_df = pd.DataFrame(out_arr,columns=['a', 'b', 'c'])

Sample run -

>>> a
array([[[2, 0],
        [1, 7],
        [3, 8]],

       [[5, 0],
        [0, 7],
        [8, 0]],

       [[2, 5],
        [8, 2],
        [1, 2]],

       [[5, 3],
        [1, 6],
        [3, 2]]])
>>> out_df
    a  b  c
0   0  2  0
1   0  1  7
2   0  3  8
3   1  5  0
4   1  0  7
5   1  8  0
6   2  2  5
7   2  8  2
8   2  1  2
9   3  5  3
10  3  1  6
11  3  3  2
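
The same idea can be wrapped in a small helper for arrays whose last dimension is not necessarily 2. This is only a sketch: the function name `array3d_to_frame`, its parameters, and the default column names `v0`, `v1`, ... are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

def array3d_to_frame(arr, index_name='a', value_names=None):
    """Flatten an array of shape (m, n, r) into a DataFrame with m*n rows."""
    m, n, r = arr.shape
    if value_names is None:
        # Hypothetical default names for the r value columns.
        value_names = [f'v{k}' for k in range(r)]
    # 0,0,...,1,1,...: the first-axis index for every output row.
    first = np.repeat(np.arange(m), n)
    # Stack the index column next to the vertically concatenated 2d blocks.
    out = np.column_stack((first, arr.reshape(m * n, r)))
    return pd.DataFrame(out, columns=[index_name, *value_names])

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
out_df = array3d_to_frame(a, value_names=['b', 'c'])
print(out_df)
```

Note that `np.column_stack` produces a single array with one common dtype, so if `arr` holds floats the index column will come out as float too. If that matters, the index column can be cast back afterwards with `astype(int)`.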

Answer 2

Answered by B. M.

Using Panel:

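a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
# Move the last axis to the front so it becomes the Panel's items (the value columns)
b = pd.Panel(np.rollaxis(a, 2)).to_frame()
# Replace the (major, minor) MultiIndex by its first level, then turn that into a column
c = b.set_index(b.index.labels[0]).reset_index()
c.columns = list('abc')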

Then `a` is:

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

`b` is:

             0  1
major minor      
0     0      1  2
      1      3  4
1     0      5  6
      1      7  8

and `c` is the frame requested in the question:

   a  b  c
0  0  1  2
1  0  3  4
2  1  5  6
3  1  7  8
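
`pd.Panel` was deprecated in pandas 0.20 and removed in 0.25, and `MultiIndex.labels` was renamed to `codes` in 0.24, so the code above only runs on older pandas versions. On a current version, a similar intermediate frame can be built directly with a MultiIndex. The following is a sketch of that substitution, not the original author's method:

```python
import numpy as np
import pandas as pd

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
m, n, r = a.shape

# Build the (major, minor) row index that Panel.to_frame() used to produce.
idx = pd.MultiIndex.from_product([range(m), range(n)], names=['major', 'minor'])
b = pd.DataFrame(a.reshape(m * n, r), index=idx)

# Keep the first index level as a column, drop the second, then rename.
c = b.reset_index(level='major').reset_index(drop=True)
c.columns = list('abc')
print(c)
```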
