Pandas DataFrame with MultiIndex to Numpy Matrix

Question

Asked by Ty Pavicich

I have a pandas DataFrame with two index levels (a MultiIndex). I want to get a Numpy matrix out of it with something like df.as_matrix(...), but that matrix has shape (n_rows, 1). I want a matrix of shape (n_index1_rows, n_index2_rows, 1).

Is there a way to use .groupby(...) and then .values.tolist() or .as_matrix(...) to get the desired shape?

EDIT: Data

                                                              value  
current_date                  temp_date                                        
1970-01-01 00:00:01.446237485 1970-01-01 00:00:01.446237489   30.497100   
                              1970-01-01 00:00:01.446237494    9.584300   
                              1970-01-01 00:00:01.446237455   10.134200   
                              1970-01-01 00:00:01.446237494    7.803683   
                              1970-01-01 00:00:01.446237400   10.678700   
                              1970-01-01 00:00:01.446237373    9.700000   
                              1970-01-01 00:00:01.446237180   15.000000   
                              1970-01-01 00:00:01.446236961   12.928866   
                              1970-01-01 00:00:01.446237032   10.458800

This is kind of the idea:

np.array([np.resize(x.as_matrix(["value"]).copy(), (500, 1)) for (i, x) in df.reset_index("current_date").groupby("current_date")])

Answer 1

Answered by jakevdp

I think what you want is to unstack the multiindex, e.g.

df.unstack().values[:, :, np.newaxis]

Edit: if you have duplicate indices, unstacking won't work, and you probably want a pivot_table instead:

pivoted = df.reset_index().pivot_table(index='current_date',
                                       columns='temp_date',
                                       aggfunc='mean')
arr = pivoted.values[:, :, np.newaxis]
arr.shape
# (10, 50, 1)
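
To make the duplicate-index case concrete, here is a minimal sketch on hypothetical toy data (the dates, the values, and the name dup_df are made up for illustration, not taken from the question). It assumes a single value column and that averaging duplicates is acceptable:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the index pair (d1, t1) appears twice.
d1, d2 = pd.Timestamp("2015-01-01"), pd.Timestamp("2015-01-02")
t1, t2 = pd.Timestamp("2015-02-01"), pd.Timestamp("2015-02-02")
ind = pd.MultiIndex.from_tuples(
    [(d1, t1), (d1, t1), (d1, t2), (d2, t1), (d2, t2)],
    names=["current_date", "temp_date"],
)
dup_df = pd.DataFrame({"value": [1.0, 3.0, 5.0, 7.0, 9.0]}, index=ind)

# unstack() cannot reshape when an index entry is repeated.
try:
    dup_df.unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table() aggregates the duplicates instead (here: their mean).
pivoted = dup_df.reset_index().pivot_table(index="current_date",
                                           columns="temp_date",
                                           values="value",
                                           aggfunc="mean")
arr = pivoted.values[:, :, np.newaxis]
print(unstack_failed, arr.shape)
```

Passing values="value" keeps the pivoted columns as a plain temp_date index. Whether the mean is the right aggregation depends on what the duplicates mean in your data; aggfunc accepts other reducers such as 'first' or 'max'.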

Here's a full example of unstack. First we'll create some data:

import numpy as np
import pandas as pd

current = pd.date_range('2015', periods=10, freq='D')
temp = pd.date_range('2015', periods=50, freq='D')
ind = pd.MultiIndex.from_product([current, temp],
                                 names=['current_date', 'temp_date'])
df = pd.DataFrame({'val':np.random.rand(len(ind))},
                  index=ind)
df.head()
#                               val
# current_date temp_date           
# 2015-01-01   2015-01-01  0.309488
#              2015-01-02  0.697876
#              2015-01-03  0.621318
#              2015-01-04  0.308298
#              2015-01-05  0.936828

Now we unstack the multiindex and show the first 4x4 slice of the data:

df.unstack().iloc[:4, :4]
#                     val                                 
# temp_date    2015-01-01 2015-01-02 2015-01-03 2015-01-04
# current_date                                            
# 2015-01-01     0.309488   0.697876   0.621318   0.308298
# 2015-01-02     0.323530   0.751486   0.507087   0.995565
# 2015-01-03     0.805709   0.101129   0.358664   0.501209
# 2015-01-04     0.360644   0.941200   0.727570   0.884314

Now extract the numpy array, and reshape to [nrows x ncols x 1] as you specified in the question:

vals = df.unstack().values[:, :, np.newaxis]
print(vals.shape)
# (10, 50, 1)
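
As a rough sanity check, the following sketch shows how positions in the resulting array line up with the index labels, and what happens when a combination of index values is missing. It uses a smaller, hypothetical 3 x 4 frame with predictable values rather than the random data above:

```python
import numpy as np
import pandas as pd

# Hypothetical 3 x 4 example; values 0.0 .. 11.0 make positions easy to check.
current = pd.date_range("2015-01-01", periods=3, freq="D")
temp = pd.date_range("2015-01-01", periods=4, freq="D")
ind = pd.MultiIndex.from_product([current, temp],
                                 names=["current_date", "temp_date"])
df = pd.DataFrame({"val": np.arange(len(ind), dtype=float)}, index=ind)

# Rows of the wide frame are current_date, columns are temp_date.
wide = df["val"].unstack()
vals = wide.values[:, :, np.newaxis]

# Array position (i, j) corresponds to labels (wide.index[i], wide.columns[j]).
i, j = 2, 1
label_value = df.loc[(wide.index[i], wide.columns[j]), "val"]
same = vals[i, j, 0] == label_value

# If a (current_date, temp_date) pair is absent, unstack fills it with NaN.
sparse = df.iloc[1:]  # drop the first row only
sparse_vals = sparse["val"].unstack().values[:, :, np.newaxis]
print(vals.shape, same, np.isnan(sparse_vals[0, 0, 0]))
```

Selecting the single column with df["val"] before unstacking avoids the extra column level seen in the earlier output; the array shape is the same either way when there is only one value column.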
