Original question: http://stackoverflow.com/questions/33508026/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors: StackOverflow
Pandas DataFrame with MultiIndex to Numpy Matrix
Asked by Ty Pavicich
I have a pandas DataFrame with two index levels (a MultiIndex). I want to get a NumPy matrix out of it with something like df.as_matrix(...), but that matrix has shape (n_rows, 1). I want a matrix of shape (n_index1_rows, n_index2_rows, 1).
Is there a way to use .groupby(...) and then .values.tolist() or .as_matrix(...) to get the desired shape?
EDIT: Data
                                                                 value
current_date                  temp_date
1970-01-01 00:00:01.446237485 1970-01-01 00:00:01.446237489  30.497100
                              1970-01-01 00:00:01.446237494   9.584300
                              1970-01-01 00:00:01.446237455  10.134200
                              1970-01-01 00:00:01.446237494   7.803683
                              1970-01-01 00:00:01.446237400  10.678700
                              1970-01-01 00:00:01.446237373   9.700000
                              1970-01-01 00:00:01.446237180  15.000000
                              1970-01-01 00:00:01.446236961  12.928866
                              1970-01-01 00:00:01.446237032  10.458800
This is kind of the idea:
np.array([np.resize(x.as_matrix(["value"]).copy(), (500, 1)) for (i, x) in df.reset_index("current_date").groupby("current_date")])
Answered by jakevdp
I think what you want is to unstack the multiindex, e.g.
df.unstack().values[:, :, np.newaxis]
Edit: if you have duplicate indices, unstacking won't work, and you probably want a pivot_table instead:
pivoted = df.reset_index().pivot_table(index='current_date',
                                       columns='temp_date',
                                       aggfunc='mean')
arr = pivoted.values[:, :, np.newaxis]
arr.shape
# (10, 50, 1)
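To make the duplicate case concrete, here is a minimal sketch on made-up data (the dates and values below are hypothetical, not the asker's). It assumes one (current_date, temp_date) pair occurs twice; on such data unstack raises a ValueError, while pivot_table averages the repeated entries.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the pair (2015-01-01, 2015-01-02) appears twice.
ind = pd.MultiIndex.from_tuples(
    [
        ("2015-01-01", "2015-01-01"),
        ("2015-01-01", "2015-01-02"),
        ("2015-01-01", "2015-01-02"),  # duplicate index entry
        ("2015-01-02", "2015-01-01"),
        ("2015-01-02", "2015-01-02"),
    ],
    names=["current_date", "temp_date"],
)
dup_df = pd.DataFrame({"value": [1.0, 2.0, 4.0, 5.0, 6.0]}, index=ind)

# unstack cannot decide which of the duplicate values belongs in the cell.
try:
    dup_df.unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table aggregates the duplicates instead (mean of 2.0 and 4.0 is 3.0).
pivoted = dup_df.reset_index().pivot_table(
    index="current_date", columns="temp_date", values="value", aggfunc="mean"
)
arr = pivoted.values[:, :, np.newaxis]
print(unstack_failed, arr.shape)
```

Whether the mean is the right way to combine duplicates depends on the data; aggfunc also accepts other reducers such as 'first' or 'max'.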
Here's a full example of unstack. First we'll create some data:
import numpy as np
import pandas as pd

current = pd.date_range('2015', periods=10, freq='D')
temp = pd.date_range('2015', periods=50, freq='D')
ind = pd.MultiIndex.from_product([current, temp],
                                 names=['current_date', 'temp_date'])
df = pd.DataFrame({'val': np.random.rand(len(ind))},
                  index=ind)
df.head()
#                               val
# current_date temp_date
# 2015-01-01   2015-01-01  0.309488
#              2015-01-02  0.697876
#              2015-01-03  0.621318
#              2015-01-04  0.308298
#              2015-01-05  0.936828
Now we unstack the MultiIndex and show the first 4x4 slice of the data:
df.unstack().iloc[:4, :4]
#                     val
# temp_date    2015-01-01 2015-01-02 2015-01-03 2015-01-04
# current_date
# 2015-01-01     0.309488   0.697876   0.621318   0.308298
# 2015-01-02     0.323530   0.751486   0.507087   0.995565
# 2015-01-03     0.805709   0.101129   0.358664   0.501209
# 2015-01-04     0.360644   0.941200   0.727570   0.884314
Now extract the NumPy array and reshape it to [nrows x ncols x 1], as you specified in the question:
vals = df.unstack().values[:, :, np.newaxis]
print(vals.shape)
# (10, 50, 1)
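A note for current pandas versions: as_matrix was deprecated and then removed in pandas 1.0, so .to_numpy() (or .values) is the call to reach for instead. The sketch below rebuilds the example data and, assuming the index is a complete product of the two levels with no duplicates, checks that unstacking gives the same array as a plain reshape. The names via_unstack and via_reshape are only illustrative.

```python
import numpy as np
import pandas as pd

current = pd.date_range("2015", periods=10, freq="D")
temp = pd.date_range("2015", periods=50, freq="D")
ind = pd.MultiIndex.from_product([current, temp],
                                 names=["current_date", "temp_date"])
df = pd.DataFrame({"val": np.random.rand(len(ind))}, index=ind)

# Route 1: unstack the inner level, then add a trailing axis of length 1.
via_unstack = df.unstack().to_numpy()[:, :, np.newaxis]

# Route 2: reshape directly. This is only valid under the stated assumption
# that every (current_date, temp_date) pair occurs exactly once.
n_current = len(df.index.levels[0])
n_temp = len(df.index.levels[1])
via_reshape = df.sort_index().to_numpy().reshape(n_current, n_temp, 1)

print(via_unstack.shape, np.array_equal(via_unstack, via_reshape))
```

If some pairs are missing, the reshape fails on the size mismatch, whereas unstack fills the gaps with NaN, so unstack is the safer default.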