Original URL: http://stackoverflow.com/questions/28431519/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
pandas multiindex assignment from another dataframe
Question by Matti Lyra
I am trying to understand pandas `MultiIndex` `DataFrame`s and how to assign data to them. Specifically, I'm interested in assigning entire blocks that match another, smaller data frame.
```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'], data=np.random.rand(4, 3))
print(df_)
```

```
        1st       2nd       3rd
a  0.730251  0.468134  0.876926
b  0.104990  0.082461  0.129083
c  0.993608  0.117799  0.341811
d  0.784950  0.840145  0.016777
```
`df` is the same, except that all the values are `NaN` and there are two blocks, `A` and `B`. Now, if I want to assign the values from `df_` to `df`, I would imagine I can do something like:
```python
df.loc['A', :] = df_                    # Runs, does not work
df.loc[('A', 'a'):('A', 'd')] = df_     # AssertionError (??) 'Start slice bound is non-scalar'
df.loc[('A', 'a'):('A', 'd')]           # No AssertionError (??)

idx = pd.IndexSlice
df.loc[idx['A', :]] = df_               # Runs, does not work
```
None of these work; they leave all the values in `df` as `NaN`, although `df.loc[idx['A', :]]` gives me a slice of the data frame that exactly matches the sub-frame (`df_`). So is this a case of setting values on a view? Explicitly iterating over the index in `df_` works:
```python
# this is fine
for v in df_.index:
    df.loc[idx['A', v]] = df_.loc[v]

# this is also fine
for v in df_.index:
    df.loc['A', v] = df_.loc[v]
```
Is it even possible to assign whole blocks like this (sort of like NumPy)? If not, that's fine; I am simply trying to understand how the system works.
There's a related question about index slicers (Pandas : Proper way to set values based on condition for subset of multiindex dataframe), but it's about assigning a single value to a masked portion of the `DataFrame`, not about assigning blocks.
Answer by unutbu
When you use
```python
df.loc['A', :] = df_
```
Pandas tries to align the index of `df_` with the index of a sub-DataFrame of `df`. However, at the point in the code where alignment is performed, the sub-DataFrame has a `MultiIndex`, not the single index you see as the result of `df.loc['A', :]`.
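To see this mismatch directly, here is a small sketch that mimics the alignment step. It assumes the effect of alignment is equivalent to reindexing the right-hand side against the selected rows of `df`; that models the behavior described above rather than quoting pandas internals. The names `shown`, `target` and `aligned` are illustrative only.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(np.random.rand(4, 3), index=['a', 'b', 'c', 'd'],
                   columns=['1st', '2nd', '3rd'])

# What you see: selecting 'A' drops the outer level, leaving a flat index a..d.
shown = df.loc['A', :].index
print(shown)

# What alignment works against: the same rows of df, still keyed by full
# tuples such as ('A', 'a'). get_loc('A') locates the rows of the 'A' block.
target = df.index[df.index.get_loc('A')]
print(target)

# Reindexing the flat-indexed df_ against those tuples matches no labels,
# so every value comes back as NaN: that is what gets written into df.
aligned = df_.reindex(target)
print(aligned.isna().all().all())
```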
So the alignment fails because `df_` has a single index, not the `MultiIndex` that is needed. To see that the index of `df_` is indeed the problem, note that
```python
ix_ = pd.MultiIndex.from_product([['A'], ['a', 'b', 'c', 'd']])
df_.index = ix_
df.loc['A', :] = df_
print(df)
```
succeeds, yielding something like
```
          1st       2nd       3rd
A a  0.229970  0.730824  0.784356
  b  0.584390  0.628337  0.318222
  c  0.257192  0.624273  0.221279
  d  0.787023  0.056342  0.240735
B a       NaN       NaN       NaN
  b       NaN       NaN       NaN
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN
```
Of course, you probably do not want to have to create a new MultiIndex every time you want to assign a block of values. So instead, to work around this alignment problem, you can use a NumPy array as the assignment value:
```python
df.loc['A', :] = df_.values
```
Since `df_.values` is a NumPy array and an array has no index, no alignment is performed, and the assignment yields the same result as above. This trick of using a NumPy array when you don't want index alignment applies to many situations in Pandas.
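Because the array carries no labels, its rows are copied purely by position. The following is a minimal sketch, assuming `df_` might not already be in the same row or column order as the target block: reindexing it against the block's labels first (here `df.loc['A'].index` and `df.columns`) keeps the positional copy correct. It uses `to_numpy()`, which returns the same kind of plain array as `.values` for this all-float frame; `shuffled` and `block` are illustrative names.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(np.arange(12, dtype=np.float64).reshape(4, 3),
                   index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'])

# Reverse the row order so that a label/position mismatch would be visible.
shuffled = df_.loc[['d', 'c', 'b', 'a']]

# Put the block into the target's row and column order, then drop the labels.
block = shuffled.reindex(index=df.loc['A'].index, columns=df.columns)
df.loc['A', :] = block.to_numpy()

print(df)
```

Skipping the `reindex` step would write row `d` into `('A', 'a')`, and so on, without any error.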
Assignment by NumPy array can also help you perform more complicated assignments, such as assigning to rows that are not contiguous:
```python
idx = pd.IndexSlice
df.loc[idx[:, ('a', 'b')], :] = df_.values
```
yields
```
In [85]: df
Out[85]:
          1st       2nd       3rd
A a  0.229970  0.730824  0.784356
  b  0.584390  0.628337  0.318222
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN
B a  0.257192  0.624273  0.221279
  b  0.787023  0.056342  0.240735
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN
```
for example.
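A self-contained version of the non-contiguous example, written with a list of inner labels (`['a', 'b']`) instead of a tuple, which is the form the pandas slicer documentation uses. Because no alignment takes place, the right-hand side needs exactly one row per selected row, in the order those rows appear in `df` (A-a, A-b, B-a, B-b). Deterministic values via `np.arange` stand in for the random data so the result is easy to check; `rows` is an illustrative name.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(np.arange(12, dtype=np.float64).reshape(4, 3),
                   index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'])

idx = pd.IndexSlice
rows = idx[:, ['a', 'b']]  # every outer label, but only inner labels 'a' and 'b'

# The selection has 4 rows (A-a, A-b, B-a, B-b), matched to df_'s 4 rows by position.
print(df.loc[rows, :].index.tolist())
df.loc[rows, :] = df_.to_numpy()

print(df)
```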
Answer by behzad.nouri
I did 8480 a while back, which makes sub-frame assignment with columns work. So you may do the following as a workaround:
```python
>>> rf
     1st    2nd    3rd
a  0.730  0.468  0.877
b  0.105  0.082  0.129
c  0.994  0.118  0.342
d  0.785  0.840  0.017
>>> df.T['A'] = rf.T  # take transpose of both sides
>>> df
       1st    2nd    3rd
A a  0.730  0.468  0.877
  b  0.105  0.082  0.129
  c  0.994  0.118  0.342
  d  0.785  0.840  0.017
B a    NaN    NaN    NaN
  b    NaN    NaN    NaN
  c    NaN    NaN    NaN
  d    NaN    NaN    NaN
```
That said, you may want to post this as a bug on GitHub.
Edit: it seems that adding a dummy slice at the end also works:
```python
>>> df.loc['A'][:] = rf
>>> df
       1st    2nd    3rd
A a  0.730  0.468  0.877
  b  0.105  0.082  0.129
  c  0.994  0.118  0.342
  d  0.785  0.840  0.017
B a    NaN    NaN    NaN
  b    NaN    NaN    NaN
  c    NaN    NaN    NaN
  d    NaN    NaN    NaN
```
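Both workarounds in this answer are chained assignments: they write into the object returned by `df.T` or `df.loc['A']` and rely on that object being a view of `df`. Under pandas' Copy-on-Write mode (opt-in via `pd.options.mode.copy_on_write = True` in pandas 2.x, and the default in pandas 3.0), such chained writes no longer reach `df`. The following sketch shows a single-step, label-aligned alternative, assuming a reasonably recent pandas. It gives the block the outer key with `pd.concat({'A': rf})`, which builds the same two-level index as the `MultiIndex.from_product([['A'], ...])` relabeling shown earlier, and assigns it with one `.loc` call on `df` itself. The names `rf` (with deterministic values here) and `block` are illustrative.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
rf = pd.DataFrame(np.arange(12, dtype=np.float64).reshape(4, 3),
                  index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'])

# Prefix the (deliberately reordered) rows with the outer key 'A', giving
# index tuples ('A', 'd'), ('A', 'c'), ... that exist in df.
block = pd.concat({'A': rf.loc[['d', 'c', 'b', 'a']]})

# One .loc call on df itself: no intermediate object, so no view/copy question.
# The labels now match the selected rows, so alignment puts each row in place.
df.loc['A', :] = block

print(df)
```

Unlike the bare-array trick, this version aligns by label, so the reordered rows still land in the right place.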

