pandas: MultiIndex assignment from another DataFrame

Question

Asked by Matti Lyra

I am trying to understand pandas MultiIndex DataFrames and how to assign data to them. Specifically, I'm interested in assigning entire blocks that match another, smaller data frame.

import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'], data=np.random.rand(4, 3))
df_

        1st       2nd       3rd
a  0.730251  0.468134  0.876926
b  0.104990  0.082461  0.129083
c  0.993608  0.117799  0.341811
d  0.784950  0.840145  0.016777

df is the same except that all the values are NaN and there are two blocks, A and B. Now if I want to assign the values from df_ to df, I would imagine I can do something like

df.loc['A',:] = df_                # Runs, does not work
df.loc[('A','a'):('A','d')] = df_  # AssertionError (??) 'Start slice bound is non-scalar'
df.loc[('A','a'):('A','d')]        # No AssertionError (??)

idx = pd.IndexSlice
df.loc[idx['A', :]] = df_          # Runs, does not work

None of these work: they leave all the values in df as NaN, although df.loc[idx['A', :]] gives me a slice of the data frame that exactly matches the sub frame (df_). So is this a case of setting values on a view? Explicitly iterating over the index in df_ works:

# this is fine
for v in df_.index:
    df.loc[idx['A', v]] = df_.loc[v]

# this is also fine
for v in df_.index:
    df.loc['A', v] = df_.loc[v]

Is it even possible to assign whole blocks like this (sort of like NumPy)? If not, that's fine, I am simply trying to understand how the system works.

There's a related question about index slicers, but it's about assigning a single value to a masked portion of the DataFrame, not about assigning blocks: "Pandas: Proper way to set values based on condition for subset of multiindex dataframe".

Answer 1

Answered by unutbu

When you use

df.loc['A', :] = df_

Pandas tries to align the index of df_ with the index of a sub-DataFrame of df. However, at the point in the code where alignment is performed, the sub-DataFrame has a MultiIndex, not the single index you see as the result of df.loc['A', :].

So the alignment fails because df_ has a single index, not the MultiIndex that is needed. To see that the index of df_ is indeed the problem, note that

ix_ = pd.MultiIndex.from_product([['A'], ['a', 'b', 'c', 'd']])
df_.index = ix_
df.loc['A', :] = df_
print(df)

succeeds, yielding something like

A a  0.229970  0.730824  0.784356
  b  0.584390  0.628337  0.318222
  c  0.257192  0.624273  0.221279
  d  0.787023  0.056342  0.240735
B a       NaN       NaN       NaN
  b       NaN       NaN       NaN
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN
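
If overwriting df_.index is undesirable, one possible variant is to build a MultiIndexed copy and assign that instead. The sketch below is not part of the original answer: it assumes pd.concat with a dict key is an acceptable way to add the outer level, and the name block is purely illustrative.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([["A", "B"], ["a", "b", "c", "d"]])
cols = ["1st", "2nd", "3rd"]
df = pd.DataFrame(index=ix, columns=cols, dtype=np.float64)
df_ = pd.DataFrame(np.random.rand(4, 3), index=["a", "b", "c", "d"], columns=cols)

# Wrap df_ in an outer level "A" without modifying df_ itself.
# `block` is a hypothetical name used only for this illustration.
block = pd.concat({"A": df_})

# The right-hand side now carries ('A', 'a') ... ('A', 'd') labels,
# so it can be aligned against the selected rows of df.
df.loc["A", :] = block
```

Here df_ keeps its flat index, so it can be reused for other blocks, for example with pd.concat({"B": df_}).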

Of course, you probably do not want to have to create a new MultiIndex every time you want to assign a block of values. So instead, to work around this alignment problem, you can use a NumPy array as the assignment value:

df.loc['A', :] = df_.values

Since df_.values is a NumPy array and an array has no index, no alignment is performed and the assignment yields the same result as above. This trick of using a NumPy array when you don't want index alignment applies to many situations when using Pandas.
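
One caveat with this trick is that a NumPy array is assigned purely by position, so its row and column order have to match the target block. A minimal sketch of guarding against that, assuming the smaller frame may arrive in a different row order (shuffled and target_rows are illustrative names, not from the question):

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([["A", "B"], ["a", "b", "c", "d"]])
cols = ["1st", "2nd", "3rd"]
df = pd.DataFrame(index=ix, columns=cols, dtype=np.float64)
df_ = pd.DataFrame(np.random.rand(4, 3), index=["a", "b", "c", "d"], columns=cols)

# Hypothetical input whose rows are in reverse order: d, c, b, a.
shuffled = df_.iloc[::-1]

# Inner-level labels of block A, in the order they appear in df.
target_rows = df.loc["A"].index

# Reorder by label first, then drop the labels so no alignment is attempted.
df.loc["A", :] = shuffled.reindex(index=target_rows, columns=df.columns).to_numpy()
```

Without the reindex step, the rows of shuffled would land in block A in reverse order, because only positions are considered.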

Assignment by NumPy array can also help you perform more complicated assignments, such as to rows which are not contiguous:

idx = pd.IndexSlice
df.loc[idx[:,('a','b')], :] = df_.values

yields

In [85]: df
Out[85]: 
          1st       2nd       3rd
A a  0.229970  0.730824  0.784356
  b  0.584390  0.628337  0.318222
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN
B a  0.257192  0.624273  0.221279
  b  0.787023  0.056342  0.240735
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN

for example.
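
To make the row order of such an assignment easier to see, here is a small sketch with a deterministic array instead of random data. The values array is a made-up stand-in for df_.values, and a list is used inside the IndexSlice rather than a tuple, which is a stylistic choice of this sketch.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([["A", "B"], ["a", "b", "c", "d"]])
df = pd.DataFrame(index=ix, columns=["1st", "2nd", "3rd"], dtype=np.float64)

# Stand-in for df_.values: row i holds 3*i, 3*i + 1, 3*i + 2.
values = np.arange(12, dtype=np.float64).reshape(4, 3)

idx = pd.IndexSlice
# Selects (A, a), (A, b), (B, a), (B, b), in that order.
df.loc[idx[:, ["a", "b"]], :] = values

print(df)
```

The four array rows go to the selected rows in the order they appear in df, so the first two end up in block A and the last two in block B, while the c and d rows stay NaN.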

Answer 2

Answered by behzad.nouri

I did 8480 a while back, which makes sub-frame assignment with columns work. So you may do as follows as a workaround:

>>> rf
     1st    2nd    3rd
a  0.730  0.468  0.877
b  0.105  0.082  0.129
c  0.994  0.118  0.342
d  0.785  0.840  0.017
>>> df.T['A'] = rf.T  # take transpose of both sides
>>> df
       1st    2nd    3rd
A a  0.730  0.468  0.877
  b  0.105  0.082  0.129
  c  0.994  0.118  0.342
  d  0.785  0.840  0.017
B a    NaN    NaN    NaN
  b    NaN    NaN    NaN
  c    NaN    NaN    NaN
  d    NaN    NaN    NaN

That said, you may want to report this as a bug on GitHub.

Edit: it seems that adding a dummy slice at the end also works:

>>> df.loc['A'][:] = rf
>>> df
       1st    2nd    3rd
A a  0.730  0.468  0.877
  b  0.105  0.082  0.129
  c  0.994  0.118  0.342
  d  0.785  0.840  0.017
B a    NaN    NaN    NaN
  b    NaN    NaN    NaN
  c    NaN    NaN    NaN
  d    NaN    NaN    NaN
