Python Pandas: Proper way to set values based on a condition for a subset of a MultiIndex DataFrame

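Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, distributed under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/28002197/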

Date: 2020-08-19 02:35:25  Source: igfitidea

python, pandas, multi-index

Asked by pbreach

I'm not sure how to do this without chained assignment (which probably wouldn't work anyway, because I'd be setting values on a copy).

I want to take a subset of a MultiIndex pandas DataFrame, test for values less than zero, and set those values to zero.

For example:

import pandas as pd

df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,100,200]})

df[df['A']<0] = 0.0

gives

In [37]: df
Out[37]:
    A      B
    a  b   a    b
0  -1  0 -20 -200
1  -1  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12 -1  20  200

This shows that the values were not set according to the condition. Alternatively, if I use chained assignment:

df.loc[:,'A'][df['A']<0] = 0.0

This gives the same result (plus a SettingWithCopyWarning).
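One plausible explanation for the first attempt having no effect (an interpretation added here, not stated in the original post): `df['A']` drops the top column level, so the boolean frame built from it is labelled `'a'`/`'b'` instead of `('A','a')`/`('A','b')` and cannot line up with the columns of `df`. Selecting with a list, `df[['A']]`, keeps both levels. A quick check:

```python
import pandas as pd

df = pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                   ('A', 'b'): [0, 1, 2, 3, -1],
                   ('B', 'a'): [-20, -10, 0, 10, 20],
                   ('B', 'b'): [-200, -100, 0, 100, 200]})

# df['A'] drops the first column level: plain 'a' / 'b' labels remain
print(df['A'].columns.tolist())    # ['a', 'b']

# df[['A']] (a list of level-0 labels) keeps both column levels
print(df[['A']].columns.tolist())  # [('A', 'a'), ('A', 'b')]
```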

I could loop through the columns and modify only those whose first level is the one I want:

for one, two in df.columns.values:
    # only touch columns whose first level is 'A'
    if one == 'A':
        df.loc[df[(one, two)] < 0, (one, two)] = 0.0

which gives the desired result:

In [64]: df
Out[64]:
    A      B
    a  b   a    b
0   0  0 -20 -200
1   0  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12  0  20  200
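The per-column loop can be collapsed into a single vectorized assignment by first picking the columns whose first level is 'A' with `MultiIndex.get_level_values`. This is a minimal sketch assuming 'A' is the only target group; `clip(lower=0)` is a swapped-in way of expressing "replace negatives with zero", not something from the original post.

```python
import pandas as pd

df = pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                   ('A', 'b'): [0, 1, 2, 3, -1],
                   ('B', 'a'): [-20, -10, 0, 10, 20],
                   ('B', 'b'): [-200, -100, 0, 100, 200]})

# Boolean mask over the columns: True where the first level is 'A'
cols = df.columns[df.columns.get_level_values(0) == 'A']

# Select those columns, raise negatives to 0, and write them back in one .loc call
df.loc[:, cols] = df.loc[:, cols].clip(lower=0)

print(df)
```

Because the right-hand side keeps the full `('A', ...)` column labels, pandas can align it with the target columns, which avoids the label mismatch behind the first attempt.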

But somehow I feel there is a better way to do this than looping through the columns. What is the best way to do this in pandas?

Accepted answer by Jeff

This is an application of MultiIndex slicers (and one of the main motivations for them); see the MultiIndex slicers section of the pandas docs.

In [20]: df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,100,200]})

In [21]: df
Out[21]: 
    A      B     
    a  b   a    b
0  -1  0 -20 -200
1  -1  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12 -1  20  200

In [22]: idx = pd.IndexSlice

In [23]: mask = df.loc[:,idx['A',:]]<0

In [24]: mask
Out[24]: 
       A       
       a      b
0   True  False
1   True  False
2  False  False
3  False  False
4  False   True

In [25]: df[mask] = 0

In [26]: df
Out[26]: 
    A      B     
    a  b   a    b
0   0  0 -20 -200
1   0  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12  0  20  200
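An equivalent formulation (an alternative sketch, not part of the original answer) keeps everything inside one `.loc` assignment: the same `pd.IndexSlice` selector is used on both sides, and `DataFrame.mask` replaces the negative entries with 0.

```python
import pandas as pd

df = pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                   ('A', 'b'): [0, 1, 2, 3, -1],
                   ('B', 'a'): [-20, -10, 0, 10, 20],
                   ('B', 'b'): [-200, -100, 0, 100, 200]})
idx = pd.IndexSlice

# All columns under the top-level label 'A' (both column levels are kept)
sub = df.loc[:, idx['A', :]]

# mask() replaces entries where the condition is True (negatives) with 0;
# .loc writes the result back into the matching ('A', ...) columns of df
df.loc[:, idx['A', :]] = sub.mask(sub < 0, 0)

print(df)
```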

Since you are selecting on the first level of the column index, the following works as well. The example above is more general: it would also work if, say, you wanted to do this for 'a' on the second level.

In [30]: df[df[['A']]<0] = 0

In [31]: df
Out[31]: 
    A      B     
    a  b   a    b
0   0  0 -20 -200
1   0  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12  0  20  200
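For the more general case the answer mentions, acting on every column whose second level is 'a', only the slicer changes: `idx[:, 'a']` matches `('A','a')` and `('B','a')`. A minimal sketch on a fresh copy of the example data, again using `DataFrame.mask` as a swapped-in way to zero the negatives:

```python
import pandas as pd

df = pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                   ('A', 'b'): [0, 1, 2, 3, -1],
                   ('B', 'a'): [-20, -10, 0, 10, 20],
                   ('B', 'b'): [-200, -100, 0, 100, 200]})
idx = pd.IndexSlice

# Any first level, second level 'a'
sub = df.loc[:, idx[:, 'a']]

# Zero out negatives in those columns only; the 'b' columns are untouched
df.loc[:, idx[:, 'a']] = sub.mask(sub < 0, 0)

print(df)
```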