Python Pandas: Correct way to set values based on a condition for a subset of a MultiIndex DataFrame

Question

Asked by pbreach

I'm not sure how to do this without chained assignment (which probably wouldn't work anyway, because I'd be setting values on a copy).

I want to take a subset of a MultiIndex pandas DataFrame, test for values less than zero, and set them to zero.

For example:

import pandas as pd

df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,100,200]})

# Attempt: zero out the negative values in the 'A' columns
df[df['A']<0] = 0.0
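One way to see why this attempt misses is to compare the columns of two candidate masks. `df['A']` drops the first column level, so its mask has plain `'a'` and `'b'` columns that cannot line up with the `('A', 'a')`, `('A', 'b')` columns of `df`; selecting with a list, `df[['A']]`, keeps the full MultiIndex labels. This sketch only inspects the masks: the output below shows the assignment leaving the frame unchanged, and the exact behaviour of the misaligned assignment may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                   ('A', 'b'): [0, 1, 2, 3, -1],
                   ('B', 'a'): [-20, -10, 0, 10, 20],
                   ('B', 'b'): [-200, -100, 0, 100, 200]})

# df['A'] drops the first level: the mask's columns are plain 'a' and 'b'
flat_mask = df['A'] < 0
print(list(flat_mask.columns))   # ['a', 'b']

# df[['A']] keeps the full (level 0, level 1) labels, so it can align with df
full_mask = df[['A']] < 0
print(list(full_mask.columns))   # [('A', 'a'), ('A', 'b')]
```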

gives

In [37]: df
Out[37]:
    A      B
    a  b   a    b
0  -1  0 -20 -200
1  -1  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12 -1  20  200

This shows that the values were not set according to the condition. Alternatively, if I use chained assignment:

df.loc[:,'A'][df['A']<0] = 0.0

This gives the same result (plus a SettingWithCopyWarning).

I could loop through the columns and modify only those whose first level is the one I want:

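# Each column label is a (level 0, level 1) tuple
for one, two in df.columns.values:
    if one == 'A':
        # One .loc call: rows where this column is negative, this column only
        df.loc[df[(one, two)] < 0, (one, two)] = 0.0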

which gives the desired result:

In [64]: df
Out[64]:
    A      B
    a  b   a    b
0   0  0 -20 -200
1   0  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12  0  20  200
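For comparison, here is a minimal sketch of the same operation without a Python-level loop, assuming the goal is exactly to replace negative values with zero in every column whose first level is `'A'`. It builds one boolean per column with `columns.get_level_values(0)` and writes the block back with a single `.loc` assignment. `clip(lower=0)` is swapped in for the explicit `< 0` test; it is equivalent here only because the replacement value is 0.

```python
import pandas as pd

df = pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                   ('A', 'b'): [0, 1, 2, 3, -1],
                   ('B', 'a'): [-20, -10, 0, 10, 20],
                   ('B', 'b'): [-200, -100, 0, 100, 200]})

# One boolean per column: True where the first column level is 'A'
is_a = df.columns.get_level_values(0) == 'A'

# Select only those columns, raise negatives to 0, and write the block back in one step
df.loc[:, is_a] = df.loc[:, is_a].clip(lower=0)
print(df)
```

Because the column selection is a plain boolean array, the same pattern works for other levels (`get_level_values(1)`) or any other condition on the labels.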

But I feel there must be a better way to do this than looping through the columns. What is the best way to do this in pandas?

Answer 1

Accepted answer by Jeff

This is an application of MultiIndex slicers (and one of the main motivations for using them); see the pandas documentation on MultiIndex slicers.

In [20]: df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,100,200]})

In [21]: df
Out[21]: 
    A      B     
    a  b   a    b
0  -1  0 -20 -200
1  -1  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12 -1  20  200

In [22]: idx = pd.IndexSlice

In [23]: mask = df.loc[:,idx['A',:]]<0

In [24]: mask
Out[24]: 
       A       
       a      b
0   True  False
1   True  False
2  False  False
3  False  False
4  False   True

In [25]: df[mask] = 0

In [26]: df
Out[26]: 
    A      B     
    a  b   a    b
0   0  0 -20 -200
1   0  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12  0  20  200
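`pd.IndexSlice` does no selecting on its own: indexing it simply returns the key it was given, so `idx['A', :]` is the tuple `('A', slice(None))`, which `.loc` reads as "first level equal to `'A'`, any second level". A small sketch checking that equivalence on the same frame:

```python
import pandas as pd

df = pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                   ('A', 'b'): [0, 1, 2, 3, -1],
                   ('B', 'a'): [-20, -10, 0, 10, 20],
                   ('B', 'b'): [-200, -100, 0, 100, 200]})
idx = pd.IndexSlice

# IndexSlice just hands back the key, here the plain tuple ('A', slice(None))
key = idx['A', :]
print(key)  # ('A', slice(None, None, None))

# Both spellings select the same two columns
via_idx = df.loc[:, idx['A', :]]
via_tuple = df.loc[:, ('A', slice(None))]
print(via_idx.equals(via_tuple))  # True
```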

Since you are selecting on the first level of the column index, the following works as well. The IndexSlice approach shown above is more general: it would also work if, say, you wanted to do this for every column labelled 'a' at the second level.

In [30]: df[df[['A']]<0] = 0

In [31]: df
Out[31]: 
    A      B     
    a  b   a    b
0   0  0 -20 -200
1   0  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12  0  20  200
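To illustrate the "more general" point, a minimal sketch that applies the same two steps to the second column level: it zeroes the negatives in every column whose second-level label is `'a'` (both `('A', 'a')` and `('B', 'a')`) and leaves the `'b'` columns untouched.

```python
import pandas as pd

df = pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                   ('A', 'b'): [0, 1, 2, 3, -1],
                   ('B', 'a'): [-20, -10, 0, 10, 20],
                   ('B', 'b'): [-200, -100, 0, 100, 200]})
idx = pd.IndexSlice

# Mask only the columns whose second level is 'a', across every first-level label
mask = df.loc[:, idx[:, 'a']] < 0

# Columns missing from the mask (the 'b' columns) keep their values
df[mask] = 0
print(df)
```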
