
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/48497809/

Time: 2020-09-14 05:06:10  Source: igfitidea

pandas select conditionally specific columns in dataframe with another condition results in concatenation

Tags: python, pandas

Asked by yeinhorn

I want to select specific columns by name using loc, because I want to combine this with another condition. I get weird behavior when I try to achieve this using

df.loc[:, conditionOne | conditionTwo]

One of the conditions is whether a column name is in a specific list of names, and the second is another condition (here, whether the column's mean is greater than 2):

import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0], 'B': [1, 2, 3, 5], 'C': [10, 20, 30, 50]})
df.columns.values
keepColumnsNames = ['A', 'c']
condtionOne = df.mean() > 2
print(condtionOne)
# A    False
# B     True
# C     True
# dtype: bool
condtionTwo = pd.DataFrame(df.columns.values).iloc[:, 0].isin(keepColumnsNames)
print(condtionTwo)
# A    False
# B     True
# C     True

Now, when I apply an or operator between the two conditions, I get the following weird behavior:

print(condtionOne | condtionTwo)
# 0    False
# 1    False
# 2    False
# A    False
# B     True
# C     True
# dtype: bool

whereas I would expect to get

# False
# True
# True

Accepted answer by jezrael

You need the same index in both masks:

condtionTwo=pd.DataFrame(df.columns.values,index=df.columns).iloc[:,0].isin(keepColumnsNames)
print(condtionTwo)
A     True
B    False
C    False
Name: 0, dtype: bool
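
To see why the index matters, here is a minimal sketch that compares the index of each mask. The names keep, cond_one, bad_mask and good_mask are illustrative and not from the original post, and pd.Series is used as an assumed shorthand for the pd.DataFrame(...).iloc[:, 0] construction above.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0], 'B': [1, 2, 3, 5], 'C': [10, 20, 30, 50]})
keep = ['A', 'c']  # same list as in the question (note the lowercase 'c')

cond_one = df.mean() > 2  # indexed by column name: A, B, C

# Built without an explicit index, so it gets the default labels 0, 1, 2.
bad_mask = pd.Series(df.columns.values).isin(keep)
# Built with the column names as its index, so it lines up with cond_one.
good_mask = pd.Series(df.columns.values, index=df.columns).isin(keep)

print(list(cond_one.index))   # ['A', 'B', 'C']
print(list(bad_mask.index))   # [0, 1, 2]
print(list(good_mask.index))  # ['A', 'B', 'C']

# With matching labels, | combines the two masks label by label.
combined = cond_one | good_mask
print(combined.tolist())      # [True, True, True]
```

Because | between two Series aligns on labels, masks with different indexes get combined over the union of their labels, which is where the extra 0, 1, 2 rows in the question come from.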

Or better (thanks to @Julien Marrec for the comment), create an array with no index:

condtionTwo = df.columns.isin(keepColumnsNames) 
print(condtionTwo)
[ True False False]

print(condtionOne | condtionTwo)
A    True
B    True
C    True
dtype: bool
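
The array approach works because there is nothing to align: a plain NumPy array carries no labels, so it is combined with the Series element by element, by position. A small sketch of that, with illustrative variable names (keep, cond_one, cond_two, result) that are not from the original post:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0], 'B': [1, 2, 3, 5], 'C': [10, 20, 30, 50]})
keep = ['A', 'c']

cond_one = df.mean() > 2            # boolean Series labelled A, B, C
cond_two = df.columns.isin(keep)    # plain boolean array, no labels

print(type(cond_two).__name__)      # ndarray

# Series | array is positional, and the result keeps the Series labels.
result = cond_one | cond_two
print(list(result.index))           # ['A', 'B', 'C']
print(result.tolist())              # [True, True, True]
```

This relies on both masks being in column order and having the same length, which holds here because both are derived from the same df.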

All together:

df1 = df.loc[:, condtionOne | condtionTwo]
print (df1)
   A  B   C
0  0  1  10
1  0  2  20
2  0  3  30
3  0  5  50

Which is the same as:

df1 = df.loc[:, (df.mean() > 2) | (df.columns.isin(keepColumnsNames))]
print (df1)
   A  B   C
0  0  1  10
1  0  2  20
2  0  3  30
3  0  5  50
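
If this selection is needed in more than one place, the one-liner could be wrapped in a small helper. This is only a sketch: select_columns and its parameters are hypothetical names, and it assumes every column is numeric so that mean() returns one value per column.

```python
import pandas as pd


def select_columns(frame, keep_names, mean_above):
    """Return the columns whose mean exceeds mean_above or whose name is in keep_names."""
    by_mean = frame.mean() > mean_above        # Series labelled by column name
    by_name = frame.columns.isin(keep_names)   # unlabelled boolean array
    return frame.loc[:, by_mean | by_name]


df = pd.DataFrame({'A': [0, 0, 0, 0], 'B': [1, 2, 3, 5], 'C': [10, 20, 30, 50]})

print(select_columns(df, ['A', 'c'], 2).columns.tolist())  # ['A', 'B', 'C']
print(select_columns(df, [], 2).columns.tolist())          # ['B', 'C']
```

With an empty name list only the mean condition applies, so column A (mean 0) is dropped.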

Answer by zipa

This should do it in fewer characters:

condtionOne = df.mean() > 2
condtionTwo = ['A', 'C']
df.loc[:, condtionOne.values | df.columns.isin(condtionTwo)]
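
The same idea can be written with both operands as plain arrays, which sidesteps index alignment entirely. A minimal sketch, assuming the df from the question; keep, mask, subset and names are illustrative names, and to_numpy() is used here in place of .values.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0], 'B': [1, 2, 3, 5], 'C': [10, 20, 30, 50]})
keep = ['A', 'C']

# Both sides are unlabelled boolean arrays in column order.
mask = (df.mean() > 2).to_numpy() | df.columns.isin(keep)
print(mask.tolist())             # [True, True, True]

subset = df.loc[:, mask]         # select the columns directly
names = df.columns[mask].tolist()
print(names)                     # ['A', 'B', 'C']
```

Indexing df.columns with the mask is handy when only the selected column names are needed rather than the data.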