Original source: http://stackoverflow.com/questions/48497809/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
pandas select conditionally specific columns in dataframe with another condition results in concatenation
Asked by yeinhorn
I want to select specific columns by name using loc, because I want to combine the name test with another condition. I get weird behavior when I try to do this using:
df.loc[:, conditionOne | conditionTwo]
One of the conditions is whether a column name is in a specific list of names, and the second is some other condition (here, whether the column mean is greater than 2):
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0], 'B': [1, 2, 3, 5], 'C': [10, 20, 30, 50]})
df.columns.values
keepColumnsNames = ['A', 'c']
condtionOne = df.mean() > 2
print(condtionOne)
# A    False
# B     True
# C     True
# dtype: bool
condtionTwo = pd.DataFrame(df.columns.values).iloc[:, 0].isin(keepColumnsNames)
print(condtionTwo)
# 0     True
# 1    False
# 2    False
# Name: 0, dtype: bool
Now when I apply an or operator between the two conditions, I get this weird behavior:
print(condtionOne | condtionTwo)
# 0    False
# 1    False
# 2    False
# A    False
# B     True
# C     True
# dtype: bool
whereas I would expect to get:
# False
# True
# True
Accepted answer by jezrael
You need the same index in both masks:
condtionTwo = pd.DataFrame(df.columns.values, index=df.columns).iloc[:, 0].isin(keepColumnsNames)
print(condtionTwo)
A     True
B    False
C    False
Name: 0, dtype: bool
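To see why the original masks do not line up, it can help to compare their indices directly. The following is a minimal sketch reusing the question's data; the names `by_mean`, `by_name_bad`, and `by_name_good` are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0], 'B': [1, 2, 3, 5], 'C': [10, 20, 30, 50]})
keepColumnsNames = ['A', 'c']

# Indexed by column label: A, B, C.
by_mean = df.mean() > 2

# Built from a bare array, so it gets the default 0, 1, 2 index.
by_name_bad = pd.Series(df.columns.values).isin(keepColumnsNames)

# Same membership test, but indexed by the column labels.
by_name_good = pd.Series(df.columns.isin(keepColumnsNames), index=df.columns)

print(by_mean.index.equals(by_name_bad.index))   # False
print(by_mean.index.equals(by_name_good.index))  # True
```

When the indices differ, pandas aligns the two Series on the union of their labels before applying `|`, which is where the extra `0`, `1`, `2` rows in the question come from.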
Or better (thanks to @Julien Marrec for the comment), create an array with no index:
condtionTwo = df.columns.isin(keepColumnsNames)
print(condtionTwo)
[ True False False]
print(condtionOne | condtionTwo)
A    True
B    True
C    True
dtype: bool
All together:
df1 = df.loc[:, condtionOne | condtionTwo]
print(df1)
   A  B   C
0  0  1  10
1  0  2  20
2  0  3  30
3  0  5  50
Which is the same as:
df1 = df.loc[:, (df.mean() > 2) | (df.columns.isin(keepColumnsNames))]
print(df1)
   A  B   C
0  0  1  10
1  0  2  20
2  0  3  30
3  0  5  50
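Because every column survives in this example, the result alone does not show which condition kept which column. As a rough sketch (the names `mean_mask` and `name_mask` are illustrative, not from the answer), listing the column labels selected by each mask makes that visible:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0], 'B': [1, 2, 3, 5], 'C': [10, 20, 30, 50]})
keepColumnsNames = ['A', 'c']

mean_mask = (df.mean() > 2).to_numpy()         # positional boolean array
name_mask = df.columns.isin(keepColumnsNames)  # positional boolean array

print(df.columns[mean_mask].tolist())              # ['B', 'C']
print(df.columns[name_mask].tolist())              # ['A']
print(df.columns[mean_mask | name_mask].tolist())  # ['A', 'B', 'C']
```

Note that the lowercase `'c'` in `keepColumnsNames` does not match the column `'C'`; that column is kept only because its mean exceeds 2.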
Answer by zipa
This should do it in fewer characters:
condtionOne = df.mean() > 2
condtionTwo = ['A', 'C']
df.loc[:, condtionOne.values | df.columns.isin(condtionTwo)]
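If this selection is needed in more than one place, the same idea could be wrapped in a small function. This is only a sketch: `select_columns`, `keep_names`, and `mean_threshold` are hypothetical names, and it assumes every column is numeric so that `frame.mean()` returns one value per column.

```python
import pandas as pd

def select_columns(frame, keep_names, mean_threshold):
    """Keep columns whose mean exceeds mean_threshold or whose name is in keep_names."""
    # Both masks are plain NumPy arrays, so they combine by position
    # and no index alignment takes place.
    by_mean = (frame.mean() > mean_threshold).to_numpy()
    by_name = frame.columns.isin(keep_names)
    return frame.loc[:, by_mean | by_name]

df = pd.DataFrame({'A': [0, 0, 0, 0], 'B': [1, 2, 3, 5], 'C': [10, 20, 30, 50]})
print(select_columns(df, ['A'], 20).columns.tolist())  # ['A', 'C']
```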