Pandas - Groupby with conditional formula
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/45083000/
Asked by George Vince
   Survived  SibSp  Parch
0         0      1      0
1         1      1      0
2         1      0      0
3         1      1      0
4         0      0      1
Given the above dataframe, is there an elegant way to groupby with a condition?
I want to split the data into two groups based on the following conditions:
(df['SibSp'] > 0) | (df['Parch'] > 0) = New Group - "Has Family"
(df['SibSp'] == 0) & (df['Parch'] == 0) = New Group - "No Family"
Then take the means of both of these groups and end up with an output like this:
            SurvivedMean
Has Family  Mean
No Family   Mean
Can it be done using groupby, or would I have to append a new column using the above conditional statement?
Answered by ayhan
An easy way to group is to use the sum of those two columns. If either of them is positive, the sum will be at least 1 (this assumes neither column is ever negative). groupby accepts an arbitrary array as long as its length is the same as the DataFrame's length, so you don't need to add a new column.
import numpy as np

# Label each row; groupby can take this array directly as the key
family = np.where((df['SibSp'] + df['Parch']) >= 1, 'Has Family', 'No Family')
df.groupby(family)['Survived'].mean()
Out:
Has Family 0.5
No Family 1.0
Name: Survived, dtype: float64
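For reference, a self-contained sketch of the approach above, rebuilding the five-row sample frame from the question. The variable name result is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

# groupby accepts any array-like whose length matches the frame,
# so no helper column has to be added to df
family = np.where((df["SibSp"] + df["Parch"]) >= 1, "Has Family", "No Family")
result = df.groupby(family)["Survived"].mean()

print(result)
print(list(df.columns))  # df itself is unchanged
```

Since the labels live in a separate array, df keeps only its three original columns after the call.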
Answered by jezrael
Use only one condition if the values in columns SibSp and Parch are never less than 0:
m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
# Store the result under a new name so df is not overwritten
out = df.groupby(np.where(m1, 'Has Family', 'No Family'))['Survived'].mean()
print(out)
Has Family 0.5
No Family 1.0
Name: Survived, dtype: float64
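One condition is enough here because, when neither column can go below 0, the "No Family" condition is exactly the negation of the "Has Family" one. A quick check on the sample data from the question (a sketch; m2 is the second condition as written in the question, and complementary is an illustrative name):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

m1 = (df["SibSp"] > 0) | (df["Parch"] > 0)    # "Has Family"
m2 = (df["SibSp"] == 0) & (df["Parch"] == 0)  # "No Family"

# With non-negative values the two masks are complementary,
# so every row falls into exactly one of the two groups
complementary = bool((m2 == ~m1).all())
print(complementary)
```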
If the first approach cannot be used (for example, because negative values may occur), use both conditions:
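m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
m2 = (df['SibSp'] == 0) & (df['Parch'] == 0)
# Rows matching neither condition get the label 'Not'
a = np.where(m1, 'Has Family',
    np.where(m2, 'No Family', 'Not'))
# Store the result under a new name so df is not overwritten
out = df.groupby(a)['Survived'].mean()
print(out)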
Has Family 0.5
No Family 1.0
Name: Survived, dtype: float64
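To see when the third label matters, here is a sketch on made-up data (not from the question) in which one row has a negative SibSp. It uses np.select, a NumPy function that takes a list of conditions and a list of choices, as a flatter alternative to the nested np.where; it is a swapped-in technique, not the one from the answer, and it should produce the same labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has an invalid negative SibSp
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "SibSp":    [1, 0, 0, -1],
    "Parch":    [0, 0, 2, 0],
})

m1 = (df["SibSp"] > 0) | (df["Parch"] > 0)
m2 = (df["SibSp"] == 0) & (df["Parch"] == 0)

# The first matching condition wins; rows matching neither get the default
labels = np.select([m1, m2], ["Has Family", "No Family"], default="Not")
result = df.groupby(labels)["Survived"].mean()
print(result)
```

The row with the negative value matches neither condition, so it ends up in its own "Not" group instead of being silently counted as "No Family".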
Answered by Zwackelmann
You could define your conditions in a list and use the function group_by_condition below to create a filtered list for each condition. Afterwards you can pick out the resulting lists by unpacking the return value:
# Note: df here is a plain list of dicts, not a pandas DataFrame
df = [
    {"Survived": 0, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 0, "Parch": 0}]
conditions = [
    lambda x: (x['SibSp'] > 0) or (x['Parch'] > 0),    # has family
    lambda x: (x['SibSp'] == 0) and (x['Parch'] == 0)  # no family
]
def group_by_condition(items, conditions):
    # Build one filtered list per condition, in the order of `conditions`
    return [[item for item in items if condition(item)] for condition in conditions]
has_family, no_family = group_by_condition(df, conditions)
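As a follow-up sketch, the group means the question asks for can then be computed from the two lists. This works on plain Python dicts rather than a pandas DataFrame; the list name rows and the helper mean_survived are hypothetical names, not part of the answer.

```python
from statistics import mean

rows = [
    {"Survived": 0, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 0, "Parch": 0},
]

conditions = [
    lambda x: (x["SibSp"] > 0) or (x["Parch"] > 0),     # has family
    lambda x: (x["SibSp"] == 0) and (x["Parch"] == 0),  # no family
]

def group_by_condition(items, conditions):
    # One filtered list per condition, in the same order as `conditions`
    return [[item for item in items if cond(item)] for cond in conditions]

def mean_survived(group):
    # Hypothetical helper: average of the "Survived" field in a group
    return mean(item["Survived"] for item in group)

has_family, no_family = group_by_condition(rows, conditions)
print(mean_survived(has_family))
print(mean_survived(no_family))
```

Note that statistics.mean raises an error on an empty group, so a condition that matches no rows would need separate handling.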