Pandas - Groupby with conditional formula

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/45083000/

Time: 2020-09-14 03:59:29  Source: igfitidea

Tags: python, pandas, dataframe, conditional-statements, pandas-groupby

Asked by George Vince

   Survived  SibSp  Parch
0         0      1      0
1         1      1      0
2         1      0      0
3         1      1      0
4         0      0      1

Given the above dataframe, is there an elegant way to groupby with a condition? I want to split the data into two groups based on the following conditions:

(df['SibSp'] > 0) | (df['Parch'] > 0)   = New Group - "Has Family"
(df['SibSp'] == 0) & (df['Parch'] == 0) = New Group - "No Family"

then take the means of both of these groups and end up with an output like this:

               SurvivedMean
 Has Family    Mean
 No Family     Mean

Can it be done using groupby or would I have to append a new column using the above conditional statement?

Answered by ayhan

An easy way to group by that condition is to use the sum of those two columns. Assuming neither column contains negative values, the sum is at least 1 whenever either of them is positive. groupby also accepts an arbitrary array as the grouping key, as long as its length matches the DataFrame's length, so you don't need to add a new column.

import numpy as np

# Label each row; the array is passed to groupby directly, no new column needed
family = np.where((df['SibSp'] + df['Parch']) >= 1, 'Has Family', 'No Family')
df.groupby(family)['Survived'].mean()
Out: 
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64
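
To make the comparison with the new-column approach from the question concrete, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample rows in the question; the column name "Family" is a hypothetical choice for illustration and does not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

# Option 1: group by an external label array (df itself is not modified).
family = np.where((df["SibSp"] + df["Parch"]) >= 1, "Has Family", "No Family")
by_array = df.groupby(family)["Survived"].mean()

# Option 2: put the labels in a new column of a copy, then group by that column.
# "Family" is a hypothetical column name used only for this sketch.
with_col = df.assign(Family=family)
by_column = with_col.groupby("Family")["Survived"].mean()

print(by_array)
print(by_column)
```

Both options should give the same group means; the array form simply avoids carrying an extra column around.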

Answered by jezrael

Use only one condition if the values in columns SibSp and Parch are never less than 0:

m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)

# Store the result under a new name so the original DataFrame is not overwritten
result = df.groupby(np.where(m1, 'Has Family', 'No Family'))['Survived'].mean()
print(result)
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64
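
As a variation on the single-condition approach, the boolean mask itself can serve as the grouping key, with the True/False index relabeled afterwards. This is only a sketch, not part of the original answer; the names has_family and means are chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

# Boolean mask: True where the passenger has at least one relative aboard.
has_family = (df["SibSp"] > 0) | (df["Parch"] > 0)

# Group by the mask, then map the boolean index labels to readable names.
means = df.groupby(has_family)["Survived"].mean()
means = means.rename(index={True: "Has Family", False: "No Family"})

print(means)
```

This skips np.where entirely, at the cost of a separate relabeling step.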

If that cannot be guaranteed, use both conditions:

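m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
m2 = (df['SibSp'] == 0) & (df['Parch'] == 0)
# Rows matching neither condition (e.g. negative values) are labelled 'Not'
a = np.where(m1, 'Has Family', 
    np.where(m2, 'No Family', 'Not'))

# Store the result under a new name so the original DataFrame is not overwritten
result = df.groupby(a)['Survived'].mean()
print(result)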
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64
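
The sample data never triggers the 'Not' fallback, so the following sketch adds one hypothetical row with a negative SibSp to show when it matters. It also uses np.select as a flatter alternative to nested np.where; that swap is an assumption of this sketch, not what the answer itself uses.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one hypothetical extra row (the last one)
# whose negative SibSp matches neither condition.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 1],
    "SibSp":    [1, 1, 0, 1, 0, -1],
    "Parch":    [0, 0, 0, 0, 1, 0],
})

m1 = (df["SibSp"] > 0) | (df["Parch"] > 0)
m2 = (df["SibSp"] == 0) & (df["Parch"] == 0)

# np.select picks the first matching choice per row, else the default.
labels = np.select(
    [m1.to_numpy(), m2.to_numpy()],
    ["Has Family", "No Family"],
    default="Not",
)

result = df.groupby(labels)["Survived"].mean()
print(result)
```

With this data the extra row ends up in its own 'Not' group instead of being silently counted as 'No Family'.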

Answered by Zwackelmann

You could define your conditions in a list and use the function group_by_condition below to create a filtered list for each condition. Afterwards you can unpack the resulting lists by position (pattern matching):

# Note: df here is a plain list of dicts, not a pandas DataFrame
df = [
  {"Survived": 0, "SibSp": 1, "Parch": 0},
  {"Survived": 1, "SibSp": 1, "Parch": 0},
  {"Survived": 1, "SibSp": 0, "Parch": 0}]

conditions = [
  lambda x: (x['SibSp'] > 0) or (x['Parch'] > 0),  # has family
  lambda x: (x['SibSp'] == 0) and (x['Parch'] == 0)  # no family
]

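def group_by_condition(items, conditions):
    # Return one filtered list per condition, in the same order as conditions
    return [[item for item in items if condition(item)] for condition in conditions]

# Unpack the two filtered lists by position
[has_family, no_family] = group_by_condition(df, conditions)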
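
The answer stops at the two filtered lists, so here is a hedged sketch of how the group means the question asks for could be computed from them in plain Python. The names records and means are illustrative, and statistics.mean is one possible choice for averaging; it raises StatisticsError if a group is empty.

```python
from statistics import mean

# Plain list of dicts (first three rows of the question's data), not a DataFrame.
records = [
    {"Survived": 0, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 0, "Parch": 0},
]

conditions = [
    lambda x: (x["SibSp"] > 0) or (x["Parch"] > 0),     # has family
    lambda x: (x["SibSp"] == 0) and (x["Parch"] == 0),  # no family
]

def group_by_condition(items, conditions):
    # One filtered list per condition, in the same order as conditions.
    return [[item for item in items if cond(item)] for cond in conditions]

has_family, no_family = group_by_condition(records, conditions)

# Average the Survived field within each group.
means = {
    "Has Family": mean(r["Survived"] for r in has_family),
    "No Family": mean(r["Survived"] for r in no_family),
}
print(means)
```

For data already in a DataFrame, the groupby-based answers are likely the more direct route; this form is mainly useful when the rows are ordinary Python objects.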