Pandas - Groupby with conditional formula

Question

Asked by George Vince

   Survived  SibSp  Parch
0         0      1      0
1         1      1      0
2         1      0      0
3         1      1      0
4         0      0      1

Given the above dataframe, is there an elegant way to groupby with a condition? I want to split the data into two groups based on the following conditions:

(df['SibSp'] > 0) | (df['Parch'] > 0) =   New Group -"Has Family"
 (df['SibSp'] == 0) & (df['Parch'] == 0) = New Group - "No Family"

then take the means of both of these groups and end up with an output like this:

               SurvivedMean
 Has Family    Mean
 No Family     Mean

Can it be done using groupby or would I have to append a new column using the above conditional statement?

Answer 1

Answered by ayhan

An easy way to group that is to use the sum of those two columns. Assuming neither column holds negative values, the sum is at least 1 whenever either of them is positive. And groupby accepts an arbitrary array as long as its length is the same as the DataFrame's length, so you don't need to add a new column.

import numpy as np

family = np.where((df['SibSp'] + df['Parch']) >= 1, 'Has Family', 'No Family')
df.groupby(family)['Survived'].mean()
Out: 
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64
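
For reference, a self-contained sketch of the same idea, with the DataFrame rebuilt from the rows shown in the question. The names `result` and `SurvivedMean` are illustrative choices (the latter taken from the question's desired output), not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

# One label per row; the array length matches len(df), so groupby accepts it.
family = np.where((df["SibSp"] + df["Parch"]) >= 1, "Has Family", "No Family")

# Group by the array directly, then shape the result like the table in the question.
result = df.groupby(family)["Survived"].mean().rename("SurvivedMean").to_frame()
print(result)
```

No helper column is added to `df`; the labels only exist in the `family` array.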

Answer 2

Answered by jezrael

Use only one condition if the values in columns SibSp and Parch are never less than 0:

m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)

# store the result under a new name so the original df is not overwritten
result = df.groupby(np.where(m1, 'Has Family', 'No Family'))['Survived'].mean()
print(result)
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64
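
A pandas-only variant of the single-condition approach, sketched under the same assumption that the columns are never negative. Instead of `np.where`, the boolean mask is mapped to labels with `Series.map`; the name `labels` is illustrative, and the data is the sample from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

# Boolean mask: True where the passenger has any family aboard.
m1 = (df["SibSp"] > 0) | (df["Parch"] > 0)

# Turn True/False into readable group labels; the Series keeps df's index,
# so groupby aligns it with the rows of df.
labels = m1.map({True: "Has Family", False: "No Family"})

result = df.groupby(labels)["Survived"].mean()
print(result)
```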

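If the first approach is not possible (the columns may contain negative values), use both conditions; rows that satisfy neither are labelled 'Not':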

m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
m2 = (df['SibSp'] == 0) & (df['Parch'] == 0)
a = np.where(m1, 'Has Family', 
    np.where(m2, 'No Family', 'Not'))

# store the result under a new name so the original df is not overwritten
result = df.groupby(a)['Survived'].mean()
print(result)
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64
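
The nested `np.where` can also be written with `np.select`, which takes a list of conditions, a list of labels and a default. This is an alternative sketch, not part of the answer above. The last row of the data is a hypothetical invalid record (negative `SibSp`) added only to show a row falling into the 'Not' group.

```python
import numpy as np
import pandas as pd

# Rows from the question plus one hypothetical invalid row (SibSp = -1).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 1],
    "SibSp":    [1, 1, 0, 1, 0, -1],
    "Parch":    [0, 0, 0, 0, 1, 0],
})

m1 = (df["SibSp"] > 0) | (df["Parch"] > 0)
m2 = (df["SibSp"] == 0) & (df["Parch"] == 0)

# The first matching condition wins; rows matching neither get the default.
a = np.select([m1, m2], ["Has Family", "No Family"], default="Not")

result = df.groupby(a)["Survived"].mean()
print(result)
```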

Answer 3

Answered by Zwackelmann

You could define your conditions in a list and use the function group_by_condition below to create a filtered list for each condition. Afterwards you can select the resulting items using pattern matching (sequence unpacking). Note that df here is a plain list of dicts, not a pandas DataFrame:

df = [
  {"Survived": 0, "SibSp": 1, "Parch": 0},
  {"Survived": 1, "SibSp": 1, "Parch": 0},
  {"Survived": 1, "SibSp": 0, "Parch": 0}]

conditions = [
  lambda x: (x['SibSp'] > 0) or (x['Parch'] > 0),  # has family
  lambda x: (x['SibSp'] == 0) and (x['Parch'] == 0)  # no family
]

def group_by_condition(l, conditions):
    return [[item for item in l if condition(item)] for condition in conditions]

[has_family, no_family] = group_by_condition(df, conditions)
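
The snippet above stops at the filtered lists. To get the per-group means the question asks for, one possible continuation uses `statistics.mean` from the standard library. The helper `mean_by_condition` and the switch from a list of conditions to a labelled dict are illustrative assumptions, not part of the answer.

```python
from statistics import mean

# Plain list of dicts, as in the answer (not a pandas DataFrame).
rows = [
    {"Survived": 0, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 0, "Parch": 0},
]

# Each label maps to the predicate that defines its group.
conditions = {
    "Has Family": lambda x: x["SibSp"] > 0 or x["Parch"] > 0,
    "No Family": lambda x: x["SibSp"] == 0 and x["Parch"] == 0,
}

def mean_by_condition(items, conditions, key):
    """Return {label: mean of item[key]} over the items matching each condition."""
    result = {}
    for label, condition in conditions.items():
        values = [item[key] for item in items if condition(item)]
        # statistics.mean raises on empty input, so empty groups are skipped.
        if values:
            result[label] = mean(values)
    return result

means = mean_by_condition(rows, conditions, "Survived")
print(means)
```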
