Note: this page reproduces a popular Stack Overflow question and its answers. The content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not to this site). Original Stack Overflow question: http://stackoverflow.com/questions/42046885/

Pandas division of two columns with groupby

Tags: python, python-3.x, pandas

Asked by Sanjeev

This is obviously simple, but as a pandas newbie I'm getting stuck.

I have a CSV file that contains 3 columns, the State, bene_1_count, and bene_2_count.

I want to calculate the ratio of 'bene_1_count' to 'bene_2_count' for each state.

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'bene_1_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)],
                   'bene_2_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)]})

I am trying the following, but it is giving me an error: 'No objects to concatenate'

df['ratio'] = df.groupby(['state']).agg(df['bene_1_count']/df['bene_2_count'])

I am not able to figure out how to "reach up" to the state level of the groupby to take the ratio of columns.

I want the ratio of the two columns for each state, with output like this:

    State   ratio
    CA
    WA
    CO
    AZ

Answered by ansonw

Put another way: you can write a custom function that accepts a dataframe. The groupby splits the data into one sub-dataframe per group, and you can then use the apply function to call your custom function on each sub-dataframe.

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'bene_1_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)],
                   'bene_2_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)]})

def divide_two_cols(df_sub):
    # Ratio of the group's total bene_1_count to its total bene_2_count.
    return df_sub['bene_1_count'].sum() / float(df_sub['bene_2_count'].sum())

# Returns a Series indexed by state, with one ratio per state.
df.groupby('state').apply(divide_two_cols)
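
As an alternative sketch (not part of the original answer), the same per-state ratio can be computed without a custom function by summing both columns per group and then dividing the two totals. The DataFrame below uses small made-up values so the result can be checked by hand; the names `sums` and `ratio` are illustrative.

```python
import pandas as pd

# Small fixed sample (hypothetical values) so the result is reproducible.
df = pd.DataFrame({
    'state': ['CA', 'WA', 'CA', 'WA'],
    'bene_1_count': [10, 30, 20, 10],
    'bene_2_count': [20, 40, 40, 60],
})

# Sum each count column within every state.
sums = df.groupby('state')[['bene_1_count', 'bene_2_count']].sum()

# Divide the per-state totals; the result is a Series indexed by state.
ratio = sums['bene_1_count'] / sums['bene_2_count']
print(ratio)
```

Here CA totals 30 and 60, and WA totals 40 and 100, so the ratios are 0.5 and 0.4. Because the division happens on whole columns, this form avoids calling a Python function once per group.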

Now say you want each row's bene_1_count divided by the bene_2_count sum of its group (e.g., the total for AZ) while retaining all the original columns. Just adjust the above function (change the calculation and return the whole sub-dataframe):

def divide_two_cols(df_sub):
    # Work on a copy so the data passed in is not modified in place.
    df_sub = df_sub.copy()
    df_sub['divs'] = df_sub['bene_1_count'] / float(df_sub['bene_2_count'].sum())
    return df_sub

df.groupby('state').apply(divide_two_cols)
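
A row-level result like this can also be sketched with `transform('sum')`, which returns the group total repeated on every row of that group, so the division lines up with the original rows. This is an alternative to the answer's approach, shown on made-up values; `group_total` is an illustrative name.

```python
import pandas as pd

# Small fixed sample (hypothetical values) so the result is reproducible.
df = pd.DataFrame({
    'state': ['CA', 'WA', 'CA', 'WA'],
    'bene_1_count': [10, 30, 20, 10],
    'bene_2_count': [20, 40, 40, 60],
})

# transform('sum') yields a Series with the same index as df, where each
# row holds the bene_2_count total of that row's state.
group_total = df.groupby('state')['bene_2_count'].transform('sum')

# Divide row by row; all original columns are kept and 'divs' is added.
df['divs'] = df['bene_1_count'] / group_total
print(df)
```

The original row order and index are preserved, which is not always the case with the result of `apply`.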

Answered by rtk22

I believe you first need to sum the counts by state before computing the ratio. You can use apply to access the other columns in the df, then store the results in a dictionary and map them onto the corresponding state in the original dataframe.

import pandas as pd
import numpy as np

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'bene_1_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)],
                   'bene_2_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)]})

# Build a {state: ratio} dictionary from the per-state totals.
ratios = df.groupby('state').apply(lambda x: x['bene_1_count'].sum() /
                                   x['bene_2_count'].sum().astype(float)).to_dict()

# Look up each row's state in the dictionary to fill the new column.
df['ratio'] = df['state'].map(ratios)
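
To make the mapping step concrete, here is a minimal worked sketch on made-up values. It builds the dictionary from per-state sums rather than from `apply`, which is a substitution for illustration and not the answer's exact code.

```python
import pandas as pd

# Small fixed sample (hypothetical values) so the result is reproducible.
df = pd.DataFrame({
    'state': ['CA', 'WA', 'CA', 'WA'],
    'bene_1_count': [10, 30, 20, 10],
    'bene_2_count': [20, 40, 40, 60],
})

# Per-state totals, then one ratio per state stored as {state: ratio}.
sums = df.groupby('state')[['bene_1_count', 'bene_2_count']].sum()
ratios = (sums['bene_1_count'] / sums['bene_2_count']).to_dict()

# map() looks up each row's state in the dictionary, so every row of a
# state receives that state's ratio.
df['ratio'] = df['state'].map(ratios)
print(df)
```

Every CA row receives the CA ratio and every WA row the WA ratio, so the new column repeats one value per state.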