Pandas: dividing two columns using groupby

Question

Asked by Sanjeev

This is obviously simple, but as a pandas newbie I'm getting stuck.

I have a CSV file that contains three columns: State, bene_1_count, and bene_2_count.

I want to calculate the ratio of 'bene_1_count' to 'bene_2_count' for each state.

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'bene_1_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)],
                   'bene_2_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)]})

I am trying the following, but it is giving me an error: 'No objects to concatenate'

df['ratio'] = df.groupby(['state']).agg(df['bene_1_count']/df['bene_2_count'])

I am not able to figure out how to "reach up" to the state level of the groupby to take the ratio of columns.

I want one ratio of the two columns per state, with output like the following:

    State   ratio
    CA
    WA
    CO
    AZ
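A minimal vectorized sketch of that output, assuming "ratio" means the ratio of the summed counts per state (the interpretation both answers below use): sum both columns per state first, then divide the two summed columns. This is not code from the thread; the name `totals` is illustrative.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question (random counts).
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'bene_1_count': np.random.randint(10000, 99999, size=12),
                   'bene_2_count': np.random.randint(10000, 99999, size=12)})

# Sum both count columns within each state: one row per state.
totals = df.groupby('state')[['bene_1_count', 'bene_2_count']].sum()

# Divide the summed columns element-wise and turn the result into a state/ratio table.
result = (totals['bene_1_count'] / totals['bene_2_count']).rename('ratio').reset_index()
print(result)
```

No apply() or custom function is needed here, because after the groupby sum both columns are already aligned on the same state index.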

Answer 1

Answered by ansonw

Alternatively, you can write a custom function that accepts a DataFrame. groupby splits df into sub-DataFrames, one per state, and apply then calls your function on each sub-DataFrame.

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'bene_1_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)],
                   'bene_2_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)]})

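def divide_two_cols(df_sub):
    # df_sub holds all rows for one state; divide the two column totals
    return df_sub['bene_1_count'].sum() / df_sub['bene_2_count'].sum()

# one ratio per state, returned as a Series indexed by state
df.groupby('state').apply(divide_two_cols)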
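To see the sub-DataFrames that groupby hands to the function, one can iterate over the GroupBy object directly; each iteration yields a (state, sub-DataFrame) pair. A small sketch with hypothetical fixed values so the ratios can be checked by hand:

```python
import pandas as pd

# Hypothetical fixed values, chosen so the per-state ratios are easy to verify.
df = pd.DataFrame({'state': ['CA', 'WA', 'CA', 'WA'],
                   'bene_1_count': [10, 20, 30, 40],
                   'bene_2_count': [5, 10, 15, 40]})

def divide_two_cols(df_sub):
    return df_sub['bene_1_count'].sum() / df_sub['bene_2_count'].sum()

# Iterating a GroupBy yields (key, sub-DataFrame) pairs, one per state.
manual = {state: float(divide_two_cols(sub)) for state, sub in df.groupby('state')}
print(manual)  # {'CA': 2.0, 'WA': 1.2}
```

CA sums to 40 / 20 and WA to 60 / 50, which is exactly what apply computes group by group.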

Now suppose you want to divide each row's bene_1_count by its group's total bene_2_count (e.g., the AZ total) while keeping all the original columns. Just adjust the function above so that it computes the new column and returns the whole sub-DataFrame:

def divide_two_cols(df_sub):
    # assign() returns a new frame, so the group passed in by apply is not mutated
    return df_sub.assign(divs=df_sub['bene_1_count'] / df_sub['bene_2_count'].sum())

df.groupby('state').apply(divide_two_cols)
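The same per-row column can also be built without apply by using GroupBy.transform, a swapped-in technique rather than the answerer's code. transform('sum') returns each group's total repeated on every row of that group, so it lines up with df's original index. Sketch with hypothetical fixed values:

```python
import pandas as pd

# Hypothetical fixed values so the output can be checked by hand.
df = pd.DataFrame({'state': ['CA', 'WA', 'CA', 'WA'],
                   'bene_1_count': [10, 20, 30, 40],
                   'bene_2_count': [5, 10, 15, 40]})

# Group total of bene_2_count, aligned to every original row (CA -> 20, WA -> 50).
group_total = df.groupby('state')['bene_2_count'].transform('sum')

# Row-wise division keeps df's original index, row order, and columns.
df['divs'] = df['bene_1_count'] / group_total
print(df['divs'].tolist())  # [0.5, 0.4, 1.5, 0.8]
```

Unlike the apply version, this does not add the group key as an extra index level, so the new column can be assigned straight back onto df.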

Answer 2

Answered by rtk22

I believe you first need to sum the counts by state and then take the ratio. You can use apply to access the other columns of df, store the per-state results in a dictionary, and then map each state in the original DataFrame to its ratio.
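Written out, the per-state value computed this way is a ratio of sums over the rows belonging to state $s$, which in general differs from the mean of the per-row ratios:

$$
r_s = \frac{\sum_{i \in s} \text{bene\_1\_count}_i}{\sum_{i \in s} \text{bene\_2\_count}_i}
\neq \frac{1}{|s|} \sum_{i \in s} \frac{\text{bene\_1\_count}_i}{\text{bene\_2\_count}_i} \quad \text{(in general)}
$$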

import pandas as pd
import numpy as np

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'bene_1_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)],
                   'bene_2_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)]})

# per-state ratio of summed counts, converted to a {state: ratio} dict
ratios = df.groupby('state').apply(lambda x: x['bene_1_count'].sum() /
                                   x['bene_2_count'].sum()).to_dict()

# look up each row's state to broadcast its ratio back onto the original rows
df['ratio'] = df['state'].map(ratios)
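If the goal is only to attach each state's ratio to every row, a sketch using GroupBy.transform (a substitute for the dict-and-map step, not rtk22's code) does it in one expression. The values below are hypothetical and fixed so the result can be checked:

```python
import pandas as pd

# Hypothetical fixed values: CA sums to 40 / 20, WA sums to 60 / 50.
df = pd.DataFrame({'state': ['CA', 'WA', 'CA', 'WA'],
                   'bene_1_count': [10, 20, 30, 40],
                   'bene_2_count': [5, 10, 15, 40]})

grouped = df.groupby('state')
# Both transform('sum') results are aligned to df's rows, so dividing them
# gives every row its state's ratio of summed counts, with no dict or map().
df['ratio'] = (grouped['bene_1_count'].transform('sum')
               / grouped['bene_2_count'].transform('sum'))
print(df['ratio'].tolist())  # [2.0, 1.2, 2.0, 1.2]
```

The trade-off is that the state totals are computed twice (once per column), which is negligible for data of this size.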
