How to loop over a grouped Pandas DataFrame in Python?

Disclaimer: this content is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute the original authors (not this site). Original question: http://stackoverflow.com/questions/27405483/


How to loop over grouped Pandas dataframe?

python, pandas

Asked by Tjorriemorrie

DataFrame:


  c_os_family_ss c_os_major_is l_customer_id_i
0      Windows 7                         90418
1      Windows 7                         90418
2      Windows 7                         90418

Code:


print df
for name, group in df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)):
    print name
    print group

I'm trying to just loop over the aggregated data, but I get the error:


ValueError: too many values to unpack


@EdChum, here's the expected output:


                                                    c_os_family_ss  \
l_customer_id_i
131572           Windows 7,Windows 7,Windows 7,Windows 7,Window...
135467           Windows 7,Windows 7,Windows 7,Windows 7,Window...

                                                     c_os_major_is
l_customer_id_i
131572           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
135467           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...

The output is not the problem; I want to loop over every group.


Accepted answer by joris

df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)) already returns a DataFrame, so you cannot loop over the groups anymore.

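The ValueError comes from how a DataFrame is iterated: looping over a DataFrame yields its column labels, not (name, group) pairs, so the loop tries to unpack each column name string into two variables. A minimal sketch, using the column names from the question, to illustrate:

import pandas as pd

# Stand-in for the result of .agg(...): a plain DataFrame.
df_agg = pd.DataFrame({'c_os_family_ss': ['Windows 7'], 'l_customer_id_i': [90418]})

for item in df_agg:
    print(item)  # prints the column labels, e.g. 'c_os_family_ss'

# Writing "for name, group in df_agg:" therefore tries to unpack each column
# label; a long label such as 'c_os_family_ss' raises
# "ValueError: too many values to unpack".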

In general:


  • df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), and with this you can iterate through the groups (as explained in the docs here). You can do something like:

    grouped = df.groupby('A')
    
    for name, group in grouped:
        ...
    
  • When you apply a function on the groupby, as in your example df.groupby(...).agg(...) (but this can also be transform, apply, mean, ...), you combine the results of applying the function to the different groups together in one dataframe (the apply and combine steps of the 'split-apply-combine' paradigm of groupby). So the result of this will always be a DataFrame again (or a Series, depending on the applied function). A sketch applying this to the original question follows this list.

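Applied to the original question, the first approach means iterating over the GroupBy object itself (before any aggregation) and doing the join per group. A minimal sketch, assuming the columns hold strings and using the column names from the question:

grouped = df.groupby('l_customer_id_i')

# Each iteration yields the group key (the customer id) and a sub-DataFrame
# with that customer's rows, so the comma-join can be done inside the loop.
for name, group in grouped:
    print(name)
    print(group['c_os_family_ss'].str.cat(sep=','))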

Answer by khiner

You can iterate over the index values if your dataframe has already been created.


# Aggregate first, then loop over the index values of the result.
df = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))
for name in df.index:
    print(name)
    print(df.loc[name])
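Equivalently, the aggregated frame can be walked with iterrows(), which yields (index, row) pairs directly; a small sketch of that variant, assuming the same column names:

df_agg = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))

# iterrows() yields (index_value, row_as_Series) pairs, so the customer id
# and its aggregated values come out together.
for name, row in df_agg.iterrows():
    print(name)
    print(row)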

Answer by Andrei Sura

Here is an example of iterating over a pd.DataFrame grouped by the column atable. As a sample use case, "create" statements for an SQL database are generated within the for loop:


import pandas as pd

df1 = pd.DataFrame({
    'atable':     ['Users', 'Users', 'Domains', 'Domains', 'Locks'],
    'column':     ['col_1', 'col_2', 'col_a', 'col_b', 'col'],
    'column_type':['varchar', 'varchar', 'int', 'varchar', 'varchar'],
    'is_null':    ['No', 'No', 'Yes', 'No', 'Yes'],
})

df1_grouped = df1.groupby('atable')

# iterate over each group
for group_name, df_group in df1_grouped:
    print('\nCREATE TABLE {}('.format(group_name))

    for row_index, row in df_group.iterrows():
        col = row['column']
        column_type = row['column_type']
        is_null = 'NOT NULL' if row['is_null'] == 'No' else ''  # match the capitalization used in the sample data
        print('\t{} {} {},'.format(col, column_type, is_null))

    print(");")