Original question: http://stackoverflow.com/questions/27405483/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
How to loop over grouped Pandas dataframe?
Asked by Tjorriemorrie
DataFrame:
  c_os_family_ss c_os_major_is  l_customer_id_i
0      Windows 7                          90418
1      Windows 7                          90418
2      Windows 7                          90418
Code:
print(df)
for name, group in df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)):
    print(name)
    print(group)
I'm trying to just loop over the aggregated data, but I get the error:
ValueError: too many values to unpack
@EdChum, here's the expected output:
                                                    c_os_family_ss  \
l_customer_id_i
131572           Windows 7,Windows 7,Windows 7,Windows 7,Window...
135467           Windows 7,Windows 7,Windows 7,Windows 7,Window...

                                                     c_os_major_is
l_customer_id_i
131572           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
135467           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
The output is not the problem; I want to loop over every group.
Accepted answer by joris
df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))
already returns a DataFrame, so you can no longer loop over the groups.
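To see where the error comes from: iterating over a DataFrame yields its column labels, so the loop in the question tries to unpack a single string into name and group. The following is a minimal sketch, assuming a small made-up frame that only borrows the column names from the question (the values and customer ids are illustrative, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    'c_os_family_ss': ['Windows 7', 'Windows 7', 'Linux'],
    'c_os_major_is': ['', '', ''],
    'l_customer_id_i': [90418, 90418, 90419],
})

agg = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))

# Iterating over a DataFrame yields its column labels (plain strings).
labels = list(agg)
print(labels)

# Unpacking one of those strings into two names is what raises the error.
try:
    for name, group in agg:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

On Python 3 the message usually carries an extra "(expected 2)" suffix, but the cause is the same.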
In general:
df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), and with this you can iterate through the groups (as explained in the docs). You can do something like:
grouped = df.groupby('A')
for name, group in grouped:
    ...
When you apply a function to the groupby, in your example df.groupby(...).agg(...) (but this can also be transform, apply, mean, ...), you combine the results of applying the function to the different groups into one DataFrame (the apply and combine steps of the 'split-apply-combine' paradigm of groupby). So the result will always be a DataFrame again (or a Series, depending on the applied function).
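The difference between the two objects can be sketched as follows. The frame, its columns A and B, and the choice of sum() as the applied function are assumptions made up for illustration:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'B': [1, 2, 3],
})

grouped = df.groupby('A')

# Split step: iterating the GroupBy object yields (key, sub-DataFrame) pairs.
group_sizes = {}
for name, group in grouped:
    group_sizes[name] = len(group)
    print(name)
    print(group)

# Apply and combine steps: the result is an ordinary DataFrame indexed by 'A'.
summed = grouped.sum()
print(type(summed).__name__)
print(summed)
```

Keeping a reference to the GroupBy object, as with grouped here, lets you both loop over the groups and compute the aggregate from the same split.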
Answer by khiner
You can iterate over the index values if your dataframe has already been created.
df = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))
for name in df.index:
    print(name)
    print(df.loc[name])
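A possible variation on the same idea uses DataFrame.iterrows(), which yields (index label, row) pairs, so a two-name loop works on the aggregated frame. This is only a sketch with made-up data; the variable names agg and joined are not from the answer:

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    'c_os_family_ss': ['Windows 7', 'Windows 7', 'Linux'],
    'l_customer_id_i': [90418, 90418, 90419],
})

agg = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))

# Each iteration gives the group key (the index label) and that row as a Series.
joined = {}
for name, row in agg.iterrows():
    joined[name] = row['c_os_family_ss']
    print(name, row['c_os_family_ss'])
```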
Answer by Andrei Sura
Here is an example of iterating over a pd.DataFrame grouped by the column atable. As a sample use case, "create" statements for an SQL database are generated within the for loop:
import pandas as pd

df1 = pd.DataFrame({
    'atable': ['Users', 'Users', 'Domains', 'Domains', 'Locks'],
    'column': ['col_1', 'col_2', 'col_a', 'col_b', 'col'],
    'column_type': ['varchar', 'varchar', 'int', 'varchar', 'varchar'],
    'is_null': ['No', 'No', 'Yes', 'No', 'Yes'],
})

df1_grouped = df1.groupby('atable')

# iterate over each group
for group_name, df_group in df1_grouped:
    print('\nCREATE TABLE {}('.format(group_name))

    # iterate over the rows (columns of the SQL table) in this group
    for row_index, row in df_group.iterrows():
        col = row['column']
        column_type = row['column_type']
        # the data uses 'No'/'Yes', so compare against 'No' (not 'NO')
        is_null = 'NOT NULL' if row['is_null'] == 'No' else ''
        print('\t{} {} {},'.format(col, column_type, is_null))

    print(");")
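The loop above prints a comma after every column, including the last one in each table. One way around that, sketched here as an assumption rather than as part of the original answer, is to collect the column definitions per group and join them. The statements dictionary and the column_defs list are names introduced only for this illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    'atable': ['Users', 'Users', 'Domains', 'Domains', 'Locks'],
    'column': ['col_1', 'col_2', 'col_a', 'col_b', 'col'],
    'column_type': ['varchar', 'varchar', 'int', 'varchar', 'varchar'],
    'is_null': ['No', 'No', 'Yes', 'No', 'Yes'],
})

statements = {}
for group_name, df_group in df1.groupby('atable'):
    column_defs = []
    for _, row in df_group.iterrows():
        parts = [row['column'], row['column_type']]
        if row['is_null'] == 'No':
            parts.append('NOT NULL')
        column_defs.append('\t' + ' '.join(parts))
    # Joining the definitions puts commas only between columns.
    statements[group_name] = 'CREATE TABLE {}(\n{}\n);'.format(
        group_name, ',\n'.join(column_defs))

for statement in statements.values():
    print(statement)
```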