pandas: iterating over groups in a dataframe
Original source: http://stackoverflow.com/questions/46230895/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Iterating over groups in a dataframe
Asked by Tolki
The issue I am having is that I want to group the dataframe and then use functions to manipulate the data after it has been grouped. For example, I want to group the data by Date and then iterate through each row in the date groups, passing each row to a function.
The problem is that groupby seems to create a tuple of the key followed by a massive string consisting of all of the rows in the data, which makes iterating through each row impossible.
Answered by cs95
When you apply groupby to a dataframe and iterate over the result, you don't get rows; you get (key, group) pairs, where each group is itself a dataframe. For example, consider:
df
    ID        Date  Days  Volume/Day
0  111  2016-01-01    20          50
1  111  2016-02-01    25          40
2  111  2016-03-01    31          35
3  111  2016-04-01    30          30
4  111  2016-05-01    31          25
5  112  2016-01-01    31          55
6  112  2016-01-02    26          45
7  112  2016-01-03    31          40
8  112  2016-01-04    30          35
9  112  2016-01-05    31          30
# i is the group key (an ID value), g is that group's dataframe
for i, g in df.groupby('ID'):
    print(g, '\n')
    ID        Date  Days  Volume/Day
0  111  2016-01-01    20          50
1  111  2016-02-01    25          40
2  111  2016-03-01    31          35
3  111  2016-04-01    30          30
4  111  2016-05-01    31          25

    ID        Date  Days  Volume/Day
5  112  2016-01-01    31          55
6  112  2016-01-02    26          45
7  112  2016-01-03    31          40
8  112  2016-01-04    30          35
9  112  2016-01-05    31          30
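Because each group is an ordinary dataframe, its rows can be visited with the usual row iterators. The following is a minimal sketch, assuming the sample frame above is rebuilt by hand with the dates kept as plain strings; collect_rows is a hypothetical helper name used only for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data rebuilt from the frame shown above (dates kept as plain strings).
df = pd.DataFrame({
    "ID": [111] * 5 + [112] * 5,
    "Date": ["2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01",
             "2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04", "2016-01-05"],
    "Days": [20, 25, 31, 30, 31, 31, 26, 31, 30, 31],
    "Volume/Day": [50, 40, 35, 30, 25, 55, 45, 40, 35, 30],
})


def collect_rows(frame, key):
    """Hypothetical helper: map each group key to a list of (Date, Days) pairs."""
    rows_by_group = {}
    for group_key, group in frame.groupby(key):
        # `group` is an ordinary DataFrame, so the usual row iterators work on it.
        rows_by_group[group_key] = [
            (row.Date, row.Days) for row in group.itertuples(index=False)
        ]
    return rows_by_group


rows_by_id = collect_rows(df, "ID")
for group_key, rows in rows_by_id.items():
    print(group_key, len(rows), rows[0])
```

Explicit row loops like this tend to be slow on large frames, so they are best kept for cases where a vectorised or group-wise operation does not fit.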
For your case, you should probably look into dfGroupby.apply if you want to apply some function to your groups, dfGroupby.transform to produce a like-indexed dataframe (see the docs for an explanation), or dfGroupby.agg if you want to produce aggregated results. Here dfGroupby stands for the object returned by df.groupby(...).
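The difference between the three methods is easiest to see side by side. The sketch below is only an illustration, assuming a four-row subset of the sample data; the variable names (grouped, days_total, days_mean, days_range) are made up for this example.

```python
import pandas as pd

# A small subset of the sample data (first two rows of each ID).
df = pd.DataFrame({
    "ID": [111, 111, 112, 112],
    "Days": [20, 25, 31, 26],
    "Volume/Day": [50, 40, 55, 45],
})

grouped = df.groupby("ID")

# agg: one value per group (here, the total number of days per ID).
days_total = grouped["Days"].agg("sum")

# transform: same index as df; each row receives its own group's value.
days_mean = grouped["Days"].transform("mean")

# apply: run an arbitrary function on each group's values. This function
# returns a scalar, so the result has one entry per group.
days_range = grouped["Days"].apply(lambda s: s.max() - s.min())

print(days_total)
print(days_mean)
print(days_range)
```

Roughly speaking, agg shrinks each group to a summary, transform keeps the original shape, and apply is the general-purpose fallback when neither fits.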
You'd do something like:
r = df.groupby('Date').apply(your_function)
You'd define your function as:
def your_function(df):
    # df here is the sub-dataframe holding the rows of a single group
    result = ...  # operation on df
    return result
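As a concrete illustration of this pattern, here is a minimal sketch with a hypothetical function, total_volume, that sums Days * Volume/Day within each date. The data is a made-up four-row subset of the sample frame, and selecting the value columns before calling apply is an assumption of this sketch rather than something the answer prescribes.

```python
import pandas as pd

# Four rows taken from the sample data, with dates kept as plain strings.
df = pd.DataFrame({
    "ID": [111, 112, 112, 111],
    "Date": ["2016-01-01", "2016-01-01", "2016-01-02", "2016-02-01"],
    "Days": [20, 31, 26, 25],
    "Volume/Day": [50, 55, 45, 40],
})


def total_volume(group):
    """Hypothetical per-group function: sum of Days * Volume/Day."""
    # `group` is the sub-dataframe holding the rows of a single date.
    return (group["Days"] * group["Volume/Day"]).sum()


# Selecting the value columns first keeps the grouping column out of the
# frame handed to the function (an assumption of this sketch).
r = df.groupby("Date")[["Days", "Volume/Day"]].apply(total_volume)
print(r)
```

Since the function returns a single number per group, r is a Series indexed by Date; a function that returns a dataframe or Series per group would give a differently shaped result.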
If you have problems with the implementation, please open a new question, post your data and your code, and any associated errors/tracebacks. Happy coding.