Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/29642404/

Time: 2020-09-13 23:12:20  Source: igfitidea

Iterating over groups (Python pandas dataframe)

Tags: python, pandas, iterator, dataframe, grouping

Question by Bunny

I want to iterate over groups that are grouped by strings or dates.

import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'B': ['me', 'you', 'me'] * 2,
                   'C': [5, 2, 3, 4, 6, 9]})
groups = df.groupby('A')

For example, in this code I have groups named 'foo' and 'bar', and I can loop over them using:

for name, group in groups:
    print(name)

My problem is that I need to run another loop inside this loop, and each time I need to access a different set of groups, like this (assume groups has size n):

for name, group in groups:
    for name1 in range(name, name + 9):  # + 9 to get the first 9 groups on every iteration
        ...  # fails: name is a string, so name + 9 raises TypeError

Since name is a string, I am unable to do that. In short, I just want a way to access groups by number so that I can easily retrieve the groups I need for a computation. Something like:

groups = df.groupby('A')
for i in range(0, n):
    print(group(i) + group(i + 1))  # pseudocode: group(i) would return the i-th group
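One way to get the numbered access asked for here is to materialize the GroupBy into a list. This is a minimal sketch, not the accepted answer's method: iterating a GroupBy yields `(name, DataFrame)` tuples, in sorted key order by default, so `list(groups)` can be indexed by position and the hypothetical `group(i)` from the pseudocode becomes `group_list[i][1]`. The name `group_list` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'B': ['me', 'you', 'me'] * 2,
                   'C': [5, 2, 3, 4, 6, 9]})
groups = df.groupby('A')

# Materialize the groups: a list of (name, sub-DataFrame) tuples,
# ordered by group key because groupby sorts keys by default.
group_list = list(groups)

n = len(group_list)  # number of groups (same value as groups.ngroups)
for i in range(n):
    name, group = group_list[i]  # access the i-th group by position
    print(i, name, len(group))
```

The trade-off is that all sub-DataFrames are held in memory at once, which is usually fine for a moderate number of groups.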

So if I have the groups [g1, g2, g3, g4, g5], I want to access them iteratively in pairs, like [g1, g2], [g2, g3], [g3, g4], ..., and take the intersection of the two groups' series each time. I am looking for a way to access the groups [g1, g2, ..., g5] by index or some number so that I can use them in loop operations. Currently, the only way I know to access a group is through its name, such as 'foo' and 'bar' in the example above. I want to be able to do operations such as:

for name, group in groups - 1:  # pseudocode: every group except the last
    print(groups.get_group(name))
    print(groups.get_group(name + 1))  # pseudocode: the group after `name`
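The pairwise pattern described in the question ([g1, g2], [g2, g3], ...) can be sketched by zipping the list of groups with itself shifted by one position. The question does not say which series should be intersected, so this sketch assumes column 'B' and uses a plain Python set intersection; the helper `consecutive_intersections` is hypothetical, not a pandas API.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'B': ['me', 'you', 'me'] * 2,
                   'C': [5, 2, 3, 4, 6, 9]})

def consecutive_intersections(grouped, column):
    """Hypothetical helper: for each pair of consecutive groups,
    return the values of `column` that appear in both groups."""
    group_list = list(grouped)  # [(name, DataFrame), ...] in sorted key order
    results = []
    # zip pairs element i with element i + 1: (g1, g2), (g2, g3), ...
    for (name1, g1), (name2, g2) in zip(group_list, group_list[1:]):
        common = sorted(set(g1[column]) & set(g2[column]))
        results.append(((name1, name2), common))
    return results

pairs = consecutive_intersections(df.groupby('A'), 'B')
print(pairs)
```

With only two groups in this example there is a single pair; with five groups the loop would produce four pairs.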

I know this might be a simple problem, but I have been struggling with this part for a while. I would appreciate any kind of help.

Answer by S Anand

The object returned by .groupby() has a .groups attribute that returns a Python dict mapping each group name to the index labels of its rows. In this case:

In [26]: df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
   ....:                    'B': ['me', 'you', 'me'] * 2,
   ....:                    'C': [5, 2, 3, 4, 6, 9]})

In [27]: groups = df.groupby('A')

In [28]: groups.groups
Out[28]: {'bar': [1L, 3L, 5L], 'foo': [0L, 2L, 4L]}

You can iterate over this as follows:

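keys = list(groups.groups.keys())  # a list, so keys can be accessed by position in Python 3
for index in range(0, len(keys) - 1):
    g1 = df.loc[groups.groups[keys[index]]]      # rows of the current group (.ix was removed in pandas 1.0)
    g2 = df.loc[groups.groups[keys[index + 1]]]  # rows of the next group
    # Do something with g1, g2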
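The loop above depends on the contents of `.groups` and on the number of keys. On recent pandas versions with Python 3, the `L` suffixes from the session above no longer appear, and each value in `.groups` is a pandas `Index` of row labels rather than a plain list. A short sketch, assuming the same `df`, that converts those values to lists and reads `ngroups`, which gives the group count n mentioned in the question:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'B': ['me', 'you', 'me'] * 2,
                   'C': [5, 2, 3, 4, 6, 9]})
groups = df.groupby('A')

# .groups maps each group name to the index labels of its rows;
# Index.tolist() converts each value to a list of plain Python ints.
label_map = {name: labels.tolist() for name, labels in groups.groups.items()}
print(label_map)

# .ngroups is the number of groups, i.e. len(keys) in the loop above
print(groups.ngroups)
```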

However, please remember that using for loops to iterate over pandas objects is generally slower than vectorized operations. Depending on what you need to do, and if it needs to be fast, you may want to try other approaches.
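One vectorized option for the "access groups by number" part is `GroupBy.ngroup()`, which labels every row with the number of its group (0 to ngroups - 1, in sorted key order by default). This is a sketch under assumptions, not the answer author's method: it assumes selecting a consecutive pair of groups with a boolean mask is useful for the computation, and the `group_no` column name is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'B': ['me', 'you', 'me'] * 2,
                   'C': [5, 2, 3, 4, 6, 9]})

# Number each row by its group: 'bar' -> 0, 'foo' -> 1 (keys are sorted)
df['group_no'] = df.groupby('A').ngroup()

i = 0
# Select the rows of groups i and i + 1 at once with a boolean mask
pair = df[df['group_no'].isin([i, i + 1])]
print(pair)

# Aggregate over all groups at once instead of looping: sum of C per group number
print(df.groupby('group_no')['C'].sum())
```

Whether this is faster in practice depends on the actual per-pair computation; the mask only avoids the Python-level loop for selecting rows.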