Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/35095128/

pandas groupby group and subgroup level analysis

python-2.7 pandas dataframe

Asked by Siraj S.

On a multi-column groupby object, how do I access only the outer column? For example, below I can access the inner column (entertainment content) with the command df.get_group(('media', 'entertainment content')). I would also like to be able to call something like df.get_group(('media')), but it throws an error: "ValueError: must supply a tuple to get_group with multiple grouping keys"

[('media', 'entertainment content'),('media', 'internet media')]

df.get_group(('media', 'entertainment content'))
                                     lasts      vol        prev ticker
industry sub_industry                                                 
media    entertainment content  379.200012  1828139  354.000000  suntv
         entertainment content  420.049988  2675741  404.600006      z

temp.get_group(('media'))
ValueError: must supply a tuple to get_group with multiple grouping keys

Answered by Siraj S.

If you just want to access 'media', you don't need the extra set of parentheses when you call get_group. So it'd just be get_group('media').

If you wanted to retrieve multiple groups, that's when you would use an extra set of parentheses, which would create the tuple. For instance: get_group(('media','pizza'))

Answered by Alberto

Since it looks like get_group in pandas cannot access a single key after grouping by more than one key, I suggest the following alternative method.

Generating the data frame:

import pandas as pd
import numpy as np

rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar'] * 12,
               'B': rand.randn(24),
               'C': rand.randint(0, 20, 24),
               'D': ['aaa','bbb','ccc'] * 8})

Grouping by multiple keys ('A' and 'D') and using ngroup to assign a group number, storing it in a new column:

df["grouping_AandD"] = df.groupby(["A", "D"]).ngroup()
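To see which ('A', 'D') pair each group number stands for, the new column can be listed next to the grouping columns. The following is a minimal sketch that rebuilds the same frame; the name key_table is only illustrative and not part of the original answer. It relies on ngroup numbering the groups in the order the groupby would iterate over them, which is sorted key order by default.

```python
import numpy as np
import pandas as pd

rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar'] * 12,
                   'B': rand.randn(24),
                   'C': rand.randint(0, 20, 24),
                   'D': ['aaa', 'bbb', 'ccc'] * 8})
df["grouping_AandD"] = df.groupby(["A", "D"]).ngroup()

# One row per (A, D) pair together with its group number
# (key_table is an illustrative name).
key_table = (df[["A", "D", "grouping_AandD"]]
             .drop_duplicates()
             .sort_values("grouping_AandD")
             .reset_index(drop=True))
print(key_table)
```

With this lookup in hand, the group numbers belonging to a given outer key can be read off directly instead of being discovered inside a loop.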

Using the just created column to display all combinations in a loop but show only those containing the 'wanted key' ('foo' in this case):

wanted_key = "foo"
for i in range(0, df.grouping_AandD.nunique()):
    grouped_df = df[df.grouping_AandD == i]
    if (grouped_df.A == wanted_key).all():  # every row in this group has the wanted key
        print(grouped_df)
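A shorter variant of the same idea, offered as a sketch rather than as part of the original answer: iterating over a multi-key groupby yields (key, sub-frame) pairs in which the key is a tuple, so the outer key can be checked directly without the helper column. The name foo_groups is illustrative.

```python
import numpy as np
import pandas as pd

rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar'] * 12,
                   'B': rand.randn(24),
                   'C': rand.randint(0, 20, 24),
                   'D': ['aaa', 'bbb', 'ccc'] * 8})

wanted_key = "foo"

# Keys look like ('foo', 'aaa'); keep the groups whose outer key matches.
foo_groups = {key: sub_df
              for key, sub_df in df.groupby(["A", "D"])
              if key[0] == wanted_key}

for key, sub_df in foo_groups.items():
    print(key)
    print(sub_df)
```

Each value in foo_groups is the same sub-frame that get_group would return for the full two-element key.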

Answered by Mike Müller

Just do what the error message says and use a tuple:

temp.get_group(('media',))

Note the trailing comma.

Answered by Arjun Varshney

I was trying to do something similar (creating columns for each subgroup). As far as I can tell, the approach below worked for me and should help you as well. I looked for a solution in the cookbook that the pandas documentation provides, but it didn't help. Here is what I would suggest:

grp = df.groupby(['industry', 'sub_industry'])
values = []

# Loop only over sub-industries that occur under 'media'; any other pair would raise a KeyError.
for sub_ind in df.loc[df.industry == 'media', 'sub_industry'].unique():
    values.append(grp.get_group(('media', sub_ind)))
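The loop above can be tried end to end on a small frame. This is only a sketch under stated assumptions: industry and sub_industry are taken to be ordinary columns, the first two rows echo the output shown in the question, and the remaining rows (including the tickers ticker_c and ticker_d) are invented for illustration. The final pd.concat step is an addition, not part of the original answer.

```python
import pandas as pd

# Hypothetical data: the first two rows echo the question's output,
# the other rows are invented for illustration.
df = pd.DataFrame({
    'industry': ['media', 'media', 'media', 'food'],
    'sub_industry': ['entertainment content', 'entertainment content',
                     'internet media', 'pizza'],
    'ticker': ['suntv', 'z', 'ticker_c', 'ticker_d'],
    'lasts': [379.200012, 420.049988, 100.0, 50.0],
})

grp = df.groupby(['industry', 'sub_industry'])

# Sub-industries that actually occur under the outer key 'media'.
media_subs = df.loc[df['industry'] == 'media', 'sub_industry'].unique()

# One sub-frame per (outer, inner) key pair.
values = [grp.get_group(('media', sub_ind)) for sub_ind in media_subs]

# Stack the pieces back into a single frame holding every 'media' row.
media_df = pd.concat(values)
print(media_df)
```

Restricting the loop to the sub-industries found under 'media' keeps get_group from being asked for a key pair that does not exist.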