pandas groupby: grouping and subgroup-level analysis

Question

Asked by Siraj S.

On a multi-column groupby object, how do I access only the outer column? For example, below I can access the inner column (entertainment content) with the command df.get_group(('media', 'entertainment content')). I would also like to be able to do something like df.get_group(('media')), but it throws an error: "ValueError: must supply a tuple to get_group with multiple grouping keys"

[('media', 'entertainment content'),('media', 'internet media')]

df.get_group(('media', 'entertainment content'))
                                     lasts      vol        prev ticker
industry sub_industry                                                 
media    entertainment content  379.200012  1828139  354.000000  suntv
         entertainment content  420.049988  2675741  404.600006      z

temp.get_group(('media'))
ValueError: must supply a tuple to get_group with multiple grouping keys

Answer 1

Answered by Siraj S.

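If you just want to access 'media', you don't need the extra set of parentheses when you call get_group: in Python, ('media') is the same as the string 'media', so the call is just get_group('media'). Note that a plain string only identifies a group when the object was grouped by a single key, for example df.groupby('industry').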

The extra set of parentheses is for when the object was grouped by several keys: it creates a tuple holding one value per key, and that tuple identifies a single group. For instance: get_group(('media','pizza'))
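
As a minimal sketch of the two call forms, assuming made-up data shaped like the question's table (only the column names and the tickers suntv and z come from the thread): grouping by one key takes a plain scalar name, while grouping by two keys takes a tuple with one value per key.

```python
import pandas as pd

# Hypothetical data; the rows are invented for illustration.
df = pd.DataFrame({
    "industry": ["media", "media", "media", "food"],
    "sub_industry": ["entertainment content", "entertainment content",
                     "internet media", "pizza"],
    "ticker": ["suntv", "z", "abc", "xyz"],
})

# One grouping key: the group name is a plain scalar.
by_industry = df.groupby("industry")
media = by_industry.get_group("media")

# Two grouping keys: the group name is a tuple with one value per key.
by_both = df.groupby(["industry", "sub_industry"])
entertainment = by_both.get_group(("media", "entertainment content"))

print(len(media))          # rows across every 'media' sub-industry
print(len(entertainment))  # rows in one (industry, sub_industry) group
```

Grouping by 'industry' alone is one way to reach the whole outer group, at the cost of building a second groupby object.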

Answer 2

Answered by Alberto

Since get_group does not seem to allow access by a single key after grouping by more than one key, I suggest the following alternative method.

Generating the data frame:

import pandas as pd
import numpy as np

rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar'] * 12,
                   'B': rand.randn(24),
                   'C': rand.randint(0, 20, 24),
                   'D': ['aaa', 'bbb', 'ccc'] * 8})

Grouping by multiple keys ('A' and 'D') and using the groupby method ngroup() to assign a group number, storing it in a new column:

df["grouping_AandD"] = df.groupby(["A", "D"]).ngroup()
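
To make the numbering concrete, here is a minimal sketch on a smaller, made-up frame (not the answer's data). With the default sort=True, ngroup() numbers the groups in sorted key order, starting at 0.

```python
import pandas as pd

# Hypothetical four-row frame, small enough to check by hand.
small = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "D": ["aaa", "aaa", "bbb", "aaa"],
})

# Sorted group keys: ('bar', 'aaa') -> 0, ('foo', 'aaa') -> 1, ('foo', 'bbb') -> 2
small["group_number"] = small.groupby(["A", "D"]).ngroup()

print(small)
```

Rows that share the same ('A', 'D') pair receive the same number, which is what the loop in this answer relies on.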

Using the just created column to display all combinations in a loop but show only those containing the 'wanted key' ('foo' in this case):

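wanted_key = "foo"
# Visit each (A, D) group number in turn.
for i in range(df.grouping_AandD.nunique()):
    grouped_df = df[df.grouping_AandD == i]
    # Compare first, then reduce: every row of the group must have A == wanted_key.
    if (grouped_df.A == wanted_key).all():
        print(grouped_df)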
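
A possible variation, sketched here as an assumption rather than as part of the answer: iterating over the groupby object yields (key, sub-frame) pairs directly, so the outer key can be tested without a helper column. The name matching is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the answer.
rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar'] * 12,
                   'B': rand.randn(24),
                   'C': rand.randint(0, 20, 24),
                   'D': ['aaa', 'bbb', 'ccc'] * 8})

wanted_key = "foo"

# Each key is an (A, D) tuple; keep the groups whose outer key matches.
matching = {key: sub for key, sub in df.groupby(["A", "D"])
            if key[0] == wanted_key}

for key, sub in matching.items():
    print(key)
    print(sub)
```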

Answer 3

Answered by Mike Müller

Just do what the error message says and use a tuple:

temp.get_group(('media',))

Note the trailing comma.
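
The reason the comma matters is plain Python syntax, which this small sketch illustrates. It does not call get_group, and whether a one-element tuple is accepted for an object grouped by two keys is not verified here.

```python
plain = ('media')        # parentheses only group the expression: still a str
one_tuple = ('media',)   # the trailing comma is what makes it a tuple

print(type(plain).__name__)
print(type(one_tuple).__name__)
print(len(one_tuple))
```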

Answer 4

Answered by Arjun Varshney

I was trying to do something similar (creating columns for each subgroup). As far as I know, the approach below suited me and should help you as well. I looked for a solution in the cookbook that the pandas documentation provides, but it didn't help. Here is what I would suggest:

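# groupby takes multiple keys as a list
grp = df.groupby(['industry', 'sub_industry'])
values = []
# Use only the sub-industries that occur under 'media'; other keys would raise KeyError
for sub_ind in df.loc[df.industry == 'media', 'sub_industry'].unique():
    values.append(grp.get_group(('media', sub_ind)))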
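
An alternative sketch, under the assumption that industry and sub_industry are ordinary columns: filter the outer group first, then group what remains by the inner key. The data and the name media_by_sub are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the column names come from the thread.
df = pd.DataFrame({
    "industry": ["media", "media", "media", "food"],
    "sub_industry": ["entertainment content", "entertainment content",
                     "internet media", "pizza"],
    "vol": [10, 20, 30, 40],
})

# Keep the 'media' rows, then split them by sub-industry.
media = df[df["industry"] == "media"]
media_by_sub = {name: frame for name, frame in media.groupby("sub_industry")}

for name, frame in media_by_sub.items():
    print(name, len(frame))
```

This avoids building tuple keys at all, since each sub-frame is keyed by its sub-industry name alone.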
