Note: this page reproduces a Stack Overflow question and its accepted answer under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors. Original: http://stackoverflow.com/questions/24456365/

missing column after pandas groupby

Tags: python, pandas, group-by, dataframe

Asked by user3439329

I have a pandas DataFrame df. I group it by three columns and count the rows in each group. When I do this I lose some information, specifically the name column. This column maps 1:1 to the desk_id column. Is there any way to include both in my final DataFrame?

Here is the DataFrame:

   shift_id    shift_start_time      shift_end_time        name                   end_time       desk_id  shift_hour
0  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 10:16:41.040000  15557987           2
1  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 10:16:41.096000  15557987           2
2  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 10:52:17.402000  15557987           2
3  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 11:06:59.083000  15557987           3
4  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 08:27:57.998000  15557987           0

I group it like this:

grouped = df.groupby(['desk_id', 'shift_id', 'shift_hour']).size()
grouped = grouped.reset_index()

And here is the result, which is missing the name column:

    desk_id  shift_id  shift_hour  0
0  14468690  37729081           0  7
1  14468690  37729081           1  3
2  14468690  37729081           2  6
3  14468690  37729081           3  5
4  14468690  37729082           0  5

Also, is there any way to rename the count column to 'count' instead of 0?

Accepted answer by CT Zhu

You need to include 'name' in the groupby keys:

import numpy as np

grouped = df.groupby(['desk_id', 'shift_id', 'shift_hour', 'name']).size()
grouped = grouped.reset_index()
grouped.columns = np.where(grouped.columns == 0, 'count', grouped.columns)  # replace the default label 0 with 'count'
print(grouped)
    desk_id  shift_id  shift_hour        name  count
0  15557987  37423064           0  Adam Scott      1
1  15557987  37423064           2  Adam Scott      3
2  15557987  37423064           3  Adam Scott      1
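
As a possible shortcut for the rename step, Series.reset_index accepts a name argument, so the count column can be labeled when the index is reset. The sketch below is not part of the original answer; the sample rows are a hypothetical reduction of the question's data to the four columns involved.

```python
import pandas as pd

# Hypothetical sample modeled on the question's rows (only the columns used here)
df = pd.DataFrame({
    "shift_id": [37423064] * 5,
    "name": ["Adam Scott"] * 5,
    "desk_id": [15557987] * 5,
    "shift_hour": [2, 2, 2, 3, 0],
})

grouped = (
    df.groupby(["desk_id", "shift_id", "shift_hour", "name"])
      .size()                      # Series of group sizes, indexed by the four keys
      .reset_index(name="count")   # keys become columns; the sizes column is named 'count'
)
print(grouped)
```

This avoids touching grouped.columns after the fact and does not need numpy.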

If the name-to-desk_id relationship is many-to-one, for example if a Pete Scott also appears in the same data with the same desk_id, the result becomes:

    desk_id  shift_id  shift_hour        name  count
0  15557987  37423064           0  Adam Scott      1
1  15557987  37423064           0  Pete Scott      1
2  15557987  37423064           2  Adam Scott      3
3  15557987  37423064           2  Pete Scott      3
4  15557987  37423064           3  Adam Scott      1
5  15557987  37423064           3  Pete Scott      1
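
Because the result depends on how name relates to desk_id, it may help to check that relationship before choosing the grouping keys. The following is a minimal sketch, not taken from the original answer: it assumes a pandas version with named aggregation (0.25 or later), uses hypothetical sample data, and introduces the helper names names_per_desk and one_name_per_desk purely for illustration.

```python
import pandas as pd

# Hypothetical sample modeled on the question's rows (only the columns used here)
df = pd.DataFrame({
    "shift_id": [37423064] * 5,
    "name": ["Adam Scott"] * 5,
    "desk_id": [15557987] * 5,
    "shift_hour": [2, 2, 2, 3, 0],
})

# Number of distinct names for each desk_id; all ones means one name per desk
names_per_desk = df.groupby("desk_id")["name"].nunique()
one_name_per_desk = bool((names_per_desk == 1).all())

if one_name_per_desk:
    # Keep the original three keys and carry the single name through as an aggregate
    grouped = (
        df.groupby(["desk_id", "shift_id", "shift_hour"])
          .agg(name=("name", "first"), count=("name", "size"))
          .reset_index()
    )
else:
    # Several names share a desk_id, so name has to be a grouping key
    grouped = (
        df.groupby(["desk_id", "shift_id", "shift_hour", "name"])
          .size()
          .reset_index(name="count")
    )
print(grouped)
```

With one name per desk_id both branches give the same rows; the first simply makes the assumption explicit instead of relying on it silently.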