Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/33483670/

Time: 2020-08-19 13:26:02  Source: igfitidea

How to group a Series by values in pandas?

Tags: python, pandas, group-by, series

Asked by Martín Fixman

I currently have a pandas Series with dtype Timestamp, and I want to group it by date (and have many rows with different times in each group).

The seemingly obvious way of doing this would be something similar to

grouped = s.groupby(lambda x: x.date())

However, pandas' groupby groups a Series by its index. How can I make it group by value instead?
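
To see the behaviour described here, a minimal sketch (the sample timestamps and the `key` helper are made up for illustration) records what a function passed to groupby actually receives:

```python
import pandas as pd

# Hypothetical sample data: three timestamps spread over two days
s = pd.Series(pd.to_datetime([
    "2015-11-02 09:00",
    "2015-11-02 17:30",
    "2015-11-03 08:15",
]))

seen = []

def key(label):
    # groupby hands this function the index labels, not the Series values
    seen.append(label)
    return label

grouped = s.groupby(key)
n_groups = grouped.ngroups  # one group per index label, not one per date
```

With the default index, `seen` ends up holding the integer labels rather than Timestamps, which is why calling `.date()` on them fails.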

Answered by mirthbottle

You should convert it to a DataFrame, then add a column that is the date(). You can do groupby on the DataFrame with the date column.

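# Put the Series values in a named column (to_frame works whether or not s has a name)
df = s.to_frame(name="datetime")
# Derive the calendar date of each timestamp
df["date"] = df["datetime"].apply(lambda x: x.date())
grouped = df.groupby("date")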

Then "date" becomes your index. You have to do it this way because the final grouped object needs an index so you can do things like select a group.
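
A small self-contained sketch of this approach, assuming a Series of made-up timestamps and building the frame with `s.to_frame`; the variable names are only illustrative:

```python
import datetime

import pandas as pd

# Hypothetical sample data: three timestamps spread over two days
s = pd.Series(pd.to_datetime([
    "2015-11-02 09:00",
    "2015-11-02 17:30",
    "2015-11-03 08:15",
]))

df = s.to_frame(name="datetime")
df["date"] = df["datetime"].dt.date  # plain datetime.date objects

grouped = df.groupby("date")
counts = grouped.size()  # result is indexed by date

# Select the rows belonging to a single day
first_day = grouped.get_group(datetime.date(2015, 11, 2))
```

Here `counts` has one entry per calendar day, and `first_day` contains the two rows from 2015-11-02.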

Answered by luca

grouped = s.groupby(s)

Or:

grouped = s.groupby(lambda x: s[x])
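
Applied to the original question, the same idea might look like the sketch below: instead of the Series itself, pass a Series of keys derived from its values. The sample data is made up, and `s.dt.date` / `s.dt.normalize()` are my choice of key, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data: three timestamps spread over two days
s = pd.Series(pd.to_datetime([
    "2015-11-02 09:00",
    "2015-11-02 17:30",
    "2015-11-03 08:15",
]))

# Group by the calendar date of each value (keys are datetime.date objects)
per_day = s.groupby(s.dt.date).count()

# Same grouping, but the keys stay Timestamps truncated to midnight
per_day_ts = s.groupby(s.dt.normalize()).count()
```

Both variants avoid a Python-level lambda, since the keys are computed in one vectorised call.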

Answered by Hangyu Liu

Three methods:

DataFrame: df.groupby(['column']).size()

Series: sel.groupby(sel).size()

Series to DataFrame:

pd.DataFrame(sel, columns=['column']).groupby(['column']).size()
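
As a rough check that these routes agree, here is a sketch on made-up data; `sel` and `'column'` follow the names above, while the sample values and the use of `to_frame` for the conversion are assumptions of mine:

```python
import pandas as pd

# Hypothetical sample data with repeated values
sel = pd.Series(["a", "b", "a", "c", "a"])

# Series grouped by its own values
by_series = sel.groupby(sel).size()

# Series converted to a one-column DataFrame, then grouped by that column
df = sel.to_frame("column")
by_frame = df.groupby(["column"]).size()

same_counts = by_series.tolist() == by_frame.tolist()
```

Both results count how many times each distinct value occurs; only the name of the resulting index differs.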

Answered by Andy Jones

For anyone else who wants to do this inline without throwing a lambda in (which tends to kill performance):

s.to_frame(0).groupby(0)[0]
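
A short usage sketch of this one-liner on made-up data (the sample values and the final `.size()` aggregation are assumptions, added only to show a result):

```python
import pandas as pd

# Hypothetical sample data with a repeated value
s = pd.Series(["a", "b", "a"])

# Name the single column 0, group by it, then select it again as a SeriesGroupBy
grouped = s.to_frame(0).groupby(0)[0]
sizes = grouped.size()
```

The result is a SeriesGroupBy keyed by the values of `s`, so any of the usual aggregations can follow.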

Answered by mchl_k

To add another suggestion, I often use the following as it uses simple logic:

pd.Series(index=s.values).groupby(level=0)
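
A sketch of how this might be used, on made-up data. The explicit `dtype="float64"` is my addition so that the empty Series has a fixed dtype; it is not part of the line above.

```python
import pandas as pd

# Hypothetical sample data with a repeated value
s = pd.Series(["a", "b", "a"])

# An all-NaN Series whose index holds the original values
by_value = pd.Series(index=s.values, dtype="float64").groupby(level=0)

sizes = by_value.size()    # counts rows per key, including the NaN placeholders
counts = by_value.count()  # counts non-missing values, so every group gives 0
```

Because the helper Series carries no data, `size()` is the aggregation to reach for here; value-based aggregations such as `count()` or `mean()` only see missing values.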