Original question: http://stackoverflow.com/questions/33483670/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
How to group a Series by values in pandas?
Asked by Martín Fixman
I currently have a pandas Series with dtype Timestamp, and I want to group it by date (and have many rows with different times in each group).
The seemingly obvious way of doing this would be something like:
grouped = s.groupby(lambda x: x.date())
However, pandas' groupby applies such a function to the index of the Series, not to its values. How can I make it group by value instead?
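To make the problem concrete, here is a minimal sketch of what the function passed to groupby actually receives. The sample timestamps and the helper name record_key are made up for illustration; they are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data: three timestamps spread over two days.
s = pd.Series(pd.to_datetime([
    "2015-11-02 09:00",
    "2015-11-02 17:30",
    "2015-11-03 08:15",
]))

seen = []  # records every argument the key function receives


def record_key(label):
    """Illustrative key function: remembers its input and returns it unchanged."""
    seen.append(label)
    return label


# Accessing .groups forces the grouping to be computed.
groups = s.groupby(record_key).groups

# The recorded arguments are the integer index labels, not the Timestamp values.
print(seen)
print(any(isinstance(x, pd.Timestamp) for x in seen))
```

Because the key function only ever sees index labels here, calling x.date() inside it cannot work on a Series with a default integer index.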
Answered by mirthbottle
You should convert it to a DataFrame, then add a column holding the date() of each timestamp. You can then group the DataFrame by that date column.
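# to_frame names the column explicitly, so it works whether or not s already has a name
df = s.to_frame(name="datetime")
# vectorized equivalent of apply(lambda x: x.date())
df["date"] = df["datetime"].dt.date
grouped = df.groupby("date")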
Then "date" becomes the index of the aggregated result. You have to do it this way because the final grouped object needs an index so you can do things like select a group.
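A self-contained sketch of this DataFrame approach follows. The sample data is hypothetical, and to_frame and .dt.date are used here as one reasonable way to build the two columns, not necessarily the only one.

```python
import pandas as pd

# Hypothetical sample data: three timestamps spread over two days.
s = pd.Series(pd.to_datetime([
    "2015-11-02 09:00",
    "2015-11-02 17:30",
    "2015-11-03 08:15",
]))

# One column with the original timestamps, one with just the calendar date.
df = s.to_frame(name="datetime")
df["date"] = df["datetime"].dt.date

grouped = df.groupby("date")

# Aggregating puts the dates into the index of the result.
counts = grouped.size()
print(counts)

# Each group still holds the full timestamps for that day.
for day, rows in grouped:
    print(day, rows["datetime"].dt.time.tolist())
```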
Answered by luca
grouped = s.groupby(s)
Or:
grouped = s.groupby(lambda x: s[x])
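As a rough illustration of why passing the Series itself works: a Series or array given as the key is used as the group labels directly, so any per-row values aligned with s can serve as the key. The sample data below is hypothetical, and the s.dt.date variant is an assumption about how this would be adapted to the date grouping in the question; it does not appear in the answer.

```python
import pandas as pd

# Hypothetical sample data, including one repeated timestamp.
s = pd.Series(pd.to_datetime([
    "2015-11-02 09:00",
    "2015-11-02 09:00",
    "2015-11-02 17:30",
    "2015-11-03 08:15",
]))

# Group by the exact values: identical timestamps share a group.
by_value = s.groupby(s).size()
print(by_value)

# Group by a value derived from each row, here the calendar date.
by_date = s.groupby(s.dt.date).size()
print(by_date)
```

The lambda form looks up each index label in s, so it presumably relies on the index labels being unique.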
Answered by Hangyu Liu
Three methods:
DataFrame: df.groupby(['column']).size()
Series: sel.groupby(sel).size()
Series to DataFrame:
pd.DataFrame(sel, columns=['column']).groupby(['column']).size()
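The three variants can be compared side by side in a small sketch. The data and the result variable names are invented for illustration, and sel is assumed to be an unnamed Series, since the DataFrame constructor with columns= is only expected to behave this way when the Series has no name of its own.

```python
import pandas as pd

# Hypothetical unnamed Series with repeated values.
sel = pd.Series(["a", "b", "a", "c", "a"])

# 1. Starting from a DataFrame that already has the column.
df = sel.to_frame(name="column")
from_frame = df.groupby(["column"]).size()

# 2. Grouping the Series by its own values.
from_series = sel.groupby(sel).size()

# 3. Building the DataFrame inline from the Series.
from_ctor = pd.DataFrame(sel, columns=["column"]).groupby(["column"]).size()

print(from_frame.tolist(), from_series.tolist(), from_ctor.tolist())
```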
Answered by Andy Jones
For anyone else who wants to do this inline without throwing a lambda in (which tends to kill performance):
s.to_frame(0).groupby(0)[0]
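A short sketch of what this one-liner does, using made-up data: to_frame(0) wraps the Series in a one-column DataFrame whose column label is 0, groupby(0) groups by that column, and the trailing [0] selects the same column back out so the result behaves like a grouped Series.

```python
import pandas as pd

# Hypothetical sample data with a repeated value.
s = pd.Series(["x", "y", "x"])

# Group the Series by its own values without a lambda.
grouped = s.to_frame(0).groupby(0)[0]

sizes = grouped.size()
print(sizes)
```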
Answered by mchl_k
To add another suggestion, I often use the following as it uses simple logic:
pd.Series(index=s.values).groupby(level=0)
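The idea here is to move the values into the index of a new, empty Series and then group on that index. A minimal sketch with invented data follows; the explicit dtype argument is an addition not present in the answer, included only to avoid depending on the default dtype of an empty Series.

```python
import pandas as pd

# Hypothetical sample data with a repeated value.
s = pd.Series(["x", "y", "x"])

# The new Series has s's values as its index and NaN as its data.
holder = pd.Series(index=s.values, dtype="float64")
grouped = holder.groupby(level=0)

sizes = grouped.size()
print(sizes)
```

Since the new Series holds only NaN, this form seems best suited to counting or listing groups rather than aggregating the original values.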