pandas: Sort a pandas DataFrame/Series by month name?

Question

Asked by J_p

I have a Series object that has:

    date   price
    dec      12
    may      15
    apr      13
    ..

Problem statement: I want to group the data by month, compute the mean price for each month, and present the result sorted in calendar order.

Desired Output:

 month mean_price
  Jan    XXX
  Feb    XXX
  Mar    XXX

I thought of making a list and passing it in a sort function:

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

but sort_values doesn't support that for a Series.

One big problem I have is that even though df = df.sort_values(by='date',ascending=True,inplace=True) works on the initial df, after I did a groupby the result didn't keep the order of the sorted df.

To conclude, I need these two columns from the initial data frame. I sorted the datetime column, but after a groupby on the month name (dt.strftime('%B')) the order got messed up. Now I have to sort the result by month name.

My code:

df # has 5 columns though I need the column 'date' and 'price'

df.sort_values(by='date',inplace=True) #at this part it is sorted according to date, great
total=(df.groupby(df['date'].dt.strftime('%B'))['price'].mean()) # Though now it is not as it was but instead the months appear alphabetically

Answer 1

Accepted answer by Tai

Thanks @Brad Solomon for offering a faster way to capitalize strings!

Note 1: @Brad Solomon's answer using pd.Categorical should use fewer resources than my answer. He shows how to assign an order to your categorical data. You should not miss it :P

Alternatively, you can use the following:

import pandas as pd

df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                   ["aug", 11], ["jan", 11], ["jan", 1]],
                  columns=["Month", "Price"])
# Preprocessing: capitalize `jan`, `dec` to `Jan` and `Dec`
df["Month"] = df["Month"].str.capitalize()

# Now the dataset should look like
#   Month Price
#   -----------
#    Dec    XX
#    Jan    XX
#    Mar    XX

# Convert the names to month numbers so that we can sort them:
# use %b because the data use the abbreviated month name
df["Month"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
df = df.sort_values(by="Month")

total = (df.groupby(df['Month'])['Price'].mean())

# total 
Month
1     17.333333
3     11.000000
8     16.000000
12    12.000000

Note 2: groupby sorts the group keys by default. Be sure to use the same key for sorting and for grouping, i.e. df = df.sort_values(by=SAME_KEY) and total = (df.groupby(df[SAME_KEY])['Price'].mean()). Otherwise, you may get unintended behavior. See "Groupby preserve order among groups? In which way?" for more information.

Note 3: A more computationally efficient way is to compute the mean first and then sort by month. That way you only need to sort at most 12 items rather than the whole df, which reduces the computational cost if you don't need df itself to be sorted.
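A minimal sketch of the idea in Note 3, assuming abbreviated month names in a "Month" column. The sample data and the names means and month_number are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data in the same shape as the example above.
df = pd.DataFrame(
    [["Dec", 12], ["Jan", 40], ["Mar", 11], ["Aug", 21],
     ["Aug", 11], ["Jan", 11], ["Jan", 1]],
    columns=["Month", "Price"],
)

# 1) Aggregate first: the result has at most 12 rows, indexed by month name
#    (groupby sorts these keys alphabetically).
means = df.groupby("Month")["Price"].mean()

# 2) Sort only the aggregated result, using the month number as the sort key.
month_number = pd.to_datetime(means.index, format="%b").month
total = means.iloc[month_number.argsort()]

print(total)
```

Here df is never sorted; only the small aggregated Series is reordered, and it keeps the month names as its index.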

Note 4: For people who already have the month as the index and wonder how to make it categorical, take a look at pandas.CategoricalIndex. @jezrael has a working example of making a categorical index ordered in "Pandas series sort by month index".
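For the case in Note 4, something along these lines should work. This is a sketch that assumes the Series is already indexed by abbreviated month names; the Series s and its values are hypothetical:

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical Series of mean prices indexed by month abbreviation.
s = pd.Series([12.0, 16.0, 11.0], index=["Dec", "Aug", "Mar"], name="Price")

# Replace the plain string index with an ordered categorical index,
# then sort by it: the order follows `months`, not the alphabet.
s.index = pd.CategoricalIndex(s.index, categories=months, ordered=True)
s_sorted = s.sort_index()

print(s_sorted)
```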

Answer 2

Answered by Brad Solomon

You can use categorical data to enable proper sorting:

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", 
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df['months'] = pd.Categorical(df['months'], categories=months, ordered=True)
df.sort_values(...)  # same as you have now; can use inplace=True

When you specify the categories, pandas remembers the order in which you listed them and uses it as the default sort order.
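As a rough end-to-end illustration, the categorical column can also be used directly as the groupby key, so the group means come out in calendar order. The sample data below is made up, and passing observed=True is my own choice to keep only the months that actually occur:

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical sample data.
df = pd.DataFrame({"months": ["Dec", "May", "Apr", "May"],
                   "price": [12, 15, 13, 17]})

# Ordered categorical: the sort order is the order of `months`.
df["months"] = pd.Categorical(df["months"], categories=months, ordered=True)

# groupby sorts its keys, and for a categorical key that means category order.
# observed=True drops months that do not appear in the data.
total = df.groupby("months", observed=True)["price"].mean()

print(total)
```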

Docs: Pandas categories > sorting & order.

Answer 3

Answered by Abhay S

You should consider reindexing it along axis 0 (the index), assuming the index holds the full month names:

new_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

df1 = df.reindex(new_order, axis=0)
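Applied to the grouped result from the question, this could look roughly as follows. It is a sketch with made-up data, and it assumes an English locale so that strftime('%B') and calendar.month_name both produce English month names:

```python
import calendar
import pandas as pd

# Hypothetical sample data with a real datetime column, as in the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-12-03", "2018-05-10",
                            "2018-04-21", "2018-05-30"]),
    "price": [12, 15, 13, 17],
})

# Grouping by month name gives an alphabetically sorted index.
total = df.groupby(df["date"].dt.strftime("%B"))["price"].mean()

# Full month names in calendar order (the first entry of month_name is '').
new_order = list(calendar.month_name)[1:]

# reindex puts the rows in calendar order; months missing from the data
# become NaN, so they are dropped here.
by_month = total.reindex(new_order).dropna()

print(by_month)
```

Note that reindex inserts a row of NaN for every label in new_order that is absent from the index, which is why the dropna() call is there.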

Answer 4

Answered by anky

I would use the calendar module and reindex:

series.str.capitalize capitalizes the series; then we create a dictionary with the calendar module and map it onto the series to get the month numbers.

Once we have the month numbers, we can call sort_values() and take the resulting index. Then reindex.

import calendar
df.date=df.date.str.capitalize() #capitalizes the series
d={i:e for e,i in enumerate(calendar.month_abbr)} #creates a dictionary
#d={i[:3]:e for e,i in enumerate(calendar.month_name)} 
df.reindex(df.date.map(d).sort_values().index) #map + sort_values + reindex with index

  date  price
2  Apr     13
1  May     15
0  Dec     12
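A possible variation on the same mapping, not part of the original answer: pandas 1.1 and later accept a key argument in sort_values, so the month-number lookup can be passed in directly. The dictionary name month_number and the sample data are assumptions for illustration:

```python
import calendar
import pandas as pd

# Hypothetical data matching the question's example.
df = pd.DataFrame({"date": ["dec", "may", "apr"], "price": [12, 15, 13]})

# Map 'Jan' -> 1, ..., 'Dec' -> 12 (skip the empty first entry of month_abbr).
month_number = {name: num for num, name in enumerate(calendar.month_abbr) if name}

# key receives the 'date' column and must return a Series of sort keys.
result = df.sort_values(
    by="date",
    key=lambda col: col.str.capitalize().map(month_number),
)

print(result)
```

Unlike the version above, this leaves the values in the date column untouched; capitalization happens only inside the sort key.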

Answer 5

Answered by Dinesh Babu

Use the Sort_Dataframeby_Month function to sort month names in chronological order.

The following packages need to be installed:

$ pip install sorted-months-weekdays
$ pip install sort-dataframeby-monthorweek

example:

import pandas as pd
from sorted_months_weekdays import *
from sort_dataframeby_monthorweek import *

df = pd.DataFrame([['Jan',23],['Jan',16],['Dec',35],['Apr',79],['Mar',53],['Mar',12],['Feb',3]], columns=['Month','Sum'])
df
Out[11]: 
  Month  Sum
0   Jan   23
1   Jan   16
2   Dec   35
3   Apr   79
4   Mar   53
5   Mar   12
6   Feb    3

To sort the dataframe by month, use the function below:

Sort_Dataframeby_Month(df=df,monthcolumnname='Month')
Out[14]: 
  Month  Sum
0   Jan   23
1   Jan   16
2   Feb    3
3   Mar   53
4   Mar   12
5   Apr   79
6   Dec   35

Answer 6

Answered by Zellint

You can put the numerical month value in front of the name in the index (e.g. "01 January"), sort, and then strip off the number:

total=(df.groupby(df['date'].dt.strftime('%m %B'))['price'].mean()).sort_index()

It may look something like this:

01 January  xxx
02 February     yyy
03 March    zzz
04 April    ttt

 total.index = [ x.split()[1] for x in total.index ]

January xxx
February yyy
March zzz
April ttt
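A related sketch, not from the original answer: grouping on the month number itself avoids building and splitting strings, and the names can be attached afterwards. The sample data is made up, and calendar.month_name is assumed to return English names (the default locale):

```python
import calendar
import pandas as pd

# Hypothetical sample data with a datetime column, as in the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-12-03", "2018-01-10",
                            "2018-03-21", "2018-01-30"]),
    "price": [12, 40, 11, 10],
})

# Group by month number: groupby sorts the integer keys, i.e. calendar order.
total = df.groupby(df["date"].dt.month)["price"].mean()

# Replace the numbers 1..12 with month names once the order is fixed.
total.index = [calendar.month_name[int(m)] for m in total.index]

print(total)
```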
