Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/42684530/

Time: 2020-09-14 03:08:34  Source: igfitidea

Convert a column in a Python pandas DataFrame from STRING MONTH into INT

python-2.7 pandas month calendar

Asked by Chubaka

In Python 2.7.11 & Pandas 0.18.1:

If we have the following csv file:

YEAR,MONTH,ID
2011,JAN,1
2011,FEB,1
2011,MAR,1

Is there any way to read it as a pandas DataFrame and convert the MONTH column into integers like this?

YEAR,MONTH,ID
2011,1,1
2011,2,1
2011,3,1

Some pandas functions such as "dt.strftime('%b')" don't seem to work. Could someone enlighten me?

Answered by MaxU

I guess the easiest and one of the fastest methods would be to create a mapping dict and map the column as follows:

In [2]: df
Out[2]:
   YEAR MONTH  ID
0  2011   JAN   1
1  2011   FEB   1
2  2011   MAR   1

In [3]: d = {'JAN':1, 'FEB':2, 'MAR':3, 'APR':4, }

In [4]: df.MONTH = df.MONTH.map(d)

In [5]: df
Out[5]:
   YEAR  MONTH  ID
0  2011      1   1
1  2011      2   1
2  2011      3   1

You may want to use df.MONTH = df.MONTH.str.upper().map(d) if not all MONTH values are in upper case.
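
As a rough end-to-end sketch of the mapping approach, the snippet below reads a CSV modeled on the one in the question, normalizes the case of MONTH, and maps it through a full twelve-month dict. The names `month_to_int` and `MONTH_NUM`, the mixed-case rows, and the `XYZ` row are assumptions added for illustration, and the code is written for Python 3 with a reasonably recent pandas rather than the versions in the question.

```python
import io

import pandas as pd

# Sample data modeled on the CSV in the question. The mixed-case rows and
# the unknown 'XYZ' row are illustrative additions, not from the question.
csv_text = """YEAR,MONTH,ID
2011,JAN,1
2011,feb,1
2011,Mar,1
2011,XYZ,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Full mapping from upper-case month abbreviation to month number.
month_to_int = {
    'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
    'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12,
}

# Strip whitespace and upper-case the values before the lookup.
mapped = df['MONTH'].str.strip().str.upper().map(month_to_int)

# Values missing from the dict come back as NaN (which also makes the
# column float), so collect them instead of letting them pass silently.
unmapped = df.loc[mapped.isna(), 'MONTH'].tolist()

df['MONTH_NUM'] = mapped
```

Writing to a separate column keeps the original strings around, which makes it easier to inspect whatever ended up in `unmapped` before overwriting MONTH.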

Another slower but more robust method:

In [11]: pd.to_datetime(df.MONTH, format='%b').dt.month
Out[11]:
0    1
1    2
2    3
Name: MONTH, dtype: int64
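
If the column might contain values that are not month abbreviations, one possible variation is to pass `errors='coerce'` so that unparseable entries become NaT instead of raising. This is a sketch under that assumption; the `bogus` entry and the variable names are made up for illustration, and how `%b` matches names may depend on the locale.

```python
import pandas as pd

# Illustrative data: three valid abbreviations and one invalid entry.
months = pd.Series(['JAN', 'FEB', 'MAR', 'bogus'], name='MONTH')

# Parse with the abbreviated-month format; invalid values become NaT.
parsed = pd.to_datetime(months, format='%b', errors='coerce')

# Month numbers; the NaT row shows up as NaN, so the result is float.
month_numbers = parsed.dt.month

# Original strings that could not be parsed, kept for inspection.
bad_rows = months[parsed.isna()].tolist()
```

Checking `bad_rows` before converting the result to an integer dtype avoids silently carrying NaN values forward.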

UPDATE: we can create the mapping automatically (thanks to @Quetzalcoatl):

import calendar

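# month_abbr[0] is '' and the names are title case ('Jan'), so skip the blank and upper-case the keys
d = dict((v.upper(), k) for k, v in enumerate(calendar.month_abbr) if v)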

or alternatively (using only Pandas):

months = pd.date_range('2000-01-01', freq='MS', periods=12).strftime('%b')
d = dict((m.upper(), i) for i, m in enumerate(months, 1))  # abbreviation -> month number