python pandas extract year from datetime --- df['year'] = df['date'].year is not working

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/30405413/

Date: 2020-08-19 08:22:48  Source: igfitidea

Tags: python, datetime, pandas, extract, dataframe

Asked by MJS

Sorry for this question that seems repetitive - I expect the answer will make me feel like a bonehead... but I have not had any luck using answers to the similar questions on SO.

I am importing data in through read_csv, but for some reason which I cannot figure out, I am not able to extract the year or month from the dataframe series df['date'].

date    Count
6/30/2010   525
7/30/2010   136
8/31/2010   125
9/30/2010   84
10/29/2010  4469

df = pd.read_csv('sample_data.csv',parse_dates=True)

df['date'] = pd.to_datetime(df['date'])

df['year'] = df['date'].year
df['month'] = df['date'].month

But this returns:

AttributeError: 'Series' object has no attribute 'year'

Thanks in advance.

UPDATE:

df = pd.read_csv('sample_data.csv',parse_dates=True)

df['date'] = pd.to_datetime(df['date'])

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

This generates a similar error: "AttributeError: 'Series' object has no attribute 'dt'"

FOLLOW UP:

I am using Spyder 2.3.1 with Python 3.4.1 64bit, but cannot update pandas to a newer release (currently on 0.14.1). Each of the following generates an invalid syntax error:

conda update pandas

conda install pandas==0.15.2

conda install -f pandas

Any ideas?

Accepted answer by EdChum

If you're running a recent-ish version of pandas (0.15.0 or later), you can use the datetime accessor dt to access the datetime components:

In [6]:

df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].dt.year, df['date'].dt.month
df
Out[6]:
        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10
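
A self-contained sketch of the same idea, rebuilding the sample data inline rather than reading sample_data.csv (the inline frame is an assumption made for illustration). It assumes a pandas version where the dt accessor is available:

```python
import pandas as pd

# Sample rows copied from the question; built inline so the snippet runs on its own.
df = pd.DataFrame({
    "date": ["6/30/2010", "7/30/2010", "8/31/2010", "9/30/2010", "10/29/2010"],
    "Count": [525, 136, 125, 84, 4469],
})

# Convert the strings to datetime64 values; the format is given explicitly
# so that the month/day order is never guessed.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# .dt exposes the datetime components element-wise on a Series.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

print(df)
```

Passing an explicit format is optional here, but it avoids any ambiguity between month-first and day-first dates.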

EDIT

It looks like you're running an older version of pandas, in which case the following would work:

In [18]:

df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].apply(lambda x: x.year), df['date'].apply(lambda x: x.month)
df
Out[18]:
        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

As for why read_csv didn't parse this column into datetimes: parse_dates=True only tries to parse the index, so you need to pass the ordinal position of your column (parse_dates=[0]) instead. See the docs for the other forms parse_dates accepts, such as a list like [1,2,3].

In [20]:

import io
import pandas as pd

t="""date   Count
6/30/2010   525
7/30/2010   136
8/31/2010   125
9/30/2010   84
10/29/2010  4469"""
# raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv(io.StringIO(t), sep=r'\s+', parse_dates=[0])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 2 columns):
date     5 non-null datetime64[ns]
Count    5 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 120.0 bytes

So if you pass the param parse_dates=[0] to read_csv, there shouldn't be any need to call to_datetime on the 'date' column after loading.
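
A minimal sketch of that approach, assuming a comma-separated file with a date header (the inline CSV text stands in for sample_data.csv and is an assumption). parse_dates also accepts column names, so ['date'] is used here in place of [0]:

```python
import io
import pandas as pd

# Stand-in for sample_data.csv, using the rows from the question.
csv_text = """date,Count
6/30/2010,525
7/30/2010,136
8/31/2010,125
9/30/2010,84
10/29/2010,4469
"""

# Name the column that holds dates so read_csv parses it while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# The column is already datetime64, so no pd.to_datetime call is needed.
print(df.dtypes)
df["year"] = df["date"].dt.year
```

Checking df.dtypes right after loading is a quick way to confirm the column was parsed and not left as plain strings.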

Answer by Mike Müller

This works:

df['date'].dt.year

Now:

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

gives this data frame:

        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

Answer by Jimmy

What worked for me was upgrading pandas to the latest version:

From the command line, run:

conda update pandas
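
Note that conda update pandas is a shell command: typed at a Python or Spyder console prompt it is likely to raise a syntax error, which may be what the question ran into. After upgrading from a terminal, a quick check like the following (a sketch, not part of the original answer) shows which version the interpreter imports and whether the dt accessor exists:

```python
import pandas as pd

# Report the pandas version the interpreter actually imports.
print(pd.__version__)

# The dt accessor is defined on the Series class in versions that support it.
has_dt = hasattr(pd.Series, "dt")
print("Series.dt available:", has_dt)
```

If the printed version is still the old one, the console is probably using a different environment from the one that was updated.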

Answer by jpp

When to use the dt accessor

A common source of confusion revolves around when to use .year and when to use .dt.year.

The former is an attribute for pd.DatetimeIndex objects; the latter for pd.Series objects. Consider this dataframe:

df = pd.DataFrame({'Dates': pd.to_datetime(['2018-01-01', '2018-10-20', '2018-12-25'])},
                  index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03']))

The definitions of the series and index look similar, but the pd.DataFrame constructor converts them to different types:

type(df.index)     # pandas.tseries.index.DatetimeIndex
type(df['Dates'])  # pandas.core.series.Series

The DatetimeIndex object has a direct year attribute, while the Series object must use the dt accessor. Similarly for month:

df.index.month               # array([1, 1, 1])
df['Dates'].dt.month.values  # array([ 1, 10, 12], dtype=int64)

A subtle but important difference worth noting is that df.index.month gives a NumPy array, while df['Dates'].dt.month gives a Pandas series. Above, we use pd.Series.values to extract the NumPy array representation.
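
To make the distinction concrete, here is a small sketch that rebuilds the same dataframe. The exact return type of df.index.month varies between pandas versions, so np.asarray is used (my choice, not the answer's) to get plain arrays for comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Dates": pd.to_datetime(["2018-01-01", "2018-10-20", "2018-12-25"])},
    index=pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"]),
)

# DatetimeIndex: components are direct attributes.
index_months = np.asarray(df.index.month)

# Series: components live behind the .dt accessor.
series_months = df["Dates"].dt.month  # a pandas Series, aligned to df.index
series_months_array = np.asarray(series_months)

# Calling .month directly on the Series fails, as in the question.
try:
    df["Dates"].month
except AttributeError as exc:
    print(exc)
```

The try/except at the end reproduces the kind of AttributeError reported in the question.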

Answer by Amit Gupta

Probably already too late to answer, but since you have already parsed the dates while loading the data, you can just do this to get the year:

df['year'] = pd.DatetimeIndex(df['date']).year
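
A runnable sketch of this approach, assuming the column holds date strings in month/day/year form as in the question (the inline frame is an assumption for illustration). It writes the result to new year and month columns so the original dates are kept:

```python
import pandas as pd

# A few rows from the question's sample data.
df = pd.DataFrame({
    "date": ["6/30/2010", "7/30/2010", "10/29/2010"],
    "Count": [525, 136, 4469],
})

# pd.DatetimeIndex parses the values and exposes .year and .month directly,
# so the .dt accessor is not involved.
dates = pd.DatetimeIndex(df["date"])
df["year"] = dates.year
df["month"] = dates.month

print(df)
```

Building the DatetimeIndex once and reusing it avoids parsing the column twice.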