python pandas extract year from datetime --- df['year'] = df['date'].year is not working
Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/30405413/
Asked by MJS
Sorry for this question that seems repetitive - I expect the answer will make me feel like a bonehead... but I have not had any luck using answers to the similar questions on SO.
I am importing data through read_csv, but for some reason I cannot figure out, I am not able to extract the year or month from the dataframe series df['date'].
date Count
6/30/2010 525
7/30/2010 136
8/31/2010 125
9/30/2010 84
10/29/2010 4469
df = pd.read_csv('sample_data.csv',parse_dates=True)
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].year
df['month'] = df['date'].month
But this returns:
AttributeError: 'Series' object has no attribute 'year'
Thanks in advance.
UPDATE:
df = pd.read_csv('sample_data.csv',parse_dates=True)
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
This generates a similar error: "AttributeError: 'Series' object has no attribute 'dt'"
FOLLOW UP:
I am using Spyder 2.3.1 with Python 3.4.1 64-bit, but cannot update pandas to a newer release (currently on 0.14.1). Each of the following generates an invalid syntax error:
conda update pandas
conda install pandas==0.15.2
conda install -f pandas
Any ideas?
Accepted answer by EdChum
If you're running a recent-ish version of pandas then you can use the datetime accessor dt to access the datetime components:
In [6]:
df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].dt.year, df['date'].dt.month
df
Out[6]:
date Count year month
0 2010-06-30 525 2010 6
1 2010-07-30 136 2010 7
2 2010-08-31 125 2010 8
3 2010-09-30 84 2010 9
4 2010-10-29 4469 2010 10
EDIT
It looks like you're running an older version of pandas, in which case the following would work:
In [18]:
df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].apply(lambda x: x.year), df['date'].apply(lambda x: x.month)
df
Out[18]:
date Count year month
0 2010-06-30 525 2010 6
1 2010-07-30 136 2010 7
2 2010-08-31 125 2010 8
3 2010-09-30 84 2010 9
4 2010-10-29 4469 2010 10
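The two approaches can be compared side by side. This is only a minimal sketch, assuming a pandas version that has the dt accessor; the frame is rebuilt by hand from the sample data in the question, and the explicit format string is an assumption about how those dates are written:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    'date': ['6/30/2010', '7/30/2010', '8/31/2010', '9/30/2010', '10/29/2010'],
    'Count': [525, 136, 125, 84, 4469],
})
# Assumed month/day/year layout, matching the sample rows
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Vectorised accessor (needs a pandas version that provides .dt)
year_dt = df['date'].dt.year

# Element-wise fallback: each element is a Timestamp with a .year attribute
year_apply = df['date'].apply(lambda x: x.year)

print(year_dt.tolist())
print(year_apply.tolist())
```

Both give the same values; the apply version is just slower on large frames because it calls the lambda once per row.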
Regarding why read_csv didn't parse this into a datetime: you need to pass the ordinal position of your column ([0]), because parse_dates=True only tries to parse the index, whereas a list such as [1,2,3] tells it which columns to parse. See the docs.
In [20]:
import io
import pandas as pd
t="""date Count
6/30/2010 525
7/30/2010 136
8/31/2010 125
9/30/2010 84
10/29/2010 4469"""
# raw string so the regex \s+ is not read as a string escape
df = pd.read_csv(io.StringIO(t), sep=r'\s+', parse_dates=[0])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 2 columns):
date 5 non-null datetime64[ns]
Count 5 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 120.0 bytes
So if you pass parse_dates=[0] to read_csv, there shouldn't be any need to call to_datetime on the 'date' column after loading.
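The difference between the two parse_dates settings can be checked directly. A minimal sketch, assuming a comma-separated version of the sample data (the csv_text string is made up for illustration); per the read_csv documentation, parse_dates=True tries to parse the index rather than the data columns:

```python
import io
import pandas as pd

# Hypothetical comma-separated stand-in for sample_data.csv
csv_text = "date,Count\n6/30/2010,525\n7/30/2010,136\n"

# parse_dates=True only attempts the index, so 'date' is left as text
df_true = pd.read_csv(io.StringIO(csv_text), parse_dates=True)

# parse_dates=[0] asks pandas to parse the first column as dates
df_col = pd.read_csv(io.StringIO(csv_text), parse_dates=[0])

print(df_true['date'].dtype)
print(df_col['date'].dtype)
```

Printing the dtype right after loading is a quick way to confirm whether a later to_datetime call is still needed.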
Answer by Mike Müller
This works:
df['date'].dt.year
Now:
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
gives this data frame:
date Count year month
0 2010-06-30 525 2010 6
1 2010-07-30 136 2010 7
2 2010-08-31 125 2010 8
3 2010-09-30 84 2010 9
4 2010-10-29 4469 2010 10
Answer by Jimmy
What worked for me was upgrading pandas to the latest version:
From the command line (a system shell, not the Python console), run:
conda update pandas
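After upgrading, it may help to confirm from Python which pandas version is actually being imported and whether it has the dt accessor. A minimal sketch; the has_dt name is just for illustration:

```python
import pandas as pd

# Version of the pandas that this interpreter imports (0.14.1 in the question)
print(pd.__version__)

# The .dt accessor is an attribute of the Series class in versions that support it
has_dt = hasattr(pd.Series, 'dt')
print(has_dt)
```

If has_dt is False, the interpreter is still picking up the old install and the apply-based workaround is the safer route.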
Answer by jpp
When to use the dt accessor
A common source of confusion revolves around when to use .year and when to use .dt.year.
The former is an attribute of pd.DatetimeIndex objects; the latter is for pd.Series objects. Consider this dataframe:
df = pd.DataFrame({'Dates': pd.to_datetime(['2018-01-01', '2018-10-20', '2018-12-25'])},
index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03']))
The definitions of the series and the index look similar, but the pd.DataFrame constructor converts them to different types:
type(df.index) # pandas.tseries.index.DatetimeIndex
type(df['Dates']) # pandas.core.series.Series
The DatetimeIndex object has a direct year attribute, while the Series object must use the dt accessor. Similarly for month:
df.index.month # array([1, 1, 1])
df['Dates'].dt.month.values # array([ 1, 10, 12], dtype=int64)
A subtle but important difference worth noting is that df.index.month gives a NumPy array (in the pandas version used here; newer releases return an Index object instead), while df['Dates'].dt.month gives a Pandas series. Above, we use pd.Series.values to extract the NumPy array representation.
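The distinction can be tried out on the same dataframe. A minimal sketch, assuming a pandas version with the dt accessor; np.asarray is used here (my choice, not part of the answer) so that the result is a plain NumPy array whatever container the installed version returns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Dates': pd.to_datetime(['2018-01-01', '2018-10-20', '2018-12-25'])},
    index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03']),
)

# DatetimeIndex exposes the components directly
index_years = df.index.year

# A Series of datetimes goes through the .dt accessor
series_years = df['Dates'].dt.year

# Convert both to NumPy arrays for a like-for-like comparison
print(np.asarray(index_years))
print(np.asarray(series_years))

# The Series itself has no .year attribute, which is the error in the question
print(hasattr(df['Dates'], 'year'))
```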
Answer by Amit Gupta
Probably already too late to answer, but since you have already parsed the dates while loading the data, you can just do this to get the year:
df['year'] = pd.DatetimeIndex(df['date']).year
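A minimal sketch of the same idea, writing the results to separate columns so that the original date column is kept. The small frame and the idx name are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with an already-parsed datetime column
df = pd.DataFrame({'date': pd.to_datetime(['2010-06-30', '2010-07-30', '2010-10-29'])})

# Wrap the column in a DatetimeIndex, which has .year and .month directly
idx = pd.DatetimeIndex(df['date'])
df['year'] = idx.year
df['month'] = idx.month

print(df)
```

This route does not rely on the dt accessor, which is why it can be an option on older pandas installs.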