pandas: combine a date column and a time column into a datetime column
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/20009408/
Combine date column and time column into datetime column
Asked by yasar
I have a pandas DataFrame like this (obtained by parsing an Excel file):
|       | COMPANY NAME            | MEETING DATE        | MEETING TIME |
|-------|-------------------------|---------------------|--------------|
| YKSGR | YAPI KREDİ SİGORTA A.Ş. | 2013-12-16 00:00:00 | 14:00:00     |
| TRCAS | TURCAS PETROL A.Ş.      | 2013-12-12 00:00:00 | 13:30:00     |
Column MEETING DATE is a timestamp with a representation like Timestamp('2013-12-20 00:00:00', tz=None), and MEETING TIME is a datetime.time object with a representation like datetime.time(14, 0).
I want to combine MEETING DATE and MEETING TIME into one column. datetime.combine seems to do what I want; however, I need to apply this function column-wise somehow. How can I achieve this?
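For anyone who wants to experiment without the Excel file, a frame with the same two column types can be rebuilt by hand. This is only a sketch: the values are copied from the table in the question, the COMPANY NAME column is left out because it plays no role in the combination, and the real parsed file may differ in other details.

```python
import datetime as dt

import pandas as pd

# Hand-built stand-in for the parsed Excel sheet (an assumption, not the real file).
df = pd.DataFrame(
    {
        "MEETING DATE": [pd.Timestamp("2013-12-16"), pd.Timestamp("2013-12-12")],
        "MEETING TIME": [dt.time(14, 0), dt.time(13, 30)],
    },
    index=["YKSGR", "TRCAS"],
)

# One column holds pandas Timestamps, the other plain datetime.time objects.
print(type(df["MEETING DATE"].iat[0]))
print(type(df["MEETING TIME"].iat[0]))
```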
Answered by Roman Pekar
You can use the apply method and call datetime.combine on each row, like this:
>>> from datetime import datetime
>>> df.apply(lambda x: datetime.combine(x['MEETING DATE'], x['MEETING TIME']), axis=1)
0 2013-12-16 14:00:00
1 2013-12-12 13:30:00
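As a self-contained sketch of the row-wise approach, the snippet below rebuilds a small frame by hand (an assumption standing in for the parsed Excel data) and applies datetime.combine to each row. A pandas Timestamp is a datetime subclass, so it is accepted as the date argument.

```python
import datetime as dt

import pandas as pd

# Hand-built sample data; values taken from the question's table.
df = pd.DataFrame(
    {
        "MEETING DATE": [pd.Timestamp("2013-12-16"), pd.Timestamp("2013-12-12")],
        "MEETING TIME": [dt.time(14, 0), dt.time(13, 30)],
    },
    index=["YKSGR", "TRCAS"],
)

# axis=1 passes each row to the lambda; combine keeps the date part of the
# first argument and the time-of-day of the second.
combined = df.apply(
    lambda row: dt.datetime.combine(row["MEETING DATE"], row["MEETING TIME"]),
    axis=1,
)
print(combined)
```

The result can be assigned back to the frame, for example as a new column, if a single datetime column is wanted.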
Answered by jabellcu
Other solutions didn't work for me, so I came up with a workaround using replace instead of combine:
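def combine_date_time(df, datecol, timecol):
    # Overwrite the time-of-day fields of each date with those of the matching time.
    return df.apply(
        lambda row: row[datecol].replace(
            hour=row[timecol].hour,
            minute=row[timecol].minute,
            second=row[timecol].second,
        ),
        axis=1,
    )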
In your case:
combine_date_time(df, 'MEETING DATE', 'MEETING TIME')
It feels slow (I haven't timed it properly), but it works.
UPDATE: I have timed both approaches on a relatively large dataset (>500,000 rows). They have similar run times, but combine is faster (59 s for replace vs. 50 s for combine). Also, see jezrael's answer on this.
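A rough way to repeat this kind of comparison is sketched below with timeit. The data is synthetic and much smaller than the dataset mentioned above, the helper names with_combine and with_replace are made up for this illustration, and the absolute numbers will depend on the machine and the pandas version.

```python
import datetime as dt
import timeit

import pandas as pd

# Synthetic data (an assumption): 2,000 rows instead of >500,000.
n = 2_000
df = pd.DataFrame(
    {
        "MEETING DATE": list(pd.date_range("2013-01-01", periods=n, freq="D")),
        "MEETING TIME": [dt.time(i % 24, i % 60) for i in range(n)],
    }
)


def with_combine(frame):
    # Row-wise datetime.combine.
    return frame.apply(
        lambda row: dt.datetime.combine(row["MEETING DATE"], row["MEETING TIME"]),
        axis=1,
    )


def with_replace(frame):
    # Row-wise Timestamp.replace.
    return frame.apply(
        lambda row: row["MEETING DATE"].replace(
            hour=row["MEETING TIME"].hour,
            minute=row["MEETING TIME"].minute,
            second=row["MEETING TIME"].second,
        ),
        axis=1,
    )


# Total seconds for three runs of each variant.
print("combine:", timeit.timeit(lambda: with_combine(df), number=3))
print("replace:", timeit.timeit(lambda: with_replace(df), number=3))
```

Both variants call a Python function once per row, which is why their run times stay close to each other.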
UPDATE2: I have tried jezrael's approach:
import pandas as pd

def combine_date_time(df, datecol, timecol):
    # Build 'YYYY-MM-DD HH:MM:SS' strings and parse them in one vectorized call.
    return pd.to_datetime(df[datecol].dt.date.astype(str)
                          + ' '
                          + df[timecol].astype(str))
This approach is blazing fast in comparison; jezrael is right. I haven't been able to measure it, but the difference is evident.
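To make the string-based variant easier to follow, here is a step-by-step sketch on a hand-built frame (an assumption standing in for the real data). The intermediate names date_part and time_part are introduced only for this illustration.

```python
import datetime as dt

import pandas as pd

# Hand-built sample data; values taken from the question's table.
df = pd.DataFrame(
    {
        "MEETING DATE": [pd.Timestamp("2013-12-16"), pd.Timestamp("2013-12-12")],
        "MEETING TIME": [dt.time(14, 0), dt.time(13, 30)],
    },
    index=["YKSGR", "TRCAS"],
)

# Date part as text, e.g. '2013-12-16'.
date_part = df["MEETING DATE"].dt.date.astype(str)
# Time part as text, e.g. '14:00:00'.
time_part = df["MEETING TIME"].astype(str)

# Join the two strings and let pandas parse the whole column at once.
combined = pd.to_datetime(date_part + " " + time_part)
print(combined)
```

The parsing happens once for the whole column rather than once per row, which fits the speed difference reported above.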
Answered by jezrael
You can first convert the time column to string and then to a timedelta with to_timedelta; after that it is easy to sum both columns:
print (type(df['MEETING DATE'].iat[0]))
<class 'pandas.tslib.Timestamp'>
print (type(df['MEETING TIME'].iat[0]))
<class 'datetime.time'>
print (df['MEETING DATE'] + pd.to_timedelta(df['MEETING TIME'].astype(str)))
YKSGR 2013-12-16 14:00:00
TRCAS 2013-12-12 13:30:00
dtype: datetime64[ns]
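The same idea as a self-contained sketch, using a hand-built frame (an assumption standing in for the parsed Excel data): the time column becomes a timedelta column, which can then be added to the date column directly.

```python
import datetime as dt

import pandas as pd

# Hand-built sample data; values taken from the question's table.
df = pd.DataFrame(
    {
        "MEETING DATE": [pd.Timestamp("2013-12-16"), pd.Timestamp("2013-12-12")],
        "MEETING TIME": [dt.time(14, 0), dt.time(13, 30)],
    },
    index=["YKSGR", "TRCAS"],
)

# '14:00:00' style strings are parsed as offsets from midnight.
offsets = pd.to_timedelta(df["MEETING TIME"].astype(str))

# Midnight timestamps plus the offsets give the full datetimes.
combined = df["MEETING DATE"] + offsets
print(combined)
```

This relies on MEETING DATE holding midnight timestamps, as in the question; a date column that already carries a time of day would have that time added on top.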

