pandas: combine a date column and a time column into a datetime column

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/20009408/

Time: 2020-09-13 21:20:47  Source: igfitidea

Combine date column and time column into datetime column

Tags: python, datetime, pandas, data-analysis

Asked by yasar

I have a pandas DataFrame like this (obtained by parsing an Excel file):

|       | COMPANY NAME            | MEETING DATE        | MEETING TIME |
|-------|-------------------------|---------------------|--------------|
| YKSGR | YAPI KREDİ SİGORTA A.Ş. | 2013-12-16 00:00:00 | 14:00:00     |
| TRCAS | TURCAS PETROL A.Ş.      | 2013-12-12 00:00:00 | 13:30:00     |

Column MEETING DATE is a timestamp with a representation like Timestamp('2013-12-20 00:00:00', tz=None), and MEETING TIME is a datetime.time object with a representation like datetime.time(14, 0).

I want to combine MEETING DATE and MEETING TIME into one column. datetime.combine seems to do what I want, but I need to apply it to every row of the two columns somehow. How can I achieve this?

Answered by Roman Pekar

You can use the apply method with axis=1 and call datetime.combine on each row, like this:

>>> from datetime import datetime
>>> df.apply(lambda x: datetime.combine(x['MEETING DATE'], x['MEETING TIME']), axis=1)
0   2013-12-16 14:00:00
1   2013-12-12 13:30:00
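
For readers who want to reproduce this answer, here is a minimal self-contained sketch. The DataFrame is a hypothetical reconstruction of the question's table (only the two relevant columns), not the asker's actual Excel data, and the variable name combined is chosen here for illustration.

```python
from datetime import datetime, time

import pandas as pd

# Hypothetical reconstruction of the question's data (values copied from the table).
df = pd.DataFrame(
    {
        "MEETING DATE": pd.to_datetime(["2013-12-16", "2013-12-12"]),
        "MEETING TIME": [time(14, 0), time(13, 30)],
    },
    index=["YKSGR", "TRCAS"],
)

# datetime.combine keeps the date part of its first argument and
# the time-of-day from its second argument.
combined = df.apply(
    lambda row: datetime.combine(row["MEETING DATE"], row["MEETING TIME"]),
    axis=1,
)
print(combined)
```

Because apply with axis=1 calls the lambda once per row in Python, this is easy to read but can be slow on large frames.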

Answered by jabellcu

Other solutions didn't work for me, so I came up with a workaround using replace instead of combine:

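def combine_date_time(df, datecol, timecol):
    # Overwrite the time-of-day fields of each date with those of the time value.
    return df.apply(
        lambda row: row[datecol].replace(
            hour=row[timecol].hour,
            minute=row[timecol].minute,
            second=row[timecol].second,
        ),
        axis=1,
    )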

In your case:

combine_date_time(df, 'MEETING DATE', 'MEETING TIME')

It feels slow (I haven't timed it properly), but it works.

UPDATE: I have timed both approaches on a relatively large dataset (>500,000 rows). They have similar run times, but using combine is faster (59 s for replace vs 50 s for combine). Also, see jezrael's answer on this.

UPDATE2: I have tried jezrael's approach:

import pandas as pd

def combine_date_time(df, datecol, timecol):
    # Build "YYYY-MM-DD HH:MM:SS" strings and parse them in one vectorized call.
    return pd.to_datetime(df[datecol].dt.date.astype(str)
                          + ' '
                          + df[timecol].astype(str))

This approach is blazing fast in comparison; jezrael is right. I haven't been able to measure it, but the difference is evident.
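
One way to check the speed difference yourself is a rough timing sketch like the one below. It runs on synthetic data, not the dataset mentioned in this answer; the row count n and the helper names with_apply and with_strings are assumptions made for illustration, and the timings it prints will vary by machine and pandas version.

```python
from datetime import datetime, time
from time import perf_counter

import pandas as pd

n = 2_000  # hypothetical size; increase it to make the gap more visible
df = pd.DataFrame(
    {
        "MEETING DATE": pd.date_range("2013-01-01", periods=n, freq="D"),
        "MEETING TIME": [time(i % 24, i % 60) for i in range(n)],
    }
)


def with_apply(frame):
    # Row-wise: one Python-level datetime.combine call per row.
    return frame.apply(
        lambda row: datetime.combine(row["MEETING DATE"], row["MEETING TIME"]),
        axis=1,
    )


def with_strings(frame):
    # Vectorized: build date/time strings and parse them in a single call.
    return pd.to_datetime(
        frame["MEETING DATE"].dt.date.astype(str)
        + " "
        + frame["MEETING TIME"].astype(str)
    )


results = {}
for func in (with_apply, with_strings):
    start = perf_counter()
    results[func.__name__] = func(df)
    elapsed = perf_counter() - start
    print(f"{func.__name__}: {elapsed:.4f} s")

# Both approaches should produce the same timestamps.
same = (results["with_apply"] == results["with_strings"]).all()
print("identical results:", same)
```

A single perf_counter measurement is noisy; for a more careful comparison, repeat each call several times (for example with the timeit module) and compare the best runs.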

Answered by jezrael

You can first convert the time column to string and then pass it to to_timedelta; after that it is easy to sum both columns:

print (type(df['MEETING DATE'].iat[0]))
<class 'pandas.tslib.Timestamp'>

print (type(df['MEETING TIME'].iat[0]))
<class 'datetime.time'>

print (df['MEETING DATE'] + pd.to_timedelta(df['MEETING TIME'].astype(str)))
YKSGR   2013-12-16 14:00:00
TRCAS   2013-12-12 13:30:00
dtype: datetime64[ns]
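
The same idea as a self-contained sketch is shown below. The DataFrame is a hypothetical reconstruction of the question's table, and the new column name MEETING DATETIME is an assumption chosen for illustration. It also assumes every MEETING DATE value is at midnight, as in the question; if that might not hold, calling .dt.normalize() on the date column first would reset the time part.

```python
from datetime import time

import pandas as pd

# Hypothetical reconstruction of the question's data (values copied from the table).
df = pd.DataFrame(
    {
        "MEETING DATE": pd.to_datetime(["2013-12-16", "2013-12-12"]),
        "MEETING TIME": [time(14, 0), time(13, 30)],
    },
    index=["YKSGR", "TRCAS"],
)

# str(time(14, 0)) is "14:00:00", a form that pd.to_timedelta can parse.
offsets = pd.to_timedelta(df["MEETING TIME"].astype(str))

# Adding a timedelta column to a datetime column is a vectorized operation.
df["MEETING DATETIME"] = df["MEETING DATE"] + offsets
print(df["MEETING DATETIME"])
```

Unlike the apply-based answers, this does no per-row Python function calls, which is why it tends to scale better on large frames.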