Combine Date and Time columns using python pandas
Original question: http://stackoverflow.com/questions/17978092/
Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Asked by richie
I have a pandas dataframe with the following columns:
Date Time
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00
How do I combine data['Date'] & data['Time'] to get the following? Is there a way of doing it using pd.to_datetime?
Date
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00
Accepted answer by Andy Hayden
It's worth mentioning that you may have been able to read this in directly, e.g. if you were using read_csv, by passing parse_dates=[['Date', 'Time']].
Assuming these are just strings you could simply add them together (with a space), allowing you to apply to_datetime:
In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
dtype: object
In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
dtype: datetime64[ns]
Note: surprisingly (for me), this works fine, with NaNs being converted to NaT, but it is worth checking that the conversion succeeded (perhaps by passing errors='raise' to to_datetime).
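The output above reads 01-06-2013 as January 6. If the dates are actually day-first (an assumption; the question does not say), passing an explicit format removes the ambiguity. A minimal sketch on a small hypothetical frame, which also shows a missing value turning into NaT:

```python
import pandas as pd

# Hypothetical sample mirroring the question's string columns.
df = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013", None],
    "Time": ["23:00:00", "01:00:00", "21:00:00"],
})

# Concatenate the strings; a missing Date makes the whole row missing.
combined = df["Date"] + " " + df["Time"]

# Assumption: dates are day-month-year. errors="raise" fails loudly on
# text that does not match the format; missing values still become NaT.
result = pd.to_datetime(combined, format="%d-%m-%Y %H:%M:%S", errors="raise")
print(result)
```

If silently turning bad rows into NaT is preferable to an exception, errors="coerce" can be used instead.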
Answer by jka.ne
The accepted answer works for columns that are of datatype string. For completeness: I came across this question when searching for how to do this when the columns are of datatypes date and time.
from datetime import datetime  # pd.datetime was removed in pandas 2.0
df.apply(lambda r: datetime.combine(r['date_column_name'], r['time_column_name']), axis=1)
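As a self-contained illustration of this row-wise approach, here is a sketch with a hypothetical two-row frame whose columns hold datetime.date and datetime.time objects. It uses the standard library's datetime.combine rather than the pd.datetime alias, which newer pandas versions no longer provide:

```python
from datetime import date, time, datetime

import pandas as pd

# Hypothetical frame; the column names follow the placeholders in the answer.
df = pd.DataFrame({
    "date_column_name": [date(2013, 6, 1), date(2013, 6, 2)],
    "time_column_name": [time(23, 0), time(1, 0)],
})

# Combine each row's date and time into a single datetime.
combined = df.apply(
    lambda r: datetime.combine(r["date_column_name"], r["time_column_name"]),
    axis=1,
)
print(combined)
```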
Answer by jabellcu
I don't have enough reputation to comment on jka.ne's answer, so:
I had to amend jka.ne's line for it to work:
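from datetime import datetime  # pd.datetime was removed in pandas 2.0
# Assumption: the time column holds full datetimes, so .time() extracts the time of day
df.apply(lambda r: datetime.combine(r['date_column_name'], r['time_column_name'].time()), axis=1)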
This might help others.
Also, I have tested a different approach, using replace instead of combine:
def combine_date_time(df, datecol, timecol):
    return df.apply(lambda row: row[datecol].replace(
                        hour=row[timecol].hour,
                        minute=row[timecol].minute),
                    axis=1)
which in the OP's case would be:
combine_date_time(df, 'Date', 'Time')
I have timed both approaches on a relatively large dataset (>500,000 rows). Their runtimes are similar, but combine is faster (59s for replace vs 50s for combine).
Answer by M.K Rana
You can use this to merge date and time into the same column of a dataframe.
import pandas as pd
data_file = 'data.csv' #path of your file
Reading .csv file with merged columns Date_Time:
data = pd.read_csv(data_file, parse_dates=[['Date', 'Time']])
You can use this line to keep both of the original columns as well.
data.set_index(['Date', 'Time'], drop=False)
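An alternative that keeps the original columns is to read the file normally and build the merged column afterwards. The sketch below makes several assumptions: the CSV content is made up and read from an in-memory string instead of data.csv, the Value column is hypothetical, and the dates are taken to be day-first. As far as I know, newer pandas releases have deprecated the nested-list form of parse_dates, so combining after reading may also be the more portable route.

```python
import io

import pandas as pd

# Made-up CSV content standing in for data.csv.
csv_text = (
    "Date,Time,Value\n"
    "01-06-2013,23:00:00,1\n"
    "02-06-2013,01:00:00,2\n"
)

# Read Date and Time as plain strings.
data = pd.read_csv(io.StringIO(csv_text))

# Build the merged column; Date and Time stay in the frame.
# Assumption: dates are day-month-year.
data["Date_Time"] = pd.to_datetime(
    data["Date"] + " " + data["Time"], format="%d-%m-%Y %H:%M:%S"
)
print(data)
```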
Answer by Chris PERE
You can cast the columns if the types are different (datetime and timestamp or str) and use to_datetime:
df.loc[:,'Date'] = pd.to_datetime(df.Date.astype(str)+' '+df.Time.astype(str))
Result:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
Answer by toto_tico
The answer really depends on what your column types are. In my case, I had datetime and timedelta.
> df[['Date','Time']].dtypes
Date datetime64[ns]
Time timedelta64[ns]
If this is your case, then you just need to add the columns:
> df['Date'] + df['Time']
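A small sketch of that case, with a hypothetical frame built so that the columns already have the datetime and timedelta dtypes:

```python
import pandas as pd

# Hypothetical frame: Date is datetime64, Time is timedelta64.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-06-01", "2013-06-02"]),
    "Time": pd.to_timedelta(["23:00:00", "01:00:00"]),
})

# Adding a timedelta column to a datetime column yields a datetime column.
combined = df["Date"] + df["Time"]
print(combined)
```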
Answer by jpp
You can also convert to datetime without string concatenation, by combining datetime and timedelta objects. Combined with pd.DataFrame.pop, you can remove the source series simultaneously:
df['DateTime'] = pd.to_datetime(df.pop('Date')) + pd.to_timedelta(df.pop('Time'))
print(df)
DateTime
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
print(df.dtypes)
DateTime datetime64[ns]
dtype: object
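To make the effect of pop visible, here is a sketch with a hypothetical extra Value column. It assumes day-first dates and therefore passes an explicit format, which the answer's one-liner does not:

```python
import pandas as pd

# Hypothetical frame; Value is an extra column added for illustration.
df = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013"],
    "Time": ["23:00:00", "01:00:00"],
    "Value": [1, 2],
})

# pop returns each column and removes it from df, so only Value and
# the new DateTime column remain afterwards.
df["DateTime"] = (
    pd.to_datetime(df.pop("Date"), format="%d-%m-%Y")
    + pd.to_timedelta(df.pop("Time"))
)
print(df)
```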
Answer by queise
First make sure to have the right data types:
df["Date"] = pd.to_datetime(df["Date"])
df["Time"] = pd.to_timedelta(df["Time"])
Then you can easily combine them:
df["DateTime"] = df["Date"] + df["Time"]
Answer by Stephen
Use the combine function:
datetime.datetime.combine(date, time)
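For a single pair of values, a minimal sketch with made-up inputs looks like this:

```python
import datetime

# Made-up values: a plain date and a plain time of day.
d = datetime.date(2013, 6, 1)
t = datetime.time(23, 0, 0)

# combine joins them into one datetime object.
combined = datetime.datetime.combine(d, t)
print(combined)  # 2013-06-01 23:00:00
```

Note that this works on individual date and time objects, not on whole pandas columns, so it has to be applied row by row.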
Answer by tgbrooks
My dataset had 1-second resolution data for a few days, and parsing it with the methods suggested here was very slow. Instead I used:
dates = pandas.to_datetime(df.Date, cache=True)
times = pandas.to_timedelta(df.Time)
datetimes = dates + times
Note that using cache=True makes parsing the dates very efficient, since there are only a couple of unique dates in my files, which is not true for a combined date and time column.
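A runnable sketch of this approach on synthetic data follows. The frame, its size, and the day-first format are assumptions made for illustration, and no timing claim is implied:

```python
import pandas as pd

# Synthetic data: many rows but only two unique dates.
n = 10_000
df = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013"] * (n // 2),
    "Time": ["23:00:00", "01:00:00"] * (n // 2),
})

# cache=True lets to_datetime reuse the parsed result for repeated strings.
dates = pd.to_datetime(df["Date"], format="%d-%m-%Y", cache=True)
times = pd.to_timedelta(df["Time"])
datetimes = dates + times
print(datetimes.head())
```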

