Original URL: http://stackoverflow.com/questions/26387986/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Strip time from an object date in pandas
Question by trench
I am having trouble with some dates from zipped .xlsx files. These files are loaded into a SQLite database and then exported as .csv. Each file contains about 40,000 rows per day. The issue I run into is that pd.to_datetime does not seem to work on these objects (I think the dates coming from the Excel format are causing the issue; pure .csv files work fine with this command). That is actually fine: I do not need them to be in datetime format.
What I am trying to achieve is to create a column called ShortDate in %m/%d/%Y format. How can I do this on a datetime object (the format from Excel is mm/dd/yyyy hh:mm:ss)? I will then create a new column called RosterID that combines the EmployeeID and ShortDate fields into a unique ID.
I am very new to pandas and currently use it only to process .csv files (renaming and selecting certain columns, creating unique IDs to use in Tableau filters, etc.).
import pandas as pd

rep = pd.read_csv(r'C:\Users\Desktop\test.csv.gz', dtype='str', compression='gzip', usecols=['etc', 'etc2'])
print('Read successfully.')
rep['Total'] = 1
rep['UniqueID'] = rep['EmployeeID'] + rep['InteractionID']
rep['ShortDate'] = ???  # what do I do here to get what I am looking for?
rep['RosterID'] = rep['EmployeeID'] + rep['ShortDate']  # this is my goal
print('Modified successfully.')
Here is some of the raw data from the .csv. The column names are:
InteractionID, Created Date, EmployeeID, Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19
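To try the answers below without the original files, the three sample rows can be loaded from an in-memory string. This is a minimal sketch that assumes those rows are representative; it writes the header without the spaces after the commas (otherwise every column name but the first would start with a space), and RAW is a hypothetical name.

```python
import io

import pandas as pd

# The three sample rows from the question (header written without spaces after commas).
RAW = """InteractionID,Created Date,EmployeeID,Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19
"""

# dtype=str keeps every column as text, so IDs such as "07927" keep their leading zero.
df = pd.read_csv(io.StringIO(RAW), dtype=str)
print(df)
```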
Answer by EdChum
You can apply a post-processing step that first converts the string to a datetime and then applies a lambda to keep just the date portion:
In [29]:
df['Created Date'] = pd.to_datetime(df['Created Date']).apply(lambda x: x.date())
df['Repeat Date'] = pd.to_datetime(df['Repeat Date']).apply(lambda x: x.date())
df
Out[29]:
InteractionID Created Date EmployeeID Repeat Date
0 7927 2014-04-01 912a 2014-04-01
1 2158 2014-04-01 172r 2014-04-04
2 44279 2014-04-01 217y 2014-04-07
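A variation on the lambda approach above, not part of the original answer: if you would rather keep a datetime64 column (so sorting and date arithmetic still work), Series.dt.normalize() sets the time to midnight instead of producing Python date objects. The sketch rebuilds the question's sample rows and assumes the mm/dd/yyyy hh:mm:ss format shown there; the Repeat Day column name is hypothetical.

```python
import io

import pandas as pd

RAW = """InteractionID,Created Date,EmployeeID,Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19
"""
df = pd.read_csv(io.StringIO(RAW), dtype=str)

# An explicit format means pandas never has to guess month/day order.
repeat = pd.to_datetime(df['Repeat Date'], format='%m/%d/%Y %H:%M:%S')

# normalize() zeroes the time of day but keeps a datetime64 dtype.
df['Repeat Day'] = repeat.dt.normalize()
print(df[['EmployeeID', 'Repeat Day']])
```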
EDIT
After looking at this again: if your pandas version is 0.15.0 or later, you can access just the date component with dt.date. The dt accessor only works once the column has been converted to datetime64, for example with pd.to_datetime(df['Repeat Date']):
In [18]:
df['just_date'] = df['Repeat Date'].dt.date
df
Out[18]:
InteractionID Created Date EmployeeID Repeat Date \
0 7927 2014-04-01 14:05:10 912a 2014-04-01 14:50:03
1 2158 2014-04-01 13:44:05 172r 2014-04-04 17:47:29
2 44279 2014-04-01 17:28:36 217y 2014-04-07 22:06:19
just_date
0 2014-04-01
1 2014-04-04
2 2014-04-07
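A self-contained sketch of the dt.date route, assuming the question's sample rows. The explicit format argument is an addition, not from the answer; it stops pandas from guessing the day/month order.

```python
import datetime
import io

import pandas as pd

RAW = """InteractionID,Created Date,EmployeeID,Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19
"""
df = pd.read_csv(io.StringIO(RAW), dtype=str)

# .dt only works on datetime64 columns, so convert the text first.
df['Repeat Date'] = pd.to_datetime(df['Repeat Date'], format='%m/%d/%Y %H:%M:%S')

# Each value becomes a Python datetime.date; the column dtype is object.
df['just_date'] = df['Repeat Date'].dt.date
print(df)
```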
You can now also use dt.strftime (added in pandas 0.17.0) instead of apply to get the result you want:
In [28]:
df['short_date'] = df['Repeat Date'].dt.strftime('%m%d%Y')
df
Out[28]:
InteractionID Created Date EmployeeID Repeat Date \
0 7927 2014-04-01 14:05:10 912a 2014-04-01 14:50:03
1 2158 2014-04-01 13:44:05 172r 2014-04-04 17:47:29
2 44279 2014-04-01 17:28:36 217y 2014-04-07 22:06:19
just_date short_date
0 2014-04-01 04012014
1 2014-04-04 04042014
2 2014-04-07 04072014
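To get exactly the %m/%d/%Y format the question asks for, only the strftime pattern changes. A sketch assuming the question's sample rows and, like the answer above, the Repeat Date column; ShortDate and RosterID follow the question's names.

```python
import io

import pandas as pd

RAW = """InteractionID,Created Date,EmployeeID,Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19
"""
df = pd.read_csv(io.StringIO(RAW), dtype=str)

# Parse the text, then format only the date part as mm/dd/yyyy.
repeat = pd.to_datetime(df['Repeat Date'], format='%m/%d/%Y %H:%M:%S')
df['ShortDate'] = repeat.dt.strftime('%m/%d/%Y')

# Both columns hold strings, so + concatenates them element-wise.
df['RosterID'] = df['EmployeeID'] + df['ShortDate']
print(df[['EmployeeID', 'ShortDate', 'RosterID']])
```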
Generating the roster IDs is now just a matter of concatenating the EmployeeID and short_date string columns:
In [30]:
df['Roster ID'] = df['EmployeeID'] + df['short_date']
df
Out[30]:
InteractionID Created Date EmployeeID Repeat Date \
0 7927 2014-04-01 14:05:10 912a 2014-04-01 14:50:03
1 2158 2014-04-01 13:44:05 172r 2014-04-04 17:47:29
2 44279 2014-04-01 17:28:36 217y 2014-04-07 22:06:19
just_date short_date Roster ID
0 2014-04-01 04012014 912a04012014
1 2014-04-04 04042014 172r04042014
2 2014-04-07 04072014 217y04072014
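If you want a delimiter between the two parts, Series.str.cat accepts a sep argument. The underscore separator here is a hypothetical choice, not from the answer; it only makes the ID easier to read and to split back apart.

```python
import io

import pandas as pd

RAW = """InteractionID,Created Date,EmployeeID,Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19
"""
df = pd.read_csv(io.StringIO(RAW), dtype=str)

repeat = pd.to_datetime(df['Repeat Date'], format='%m/%d/%Y %H:%M:%S')
df['short_date'] = repeat.dt.strftime('%m%d%Y')

# str.cat joins the two columns row by row; sep='_' is a hypothetical delimiter.
df['Roster ID'] = df['EmployeeID'].str.cat(df['short_date'], sep='_')
print(df[['EmployeeID', 'short_date', 'Roster ID']])
```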
Answer by Jerome Montino
Create a new column, then apply simple datetime functions using lambda and apply.
In [14]: df['Short Date']= pd.to_datetime(df['Created Date'])
In [15]: df
Out[15]:
InteractionID Created Date EmployeeID Repeat Date \
0 7927 4/1/2014 14:05 912a 4/1/2014 14:50
1 2158 4/1/2014 13:44 172r 4/4/2014 17:47
2 44279 4/1/2014 17:28 217y 4/7/2014 22:06
Short Date
0 2014-04-01 14:05:00
1 2014-04-01 13:44:00
2 2014-04-01 17:28:00
In [16]: df['Short Date'] = df['Short Date'].apply(lambda x:x.date().strftime('%m%d%y'))
In [17]: df
Out[17]:
InteractionID Created Date EmployeeID Repeat Date Short Date
0 7927 4/1/2014 14:05 912a 4/1/2014 14:50 040114
1 2158 4/1/2014 13:44 172r 4/4/2014 17:47 040114
2 44279 4/1/2014 17:28 217y 4/7/2014 22:06 040114
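The apply/lambda step above can also be written with the vectorized dt.strftime accessor (pandas 0.17.0 or later). A sketch assuming the question's sample rows, which include seconds, rather than the truncated timestamps shown in this answer's output.

```python
import io

import pandas as pd

RAW = """InteractionID,Created Date,EmployeeID,Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19
"""
df = pd.read_csv(io.StringIO(RAW), dtype=str)

# One vectorized call replaces apply(lambda x: x.date().strftime('%m%d%y')).
created = pd.to_datetime(df['Created Date'], format='%m/%d/%Y %H:%M:%S')
df['Short Date'] = created.dt.strftime('%m%d%y')  # %y is the two-digit year
print(df[['EmployeeID', 'Short Date']])
```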
Then concatenate the two columns, converting the Short Date column to strings to avoid errors when concatenating strings and integers.
In [32]: df['Roster ID'] = df['EmployeeID'] + df['Short Date'].map(str)
In [33]: df
Out[33]:
InteractionID Created Date EmployeeID Repeat Date Short Date \
0 7927 4/1/2014 14:05 912a 4/1/2014 14:50 040114
1 2158 4/1/2014 13:44 172r 4/4/2014 17:47 040114
2 44279 4/1/2014 17:28 217y 4/7/2014 22:06 040114
Roster ID
0 912a040114
1 172r040114
2 217y040114
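The string conversion matters when a column was parsed as numbers. A sketch with a hypothetical frame where InteractionID came in as integers (read_csv's default for a numeric-looking column): adding it directly to a string column typically raises a TypeError, and astype(str) avoids that. The leading zero of 07927 is already lost at that point, which is why reading with dtype=str, as the question does, is the safer option.

```python
import pandas as pd

# Hypothetical frame: InteractionID was parsed as integers, so 07927 became 7927.
df = pd.DataFrame({
    'EmployeeID': ['912a', '172r', '217y'],
    'InteractionID': [7927, 2158, 44279],
})

# astype(str) turns each integer into text before the element-wise concatenation.
df['UniqueID'] = df['EmployeeID'] + df['InteractionID'].astype(str)
print(df)
```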
Answer by CT Zhu
You can also do it using only the standard library (the time module), in any format you want, such as '%m/%d/%Y', '%m-%d-%Y', or other orders/formats:
In [118]:
import time
df['Created Date'] = df['Created Date'].apply(lambda x: time.strftime('%m/%d/%Y', time.strptime(x, '%m/%d/%Y %H:%M:%S')))
In [120]:
print(df)
InteractionID Created Date EmployeeID Repeat Date
0 7927 04/01/2014 912a 04/01/2014 14:50:03
1 2158 04/01/2014 172r 04/04/2014 17:47:29
2 44279 04/01/2014 217y 04/07/2014 22:06:19
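For comparison, the same conversion using datetime from the standard library instead of the time module. short_date is a hypothetical helper name; it assumes the mm/dd/yyyy hh:mm:ss input format from the question and raises ValueError for anything else.

```python
from datetime import datetime


def short_date(stamp: str, out_fmt: str = '%m/%d/%Y') -> str:
    """Reformat an 'mm/dd/yyyy hh:mm:ss' string, dropping the time part."""
    return datetime.strptime(stamp, '%m/%d/%Y %H:%M:%S').strftime(out_fmt)


print(short_date('04/01/2014 14:05:10'))
print(short_date('04/07/2014 22:06:19', '%m-%d-%Y'))
```

To apply it to a column: df['Created Date'] = df['Created Date'].apply(short_date).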

