pandas: Strip time from object dates in pandas

Question

Asked by trench

I am having trouble with some dates from zipped xlsx files. These files are loaded into a sqlite database and then exported as .csv. Each file is about 40,000 rows per day. The issue I run into is that pd.to_datetime does not seem to work on these objects (I think the Excel date format is causing the issue; pure .csv files work fine with this command). This is actually fine: I do not need them to be in datetime format.

What I am trying to achieve is creating a column called ShortDate, formatted as %m/%d/%Y. How can I do this on a datetime object (the format from Excel is mm/dd/yyyy hh:mm:ss)? I will then create a new column called RosterID, which combines the EmployeeID field and the ShortDate field into a unique ID.

I am very new to pandas and I am currently only using it to process .csv files (rename and select certain columns, create unique IDs to use in filters in Tableau, etc).

rep = pd.read_csv(r'C:\Users\Desktop\test.csv.gz', dtype = 'str', compression = 'gzip', usecols = ['etc','etc2'])
print('Read successfully.')
rep['Total']=1
rep['UniqueID']= rep['EmployeeID'] + rep['InteractionID']
rep['ShortDate'] = ??? #what do I do here to get what I am looking for?
rep['RosterID']= rep['EmployeeID'] + rep['ShortDate'] # this is my goal
print('Modified successfully.')

Here is some of the raw data from the .csv. The column names are in the first row:

InteractionID, Created Date, EmployeeID, Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19

Answer 1

Answered by EdChum

You can apply a post-processing step that first converts the string to a datetime and then applies a lambda to keep just the date portion:

In [29]:

df['Created Date'] = pd.to_datetime(df['Created Date']).apply(lambda x: x.date())
df['Repeat Date'] = pd.to_datetime(df['Repeat Date']).apply(lambda x: x.date())
df


Out[29]:
   InteractionID Created Date EmployeeID Repeat Date
0           7927   2014-04-01       912a  2014-04-01
1           2158   2014-04-01       172r  2014-04-04
2          44279   2014-04-01       217y  2014-04-07
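Since the question mentions that pd.to_datetime did not seem to work on the exported data, here is a minimal sketch that passes an explicit format instead of letting pandas infer one. The sample values are copied from the question; the format string is an assumption based on the mm/dd/yyyy hh:mm:ss layout described there, and n_bad is a hypothetical name used only for illustration.

```python
import pandas as pd

# Sample rows copied from the question, kept as plain strings.
df = pd.DataFrame({
    "EmployeeID": ["912a", "172r", "217y"],
    "Created Date": [
        "04/01/2014 14:05:10",
        "04/01/2014 13:44:05",
        "04/01/2014 17:28:36",
    ],
})

# An explicit format removes any month/day guessing. errors="coerce"
# turns values that do not match the format into NaT instead of raising.
parsed = pd.to_datetime(
    df["Created Date"], format="%m/%d/%Y %H:%M:%S", errors="coerce"
)

# Count the rows that failed to parse (none in this sample).
n_bad = int(parsed.isna().sum())

# Keep only the date portion, as in the answer above.
df["Created Date"] = parsed.apply(lambda x: x.date())
print(n_bad)
print(df)
```

If n_bad is greater than zero on real data, the unparsed rows can be inspected before stripping the time.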

EDIT

After looking at this again, you can access just the date component using dt.date if your version of pandas is greater than 0.15.0:

In [18]:
df['just_date'] = df['Repeat Date'].dt.date
df

Out[18]:
   InteractionID        Created Date EmployeeID         Repeat Date  \
0           7927 2014-04-01 14:05:10       912a 2014-04-01 14:50:03   
1           2158 2014-04-01 13:44:05       172r 2014-04-04 17:47:29   
2          44279 2014-04-01 17:28:36       217y 2014-04-07 22:06:19   

    just_date  
0  2014-04-01  
1  2014-04-04  
2  2014-04-07
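Note that the .dt accessor only exists on datetime-like columns, so a column read with dtype='str' (as in the question) has to be converted first. The following sketch illustrates this on two sample values from the question; the variable names are hypothetical.

```python
import pandas as pd

raw = pd.Series(
    ["04/01/2014 14:50:03", "04/04/2014 17:47:29"], name="Repeat Date"
)

# On a column of plain strings the .dt accessor is unavailable.
try:
    raw.dt.date
    accessor_worked = True
except AttributeError:
    accessor_worked = False

# After conversion the column is datetime64 and .dt works.
converted = pd.to_datetime(raw, format="%m/%d/%Y %H:%M:%S")
just_date = converted.dt.date
print(accessor_worked)
print(just_date.tolist())
```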

Additionally, you can now use dt.strftime rather than apply to achieve the result you want:

In [28]:
df['short_date'] = df['Repeat Date'].dt.strftime('%m%d%Y')
df

Out[28]:
   InteractionID        Created Date EmployeeID         Repeat Date  \
0           7927 2014-04-01 14:05:10       912a 2014-04-01 14:50:03   
1           2158 2014-04-01 13:44:05       172r 2014-04-04 17:47:29   
2          44279 2014-04-01 17:28:36       217y 2014-04-07 22:06:19   

    just_date short_date  
0  2014-04-01   04012014  
1  2014-04-04   04042014  
2  2014-04-07   04072014

So generating the Roster IDs is now a trivial exercise of concatenating the EmployeeID and short_date columns:

In [30]:
df['Roster ID'] = df['EmployeeID'] + df['short_date']
df

Out[30]:
   InteractionID        Created Date EmployeeID         Repeat Date  \
0           7927 2014-04-01 14:05:10       912a 2014-04-01 14:50:03   
1           2158 2014-04-01 13:44:05       172r 2014-04-04 17:47:29   
2          44279 2014-04-01 17:28:36       217y 2014-04-07 22:06:19   

    just_date short_date     Roster ID  
0  2014-04-01   04012014  912a04012014  
1  2014-04-04   04042014  172r04042014  
2  2014-04-07   04072014  217y04072014
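Putting the steps together with the column names from the question, a self-contained sketch might look like the following. It assumes ShortDate should be built from Created Date (the question does not say which date column to use) and uses the %m/%d/%Y format the question asked for. The CSV text is the sample data from the question, read from memory via io.StringIO rather than from the gzipped file.

```python
import io

import pandas as pd

# Sample data from the question, with no spaces after the commas.
csv_text = """InteractionID,Created Date,EmployeeID,Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19
"""

# dtype=str keeps every column as text, so leading zeros in
# InteractionID survive.
rep = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Parse once into a temporary Series, then format it back to a string.
created = pd.to_datetime(rep["Created Date"], format="%m/%d/%Y %H:%M:%S")
rep["ShortDate"] = created.dt.strftime("%m/%d/%Y")

# Both columns are strings, so + concatenates them element-wise.
rep["RosterID"] = rep["EmployeeID"] + rep["ShortDate"]
print(rep[["InteractionID", "EmployeeID", "ShortDate", "RosterID"]])
```

Because the parsed dates live in a temporary Series, the original Created Date column stays untouched as text.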

Answer 2

Answered by Jerome Montino

Create a new column, then apply simple datetime functions using lambda and apply.

In [14]: df['Short Date']= pd.to_datetime(df['Created Date'])

In [15]: df
Out[15]: 
   InteractionID    Created Date EmployeeID     Repeat Date  \
0           7927  4/1/2014 14:05       912a  4/1/2014 14:50   
1           2158  4/1/2014 13:44       172r  4/4/2014 17:47   
2          44279  4/1/2014 17:28       217y  4/7/2014 22:06   

           Short Date  
0 2014-04-01 14:05:00  
1 2014-04-01 13:44:00  
2 2014-04-01 17:28:00  

In [16]: df['Short Date'] = df['Short Date'].apply(lambda x:x.date().strftime('%m%d%y'))

In [17]: df
Out[17]: 
   InteractionID    Created Date EmployeeID     Repeat Date Short Date  
0           7927  4/1/2014 14:05       912a  4/1/2014 14:50     040114   
1           2158  4/1/2014 13:44       172r  4/4/2014 17:47     040114   
2          44279  4/1/2014 17:28       217y  4/7/2014 22:06     040114

Then just concatenate the two columns. Convert the Short Date column to strings to avoid errors when concatenating strings and integers.

In [32]: df['Roster ID'] = df['EmployeeID'] + df['Short Date'].map(str)

In [33]: df
Out[33]: 
   InteractionID    Created Date EmployeeID     Repeat Date Short Date  \
0           7927  4/1/2014 14:05       912a  4/1/2014 14:50     040114   
1           2158  4/1/2014 13:44       172r  4/4/2014 17:47     040114   
2          44279  4/1/2014 17:28       217y  4/7/2014 22:06     040114   

    Roster ID  
0  912a040114  
1  172r040114  
2  217y040114
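The format string in this answer uses a lowercase %y, which produces a two-digit year, whereas the question asked for %m/%d/%Y. A small sketch using only the standard library's datetime module shows the difference on one timestamp from the sample data; the variable names are illustrative.

```python
from datetime import datetime

stamp = datetime.strptime("04/01/2014 14:05:10", "%m/%d/%Y %H:%M:%S")

two_digit = stamp.strftime("%m%d%y")       # two-digit year
four_digit = stamp.strftime("%m%d%Y")      # four-digit year
with_slashes = stamp.strftime("%m/%d/%Y")  # layout requested in the question

print(two_digit, four_digit, with_slashes)
# prints: 040114 04012014 04/01/2014
```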

Answer 3

Answered by CT Zhu

You can also do it using only the standard library (in any format you want: '%m/%d/%Y', '%m-%d-%Y', or other orders/formats):

In [118]:

import time
df['Created Date'] = df['Created Date'].apply(lambda x: time.strftime('%m/%d/%Y', time.strptime(x, '%m/%d/%Y %H:%M:%S')))
In [120]:

print(df)
   InteractionID Created Date EmployeeID          Repeat Date
0           7927   04/01/2014       912a  04/01/2014 14:50:03
1           2158   04/01/2014       172r  04/04/2014 17:47:29
2          44279   04/01/2014       217y  04/07/2014 22:06:19
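The same idea can be wrapped in a small helper so that the input and output formats are stated once. This is only a sketch: strip_time and IN_FORMAT are hypothetical names, and the helper is shown on a plain list of the sample Repeat Date values rather than on a DataFrame column.

```python
import time

# Layout of the incoming strings, as described in the question.
IN_FORMAT = "%m/%d/%Y %H:%M:%S"


def strip_time(value, out_format="%m/%d/%Y"):
    """Parse a timestamp string and return only its date part as text."""
    return time.strftime(out_format, time.strptime(value, IN_FORMAT))


raw = ["04/01/2014 14:50:03", "04/04/2014 17:47:29", "04/07/2014 22:06:19"]
short = [strip_time(v) for v in raw]
print(short)
# prints: ['04/01/2014', '04/04/2014', '04/07/2014']
```

With pandas, the helper could be passed to apply on a string column in the same way as the lambda above, for example df['Repeat Date'].apply(strip_time).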
