Original URL: http://stackoverflow.com/questions/38060172/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Convert string date to a different format in pandas dataframe
Asked by racekiller
I have been looking for an answer to this in the community, but so far I could not find one.
I have a dataframe in Python 3.5.1 that contains a column of dates stored as strings, imported from a CSV file.
The dataframe looks like this:
             TimeStamp  TBD  TBD     Value  TBD
0  2016/06/08 17:19:53  NaN  NaN  0.062942  NaN
1  2016/06/08 17:19:54  NaN  NaN  0.062942  NaN
2  2016/06/08 17:19:54  NaN  NaN  0.062942  NaN
What I need is to change the TimeStamp column format to %m/%d/%Y %H:%M:%S:
             TimeStamp  TBD  TBD     Value  TBD
0  06/08/2016 17:19:53  NaN  NaN  0.062942  NaN
So far I have found some solutions that work, but only for a single string and not for a Series.
Any help would be appreciated
Thanks
Answered by unutbu
If you convert the column of strings to a time series, you can use the dt.strftime method:
import numpy as np
import pandas as pd
nan = np.nan
df = pd.DataFrame({'TBD': [nan, nan, nan], 'TBD.1': [nan, nan, nan], 'TBD.2': [nan, nan, nan], 'TimeStamp': ['2016/06/08 17:19:53', '2016/06/08 17:19:54', '2016/06/08 17:19:54'], 'Value': [0.062941999999999998, 0.062941999999999998, 0.062941999999999998]})
df['TimeStamp'] = pd.to_datetime(df['TimeStamp']).dt.strftime('%m/%d/%Y %H:%M:%S')
print(df)
yields
   TBD  TBD.1  TBD.2            TimeStamp     Value
0  NaN    NaN    NaN  06/08/2016 17:19:53  0.062942
1  NaN    NaN    NaN  06/08/2016 17:19:54  0.062942
2  NaN    NaN    NaN  06/08/2016 17:19:54  0.062942
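Note that dt.strftime returns strings rather than datetimes, so the reformatted column can no longer be used directly for date arithmetic. A minimal sketch of one way to keep both forms, assuming you are free to add a column (the name TimeStampParsed is hypothetical and not part of the question):

```python
import pandas as pd

df = pd.DataFrame({'TimeStamp': ['2016/06/08 17:19:53', '2016/06/08 17:19:54']})

# Keep the parsed datetimes in a separate (hypothetical) column for date arithmetic.
df['TimeStampParsed'] = pd.to_datetime(df['TimeStamp'], format='%Y/%m/%d %H:%M:%S')

# Format for display; the result is a column of plain strings.
df['TimeStamp'] = df['TimeStampParsed'].dt.strftime('%m/%d/%Y %H:%M:%S')

print(df['TimeStamp'].tolist())
print(df['TimeStampParsed'].dtype)
```

If only the display format matters, overwriting the column in place as in the answer is simpler.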
Since you want to convert a column of strings to another (different) column of strings, you could also use the vectorized str.replace method:
import numpy as np
import pandas as pd
nan = np.nan
df = pd.DataFrame({'TBD': [nan, nan, nan], 'TBD.1': [nan, nan, nan], 'TBD.2': [nan, nan, nan], 'TimeStamp': ['2016/06/08 17:19:53', '2016/06/08 17:19:54', '2016/06/08 17:19:54'], 'Value': [0.062941999999999998, 0.062941999999999998, 0.062941999999999998]})
df['TimeStamp'] = df['TimeStamp'].str.replace(r'(\d+)/(\d+)/(\d+)(.*)', r'\2/\3/\1\4', regex=True)
print(df)
since
In [32]: df['TimeStamp'].str.replace(r'(\d+)/(\d+)/(\d+)(.*)', r'\2/\3/\1\4', regex=True)
Out[32]:
0 06/08/2016 17:19:53
1 06/08/2016 17:19:54
2 06/08/2016 17:19:54
Name: TimeStamp, dtype: object
This uses regex to rearrange pieces of the string without first parsing the string as a date. This is faster than the first method (mainly because it skips the parsing step), but it also has the disadvantage of not checking that the date strings are valid dates.
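To illustrate the validation trade-off, here is a small sketch that uses a deliberately invalid date string (made up for this example). It assumes a pandas version that accepts the regex keyword on str.replace, and it uses errors='coerce', which is one option for handling bad values and not part of the original answer:

```python
import pandas as pd

# The second value is not a real date (month 13, day 45).
s = pd.Series(['2016/06/08 17:19:53', '2016/13/45 17:19:54'])

# The regex approach rearranges the pieces without noticing the invalid date.
swapped = s.str.replace(r'(\d+)/(\d+)/(\d+)(.*)', r'\2/\3/\1\4', regex=True)

# Parsing with errors='coerce' turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(s, format='%Y/%m/%d %H:%M:%S', errors='coerce')

print(swapped.tolist())        # ['06/08/2016 17:19:53', '13/45/2016 17:19:54']
print(parsed.isna().tolist())  # [False, True]
```

If the input is already known to be clean, the regex route is a reasonable shortcut; otherwise parsing first catches bad rows.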
Answered by Sarah
For most common date and datetime formats, the pandas to_datetime function can parse them without an explicit format. For example:
df.TimeStamp.apply(lambda x: pd.to_datetime(x))
And for the example given in the question,
df['TimeStamp'] = pd.to_datetime(df['TimeStamp']).dt.strftime('%m/%d/%Y %H:%M:%S')
will give us the same result.
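Using .apply is convenient if you have multiple columns to convert. Note that calling pd.to_datetime on a whole column is generally faster than calling it once per value inside a lambda.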
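As a rough sketch of the multiple-column case, DataFrame.apply can hand each selected column to pd.to_datetime in turn. The column names Start and End are hypothetical and used only for illustration:

```python
import pandas as pd

# 'Start' and 'End' are hypothetical column names used only for illustration.
df = pd.DataFrame({
    'Start': ['2016/06/08 17:19:53', '2016/06/08 17:19:54'],
    'End': ['2016/06/09 08:00:00', '2016/06/09 08:00:01'],
    'Value': [0.062942, 0.062942],
})

date_cols = ['Start', 'End']

# DataFrame.apply passes each selected column (a Series) to the function,
# so the parsing and formatting stay vectorized within each column.
df[date_cols] = df[date_cols].apply(
    lambda col: pd.to_datetime(col, format='%Y/%m/%d %H:%M:%S').dt.strftime('%m/%d/%Y %H:%M:%S')
)

print(df)
```

This applies the function once per column, not once per cell, which is the main difference from the Series.apply call with a per-value lambda.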
Of course, providing the parsing format is necessary for many situations. For a full list of formats, please see https://docs.python.org/3/library/datetime.html.
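One situation where an explicit format helps is an ambiguous day/month order. The following sketch uses made-up day-first strings (not from the question) to show the format argument of pd.to_datetime:

```python
import pandas as pd

# Day-first strings are ambiguous: is 08/06/2016 June 8 or August 6?
s = pd.Series(['08/06/2016 17:19:53', '09/06/2016 17:19:54'])

# Stating the format removes the guesswork: here the day comes first.
parsed = pd.to_datetime(s, format='%d/%m/%Y %H:%M:%S')

print(parsed.dt.month.tolist())                  # [6, 6]
print(parsed.dt.strftime('%Y-%m-%d').tolist())   # ['2016-06-08', '2016-06-09']
```

The format codes are the same strftime/strptime directives listed in the Python datetime documentation linked above.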