Pandas converting row with unix timestamp (in milliseconds) to datetime
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/34883101/
Asked by tamasgal
I need to process a huge number of CSV files in which the time stamp is always a string representing the unix timestamp in milliseconds. I have not yet found a way to convert these columns efficiently.
This is what I came up with. It only produces a converted copy of the column, though, which I then have to put back into the original dataset somehow. I'm sure this can be done when creating the DataFrame?
import datetime
import sys

# StringIO lives in a different module on Python 2 and Python 3
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'
df = pd.read_csv(StringIO(data))

# fromtimestamp() expects seconds and returns local time, hence the division by 1e3
convert = lambda x: datetime.datetime.fromtimestamp(x / 1e3)
converted_df = df['UNIXTIME'].apply(convert)
This picks the column 'UNIXTIME' and changes it from
0 1447160702320
1 1447160702364
2 1447160722364
Name: UNIXTIME, dtype: int64
into this:
0 2015-11-10 14:05:02.320
1 2015-11-10 14:05:02.364
2 2015-11-10 14:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]
However, I would like to use something like pd.apply() to get the whole dataset returned with the converted column or, as I already wrote, simply create the datetimes when generating the DataFrame from the CSV.
Accepted answer by EdChum
You can do this as a post-processing step using to_datetime and passing the argument unit='ms':
In [5]:
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')
df
Out[5]:
RUN UNIXTIME VALUE
0 1 2015-11-10 13:05:02.320 10
1 2 2015-11-10 13:05:02.364 20
2 3 2015-11-10 13:05:22.364 42
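For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach, reusing the sample data from the question. The astype('int64') cast is an added assumption for the case where the column was loaded as strings; it is a no-op when read_csv has already inferred integers. Note that to_datetime interprets epoch values as UTC, whereas fromtimestamp in the question returns local time, which is presumably why the question's output reads 14:05 and this one reads 13:05.

```python
from io import StringIO

import pandas as pd

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'
df = pd.read_csv(StringIO(data))

# Cast to integers first (assumption: guards against string-typed columns),
# then interpret the values as milliseconds since the unix epoch (UTC).
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'].astype('int64'), unit='ms')

print(df)
```

The other columns are left untouched, so the whole DataFrame comes back with only 'UNIXTIME' converted.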
Answer by tamasgal
I think I came up with a solution:
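import datetime

# float(x) handles the string values handed to the parser; the result is in local time
convert = lambda x: datetime.datetime.fromtimestamp(float(x) / 1e3)
df = pd.read_csv(StringIO(data), parse_dates=['UNIXTIME'], date_parser=convert)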
I'm still not sure whether this is the best approach, though.
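The date_parser argument has been deprecated in newer pandas releases (2.0 onward), so here is a rough sketch of one alternative that still returns the converted DataFrame in a single expression: chain the conversion onto read_csv with assign. The helper name load_with_datetimes and its column parameter are made up for illustration and are not part of pandas or of the original answer. Unlike the fromtimestamp version, this yields UTC-based values rather than local time.

```python
from io import StringIO

import pandas as pd

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'


def load_with_datetimes(source, column='UNIXTIME'):
    """Hypothetical helper: read a CSV and convert one epoch-millisecond column."""
    return pd.read_csv(source).assign(
        # assign() calls the lambda with the freshly loaded DataFrame
        **{column: lambda d: pd.to_datetime(d[column], unit='ms')}
    )


df = load_with_datetimes(StringIO(data))
print(df.dtypes)
```

Because the conversion is vectorised over the whole column, this should also scale better across many files than calling a Python function once per value.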
Answer by Teudimundo
I use the @EdChum solution, but I add timezone handling:
df['UNIXTIME'] = pd.DatetimeIndex(pd.to_datetime(df['UNIXTIME'], unit='ms'))\
    .tz_localize('UTC')\
    .tz_convert('America/New_York')
tz_localize indicates that the timestamps should be interpreted as UTC; tz_convert then actually shifts the date/time to the target timezone (in this case 'America/New_York').
Note that the column has been converted to a DatetimeIndex because the tz_ methods work only on the index of a Series. Since pandas 0.15 you can use the .dt accessor instead:
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')\
    .dt.tz_localize('UTC')\
    .dt.tz_convert('America/New_York')
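As a slightly shorter variant (a sketch, not part of the original answer), to_datetime accepts utc=True, which returns timezone-aware UTC timestamps directly, so the separate tz_localize step can be dropped. This assumes timezone data for 'America/New_York' is available on the system, as it normally is with a standard pandas install.

```python
from io import StringIO

import pandas as pd

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'
df = pd.read_csv(StringIO(data))

# utc=True makes the result tz-aware (UTC); tz_convert then shifts the
# wall-clock time to the target zone without changing the instant.
df['UNIXTIME'] = (
    pd.to_datetime(df['UNIXTIME'], unit='ms', utc=True)
      .dt.tz_convert('America/New_York')
)

print(df['UNIXTIME'])
```

In November, New York is five hours behind UTC, so 13:05 UTC shows up as 08:05 local time.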
Answer by cs95
If you know the timestamp unit, use Series.astype:
df['UNIXTIME'].astype('datetime64[ms]')
0 2015-11-10 13:05:02.320
1 2015-11-10 13:05:02.364
2 2015-11-10 13:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]
To return the entire DataFrame, use:
df.astype({'UNIXTIME': 'datetime64[ms]'})
RUN UNIXTIME VALUE
0 1 2015-11-10 13:05:02.320 10
1 2 2015-11-10 13:05:02.364 20
2 3 2015-11-10 13:05:22.364 42