Python Pandas: converting a row with a unix timestamp (in milliseconds) to datetime

Question

Asked by tamasgal

I need to process a large number of CSV files in which the timestamp is always a string representing the unix timestamp in milliseconds. I have not yet found a way to modify these columns efficiently.

This is what I came up with. However, it only produces a converted copy of the column, which I then have to put back into the original dataset somehow. I assume this can be done when creating the DataFrame?

import datetime
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'

df = pd.read_csv(StringIO(data))

convert = lambda x: datetime.datetime.fromtimestamp(x / 1e3)
converted_df = df['UNIXTIME'].apply(convert)

This picks the column 'UNIXTIME' and changes it from

0    1447160702320
1    1447160702364
2    1447160722364
Name: UNIXTIME, dtype: int64

into this

0   2015-11-10 14:05:02.320
1   2015-11-10 14:05:02.364
2   2015-11-10 14:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]

However, I would like to use something like pd.apply() to get the whole dataset returned with the converted column or, as I already wrote, simply create the datetimes when generating the DataFrame from the CSV.

Answer 1

Accepted answer by EdChum

You can do this as a post-processing step using to_datetime and passing the argument unit='ms':

In [5]:
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')
df

Out[5]:
   RUN                UNIXTIME  VALUE
0    1 2015-11-10 13:05:02.320     10
1    2 2015-11-10 13:05:02.364     20
2    3 2015-11-10 13:05:22.364     42
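
The question mentions that the timestamps arrive as strings. A minimal sketch of that case, assuming the column is forced to text with dtype=str purely for illustration, converts the strings to integers first and then applies the same to_datetime call:

```python
from io import StringIO
import pandas as pd

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'

# Read UNIXTIME as text to mimic string timestamps (an assumption for this sketch).
df = pd.read_csv(StringIO(data), dtype={'UNIXTIME': str})

# Turn the strings into integers, then interpret them as milliseconds since the epoch.
df['UNIXTIME'] = pd.to_datetime(pd.to_numeric(df['UNIXTIME']), unit='ms')
print(df)
```

The other columns are left untouched, so the whole DataFrame is available with the converted column in place.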

Answer 2

Answer by tamasgal

I think I came up with a solution:

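import datetime

# read_csv passes the raw strings to the parser, hence the float() conversion.
convert = lambda x: datetime.datetime.fromtimestamp(float(x) / 1e3)

# Note: date_parser is deprecated as of pandas 2.0.
df = pd.read_csv(StringIO(data), parse_dates=['UNIXTIME'], date_parser=convert)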

I'm still not sure if this is the best one though.
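
One possible alternative when many files need the same treatment is to wrap the read and the conversion in a small helper. This is only a sketch; the function name read_csv_with_ms_timestamps and its column parameter are hypothetical and not part of pandas:

```python
from io import StringIO
import pandas as pd

def read_csv_with_ms_timestamps(source, column='UNIXTIME'):
    """Read a CSV and convert `column` from epoch milliseconds to datetime."""
    df = pd.read_csv(source)
    # Vectorised conversion of the whole column; avoids a per-row Python callback.
    df[column] = pd.to_datetime(df[column], unit='ms')
    return df

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'
df = read_csv_with_ms_timestamps(StringIO(data))
print(df.dtypes)
```

Unlike fromtimestamp, which returns local time, to_datetime with unit='ms' yields naive timestamps that correspond to UTC.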


Answer 3

Answer by Teudimundo

I use @EdChum's solution, but I add timezone handling:

df['UNIXTIME'] = pd.DatetimeIndex(pd.to_datetime(df['UNIXTIME'], unit='ms'))\
                 .tz_localize('UTC' )\
                 .tz_convert('America/New_York')

tz_localize indicates that the timestamps should be interpreted as UTC; tz_convert then moves the date/time to the desired timezone (in this case 'America/New_York').

Note that the result is converted to a DatetimeIndex because the tz_ methods work only on the index of a Series. Since pandas 0.15 one can use .dt instead:

df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')\
                 .dt.tz_localize('UTC' )\
                 .dt.tz_convert('America/New_York')
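
As a variation on the same idea, to_datetime also accepts utc=True, which should make the explicit tz_localize step unnecessary. A minimal sketch, using a small hand-built DataFrame as a stand-in for the CSV data:

```python
import pandas as pd

# Stand-in for the UNIXTIME column from the question.
df = pd.DataFrame({'UNIXTIME': [1447160702320, 1447160702364, 1447160722364]})

# utc=True returns timezone-aware UTC timestamps directly.
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms', utc=True)

# Shift the wall-clock representation to the target timezone.
df['UNIXTIME'] = df['UNIXTIME'].dt.tz_convert('America/New_York')
print(df)
```

The underlying instants are unchanged; only the timezone used to display them differs.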

Answer 4

Answer by cs95

If you know the timestamp unit, use Series.astype:

df['UNIXTIME'].astype('datetime64[ms]')

0   2015-11-10 13:05:02.320
1   2015-11-10 13:05:02.364
2   2015-11-10 13:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]

To return the entire DataFrame, use

df.astype({'UNIXTIME': 'datetime64[ms]'})

   RUN                UNIXTIME  VALUE
0    1 2015-11-10 13:05:02.320     10
1    2 2015-11-10 13:05:02.364     20
2    3 2015-11-10 13:05:22.364     42
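
A quick way to check that astype and to_datetime agree is to compare their results on the same integers. This is a sketch on a hand-built Series; the dtype reported for the astype result may be datetime64[ms] or datetime64[ns] depending on the pandas version, so only the values are compared:

```python
import pandas as pd

# Stand-in for the UNIXTIME column from the question.
s = pd.Series([1447160702320, 1447160702364, 1447160722364], name='UNIXTIME')

via_astype = s.astype('datetime64[ms]')
via_to_datetime = pd.to_datetime(s, unit='ms')

# Both approaches should describe the same instants.
same = bool((via_astype == via_to_datetime).all())
print(same)
```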
