Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/17159207/

Time: 2020-08-19 00:38:15  Source: igfitidea

Change timezone of date-time column in pandas and add as hierarchical index

Tags: python, timezone, dataframe, pandas, multi-index

Question by Erik Shilts

I have data with a timestamp in UTC. I'd like to convert this timestamp to the 'US/Pacific' timezone and add it as a hierarchical index to a pandas DataFrame. I've been able to convert the timestamps once they are an Index, but they lose the timezone information when I try to add them back into the DataFrame, either as a column or as an index.

>>> import pandas as pd
>>> dat = pd.DataFrame({'label':['a', 'a', 'a', 'b', 'b', 'b'], 'datetime':['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00', '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'], 'value':range(6)})
>>> dat.dtypes
#datetime    object
#label       object
#value        int64
#dtype: object

Now if I try to convert the Series directly I run into an error.

>>> times = pd.to_datetime(dat['datetime'])
>>> times.tz_localize('UTC')
#Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#  File "/Users/erikshilts/workspace/schedule-detection/python/pysched/env/lib/python2.7/site-packages/pandas/core/series.py", line 3170, in tz_localize
#    raise Exception('Cannot tz-localize non-time series')
#Exception: Cannot tz-localize non-time series
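In the pandas version used here, `Series.tz_localize` works on the Series' index rather than on its values. That is why it rejects a Series whose index is a plain integer index. In current pandas, the `.dt` accessor localizes and converts the values themselves. The following is a minimal sketch and assumes a recent pandas release:

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00',
                 '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'],
    'value': range(6),
})

# Parse the strings into naive datetime64 values (the index stays 0..5).
times = pd.to_datetime(dat['datetime'])

# .dt acts on the values: mark them as UTC, then convert them to US/Pacific.
times_pacific = times.dt.tz_localize('UTC').dt.tz_convert('US/Pacific')
print(times_pacific)
```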

If I convert it to an Index then I can manipulate it as a timeseries. Notice that the index now has the Pacific timezone.

>>> times_index = pd.Index(times)
>>> times_index_pacific = times_index.tz_localize('UTC').tz_convert('US/Pacific')
>>> times_index_pacific
#<class 'pandas.tseries.index.DatetimeIndex'>
#[2011-07-19 00:00:00, ..., 2011-07-19 02:00:00]
#Length: 6, Freq: None, Timezone: US/Pacific

However, now I run into problems adding the index back to the dataframe as it loses its timezone formatting:

>>> dat_index = dat.set_index([dat['label'], times_index_pacific])
>>> dat_index
#                                      datetime label  value
#label                                                      
#a     2011-07-19 07:00:00  2011-07-19 07:00:00     a      0
#      2011-07-19 08:00:00  2011-07-19 08:00:00     a      1
#      2011-07-19 09:00:00  2011-07-19 09:00:00     a      2
#b     2011-07-19 07:00:00  2011-07-19 07:00:00     b      3
#      2011-07-19 08:00:00  2011-07-19 08:00:00     b      4
#      2011-07-19 09:00:00  2011-07-19 09:00:00     b      5

You'll notice the index is back on the UTC timezone instead of the converted Pacific timezone.

How can I change the timezone and add it as an index to a DataFrame?

Accepted answer by mweerden

By now this has been fixed. For example, you can now call:

dataframe.tz_localize('UTC', level=0)

You'll have to call it twice for the given example, though. (I.e., once for each level.)
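In the question's MultiIndex, the string labels are level 0 and the datetimes are level 1. This sketch therefore targets level 1 with both calls: it localizes that level to UTC and then converts it. It assumes a recent pandas release, where `DataFrame.tz_localize` and `DataFrame.tz_convert` both accept a `level` argument:

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00',
                 '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'],
    'value': range(6),
})
dat['datetime'] = pd.to_datetime(dat['datetime'])

# Two-level index: level 0 = label (strings), level 1 = naive UTC datetimes.
dat = dat.set_index(['label', 'datetime'])

# Localize level 1 to UTC, then convert that same level to US/Pacific.
dat = dat.tz_localize('UTC', level=1).tz_convert('US/Pacific', level=1)
print(dat)
```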

Answer by Andy Hayden

If you set it as the index, it's automatically converted to an Index:

In [11]: dat.index = pd.to_datetime(dat.pop('datetime'))

In [12]: dat
Out[12]:
                    label  value
datetime
2011-07-19 07:00:00     a      0
2011-07-19 08:00:00     a      1
2011-07-19 09:00:00     a      2
2011-07-19 07:00:00     b      3
2011-07-19 08:00:00     b      4
2011-07-19 09:00:00     b      5

Then localize the index to UTC and convert it to US/Pacific:

In [12]: dat.index = dat.index.tz_localize('UTC').tz_convert('US/Pacific')

In [13]: dat
Out[13]:
                          label  value
datetime
2011-07-19 00:00:00-07:00     a      0
2011-07-19 01:00:00-07:00     a      1
2011-07-19 02:00:00-07:00     a      2
2011-07-19 00:00:00-07:00     b      3
2011-07-19 01:00:00-07:00     b      4
2011-07-19 02:00:00-07:00     b      5

And then you can append the label column to the index:

Hmmm, this is definitely a bug! The datetime level loses its timezone and the times revert to UTC:

In [14]: dat.set_index('label', append=True).swaplevel(0, 1)
Out[14]:
                           value
label datetime
a     2011-07-19 07:00:00      0
      2011-07-19 08:00:00      1
      2011-07-19 09:00:00      2
b     2011-07-19 07:00:00      3
      2011-07-19 08:00:00      4
      2011-07-19 09:00:00      5
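This behaviour appears to be fixed in later pandas releases, where appending a level keeps a timezone-aware datetime level intact. The check below repeats the steps above and assumes a recent pandas version and the question's data:

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00',
                 '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'],
    'value': range(6),
})

# Datetime index (named 'datetime'), localized to UTC and converted to US/Pacific.
dat.index = pd.to_datetime(dat.pop('datetime'))
dat.index = dat.index.tz_localize('UTC').tz_convert('US/Pacific')

# Append 'label' as a second level, then move it to the front.
dat1 = dat.set_index('label', append=True).swaplevel(0, 1)
print(dat1.index.get_level_values('datetime').tz)
```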

A hacky workaround is to convert the (datetime) level directly (when it's already a MultiIndex):

In [15]: dat1 = dat.set_index('label', append=True).swaplevel(0, 1)

In [16]: dat1.index.levels[1] = dat1.index.levels[1].tz_localize('UTC').tz_convert('US/Pacific')

In [17]: dat1
Out[17]:
                                 value
label datetime
a     2011-07-19 00:00:00-07:00      0
      2011-07-19 01:00:00-07:00      1
      2011-07-19 02:00:00-07:00      2
b     2011-07-19 00:00:00-07:00      3
      2011-07-19 01:00:00-07:00      4
      2011-07-19 02:00:00-07:00      5
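Assigning to `levels` only worked in old pandas. As the next answer points out, the levels are immutable in later versions. The sketch below shows the same idea with `MultiIndex.set_levels`, which returns a new MultiIndex rather than modifying the existing one. It assumes a recent pandas release and converts only the unique level values, not every row:

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00',
                 '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'],
    'value': range(6),
})
dat['datetime'] = pd.to_datetime(dat['datetime'])

# MultiIndex with naive (implicitly UTC) datetimes at level 1.
dat1 = dat.set_index(['label', 'datetime'])

# Convert the unique values of level 1, then build a new index around them.
new_level = dat1.index.levels[1].tz_localize('UTC').tz_convert('US/Pacific')
dat1.index = dat1.index.set_levels(new_level, level=1)
print(dat1)
```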

Answer by ivrin

The workaround does not seem to work because the index levels of a hierarchical index seem to be immutable (FrozenList is immutable).

Starting with a single index and appending also does not work.

Applying a lambda that casts each element of the Series returned by to_datetime() to a Timestamp and converts it also does not work.

Is there a way to create timezone-aware Series and then insert them into a DataFrame or make them an index?

# joined_event_df, 'pandasTime' and 'sequence' are the poster's own DataFrame and columns
joined_event_df = joined_event_df.set_index('pandasTime')
# a single-level index has no level 1, so localize/convert the index directly
joined_event_df.index = joined_event_df.index.tz_localize('UTC').tz_convert('US/Central')
# we have tz-awareness above this line
joined_event_df = joined_event_df.set_index('sequence', append=True)
# we lose tz-awareness in the index as soon as we add another index level
joined_event_df = joined_event_df.swaplevel(0, 1)
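In recent pandas releases the answer appears to be yes. The sketch below assumes such a release and uses the question's data instead of the poster's `joined_event_df`. It first makes the column timezone-aware with the `.dt` accessor, then builds the MultiIndex from both columns in a single `set_index` call:

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00',
                 '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'],
    'value': range(6),
})

# Make the column itself timezone-aware before it becomes part of an index.
dat['datetime'] = (pd.to_datetime(dat['datetime'])
                   .dt.tz_localize('UTC')
                   .dt.tz_convert('US/Pacific'))

# Both columns become index levels; the datetime level keeps its timezone.
dat = dat.set_index(['label', 'datetime'])
print(dat)
```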

Answer by Mark Horvath

Another workaround, which works in pandas 0.13.1 and avoids the "FrozenList cannot be assigned" problem:

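import pandas

# pandas 0.13.1: `index` is a MultiIndex with two naive-UTC datetime levels, `tz` the target zone.
# Later releases forbid assigning to `levels`; MultiIndex.set_levels replaces this there.
index.levels = pandas.core.base.FrozenList([
    index.levels[0].tz_localize('UTC').tz_convert(tz),
    index.levels[1].tz_localize('UTC').tz_convert(tz)
])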

I struggled a lot with this issue; MultiIndex loses the tz in many other situations too.
