Original question: http://stackoverflow.com/questions/16628819/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not me).
Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone
Asked by joris
You can use the function tz_localize to make a Timestamp or DatetimeIndex timezone-aware, but how can you do the opposite: how can you convert a timezone-aware Timestamp to a naive one, while preserving the local time of its timezone?
An example:
In [82]: t = pd.date_range(start="2013-05-18 12:00:00", periods=10, freq='s', tz="Europe/Brussels")
In [83]: t
Out[83]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: S, Timezone: Europe/Brussels
I could remove the timezone by setting it to None, but then the result is converted to UTC (12 o'clock became 10):
In [86]: t.tz = None
In [87]: t
Out[87]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 10:00:00, ..., 2013-05-18 10:00:09]
Length: 10, Freq: S, Timezone: None
Is there another way to convert a DatetimeIndex to timezone-naive while preserving the local time of the timezone it was set in?
Some context on the reason I am asking this: I want to work with timezone-naive timeseries (to avoid the extra hassle with timezones, and I do not need them for the case I am working on).
But for some reason, I have to deal with a timezone-aware timeseries in my local timezone (Europe/Brussels). As all my other data are timezone-naive (but represented in my local timezone), I want to convert this timeseries to naive to work with it further, but it also has to be represented in my local timezone (so just remove the timezone info, without converting the user-visible time to UTC).
I know the time is actually stored internally as UTC and only converted to another timezone when you represent it, so there has to be some kind of conversion when I want to "delocalize" it. For example, with the Python datetime module you can "remove" the timezone like this:
In [119]: d = pd.Timestamp("2013-05-18 12:00:00", tz="Europe/Brussels")
In [120]: d
Out[120]: <Timestamp: 2013-05-18 12:00:00+0200 CEST, tz=Europe/Brussels>
In [121]: d.replace(tzinfo=None)
Out[121]: <Timestamp: 2013-05-18 12:00:00> 
So, based on this, I could do the following, but I suppose this will not be very efficient when working with a larger timeseries:
In [124]: t
Out[124]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: S, Timezone: Europe/Brussels
In [125]: pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
Out[125]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: None, Timezone: None
Accepted answer by joris
To answer my own question: this functionality has been added to pandas in the meantime. Starting from pandas 0.15.0, you can use tz_localize(None) to remove the timezone, resulting in naive local time.
See the whatsnew entry: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#timezone-handling-improvements
So with my example from above:
In [4]: t = pd.date_range(start="2013-05-18 12:00:00", periods=2, freq='H',
                          tz= "Europe/Brussels")
In [5]: t
Out[5]: DatetimeIndex(['2013-05-18 12:00:00+02:00', '2013-05-18 13:00:00+02:00'],
                       dtype='datetime64[ns, Europe/Brussels]', freq='H')
Using tz_localize(None) removes the timezone information, resulting in naive local time:
In [6]: t.tz_localize(None)
Out[6]: DatetimeIndex(['2013-05-18 12:00:00', '2013-05-18 13:00:00'], 
                      dtype='datetime64[ns]', freq='H')
Further, you can also use tz_convert(None) to remove the timezone information while converting to UTC, yielding naive UTC time:
In [7]: t.tz_convert(None)
Out[7]: DatetimeIndex(['2013-05-18 10:00:00', '2013-05-18 11:00:00'], 
                      dtype='datetime64[ns]', freq='H')
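The difference between the two calls can be checked side by side. The following is a minimal sketch, assuming a pandas version of 0.15.0 or later; the variable names local_naive and utc_naive are illustrative only and do not come from the original answer.

```python
import pandas as pd

# Two timestamps in Brussels local time (UTC+2 in May 2013).
t = pd.to_datetime(
    ["2013-05-18 12:00:00", "2013-05-18 13:00:00"]
).tz_localize("Europe/Brussels")

local_naive = t.tz_localize(None)  # drop the timezone, keep the wall-clock time
utc_naive = t.tz_convert(None)     # convert to UTC first, then drop the timezone

print(local_naive[0])  # 2013-05-18 12:00:00
print(utc_naive[0])    # 2013-05-18 10:00:00
```

Both results are timezone-naive; they differ only in which clock the remaining values refer to.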
This is much more performant than the datetime.replace solution:
In [31]: t = pd.date_range(start="2013-05-18 12:00:00", periods=10000, freq='H',
                           tz="Europe/Brussels")
In [32]: %timeit t.tz_localize(None)
1000 loops, best of 3: 233 μs per loop
In [33]: %timeit pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
10 loops, best of 3: 99.7 ms per loop
Answer by D. A.
I think you can't achieve what you want in a more efficient manner than you proposed.
The underlying problem is that the timestamps (as you seem aware) are made up of two parts: the data that represents the UTC time, and the timezone, tz_info. The timezone information is used only for display purposes when printing the timestamp to the screen. At display time, the data is offset appropriately and +01:00 (or similar) is added to the string. Stripping off the tz_info value (using tz_convert(tz=None)) doesn't actually change the data that represents the naive part of the timestamp.
So, the only way to do what you want is either to modify the underlying data (pandas doesn't allow this; DatetimeIndex objects are immutable, see the help on DatetimeIndex) or to create a new set of timestamp objects and wrap them in a new DatetimeIndex. Your solution does the latter:
pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
For reference, here is the replace method of Timestamp (see tslib.pyx):
def replace(self, **kwds):
    return Timestamp(datetime.replace(self, **kwds),
                     offset=self.offset)
You can refer to the docs on datetime.datetime to see that datetime.datetime.replace also creates a new object.
If you can, your best bet for efficiency is to modify the source of the data so that it (incorrectly) reports the timestamps without their timezone. You mentioned:
I want to work with timezone naive timeseries (to avoid the extra hassle with timezones, and I do not need them for the case I am working on)
I'd be curious what extra hassle you are referring to. As a general rule for all software development, I recommend keeping your timestamp 'naive values' in UTC. There is little worse than looking at two different int64 values and wondering which timezone they belong to. If you always, always, always use UTC for the internal storage, then you will avoid countless headaches. My mantra is: timezones are for human I/O only.
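The "UTC for storage, timezones for I/O" rule could look roughly like this in pandas. This is only a sketch of the idea under the assumption that values are kept as tz-aware UTC internally; the names stored and shown are illustrative.

```python
import pandas as pd

# Internal storage: tz-aware UTC values.
stored = pd.to_datetime(["2013-05-18 10:00:00"]).tz_localize("UTC")

# Human I/O: convert to the viewer's timezone only when displaying.
shown = stored.tz_convert("Europe/Brussels")

print(shown[0])  # 2013-05-18 12:00:00+02:00
```

The conversion changes only the representation: shown and stored still refer to the same instants.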
Answer by Hyman Kelly
Building on D.A.'s suggestion that "the only way to do what you want is to modify the underlying data" and using numpy to modify the underlying data...
This works for me, and is pretty fast:
import numpy as np
import pandas as pd

def tz_to_naive(datetime_index):
    """Converts a tz-aware DatetimeIndex into a tz-naive DatetimeIndex,
    effectively baking the timezone into the internal representation.
    Parameters
    ----------
    datetime_index : pandas.DatetimeIndex, tz-aware
    Returns
    -------
    pandas.DatetimeIndex, tz-naive
    """
    # Calculate timezone offset relative to UTC
    timestamp = datetime_index[0]
    tz_offset = (timestamp.replace(tzinfo=None) - 
                 timestamp.tz_convert('UTC').replace(tzinfo=None))
    tz_offset_td64 = np.timedelta64(tz_offset)
    # Now convert to naive DatetimeIndex
    return pd.DatetimeIndex(datetime_index.values + tz_offset_td64)
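One caveat worth checking: the function above derives a single UTC offset from the first element, so an index that spans a daylight-saving transition would get the wrong local time on one side of it. The sketch below illustrates this with two instants around the end of DST in Brussels on 2013-10-27. The names fixed_offset, shifted, and builtin are illustrative, and the single-offset shift is re-implemented here with Timestamp methods rather than by calling the function itself.

```python
import pandas as pd

# Two instants on either side of the end of DST in Brussels (2013-10-27).
utc = pd.to_datetime(
    ["2013-10-26 12:00:00", "2013-10-27 12:00:00"]
).tz_localize("UTC")
idx = utc.tz_convert("Europe/Brussels")

# Single offset taken from the first element only (+2 hours, CEST).
first = idx[0]
fixed_offset = first.tz_localize(None) - first.tz_convert(None)

shifted = idx.tz_convert(None) + fixed_offset  # one offset applied to every value
builtin = idx.tz_localize(None)                # per-value local wall-clock time

print(shifted[1])  # 2013-10-27 14:00:00 (off by one hour)
print(builtin[1])  # 2013-10-27 13:00:00 (CET, UTC+1)
```

For data confined to a single offset the two approaches agree; tz_localize(None) is the safer choice when the index may cross a transition.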
Answer by filmor
Setting the tz attribute of the index explicitly seems to work:
ts_utc = ts.tz_convert("UTC")
ts_utc.index.tz = None
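Assigning to the tz attribute directly may be rejected by newer pandas versions, so a method-based equivalent is sketched here. It assumes ts is a Series with a tz-aware DatetimeIndex, as in the snippet above; the sample data and the names ts_utc_naive and ts_local_naive are made up for illustration.

```python
import pandas as pd

idx = pd.to_datetime(
    ["2013-05-18 12:00:00", "2013-05-18 13:00:00"]
).tz_localize("Europe/Brussels")
ts = pd.Series([1.0, 2.0], index=idx)

# Convert the index to UTC, then drop the timezone (naive UTC).
ts_utc_naive = ts.tz_convert("UTC").tz_localize(None)

# Drop the timezone while keeping the local wall-clock time (naive local).
ts_local_naive = ts.tz_localize(None)
```

Series.tz_convert and Series.tz_localize act on the index and return a new Series, leaving ts unchanged.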
Answer by Yuchao Jiang
The most important thing is to add tzinfo when you define a datetime object.
from datetime import datetime, timezone
from tzinfo_examples import HOUR, Eastern
u0 = datetime(2016, 3, 13, 5, tzinfo=timezone.utc)
for i in range(4):
     u = u0 + i*HOUR
     t = u.astimezone(Eastern)
     print(u.time(), 'UTC =', t.time(), t.tzname())
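The tzinfo_examples module imported above appears to be the sample file from the Python datetime documentation rather than an installable package. A variant using only the standard library might look like the following sketch, which assumes Python 3.9+ (for zoneinfo) and an available timezone database; the rows list and the naive_local name are additions for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

eastern = ZoneInfo("America/New_York")
u0 = datetime(2016, 3, 13, 5, tzinfo=timezone.utc)

rows = []
for i in range(4):
    u = u0 + timedelta(hours=i)
    t = u.astimezone(eastern)             # aware local time
    naive_local = t.replace(tzinfo=None)  # same wall-clock time, tz dropped
    rows.append((u.time(), t.tzname(), naive_local))
    print(u.time(), "UTC =", t.time(), t.tzname())
```

The loop crosses the start of US daylight saving time on 2016-03-13, so the zone name switches from EST to EDT partway through.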
Answer by Juan A. Navarro
Because I always struggle to remember, here is a quick summary of what each of these does:
>>> pd.Timestamp.now()  # naive local time
Timestamp('2019-10-07 10:30:19.428748')
>>> pd.Timestamp.utcnow()  # tz aware UTC
Timestamp('2019-10-07 08:30:19.428748+0000', tz='UTC')
>>> pd.Timestamp.now(tz='Europe/Brussels')  # tz aware local time
Timestamp('2019-10-07 10:30:19.428748+0200', tz='Europe/Brussels')
>>> pd.Timestamp.now(tz='Europe/Brussels').tz_localize(None)  # naive local time
Timestamp('2019-10-07 10:30:19.428748')
>>> pd.Timestamp.now(tz='Europe/Brussels').tz_convert(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')
>>> pd.Timestamp.utcnow().tz_localize(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')
>>> pd.Timestamp.utcnow().tz_convert(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')
Answer by tozCSS
The accepted solution does not work when there are multiple different timezones in a Series. It throws ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
The solution is to use the apply method.
Please see the examples below:
# Let's have a series `a` with multiple different timezones.
> a
0    2019-10-04 16:30:00+02:00
1    2019-10-07 16:00:00-04:00
2    2019-09-24 08:30:00-07:00
Name: localized, dtype: object
> a.iloc[0]
Timestamp('2019-10-04 16:30:00+0200', tz='Europe/Amsterdam')
# trying the accepted solution
> a.dt.tz_localize(None)
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
# Make it tz-naive. This is the solution:
> a.apply(lambda x:x.tz_localize(None))
0   2019-10-04 16:30:00
1   2019-10-07 16:00:00
2   2019-09-24 08:30:00
Name: localized, dtype: datetime64[ns]
# a.tz_convert() also does not work with multiple timezones, but this works:
> a.apply(lambda x:x.tz_convert('America/Los_Angeles'))
0   2019-10-04 07:30:00-07:00
1   2019-10-07 13:00:00-07:00
2   2019-09-24 08:30:00-07:00
Name: localized, dtype: datetime64[ns, America/Los_Angeles]
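For anyone wanting to reproduce this, a Series like `a` can be built by hand. The sketch below assumes the zones Europe/Amsterdam, America/New_York, and America/Los_Angeles for the three rows (only the first zone is shown in the output above; the other two are guesses matching the printed offsets). It also shows pd.to_datetime with utc=True, the option named in the error message, as a way to normalise everything to UTC in one call.

```python
import pandas as pd

# A Series holding timestamps in three different timezones.
a = pd.Series(
    [
        pd.Timestamp("2019-10-04 16:30:00", tz="Europe/Amsterdam"),
        pd.Timestamp("2019-10-07 16:00:00", tz="America/New_York"),
        pd.Timestamp("2019-09-24 08:30:00", tz="America/Los_Angeles"),
    ],
    name="localized",
)

# Element-wise: drop each timezone, keeping each local wall-clock time.
local_naive = a.apply(lambda ts: ts.tz_localize(None))

# Alternative: convert every element to UTC in one vectorised call.
utc_aware = pd.to_datetime(a, utc=True)
```

Note that local_naive mixes wall-clock times from different zones, so the values are no longer comparable as instants; utc_aware keeps them comparable.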

