Original source: http://stackoverflow.com/questions/13040312/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Selecting the first index after a certain timestamp with a pandas TimeSeries
Asked by Arthur B.
This is a two-part question, with an immediate question and a more general one.
I have a pandas TimeSeries, ts. To get the first value after a certain time, I can do this:
ts.ix[ts[datetime(2012,1,1,15,0,0):].first_valid_index()]
a) Is there a better, less clunky way to do it?
b) Coming from C, I have a certain phobia when dealing with these somewhat opaque, possibly mutable but generally not, possibly lazy but not always types. So to be clear, when I do
ts[datetime(2012,1,1,15,0,0):].first_valid_index()
ts[datetime(2012,1,1,15,0,0):] is a pandas.TimeSeries object right? And I could possibly mutate it.
Does it mean that whenever I take a slice, there's a copy of ts being allocated in memory? Does it mean that this innocuous line of code could actually trigger the copy of a gigabyte of TimeSeries just to get an index value?
Or perhaps they magically share memory, and a lazy copy is made if, for instance, one of the objects is mutated? But then, how do you know which specific operations trigger a copy? Maybe not slicing, but how about renaming columns? The documentation doesn't seem to say. Does that bother you? Should it bother me, or should I just learn not to worry and catch problems with a profiler?
Answered by Aman
Some setup:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: from datetime import datetime
In [4]: dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7), datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
In [5]: ts = pd.Series(np.random.randn(6), index=dates)
In [6]: ts
Out[6]:
2011-01-02 -0.412335
2011-01-05 -0.809092
2011-01-07 -0.442320
2011-01-08 -0.337281
2011-01-10 0.522765
2011-01-12 1.559876
Okay, now to answer your first question, a) yes, there are less clunky ways, depending on your intention. This is pretty simple:
In [9]: ts[datetime(2011, 1, 8):]
Out[9]:
2011-01-08 -0.337281
2011-01-10 0.522765
2011-01-12 1.559876
This is a slice containing all the values on or after your chosen date. You can select just the first one, as you wanted, with:
In [10]: ts[datetime(2011, 1, 8):][0]
Out[10]: -0.33728079849770815
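If only the first entry at or after a timestamp is needed, the position can also be looked up on the index directly, without building the intermediate slice. The following is a minimal sketch rather than part of the original answer: the helper name first_at_or_after is hypothetical, and it assumes the index is a DatetimeIndex sorted in ascending order.

```python
import numpy as np
import pandas as pd


def first_at_or_after(series, stamp):
    """Return (label, value) of the first entry whose index is >= stamp, or None.

    Hypothetical helper for illustration; assumes a sorted DatetimeIndex.
    """
    # searchsorted does a binary search, so the index must be sorted ascending.
    pos = series.index.searchsorted(pd.Timestamp(stamp), side="left")
    if pos >= len(series):
        return None  # every index entry lies before stamp
    return series.index[pos], series.iloc[pos]


dates = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07",
                        "2011-01-08", "2011-01-10", "2011-01-12"])
ts = pd.Series(np.arange(6, dtype=float), index=dates)

# 2011-01-06 is not in the index; the next entry is 2011-01-07.
result = first_at_or_after(ts, "2011-01-06")
print(result)
```

Passing side="right" instead would skip an exact match and return the first entry strictly after the timestamp.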
To your second question, (b): this type of indexing returns a slice (a view) of the original, just as slicing a NumPy array does. It is NOT a copy of the original. See this question, or many similar ones: Bug or feature: cloning a numpy array w/ slicing
To demonstrate, let's modify the slice:
In [21]: ts2 = ts[datetime(2011, 1, 8):]
In [23]: ts2[0] = 99
This changes the original timeseries object ts, since ts2 is a slice and not a copy.
In [24]: ts
Out[24]:
2011-01-02 -0.412335
2011-01-05 -0.809092
2011-01-07 -0.442320
2011-01-08 99.000000
2011-01-10 0.522765
2011-01-12 1.559876
If you DO want a copy, you can use the copy method (in general) or truncate (in this case):
In [25]: ts3 = ts.truncate(before='2011-01-08')
In [26]: ts3
Out[26]:
2011-01-08 99.000000
2011-01-10 0.522765
2011-01-12 1.559876
Changing this copy will not change the original.
In [27]: ts3[1] = 99
In [28]: ts3
Out[28]:
2011-01-08 99.000000
2011-01-10 99.000000
2011-01-12 1.559876
In [29]: ts  # The January 10th value will be unchanged.
Out[29]:
2011-01-02 -0.412335
2011-01-05 -0.809092
2011-01-07 -0.442320
2011-01-08 99.000000
2011-01-10 0.522765
2011-01-12 1.559876
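Whether a slice shares memory with the original can also be checked directly instead of by mutating it. The sketch below is an addition, not part of the original answer; it uses np.shares_memory on the underlying arrays and assumes a single float Series. Note that whether a write through the slice shows up in the original depends on the pandas version: releases with Copy-on-Write enabled copy lazily on the first write, so the mutation shown earlier may not propagate there.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2011-01-01", periods=6, freq="D")
ts = pd.Series(np.arange(6, dtype=float), index=dates)

# Label-based slice: no data is copied at this point.
sliced = ts.loc[pd.Timestamp("2011-01-04"):]
# Explicit deep copy: a new buffer is allocated.
copied = sliced.copy()

slice_shares = np.shares_memory(ts.to_numpy(), sliced.to_numpy())
copy_shares = np.shares_memory(ts.to_numpy(), copied.to_numpy())
print(slice_shares, copy_shares)
```

So taking a slice just to read an index value should not duplicate a large series; it is the explicit copy that allocates.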
This example is straight out of "Python for Data Analysis" by Wes McKinney. Check it out. It's great.
Answered by Marian
I don't know pandas, so here is a general answer:
You can overload anything in Python, and they must have done that there. If you define the special method __getitem__ on your class, it is called when you use obj[key] or obj[start:stop] (with just the key as the argument in the former case, and a special slice object in the latter). You can then return anything you want.
Here's an example to show how __getitem__ works:
class Foo(object):
    def __getitem__(self, k):
        if isinstance(k, slice):
            return k.start + k.stop  # properties of the slice object
        else:
            return k
This gives you:
>>> f = Foo()
>>> f[42]
42
>>> f[23:42]
65
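To see exactly what Python hands to __getitem__ in each case, the key can simply be returned unchanged. This is a small illustrative sketch; the class name ShowKey is made up for the example.

```python
class ShowKey:
    """Hypothetical class that returns whatever key the [] syntax passes in."""

    def __getitem__(self, key):
        return key


s = ShowKey()
single = s[42]        # a plain int
sliced = s[23:42]     # a slice object; step defaults to None
stepped = s[1:10:2]   # a slice object with an explicit step
pair = s[1, 2]        # comma-separated keys arrive as one tuple

print(single, sliced, stepped, pair)
```

The start and stop of a slice can be any objects, which is what lets a library accept datetime values inside the brackets.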
I assume that in your example, the __getitem__ method returns some special object, which contains the datetime objects plus a reference to the original ts object. That special object can then use that information to fetch the desired data later on, when the first_valid_index method or a similar one is called. (It does not even have to modify the original object, as your question suggested.)
TL;DR: Learn not to worry :-)
Addition: I got curious, so I implemented a minimal example of the behavior you described above:
class FilterableList(list):
    def __init__(self, *args):
        list.__init__(self, *args)
        self.filter = FilterProxy(self)

class FilterProxy(object):
    def __init__(self, parent):
        self.parent = parent
    def __getitem__(self, sl):
        if isinstance(sl, slice):
            return Filter(self.parent, sl)

class Filter(object):
    def __init__(self, parent, sl):
        self.parent = parent
        self.sl = sl
    def eval(self):
        return [e for e in self.parent if self.sl.start <= e <= self.sl.stop]
>>> l = FilterableList([4,5,6,7])
>>> f = l.filter[6:10]
>>> f.eval()
[6, 7]
>>> l.append(8)
>>> f.eval()
[6, 7, 8]

