Original question: http://stackoverflow.com/questions/15115547/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Find closest row of DataFrame to given time in Pandas
Asked by robintw
I have a Pandas dataframe which is indexed by a DatetimeIndex:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 53732 entries, 1993-01-07 12:23:58 to 2012-12-02 20:06:23
Data columns:
Date(dd-mm-yy)_Time(hh-mm-ss) 53732 non-null values
Julian_Day 53732 non-null values
AOT_870 53732 non-null values
440-870Angstrom 53732 non-null values
440-675Angstrom 53732 non-null values
500-870Angstrom 53732 non-null values
Last_Processing_Date(dd/mm/yyyy) 53732 non-null values
Solar_Zenith_Angle 53732 non-null values
time 53732 non-null values
dtypes: datetime64[ns](2), float64(6), object(1)
I want to find the row that is closest to a certain time:
image_time = dateutil.parser.parse('2009-07-28 13:39:02')
and find how close it is. So far, I have tried various things based upon the idea of subtracting the time I want from all of the times and finding the smallest absolute value, but none quite seem to work.
For example:
aeronet.index - image_time
Gives an error which I think is due to +/- on a Datetime index shifting things, so I tried putting the index into another column and then working on that:
aeronet['time'] = aeronet.index
aeronet.time - image_time
This seems to work, but to do what I want I need the ABSOLUTE time difference, not the relative difference. However, just running abs or np.abs on it gives an error:
abs(aeronet.time - image_time)
C:\Python27\lib\site-packages\pandas\core\series.pyc in __repr__(self)
1061 Yields Bytestring in Py2, Unicode String in py3.
1062 """
-> 1063 return str(self)
1064
1065 def _tidy_repr(self, max_vals=20):
C:\Python27\lib\site-packages\pandas\core\series.pyc in __str__(self)
1021 if py3compat.PY3:
1022 return self.__unicode__()
-> 1023 return self.__bytes__()
1024
1025 def __bytes__(self):
C:\Python27\lib\site-packages\pandas\core\series.pyc in __bytes__(self)
1031 """
1032 encoding = com.get_option("display.encoding")
-> 1033 return self.__unicode__().encode(encoding, 'replace')
1034
1035 def __unicode__(self):
C:\Python27\lib\site-packages\pandas\core\series.pyc in __unicode__(self)
1044 else get_option("display.max_rows"))
1045 if len(self.index) > (max_rows or 1000):
-> 1046 result = self._tidy_repr(min(30, max_rows - 4))
1047 elif len(self.index) > 0:
1048 result = self._get_repr(print_header=True,
C:\Python27\lib\site-packages\pandas\core\series.pyc in _tidy_repr(self, max_vals)
1069 """
1070 num = max_vals // 2
-> 1071 head = self[:num]._get_repr(print_header=True, length=False,
1072 name=False)
1073 tail = self[-(max_vals - num):]._get_repr(print_header=False,
AttributeError: 'numpy.ndarray' object has no attribute '_get_repr'
Am I approaching this the right way? If so, how should I get abs to work, so that I can then select the minimum absolute time difference and thus get the closest time? If not, what is the best way to do this with a Pandas time series?
Accepted answer by Chang She
I think you can try DatetimeIndex.asof to find the most recent label up to and including the input. Then use the returned datetime to select the appropriate row.
If you only need values for a particular column, Series.asof exists and combines the two steps above into one.
This assumes you want the closest datetime. If you don't care about the date and just want the same time every day, use at_time on the DataFrame.
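As a rough illustration of the asof approach described above, here is a minimal sketch on a made-up three-row frame (the data, the AOT_870 values and the variable names are assumptions for the example, not the asker's data). Note that asof only looks backwards, so it returns the latest label at or before the input rather than the nearest one in either direction, and it expects a sorted index.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `aeronet` frame.
idx = pd.to_datetime(
    ["2009-07-28 13:00:00", "2009-07-28 13:30:00", "2009-07-28 14:00:00"]
)
df = pd.DataFrame({"AOT_870": [0.10, 0.20, 0.30]}, index=idx)
image_time = pd.Timestamp("2009-07-28 13:39:02")

# Step 1: most recent index label at or before image_time.
label = df.index.asof(image_time)

# Step 2: use that label to select the row.
row = df.loc[label]

# For a single column, Series.asof does both steps at once.
value = df["AOT_870"].asof(image_time)

# at_time ignores the date and matches on time of day only.
same_time_rows = df.at_time("13:30")
```

Here label is the 13:30:00 entry, because 14:00:00 lies after image_time and is never considered.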
Follow up:
Edit: false alarm, I had an older version locally. The latest on master should work with np.abs.
In [10]: np.abs(df.time - image_time)
Out[10]:
0 27 days, 13:39:02
1 26 days, 13:39:02
2 25 days, 13:39:02
3 24 days, 13:39:02
4 23 days, 13:39:02
5 22 days, 13:39:02
Also just to clarify:
aeronet.index - image_time doesn't work because subtraction on Index is a set difference (back in the day Index used to be constrained to be unique).
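For reference, in recent pandas versions subtracting a timestamp from a DatetimeIndex is element-wise and yields a TimedeltaIndex, so the absolute-difference idea from the question can be written directly. A minimal sketch on made-up data (the frame and its values are assumptions for illustration), which also reports how close the match is:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `aeronet` frame.
idx = pd.to_datetime(
    ["2009-07-28 13:00:00", "2009-07-28 13:30:00", "2009-07-28 14:00:00"]
)
df = pd.DataFrame({"AOT_870": [0.10, 0.20, 0.30]}, index=idx)
image_time = pd.Timestamp("2009-07-28 13:39:02")

# Element-wise subtraction gives a TimedeltaIndex; abs() drops the sign.
deltas = abs(df.index - image_time)

# Integer position of the smallest absolute difference.
i = deltas.argmin()

closest_row = df.iloc[i]  # the row nearest to image_time
gap = deltas[i]           # how far away that row is
```

Unlike asof, this considers rows on both sides of image_time and does not require a sorted index, at the cost of scanning every entry.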
Answer by cmeeren
This simple method returns the integer position of the DatetimeIndex entry closest to a given datetime object. There's no need to copy the index to a regular column; simply use the .to_pydatetime method instead.
import numpy as np
i = np.argmin(np.abs(df.index.to_pydatetime() - image_time))
Then you simply use the DataFrame's .iloc indexer:
df.iloc[i]
Here's a function to do this:
def fcl(df, dtObj):
    return df.iloc[np.argmin(np.abs(df.index.to_pydatetime() - dtObj))]
You can then further filter seamlessly, e.g.
fcl(df, dtObj)['column']
Answer by silgon
I was facing the same problem today. I wanted a function that gives me the closest value prior to a given timestamp. Here is the function I came up with:
def get_nearest_past(data, timestamp):
    index = data.index.get_loc(timestamp, "ffill")
    return data.iloc[index]
In the case that you need the global closest (and not the closest before as in my case), you can use:
def get_nearest(data, timestamp):
    index = data.index.get_loc(timestamp, "nearest")
    return data.iloc[index]
You can find more information in the get_loc documentation.
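One caveat for newer installations: the method argument of Index.get_loc was deprecated and then removed in pandas 2.0, so the two functions above only run on older versions. A possible equivalent, sketched here with Index.get_indexer on made-up data (the combined helper and its method parameter are my own arrangement, not part of the original answer), assuming the index is sorted and has no duplicate labels:

```python
import pandas as pd

def get_nearest(data, timestamp, method="nearest"):
    # get_indexer takes a list of targets and returns an array of integer
    # positions; -1 means no match (e.g. "ffill" before the first row).
    pos = data.index.get_indexer([timestamp], method=method)[0]
    if pos == -1:
        raise KeyError(timestamp)
    return data.iloc[pos]

# Hypothetical sample data for illustration.
idx = pd.to_datetime(
    ["2009-07-28 13:00:00", "2009-07-28 13:30:00", "2009-07-28 14:00:00"]
)
df = pd.DataFrame({"AOT_870": [0.10, 0.20, 0.30]}, index=idx)
t = pd.Timestamp("2009-07-28 13:50:00")

nearest_row = get_nearest(df, t)                # closest on either side
past_row = get_nearest(df, t, method="ffill")   # closest at or before t
```

With this timestamp the two calls differ: nearest picks the 14:00:00 row (10 minutes away), while ffill picks the 13:30:00 row because it never looks forward.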

