Pandas dataframe first instance of value in column
Original question: http://stackoverflow.com/questions/43635660/
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Asked by warrenfitzhenry
I have a DataFrame df:
                  Voltage
01-02-2017 00:00     13.1
01-02-2017 00:01     13.2
01-02-2017 00:02     13.3
01-02-2017 00:03     14.1
01-02-2017 00:04     14.3
01-02-2017 00:04     13.5
I would like the time (hh:mm) of the first row where the value in the Voltage column is >= 14.0. Only that one row should have a value in the 'Time of Full Charge' column:
                  Voltage  Time of Full Charge
01-02-2017 00:00     13.1
01-02-2017 00:01     13.2
01-02-2017 00:02     13.3
01-02-2017 00:03     14.1                00:03
01-02-2017 00:04     14.3
01-02-2017 00:04     13.5
I am trying something along these lines, but cannot figure it out:
df.index = pd.to_datetime(df.index)
df.['Time of Full Charge'] = np.where(df.['Voltage'] >= 14.0), (df.index.hour:df.index.minute))
Accepted answer by jezrael
You need idxmax to get the first index value that meets the condition. The only requirement is that the index is unique:
mask = df['Voltage'] >= 14.0
idx = mask.idxmax()  # label of the first True in the mask
df.loc[idx, 'Time of Full Charge'] = idx.strftime('%H:%M')
print (df)
                     Voltage  Time of Full Charge
2017-01-02 00:00:00     13.1                  NaN
2017-01-02 00:01:00     13.2                  NaN
2017-01-02 00:02:00     13.3                  NaN
2017-01-02 00:03:00     14.1                00:03
2017-01-02 00:04:00     14.3                  NaN
2017-01-02 00:04:00     13.5                  NaN
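The reason idxmax works here is that on a boolean Series it returns the label of the first True, since True is the maximum value. One caveat: when no row meets the threshold, idxmax still returns the first label, so a guard with mask.any() seems prudent. Below is a minimal sketch, assuming the question's sample data with the duplicate 00:04 row left out so the index is unique. The helper name first_time_at_or_above is hypothetical and not part of the original answer.

```python
import pandas as pd

# Sample data from the question, without the duplicate 00:04 row
# (assumption: a unique index, as this approach requires).
df = pd.DataFrame(
    {"Voltage": [13.1, 13.2, 13.3, 14.1, 14.3]},
    index=pd.to_datetime(
        ["2017-01-02 00:00", "2017-01-02 00:01", "2017-01-02 00:02",
         "2017-01-02 00:03", "2017-01-02 00:04"]
    ),
)


def first_time_at_or_above(frame, threshold):
    """Return 'HH:MM' of the first row with Voltage >= threshold, or None."""
    mask = frame["Voltage"] >= threshold
    if not mask.any():
        # Without this guard, idxmax() would return the first label
        # even though no row satisfies the condition.
        return None
    return mask.idxmax().strftime("%H:%M")


result_hit = first_time_at_or_above(df, 14.0)
result_miss = first_time_at_or_above(df, 20.0)
print(result_hit, result_miss)
```

Returning None for the no-match case is only one possible convention; raising an error or returning an empty string would work equally well, depending on how the result is used.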
Or:
idx = (df['Voltage'] >= 14.0).idxmax()
df['Time of Full Charge'] = np.where(df.index == idx, idx.strftime('%H:%M'), '')
print (df)
                     Voltage  Time of Full Charge
2017-01-02 00:00:00     13.1
2017-01-02 00:01:00     13.2
2017-01-02 00:02:00     13.3
2017-01-02 00:03:00     14.1                00:03
2017-01-02 00:04:00     14.3
2017-01-02 00:04:00     13.5
For a non-unique index, you can use a MultiIndex:
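# Prepend a positional level so every row gets a unique (position, timestamp) label
df.index = [np.arange(len(df.index)), df.index]
idx = (df['Voltage'] >= 14.0).idxmax()  # tuple: (position, timestamp)
df['Time of Full Charge'] = np.where(df.index.get_level_values(0) == idx[0],
                                     idx[1].strftime('%H:%M'),
                                     '')
# Drop the helper level to restore the original index
df.index = df.index.droplevel(0)
print (df)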
                     Voltage  Time of Full Charge
2017-01-02 00:00:00     13.1
2017-01-02 00:01:00     13.2
2017-01-02 00:02:00     13.3
2017-01-02 00:03:00     14.1                00:03
2017-01-02 00:04:00     14.3
2017-01-02 00:04:00     13.5
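An alternative for a non-unique index, not taken from the answer above, is to work with positions instead of labels. NumPy's argmax on the boolean array gives the position of the first True, and the timestamp can then be read from df.index at that position. This is a sketch using the question's sample data, including the duplicated 00:04 timestamp; the variable names pos and col are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, including the duplicated 00:04 timestamp.
df = pd.DataFrame(
    {"Voltage": [13.1, 13.2, 13.3, 14.1, 14.3, 13.5]},
    index=pd.to_datetime(
        ["2017-01-02 00:00", "2017-01-02 00:01", "2017-01-02 00:02",
         "2017-01-02 00:03", "2017-01-02 00:04", "2017-01-02 00:04"]
    ),
)

mask = (df["Voltage"] >= 14.0).to_numpy()
df["Time of Full Charge"] = ""
if mask.any():
    pos = int(np.argmax(mask))  # position of the first True
    col = df.columns.get_loc("Time of Full Charge")
    # Write by position, so duplicate index labels do not matter.
    df.iloc[pos, col] = df.index[pos].strftime("%H:%M")

print(df)
```

Because both the lookup and the assignment are positional, the index never needs to be modified and restored.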
Answer by MaxU
You can use numpy.searchsorted() if the Voltage column is sorted:
In [260]: df.index[np.searchsorted(df.Voltage, 14)]
Out[260]: DatetimeIndex(['2017-01-02 00:03:00'], dtype='datetime64[ns]', freq=None)
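Since searchsorted performs a binary search, it is only reliable when the values are sorted in ascending order. With the default side='left' it returns the first position whose value is >= the search value. The sketch below assumes a fully sorted Voltage column, so the out-of-order 13.5 row from the question is dropped. It also adds a bounds check, because the returned position equals len(df) when no value is large enough.

```python
import numpy as np
import pandas as pd

# Assumption: Voltage is sorted ascending (the trailing 13.5 row is omitted).
df = pd.DataFrame(
    {"Voltage": [13.1, 13.2, 13.3, 14.1, 14.3]},
    index=pd.to_datetime(
        ["2017-01-02 00:00", "2017-01-02 00:01", "2017-01-02 00:02",
         "2017-01-02 00:03", "2017-01-02 00:04"]
    ),
)

values = df["Voltage"].to_numpy()

# First position where values[pos] >= 14.0
pos = int(np.searchsorted(values, 14.0, side="left"))
first_time = df.index[pos].strftime("%H:%M") if pos < len(df) else None

# When nothing reaches the threshold, the position is one past the end.
miss = int(np.searchsorted(values, 20.0, side="left"))

print(first_time)
```

Converting the column with to_numpy() first keeps the return value a plain integer position regardless of pandas version.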