Pandas dataframe first instance of value in column

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/43635660/

Date: 2020-09-14 03:29:00  Source: igfitidea

Tags: python, pandas, dataframe

Question by warrenfitzhenry

I have df:

                     Voltage
01-02-2017 00:00       13.1
01-02-2017 00:01       13.2
01-02-2017 00:02       13.3
01-02-2017 00:03       14.1
01-02-2017 00:04       14.3
01-02-2017 00:04       13.5

I would like the time (hh:mm) of the first row where the value in the Voltage column is >= 14.0. There should be only one time value in the 'Time of Full Charge' column.

                     Voltage   Time of Full Charge
01-02-2017 00:00       13.1
01-02-2017 00:01       13.2
01-02-2017 00:02       13.3
01-02-2017 00:03       14.1         00:03
01-02-2017 00:04       14.3
01-02-2017 00:04       13.5

I am trying something along these lines, but cannot figure it out:

import numpy as np
import pandas as pd
df.index = pd.to_datetime(df.index)
df['Time of Full Charge'] = np.where(df['Voltage'] >= 14.0, df.index.strftime('%H:%M'), '')  # marks every row >= 14.0, not only the first
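To reproduce the example, the DataFrame from the question can be built as below. This is a minimal sketch that assumes the index strings are month-first (MM-DD-YYYY), which matches the 2017-01-02 dates printed in the answers; the explicit format string is an assumption, not something stated in the question.

```python
import pandas as pd

# Index strings from the question; the format string assumes month-first dates
raw_index = ['01-02-2017 00:00', '01-02-2017 00:01', '01-02-2017 00:02',
             '01-02-2017 00:03', '01-02-2017 00:04', '01-02-2017 00:04']
df = pd.DataFrame({'Voltage': [13.1, 13.2, 13.3, 14.1, 14.3, 13.5]},
                  index=pd.to_datetime(raw_index, format='%m-%d-%Y %H:%M'))
print(df)
```

Note that the timestamp 00:04 appears twice, so this index is not unique, which matters for the label-based approaches in the accepted answer.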

Accepted answer by jezrael

You need idxmax to get the index label of the first value that meets the condition. This only works reliably if the index is unique; otherwise the returned label can match more than one row:

idx = (df['Voltage'] >= 14.0).idxmax()  # label of the first row with Voltage >= 14.0
df.loc[idx, 'Time of Full Charge'] = idx.strftime('%H:%M')
print(df)
                     Voltage Time of Full Charge
2017-01-02 00:00:00     13.1                 NaN
2017-01-02 00:01:00     13.2                 NaN
2017-01-02 00:02:00     13.3                 NaN
2017-01-02 00:03:00     14.1               00:03
2017-01-02 00:04:00     14.3                 NaN
2017-01-02 00:04:00     13.5                 NaN
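One caveat with idxmax: if no row satisfies the condition, the boolean mask is all False and idxmax still returns the first label, so the first row would be marked by mistake. A minimal sketch that checks mask.any() first; the helper name first_time_at_or_above is hypothetical, and the example data is the question's, with the index written in ISO form.

```python
import pandas as pd

def first_time_at_or_above(df, column, threshold):
    """Hypothetical helper: 'HH:MM' of the first row where df[column] >= threshold, else None.

    Assumes df has a DatetimeIndex.
    """
    mask = df[column] >= threshold
    if not mask.any():          # idxmax() on an all-False mask would return the first label
        return None
    return mask.idxmax().strftime('%H:%M')

index = pd.to_datetime(['2017-01-02 00:00', '2017-01-02 00:01', '2017-01-02 00:02',
                        '2017-01-02 00:03', '2017-01-02 00:04', '2017-01-02 00:04'])
df = pd.DataFrame({'Voltage': [13.1, 13.2, 13.3, 14.1, 14.3, 13.5]}, index=index)

print(first_time_at_or_above(df, 'Voltage', 14.0))
print(first_time_at_or_above(df, 'Voltage', 15.0))
```

With a threshold of 15.0 no voltage qualifies, so the helper returns None instead of the misleading '00:00'.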

Or:

import numpy as np
idx = (df['Voltage'] >= 14.0).idxmax()
df['Time of Full Charge'] = np.where(df.index == idx, idx.strftime('%H:%M'), '')
print(df)
                     Voltage Time of Full Charge
2017-01-02 00:00:00     13.1                    
2017-01-02 00:01:00     13.2                    
2017-01-02 00:02:00     13.3                    
2017-01-02 00:03:00     14.1               00:03
2017-01-02 00:04:00     14.3                    
2017-01-02 00:04:00     13.5     

If the index is not unique, you can use a temporary MultiIndex whose first level is the row position:

df.index = [np.arange(len(df.index)), df.index]

idx = (df['Voltage'] >= 14.0).idxmax()
df['Time of Full Charge'] = np.where(df.index.get_level_values(0) == idx[0], 
                                     idx[1].strftime('%H:%M'),
                                     '')

df.index = df.index.droplevel(0)
print (df)
                     Voltage Time of Full Charge
2017-01-02 00:00:00     13.1                    
2017-01-02 00:01:00     13.2                    
2017-01-02 00:02:00     13.3                    
2017-01-02 00:03:00     14.1               00:03
2017-01-02 00:04:00     14.3                    
2017-01-02 00:04:00     13.5                    
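An alternative to the temporary MultiIndex is to work by position instead of by label, since row positions are always unique. The sketch below swaps in NumPy's argmax on the boolean mask together with DataFrame.iloc; it is not part of either answer, and the helper name mark_first_at_or_above is hypothetical.

```python
import pandas as pd

def mark_first_at_or_above(df, column, threshold, out_col='Time of Full Charge'):
    """Hypothetical helper: return a copy of df with 'HH:MM' written in out_col
    on the first row (by position) where df[column] >= threshold."""
    out = df.copy()
    out[out_col] = ''                                   # empty string on every row
    mask = (out[column] >= threshold).to_numpy()        # plain NumPy bool array
    if mask.any():                                      # leave the column empty if no row qualifies
        pos = int(mask.argmax())                        # position of the first True
        out.iloc[pos, out.columns.get_loc(out_col)] = out.index[pos].strftime('%H:%M')
    return out

# Example data from the question; note the duplicated 00:04 timestamp
index = pd.to_datetime(['2017-01-02 00:00', '2017-01-02 00:01', '2017-01-02 00:02',
                        '2017-01-02 00:03', '2017-01-02 00:04', '2017-01-02 00:04'])
df = pd.DataFrame({'Voltage': [13.1, 13.2, 13.3, 14.1, 14.3, 13.5]}, index=index)

print(mark_first_at_or_above(df, 'Voltage', 14.0))
print(mark_first_at_or_above(df, 'Voltage', 14.2))
```

With a threshold of 14.2, the first qualifying row is the first of the two 00:04 rows. A label-based comparison such as df.index == idx would match both of them, whereas the positional version marks only one.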

Answer by MaxU

If the Voltage column is sorted in ascending order, you can use numpy.searchsorted():

In [260]: df.index[np.searchsorted(df.Voltage, 14)]
Out[260]: DatetimeIndex(['2017-01-02 00:03:00'], dtype='datetime64[ns]', freq=None)
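np.searchsorted does a binary search, so it is only guaranteed to find the first value >= 14 when the values are in ascending order; strictly, the example column is not sorted, because the last value 13.5 is smaller than 14.3. A minimal sketch that checks Series.is_monotonic_increasing and otherwise falls back to a linear scan of a boolean mask. The fallback and the helper name first_index_at_or_above are assumptions, not part of the answer. Also, in recent pandas versions searchsorted with a scalar returns a scalar position, so the expression above may give a single Timestamp rather than a one-element DatetimeIndex.

```python
import numpy as np
import pandas as pd

def first_index_at_or_above(s, threshold):
    """Hypothetical helper: index label of the first value >= threshold in s.

    Uses a binary search (np.searchsorted) when s is sorted ascending and a
    linear scan of a boolean mask otherwise. Raises ValueError if no value
    reaches the threshold.
    """
    values = s.to_numpy()
    if s.is_monotonic_increasing:
        pos = int(np.searchsorted(values, threshold, side='left'))  # O(log n)
    else:
        mask = values >= threshold                                   # O(n)
        pos = int(mask.argmax()) if mask.any() else len(values)
    if pos == len(values):
        raise ValueError('no value reaches the threshold')
    return s.index[pos]

index = pd.to_datetime(['2017-01-02 00:00', '2017-01-02 00:01', '2017-01-02 00:02',
                        '2017-01-02 00:03', '2017-01-02 00:04', '2017-01-02 00:04'])
s = pd.Series([13.1, 13.2, 13.3, 14.1, 14.3, 13.5], index=index)

print(first_index_at_or_above(s, 14.0))           # unsorted, so the linear scan is used
print(first_index_at_or_above(s.iloc[:5], 14.0))  # sorted prefix, so searchsorted is used
```

The binary search only pays off on large, already-sorted data; for a short unsorted column the boolean-mask approach from the accepted answer is simpler.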