Find value greater than level - Python Pandas

Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors. Original question: http://stackoverflow.com/questions/38862657/

Tags: python, pandas

Asked by Jared

In a time series (ordered tuples), what's the most efficient way to find the first time a criterion is met?

In particular, what's the most efficient way to determine when a value goes over 100 for the value of a column in a pandas data frame?

I was hoping for a clever vectorized solution, and not having to use df.iterrows().

For example, with price or count data, I want to know when a value first exceeds 100, i.e. the first row where df['col'] > 100.

            price
date
2005-01-01     98
2005-01-02     99
2005-01-03    100
2005-01-04     99
2005-01-05     98
2005-01-06    100
2005-01-07    100
2005-01-08     98

The real series could be very large, though. Is it better to iterate (slow), or is there a vectorized solution?

A df.iterrows() solution could be:

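def first_exceeding(df, value_to_check):
    # iterrows() yields (index label, row Series) pairs, in that order
    for ind, row in df.iterrows():
        if row['col'] > value_to_check:
            # record the value from the first row that meets the criterion
            return row['value_to_record']
    # no row met the criterion
    return None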

But my question is more about efficiency (potentially, a vectorized solution that will scale well).

Answered by Merlin

Try this, using the condition "> 99":

df[df['price'].gt(99)].index[0]

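This returns 2, the index label of the first matching row. With a default zero-based integer index that is the third row; with the date index shown in the question it would return the date 2005-01-03 instead.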
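To make the difference between an index label and a row position concrete, here is a minimal sketch that rebuilds the sample data from the question with its date index. The use of np.flatnonzero to recover positions is an addition for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with the dates as the index
df = pd.DataFrame(
    {"price": [98, 99, 100, 99, 98, 100, 100, 98]},
    index=pd.date_range("2005-01-01", periods=8, name="date"),
)

# Boolean mask: True wherever price is greater than 99
mask = df["price"].gt(99)

# The answer's expression returns an index label (a date here)
first_label = df[mask].index[0]

# Zero-based row positions of all matches, independent of the index type
positions = np.flatnonzero(mask.to_numpy())
first_position = int(positions[0])

print(first_label, first_position)
```

With a date index the label is a Timestamp, so the positional form may be the more convenient one when a row number is what you need.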

To get the index of every row whose price is greater than 99:

df[df['price'].gt(99)].index
Int64Index([2, 5, 6], dtype='int64')
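One thing to watch with the .index[0] approach is that it raises an IndexError when no row matches. The following is a rough sketch of one way to guard against that, assuming you want None in that case. The helper name first_label_above is hypothetical, and using idxmax() on the boolean mask is a swapped-in alternative rather than the answer's own method.

```python
import pandas as pd

# Sample data from the question, with the dates as the index
df = pd.DataFrame(
    {"price": [98, 99, 100, 99, 98, 100, 100, 98]},
    index=pd.date_range("2005-01-01", periods=8, name="date"),
)


def first_label_above(series, level):
    """Return the index label of the first value above level, or None."""
    mask = series > level
    # idxmax() on a boolean mask gives the label of the first True, but it
    # also returns the first label when nothing is True, so check any() first
    if not mask.any():
        return None
    return mask.idxmax()


first_hit = first_label_above(df["price"], 99)   # first price above 99
no_hit = first_label_above(df["price"], 100)     # no price is above 100
print(first_hit, no_hit)
```

Both the comparison and idxmax() are vectorized, so this avoids iterating over rows in Python.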

Answered by user3304496

This will return the index value of the first occurrence of 100 in the series:

index_value = (df['col'] - 100).abs().idxmin()

If there is no value exactly 100, it should return the index of the closest value.
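Note that the closest value is not necessarily the first value over the level. As a small illustration on made-up data (the numbers below are hypothetical and contain no exact 100), the two approaches can point at different rows:

```python
import pandas as pd

# Hypothetical data with no value exactly equal to 100
df = pd.DataFrame({"col": [98, 103, 101, 97]})

# Index of the value closest to 100 (absolute differences: 2, 3, 1, 3)
closest = (df["col"] - 100).abs().idxmin()

# Index of the first value strictly greater than 100, or None if there is none
mask = df["col"] > 100
first_above = mask.idxmax() if mask.any() else None

print(closest, first_above)
```

Here the closest value (101) sits at index 2, while the first value above 100 (103) sits at index 1, so which expression to use depends on which of the two questions you are asking.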
