Python: get the index of a pandas dataframe row as an integer

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/41217310/

Date: 2020-08-20 00:37:25  Source: igfitidea

Get index of a row of a pandas dataframe as an integer

Tags: python, pandas, numpy

Asked by durbachit

Assume a simple dataframe, for example:

    A         B
0   1  0.810743
1   2  0.595866
2   3  0.154888
3   4  0.472721
4   5  0.894525
5   6  0.978174
6   7  0.859449
7   8  0.541247
8   9  0.232302
9  10  0.276566

How can I retrieve the index value of a row, given a condition? For example, dfb = df[df['A']==5].index.values.astype(int) returns [4], but what I would like to get is just 4. This causes trouble later in the code.

Based on some conditions, I want to record the index values at which each condition is fulfilled, and then select the rows between them.

I tried

dfb = df[df['A']==5].index.values.astype(int)
dfbb = df[df['A']==8].index.values.astype(int)
df.loc[dfb:dfbb,'B']

for a desired output of

    A         B
4   5  0.894525
5   6  0.978174
6   7  0.859449

but I get TypeError: '[4]' is an invalid key

Answered by jezrael

The easiest fix is to add [0], which selects the first value of the one-element array:

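# Option 1: take the first element of the integer index array
dfb = df[df['A']==5].index.values.astype(int)[0]
dfbb = df[df['A']==8].index.values.astype(int)[0]

# Option 2: cast the first index label directly
dfb = int(df[df['A']==5].index[0])
dfbb = int(df[df['A']==8].index[0])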

But if a value might have no match, this raises an IndexError, because there is no first element to select.

A solution is to use next with iter, which returns a default value when nothing matches:

dfb = next(iter(df[df['A']==5].index), 'no match')
print (dfb)
4

dfb = next(iter(df[df['A']==50].index), 'no match')
print (dfb)
no match
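
The next/iter pattern above can be wrapped in a small helper. This is only a sketch: the name first_index and its default argument are hypothetical, not part of pandas, and it assumes the dataframe from the question (only column A is rebuilt here).

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 11)})

def first_index(frame, column, value, default=None):
    """Return the index label of the first row where column == value, else default."""
    matches = frame[frame[column] == value].index
    # next() falls back to `default` when the filtered index is empty
    return next(iter(matches), default)

start = first_index(df, "A", 5)      # label of the first row with A == 5
missing = first_index(df, "A", 50)   # no such row, so the default comes back

if missing is None:
    print("50 was not found in column A")
```

Returning None rather than a string such as 'no match' keeps the result easy to test with `is None` before it is used in a slice.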

Then it seems you need to subtract 1 from the end label, because loc includes both endpoints of a slice:

print (df.loc[dfb:dfbb-1,'B'])
4    0.894525
5    0.978174
6    0.859449
Name: B, dtype: float64
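
The reason for the subtraction can be seen by comparing label-based and position-based slicing. A minimal sketch, assuming the default 0..9 integer index from the question (with any other index, subtracting 1 from a label would not necessarily give the previous row):

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 11)})

# loc slices by label and includes BOTH endpoints
by_label = df.loc[4:7, "A"]

# iloc slices by position and excludes the stop position
by_position = df.iloc[4:7]["A"]

print(list(by_label))      # values of A at labels 4, 5, 6 and 7
print(list(by_position))   # values of A at positions 4, 5 and 6
```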

Another solution uses boolean indexing or query:

print (df[(df['A'] >= 5) & (df['A'] < 8)])
   A         B
4  5  0.894525
5  6  0.978174
6  7  0.859449

print (df.loc[(df['A'] >= 5) & (df['A'] < 8), 'B'])
4    0.894525
5    0.978174
6    0.859449
Name: B, dtype: float64


print (df.query('A >= 5 and A < 8'))
   A         B
4  5  0.894525
5  6  0.978174
6  7  0.859449
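
When the bounds live in variables rather than literals, query can reference them with the @ prefix. A short sketch, where lower and upper are hypothetical names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 11)})

lower, upper = 5, 8  # hypothetical bounds, not from the original answer

# '@name' inside the query string refers to a Python variable in scope
selected = df.query("A >= @lower and A < @upper")

# The equivalent boolean mask, for comparison
masked = df[(df["A"] >= lower) & (df["A"] < upper)]

print(selected.index.tolist())
```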

Answered by dmdip

To answer the original question of how to get the index as an integer for the desired selection, the following will work:

df[df['A']==5].index.item()
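
One thing to keep in mind is that item() only succeeds when the selection contains exactly one row. The sketch below uses a small hypothetical frame with a duplicated value to show both cases:

```python
import pandas as pd

# Hypothetical frame in which the value 5 appears twice
df = pd.DataFrame({"A": [1, 2, 5, 5]})

# Exactly one match: item() returns a single scalar rather than an array
single = df[df["A"] == 2].index.item()
print(single)

# Two matches (or none): item() raises ValueError
try:
    df[df["A"] == 5].index.item()
except ValueError as err:
    print("item() failed:", err)
```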

Answered by piRSquared

Wanting to include the row where A == 5 and all rows up to but not including the row where A == 8 means we will end up using iloc (loc includes both ends of a slice).

In order to get the index labels we use idxmax. This returns the index label of the first occurrence of the maximum value. I run it on a boolean series where A == 5 (and then where A == 8), which returns the index label of the row where A == 5 first occurs (likewise for A == 8).

Then I use searchsorted to find the ordinal position at which the index label (found above) occurs. This is what I use in iloc.

i5, i8 = df.index.searchsorted([df.A.eq(5).idxmax(), df.A.eq(8).idxmax()])
df.iloc[i5:i8]
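
The difference between labels and positions is easier to see on an index that is not 0..n-1. The frame df2 below is hypothetical, and the sketch assumes its index is sorted in ascending order, which searchsorted relies on:

```python
import pandas as pd

# Hypothetical frame whose labels differ from the row positions
df2 = pd.DataFrame({"A": [3, 5, 7, 8, 9]}, index=[10, 20, 30, 40, 50])

label5 = df2["A"].eq(5).idxmax()   # index label of the first row with A == 5
label8 = df2["A"].eq(8).idxmax()   # index label of the first row with A == 8

# Convert the labels to ordinal positions for iloc
i5, i8 = df2.index.searchsorted([label5, label8])
print(df2.iloc[i5:i8])

# Caveat: if nothing matches, the boolean series is all False and idxmax
# returns the first label, so it is worth checking for a match first
has_match = df2["A"].eq(100).any()
```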


numpy

You can further enhance this by using the underlying numpy objects and the analogous numpy functions. I wrapped it up into a handy function.

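def find_between(df, col, v1, v2):
    vals = df[col].values
    # argmax on a boolean array gives the ordinal position of the first True
    i1, i2 = (vals == v1).argmax(), (vals == v2).argmax()
    # these are already positions, so they go to iloc directly
    # (no searchsorted lookup on the index labels is needed)
    return df.iloc[i1:i2]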

find_between(df, 'A', 5, 8)
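
Because argmax returns 0 when nothing matches, a missing value would silently produce a wrong slice. The variant below is a sketch of one way to guard against that; find_between_checked is a hypothetical name, and raising KeyError is an assumption about the desired behaviour, not part of the original answer.

```python
import numpy as np
import pandas as pd

def find_between_checked(df, col, v1, v2):
    """Rows from the first col == v1 up to, not including, the first col == v2."""
    vals = df[col].values
    # flatnonzero lists every position where the comparison is True
    hits1 = np.flatnonzero(vals == v1)
    hits2 = np.flatnonzero(vals == v2)
    if hits1.size == 0 or hits2.size == 0:
        raise KeyError(f"{v1!r} or {v2!r} not found in column {col!r}")
    return df.iloc[hits1[0]:hits2[0]]

df = pd.DataFrame({"A": range(1, 11)})
result = find_between_checked(df, "A", 5, 8)
print(result.index.tolist())
```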
