Python: Get the index of a Pandas DataFrame row as an integer

Question

Asked by durbachit

Assume a simple dataframe, for example:

    A         B
0   1  0.810743
1   2  0.595866
2   3  0.154888
3   4  0.472721
4   5  0.894525
5   6  0.978174
6   7  0.859449
7   8  0.541247
8   9  0.232302
9  10  0.276566

How can I retrieve the index value of a row, given a condition? For example, dfb = df[df['A']==5].index.values.astype(int) returns [4], but what I would like to get is just 4. This is causing me trouble later in the code.

Based on some conditions, I want to record the indexes where each condition is fulfilled, and then select the rows between them.

I tried

dfb = df[df['A']==5].index.values.astype(int)
dfbb = df[df['A']==8].index.values.astype(int)
df.loc[dfb:dfbb,'B']

for a desired output

    A         B
4   5  0.894525
5   6  0.978174
6   7  0.859449

but I get TypeError: '[4]' is an invalid key

Answer 1

Answered by jezrael

The easiest fix is to add [0], which selects the first value of the one-element array:

dfb = df[df['A']==5].index.values.astype(int)[0]
dfbb = df[df['A']==8].index.values.astype(int)[0]

dfb = int(df[df['A']==5].index[0])
dfbb = int(df[df['A']==8].index[0])
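
To see why the original slice failed, here is a minimal sketch that assumes the question's dataframe with its default integer index. The names arr, start, stop and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample dataframe from the question (default index 0..9).
df = pd.DataFrame({
    "A": range(1, 11),
    "B": [0.810743, 0.595866, 0.154888, 0.472721, 0.894525,
          0.978174, 0.859449, 0.541247, 0.232302, 0.276566],
})

# Without [0] the result is a one-element NumPy array, not a scalar,
# which is why it cannot be used as a slice bound in df.loc.
arr = df[df["A"] == 5].index.values.astype(int)
assert isinstance(arr, np.ndarray) and arr.shape == (1,)

# Taking the first element gives plain integers that work as slice bounds.
start = int(df[df["A"] == 5].index[0])
stop = int(df[df["A"] == 8].index[0])
result = df.loc[start:stop, "B"]  # label-based slice, both ends included
```

Note that result still contains the row labelled 7, because loc includes the stop label.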

But if a value might not match, this raises an error, because there is no first element.

The solution is to use next with iter, which returns a default value if nothing matches:

dfb = next(iter(df[df['A']==5].index), 'no match')
print (dfb)
4

dfb = next(iter(df[df['A']==50].index), 'no match')
print (dfb)
no match
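
The same pattern can be wrapped in a small function. This is only a sketch: first_index is a hypothetical helper name, and returning None instead of 'no match' is my assumption, chosen so that the result is easy to test before slicing.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 11)})  # column A from the question

def first_index(frame, column, value, default=None):
    """Hypothetical helper: label of the first row where column == value."""
    matches = frame[frame[column] == value].index
    # next() falls back to `default` when the index is empty.
    return next(iter(matches), default)

start = first_index(df, "A", 5)
stop = first_index(df, "A", 8)
missing = first_index(df, "A", 50)

# Only slice when both bounds were found.
if start is not None and stop is not None:
    selected = df.loc[start:stop - 1]
```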

Then it seems you need to subtract 1:

print (df.loc[dfb:dfbb-1,'B'])
4    0.894525
5    0.978174
6    0.859449
Name: B, dtype: float64
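
The subtraction is needed because loc slices by label and includes both endpoints, whereas iloc slices by position and excludes the stop. A small sketch, assuming the question's data and its default integer index (where labels and positions coincide):

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(1, 11),
    "B": [0.810743, 0.595866, 0.154888, 0.472721, 0.894525,
          0.978174, 0.859449, 0.541247, 0.232302, 0.276566],
})

by_label = df.loc[4:6, "B"]      # labels 4, 5, 6: stop label included
by_position = df.iloc[4:7]["B"]  # positions 4, 5, 6: stop position excluded
```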

Another solution uses boolean indexing or query:

print (df[(df['A'] >= 5) & (df['A'] < 8)])
   A         B
4  5  0.894525
5  6  0.978174
6  7  0.859449

print (df.loc[(df['A'] >= 5) & (df['A'] < 8), 'B'])
4    0.894525
5    0.978174
6    0.859449
Name: B, dtype: float64

print (df.query('A >= 5 and A < 8'))
   A         B
4  5  0.894525
5  6  0.978174
6  7  0.859449

Answer 2

Answered by dmdip

To answer the original question of how to get the index as an integer for the desired selection, the following will work:

df[df['A']==5].index.item()
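
One caveat: item() only works when exactly one row matches, and raises ValueError otherwise. A hedged sketch of guarding against that follows; single_index is a hypothetical helper name, and returning None is an assumption rather than part of the answer.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 11)})  # column A from the question

def single_index(frame, column, value):
    """Hypothetical helper: label of the one matching row, or None."""
    matches = frame[frame[column] == value].index
    try:
        return matches.item()  # raises ValueError unless len(matches) == 1
    except ValueError:
        return None

single = single_index(df, "A", 5)
missing = single_index(df, "A", 50)
```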

Answer 3

Answered by piRSquared

Wanting to include the row where A == 5 and all rows up to but not including the row where A == 8 means we will end up using iloc (loc includes both ends of a slice).

To get the index labels we use idxmax, which returns the index label of the first occurrence of the maximum value. I run it on the boolean series A == 5 (and then on A == 8), where True is the maximum, so it returns the index label of the first row where A == 5 (and likewise for A == 8).

Then I use searchsorted to find the ordinal position at which each index label found above occurs (this requires a sorted index). These positions are what I pass to iloc.

i5, i8 = df.index.searchsorted([df.A.eq(5).idxmax(), df.A.eq(8).idxmax()])
df.iloc[i5:i8]
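
The difference between labels and positions is easier to see when the index is not 0..n-1. The sketch below assumes a made-up dataframe df2 with sorted labels 10, 20, ..., 100, which is not part of the question.

```python
import pandas as pd

# Hypothetical data: same column A, but index labels 10, 20, ..., 100.
df2 = pd.DataFrame({"A": range(1, 11)}, index=list(range(10, 110, 10)))

# idxmax on a boolean series returns the label of the first True.
label5 = df2["A"].eq(5).idxmax()
label8 = df2["A"].eq(8).idxmax()

# searchsorted turns those labels into ordinal positions (sorted index needed).
i5, i8 = df2.index.searchsorted([label5, label8])

subset = df2.iloc[i5:i8]  # stop position excluded, so the A == 8 row is left out
```

For a unique index, df2.index.get_loc(label5) would give the same position.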

numpy

You can further enhance this by using the underlying numpy objects and the analogous numpy functions. I wrapped it up into a handy function.

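def find_between(df, col, v1, v2):
    vals = df[col].values
    # argmax on a boolean array gives the position of the first True
    # (it also gives 0 when nothing matches)
    i1, i2 = (vals == v1).argmax(), (vals == v2).argmax()
    # these are already ordinal positions, so they go straight into iloc;
    # feeding them to searchsorted would treat them as index labels
    return df.iloc[i1:i2]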

find_between(df, 'A', 5, 8)
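
One thing to watch with the argmax approach: when no element matches, the boolean array is all False and argmax still returns 0, which would silently point at the first row. A minimal sketch of checking for that, assuming the question's column A (the names mask, pos and found are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 11)})  # column A from the question

vals = df["A"].to_numpy()
mask = vals == 50     # no row has A == 50

pos = mask.argmax()   # 0, even though nothing matched
found = mask.any()    # False: use this to tell "no match" apart from "first row"
```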
