Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors. Original question: http://stackoverflow.com/questions/40660088/

Get first row of dataframe in Python Pandas based on criteria

Tags: python, pandas

Asked by Tasos

Let's say that I have a dataframe like this one:

import pandas as pd
df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]], columns=['A', 'B', 'C'])

>>> df
   A  B  C
0  1  2  1
1  1  3  2
2  4  6  3
3  4  3  4
4  5  4  5

The original table is more complicated with more columns and rows.

I want to get the first row that fulfils some criteria. Examples:

  1. Get first row where A > 3 (returns row 2)
  2. Get first row where A > 4 AND B > 3 (returns row 4)
  3. Get first row where A > 3 AND (B > 3 OR C > 2) (returns row 2)

But if no row fulfils the specific criteria, then I want to get the first row after sorting the frame in descending order by A (or, in other cases, by B, C, etc.):

  1. Get first row where A > 6 (returns row 4, by ordering by A descending and taking the first row)

I was able to do it by iterating over the dataframe (I know that's crap :P), so I would prefer a more Pythonic way to solve it.

Answered by Tgsmith61591

This tutorial is a very good one for pandas slicing. Make sure you check it out. On to some snippets... To slice a dataframe with a condition, you use this format:

>>> df[condition]

This will return a slice of your dataframe which you can index using iloc. Here are your examples:

  1. Get first row where A > 3 (returns row 2)

    >>> df[df.A > 3].iloc[0]
    A    4
    B    6
    C    3
    Name: 2, dtype: int64
    

If what you actually want is the row's index label rather than the row itself, use df[df.A > 3].index[0] instead of iloc. With the default integer index, that label is also the row number.
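To make the difference between the row and its label concrete, here is a minimal sketch on the question's frame. The re-labelled frame `relabelled` is a hypothetical illustration (it is not part of the question), included only to show that the value from `.index[0]` is a label and matches the positional row number only when the index is the default one.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
    columns=['A', 'B', 'C'],
)

mask = df.A > 3                    # boolean Series, one value per row
first_row = df[mask].iloc[0]       # the first matching row, as a Series
first_label = df[mask].index[0]    # the index label of that row

# Hypothetical frame with string labels, to show label != position.
relabelled = df.set_index(pd.Index(['a', 'b', 'c', 'd', 'e']))
label = relabelled[relabelled.A > 3].index[0]   # label of the first match
position = relabelled.index.get_loc(label)      # positional row number
```

If the positional number is what you need on a frame with a custom index, `Index.get_loc` converts the label back, as in the last line.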

  2. Get first row where A > 4 AND B > 3:

    >>> df[(df.A > 4) & (df.B > 3)].iloc[0]
    A    5
    B    4
    C    5
    Name: 4, dtype: int64
    
  3. Get first row where A > 3 AND (B > 3 OR C > 2) (returns row 2)

    >>> df[(df.A > 3) & ((df.B > 3) | (df.C > 2))].iloc[0]
    A    4
    B    6
    C    3
    Name: 2, dtype: int64
    

Now, for your last case, we can write a function that handles the default case by returning the first row of the descending-sorted frame:

>>> def series_or_default(X, condition, default_col, ascending=False):
...     sliced = X[condition]
...     if sliced.shape[0] == 0:
...         return X.sort_values(default_col, ascending=ascending).iloc[0]
...     return sliced.iloc[0]
>>> 
>>> series_or_default(df, df.A > 6, 'A')
A    5
B    4
C    5
Name: 4, dtype: int64

As expected, it returns row 4.
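The explicit emptiness check matters because calling `.iloc[0]` on an empty slice raises an IndexError. As a quick sanity check, the sketch below re-runs the helper on every case from the question and records the label of the returned row via `.name`. It uses `sliced.empty` in place of the shape test, which should be equivalent; the `results` dict is only an illustration device.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
    columns=['A', 'B', 'C'],
)

def series_or_default(X, condition, default_col, ascending=False):
    sliced = X[condition]
    if sliced.empty:
        # No row matched: fall back to the top row after sorting.
        return X.sort_values(default_col, ascending=ascending).iloc[0]
    return sliced.iloc[0]

# Label of the row returned for each case in the question.
results = {
    'A > 3': series_or_default(df, df.A > 3, 'A').name,
    'A > 4 and B > 3': series_or_default(df, (df.A > 4) & (df.B > 3), 'A').name,
    'A > 3 and (B > 3 or C > 2)': series_or_default(
        df, (df.A > 3) & ((df.B > 3) | (df.C > 2)), 'A').name,
    'A > 6, fall back on A': series_or_default(df, df.A > 6, 'A').name,
    'A > 6, fall back on B': series_or_default(df, df.A > 6, 'B').name,
}
```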

Answered by Boud

For existing matches, use query:

df.query(' A > 3' ).head(1)
Out[33]: 
   A  B  C
2  4  6  3

df.query(' A > 4 and B > 3' ).head(1)
Out[34]: 
   A  B  C
4  5  4  5

df.query(' A > 3 and (B > 3 or C > 2)' ).head(1)
Out[35]: 
   A  B  C
2  4  6  3
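The query calls above only cover criteria that match at least one row. For the fallback case in the question, one possible sketch wraps query in a small function; the name `first_match_or_top` and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
    columns=['A', 'B', 'C'],
)

def first_match_or_top(frame, expr, sort_col):
    """Hypothetical helper: first row matching `expr`, else the top row by `sort_col`."""
    matches = frame.query(expr)
    if matches.empty:
        # Nothing matched: sort descending and take the first row instead.
        return frame.sort_values(sort_col, ascending=False).head(1)
    return matches.head(1)

hit = first_match_or_top(df, 'A > 4 and B > 3', 'A')   # a row matches
miss = first_match_or_top(df, 'A > 6', 'A')            # no row matches
```

Both calls return a one-row DataFrame, so downstream code does not need to care which branch was taken.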

Answered by PabTorre

You can take care of the first 3 items with slicing and head:

  1. df[df.A > 3].head(1)
  2. df[(df.A > 4) & (df.B > 3)].head(1)
  3. df[(df.A > 3) & ((df.B > 3) | (df.C > 2))].head(1)

You can handle the case where nothing comes back with a try or an if:

try:
    output = df[df.A > 6].head(1)
    assert len(output) == 1  # fails when no row matched
except AssertionError:
    # No match: fall back to the row with the largest A.
    output = df.sort_values('A', ascending=False).head(1)
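The if variant mentioned above could look roughly like the sketch below. Note one swap: it uses `DataFrame.nlargest` for the fallback instead of sort_values, which gives the same row here without sorting the whole frame.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
    columns=['A', 'B', 'C'],
)

matches = df[df.A > 6]
if matches.empty:
    # No row satisfies A > 6: take the single row with the largest A.
    output = df.nlargest(1, 'A')
else:
    output = matches.head(1)
```

Unlike the assert-based version, this does not depend on assertions being enabled, since Python skips assert statements when run with the -O flag.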