Get first row of dataframe in Python Pandas based on criteria
Original URL: http://stackoverflow.com/questions/40660088/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Question by Tasos
Let's say that I have a dataframe like this one:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]], columns=['A', 'B', 'C'])
>>> df
   A  B  C
0  1  2  1
1  1  3  2
2  4  6  3
3  4  3  4
4  5  4  5
The original table is more complicated, with more columns and rows.
I want to get the first row that fulfils some criteria. Examples:
- Get first row where A > 3 (returns row 2)
- Get first row where A > 4 AND B > 3 (returns row 4)
- Get first row where A > 3 AND (B > 3 OR C > 2) (returns row 2)
But if no row fulfils the specific criteria, then I want to get the first row after sorting the dataframe in descending order by A (or, in other cases, by B, C, etc.):
- Get first row where A > 6 (returns row 4, by sorting by A descending and taking the first row)
I was able to do it by iterating over the dataframe (I know that's crap :P), so I'd prefer a more Pythonic way to solve it.
Answer by Tgsmith61591
This tutorial is a very good one for pandas slicing; make sure you check it out. On to some snippets. To slice a dataframe with a condition, use this format:
>>> df[condition]
This will return a slice of your dataframe, which you can index using iloc. Here are your examples:
Get first row where A > 3 (returns row 2):
>>> df[df.A > 3].iloc[0]
A    4
B    6
C    3
Name: 2, dtype: int64
If what you actually want is the row number, use df[df.A > 3].index[0] instead of iloc.
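Strictly speaking, index[0] returns an index label. That label equals the row number only while the frame keeps its default RangeIndex. The sketch below is an illustration, not part of the original answer. It rebuilds the question's data with hypothetical string labels ('a' to 'e') to show the difference, and it uses Index.get_loc to recover the position in the full frame. It also shows that iloc[0] raises IndexError when nothing matches.

```python
import pandas as pd

# The question's data, but with hypothetical string labels instead of 0..4
df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
                  columns=['A', 'B', 'C'], index=['a', 'b', 'c', 'd', 'e'])

first = df[df.A > 3].iloc[0]        # first row of the filtered slice (a Series)
label = df[df.A > 3].index[0]       # index label of that row: 'c', not 2
position = df.index.get_loc(label)  # position of that row in the full frame: 2

print(first.name, label, position)  # c c 2

# With no matching rows, positional lookup on the empty slice is out of bounds
try:
    df[df.A > 6].iloc[0]
    found = True
except IndexError:
    found = False
print(found)  # False
```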
Get first row where A > 4 AND B > 3:
>>> df[(df.A > 4) & (df.B > 3)].iloc[0]
A    5
B    4
C    5
Name: 4, dtype: int64
Get first row where A > 3 AND (B > 3 OR C > 2) (returns row 2):
>>> df[(df.A > 3) & ((df.B > 3) | (df.C > 2))].iloc[0]
A    4
B    6
C    3
Name: 2, dtype: int64
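The extra parentheses in these masks are required. In Python, & and | bind more tightly than comparison operators such as >. Here is a small sketch on the question's data that builds the combined mask explicitly. It then shows what happens without the parentheses: Python parses the expression as a chained comparison, which needs a single True/False and so raises ValueError on a Series.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
                  columns=['A', 'B', 'C'])

# Each comparison is wrapped in parentheses before combining with & and |
mask = (df.A > 3) & ((df.B > 3) | (df.C > 2))
print(mask.tolist())  # [False, False, True, True, True]

raised = False
try:
    # Parsed as the chained comparison df.A > (3 & df.B) > 3,
    # which calls bool() on a Series
    df[df.A > 3 & df.B > 3]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```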
Now, for your last case, we can write a function that handles the default case by returning the first row of the frame sorted in descending order:
>>> def series_or_default(X, condition, default_col, ascending=False):
... sliced = X[condition]
... if sliced.shape[0] == 0:
... return X.sort_values(default_col, ascending=ascending).iloc[0]
... return sliced.iloc[0]
>>>
>>> series_or_default(df, df.A > 6, 'A')
A 5
B 4
C 5
Name: 4, dtype: int64
As expected, it returns row 4.
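A possible variant, not taken from the answer above, skips both the intermediate slice and the full sort. On a boolean Series, idxmax returns the label of the first True value, so it is used only after checking condition.any(). For the fallback, idxmax/idxmin on the default column returns the first occurrence of the maximum or minimum. The function name first_or_default is hypothetical. One behavioural caveat: sort_values uses a non-stable sort by default, so when default_col has ties, the two versions may pick different rows.

```python
import pandas as pd


def first_or_default(X, condition, default_col, ascending=False):
    """Hypothetical alternative to series_or_default.

    Returns the first row where `condition` is True; otherwise the row with the
    largest (ascending=False) or smallest (ascending=True) value in default_col.
    """
    if condition.any():
        # On a boolean Series, idxmax gives the label of the first True value
        return X.loc[condition.idxmax()]
    # Fallback: first occurrence of the max (or min) of default_col, no sorting
    label = X[default_col].idxmin() if ascending else X[default_col].idxmax()
    return X.loc[label]


df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
                  columns=['A', 'B', 'C'])

print(first_or_default(df, df.A > 3, 'A').name)                 # 2
print(first_or_default(df, (df.A > 4) & (df.B > 3), 'A').name)  # 4
print(first_or_default(df, df.A > 6, 'A').name)                 # 4
```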
Answer by Boud
For existing matches, use query:
df.query('A > 3').head(1)
Out[33]:
   A  B  C
2  4  6  3

df.query('A > 4 and B > 3').head(1)
Out[34]:
   A  B  C
4  5  4  5

df.query('A > 3 and (B > 3 or C > 2)').head(1)
Out[35]:
   A  B  C
2  4  6  3
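When query finds nothing, head(1) returns an empty DataFrame rather than raising, so the question's fallback can be handled by checking .empty. Below is a minimal sketch under that approach. The helper name query_first_or_default is hypothetical. It uses DataFrame.nlargest(1, col), which returns the top row by that column and, with the default keep='first', picks the earliest row on ties. Both branches return a one-row DataFrame.

```python
import pandas as pd


def query_first_or_default(df, expr, default_col):
    """Hypothetical helper: first row matching a query string, or else the
    row with the largest value in default_col (one-row DataFrame either way)."""
    match = df.query(expr).head(1)
    if match.empty:
        # No match: fall back to the top row ordered descending by default_col
        return df.nlargest(1, default_col)
    return match


df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
                  columns=['A', 'B', 'C'])

print(query_first_or_default(df, 'A > 3 and (B > 3 or C > 2)', 'A').index[0])  # 2
print(query_first_or_default(df, 'A > 6', 'A').index[0])                       # 4
```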
Answer by PabTorre
You can take care of the first three cases with slicing and head:
df[df.A > 3].head(1)
df[(df.A > 4) & (df.B > 3)].head(1)
df[(df.A > 3) & ((df.B > 3) | (df.C > 2))].head(1)
If nothing comes back, you can handle that case with a try or an if:
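try:
    # head(1) returns an empty DataFrame when no row matches
    output = df[df.A > 6].head(1)
    assert len(output) == 1
except AssertionError:
    # Fallback: first row after sorting descending by A
    output = df.sort_values('A', ascending=False).head(1)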
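The if route needs no exception handling, because the filtered head(1) is simply empty when nothing matches. It also avoids relying on assert, which Python strips when run with the -O flag. Here is a minimal sketch of that variant on the question's data, using A > 6 as the failing condition.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
                  columns=['A', 'B', 'C'])

# Filter first; an empty result is a normal value, not an error
output = df[df.A > 6].head(1)
if output.empty:
    # Fallback: first row after sorting descending by A
    output = df.sort_values('A', ascending=False).head(1)

print(output.index[0])  # 4
```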