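Disclaimer: This page is a translation of a popular Stack Overflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): Stack Overflow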
Original URL: http://stackoverflow.com/questions/48035493/
Pandas select rows and columns based on boolean condition
Question by Sockey
I have a pandas dataframe with about 50 columns and >100 rows. I want to select columns 'col_x' and 'col_y' where 'col_z' < m. Is there a simple way to do this, similar to df[df['col3'] < m] and df[['colx','coly']] but combined?
Answer by cs95
Let's break down your problem. You want to:
- filter rows based on some boolean condition, and
- select a subset of columns from the result.
For the first point, the condition you'd need is:
df["col_z"] < m
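The comparison does not return a single True/False: it returns a boolean Series with one value per row, aligned with the dataframe's index. A minimal sketch on made-up data (the values and m = 3 are assumptions, not from the question):

```python
import pandas as pd

# Hypothetical data; the question only names col_z and a threshold m.
df = pd.DataFrame({
    "col_x": [10, 20, 30, 40],
    "col_y": ["p", "q", "r", "s"],
    "col_z": [1, 5, 2, 8],
})
m = 3  # assumed threshold

mask = df["col_z"] < m      # one True/False per row, aligned with df.index
print(mask.dtype)           # bool
print(mask.tolist())        # [True, False, True, False]

# Indexing with the mask keeps only the True rows (like the question's df[df['col3'] < m]).
print(df[mask].index.tolist())  # [0, 2]
```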
For the second requirement, you'd specify the list of columns you need:
["col_x", "col_y"]
How would you combine these two to produce the expected output with pandas? The most straightforward way is to use loc:
df.loc[df["col_z"] < m, ["col_x", "col_y"]]
The first argument selects rows, and the second argument selects columns.
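Because loc takes a row selector and a column selector, a bare : in either slot means "everything". The sketch below, on made-up data (the values and m = 3 are assumptions), shows each slot on its own and then both together. It also checks that chaining the two steps as df[mask][cols] reads the same values, and uses a single loc call for assignment, since a chained assignment may act on a temporary copy instead of df (many pandas versions warn about this with SettingWithCopyWarning).

```python
import pandas as pd

# Hypothetical example frame; the column names follow the question.
df = pd.DataFrame({
    "col_x": [10, 20, 30],
    "col_y": [1.5, 2.5, 3.5],
    "col_z": [4, 1, 2],
})
m = 3  # assumed threshold
mask = df["col_z"] < m
cols = ["col_x", "col_y"]

rows_only = df.loc[mask, :]   # ":" in the column slot keeps every column
cols_only = df.loc[:, cols]   # ":" in the row slot keeps every row
both = df.loc[mask, cols]     # filter rows and columns in one call

# Chained indexing gives the same values when only reading...
chained = df[mask][cols]
print(both.equals(chained))   # True

# ...but assignments should go through a single loc call,
# which writes to df itself rather than to an intermediate copy.
df.loc[mask, "col_x"] = 0
print(df["col_x"].tolist())   # [10, 0, 0]
```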
More About loc
Think of this in terms of the relational algebra operations selection and projection. If you're from the SQL world, this will look familiar. In SQL syntax, the above operation would look like this:
SELECT col_x, col_y   -- projection on columns
FROM df
WHERE col_z < m       -- selection on rows
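For readers coming from SQL, pandas also offers DataFrame.query, which takes a WHERE-style string. This is an alternative to the loc approach above, not part of the original answer; the data and m = 5 below are made up for illustration.

```python
import pandas as pd

# Hypothetical data and threshold.
df = pd.DataFrame({
    "col_x": [1, 2, 3, 4],
    "col_y": [5, 6, 7, 8],
    "col_z": [0, 10, 3, 20],
})
m = 5

# WHERE col_z < m (selection); "@m" refers to the local Python variable m.
selected = df.query("col_z < @m")

# SELECT col_x, col_y (projection).
result = selected[["col_x", "col_y"]]

# Same rows and columns as the loc version.
print(result.equals(df.loc[df["col_z"] < m, ["col_x", "col_y"]]))  # True
```

query is convenient for long conditions written as text, while loc works directly with any boolean Series you have already computed.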
The pandas loc indexer also lets you select rows by index label. For example, given this dataframe:
   col_x  col_y
a      1      4
b      2      5
c      3      6
To select rows a and c and column col_x, you'd use:
df.loc[['a', 'c'], ['col_x']]
   col_x
a      1
c      3
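The example frame above can be rebuilt with string index labels. The sketch also contrasts passing ['col_x'] (a list) with 'col_x' (a single label), a detail the example relies on: a list keeps the result two-dimensional.

```python
import pandas as pd

# The example frame from the answer, indexed by the labels a, b, c.
df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)

# A list of column labels keeps the result a DataFrame.
as_frame = df.loc[["a", "c"], ["col_x"]]

# A single column label returns a one-dimensional Series instead.
as_series = df.loc[["a", "c"], "col_x"]

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
```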
Alternatively, to select by a boolean condition (using a Series or array of bool values, as your original question asks), for example the rows where col_x is odd:
df.loc[(df.col_x % 2).ne(0), ['col_y']]
   col_y
a      4
c      6
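The question needs only one condition, but masks like this can be combined with & (and), | (or) and ~ (not). A sketch extending the odd-number example; the extra condition col_y > 4 is an illustrative assumption. Each comparison needs its own parentheses because & binds more tightly than comparison operators in Python.

```python
import pandas as pd

df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)

# The answer's mask: rows where col_x is odd.
odd = (df.col_x % 2).ne(0)

# Combine with a second (hypothetical) condition; note the parentheses.
odd_and_large = odd & (df.col_y > 4)

print(df.loc[odd, ["col_y"]].index.tolist())            # ['a', 'c']
print(df.loc[odd_and_large, ["col_y"]].index.tolist())  # ['c']
```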
In detail, df.col_x % 2 computes each value modulo 2. ne(0) then compares each result with 0 and returns True where it is not 0, which selects exactly the odd numbers. Here's what that expression evaluates to:
(df.col_x % 2).ne(0)
a True
b False
c True
Name: col_x, dtype: bool
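Splitting the expression into its intermediate steps makes the evaluation easier to follow. A sketch on the same three-row frame; the != operator is shown as an equivalent spelling of ne(0).

```python
import pandas as pd

df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)

remainders = df.col_x % 2        # one remainder per row
mask = remainders.ne(0)          # True where the remainder is not 0
same_mask = remainders != 0      # the operator form gives the same result

print(remainders.tolist())       # [1, 0, 1]
print(mask.tolist())             # [True, False, True]
print(mask.equals(same_mask))    # True
```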
Further Reading

