Pandas select rows and columns based on boolean condition
Original source: http://stackoverflow.com/questions/48035493/
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original source, and attribute the content to the original authors on Stack Overflow (not to this page).
Asked by Sockey
I have a pandas dataframe with about 50 columns and >100 rows. I want to select columns 'col_x', 'col_y' where 'col_z' < m. Is there a simple way to do this, similar to df[df['col3'] < m] and df[['colx','coly']] but combined?
Answered by cs95
Let's break down your problem. You want to
- Filter rows based on some boolean condition
- Select a subset of columns from the result
For the first point, the condition you'd need is:
df["col_z"] < m
For the second requirement, you'd specify the list of columns that you need:
["col_x", "col_y"]
How would you combine these two to produce the expected output with pandas? The most straightforward way is to use loc:
df.loc[df["col_z"] < m, ["col_x", "col_y"]]
The first argument selects rows, and the second argument selects columns.
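As a minimal runnable sketch of the call above: the column names follow the question, but the data values and the threshold m = 20 are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data: column names come from the question, values are invented.
df = pd.DataFrame({
    "col_x": [1, 2, 3, 4],
    "col_y": [10, 20, 30, 40],
    "col_z": [5, 15, 25, 35],
})
m = 20  # assumed threshold

# Row selector: boolean Series; column selector: list of labels.
result = df.loc[df["col_z"] < m, ["col_x", "col_y"]]
print(result)
```

Only the rows whose col_z is below m survive, and only the two requested columns are kept.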
More About loc
Think of this in terms of the relational algebra operations selection and projection. If you come from the SQL world, this will be a familiar equivalent. The above operation, in SQL syntax, would look like this:
SELECT col_x, col_y   -- projection on columns
FROM df
WHERE col_z < m       -- selection on rows
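To make the selection/projection analogy concrete, here is a rough sketch that writes the two steps separately and compares them with the single loc call. The data and the threshold m are invented for illustration. The DataFrame.query variant is an alternative that the original answer does not mention.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "col_x": [1, 2, 3, 4],
    "col_y": [10, 20, 30, 40],
    "col_z": [5, 15, 25, 35],
})
m = 20  # assumed threshold

# Selection and projection in one step.
with_loc = df.loc[df["col_z"] < m, ["col_x", "col_y"]]

# Selection (rows) first, then projection (columns), as two chained steps.
two_steps = df[df["col_z"] < m][["col_x", "col_y"]]

# Same idea with query; @m refers to the local Python variable m.
with_query = df.query("col_z < @m")[["col_x", "col_y"]]

print(with_loc.equals(two_steps), with_loc.equals(with_query))
```

All three forms return the same rows and columns here. The single loc call is usually preferable when you later want to assign to the selection, since chained indexing may operate on a copy.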
pandas loc also allows you to specify index labels for selecting rows. For example, if you have a dataframe:
   col_x  col_y
a      1      4
b      2      5
c      3      6
To select the index labels a and c, and the column col_x, you'd use:
df.loc[['a', 'c'], ['col_x']]
   col_x
a      1
c      3
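The following sketch rebuilds the small example frame so the label-based selection can be run directly. It also shows a detail the answer does not spell out: passing the column as a list returns a DataFrame, while passing a bare label returns a Series.

```python
import pandas as pd

# The example frame from the answer, with index labels a, b, c.
df = pd.DataFrame({"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
                  index=["a", "b", "c"])

# List of column labels: the result stays two-dimensional (DataFrame).
as_frame = df.loc[["a", "c"], ["col_x"]]

# Single column label: the result is one-dimensional (Series).
as_series = df.loc[["a", "c"], "col_x"]

print(type(as_frame).__name__)
print(type(as_series).__name__)
```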
Alternatively, you can select by a boolean condition (using a series/array of bool values, as your original question asks). For example, to select col_y for the rows where the value in col_x is odd:
df.loc[(df.col_x % 2).ne(0), ['col_y']]
   col_y
a      4
c      6
In detail, df.col_x % 2 computes the remainder of each value modulo 2. The ne(0) call then compares each remainder to 0 and returns True where it is not equal to 0, so the odd values are the ones selected. Here's what that expression results in:
(df.col_x % 2).ne(0)
a     True
b    False
c     True
Name: col_x, dtype: bool
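As a small runnable sketch of the mask above, the block below rebuilds the example frame and also writes the same condition with the != operator, which should produce the same mask as ne(0). The variable names are chosen here for illustration.

```python
import pandas as pd

# The example frame from the answer.
df = pd.DataFrame({"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
                  index=["a", "b", "c"])

# Method form, as in the answer.
mask = (df.col_x % 2).ne(0)

# Operator form of the same comparison.
same_mask = df["col_x"] % 2 != 0

# Use the mask as the row selector and keep only col_y.
odd_rows = df.loc[mask, ["col_y"]]
print(odd_rows)
```

The method form chains without extra parentheses around the comparison, while the operator form may read more naturally. Which one to use is mostly a matter of style.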
Further Reading