Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/48035493/

Time: 2020-09-14 04:59:28  Source: igfitidea

Pandas select rows and columns based on boolean condition

Tags: python, pandas, dataframe, conditional

Asked by Sockey

I have a pandas dataframe with about 50 columns and more than 100 rows. I want to select columns 'col_x' and 'col_y' where 'col_z' < m. Is there a simple way to do this, similar to df[df['col3'] < m] and df[['colx','coly']] but combined?

Answered by cs95

Let's break down your problem. You want to:

  1. Filter rows based on some boolean condition.
  2. Select a subset of columns from the result.

For the first point, the condition you need is:

df["col_z"] < m

For the second requirement, specify the list of columns that you need:

["col_x", "col_y"]

How would you combine these two to produce the expected output with pandas? The most straightforward way is to use loc:

df.loc[df["col_z"] < m, ["col_x", "col_y"]]

The first argument selects rows, and the second argument selects columns.
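As a minimal, self-contained sketch of that call: the sample data and the threshold m = 25 below are made up for illustration and do not come from the original question.

```python
import pandas as pd

# Hypothetical sample data; the question's real frame has about 50 columns.
df = pd.DataFrame({
    "col_x": [1, 2, 3, 4],
    "col_y": [10, 20, 30, 40],
    "col_z": [5, 50, 15, 100],
})
m = 25  # assumed threshold, for illustration only

# First argument: boolean mask over rows. Second argument: list of column labels.
result = df.loc[df["col_z"] < m, ["col_x", "col_y"]]
print(result)
```

Only the rows whose col_z value is below m survive, and only the two requested columns are returned.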



More About loc

Think of this in terms of the relational algebra operations selection and projection. If you come from the SQL world, this will look familiar. The above operation, in SQL syntax, would look like this:

SELECT col_x, col_y     -- projection on columns
FROM df
WHERE col_z < m         -- selection on rows
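To make the analogy concrete, here is a small sketch that performs the selection and the projection as two separate pandas steps and checks that the single loc call gives the same result. The data and the value of m are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"col_x": [1, 2, 3], "col_y": [4, 5, 6], "col_z": [7, 8, 9]})
m = 9  # assumed threshold

selected = df[df["col_z"] < m]             # selection: keep rows (WHERE)
projected = selected[["col_x", "col_y"]]   # projection: keep columns (SELECT)

# The same thing in one step with loc.
combined = df.loc[df["col_z"] < m, ["col_x", "col_y"]]

same = projected.equals(combined)
print(same)
```

The two-step form works for reading data, but the single loc call is generally the safer habit, since chained indexing can cause trouble if you later assign to the result.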

pandas' loc allows you to specify index labels for selecting rows. For example, if you have a dataframe:

   col_x  col_y
a      1      4
b      2      5
c      3      6

To select the index labels a and c, and the column col_x, you'd use:

df.loc[['a', 'c'], ['col_x']]

   col_x
a      1
c      3
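The label-based selection above can be reproduced with the sketch below, which rebuilds the small example frame. It also shows, as a side note not in the original answer, that passing a bare column label instead of a list gives back a Series rather than a DataFrame.

```python
import pandas as pd

# Rebuild the example frame with string index labels.
df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)

# List of row labels, list of column labels: the result is a DataFrame.
subset = df.loc[["a", "c"], ["col_x"]]
print(subset)

# A bare column label (no list) returns a Series instead.
as_series = df.loc[["a", "c"], "col_x"]
print(type(as_series).__name__)
```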

Alternatively, you can select by a boolean condition (using a Series or array of bool values, as your original question asks). For example, to keep the rows where the value in col_x is odd:

df.loc[(df.col_x % 2).ne(0), ['col_y']]

   col_y
a      4
c      6

In detail, df.col_x % 2 computes the remainder of each value when divided by 2. ne(0) then compares each remainder to 0 and returns True where it is nonzero, so the odd numbers are selected. Here's what that expression evaluates to:

(df.col_x % 2).ne(0)

a     True
b    False
c     True
Name: col_x, dtype: bool
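Putting the pieces together, the sketch below builds the mask once, reuses it in loc, and compares it with the operator spelling of the same condition. The variable names mask, mask_op and odd_rows are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame with string index labels.
df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)

# Boolean Series: True where col_x is odd.
mask = (df.col_x % 2).ne(0)

# Use the mask for rows and a list of labels for columns.
odd_rows = df.loc[mask, ["col_y"]]
print(odd_rows)

# The same condition written with the != operator instead of .ne().
mask_op = df["col_x"] % 2 != 0
print(mask.equals(mask_op))
```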

