pandas: Select rows and columns based on a boolean condition

Question

Asked by Sockey

I have a pandas DataFrame with about 50 columns and more than 100 rows. I want to select columns 'col_x' and 'col_y' where 'col_z' < m. Is there a simple way to do this, similar to df[df['col3'] < m] and df[['colx','coly']] but combined?

Answer 1

Answered by cs95

Let's break down your problem. You want to:

1. Filter rows based on some boolean condition.
2. Select a subset of columns from the result.

For the first point, the condition you need is:

df["col_z"] < m
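
To make the condition concrete, here is a small sketch of what it evaluates to. The DataFrame and the threshold m = 3 are hypothetical sample values, not the asker's data; the point is that comparing a column with a scalar produces a boolean Series with one entry per row.

```python
import pandas as pd

# Hypothetical sample data; the real frame in the question has ~50 columns
df = pd.DataFrame({
    "col_x": [10, 20, 30, 40],
    "col_y": ["a", "b", "c", "d"],
    "col_z": [1, 5, 2, 8],
})
m = 3  # hypothetical threshold

# Comparing a column with a scalar yields a boolean Series aligned to df.index
mask = df["col_z"] < m
print(mask.tolist())  # [True, False, True, False]
```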

For the second requirement, you specify the list of columns that you need:

["col_x", "col_y"]
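
A quick sketch of how that list behaves when used on its own, with hypothetical sample data: indexing with a list of labels returns a DataFrame restricted to those columns, whereas a single bare label returns a Series.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"col_x": [10, 20], "col_y": [1.5, 2.5], "col_z": [1, 5]})

cols = ["col_x", "col_y"]
# A list of labels returns a DataFrame containing only those columns
projected = df[cols]
print(list(projected.columns))  # ['col_x', 'col_y']

# A single label (not wrapped in a list) returns a Series instead
print(type(df["col_x"]).__name__)  # Series
```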

How would you combine these two to produce the expected output with pandas? The most straightforward way is to use loc:

df.loc[df["col_z"] < m, ["col_x", "col_y"]]

The first argument selects rows, and the second argument selects columns.
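
Below is a self-contained sketch of that one-liner on hypothetical sample data (m = 3 is an assumed threshold). It also checks that the result matches the two separate steps from the question, and shows that a single loc call can be the target of an assignment, which the chained two-step form cannot be relied on to do.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's 50-column frame
df = pd.DataFrame({
    "col_x": [10, 20, 30, 40],
    "col_y": ["a", "b", "c", "d"],
    "col_z": [1, 5, 2, 8],
})
m = 3  # hypothetical threshold

# One loc call: the boolean mask picks rows, the list of labels picks columns
result = df.loc[df["col_z"] < m, ["col_x", "col_y"]]
# result keeps the rows labelled 0 and 2, and only col_x and col_y

# Same rows and columns via the two separate steps from the question
two_step = df[df["col_z"] < m][["col_x", "col_y"]]
print(result.equals(two_step))  # True

# A single loc call can also be assigned to; chained indexing
# (df[mask]["col_x"] = 0) may modify a temporary copy instead of df
df.loc[df["col_z"] < m, "col_x"] = 0
```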

More About loc

Think of this in terms of the relational algebra operations selection and projection. If you come from the SQL world, the equivalent query will look familiar. In SQL syntax, the operation above would look like this:

SELECT col_x, col_y     -- projection on columns
FROM df
WHERE col_z < m         -- selection on rows
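
As a sketch of how the SQL clauses map onto pandas, the hypothetical helper select_where below spells out each clause as a separate step. A second hypothetical helper shows DataFrame.query, which takes the WHERE condition as a string (the @m syntax refers to the local Python variable m); it is an alternative to loc, not something the original answer uses.

```python
import pandas as pd


def select_where(df: pd.DataFrame, m: float) -> pd.DataFrame:
    """Hypothetical helper: SELECT col_x, col_y FROM df WHERE col_z < m."""
    rows = df["col_z"] < m       # WHERE col_z < m  -> selection on rows
    cols = ["col_x", "col_y"]    # SELECT col_x, col_y -> projection on columns
    return df.loc[rows, cols]


def select_where_query(df: pd.DataFrame, m: float) -> pd.DataFrame:
    """Hypothetical helper: the same selection written with DataFrame.query."""
    # "@m" makes query look up the local variable m instead of a column
    return df.query("col_z < @m")[["col_x", "col_y"]]
```

Usage, again with made-up data: select_where(pd.DataFrame({"col_x": [10, 20, 30], "col_y": [1, 2, 3], "col_z": [1, 5, 2]}), 3) keeps the rows labelled 0 and 2, and select_where_query returns the same frame.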

pandas' loc also lets you select rows by index label. For example, given this DataFrame:

   col_x  col_y
a      1      4
b      2      5
c      3      6
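
If you want to follow along, one way to build that example frame is sketched below; the construction is an assumption, since the answer only shows the printed table.

```python
import pandas as pd

# The example frame above: two integer columns and string row labels a, b, c
df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)
print(df)
```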

To select rows a and c and the column col_x, you'd use:

df.loc[['a', 'c'], ['col_x']]

   col_x
a      1
c      3
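
A few related label-based variants, sketched on the same example frame: a list on both axes gives a DataFrame, a scalar column label gives a Series, and label slices with loc include both endpoints (unlike positional slicing).

```python
import pandas as pd

df = pd.DataFrame({"col_x": [1, 2, 3], "col_y": [4, 5, 6]}, index=["a", "b", "c"])

# Lists of labels on both axes -> DataFrame (2 rows x 1 column here)
sub = df.loc[["a", "c"], ["col_x"]]

# Scalar column label -> Series with one value per selected row
series = df.loc[["a", "c"], "col_x"]

# Label slices include BOTH endpoints: rows a..b, columns col_x..col_y
sliced = df.loc["a":"b", "col_x":"col_y"]
```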

Alternatively, to select by a boolean condition (using a Series or array of bool values, as your original question asks), for example the rows where col_x is odd:

df.loc[(df.col_x % 2).ne(0), ['col_y']]

   col_y
a      4
c      6
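
Because the question mentions a "series/array of bool values", here is a sketch showing that loc accepts any of these as the row selector. A boolean Series is aligned on the index, while a plain list or NumPy array must have one entry per row and is matched by position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_x": [1, 2, 3], "col_y": [4, 5, 6]}, index=["a", "b", "c"])

# Boolean Series, aligned with df by index label
by_series = df.loc[(df.col_x % 2).ne(0), ["col_y"]]

# Plain list or NumPy array of bools: length must equal the number of rows,
# and entries are matched to rows by position
by_list = df.loc[[True, False, True], ["col_y"]]
by_array = df.loc[np.array([True, False, True]), ["col_y"]]
```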

In detail, df.col_x % 2 computes the remainder of each value divided by 2. ne(0) then compares each remainder with 0 and returns True where it is not equal to 0, which selects exactly the odd numbers. Here's what that expression evaluates to:

(df.col_x % 2).ne(0)

a     True
b    False
c     True
Name: col_x, dtype: bool
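
A short sketch of two follow-ups to that mask: .ne(0) is the method form of the != operator, and masks can be combined element-wise with & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because in Python & binds more tightly than comparison operators. The second condition (col_y > 4) is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({"col_x": [1, 2, 3], "col_y": [4, 5, 6]}, index=["a", "b", "c"])

odd = (df.col_x % 2).ne(0)
# .ne(0) gives the same mask as the != operator
same = odd.equals(df.col_x % 2 != 0)

# Combine masks element-wise; parenthesize each comparison
both = odd & (df.col_y > 4)
picked = df.loc[both, ["col_y"]]  # only row c is odd AND has col_y > 4
```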

Further Reading