pandas: select rows from a DataFrame based on values in multiple columns

Question

Asked by Does it matter

This question is closely related to two other questions, "another" and "this one", and I'll even use the example from the very helpful accepted solution on that question. Here's the example from the accepted solution (credit to unutbu):

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

But I want all of column A, restricted to only the rows where B is 'two'. My attempt was:

print(df.loc[df['A']) & df['B'] == 'two'])

This does not work, unfortunately. Can anybody suggest a way to implement something like this? It would be a great help if the solution were somewhat general, for example for the case where column A does not hold a single value such as 'foo' but several different values, and you still want the whole column.

Answer 1

Accepted answer by ely

I think I understand your modified question. After sub-selecting on a condition on B, you can select the columns you want, such as:

In [1]: df.loc[df.B =='two'][['A', 'B']]
Out[1]: 
     A    B
2  foo  two
4  foo  two
5  bar  two

For example, if I wanted to concatenate all the strings of column A for which column B has the value 'two', I could do:

In [2]: df.loc[df.B =='two'].A.sum()  # <-- use .mean() for your quarterly data
Out[2]: 'foofoobar'

You could also groupby the values of column B and get such a concatenation result for every B-group from one expression:

In [3]: df.groupby('B').apply(lambda x: x.A.sum())
Out[3]: 
B
one      foobarfoo
three       barfoo
two      foofoobar
dtype: object
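
As a possible variation on the groupby expression above, the column can be selected before aggregating, which avoids the lambda. This is only a sketch on the question's example frame; the variable name joined is made up for illustration.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Take column A within each B-group and join its strings in row order.
joined = df.groupby('B')['A'].agg(''.join)
print(joined)
```

The result is a Series indexed by the distinct values of B, like the output of the apply version.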

To filter on A and B, use numpy.logical_and:

In [1]: df.loc[np.logical_and(df.A == 'foo', df.B == 'two')]
Out[1]: 
     A    B  C  D
2  foo  two  2  4
4  foo  two  4  8
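
For comparison, here is a small sketch of the same filter written with the & operator. In Python, & binds more tightly than ==, so each comparison needs its own parentheses; that is probably why an unparenthesized attempt fails. The names mask_and and mask_np are illustrative only.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Parenthesize each comparison so that & combines two boolean Series.
mask_and = (df['A'] == 'foo') & (df['B'] == 'two')

# The numpy.logical_and form builds the same element-wise mask.
mask_np = np.logical_and(df['A'] == 'foo', df['B'] == 'two')

selected = df.loc[mask_and]
print(selected)
```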

Answer 2

Answer by YOLO

Row subsetting: isn't this what you are looking for?

df.loc[(df['A'] == 'foo') & (df['B'] == 'two')]

     A    B  C  D
2  foo  two  2  4
4  foo  two  4  8

You can also add .reset_index() at the end to renumber the index from zero.
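
A minimal sketch of what renumbering looks like on the example frame. By default reset_index() keeps the old labels in a new column named index; passing drop=True discards them. The names subset, kept and dropped are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

subset = df.loc[(df['A'] == 'foo') & (df['B'] == 'two')]

# Default: the old row labels (2 and 4) move into an 'index' column.
kept = subset.reset_index()

# drop=True: the old row labels are discarded entirely.
dropped = subset.reset_index(drop=True)

print(kept)
print(dropped)
```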

Answer 3

Answer by Humi

Easy. If you do

     df[['A','B']][df['B']=='two']

you will get:

     A    B
2  foo  two
4  foo  two
5  bar  two
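
The column selection and the row filter can also be combined in a single .loc call, which avoids indexing twice in a row. A sketch on the example frame, with result as an illustrative name:

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# First argument: boolean row mask. Second argument: list of column labels.
result = df.loc[df['B'] == 'two', ['A', 'B']]
print(result)
```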

To filter on both A and B:

    df[['A','B']][(df['B']=='two') & (df['A']=='foo')]

You get:

        A    B
    2  foo  two
    4  foo  two

And if you want all the columns:

        df[df['B']=='two']

you will get:

            A    B  C   D
        2  foo  two  2   4
        4  foo  two  4   8
        5  bar  two  5  10
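
If several values of B should be kept at once, one option is Series.isin, which tests membership in a list. This is only a sketch; the list wanted is a hypothetical choice of values, not something from the question.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Hypothetical list of B values to keep; all columns are retained.
wanted = ['two', 'three']
result = df[df['B'].isin(wanted)]
print(result)
```

Column A is left unrestricted here, so every value it takes in the matching rows is kept.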
