Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/48979561/

Time: 2020-09-14 05:14:47  Source: igfitidea

Selecting rows from a Dataframe based on values from multiple columns in pandas

python pandas

Asked by Does it matter

This question is closely related to these two questions, another and this one, and I'll even use the example from the very helpful accepted solution on that question. Here's the example from the accepted solution (credit to unutbu):

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

But I want to have all rows of A and only the rows in B that have 'two' in them. My attempt at it is:

print(df.loc[df['A']) & df['B'] == 'two'])

This does not work, unfortunately. Can anybody suggest a way to implement something like this? It would be of great help if the solution were somewhat general, so that it also works when, for example, column A does not hold the single value 'foo' but several different values and you still want the whole column.

Accepted answer by ely

I think I understand your modified question. After sub-selecting on a condition of B, you can select the columns you want, such as:

In [1]: df.loc[df.B =='two'][['A', 'B']]
Out[1]: 
     A    B
2  foo  two
4  foo  two
5  bar  two

For example, if I wanted to concatenate all the strings of column A for which column B has the value 'two', then I could do:

In [2]: df.loc[df.B =='two'].A.sum()  # <-- use .mean() for your quarterly data
Out[2]: 'foofoobar'

You could also groupby the values of column B and get such a concatenation result for every different B-group from one expression:

In [3]: df.groupby('B').apply(lambda x: x.A.sum())
Out[3]: 
B
one      foobarfoo
three       barfoo
two      foofoobar
dtype: object
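
The same per-group concatenation can also be written by selecting column A before aggregating, which avoids apply. This is only a sketch on the question's df; the names by_join and two_only are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Select column A within each B-group, then join its strings.
by_join = df.groupby('B')['A'].agg(''.join)

# Same idea for a single value of B, without groupby.
two_only = ''.join(df.loc[df['B'] == 'two', 'A'])
```

Here by_join is a Series indexed by the values of B, and two_only is a plain Python string.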

To filter on A and B, use numpy.logical_and:

In [1]: df.loc[np.logical_and(df.A == 'foo', df.B == 'two')]
Out[1]: 
     A    B  C  D
2  foo  two  2  4
4  foo  two  4  8
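
For two conditions, numpy.logical_and and the & operator should give the same mask. With three or more conditions, one option is numpy.logical_and.reduce over a list of masks. The sketch below assumes the question's df; the third condition on column C is invented purely to have something to combine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

cond_a = df['A'] == 'foo'
cond_b = df['B'] == 'two'

# The & operator and np.logical_and build the same boolean mask.
mask_op = cond_a & cond_b
mask_np = np.logical_and(cond_a, cond_b)

# Hypothetical third condition, combined with the other two in one call.
cond_c = df['C'] > 2
mask_all = np.logical_and.reduce([cond_a, cond_b, cond_c])
result = df.loc[mask_all]
```

Note that the reduce form returns a plain NumPy boolean array rather than a Series, which df.loc accepts as long as its length matches the number of rows.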

Answer by YOLO

Row subsetting: Isn't this what you are looking for?

df.loc[(df['A'] == 'foo') & (df['B'] == 'two')]

     A    B  C  D
2  foo  two  2  4
4  foo  two  4  8
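
The parentheses around each comparison matter, because in Python & binds more tightly than ==. A small sketch of the difference, assuming the question's df; the exact exception raised by the unparenthesized form may vary between pandas versions, so it is caught broadly here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# With parentheses: each comparison is evaluated first, then combined.
both = df.loc[(df['A'] == 'foo') & (df['B'] == 'two')]

# Without parentheses, Python parses the mask as the chained comparison
# df['A'] == ('foo' & df['B']) == 'two', which does not evaluate.
try:
    df.loc[df['A'] == 'foo' & df['B'] == 'two']
    failed = False
except Exception:
    failed = True
```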

You can also add .reset_index() at the end to restart the index from zero.
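
One detail worth knowing: by default reset_index() keeps the old row labels in a new column named index, while drop=True discards them. A minimal sketch on the question's df; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

filtered = df.loc[(df['A'] == 'foo') & (df['B'] == 'two')]

# Old labels (2 and 4) are kept in a new 'index' column.
kept = filtered.reset_index()

# Old labels are discarded; rows are simply renumbered from 0.
dropped = filtered.reset_index(drop=True)
```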

Answer by Humi

Easy, if you do

     df[['A','B']][df['B']=='two']

you will get:

     A    B
2  foo  two
4  foo  two
5  bar  two

To filter on both A and B:

    df[['A','B']][(df['B']=='two') & (df['A']=='foo')]

You get:

        A    B
    2  foo  two
    4  foo  two

And if you want all the columns:

        df[df['B']=='two']

you will get:

            A    B  C   D
        2  foo  two  2   4
        4  foo  two  4   8
        5  bar  two  5  10
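
The row filter and the column selection can also go into a single .loc call instead of two chained brackets. For the more general case the question hints at, where a column may hold several values of interest, isin is one possible tool. This is a sketch only: it assumes the question's df, and the lists of allowed values are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Row mask and column list in one .loc call.
subset = df.loc[df['B'] == 'two', ['A', 'B']]

# Hypothetical generalization: several allowed values per column.
mask = df['A'].isin(['foo', 'bar']) & df['B'].isin(['two', 'three'])
general = df.loc[mask]
```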