Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/48979561/
Selecting rows from a DataFrame based on values from multiple columns in pandas
Asked by Does it matter
This question is closely related to two other questions (another and this one), and I'll even use the example from the very helpful accepted solution on that question. Here's the example from the accepted solution (credit to unutbu):
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
# A B C D
# 0 foo one 0 0
# 1 bar one 1 2
# 2 foo two 2 4
# 3 bar three 3 6
# 4 foo two 4 8
# 5 bar two 5 10
# 6 foo one 6 12
# 7 foo three 7 14
print(df.loc[df['A'] == 'foo'])
yields
A B C D
0 foo one 0 0
2 foo two 2 4
4 foo two 4 8
6 foo one 6 12
7 foo three 7 14
But I want to have all rows of A and only the rows in B that have 'two' in them. My attempt was:
print(df.loc[df['A']) & df['B'] == 'two'])
This does not work, unfortunately. Can anybody suggest a way to implement something like this? It would be a great help if the solution were somewhat general, so that it also works when, for example, column A does not hold a single value such as 'foo' but several different values, and you still want the whole column.
Accepted answer by ely
I think I understand your modified question. After sub-selecting on a condition on B, you can select the columns you want, such as:
In [1]: df.loc[df.B =='two'][['A', 'B']]
Out[1]:
A B
2 foo two
4 foo two
5 bar two
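The same selection can also be written as a single .loc call that takes the row mask and the column list together, instead of chaining two indexing steps. This is a minimal sketch that rebuilds the example frame from the question; the names `mask` and `subset` are illustrative only and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Boolean mask: True for the rows where column B equals 'two'
mask = df['B'] == 'two'

# One .loc call: rows chosen by the mask, columns chosen by label
subset = df.loc[mask, ['A', 'B']]
print(subset)
```

Doing both selections in one call also makes later assignment to the selection (`df.loc[mask, 'A'] = ...`) behave predictably, which the chained form does not guarantee.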
For example, if I wanted to concatenate all the strings of column A for which column B had the value 'two', then I could do:
In [2]: df.loc[df.B =='two'].A.sum() # <-- use .mean() for your quarterly data
Out[2]: 'foofoobar'
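If the goal is specifically string concatenation, `Series.str.cat()` may express the intent more directly than `.sum()`, and it accepts a separator. This is a sketch of that alternative, not the answer's own method; the names `joined` and `joined_with_sep` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Column A restricted to the rows where B == 'two'
selected = df.loc[df['B'] == 'two', 'A']

# With no arguments, str.cat() joins every value of the Series into one string
joined = selected.str.cat()

# An optional separator goes between the values
joined_with_sep = selected.str.cat(sep=', ')

print(joined)
print(joined_with_sep)
```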
You could also groupby the values of column B and get such a concatenation result for every different B-group from one expression:
In [3]: df.groupby('B').apply(lambda x: x.A.sum())
Out[3]:
B
one foobarfoo
three barfoo
two foofoobar
dtype: object
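An equivalent way to get the per-group concatenation, assuming only column A is needed, is to select that column before aggregating and pass `''.join` as the aggregation function. This is a sketch of an alternative spelling rather than the answer's own code; `per_group` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Group by B, keep only column A, and join the strings of each group
per_group = df.groupby('B')['A'].agg(''.join)
print(per_group)
```

Selecting the column first means the aggregation only ever sees the strings of A, not the whole sub-frame of each group.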
To filter on A and B use numpy.logical_and:
In [1]: df.loc[np.logical_and(df.A == 'foo', df.B == 'two')]
Out[1]:
A B C D
2 foo two 2 4
4 foo two 4 8
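`numpy.logical_and` takes exactly two inputs. For a more general filter with any number of conditions, one option is `np.logical_and.reduce` over a list of boolean masks. This is a sketch under that assumption; the third condition on column C is invented here purely to show a list of more than two masks.

```python
import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Any number of boolean conditions, one per entry
conditions = [
    df['A'] == 'foo',
    df['B'] == 'two',
    df['C'] > 2,      # hypothetical extra condition, for illustration only
]

# Combine all conditions element-wise with AND into one boolean array
mask = np.logical_and.reduce(conditions)

result = df.loc[mask]
print(result)
```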
Answer by YOLO
Row subsetting: isn't this what you are looking for?
df.loc[(df['A'] == 'foo') & (df['B'] == 'two')]
A B C D
2 foo two 2 4
4 foo two 4 8
You can also add .reset_index() at the end to renumber the index from zero (pass drop=True if you do not want the old index kept as a column).
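The parentheses around each comparison matter: in Python, `&` binds more tightly than `==`, so an unparenthesized `df['A'] == 'foo' & df['B'] == 'two'` is not parsed as two separate comparisons. Here is a short sketch of the parenthesized filter followed by an index reset; the names `mask`, `result` and `renumbered` are illustrative.

```python
import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Each comparison is wrapped in parentheses so that it is evaluated
# before the element-wise AND (&) combines the two boolean Series
mask = (df['A'] == 'foo') & (df['B'] == 'two')
result = df.loc[mask]

# drop=True discards the old labels (2, 4) instead of adding them as a column
renumbered = result.reset_index(drop=True)
print(renumbered)
```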
Answer by Humi
Easy: if you do
df[['A','B']][df['B']=='two']
you will get:
A B
2 foo two
4 foo two
5 bar two
To filter on both A and B:
df[['A','B']][(df['B']=='two') & (df['A']=='foo')]
You get:
A B
2 foo two
4 foo two
And if you want all the columns:
df[df['B']=='two']
you will get:
A B C D
2 foo two 2 4
4 foo two 4 8
5 bar two 5 10
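For readers who prefer a string expression over boolean masks, `DataFrame.query` offers another way to write the same filters. This is a sketch of an alternative that none of the answers above uses; the variable `wanted_b` is hypothetical and only shows how a local Python value can be referenced with `@`.

```python
import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Filter on both columns with a single expression string
both = df.query("A == 'foo' and B == 'two'")

# Reference a local Python variable inside the expression with @
wanted_b = 'two'
only_b = df.query("B == @wanted_b")

print(both)
print(only_b)
```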