Python: How to check if there exists a row with a certain column value in a pandas DataFrame

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/22895405/

Time: 2020-08-19 01:54:39  Source: igfitidea

python, pandas, dataframe

Asked by AMM

Very new to pandas.

Is there a way to check, given a pandas DataFrame, whether a row with a certain column value exists? Say I have a column 'Name' and I need to check whether a certain name exists in it.

Once I have done this, I will need to make a similar query, but with many values at a time. I read that there is 'isin', but I'm not sure how to use it. I need a query that returns all the rows whose 'Name' column matches any of the values in a large array of names.

Answered by Akavall

import numpy as np
import pandas as pd
df = pd.DataFrame(data = np.arange(8).reshape(4,2), columns=['name', 'value'])

Result:

>>> df
   name  value
0     0      1
1     2      3
2     4      5
3     6      7
>>> any(df.name == 4)
True
>>> any(df.name == 5)
False
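
The built-in any() gives the right answer here, but it walks the Series element by element in Python. A minimal sketch of two vectorized alternatives, rebuilding the same df as above; the variable names has_four, has_five and also_has_four are only illustrative:

```python
import numpy as np
import pandas as pd

# Same frame as in the answer above.
df = pd.DataFrame(data=np.arange(8).reshape(4, 2), columns=['name', 'value'])

# Series.any() reduces the boolean mask without a Python-level loop.
has_four = bool((df['name'] == 4).any())
has_five = bool((df['name'] == 5).any())

# Membership test against the underlying NumPy array of values.
# Note: `4 in df['name']` would test the index labels, not the values.
also_has_four = 4 in df['name'].values

print(has_four, has_five, also_has_four)  # True False True
```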

Second Part:

my_data = np.arange(8).reshape(4,2)
my_data[0,0] = 4

df = pd.DataFrame(data = my_data, columns=['name', 'value'])

Result:

>>> df.loc[df.name == 4]
   name  value
0     4      1
2     4      5

Update:

my_data = np.arange(8).reshape(4,2)
my_data[0,0] = 4

df = pd.DataFrame(data = my_data, index=['a', 'b', 'c', 'd'], columns=['name', 'value'])

Result:

>>> df.loc[df.name == 4]  # gives relevant rows
   name  value
a     4      1
c     4      5  
>>> df.loc[df.name == 4].index  # gives "row names" of relevant rows
Index([u'a', u'c'], dtype=object)
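
The question also asks about matching many values at once with 'isin'. A rough sketch of how Series.isin could be used for that, assuming a 'Name' column as in the question; the sample names and the variables names, mask, matches, any_found and missing are made up for illustration:

```python
import pandas as pd

# Hypothetical data: the 'Name' column comes from the question, the values do not.
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol', 'Dave'],
                   'value': [1, 2, 3, 4]})
names = ['Bob', 'Dave', 'Zed']  # 'Zed' is absent on purpose

# isin returns a boolean Series: True where 'Name' is one of `names`.
mask = df['Name'].isin(names)

# Select every matching row.
matches = df.loc[mask]

# Check whether at least one of the names exists in the column.
any_found = bool(mask.any())

# Names from the list that do not appear in the column at all.
missing = sorted(set(names) - set(df['Name']))

print(matches)
print(any_found, missing)
```

The same mask works for both tasks: mask.any() answers "does any of these names exist", and df.loc[mask] returns the rows themselves.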

Answered by James Sapam

If you want to extract a set of values given a sequence of row labels and column labels, the lookup method allows for this and returns a NumPy array.

Here is my snippet and output:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.rand(20,4), columns = ['A','B','C','D'])
>>> df
           A         B         C         D
0   0.121190  0.360813  0.500082  0.817546
1   0.304313  0.773412  0.902835  0.440485
2   0.700338  0.733342  0.196394  0.364041
3   0.385534  0.078589  0.181256  0.440475
4   0.151840  0.956841  0.422713  0.018626
5   0.995875  0.110973  0.149234  0.543029
6   0.274740  0.745955  0.420808  0.020774
7   0.305654  0.580817  0.580476  0.210345
8   0.726075  0.801743  0.562489  0.367190
9   0.567987  0.591544  0.523653  0.133099
10  0.795625  0.163556  0.594703  0.208612
11  0.977728  0.751709  0.976577  0.439014
12  0.967853  0.214956  0.126942  0.293847
13  0.189418  0.019772  0.618112  0.643358
14  0.526221  0.276373  0.947315  0.792088
15  0.714835  0.782455  0.043654  0.966490
16  0.760602  0.487120  0.747248  0.982081
17  0.050449  0.666720  0.835464  0.522671
18  0.382314  0.146728  0.666722  0.573501
19  0.392152  0.195802  0.919299  0.181929

>>> df.lookup([0,2,4,6], ['B', 'C', 'A','D'])
array([ 0.36081287,  0.19639367,  0.15184046,  0.02077381])
>>> df.lookup([0,2,4,6], ['A', 'B', 'C','D'])
array([ 0.12119047,  0.73334194,  0.4227131 ,  0.02077381])
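
Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so the calls above fail on current versions. One possible replacement, sketched here with NumPy integer indexing, is to translate the labels to positions first; the seed and the variable names rows, cols, row_pos, col_pos and values are assumptions for the example, not part of the original answer:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame(rng.random((20, 4)), columns=['A', 'B', 'C', 'D'])

rows = [0, 2, 4, 6]
cols = ['B', 'C', 'A', 'D']

# Translate row and column labels to integer positions.
row_pos = df.index.get_indexer(rows)
col_pos = df.columns.get_indexer(cols)

# Pairwise indexing: values[i] is the cell at (rows[i], cols[i]).
values = df.to_numpy()[row_pos, col_pos]

print(values)
```

Index.get_indexer returns -1 for a label it cannot find, which would silently pick the last row or column here, so it may be worth checking for -1 before indexing.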