Python: check whether a value is contained in a pandas DataFrame column

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/35956712/

Date: 2020-08-19 17:12:13  Source: igfitidea

Check if certain value is contained in a dataframe column in pandas

python, pandas, dataframe

Asked by Michael Perdue

I am trying to check whether a certain value is contained in a pandas DataFrame column. I'm using df.date.isin(['07311954']), which I do not doubt is a good tool. The problem is that I have over 350K rows, and the output won't show all of them, so I can't see whether the value is actually contained. Put simply, I just want to know (Y/N) whether a specific value is contained in a column. My code follows:

import numpy as np
import pandas as pd
import glob

# Read the pipe-delimited file; it has no header row, so column names are supplied.
df = pd.read_csv(
    '/home/jayaramdas/anaconda3/Thesis/FEC_data/itpas2_data/itpas214.txt',
    sep='|', header=None, low_memory=False,
    names=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13',
           'date', '15', '16', '17', '18', '19', '20', '21', '22'])

# Returns a boolean Series with one entry per row, not a single yes/no.
df.date.isin(['07311954'])
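
One possible reason the isin check above finds nothing, even when the value is in the file, is that read_csv may infer the date column as integers and drop the leading zero. The question does not confirm this, so treat it as an assumption. The sketch below uses a hypothetical two-column sample (the names raw, df_int and df_str are illustrative only) and compares default parsing with dtype={'date': str}.

```python
import io
import pandas as pd

# Hypothetical pipe-separated sample standing in for the real file.
raw = "C001|07311954\nC002|08152007\n"

# Default parsing: the date column is inferred as integers,
# so '07311954' is stored as 7311954 and the string no longer matches.
df_int = pd.read_csv(io.StringIO(raw), sep='|', header=None,
                     names=['id', 'date'])
int_match = bool(df_int['date'].isin(['07311954']).any())

# Forcing the column to str keeps the leading zero.
df_str = pd.read_csv(io.StringIO(raw), sep='|', header=None,
                     names=['id', 'date'], dtype={'date': str})
str_match = bool(df_str['date'].isin(['07311954']).any())

print(int_match)  # single yes/no for the integer-parsed column
print(str_match)  # single yes/no for the string-parsed column
```

Appending .any() to the isin result collapses the per-row booleans into the single Y/N answer the question asks for.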

Accepted answer by jezrael

I think you need str.contains if you want the rows where the values of the date column contain the string 07311954:

print(df[df['date'].astype(str).str.contains('07311954')])

Or, if the type of the date column is string:

print(df[df['date'].str.contains('07311954')])

If you want to check the last 4 digits of the date column for the string 1954:

print(df[df['date'].astype(str).str[-4:].str.contains('1954')])

Sample:

print(df['date'])
0    8152007
1    9262007
2    7311954
3    2252011
4    2012011
5    2012011
6    2222011
7    2282011
Name: date, dtype: int64

print(df['date'].astype(str).str[-4:].str.contains('1954'))
0    False
1    False
2     True
3    False
4    False
5    False
6    False
7    False
Name: date, dtype: bool

print(df[df['date'].astype(str).str[-4:].str.contains('1954')])
     cmte_id trans_typ entity_typ state  employer  occupation     date  \
2  C00119040       24K        CCM    MD       NaN         NaN  7311954   

   amount     fec_id    cand_id  
2    1000  C00140715  H2MD05155  
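
As a rough sketch of the approach in this answer, the snippet below rebuilds a small integer date column (hypothetical data mirroring part of the sample), converts it to text, and compares a substring test with a suffix test. The names dates, as_text, contains_mask and endswith_mask are illustrative and not from the original answer. Passing regex=False makes str.contains treat the pattern as a literal string, and str.endswith is an alternative way to test the last four characters.

```python
import pandas as pd

# Hypothetical integer column mirroring part of the sample above.
dates = pd.Series([8152007, 9262007, 7311954, 2252011], name='date')
as_text = dates.astype(str)

# Literal substring test anywhere in the value.
contains_mask = as_text.str.contains('1954', regex=False)

# Suffix test: true only when the value ends with '1954'.
endswith_mask = as_text.str.endswith('1954')

# Collapse the per-row mask into one yes/no answer.
has_any = bool(endswith_mask.any())

# Select only the matching rows.
matching_rows = dates[endswith_mask]
print(has_any)
print(matching_rows)
```

Note that a substring test can also match values where 1954 appears in the middle, so the suffix test is the stricter of the two.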

Answer by YaOzI

You can simply use this:

'07311954' in df.date.values, which returns True or False.

Here is a further explanation:

In pandas, using the in check directly on a DataFrame or Series (e.g. val in df or val in series) checks whether val is contained in the Index.

BUT you can still use the in check on their values too (instead of the Index)! Just use val in df.col_name.values or val in series.values. This way, you are actually checking val against a NumPy array.

And .isin(vals) works the other way around: it checks whether each of the DataFrame/Series values is in vals. Here vals must be a set or list-like. So it is not the natural way to answer this question.
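
The difference between these checks can be seen in a minimal sketch. The Series s and its index labels below are hypothetical, chosen so that the index and the values cannot be confused.

```python
import pandas as pd

# Hypothetical Series with integer index labels and string values.
s = pd.Series(['07311954', '08152007'], index=[10, 11])

# `in` on the Series itself looks at the index labels.
label_in_series = 10 in s
value_in_series = '07311954' in s

# `in` on .values looks at the stored data.
value_in_values = '07311954' in s.values

# isin returns one boolean per row; .any() reduces it to a single answer.
isin_any = bool(s.isin(['07311954']).any())

print(label_in_series, value_in_series, value_in_values, isin_any)
```

Both value_in_values and isin_any answer the Y/N question; the first reads more naturally for a single value, while isin is convenient when testing against several candidate values at once.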

Answer by Deusdeorum

You can use any:

print(any(df.column == '07311954'))  # compare with the integer 7311954 instead if the column holds integers
True       # True if the column contains the value, False otherwise

If you would rather see how many times '07311954' occurs in a column, you can use:

df.column[df.column == '07311954'].count()
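
A small self-contained sketch of both checks, assuming the column stores strings. The DataFrame below is hypothetical, and the names mask, found and occurrences are illustrative. It uses the Series method .any() rather than the built-in any, and counts matches by summing the boolean mask.

```python
import pandas as pd

# Hypothetical data: the target value appears twice.
df = pd.DataFrame({'column': ['07311954', '08152007', '07311954']})

# One boolean per row: True where the value equals the target.
mask = df['column'] == '07311954'

# Yes/no: does the value occur at all?
found = bool(mask.any())

# How many times does it occur? True counts as 1 when summed.
occurrences = int(mask.sum())

print(found, occurrences)
```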