Check if a certain value is contained in a DataFrame column in pandas
Original question: http://stackoverflow.com/questions/35956712/
Note: this page reproduces a popular Stack Overflow question. It is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Michael Perdue
I am trying to check if a certain value is contained in a pandas column. I'm using df.date.isin(['07311954']), which I do not doubt to be a good tool. The problem is that I have over 350K rows and the output won't show all of them, so I can't see whether the value is actually contained. Put simply, I just want to know (Y/N) whether or not a specific value is contained in a column. My code follows:
import numpy as np
import pandas as pd
import glob

df = pd.read_csv(
    '/home/jayaramdas/anaconda3/Thesis/FEC_data/itpas2_data/itpas214.txt',
    sep='|', header=None, low_memory=False,
    names=['1', '2', '3', '4', '5', '6', '7',
           '8', '9', '10', '11', '12', '13', 'date', '15', '16', '17', '18', '19', '20',
           '21', '22'])

df.date.isin(['07311954'])
Accepted answer by jezrael
I think you need str.contains if you need the rows where the values of column date contain the string 07311954:
print(df[df['date'].astype(str).str.contains('07311954')])
Or, if the type of the date column is string:
print(df[df['date'].str.contains('07311954')])
If you want to check whether the last 4 digits in column date are the string 1954:
print(df[df['date'].astype(str).str[-4:].str.contains('1954')])
Sample:
print(df['date'])
0    8152007
1    9262007
2    7311954
3    2252011
4    2012011
5    2012011
6    2222011
7    2282011
Name: date, dtype: int64

print(df['date'].astype(str).str[-4:].str.contains('1954'))
0    False
1    False
2     True
3    False
4    False
5    False
6    False
7    False
Name: date, dtype: bool

print(df[df['date'].astype(str).str[-4:].str.contains('1954')])
     cmte_id trans_typ entity_typ state employer occupation     date  \
2  C00119040       24K        CCM    MD      NaN        NaN  7311954

   amount     fec_id    cand_id
2    1000  C00140715  H2MD05155
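As a rough sketch of how the column dtype affects the match, consider the small made-up frame below (illustrative data, not the asker's FEC file). When the dates are stored as integers, as in the sample output above, the leading zero is already gone, so searching for '07311954' finds nothing while '7311954' does.

```python
import pandas as pd

# Hypothetical data, not the asker's file: dates stored as integers,
# so a leading zero (07311954) has already been dropped.
df = pd.DataFrame({'date': [8152007, 9262007, 7311954, 2252011]})

as_text = df['date'].astype(str)

# Substring match on the text form of the column, reduced to one answer.
has_with_zero = as_text.str.contains('07311954').any()
has_without_zero = as_text.str.contains('7311954').any()

# Only look at the last four characters (the year).
year_mask = as_text.str[-4:] == '1954'

print(has_with_zero)      # False
print(has_without_zero)   # True
print(df[year_mask])      # the single row whose date ends in 1954
```

If the leading zero matters, one option is to read the column as text in the first place, for example by passing dtype={'date': str} to pd.read_csv; whether that suits the asker's file is an assumption.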
Answer by YaOzI
You can simply use this:
'07311954' in df.date.values
which returns True or False.
Here is a further explanation:
In pandas, using an in check directly on a DataFrame or Series (e.g. val in df or val in series) checks whether val is contained in the Index.
BUT you can still use an in check on their values too (instead of the Index)! Just use val in df.col_name.values or val in series.values. This way, you are actually checking val against a NumPy array.
And .isin(vals) is the other way around: it checks whether the DataFrame/Series values are in vals. Here vals must be a set or list-like. So this is not the natural way to go for the question.
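The distinction between checking the index and checking the values can be sketched on a tiny, made-up Series (the labels 'a' and 'b' and the data are hypothetical). The last line also shows one way to reduce an isin mask to a single yes/no answer with .any(), which is an addition here and not part of the answer above.

```python
import pandas as pd

# Hypothetical series with a non-default index, for illustration only.
s = pd.Series(['08152007', '07311954'], index=['a', 'b'])

in_index = 'a' in s                      # True: 'a' is an index label
value_in_series = '07311954' in s        # False: still checks the index
value_in_values = '07311954' in s.values # True: checks the underlying array

# isin returns one boolean per row; .any() collapses it to a single answer.
via_isin = s.isin(['07311954']).any()

print(in_index, value_in_series, value_in_values, via_isin)
```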
Answer by Deusdeorum
You can use any:
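# An unquoted 07311954 is not a valid integer literal, so compare against the string
# (or against 7311954 if the column holds integers).
print(any(df.column == '07311954'))
True  # True if the column contains the value, False otherwise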
If you would rather see how many times '07311954' occurs in a column, you can use:
df.column[df.column == '07311954'].count()
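A minimal, self-contained sketch of both checks, assuming the column holds date strings (the frame below is made up for illustration). It uses the Series methods .any() and .sum() on the boolean mask, which is a common alternative to the built-in any and to filtering followed by count.

```python
import pandas as pd

# Hypothetical column of date strings; the name 'date' follows the question.
df = pd.DataFrame({'date': ['08152007', '07311954', '02012011', '07311954']})

mask = df['date'] == '07311954'   # element-wise comparison, boolean Series

contains_value = mask.any()       # True if at least one row matches
n_matches = int(mask.sum())       # each True counts as 1

print(contains_value)  # True
print(n_matches)       # 2
```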