Original URL: http://stackoverflow.com/questions/46453275/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Find the index of a string value in a pandas DataFrame
Asked by Ben
How can I identify which column(s) in my DataFrame contain a specific string 'foo'?
Sample DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({'A':[10,20,42], 'B':['foo','bar','blah'],'C':[3,4,5], 'D':['some','foo','thing']})
I want to find B and D here.
I can search for numbers:
If I'm looking for a number (e.g. 42) instead of a string, I can generate a boolean mask like this:
>>> ~(df.where(df==42)).isnull().all()
A     True
B    False
C    False
D    False
dtype: bool
but not strings:
>>> ~(df.where(df=='foo')).isnull().all()
TypeError: Could not compare ['foo'] with block values
I don't want to iterate over each column and row if possible (my actual data is much larger than this example). It feels like there should be a simple and efficient way.
How can I do this?
Accepted answer by Divakar
One way, working with the underlying array data:
df.columns[(df.values=='foo').any(0)].tolist()
Sample run:
In [209]: df
Out[209]:
    A     B  C      D
0  10   foo  3   some
1  20   bar  4    foo
2  42  blah  5  thing
In [210]: df.columns[(df.values=='foo').any(0)].tolist()
Out[210]: ['B', 'D']
If you are looking for just the column mask:
In [205]: (df.values=='foo').any(0)
Out[205]: array([False, True, False, True], dtype=bool)
Answer by YOBEN_S
Option 1: df.values
~(df.where(df.values=='foo')).isnull().all()
Out[91]:
A    False
B     True
C    False
D     True
dtype: bool
Option 2: isin
~(df.where(df.isin(['foo']))).isnull().all()
Out[94]:
A    False
B     True
C    False
D     True
dtype: bool
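As a side note, the where / isnull / all chain can probably be shortened, since isin already returns a boolean frame and any() reduces it per column. A minimal sketch, assuming the sample frame from the question (the variable names mask and matching are mine):

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 42],
                   'B': ['foo', 'bar', 'blah'],
                   'C': [3, 4, 5],
                   'D': ['some', 'foo', 'thing']})

# isin gives True wherever a cell equals one of the listed values;
# any() then reduces each column to a single boolean.
mask = df.isin(['foo']).any()

# Keep only the labels whose mask entry is True.
matching = mask[mask].index.tolist()
print(matching)  # ['B', 'D']
```

Passing several values, e.g. df.isin(['foo', 'bar']), would flag a column if it contains any of them.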
Answer by rko
Unfortunately, the syntax you gave won't match a str. The comparison has to be run on a Series of string type to compare it with a string, unless I am missing something.
Try this:
~df.where(df.isin(['foo'])).isnull().all()
A    False
B     True
C    False
D     True
dtype: bool
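If the idea is to compare strings with strings, one way to sketch it is to cast the whole frame to str first and then compare. This is my own illustration rather than something from the answer, and it carries a caveat: after the cast, the number 42 would also match the string '42', so it only fits when that looseness is acceptable.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 42],
                   'B': ['foo', 'bar', 'blah'],
                   'C': [3, 4, 5],
                   'D': ['some', 'foo', 'thing']})

# Cast every column to str so each comparison is string against string.
# Caveat: numeric cells become text, so 42 would match '42' here.
as_text = df.astype(str)

# eq compares element-wise; any() reduces each column to one boolean.
has_foo = as_text.eq('foo').any()
print(has_foo)
```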