Notice: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/50923707/

Time: 2020-09-14 05:42:57  Source: igfitidea

Get column name which contains a specific value at any rows in python pandas

python pandas dataframe input

Question by wh112

I want to find, across the whole DataFrame (assume it has more than 100 rows and more than 50 columns), the names of the columns that contain a specific value in at least one row.

With the help of Bkmm3 (a member from India), I succeeded with numeric values but failed with text values. This is what I tried:

import pandas as pd

df = pd.DataFrame({'A':['APPLE','BALL','CAT'],
                   'B':['ACTION','BATMAN','CATCHUP'],
                   'C':['ADVERTISE','BEAST','CARTOON']})
response = input("input")
for i in df.columns:
    # For text input this builds e.g. "A==APPLE", so query reads APPLE as a variable name
    if len(df.query(i + '==' + str(response))) > 0:
        print(i)

Running it raises this error:

Traceback (most recent call last):
NameError: name 'APPLE' is not defined

Any help would be much appreciated. Thank you!

Answer by cs95

isin/eq work on whole DataFrames, so you can fully vectorize this:

df.columns[df.isin(['APPLE']).any()]  # df.isin([response])

Or,

df.columns[df.eq(response).any()]

Index(['A'], dtype='object')
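To make the two one-liners above self-contained, here is a minimal sketch that wraps them in helper functions. The names columns_with_value and columns_with_any are hypothetical, and the 4-row frame is an illustrative assumption chosen to be non-square, so that accidentally reducing over rows instead of columns would give a visibly wrong answer.

```python
import pandas as pd

def columns_with_value(df: pd.DataFrame, value) -> list:
    """Labels of the columns in which `value` appears in at least one row."""
    # df.eq(value) is an element-wise boolean frame of the same shape;
    # .any() (default axis=0) collapses each column to a single True/False.
    return df.columns[df.eq(value).any()].tolist()

def columns_with_any(df: pd.DataFrame, values) -> list:
    """Same idea with isin, for a list of candidate values."""
    return df.columns[df.isin(values).any()].tolist()

# Hypothetical non-square frame (4 rows x 3 columns); 'APPLE' appears in A and B.
df = pd.DataFrame({'A': ['APPLE', 'BALL', 'CAT', 'DOG'],
                   'B': ['ACTION', 'BATMAN', 'CATCHUP', 'APPLE'],
                   'C': ['ADVERTISE', 'BEAST', 'CARTOON', 'EGG']})

print(columns_with_value(df, 'APPLE'))          # ['A', 'B']
print(columns_with_any(df, ['BEAST', 'EGG']))   # ['C']
```

Using eq for a single value and isin for several keeps the whole search in vectorized pandas operations, with no Python-level loop over the 50+ columns.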

And here's the roundabout way with DataFrame.eval and np.logical_or (were you to loop on columns):

import numpy as np
df.columns[
    np.logical_or.reduce(
        # axis=1: collapse each column's row-wise boolean Series to one flag
        [df.eval(f"{repr(response)} in {i}") for i in df], axis=1
)]
Index(['A'], dtype='object')
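If you do loop over columns, the eval round trip is not strictly needed: each column is already a Series that can be compared directly. A minimal sketch, assuming unique column labels (with duplicate labels df[col] returns a DataFrame instead of a Series); the helper name columns_with_value_loop is hypothetical.

```python
import pandas as pd

def columns_with_value_loop(df: pd.DataFrame, value) -> list:
    """Column-by-column version: test each column Series directly."""
    matches = []
    for col in df.columns:
        # df[col].eq(value) is a boolean Series over this column's rows;
        # .any() is True if at least one row matches.
        if df[col].eq(value).any():
            matches.append(col)
    return matches

df = pd.DataFrame({'A': ['APPLE', 'BALL', 'CAT'],
                   'B': ['ACTION', 'BATMAN', 'CATCHUP'],
                   'C': ['ADVERTISE', 'BEAST', 'CARTOON']})
print(columns_with_value_loop(df, 'CARTOON'))  # ['C']
```

This is slower than the fully vectorized df.eq(value).any() for wide frames, but it is easy to extend with per-column conditions.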

Answer by jpp

First, the reason for your error. With pd.DataFrame.query, as with regular comparisons, you need to surround strings with quotation marks. So this would work (notice the pair of " quotation marks):

response = input("input")

for i in df.columns:
    if not df.query(i + '=="' + str(response) + '"').empty:
        print(i)

inputAPPLE
A
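Splicing the quoted value into the expression string still breaks if the input itself contains a double quote. A sketch of a variant, assuming pandas' documented @ prefix (which makes query read a local Python variable) and backtick-quoted column names (available since pandas 0.25); the helper name columns_via_query is hypothetical.

```python
import pandas as pd

def columns_via_query(df: pd.DataFrame, response: str) -> list:
    """Columns in which `response` occurs, using DataFrame.query."""
    matches = []
    for col in df.columns:
        # Backticks quote the column name; '@response' tells query to look up
        # the local variable, so no hand-written quotation marks are needed.
        if not df.query(f"`{col}` == @response").empty:
            matches.append(col)
    return matches

df = pd.DataFrame({'A': ['APPLE', 'BALL', 'CAT'],
                   'B': ['ACTION', 'BATMAN', 'CATCHUP'],
                   'C': ['ADVERTISE', 'BEAST', 'CARTOON']})
print(columns_via_query(df, 'APPLE'))  # ['A']
```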

Next, you can extract index and/or column labels via pd.DataFrame.any. coldspeed's solution is fine here; I'm just going to show how similar syntax can be used to extract both row and column labels.

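# columns: any(axis=0) collapses each column over its rows
print(df.columns[(df == response).any(axis=0)])
Index(['A'], dtype='object')

# rows: any(axis=1) collapses each row over its columns
print(df.index[(df == response).any(axis=1)])
Int64Index([0], dtype='int64')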

Notice that in both cases the result is an Index object. The code differs only in the attribute being extracted (columns or index) and in the axis parameter of pd.DataFrame.any.
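If you need the exact cells rather than separate row and column labels, the same boolean mask can be turned into (row, column) pairs. A minimal sketch, assuming NumPy's np.nonzero on the mask; the helper name locate and the non-square example frame are hypothetical.

```python
import numpy as np
import pandas as pd

def locate(df: pd.DataFrame, value) -> list:
    """(row label, column label) pairs for every cell equal to `value`."""
    mask = df.eq(value).to_numpy()   # 2-D boolean array, shape (n_rows, n_cols)
    rows, cols = np.nonzero(mask)    # row and column positions of the True cells
    return [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]

df = pd.DataFrame({'A': ['APPLE', 'BALL', 'CAT', 'DOG'],
                   'B': ['ACTION', 'BATMAN', 'CATCHUP', 'APPLE'],
                   'C': ['ADVERTISE', 'BEAST', 'CARTOON', 'EGG']})
print(locate(df, 'APPLE'))  # [(0, 'A'), (3, 'B')]
```

The pairs come back in row-major order, because np.nonzero scans the mask row by row.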
