pandas: Search an entire DataFrame row for multiple string values in Python pandas

Question

Asked by BrianBeing

In a pandas DataFrame, I want to search row by row for multiple string values. If a row contains one of the string values, the function should write 1 for that row (otherwise 0) into an empty column at the end of the df.
There have been multiple tutorials on how to select rows of a Pandas DataFrame that match a (partial) string.

For Example:

import pandas as pd

#create sample data
data = {'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
        'launched': [1983,1984,1984,1984],
        'discontinued': [1986, 1985, 1984, 1986]}

df = pd.DataFrame(data, columns = ['model', 'launched', 'discontinued'])
df

I'm pulling the above example from this website: https://davidhamann.de/2017/06/26/pandas-select-elements-by-string/

How would I do a multi-value search of the entire row for: 'int', 'tos', '198'?

Then, in a new column named int placed next to discontinued, put 1 or 0 depending on whether the row contained one of those keywords.

Answer 1

Accepted answer by mrGreenBrown

So the simplest method, without using any fancy pandas features, is to use nested for loops. I would welcome a better solution from someone else, but my approach would be this:

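def check_all_for(df, column_name, search_terms):
    # Collect one flag per row, then assign the whole column at once
    flags = []
    # iterrows() yields (index, Series) pairs; loop over the row's values
    for _, row in df.iterrows():
        flag = 0
        for element in row:
            for search_term in search_terms:
                # Compare case-insensitively on both sides
                if search_term.lower() in str(element).lower():
                    flag = 1
        flags.append(flag)
    df[column_name] = flags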

This assumes your DataFrame is passed in as df and that you want to flag the new column with 1 and 0, e.g. check_all_for(df, 'int', ['int', 'tos', '198']).
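For comparison, here is a minimal self-contained sketch of the same row-by-row idea on the question's sample data. It swaps iterrows for itertuples and the explicit flag variable for any(); the helper name flag_rows is hypothetical, not part of the answer above.

```python
import pandas as pd

data = {'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
        'launched': [1983, 1984, 1984, 1984],
        'discontinued': [1986, 1985, 1984, 1986]}
df = pd.DataFrame(data)

def flag_rows(df, search_terms):
    """Hypothetical helper: 1 for each row where any cell contains any term, else 0."""
    terms = [t.lower() for t in search_terms]
    flags = []
    # itertuples(index=False) yields one plain tuple of cell values per row
    for row in df.itertuples(index=False):
        found = any(term in str(cell).lower() for cell in row for term in terms)
        flags.append(int(found))
    return flags

df['int'] = flag_rows(df, ['int', 'tos', '198'])
print(df['int'].tolist())  # [1, 1, 1, 1]
```

Note that every row gets 1 here because '198' occurs in every launch year once the integers are turned into strings; restricting the search to the model column would give [0, 0, 1, 1].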

Answer 2

Answer by rafaelc

If you have

l=['int', 'tos', '198']

Then use str.contains with the terms joined by '|' (which means "or" in a regular expression) to get every model that contains any of these words:

df.model.str.contains('|'.join(l))

0    False
1    False
2     True
3     True
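Because str.contains treats the joined string as a regular expression by default, a term containing characters such as '.' or '+' would change the pattern. A minimal sketch, assuming you want literal, case-insensitive matching, escapes each term with re.escape and passes case=False:

```python
import re

import pandas as pd

df = pd.DataFrame({'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
                   'launched': [1983, 1984, 1984, 1984],
                   'discontinued': [1986, 1985, 1984, 1986]})

terms = ['int', 'tos', '198']
# re.escape makes each term match literally inside the '|' alternation
pattern = '|'.join(re.escape(t) for t in terms)
# case=False makes the match case-insensitive
mask = df['model'].str.contains(pattern, case=False)
print(mask.tolist())  # [False, False, True, True]
```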

Edit

If the intention is to check all columns as @jpp interpreted, I'd suggest:

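from functools import reduce

# m is the '|'-joined pattern; jpp's terms ['int', 'tos', '1985'] reproduce the output below
m = '|'.join(['int', 'tos', '1985'])
# OR together one boolean Series per column
res = reduce(lambda a, b: a | b, [df[col].astype(str).str.contains(m) for col in df.columns])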

0    False
1     True
2     True
3     True
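The same all-column check can be sketched without functools.reduce: cast the whole frame to str, apply str.contains column by column, and OR the results across each row with any(axis=1). This assumes the same pattern m as above.

```python
import pandas as pd

df = pd.DataFrame({'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
                   'launched': [1983, 1984, 1984, 1984],
                   'discontinued': [1986, 1985, 1984, 1986]})

m = '|'.join(['int', 'tos', '1985'])
# DataFrame.apply passes each column (a Series) to the lambda by default
per_cell = df.astype(str).apply(lambda col: col.str.contains(m))
# A row matches if any of its cells matched
res = per_cell.any(axis=1)
print(res.tolist())  # [False, True, True, True]
```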

If you want it as a column with integer values, just do

df['new_col'] = res.astype(int)

     new_col
0    0
1    1
2    1
3    1

Answer 3

Answer by jpp

If I understand correctly, you wish to check the existence of strings across all columns in each row. This is not straightforward given you have mixed types (integers, strings). One way is to use pd.DataFrame.apply with a custom function.

The main point we need to remember is to convert your entire dataframe to type str, since you cannot test the existence of substrings within an integer.
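A quick plain-Python check of why the cast matters: the in operator cannot search inside an int and raises TypeError, while the same test against the string form works.

```python
# Substring membership needs a string on the right-hand side
try:
    '198' in 1984
    raised = False
except TypeError:
    raised = True  # an int is not a container, so 'in' cannot search it

print(raised)              # True
print('198' in str(1984))  # True
```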

match = ['int', 'tos', '1985']

def string_finder(row, words):
    if any(word in field for field in row for word in words):
        return True
    return False

df['isContained'] = df.astype(str).apply(string_finder, words=match, axis=1)

print(df)

            model  launched  discontinued  isContained
0            Lisa      1983          1986        False
1          Lisa 2      1984          1985         True
2  Macintosh 128K      1984          1984         True
3  Macintosh 512K      1984          1986         True
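The question asked for a 1/0 column named int rather than True/False. A small adaptation of the approach above, sketched under that assumption, converts the boolean result with astype(int):

```python
import pandas as pd

df = pd.DataFrame({'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
                   'launched': [1983, 1984, 1984, 1984],
                   'discontinued': [1986, 1985, 1984, 1986]})
match = ['int', 'tos', '1985']

def string_finder(row, words):
    # True if any word occurs in any field of the (already stringified) row
    return any(word in field for field in row for word in words)

# df.astype(str) is built before the assignment, so the new column is not searched
df['int'] = df.astype(str).apply(string_finder, words=match, axis=1).astype(int)
print(df['int'].tolist())  # [0, 1, 1, 1]
```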

Answer 4

Answer by Feras

@Guy_Fuqua, my understanding is that you want to ensure that all the words are contained in one row; am I right?

If so, a small modification to jpp's answer will achieve this. Note the AssessAllString function here:

match = ['int', 'tos', '1984']

def string_finder(row, words):
    if any(word in field for field in row for word in words):
        return True
    return False

def AssessAllString(row, words):
    # True only if every word appears in at least one field of the row
    b = True
    for x in words:
        b = b & string_finder(row, [x])
    return b

df['isContained'] = df.astype(str).apply(AssessAllString, words=match, axis=1)

print(df)

            model  launched  discontinued  isContained
0            Lisa      1983          1986        False
1          Lisa 2      1984          1985        False
2  Macintosh 128K      1984          1984         True
3  Macintosh 512K      1984          1986         True

Another example:

match = ['isa','1984']
df['isContained'] = df.astype(str).apply(AssessAllString, words=match, axis=1)

            model  launched  discontinued  isContained
0            Lisa      1983          1986        False
1          Lisa 2      1984          1985         True
2  Macintosh 128K      1984          1984        False
3  Macintosh 512K      1984          1986        False

I believe the code still needs optimization, but for now it should serve the purpose.
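One possible vectorized take on the "all words" check, sketched as a hypothetical helper contains_all (not part of the answer above): build one boolean Series per word ("does any cell in the row contain it?"), then require all of them. regex=False makes each word match literally.

```python
import pandas as pd

df = pd.DataFrame({'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
                   'launched': [1983, 1984, 1984, 1984],
                   'discontinued': [1986, 1985, 1984, 1986]})

def contains_all(df, words):
    """Hypothetical helper: True for rows where every word occurs in some cell."""
    as_str = df.astype(str)
    # For each word: a boolean Series, True where any cell of the row contains it
    per_word = [as_str.apply(lambda col: col.str.contains(w, regex=False)).any(axis=1)
                for w in words]
    # A row qualifies only if all per-word checks are True
    return pd.concat(per_word, axis=1).all(axis=1)

print(contains_all(df, ['int', 'tos', '1984']).tolist())  # [False, False, True, True]
print(contains_all(df, ['isa', '1984']).tolist())         # [False, True, False, False]
```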

Answer 5

Answer by harvpan

You need to check whether any of the strings in match is a substring of model (this checks only the model column).

match = [ 'int', 'tos', '198']
df['isContained'] = df['model'].apply(lambda x: 1 if any(s in x for s in match) else 0)

Output:

            model  launched  discontinued  isContained
0            Lisa      1983          1986            0
1          Lisa 2      1984          1985            0
2  Macintosh 128K      1984          1984            1
3  Macintosh 512K      1984          1986            1
