Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/14769162/

Time: 2020-08-18 12:23:13  Source: igfitidea

Find matching words in a list and a string

Tags: python, string, performance, list, comparison

Asked by clifgray

I am writing some code in Python and I want to check whether any word from a list appears in a long string. I know I could iterate through it multiple times, and that may amount to the same thing, but I wanted to see if there is a faster way to do it. What I am currently doing is this:

    all_text = 'some rather long string'
    if 'motorcycle' in all_text or 'bike' in all_text or 'cycle' in all_text or 'dirtbike' in all_text:
        print('found one of em')

But what I want to do is something like this:

    keyword_list = ['motorcycle', 'bike', 'cycle', 'dirtbike']
    if item in keyword_list in all_text:  # wished-for syntax, not working Python
        print('found one of em')

Is there any way to do this efficiently? I realize I could do:

    keyword_list = ['motorcycle', 'bike', 'cycle', 'dirtbike']
    for item in keyword_list:
        if item in all_text:
            print('found one of em')

But it seems like there would be a better way once the keyword list becomes long.

Accepted answer by Pavel Anossov

You still have to check them all at least until one is found to be in the text, but it can be more concise:

    keyword_list = ['motorcycle', 'bike', 'cycle', 'dirtbike']

    # any() stops at the first keyword found in all_text
    if any(word in all_text for word in keyword_list):
        print('found one of em')
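
As a minimal sketch of the same idea, the check can be wrapped in a small function. The names `contains_any` and `matching_keywords` are made up for illustration and are not part of the original answer. `any` stops at the first hit, while the list comprehension reports every keyword that occurs as a substring.

```python
def contains_any(text, keywords):
    """Return True as soon as one keyword occurs as a substring of text."""
    return any(word in text for word in keywords)


def matching_keywords(text, keywords):
    """Return every keyword that occurs as a substring of text."""
    return [word for word in keywords if word in text]


keyword_list = ['motorcycle', 'bike', 'cycle', 'dirtbike']
all_text = 'I ride my dirtbike on weekends'

print(contains_any(all_text, keyword_list))       # True
print(matching_keywords(all_text, keyword_list))  # ['bike', 'dirtbike']
```

Note that `'bike'` is reported as well, because `in` on strings is a substring test and `'bike'` is contained in `'dirtbike'`.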

Answer by Petar Ivanov

One way would be to build a prefix tree out of the keyword list. Then you can iterate through the long string character by character. At each iteration, you try to find in the prefix tree a prefix of the long string starting at the current position. This operation takes O(log k) time, where k is the size of the keyword list (assuming the prefix tree is balanced). If the long string has length n, then the overall complexity is just O(n log k), which is much better than the naive O(n k) if k is large.
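
The prefix-tree idea can be sketched roughly as follows. This is only an illustration under my own assumptions, not the answerer's code: the tree is stored as nested dicts, and `END`, `build_trie`, and `find_first` are hypothetical names. In this particular sketch, the lookup at each position walks at most as many characters as the longest keyword, regardless of how many keywords there are.

```python
END = ''  # marker key; it can never collide with a single-character key


def build_trie(keywords):
    """Build a prefix tree as nested dicts; END marks a complete keyword."""
    root = {}
    for word in keywords:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def find_first(text, trie):
    """Return (position, keyword) for the first keyword found, or None."""
    for start in range(len(text)):
        node = trie
        i = start
        # Follow the tree for as long as the text keeps matching a branch.
        while i < len(text) and text[i] in node:
            node = node[text[i]]
            i += 1
            if END in node:
                return start, node[END]
    return None


trie = build_trie(['motorcycle', 'bike', 'cycle', 'dirtbike'])
print(find_first('some rather long bicycle ride', trie))  # (19, 'cycle')
print(find_first('some rather long string', trie))        # None
```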

Answer by Rakesh

How about this:

    >>> keyword_list = ['motorcycle', 'bike', 'cycle', 'dirtbike', "long"]
    >>> all_text = 'some rather long string'
    >>> if set(keyword_list).intersection(all_text.split()):
    ...     print("Found One")
    ...
    Found One
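
One thing worth knowing about the set approach: it compares whole whitespace-separated tokens, so it behaves differently from the substring test with `in`. The sketch below illustrates that. The helper names `whole_word_matches` and `normalized_matches` are made up for this example, and the second helper assumes the keywords are already lowercase.

```python
import re


def whole_word_matches(text, keywords):
    """Return keywords that appear as whole whitespace-separated tokens."""
    return set(keywords).intersection(text.split())


def normalized_matches(text, keywords):
    """Same idea, but lowercase the text and split on non-word characters."""
    return set(keywords).intersection(re.findall(r'\w+', text.lower()))


keyword_list = ['motorcycle', 'bike', 'cycle', 'dirtbike']

print(whole_word_matches('I sold my bike', keyword_list))       # {'bike'}
print(whole_word_matches('I sold my dirtbikes', keyword_list))  # set()
print(whole_word_matches('Nice Bike, that!', keyword_list))     # set()
print(normalized_matches('Nice Bike, that!', keyword_list))     # {'bike'}
```

Whether whole-word matching is what you want depends on the use case; the original question uses substring matching.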

Answer by Luka Styles

You need to make all_text a variable or it won't work:

    keyword_list = ['motorcycle', 'bike', 'cycle', 'dirtbike']
    all_text = input("what kind of bike do you like?")
    for item in keyword_list:
        if item in all_text:
            print('found one of em')

Answer by Shawn

Using a regular expression is probably the fastest way.

    import re
    re.findall(r'motorcycle|bike|cycle|dirtbike', text)

This returns all matches of the selected words.
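
If the keywords live in a list, the pattern can be built from it rather than typed by hand. The following is a sketch under my own assumptions, not part of the original answer: `contains_keyword` is a hypothetical helper, `re.escape` guards against keywords that contain regex metacharacters, and sorting longest first only matters when one keyword is a prefix of another.

```python
import re

keyword_list = ['motorcycle', 'bike', 'cycle', 'dirtbike']

# Join the escaped keywords into one alternation, longest keywords first.
ordered = sorted(keyword_list, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(word) for word in ordered))


def contains_keyword(text):
    """Return True if any keyword occurs in text; stops at the first match."""
    return pattern.search(text) is not None


print(contains_keyword('some rather long string'))     # False
print(pattern.findall('a dirtbike and a motorcycle'))  # ['dirtbike', 'motorcycle']
```

`findall` returns non-overlapping matches, so the `'bike'` inside `'dirtbike'` is not listed separately. For a plain yes/no check, `search` is enough and avoids collecting every match.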
