Python Fastest way to check if a string contains specific characters in any of the items in a list

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/14411633/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-18 11:21:26  来源:igfitidea点击:

Fastest way to check if a string contains specific characters in any of the items in a list

pythonperformancelistiteration

提问by Muhammad Taha

What is the fastest way to check if a string contains some characters from any items of a list?

What is the fastest way to check if a string contains some characters from any items of a list?

Currently, I'm using this method:

Currently, I'm using this method:

lestring = "Text123"

lelist = ["Text", "foo", "bar"]

for x in lelist:
    if lestring.count(x):
        print 'Yep. "%s" contains characters from "%s" item.' % (lestring, x)

Is there any way to do it without iteration (which will make it faster I suppose.)?

Is there any way to do it without iteration (which will make it faster I suppose.)?

采纳答案by Abhijit

You can try list comprehension with membership check

You can try list comprehension with membership check

>>> lestring = "Text123"
>>> lelist = ["Text", "foo", "bar"]
>>> [e for e in lelist if e in lestring]
['Text']

Compared to your implementation, though LC has an implicit loop but its faster as there is no explicit function call as in your case with count

Compared to your implementation, though LC has an implicit loop but its faster as there is no explicit function call as in your case with count

Compared to Joe's implementation, yours is way faster, as the filter function would require to call two functions in a loop, lambdaand count

Compared to Joe's implementation, yours is way faster, as the filter function would require to call two functions in a loop, lambdaand count

>>> def joe(lelist, lestring):
    return ''.join(random.sample(x + 'b'*len(x), len(x)))

>>> def uz(lelist, lestring):
    for x in lelist:
        if lestring.count(x):
            return 'Yep. "%s" contains characters from "%s" item.' % (lestring, x)


>>> def ab(lelist, lestring):
    return [e for e in lelist if e in lestring]

>>> t_ab = timeit.Timer("ab(lelist, lestring)", setup="from __main__ import lelist, lestring, ab")
>>> t_uz = timeit.Timer("uz(lelist, lestring)", setup="from __main__ import lelist, lestring, uz")
>>> t_joe = timeit.Timer("joe(lelist, lestring)", setup="from __main__ import lelist, lestring, joe")
>>> t_ab.timeit(100000)
0.09391469893125759
>>> t_uz.timeit(100000)
0.1528471407273173
>>> t_joe.timeit(100000)
1.4272649857800843


Jamie's commented solution is slower for shorter string's. Here is the test result

Jamie's commented solution is slower for shorter string's. Here is the test result

>>> def jamie(lelist, lestring):
    return next(itertools.chain((e for e in lelist if e in lestring), (None,))) is not None

>>> t_jamie = timeit.Timer("jamie(lelist, lestring)", setup="from __main__ import lelist, lestring, jamie")
>>> t_jamie.timeit(100000)
0.22237164127909637


If you need Boolean values, for shorter strings, just modify the above LC expression

If you need Boolean values, for shorter strings, just modify the above LC expression

[e in lestring for e in lelist if e in lestring]

Or for longer strings, you can do the following

Or for longer strings, you can do the following

>>> next(e in lestring for e in lelist if e in lestring)
True

or

or

>>> any(e in lestring for e in lelist)

回答by Joe

filter(lambda x: lestring.count(x), lelist)

That will return all the strings that you're trying to find as a list.

That will return all the strings that you're trying to find as a list.

回答by theodox

if the test is to see if there are any characters in common (not words or segments), create a set out of the letters in the list and then check the letters agains the string:

if the test is to see if there are any characters in common (not words or segments), create a set out of the letters in the list and then check the letters agains the string:

char_list = set(''.join(list_of_words))
test_set = set(string_to_teat)
common_chars = char_list.intersection(test_set)

However I'm assuming you're looking for as little as one character in common...

However I'm assuming you're looking for as little as one character in common...

回答by moomima

The esmrelibrary does the trick. In your case, the simpler, esm(part of esmre) is what you want.

The esmrelibrary does the trick. In your case, the simpler, esm(part of esmre) is what you want.

https://pypi.python.org/pypi/esmre/

https://pypi.python.org/pypi/esmre/

https://code.google.com/p/esmre/

https://code.google.com/p/esmre/

They have good documentation and examples: Taken from their examples:

They have good documentation and examples: Taken from their examples:

>>> import esm
>>> index = esm.Index()
>>> index.enter("he")
>>> index.enter("she")
>>> index.enter("his")
>>> index.enter("hers")
>>> index.fix()
>>> index.query("this here is history")
[((1, 4), 'his'), ((5, 7), 'he'), ((13, 16), 'his')]
>>> index.query("Those are his sheep!")
[((10, 13), 'his'), ((14, 17), 'she'), ((15, 17), 'he')]
>>> 

I ran some performance tests:

I ran some performance tests:

import random, timeit, string, esm

def uz(lelist, lestring):
    for x in lelist:
        if lestring.count(x):
            return 'Yep. "%s" contains characters from "%s" item.' % (lestring, x)



def ab(lelist, lestring):
    return [e for e in lelist if e in lestring]


def use_esm(index, lestring):
    return index.query(lestring)

for TEXT_LEN in [5, 50, 1000]:
    for SEARCH_LEN in [5, 20]:
        for N in [5, 50, 1000, 10000]:
            if TEXT_LEN < SEARCH_LEN:
                continue

            print 'TEXT_LEN:', TEXT_LEN, 'SEARCH_LEN:', SEARCH_LEN, 'N:', N

            lestring = ''.join((random.choice(string.ascii_uppercase + string.digits) for _ in range(TEXT_LEN)))
            lelist = [''.join((random.choice(string.ascii_uppercase + string.digits) for _ in range(SEARCH_LEN))) for _
                      in range(N)]

            index = esm.Index()
            for i in lelist:
                index.enter(i)
            index.fix()

            t_ab = timeit.Timer("ab(lelist, lestring)", setup="from __main__ import lelist, lestring, ab")
            t_uz = timeit.Timer("uz(lelist, lestring)", setup="from __main__ import lelist, lestring, uz")
            t_esm = timeit.Timer("use_esm(index, lestring)", setup="from __main__ import index, lestring, use_esm")

            ab_time = t_ab.timeit(1000)
            uz_time = t_uz.timeit(1000)
            esm_time = t_esm.timeit(1000)

            min_time = min(ab_time, uz_time, esm_time)
            print '  ab%s: %f' % ('*' if ab_time == min_time else '', ab_time)
            print '  uz%s: %f' % ('*' if uz_time == min_time else '', uz_time)
            print '  esm%s %f:' % ('*' if esm_time == min_time else '', esm_time)

And got that results depends mostly on the number of items that one is looking for (in my case, 'N'):

And got that results depends mostly on the number of items that one is looking for (in my case, 'N'):

TEXT_LEN: 1000 SEARCH_LEN: 20 N: 5
  ab*: 0.001733
  uz: 0.002512
  esm 0.126853:

TEXT_LEN: 1000 SEARCH_LEN: 20 N: 50
  ab*: 0.017564
  uz: 0.023701
  esm 0.079925:

TEXT_LEN: 1000 SEARCH_LEN: 20 N: 1000
  ab: 0.370371
  uz: 0.489523
  esm* 0.133783:

TEXT_LEN: 1000 SEARCH_LEN: 20 N: 10000
  ab: 3.678790
  uz: 4.883575
  esm* 0.259605: