Efficiently check whether a string consists of one character in Python

Question

Asked by

What is an efficient way to check that a string s in Python consists of just one character, say 'A'? Something like all_equal(s, 'A'), which would behave like this:

all_equal("AAAAA", "A") = True

all_equal("AAAAAAAAAAA", "A") = True

all_equal("AAAAAfAAAAA", "A") = False

Two seemingly inefficient ways would be: first, convert the string to a list and check each element; or second, use a regular expression. Are there more efficient ways, or are these the best one can do in Python? Thanks.

Answer 1

Accepted answer by Ellioh

This is by far the fastest approach, several times faster than even count(); just time it with mgilson's excellent timing suite:

s == len(s) * s[0]

Here all the checking is done inside Python's C code, which just:

allocates len(s) characters;
fills the space with the first character;
compares two strings.

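Applied to the question's all_equal(s, 'A') signature, the same trick can compare against the requested character instead of s[0]. This is a minimal sketch: the name all_equal comes from the question, and it assumes ch is a single character. With this form an empty string yields True, because '' == 'A' * 0.

```python
def all_equal(s, ch):
    # ch * len(s) builds a string of len(s) copies of ch; == then compares
    # the two strings. Both steps run in C, with no per-character Python code.
    return s == ch * len(s)


print(all_equal("AAAAA", "A"))        # True
print(all_equal("AAAAAAAAAAA", "A"))  # True
print(all_equal("AAAAAfAAAAA", "A"))  # False
print(all_equal("", "A"))             # True: '' == 'A' * 0
```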

The longer the string, the greater the time savings. However, as mgilson notes, it creates a copy of the string, so if your string is many millions of characters long, that may become a problem.

As the timing results show, the fastest ways to solve the task generally do not execute any Python code per character. The set() solution also does all of its work inside the C code of the Python library, yet it is still slow, probably because it handles the string through the generic Python object interface.

UPD: Concerning the empty string case: what to do with it strongly depends on the task. If the task is "check if all the symbols in a string are the same", s == len(s) * s[0] is a valid answer (no symbols means an error, and an exception is OK). If the task is "check if there is exactly one unique symbol", an empty string should give False, and the answer is s and s == len(s) * s[0], or bool(s) and s == len(s) * s[0] if you prefer receiving boolean values. Finally, if we understand the task as "check if there are no different symbols", the result for an empty string is True, and the answer is not s or s == len(s) * s[0].

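The three readings of the empty-string case can be written side by side. This is a sketch; the function names are hypothetical labels for the three interpretations, and only the expressions themselves come from the answer.

```python
def all_symbols_same(s):
    # "All the symbols are the same": '' has no s[0], so this raises IndexError.
    return s == len(s) * s[0]


def exactly_one_unique_symbol(s):
    # "Exactly one unique symbol": '' -> False.
    return bool(s) and s == len(s) * s[0]


def no_different_symbols(s):
    # "No different symbols": '' -> True.
    return not s or s == len(s) * s[0]


for check in (exactly_one_unique_symbol, no_different_symbols):
    print(check.__name__, check(""), check("AAA"), check("AAf"))
# exactly_one_unique_symbol False True False
# no_different_symbols True True False

try:
    all_symbols_same("")
except IndexError:
    print("all_symbols_same raises IndexError on ''")
```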

Answer 2

Answer by Mark Byers

Try using the built-in function all:

all(c == 'A' for c in s)

Answer 3

Answer by Daniel Roseman

You could convert to a set and check that there is only one member:

len(set("AAAAAAAA")) == 1

Answer 4

Answer by Abhijit

If you need to check that all the characters in the string are the same and equal to a given character, remove all duplicates (by converting to a set) and check whether the result equals the set of that single character.

>>> set("AAAAA") == set("A")
True

If you only want to know whether all the characters are the same (whatever that character is), just check the length of the set:

>>> len(set("AAAAA")) == 1
True
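
The two set-based checks promise different things: comparing against {ch} pins down which character repeats, while the length check accepts any single repeated character. A small sketch (helper names are hypothetical) makes the difference visible, including the empty-string case, where both return False:

```python
def all_are(s, ch):
    # True only if s is non-empty and every character is ch.
    return set(s) == {ch}


def all_same(s):
    # True if s is non-empty and has exactly one distinct character.
    return len(set(s)) == 1


for sample in ("AAAAA", "BBBBB", "AAfAA", ""):
    print(repr(sample), all_are(sample, "A"), all_same(sample))
# 'AAAAA' True True
# 'BBBBB' False True
# 'AAfAA' False False
# '' False False
```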

Answer 5

Answer by mgilson

>>> s = 'AAAAAAAAAAAAAAAAAAA'
>>> s.count(s[0]) == len(s)
True
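
Adapted to the question's signature, the count approach can compare against the requested character rather than s[0], which also avoids an IndexError on an empty string. A sketch, assuming ch is a single character (all_equal_count is a hypothetical name):

```python
def all_equal_count(s, ch):
    # str.count scans the entire string in C without building a copy;
    # every character is ch exactly when the count equals the length.
    return s.count(ch) == len(s)


print(all_equal_count("AAAAA", "A"))        # True
print(all_equal_count("AAAAAfAAAAA", "A"))  # False
print(all_equal_count("", "A"))             # True (0 == 0)
```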

This doesn't short-circuit. A version that does short-circuit would be:

>>> all(x == s[0] for x in s)
True
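
To see the short-circuiting concretely, the sketch below counts how many characters all() actually pulls from the generator before returning. all_equal_counting is a hypothetical helper written only for this demonstration.

```python
def all_equal_counting(s):
    """Return (result, examined): the result of all(x == s[0] for x in s)
    and how many characters all() consumed before returning."""
    examined = 0

    def comparisons():
        nonlocal examined
        for x in s:
            examined += 1
            yield x == s[0]

    # all() stops pulling from the generator at the first False.
    result = all(comparisons())
    return result, examined


print(all_equal_counting("A" * 17))        # (True, 17)
print(all_equal_counting("F" + "A" * 16))  # (False, 2)
```

The mismatch in "F" + "A" * 16 is found at the second character, so the remaining 15 are never examined.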

However, I have a feeling that, due to the optimized C implementation, the non-short-circuiting version will probably perform better on some strings (depending on size, etc.).

Here's a simple timeit script to test some of the other options posted:

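# Python 3 port of the original Python 2.7 script (the timings below were
# measured with Python 2.7.3). Every test assumes a non-empty string.
import timeit
import re

def test_regex(s, regex=re.compile(r'(.)\1*', re.DOTALL)):
    # (.) captures the first character and \1* requires the rest to repeat it;
    # fullmatch anchors the pattern to the whole string.
    return bool(regex.fullmatch(s))

def test_all(s):
    return all(x == s[0] for x in s)

def test_count(s):
    return s.count(s[0]) == len(s)

def test_set(s):
    return len(set(s)) == 1

def test_replace(s):
    return not s.replace(s[0], '')

def test_translate(s):
    # Python 3's str.translate takes a mapping; mapping to None deletes.
    return not s.translate({ord(s[0]): None})

def test_strmul(s):
    return s == s[0] * len(s)

tests = ('test_all', 'test_count', 'test_set', 'test_replace',
         'test_translate', 'test_strmul', 'test_regex')

print("WITH ALL EQUAL")
for test in tests:
    print(test, timeit.timeit('%s(s)' % test,
                              'from __main__ import %s; s="AAAAAAAAAAAAAAAAA"' % test))
    # Sanity check: every approach must report True for this string.
    if globals()[test]("AAAAAAAAAAAAAAAAA") is not True:
        print(globals()[test]("AAAAAAAAAAAAAAAAA"))
        raise AssertionError

print()
print("WITH FIRST NON-EQUAL")
for test in tests:
    print(test, timeit.timeit('%s(s)' % test,
                              'from __main__ import %s; s="FAAAAAAAAAAAAAAAA"' % test))
    # Sanity check: every approach must report False for this string.
    if globals()[test]("FAAAAAAAAAAAAAAAA") is not False:
        print(globals()[test]("FAAAAAAAAAAAAAAAA"))
        raise AssertionError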
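
The question also mentions regular expressions. A regex can check against a specific character, as all_equal(s, 'A') requires, rather than against the first one. This is a sketch assuming Python 3.4+ for re.fullmatch; the helper name all_equal_regex is hypothetical.

```python
import re


def all_equal_regex(s, ch):
    # re.escape keeps characters such as '.' literal; (?:...)* allows zero or
    # more repetitions, and fullmatch requires the match to span all of s.
    return re.fullmatch('(?:' + re.escape(ch) + ')*', s) is not None


print(all_equal_regex("AAAAA", "A"))        # True
print(all_equal_regex("AAAAAfAAAAA", "A"))  # False
print(all_equal_regex("....", "."))         # True
print(all_equal_regex("", "A"))             # True
```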

On my machine (OS X 10.5.8, Core 2 Duo, Python 2.7.3) with these contrived (short) strings, str.count smokes set and all, and beats str.replace by a little, but is edged out by str.translate; strmul is currently in the lead by a good margin:

WITH ALL EQUAL
test_all 5.83863711357
test_count 0.947771072388
test_set 2.01028490067
test_replace 1.24682998657
test_translate 0.941282987595
test_strmul 0.629556179047
test_regex 2.52913498878

WITH FIRST NON-EQUAL
test_all 2.41147494316
test_count 0.942595005035
test_set 2.00480484962
test_replace 0.960338115692
test_translate 0.924381017685
test_strmul 0.622269153595
test_regex 1.36632800102

The timings could be slightly (or even significantly?) different between different systems and with different strings, so that would be worth looking into with an actual string you're planning on passing.

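Since the ranking depends on your data, a minimal Python 3 sketch for timing a few of the candidates on your own input could look like the following. The value of sample is a hypothetical placeholder, and the number and repeat counts are arbitrary choices; no particular results are implied.

```python
import timeit

# Hypothetical input: replace with a string representative of your data.
sample = "A" * 10_000 + "f"

candidates = {
    "strmul": lambda: sample == sample[0] * len(sample),
    "count": lambda: sample.count(sample[0]) == len(sample),
    "all": lambda: all(x == sample[0] for x in sample),
}

for name, fn in candidates.items():
    # Best of 3 repeats of 100 calls each; the minimum is usually the
    # least noisy estimate.
    best = min(timeit.repeat(fn, number=100, repeat=3))
    print(f"{name:8s} {best:.4f} s per 100 calls")
```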

Eventually, if you hit the best case for all often enough, and your strings are long enough, you might want to consider that one. It's a better algorithm... I would avoid the set solution, though, as I don't see any case where it could possibly beat the count solution.

If memory could be an issue, you'll need to avoid str.translate, str.replace and strmul, as those create a second string, but this isn't usually a concern these days.

Answer 6

Answer by Master_Yoda

Interesting answers so far. Here's another:

flag = True
for c in 'AAAAAAAfAAAA':
    if c != 'A':
        flag = False
        break

The only advantage I can think of to mine is that it doesn't need to traverse the entire string if it finds an inconsistent character.

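Wrapped in a function, the same early-exit loop no longer needs a flag variable: it can return as soon as it sees a mismatch. A sketch (all_equal_loop is a hypothetical name):

```python
def all_equal_loop(s, ch):
    # Stop at the first character that differs from ch;
    # an empty string falls through and yields True.
    for c in s:
        if c != ch:
            return False
    return True


print(all_equal_loop("AAAAAAAfAAAA", "A"))  # False
print(all_equal_loop("AAAAAAAAAAAA", "A"))  # True
```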

Answer 7

Answer by Ellioh

not len("AAAAAAAAA".replace('A', ''))

Answer 8

Answer by Abhijit

Adding another solution to this problem:

>>> not "AAAAAA".translate(None,"A")
True
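
str.translate(None, chars) is Python 2 syntax; in Python 3, str.translate takes a single mapping argument, and str.maketrans('', '', chars) builds one that deletes those characters. A sketch of the same check for Python 3 (the helper name is hypothetical):

```python
def all_equal_translate(s, ch):
    # Map every character in ch to None (delete it); if nothing is left,
    # s contained only ch. Like replace(), this builds a second string.
    return not s.translate(str.maketrans('', '', ch))


print(all_equal_translate("AAAAAA", "A"))  # True
print(all_equal_translate("AAfAAA", "A"))  # False
```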