Python: Count consecutive characters

Disclaimer: this page is based on a popular StackOverflow question and its answers, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/34443946/

Time: 2020-08-19 14:58:57  Source: igfitidea

Count consecutive characters

python string count

Asked by vashts85

How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?

At first, I thought I could do something like:

word = '1000'

counter=0
print range(len(word))


for i in range(len(word)-1):
    while word[i]==word[i+1]:
        counter +=1
        print counter*"0"
    else:
        counter=1
        print counter*"1"

That way I could see the number of times each unique digit repeats. But this, of course, falls out of range when i reaches the last value.

In the example above, I would want Python to tell me that 1 repeats once and 0 repeats 3 times. The code above fails, however, because of my while statement.

I know you can do this with just built-in functions and would prefer a solution that way.

Accepted answer by B. M.

A solution "that way", with only basic statements:

word = "100011010"  # or: word = "1"
count = 1
length = ""
if len(word) > 1:
    for i in range(1, len(word)):
        if word[i-1] == word[i]:
            count += 1
        else:
            # the run ended: record the previous character and its run length
            length += word[i-1] + " repeats " + str(count) + ", "
            count = 1
    # record the final run
    length += "and " + word[i] + " repeats " + str(count)
else:
    # a single character is one run of length 1
    length += word[0] + " repeats " + str(count)
print(length)

Output:

1 repeats 1, 0 repeats 3, 1 repeats 2, 0 repeats 1, 1 repeats 1, and 0 repeats 1
# with word = "1": 1 repeats 1

Answer by code_dredd

Totals (without sub-groupings)

#!/usr/bin/python3 -B

charseq = 'abbcccdddd'
distros = { c:1 for c in charseq  }

for c in range(len(charseq)-1):
    if charseq[c] == charseq[c+1]:
        distros[charseq[c]] += 1

print(distros)

I'll provide a brief explanation for the interesting lines.

distros = { c:1 for c in charseq  }

The line above is a dictionary comprehension: it iterates over the characters in charseq and creates a key/value pair for each one, where the key is the character and the value, initialized to 1, is the number of times it has been encountered so far.

Then comes the loop:

for c in range(len(charseq)-1):

We iterate c from 0 up to, but not including, len(charseq) - 1, so the c+1 indexing in the loop's body never goes out of bounds.

if charseq[c] == charseq[c+1]:
    distros[charseq[c]] += 1

At this point, any match we find is between two consecutive characters, so we simply add 1 to that character's count. For example, if we take a snapshot of one iteration, the code could look like this (using direct values instead of variables, for illustrative purposes):

# replacing variables with their values
if charseq[1] == charseq[1+1]:
    distros[charseq[1]] += 1

# this is a snapshot of a single comparison here and what happens later
if 'b' == 'b':
    distros['b'] += 1

You can see the program output below with the correct counts:

➜  /tmp  ./counter.py
{'a': 1, 'b': 2, 'c': 3, 'd': 4}

Answer by Reut Sharabani

You only need to change len(word) to len(word) - 1. That said, you could also use the fact that False has the value 0 and True has the value 1 with sum:

sum(word[i] == word[i+1] for i in range(len(word)-1))

For word = '1000', this sums (False, True, True), where False is 0 and True is 1, giving 2, which is what you're after.

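For an empty word, len(word) - 1 is -1, and range(-1) is simply empty, so the expression above already returns 0 without any index -1 access. If you prefer to make the lower bound explicit, you can guard it with max: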

sum(word[i] == word[i+1] for i in range(max(0, len(word)-1)))

And this can be improved with zip:

sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))
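The three expressions above can be compared side by side. This is a sketch: the function names repeats_index, repeats_guarded and repeats_zip are hypothetical wrappers added only for illustration. Note that all three count adjacent equal pairs (for '1000', the two 0-0 pairs), not per-digit run lengths.

```python
def repeats_index(word):
    # count positions i where word[i] equals the next character
    return sum(word[i] == word[i + 1] for i in range(len(word) - 1))

def repeats_guarded(word):
    # same as above, with the range's lower bound made explicit
    return sum(word[i] == word[i + 1] for i in range(max(0, len(word) - 1)))

def repeats_zip(word):
    # pair each character with its successor; no indexing needed
    return sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))

for w in ["1000", "", "1", "100011010"]:
    print(repr(w), repeats_index(w), repeats_guarded(w), repeats_zip(w))
```

All three agree on every input here, including the empty string, where each returns 0.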

Answer by Adam Smith

Consecutive counts:

Ooh, nobody's posted itertools.groupby yet!

s = "111000222334455555"

from itertools import groupby

groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]

After which, result looks like:

[('1', 3), ('0', 3), ('2', 3), ('3', 2), ('4', 2), ('5', 5)]

And you could format with something like:

", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"

Total counts:

Someone in the comments is concerned that you want a total count of each number, so "11100111" -> {"1":6, "0":2}. In that case, use a collections.Counter:

from collections import Counter

s = "11100111"
result = Counter(s)
# Counter({'1': 6, '0': 2})
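To make the difference between total counts and consecutive counts concrete, here is a small sketch (not part of the original answer) that runs both Counter and groupby on the same string. It also shows that summing the run lengths per character recovers the totals.

```python
from collections import Counter
from itertools import groupby

s = "11100111"

# total occurrences of each character, regardless of position
totals = Counter(s)

# consecutive runs: the two blocks of '1' stay separate
runs = [(label, sum(1 for _ in group)) for label, group in groupby(s)]

# adding up run lengths per character gives back the totals
summed = Counter()
for label, length in runs:
    summed[label] += length

print(totals)            # Counter({'1': 6, '0': 2})
print(runs)              # [('1', 3), ('0', 2), ('1', 3)]
print(summed == totals)  # True
```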

Your method:

As many have pointed out, your method fails because you're looping through range(len(s)) but addressing s[i+1]. This is an off-by-one error: when i points at the last index of s, i+1 is past the end and raises an IndexError. One way to fix this would be to loop through range(len(s)-1), but it's more Pythonic to generate something to iterate over.

For a string that isn't absolutely huge, zip(s, s[1:]) isn't a performance issue, so you could do:

counts = []
count = 1
for a, b in zip(s, s[1:]):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1

The only problem is that the loop never records the final run, so you'd have to special-case the last character. That can be fixed with itertools.zip_longest:

import itertools

counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
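The zip_longest loop above can be wrapped in a function and checked on a few edge cases. This is a sketch; run_lengths is a hypothetical name, and it assumes s is a string (so None never appears as a real character).

```python
import itertools

def run_lengths(s):
    """Return (character, run length) pairs for consecutive runs in s."""
    counts = []
    count = 1
    # the last character is paired with None, so its run is always recorded
    for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
        if a == b:
            count += 1
        else:
            counts.append((a, count))
            count = 1
    return counts

print(run_lengths("111000222334455555"))
```

An empty string produces no pairs at all, so run_lengths("") returns an empty list.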

If you do have a truly huge string and can't stand to hold two of them in memory at a time, you can use the itertools recipe pairwise (adapted here to use zip_longest instead of zip).

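import itertools

def pairwise(iterable):
    """Iterate pairwise without holding an extra copy of iterable in memory.

    zip_longest pairs the last element with None, so the final run is kept.
    """
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

counts = []
count = 1
for a, b in pairwise(s):
    if a == b:
        count += 1
    else:
        counts.append((a, count))
        count = 1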

Answer by Bakijan Rahman

This is my simple code for finding the maximum number of consecutive 1s in a binary string in Python 3:

count = 0
maxcount = 0
for bit in bin(13)[2:]:  # bin(13) is '0b1101'; drop the '0b' prefix
    if bit == '1':
        count += 1
    else:
        maxcount = max(maxcount, count)
        count = 0
maxcount = max(maxcount, count)  # a run may end at the last bit
print(maxcount)  # 2
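The same idea generalizes to the longest run of any character in any string. The sketch below swaps the manual counter for itertools.groupby; longest_run is a hypothetical helper, not from the answer.

```python
from itertools import groupby

def longest_run(s, target):
    """Length of the longest block of consecutive `target` characters in s (0 if none)."""
    # groupby splits s into maximal runs; keep only the runs of `target`
    return max((sum(1 for _ in grp) for ch, grp in groupby(s) if ch == target), default=0)

print(longest_run(bin(13)[2:], "1"))  # 2
```

The default=0 argument to max covers strings that contain no occurrence of target at all.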

Answer by NIKHIL SHARMA

Unique method: if you are only looking to count consecutive 1s, you can use bit magic. The idea is based on the concept that if we AND a bit sequence with a left-shifted version of itself, we effectively remove the trailing 1 from every sequence of consecutive 1s.

  11101111   (x)
& 11011110   (x << 1)
----------
  11001110   (x & (x << 1)) 
    ^    ^
    |    |

trailing 1 removed

So the operation x = (x & (x << 1)) reduces the length of every sequence of 1s in the binary representation of x by one. If we keep doing this operation in a loop, we end up with x = 0, and the number of iterations required to reach 0 is the length of the longest consecutive sequence of 1s.
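To see the bit trick described above in action, the sketch below prints x in binary after each step for the example value 0b11101111. The helper trace_longest_ones is hypothetical, and it assumes x is a non-negative integer (for negative x the loop would never reach 0).

```python
def trace_longest_ones(x):
    """Apply x = x & (x << 1) until x is 0, printing each value; return the step count."""
    steps = 0
    while x != 0:
        print(format(x, "b"))
        x = x & (x << 1)  # shortens every run of 1s by one bit
        steps += 1
    return steps

print(trace_longest_ones(0b11101111))
# 11101111
# 11001110
# 10001100
# 1000
# 4
```

The runs of length 3 and 4 shrink together; the longer one survives one step longer, so the step count is 4.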

Answer by Alpha

If we want to count consecutive characters without looping, we can make use of pandas:

In [1]: import pandas as pd

In [2]: sample = 'abbcccddddaaaaffaaa'
In [3]: d = pd.Series(list(sample))

In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
Out[4]: [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 4), ('f', 2), ('a', 3)]

The key is to find the first elements that differ from their previous values and then group accordingly in pandas:

In [5]: sample = 'abba'
In [6]: d = pd.Series(list(sample))

In [7]: d.ne(d.shift())
Out[7]:
0     True
1     True
2    False
3     True
dtype: bool

In [8]: d.ne(d.shift()).cumsum()
Out[8]:
0    1
1    2
2    2
3    3
dtype: int32
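The same "flag the changes, then take a running sum" idea can be reproduced without pandas. This is a pure-Python analogue, not the answer's code: a list comprehension stands in for d.ne(d.shift()), itertools.accumulate stands in for .cumsum(), and itertools.groupby over (run id, character) pairs stands in for the pandas groupby.

```python
from itertools import accumulate, groupby

sample = "abbcccddddaaaaffaaa"

# True where a character differs from the one before it (the first is always a change)
changes = [i == 0 or sample[i] != sample[i - 1] for i in range(len(sample))]

# running total of the change flags gives every run its own id
run_ids = list(accumulate(int(c) for c in changes))

# group consecutive equal (run id, character) pairs and count each group
result = [(key[1], sum(1 for _ in grp)) for key, grp in groupby(zip(run_ids, sample))]
print(result)
# [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 4), ('f', 2), ('a', 3)]
```

For 'abba' the same steps give changes [True, True, False, True] and run ids [1, 2, 2, 3], matching the pandas output above.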

Answer by ShpielMeister

There is no need to count or groupby. Just note the indices where a change occurs and subtract consecutive indices.

w = "111000222334455555"
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]

print(dw)  # digits
['1', '0', '2', '3', '4', '5']
print(cw)  # counts
[3, 3, 3, 2, 2, 5]

w = 'XXYXYYYXYXXzzzzzYYY'
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw)  # characters
print(cw)  # counts

['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
[2, 1, 1, 3, 1, 1, 2, 5, 3]
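The three lines above can be packaged into one function. In this sketch (the name runs_from_change_points is hypothetical), each run's character is read at its start index and the lengths come from differences of consecutive boundaries. It also guards the empty string, where the original w[-1] would raise an IndexError.

```python
def runs_from_change_points(w):
    """Return (characters, counts) for the consecutive runs in w."""
    if not w:
        return [], []
    # start index of every run, plus len(w) as the final boundary
    iw = [0] + [i + 1 for i in range(len(w) - 1) if w[i] != w[i + 1]] + [len(w)]
    # the character of each run is the one at its start index
    dw = [w[i] for i in iw[:-1]]
    # run lengths are the gaps between consecutive boundaries
    cw = [b - a for a, b in zip(iw, iw[1:])]
    return dw, cw

print(runs_from_change_points("111000222334455555"))
```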

Answer by ARUN BALAJI

This is my simple and efficient code for finding the maximum number of consecutive 1s in the binary representation of a number in Python:

def consec(x):
    count=0
    while x!=0:
        x= x & (x<<1)
        count+=1
    return count

n = int(input())
print(consec(n))

Using Bit Magic: The idea is based on the concept that if we AND a bit sequence with a shifted version of itself, we're effectively removing the trailing 1 from every sequence of consecutive 1s.

  11101111   (x)
& 11011110   (x << 1)
----------
  11001110   (x & (x << 1)) 
    ^    ^
    |    |

trailing 1 removed

So the operation x = (x & (x << 1)) reduces the length of every sequence of 1s in the binary representation of x by one. If we keep doing this operation in a loop, we end up with x = 0, and the number of iterations required to reach 0 is the length of the longest consecutive sequence of 1s.
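One way to gain confidence in the bit trick is to compare consec against a plain string-based count for many inputs. This sketch restates consec from the answer and adds a hypothetical reference function longest_ones_str; both assume non-negative integers, since bin() of a negative number starts with '-0b' and the bit loop would not terminate.

```python
def consec(x):
    # the answer's bit trick: each iteration shortens every run of 1s by one
    count = 0
    while x != 0:
        x = x & (x << 1)
        count += 1
    return count

def longest_ones_str(x):
    # string-based reference: the longest block of '1's in the binary form
    return max(len(run) for run in bin(x)[2:].split("0"))

# the two methods agree for every integer checked here
assert all(consec(n) == longest_ones_str(n) for n in range(4096))
print("consec agrees with the string method for 0..4095")
```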
