python - Most pythonic way to count matching elements in something iterable

Question

Asked by Henrik Gustafsson

I have an iterable of entries on which I would like to gather some simple statistics, say the count of all numbers divisible by two and the count of all numbers divisible by three.

My first alternative, while iterating through the list only once and avoiding the list expansion (and keeping the split loop refactoring in mind), looks rather bloated:

(alt 1)

r = xrange(1, 10)

twos = 0
threes = 0

for v in r:
  if v % 2 == 0:
    twos+=1
  if v % 3 == 0:
    threes+=1

print twos
print threes

This looks rather nice, but has the drawback of expanding the expression to a list:

(alt 2)

r = xrange(1, 10)

print len([1 for v in r if v % 2 == 0])
print len([1 for v in r if v % 3 == 0])

What I would really like is a function like this:

(alt 3)

def count(iterable):
  n = 0
  for i in iterable:
    n += 1
  return n

r = xrange(1, 10)

print count(1 for v in r if v % 2 == 0)
print count(1 for v in r if v % 3 == 0)

But this looks a lot like something that could be done without a function. The final variant is this:

(alt 4)

r = xrange(1, 10)

print sum(1 for v in r if v % 2 == 0)
print sum(1 for v in r if v % 3 == 0)

And while it is the smallest (and in my book probably the most elegant), it doesn't feel like it expresses the intent very well.

So, my question to you is:

Which alternative do you like best to gather these types of stats? Feel free to supply your own alternative if you have something better.

To clear up some confusion below:

In reality my filter predicates are more complex than just this simple test.
The objects I iterate over are larger and more complex than just numbers.
My filter functions differ more and are hard to parameterize into one predicate.

Answer 1

Accepted answer by Anders Waldenborg

Having to iterate over the list multiple times isn't elegant IMHO.

I'd probably create a function that allows doing:

twos, threes = countmatching(xrange(1,10),
                             lambda a: a % 2 == 0,
                             lambda a: a % 3 == 0)

A starting point would be something like this:

def countmatching(iterable, *predicates):
    # One counter per predicate, filled in a single pass over the iterable
    v = [0] * len(predicates)
    for e in iterable:
        for i,p in enumerate(predicates):
            if p(e):
                v[i] += 1
    return tuple(v)
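A Python 3 sketch of the same single-pass function (xrange became range in Python 3), run on a generator that can only be consumed once. The numbers and gen generators are purely illustrative; the contrast at the end shows what happens when two separate sum() passes share one such iterator.

```python
def countmatching(iterable, *predicates):
    """Count, in one pass, how many items satisfy each predicate."""
    counts = [0] * len(predicates)
    for item in iterable:
        for i, pred in enumerate(predicates):
            if pred(item):
                counts[i] += 1
    return tuple(counts)


# A generator can only be iterated once, so a single pass matters here.
numbers = (n for n in range(1, 10))
twos, threes = countmatching(numbers,
                             lambda a: a % 2 == 0,
                             lambda a: a % 3 == 0)
print(twos, threes)  # 4 3

# For contrast: two separate passes over one generator.
gen = (n for n in range(1, 10))
first = sum(1 for v in gen if v % 2 == 0)   # 4
second = sum(1 for v in gen if v % 3 == 0)  # 0: gen was exhausted by the first pass
print(first, second)
```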

By the way, the "itertools recipes" include one that does much the same as your alt 4:

from itertools import imap

def quantify(seq, pred=bool):
    "Count how many times the predicate is true in the sequence"
    return sum(imap(pred, seq))
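In Python 3, imap is gone and the built-in map is already lazy, so the recipe reduces to sum(map(...)). A minimal sketch along those lines, assuming the predicate returns True/False (any other integer it returns would be added as-is):

```python
def quantify(iterable, pred=bool):
    """Count how many times the predicate is true for items of the iterable."""
    # True counts as 1 and False as 0, so summing the predicate results counts matches.
    return sum(map(pred, iterable))


r = range(1, 10)
print(quantify(r, lambda v: v % 2 == 0))  # 4
print(quantify(r, lambda v: v % 3 == 0))  # 3
print(quantify([0, 1, "", "a", None]))    # 2: the default pred=bool counts truthy items
```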

Answer 2

Answered by Anders Waldenborg

Alt 4! But maybe you should refactor the code into a function that takes the divisor (two or three) as an argument. Then you could give the function a better name.

def methodName(divNumber, r):
  return sum(1 for v in r if v % divNumber == 0)


print methodName(2, xrange(1, 10))
print methodName(3, xrange(1, 10))

Answer 3

Answered by Dave Webb

You could use the filter function.

It filters a list (or, strictly, any iterable), producing a new list that contains only the items for which the specified function evaluates to true.

r = xrange(1, 10)

def is_div_two(n):
    return n % 2 == 0

def is_div_three(n):
    return n % 3 == 0

print len(filter(is_div_two,r))
print len(filter(is_div_three,r))

This is good because it lets you keep your statistics logic contained in a function, and the intent of filter should be pretty clear.
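One caveat for Python 3: filter there returns a lazy iterator rather than a list, so len(filter(...)) raises a TypeError. A minimal Python 3 adaptation counts the filtered items without building a list:

```python
def is_div_two(n):
    return n % 2 == 0


def is_div_three(n):
    return n % 3 == 0


r = range(1, 10)

# filter() is lazy in Python 3; count its items instead of calling len() on it.
print(sum(1 for _ in filter(is_div_two, r)))    # 4
print(sum(1 for _ in filter(is_div_three, r)))  # 3

# len(filter(is_div_two, r)) would raise TypeError: object of type 'filter' has no len()
```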

Answer 4

Answered by Sébastien RoccaSerra

I would choose a small variant of your (alt 4):

def count(predicate, iterable):
    return sum(1 for x in iterable if predicate(x))

r = xrange(1, 10)

print count(lambda x: x % 2 == 0, r)
print count(lambda x: x % 3 == 0, r)
# ...

If you want to change what count does, change its implementation in one place.

Note: since your predicates are complex, you'll probably want to define them in functions instead of lambdas. And so you'll probably want to put all this in a class rather than the global namespace.

Answer 5

Answered by Alex Coventry

True is coerced to the integer 1 and False to 0. So if you're happy to use scipy or numpy, make an array for each element of your sequence, containing one boolean per test, and sum over the arrays. For example:

>>> import numpy
>>> sum(numpy.array([c % 2 == 0, c % 3 == 0]) for c in xrange(10))
array([5, 4])
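The coercion this relies on is plain Python: bool is a subclass of int, so True behaves as 1 and False as 0 in arithmetic. A small sketch showing that the booleans can also be summed directly without NumPy, at the cost of one pass per test (note that range(10) includes 0, which is divisible by both):

```python
# bool is a subclass of int: True == 1 and False == 0.
print(isinstance(True, int))  # True
print(True + True)            # 2

# Summing predicate results directly counts the matches (one pass per test).
evens = sum(c % 2 == 0 for c in range(10))
threes = sum(c % 3 == 0 for c in range(10))
print(evens, threes)  # 5 4
```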

Answer 6

Answered by John Montgomery

Well, you could use one generator expression to produce a sequence of tuples holding the result of each test, and then reduce that down to get the sums.

r=xrange(10)
s=( (v % 2 == 0, v % 3 == 0) for v in r )
def add_tuples(t1,t2):
     return tuple(x+y for x,y in zip(t1, t2))
sums=reduce(add_tuples, s, (0,0)) # (0,0) is starting amount

print sums[0] # count of numbers divisible by 2
print sums[1] # count of numbers divisible by 3

Using a generator expression means you only run through the iterator once, since reduce consumes it in a single pass. Basically you'd be doing map/reduce.
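In Python 3, reduce lives in functools. A sketch of the same map/reduce approach; adding the tuples element-wise with map(operator.add, ...) is a stylistic choice of this sketch, not part of the original answer:

```python
from functools import reduce
import operator


def add_tuples(t1, t2):
    # Element-wise addition; the booleans are added as 0/1.
    return tuple(map(operator.add, t1, t2))


r = range(10)
tests = ((v % 2 == 0, v % 3 == 0) for v in r)  # one tuple of booleans per item
counts = reduce(add_tuples, tests, (0, 0))      # (0, 0) is the starting amount

print(counts[0])  # count of numbers divisible by 2: 5
print(counts[1])  # count of numbers divisible by 3: 4
```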

Answer 7

Answered by Thomas Wouters

It's not as terse as you are looking for, but it is more efficient: it works with any iterable, not just iterables you can loop over multiple times, and you can add more things to check for without complicating it further:

r = xrange(1, 10)

counts = {
   2: 0,
   3: 0,
}

for v in r:
    for q in counts:
        if not v % q:
            counts[q] += 1
        # Or, more obscure:
        #counts[q] += not v % q

for q in counts:
    print "%s's: %s" % (q, counts[q])
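A similar single-pass sketch using collections.Counter, which is a swapped-in technique rather than part of the original answer; the divisors tuple is illustrative, and adding a check is still just a matter of extending it:

```python
from collections import Counter

divisors = (2, 3)
r = range(1, 10)

# One pass over r; every divisor that divides v gets one tally.
counts = Counter(q for v in r for q in divisors if v % q == 0)

for q in divisors:
    print("%s's: %s" % (q, counts[q]))  # 2's: 4, then 3's: 3
```

Counter returns 0 for a divisor that never matched, so no initialization step is needed.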

Answer 8

Answered by Simon

If you only have numbers, I would definitely look at a numpy array instead of an iterable list. You will almost certainly be able to do what you want with some terse arithmetic on the array.

Answer 9

Answered by ironfroggy

from itertools import groupby
from collections import defaultdict

def multiples(v):
    return 2 if v%2==0 else 3 if v%3==0 else None
d = defaultdict(list)

for k, values in groupby(range(10), multiples):
    if k is not None:
        d[k].extend(values)
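Two details are easy to miss here. itertools.groupby only groups consecutive items with the same key, which is why the runs are merged into a defaultdict. And because multiples returns a single key per number, a number divisible by both (such as 6) ends up only under 2. A quick Python 3 illustration reusing the multiples function from the answer:

```python
from itertools import groupby
from collections import defaultdict


def multiples(v):
    return 2 if v % 2 == 0 else 3 if v % 3 == 0 else None


# groupby starts a new group every time the key changes, not one group per key.
runs = [(k, list(g)) for k, g in groupby(range(1, 7), multiples)]
print(runs)  # [(None, [1]), (2, [2]), (3, [3]), (2, [4]), (None, [5]), (2, [6])]

d = defaultdict(list)
for k, values in groupby(range(10), multiples):
    if k is not None:
        d[k].extend(values)
print(len(d[2]), len(d[3]))  # 5 2: 6 is recorded only as a multiple of 2
```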

Answer 10

Answered by seuvitor

The idea here is to use reduce to avoid repeated iterations. Also, it does not create any extra data structures, in case memory is an issue for you. You start with a dictionary of counters ({'div2': 0, 'div3': 0}) and increment them during the iteration.

def increment_stats(stats, n):
    if n % 2 == 0: stats['div2'] += 1
    if n % 3 == 0: stats['div3'] += 1
    return stats

r = xrange(1, 10)
stats = reduce(increment_stats, r, {'div2': 0, 'div3': 0})
print stats

If you want to count anything more complicated than divisors, it would be appropriate to use a more object-oriented approach (with the same advantages), encapsulating the logic for stats extraction.

class Stats:

    def __init__(self, div2=0, div3=0):
        self.div2 = div2
        self.div3 = div3

    def increment(self, n):
        if n % 2 == 0: self.div2 += 1
        if n % 3 == 0: self.div3 += 1
        return self

    def __repr__(self):
        return 'Stats(%d, %d)' % (self.div2, self.div3)

r = xrange(1, 10)
stats = reduce(lambda stats, n: stats.increment(n), r, Stats())
print stats

Please point out any mistakes.

请指出任何错误。

@Henrik: I think the first approach is less maintainable since you have to control initialization of the dictionary in one place and update in another, as well as having to use strings to refer to each stat (instead of having attributes). And I do not think OO is overkill in this case, for you said the predicates and objects will be complex in your application. In fact if the predicates were really simple, I wouldn't even bother to use a dictionary, a single fixed size list would be just fine. Cheers :)
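A Python 3 sketch of the same object-oriented approach, assuming Python 3.7+ so that a dataclass can generate the __init__ and __repr__; reduce has to be imported from functools there:

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Stats:
    div2: int = 0
    div3: int = 0

    def increment(self, n):
        # Update the counters in place and return self so reduce can chain calls.
        if n % 2 == 0:
            self.div2 += 1
        if n % 3 == 0:
            self.div3 += 1
        return self


r = range(1, 10)
stats = reduce(lambda s, n: s.increment(n), r, Stats())
print(stats)  # Stats(div2=4, div3=3)
```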
