Python: a simple way to group items into buckets

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/12720151/

Date: 2020-08-18 11:41:43  Source: igfitidea

Simple way to group items into buckets

python

Asked by Mu Mind

I often want to bucket an unordered collection in Python. itertools.groupby does the right sort of thing, but it almost always requires massaging: the items have to be sorted first, and the group iterators have to be captured before they're consumed.

Is there any quick way to get this behavior, either through a standard python module or a simple python idiom?

>>> bucket('thequickbrownfoxjumpsoverthelazydog', lambda x: x in 'aeiou')
{False: ['t', 'h', 'q', 'c', 'k', 'b', 'r', 'w', 'n', 'f', 'x', 'j', 'm', 'p',
    's', 'v', 'r', 't', 'h', 'l', 'z', 'y', 'd', 'g'],
 True: ['e', 'u', 'i', 'o', 'o', 'u', 'o', 'e', 'e', 'a', 'o']}
>>> bucket(xrange(21), lambda x: x % 10)
{0: [0, 10, 20],
 1: [1, 11],
 2: [2, 12],
 3: [3, 13],
 4: [4, 14],
 5: [5, 15],
 6: [6, 16],
 7: [7, 17],
 8: [8, 18],
 9: [9, 19]}
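
To illustrate the massaging mentioned above, a minimal sketch of bucketing with itertools.groupby might look like the following. The name bucket_with_groupby is made up for this illustration, the code assumes Python 3 (range instead of xrange), and it only works when the keys can be sorted.

```python
from itertools import groupby


def bucket_with_groupby(items, key):
    # groupby only merges adjacent items, so sort by the same key first.
    # sorted() is stable, so items keep their original order within a bucket.
    ordered = sorted(items, key=key)
    # Turn each group into a list right away; the group iterator is
    # invalidated as soon as groupby advances to the next key.
    return {k: list(group) for k, group in groupby(ordered, key=key)}


buckets = bucket_with_groupby(range(21), lambda x: x % 10)
print(buckets[0])  # [0, 10, 20]
```

The sort makes this O(n log n), and the key function is called twice per item, which is part of why a plain loop over a dict is usually preferred.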

Accepted answer by DSM

This has come up several times before, see (1), (2), (3), and there's a partition recipe in the itertools recipes, but to my knowledge there's nothing in the standard library, although I was surprised a few weeks ago by accumulate, so who knows what's lurking there these days? :^)

When I need this behaviour, I use

from collections import defaultdict

def partition(seq, key):
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d

and get on with my day.
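
As a usage sketch, here is the same partition function applied to inputs like the ones in the question. This assumes Python 3, so range stands in for xrange; the variable names are only illustrative.

```python
from collections import defaultdict


def partition(seq, key):
    # Same function as above: one list per distinct key value.
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d


by_last_digit = partition(range(21), lambda x: x % 10)
print(by_last_digit[0])  # [0, 10, 20]

vowel_split = partition('thequickbrownfox', lambda x: x in 'aeiou')
print(vowel_split[True])  # ['e', 'u', 'i', 'o', 'o']

# Looking up a missing key on a defaultdict silently creates an empty
# list, so convert to a plain dict if that is not wanted downstream.
plain = dict(by_last_digit)
```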

Answer by grieve

Here is a simple two-liner:

d = {}
for x in "thequickbrownfoxjumpsoverthelazydog": d.setdefault(x in 'aeiou', []).append(x)

Edit:

Just adding your other case for completeness.

d={}
for x in xrange(21): d.setdefault(x%10, []).append(x)

Answer by korylprince

Edit:

Using DSM's answer as a start, here is a slightly more concise, general answer:

d = defaultdict(list)
map(lambda x: d[x in 'aeiou'].append(x),'thequickbrownfoxjumpsoverthelazydog')

or

d = defaultdict(list)
map(lambda x: d[x %10].append(x),xrange(21))

Here is a two-liner:

d = {False:[],True:[]}
filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x),"thequickbrownfoxjumpedoverthelazydogs")

Which can of course be made a one-liner:

d = {False:[],True:[]};filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x),"thequickbrownfoxjumpedoverthelazydogs")
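
One caveat worth sketching: the map and filter snippets in this answer rely on Python 2, where both functions run eagerly and return lists. On Python 3 they return lazy iterators, so the lambda never runs unless the result is consumed. The example below assumes Python 3; the names untouched and lazy are only for illustration.

```python
from collections import defaultdict

# On Python 3, map() builds a lazy iterator and does not call the lambda yet.
untouched = defaultdict(list)
lazy = map(lambda x: untouched[x % 10].append(x), range(21))
print(len(untouched))  # 0, because nothing has been consumed

# An explicit loop performs the side effect on both Python 2 and 3.
d = defaultdict(list)
for x in range(21):
    d[x % 10].append(x)
print(d[0])  # [0, 10, 20]
```

A plain for loop also avoids building a throwaway list of None values, which is what the eager Python 2 versions do.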

Answer by Thomas Perl

Here's a variant of partition() from above for when the predicate is boolean, avoiding the cost of a dict/defaultdict:

def boolpartition(seq, pred):
    passing, failing = [], []
    for item in seq:
        (passing if pred(item) else failing).append(item)
    return passing, failing

Example usage:

>>> even, odd = boolpartition([1, 2, 3, 4, 5], lambda x: x % 2 == 0)
>>> even
[2, 4]
>>> odd
[1, 3, 5]
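
For comparison, the partition recipe from the itertools documentation (the one mentioned in the accepted answer) is lazy rather than list-based. The sketch below follows that recipe under Python 3; the name lazy_partition is chosen here to avoid clashing with partition() above. Note the argument order (predicate first) and the return order (failing items first), both of which differ from boolpartition.

```python
from itertools import filterfalse, tee


def lazy_partition(pred, iterable):
    # tee() duplicates the input so it can be traversed twice;
    # pred is therefore evaluated twice per item overall.
    t1, t2 = tee(iterable)
    # Returns (items where pred is false, items where pred is true).
    return filterfalse(pred, t1), filter(pred, t2)


odd, even = lazy_partition(lambda x: x % 2 == 0, [1, 2, 3, 4, 5])
print(list(even))  # [2, 4]
print(list(odd))   # [1, 3, 5]
```

boolpartition is the simpler choice when both lists are needed in full; the lazy version may suit long streams where only part of each side is consumed.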

Answer by Boern

If it's a pandas.DataFrame, the following also works, using pd.cut():

from sklearn import datasets
import pandas as pd

# import some data to play with
iris = datasets.load_iris()
df_data = pd.DataFrame(iris.data[:,0])  # we'll just take the first feature

# bucketize
n_bins = 5
feature_name = iris.feature_names[0].replace(" ", "_")
my_labels = [str(feature_name) + "_" + str(num) for num in range(0,n_bins)]
pd.cut(df_data[0], bins=n_bins, labels=my_labels)

yielding

0      0_1
1      0_0
2      0_0
[...]

If you don't set the labels, the output is going to look like this:

0       (5.02, 5.74]
1      (4.296, 5.02]
2      (4.296, 5.02]
[...]
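
To tie this back to the bucket dictionaries in the question, here is a smaller sketch that does not need scikit-learn. The sample values and the labels "low", "mid" and "high" are made up for illustration; only pd.cut() with the bins and labels parameters comes from the answer above.

```python
from collections import defaultdict

import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6])
# Three equal-width bins over the range of the data, one label per bin.
labels = pd.cut(values, bins=3, labels=["low", "mid", "high"])

# Collect the original values under the label of the bin they fell into.
buckets = defaultdict(list)
for label, value in zip(labels, values.tolist()):
    buckets[label].append(value)

print(dict(buckets))
```

Unlike the key-function approaches above, pd.cut() assigns buckets by numeric interval, so it fits continuous data rather than arbitrary grouping keys.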