Python: simple way to group items into buckets

Question

Asked by Mu Mind

I often want to bucket an unordered collection in Python. itertools.groupby does the right sort of thing, but it almost always requires massaging: sorting the items first, and capturing each group's iterator before it is consumed.

Is there any quick way to get this behavior, either through a standard python module or a simple python idiom?

>>> bucket('thequickbrownfoxjumpsoverthelazydog', lambda x: x in 'aeiou')
{False: ['t', 'h', 'q', 'c', 'k', 'b', 'r', 'w', 'n', 'f', 'x', 'j', 'm', 'p',
    's', 'v', 'r', 't', 'h', 'l', 'z', 'y', 'd', 'g'],
 True: ['e', 'u', 'i', 'o', 'o', 'u', 'o', 'e', 'e', 'a', 'o']}
>>> bucket(range(21), lambda x: x % 10)
{0: [0, 10, 20],
 1: [1, 11],
 2: [2, 12],
 3: [3, 13],
 4: [4, 14],
 5: [5, 15],
 6: [6, 16],
 7: [7, 17],
 8: [8, 18],
 9: [9, 19]}
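
For comparison, here is a minimal sketch of the itertools.groupby route once the "massaging" is done. bucket_with_groupby is a hypothetical name, not a library function. groupby only merges adjacent items with equal keys, so the input is sorted by the same key first, and each group is turned into a list immediately because groupby invalidates the previous group iterator when it advances. A side effect of this approach is that the keys must be mutually orderable, which a hash-based dict approach does not require.

```python
from itertools import groupby

def bucket_with_groupby(seq, key):
    """Group items by key(item) using itertools.groupby (hypothetical helper)."""
    # groupby only merges *adjacent* items with equal keys, so sort first.
    # sorted() is stable, so items keep their original relative order in a bucket.
    ordered = sorted(seq, key=key)
    # Each group iterator is consumed as groupby advances, so materialize it now.
    return {k: list(group) for k, group in groupby(ordered, key=key)}

print(bucket_with_groupby(range(21), lambda x: x % 10))
```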

Answer 1

Accepted answer by DSM

This has come up several times before ((1), (2), (3)), and there's a partition recipe in the itertools recipes, but to my knowledge there's nothing in the standard library. That said, I was surprised a few weeks ago by accumulate, so who knows what's lurking there these days? :^)
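
For reference, the partition recipe from the itertools documentation looks roughly like the sketch below (the exact wording has changed between Python versions). Note that its argument order is (pred, iterable), the opposite of the partition(seq, key) helper in this answer, that it only covers the two-bucket boolean case with the false entries returned first, and that it evaluates the predicate twice per item.

```python
from itertools import filterfalse, tee

def partition(pred, iterable):
    """Use a predicate to partition entries into false entries and true entries."""
    # tee() yields two independent iterators over the same input.
    t1, t2 = tee(iterable)
    # pred runs once per item in each iterator, i.e. twice per item overall.
    return filterfalse(pred, t1), filter(pred, t2)

consonants, vowels = partition(lambda c: c in 'aeiou', 'thequickbrownfoxjumpsoverthelazydog')
print(list(vowels))  # → ['e', 'u', 'i', 'o', 'o', 'u', 'o', 'e', 'e', 'a', 'o']
```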

When I need this behaviour, I use

from collections import defaultdict

def partition(seq, key):
    # Maps each key(x) to the list of items that produced it; missing keys start as [].
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d

and get on with my day.
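
A quick check against the question's second example, as a sketch: the function returns a defaultdict, and wrapping it in dict() (an addition here, not part of the answer) gives a plain dict. That matters because a defaultdict silently inserts an empty list when you look up a key that is not there, as the last two lines show.

```python
from collections import defaultdict

def partition(seq, key):
    # Same helper as in the answer: one list per distinct key(x).
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d

buckets = partition(range(21), lambda x: x % 10)
print(dict(buckets))  # {0: [0, 10, 20], 1: [1, 11], ..., 9: [9, 19]}

# Caveat: looking up a missing key on a defaultdict inserts it.
print(buckets[42])    # → []
print(42 in buckets)  # → True
```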

Answer 2

Answer by grieve

Here is a simple two-liner:

d = {}
for x in "thequickbrownfoxjumpsoverthelazydog": d.setdefault(x in 'aeiou', []).append(x)

Edit:

Just adding your other case for completeness.

d = {}
for x in range(21): d.setdefault(x % 10, []).append(x)
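
The same setdefault idiom can be wrapped in a reusable function with the signature from the question (bucket is the question's hypothetical name, not a library function). One trade-off against defaultdict: setdefault constructs a fresh empty list on every call, even when the key already exists, but the result is an ordinary dict that does not insert keys on lookup.

```python
def bucket(seq, key):
    """Group items of seq into lists keyed by key(item), using dict.setdefault."""
    d = {}
    for x in seq:
        # setdefault returns the existing list, or inserts and returns a new []
        d.setdefault(key(x), []).append(x)
    return d

print(bucket('thequickbrownfoxjumpsoverthelazydog', lambda x: x in 'aeiou')[True])
```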

Answer 3

Answer by korylprince

Edit:

Using DSM's answer as a start, here is a slightly more concise, general answer:

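from collections import defaultdict
d = defaultdict(list)
# map() is lazy in Python 3, so list() is needed to actually run the appends
list(map(lambda x: d[x in 'aeiou'].append(x), 'thequickbrownfoxjumpsoverthelazydog'))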

or

d = defaultdict(list)
list(map(lambda x: d[x % 10].append(x), range(21)))
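
One caveat when running map-based one-liners on Python 3: map() returns a lazy iterator, so the lambda only runs once that iterator is consumed, and a bare map(...) call whose result is thrown away leaves d empty. The sketch below demonstrates this and shows the plain for loop that the map call stands in for.

```python
from collections import defaultdict

d = defaultdict(list)
lazy = map(lambda x: d[x % 10].append(x), range(21))
print(len(d))  # → 0: nothing has been appended yet

# Consuming the iterator runs the side effects (each lambda call returns None).
for _ in lazy:
    pass
print(len(d))  # → 10

# The equivalent explicit loop, without the map/lambda indirection.
d2 = defaultdict(list)
for x in range(21):
    d2[x % 10].append(x)
assert d == d2
```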

Here is a two-liner:

d = {False: [], True: []}
# filter() is lazy in Python 3; list() forces the side-effecting lambda to run
list(filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x), "thequickbrownfoxjumpedoverthelazydogs"))

Which can of course be made a one-liner:

d = {False: [], True: []}; list(filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x), "thequickbrownfoxjumpedoverthelazydogs"))

Answer 4

Answer by Thomas Perl

Here's a variant of partition() from above for when the predicate is boolean; it avoids the cost of a dict/defaultdict:

def boolpartition(seq, pred):
    passing, failing = [], []
    for item in seq:
        (passing if pred(item) else failing).append(item)
    return passing, failing

Example usage:

>>> even, odd = boolpartition([1, 2, 3, 4, 5], lambda x: x % 2 == 0)
>>> even
[2, 4]
>>> odd
[1, 3, 5]

Answer 5

Answer by Boern

If it's a pandas.DataFrame, the following also works, using pd.cut():

from sklearn import datasets
import pandas as pd

# import some data to play with
iris = datasets.load_iris()
df_data = pd.DataFrame(iris.data[:,0])  # we'll just take the first feature

# bucketize
n_bins = 5
feature_name = iris.feature_names[0].replace(" ", "_")
my_labels = [str(feature_name) + "_" + str(num) for num in range(0,n_bins)]
pd.cut(df_data[0], bins=n_bins, labels=my_labels)

yielding

0      0_1
1      0_0
2      0_0
[...]

If you don't set labels, the output looks like this:

0       (5.02, 5.74]
1      (4.296, 5.02]
2      (4.296, 5.02]
[...]
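
Without pandas, a rough stand-in for pd.cut with explicit bin edges can be sketched from bisect plus the defaultdict idiom of the accepted answer. cut_into_buckets and its edges parameter are hypothetical names. Like pd.cut's default right=True, each bucket is the right-closed interval (edges[i], edges[i+1]]; values outside every interval are simply dropped here, whereas pd.cut would mark them as NaN. It does not reproduce pd.cut's behavior for an integer bins argument (equal-width bins over a slightly extended range).

```python
from bisect import bisect_left
from collections import defaultdict

def cut_into_buckets(values, edges):
    """Group values into right-closed intervals (edges[i-1], edges[i]] (hypothetical helper)."""
    buckets = defaultdict(list)
    for v in values:
        # bisect_left finds the first edge >= v, so v lies in (edges[i-1], edges[i]].
        i = bisect_left(edges, v)
        if 0 < i < len(edges):
            buckets[(edges[i - 1], edges[i])].append(v)
        # values <= edges[0] or > edges[-1] fall outside every interval and are skipped
    return dict(buckets)

print(cut_into_buckets([1, 5, 7, 10, 0, 12], [0, 5, 10]))
# → {(0, 5): [1, 5], (5, 10): [7, 10]}
```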
