Python Pandas: Assign multiple new columns simultaneously

Question

Asked by 8one6

I have a DataFrame with a column containing labels for each row (in addition to some relevant data for each row). I have a dictionary with keys equal to the possible labels and values equal to 2-tuples of information related to that label. I'd like to tack two new columns onto my frame, one for each part of the 2-tuple corresponding to the label for each row.

Here is the setup:

import pandas as pd
import numpy as np

np.random.seed(1)
n = 10

labels = list('abcdef')
colors = ['red', 'green', 'blue']
sizes = ['small', 'medium', 'large']

labeldict = {c: (np.random.choice(colors), np.random.choice(sizes)) for c in labels}

df = pd.DataFrame({'label': np.random.choice(labels, n), 
                   'somedata': np.random.randn(n)})

I can get what I want by running:

df['color'], df['size'] = zip(*df['label'].map(labeldict))
print(df)

  label  somedata  color    size
0     b  0.196643    red  medium
1     c -1.545214  green   small
2     a -0.088104  green   small
3     c  0.852239  green   small
4     b  0.677234    red  medium
5     c -0.106878  green   small
6     a  0.725274  green   small
7     d  0.934889    red  medium
8     a  1.118297  green   small
9     c  0.055613  green   small

But how can I do this if I don't want to manually type out the two columns on the left side of the assignment? That is, how can I create multiple new columns on the fly? For example, if I had 10-tuples in labeldict instead of 2-tuples, this would be a real pain as currently written. Here are a couple of things that don't work:

# set up attrlist for later use
attrlist = ['color', 'size']

# non-working idea 1)
df[attrlist] = zip(*df['label'].map(labeldict))

# non-working idea 2)
df.loc[:, attrlist] = zip(*df['label'].map(labeldict))

This does work, but seems like a hack:

for a in attrlist:
    df[a] = 0
df[attrlist] = zip(*df['label'].map(labeldict))

Better solutions?

Answer 1

Accepted answer by alko

You can use merge instead:

>>> ld = pd.DataFrame(labeldict).T
>>> ld.columns = ['color', 'size']
>>> ld.index.name = 'label'
>>> df.merge(ld.reset_index(), on='label')
  label  somedata  color    size
0     b  1.462108    red  medium
1     c -2.060141  green   small
2     c  1.133769  green   small
3     c  0.042214  green   small
4     e -0.322417    red  medium
5     e -1.099891    red  medium
6     e -0.877858    red  medium
7     e  0.582815    red  medium
8     f -0.384054    red   large
9     d -0.172428    red  medium
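A minimal sketch of the same merge idea, with one variation that is not part of the answer above: passing how='left' so that rows whose label has no entry in the lookup table are kept (with NaN) rather than dropped. The small fixed labeldict and df here are hypothetical stand-ins for the question's random setup.

```python
import pandas as pd

# Hypothetical fixed data standing in for the question's random setup.
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium')}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': ['b', 'a', 'z', 'b'],
                   'somedata': [0.1, 0.2, 0.3, 0.4]})

# One row per label, one column per tuple position.
ld = pd.DataFrame(labeldict).T
ld.columns = attrlist
ld.index.name = 'label'

# how='left' keeps every row of df; the unknown label 'z' gets NaN attributes.
merged = df.merge(ld.reset_index(), on='label', how='left')
print(merged)
```

Because the column names come from attrlist, nothing has to change in the merge call if the tuples grow from 2 entries to 10.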

Answer 2

Answered by BrenBarn

Instead of doing what you're doing with labeldict, you could make that information into a DataFrame and then join it with your original one:

>>> labeldf = pd.DataFrame([(np.random.choice(colors), np.random.choice(sizes)) for c in labels], columns=['color', 'size'], index=labels)
>>> df.join(labeldf, on='label')
  label  somedata  color    size
0     a -1.709973    red  medium
1     b  0.099109   blue  medium
2     a -0.427323    red  medium
3     b  0.474995   blue  medium
4     b -2.819208   blue  medium
5     d -0.998888    red   small
6     b  0.713357   blue  medium
7     d  0.331989    red   small
8     e -0.906240  green   large
9     c -0.501916   blue   large
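If the dictionary of tuples already exists, as in the question, it may be easier to convert it than to rebuild the data. The sketch below assumes that pd.DataFrame.from_dict with orient='index' is acceptable for this; the fixed labeldict and df are hypothetical examples, not the question's random data.

```python
import pandas as pd

# Hypothetical fixed data standing in for the question's random setup.
labeldict = {'a': ('green', 'small'),
             'b': ('red', 'medium'),
             'c': ('blue', 'large')}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': ['b', 'c', 'a', 'c'],
                   'somedata': [1.0, 2.0, 3.0, 4.0]})

# Keys become the index, tuple positions become the named columns.
labeldf = pd.DataFrame.from_dict(labeldict, orient='index', columns=attrlist)

# Look up each row's 'label' value in labeldf's index.
joined = df.join(labeldf, on='label')
print(joined)
```

Only attrlist needs to be extended when the tuples get longer; the join itself stays the same.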

Answer 3

Answered by Eric Ness

If you want to add multiple columns to a DataFrame as part of a method chain, you can use apply. The first step is to create a function that transforms a row, represented as a Series, into the form you want. Then you can call apply to use this function on each row.

def append_label_attributes(row: pd.Series, labelmap: dict) -> pd.Series:
    result = row.copy()
    result['color'] = labelmap[result['label']][0]
    result['size'] = labelmap[result['label']][1]
    return result

df = (
    pd.DataFrame(
        {
            'label': np.random.choice(labels, n),
            'somedata': np.random.randn(n),
        }
    )
    .apply(append_label_attributes, axis='columns', labelmap=labeldict)
)
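The function above still names 'color' and 'size' by hand. One possible way to stay in a method chain without hard-coding the names is to build the new columns from a list and pass them to assign. This is only a sketch: attribute_columns is a hypothetical helper, and the fixed data is illustrative.

```python
import pandas as pd

# Hypothetical fixed data standing in for the question's random setup.
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium')}
attrlist = ['color', 'size']


def attribute_columns(frame, labelmap, names):
    """Return {column name: Series}, one entry per tuple position."""
    return {
        # i=i binds the current position so each lambda keeps its own index.
        name: frame['label'].map(lambda label, i=i: labelmap[label][i])
        for i, name in enumerate(names)
    }


result = (
    pd.DataFrame({'label': ['b', 'a', 'b'], 'somedata': [0.5, 1.5, 2.5]})
    # pipe hands the intermediate frame to assign, which adds all columns at once.
    .pipe(lambda d: d.assign(**attribute_columns(d, labeldict, attrlist)))
)
print(result)
```

This works column by column rather than row by row, which tends to scale better than a row-wise apply on larger frames. It does require the column names to be strings, since they are passed as keyword arguments.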

Answer 4

Answered by Markus Dutschke

Just use result_type='expand' in pandas apply:

df
Out[78]: 
   a  b
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

df[['mean', 'std', 'max']]=df[['a','b']].apply(mathOperationsTuple, axis=1, result_type='expand')

df
Out[80]: 
   a  b  mean  std  max
0  0  1   0.5  0.5  1.0
1  2  3   2.5  0.5  3.0
2  4  5   4.5  0.5  5.0
3  6  7   6.5  0.5  7.0
4  8  9   8.5  0.5  9.0

And here is some copy-paste code:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(10).reshape(5,2), columns=['a','b'])
print('df',df, sep='\n')
print()
def mathOperationsTuple(arr):
    return np.mean(arr), np.std(arr), np.amax(arr)

df[['mean', 'std', 'max']]=df[['a','b']].apply(mathOperationsTuple, axis=1, result_type='expand')
print('df',df, sep='\n')
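The same result_type='expand' idea can be applied to the label lookup from the question. The sketch below uses hypothetical fixed data. It renames the expanded columns and joins them back instead of assigning to df[attrlist] directly, which is a choice made here to keep the alignment explicit, not something the answer prescribes.

```python
import pandas as pd

# Hypothetical fixed data standing in for the question's random setup.
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium')}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': ['a', 'b', 'a'], 'somedata': [0.1, 0.2, 0.3]})

# Each row maps to a tuple; 'expand' spreads the tuple across columns 0, 1, ...
expanded = df.apply(lambda row: labeldict[row['label']],
                    axis=1, result_type='expand')
expanded.columns = attrlist

# expanded shares df's index, so an index join lines the rows up.
df = df.join(expanded)
print(df)
```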
