Python: how to add multiple columns to a pandas DataFrame in one assignment?

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/39050539/

Date: 2020-08-19 21:49:00. Source: igfitidea

How to add multiple columns to pandas dataframe in one assignment?

Tags: python, pandas, dataframe

Asked by runningbirds

I'm new to pandas and trying to figure out how to add multiple columns to a pandas DataFrame simultaneously. Any help here is appreciated. Ideally I would like to do this in one step rather than in multiple repeated steps...

import numpy as np  # needed for np.nan below
import pandas as pd

df = {'col_1': [0, 1, 2, 3],
      'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)

df[['column_new_1', 'column_new_2', 'column_new_3']] = [np.nan, 'dogs', 3]  # thought this would work here...

Answered by Matthias Fripp

I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...), pandas requires that the right-hand side be a DataFrame (note that it doesn't actually matter whether the columns of that DataFrame have the same names as the columns you are creating).

Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments, or to create a suitable DataFrame for the right-hand side.

Here are several approaches that will work:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

Then one of the following:

1) Three assignments in one, using list unpacking:

df['column_new_1'], df['column_new_2'], df['column_new_3'] = [np.nan, 'dogs', 3]

2) DataFrame conveniently expands a single row to match the index, so you can do this:

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

3) Make a temporary data frame with new columns, then combine with the original data frame later:

df = pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3]], 
            index=df.index, 
            columns=['column_new_1', 'column_new_2', 'column_new_3']
        )
    ], axis=1
)

4) Similar to the previous, but using join instead of concat (may be less efficient):

df = df.join(pd.DataFrame(
    [[np.nan, 'dogs', 3]], 
    index=df.index, 
    columns=['column_new_1', 'column_new_2', 'column_new_3']
))

5) Using a dict is a more "natural" way to create the new data frame than the previous two, but the new columns will be sorted alphabetically (at least before Python 3.6 or 3.7):

df = df.join(pd.DataFrame(
    {
        'column_new_1': np.nan,
        'column_new_2': 'dogs',
        'column_new_3': 3
    }, index=df.index
))

6) Use .assign() with multiple column arguments.

I like this variant on @zero's answer a lot, but like the previous one, the new columns will always be sorted alphabetically, at least with early versions of Python:

df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)

7) This is interesting (based on https://stackoverflow.com/a/44951376/3830997), but I don't know when it would be worth the trouble:

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols)   # add empty cols
df[new_cols] = new_vals  # multi-column assignment works for existing cols

8) In the end it's hard to beat three separate assignments:

df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3

Note: many of these options have already been covered in other answers: "Add multiple columns to DataFrame and set them equal to an existing column", "Is it possible to add several columns at once to a pandas DataFrame?", and "Add multiple empty columns to pandas DataFrame".
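
As a rough cross-check of the options above, the sketch below builds the same frame three ways (options 8, 6 and 5) and compares the results. It assumes Python 3.7+ and a reasonably recent pandas, so that keyword and dict order are preserved; make_df is a hypothetical helper introduced only for this illustration.

```python
import numpy as np
import pandas as pd


def make_df():
    # Hypothetical helper: rebuilds the starting frame used in the answer.
    return pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})


# Option 8: three separate single-column assignments.
separate = make_df()
separate['column_new_1'] = np.nan
separate['column_new_2'] = 'dogs'
separate['column_new_3'] = 3

# Option 6: .assign() with keyword arguments (returns a new frame).
assigned = make_df().assign(
    column_new_1=np.nan, column_new_2='dogs', column_new_3=3
)

# Option 5: join against a frame built from a dict of scalars.
base = make_df()
joined = base.join(pd.DataFrame(
    {'column_new_1': np.nan, 'column_new_2': 'dogs', 'column_new_3': 3},
    index=base.index,
))

# Compare values and column order only; dtypes are not checked here.
pd.testing.assert_frame_equal(separate, assigned, check_dtype=False)
pd.testing.assert_frame_equal(separate, joined, check_dtype=False)
```

If either comparison raised an AssertionError, the approaches would disagree for that pandas version; on older Python versions the column order is the most likely difference.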

Answered by Zero

You could use assign with a dict of column names and values.

In [1069]: df.assign(**{'col_new_1': np.nan, 'col2_new_2': 'dogs', 'col3_new_3': 3})
Out[1069]:
   col_1  col_2 col2_new_2  col3_new_3  col_new_1
0      0      4       dogs           3        NaN
1      1      5       dogs           3        NaN
2      2      6       dogs           3        NaN
3      3      7       dogs           3        NaN
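
The output above shows the new columns sorted alphabetically. As far as I know, pandas 0.23 and later keep keyword order in assign when running on Python 3.6+, so on a current setup the columns should come out in the order given. A minimal sketch under that assumption (new_values is an illustrative name, not from the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})

# Illustrative mapping of new column names to scalar defaults.
new_values = {'column_new_1': np.nan, 'column_new_2': 'dogs', 'column_new_3': 3}

# ** unpacks the dict into keyword arguments; df itself is left unchanged.
result = df.assign(**new_values)

print(list(result.columns))
```

Since assign returns a new DataFrame, the result has to be bound to a name (or back to df) to keep the new columns.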

Answered by Nehal J Wani

With the use of concat:

In [128]: df
Out[128]: 
   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7

In [129]: pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
Out[129]: 
   col_1  col_2 column_new_1 column_new_2 column_new_3
0    0.0    4.0          NaN          NaN          NaN
1    1.0    5.0          NaN          NaN          NaN
2    2.0    6.0          NaN          NaN          NaN
3    3.0    7.0          NaN          NaN          NaN

I'm not sure what you wanted to do with [np.nan, 'dogs', 3]. Maybe set them as default values now?

In [142]: df1 = pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
In [143]: df1[[ 'column_new_1', 'column_new_2','column_new_3']] = [np.nan, 'dogs', 3]

In [144]: df1
Out[144]: 
   col_1  col_2  column_new_1 column_new_2  column_new_3
0    0.0    4.0           NaN         dogs             3
1    1.0    5.0           NaN         dogs             3
2    2.0    6.0           NaN         dogs             3
3    3.0    7.0           NaN         dogs             3

Answered by piRSquared

Using a list comprehension, pd.DataFrame, and pd.concat:

pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3] for _ in range(df.shape[0])],
            df.index, ['column_new_1', 'column_new_2','column_new_3']
        )
    ], axis=1)

Answered by Prometheus

I am defining the columns using the columns parameter. Here column1 and column2 are column names.

df = pd.DataFrame(columns = ['column1', 'column2'])

Answered by A. Rabus

When adding a lot of missing columns (a, b, c, ...) with the same value, here 0, I did this:

    new_cols = ["a", "b", "c" ] 
    df[new_cols] = pd.DataFrame([[0] * len(new_cols)], index=df.index)

It's based on the second variant of the accepted answer.
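
A possible alternative for the same task, offered as a sketch rather than as the answer's own method, is to build the keyword arguments with dict.fromkeys and pass them to assign. This avoids constructing a temporary DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})
new_cols = ['a', 'b', 'c']

# dict.fromkeys maps every new column name to the same scalar (0 here),
# and assign broadcasts that scalar down each new column.
df = df.assign(**dict.fromkeys(new_cols, 0))

print(df)
```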

Answered by Markus Dutschke

If you just want to add empty new columns, reindex will do the job:

df
   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7

df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
   col_1  col_2  column_new_1  column_new_2  column_new_3
0      0      4           NaN           NaN           NaN
1      1      5           NaN           NaN           NaN
2      2      6           NaN           NaN           NaN
3      3      7           NaN           NaN           NaN

Full code example:

import numpy as np
import pandas as pd

df = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)
print('df',df, sep='\n')
print()
df=df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
print('''df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)''',df, sep='\n')

Otherwise, go for Zero's answer using assign.
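
If a single constant is wanted instead of NaN, reindex also accepts a fill_value argument. The sketch below assumes one shared default for every new column; the value -1 is an arbitrary placeholder chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})
new_cols = ['column_new_1', 'column_new_2', 'column_new_3']

# Columns missing from df are created and filled with fill_value;
# existing columns keep their data.
filled = df.reindex(columns=list(df.columns) + new_cols, fill_value=-1)

print(filled)
```

This only covers the case where all new columns share one value; different values per column still call for assign or separate assignments.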

Answered by Alex

I am not comfortable using "Index" and so on, so I came up with the following:

df.columns
Index(['A123', 'B123'], dtype='object')

df=pd.concat([df,pd.DataFrame(columns=list('CDE'))])

df.rename(columns={
    'C':'C123',
    'D':'D123',
    'E':'E123'
},inplace=True)


df.columns
Index(['A123', 'B123', 'C123', 'D123', 'E123'], dtype='object')

Answered by halfmoonhalf

Just want to point out that option 2 in @Matthias Fripp's answer

(2) I wouldn't necessarily expect DataFrame to work this way, but it does

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

is already documented in pandas' own documentation http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics

You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner. You may find this useful for applying a transform (in-place) to a subset of the columns.
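
To illustrate the last sentence of that quote, here is a small sketch of applying a transform to a subset of existing columns with the column-list syntax. The extra col_3 column and the factor of 10 are made up for the example:

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7],
    'col_3': ['a', 'b', 'c', 'd'],  # illustrative column left untouched
})

cols = ['col_1', 'col_2']

# Select the subset, transform it, and assign it back to the same columns.
df[cols] = df[cols] * 10

print(df)
```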