Python: Is it possible to add several columns at once to a Pandas DataFrame?

Question

Asked by dbliss

If I want to create a new DataFrame with several columns, I can add all the columns at once -- for example, as follows:

import pandas as pd

data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(data)

But now suppose farther down the road I want to add a set of additional columns to this DataFrame. Is there a way to add them all simultaneously, as in

additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}
#Below is a made-up function of the kind I desire.
df.add_data(additional_data)

I'm aware I could do this:

for key, value in additional_data.items():  # use iteritems() on Python 2
    df[key] = value

Or this:

df2 = pd.DataFrame(additional_data, index=df.index)
df = pd.merge(df, df2, left_index=True, right_index=True)

I was just hoping for something cleaner. If I'm stuck with these two options, which is preferred?

Answer 1

Accepted answer by Zero

Pandas has had the assign method since 0.16.0. You can use it on DataFrames like this:

In [1506]: df1.assign(**df2)
Out[1506]:
   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15

Or you can pass the dictionary directly:

In [1507]: df1.assign(**additional_data)
Out[1507]:
   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15
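
One detail worth spelling out: assign returns a new DataFrame and leaves the original untouched, so the result has to be bound to a name. The following is a minimal sketch that reuses the question's data; the name result is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'col_1': [0, 1, 2, 3],
                    'col_2': [4, 5, 6, 7]})
additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}

# assign returns a copy with the extra columns; df1 itself is unchanged
result = df1.assign(**additional_data)

print(list(df1.columns))     # original columns only
print(list(result.columns))  # original columns plus the two new ones
```

Because the dictionary is unpacked into keyword arguments, its keys must be strings.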

Answer 2

Answer by moenad

What you need is the join function:

df1.join(df2, how='outer')
# or
df1.join(df2)  # this also works (how defaults to 'left')

Example:

import pandas as pd

data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df1 = pd.DataFrame(data)

additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}
df2 = pd.DataFrame(additional_data)

df1.join(df2, how='outer')

Output:

   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15
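
The two join calls above give the same result only because both frames share the same index. The sketch below uses deliberately mismatched index labels (illustrative data, not from the question) to show where the default left join and how='outer' differ.

```python
import pandas as pd

df1 = pd.DataFrame({'col_1': [0, 1, 2, 3]})
# df2 deliberately has index label 4 instead of 3 (illustrative data)
df2 = pd.DataFrame({'col_3': [8, 9, 10, 11]}, index=[0, 1, 2, 4])

left = df1.join(df2)                # default how='left': keeps df1's index
outer = df1.join(df2, how='outer')  # uses the union of both indexes

print(left)   # row 3 has NaN in col_3; label 4 is dropped
print(outer)  # rows 3 and 4 both appear, each with one NaN
```

Note also that join raises an error when the two frames share a column name, unless lsuffix or rsuffix is supplied.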

Answer 3

Answer by Roman Pekar

If you don't want to create a new DataFrame from additional_data, you can use something like this:

>>> additional_data = [[8, 9, 10, 11], [12, 13, 14, 15]]
>>> df['col3'], df['col4'] = additional_data
>>> df
   col_1  col_2  col3  col4
0      0      4     8    12
1      1      5     9    13
2      2      6    10    14
3      3      7    11    15

It's also possible to do something like this, but it produces a new DataFrame rather than modifying the existing one in place:

>>> additional_header = ['col_3', 'col_4']
>>> additional_data = [[8, 9, 10, 11], [12, 13, 14, 15]]
>>> df = pd.DataFrame(data=np.concatenate((df.values.T, additional_data)).T, columns=np.concatenate((df.columns, additional_header)))
>>> df
   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15

Answer 4

Answer by Zaros

All you need to do is create the new columns with data from the additional dataframe.

data =            {'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]}
additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}
df = pd.DataFrame(data)
df2 = pd.DataFrame(additional_data)

df[df2.columns] = df2

df now looks like:

   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15

The index of the original DataFrame is kept, as if you had performed an in-place left join. Columns in the original DataFrame whose names match columns in the additional DataFrame are overwritten. For example:

data =            {'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]}
additional_data = {'col_2': [8, 9, 10, 11],
                   'col_3': [12, 13, 14, 15]}
df = pd.DataFrame(data)
df2 = pd.DataFrame(additional_data, index=[0,1,2,4])

df[df2.columns] = df2

df now looks like:

   col_1  col_2  col_3
0      0    8.0   12.0
1      1    9.0   13.0
2      2   10.0   14.0
3      3    NaN    NaN
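
If overwriting existing columns is not wanted, one option is to assign only the columns that the original DataFrame lacks. This is a sketch under that assumption rather than part of the answer above; new_cols is a hypothetical name, and Index.difference is used to pick out the column names missing from df.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]})
df2 = pd.DataFrame({'col_2': [8, 9, 10, 11],
                    'col_3': [12, 13, 14, 15]})

# Keep only the columns of df2 that df does not already have
new_cols = df2.columns.difference(df.columns)
df[new_cols] = df2[new_cols]

print(df)  # col_2 keeps its original values; col_3 is added
```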
