Is it possible to add several columns at once to a pandas DataFrame?
Source: a Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/19866377/
Asked by dbliss
If I want to create a new DataFrame with several columns, I can add all the columns at once -- for example, as follows:
import pandas as pd

data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(data)
But now suppose that farther down the road I want to add a set of additional columns to this DataFrame. Is there a way to add them all simultaneously, as in:
additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}
# Below is a made-up function of the kind I desire.
df.add_data(additional_data)
I'm aware I could do this:
# dict.iteritems() exists only in Python 2; items() works in Python 3
for key, value in additional_data.items():
    df[key] = value
Or this:
df2 = pd.DataFrame(additional_data, index=df.index)
# merge on the shared index instead of passing the index object as on=
df = pd.merge(df, df2, left_index=True, right_index=True)
I was just hoping for something cleaner. If I'm stuck with these two options, which is preferred?
Accepted answer by Zero
pandas has had the assign method since version 0.16.0. You can use it with another DataFrame, like this:
In [1506]: df1.assign(**df2)
Out[1506]:
   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15
Or you could use the dictionary directly:
In [1507]: df1.assign(**additional_data)
Out[1507]:
   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15
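One point worth spelling out is that assign returns a new DataFrame and leaves the original untouched. The following is a minimal sketch of that behavior, reusing the data from the question; the variable name result is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]})
additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}

# assign builds and returns a new DataFrame with the extra columns
result = df.assign(**additional_data)

print(list(df.columns))      # the original still has only col_1 and col_2
print(list(result.columns))  # the copy has all four columns
```

To keep the new columns under the original name, rebind it: df = df.assign(**additional_data). Because the columns are passed as keyword arguments, this form only works when the dictionary keys are strings.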
Answer by moenad
What you need is the join function:
df1.join(df2, how='outer')
#or
df1.join(df2) # this works also
Example:
data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df1 = pd.DataFrame(data)
additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}
df2 = pd.DataFrame(additional_data)
df1.join(df2, how='outer')
Output:
   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15
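In the example above both frames share the same index, so how='outer' and the default (how='left') give the same result. The sketch below uses a deliberately mismatched index (an assumption made for illustration, not part of the original answer) to show where the two differ, and what happens when column names overlap.

```python
import pandas as pd

df1 = pd.DataFrame({'col_1': [0, 1, 2, 3],
                    'col_2': [4, 5, 6, 7]})
# Hypothetical second frame whose index only partly matches df1's
df2 = pd.DataFrame({'col_3': [8, 9, 10],
                    'col_4': [12, 13, 14]}, index=[0, 1, 5])

left = df1.join(df2)                 # default how='left': keeps df1's index
outer = df1.join(df2, how='outer')   # keeps the union of both indexes

print(left.index.tolist())   # [0, 1, 2, 3]
print(outer.index.tolist())  # [0, 1, 2, 3, 5]

# join refuses overlapping column names unless a suffix is supplied
try:
    df1.join(df1)
    overlap_error = False
except ValueError:
    overlap_error = True
print(overlap_error)
```

Rows that have no match in the other frame are filled with NaN, so the joined columns may come back as floats.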
Answer by Roman Pekar
If you don't want to create a new DataFrame from additional_data, you can use something like this:
>>> additional_data = [[8, 9, 10, 11], [12, 13, 14, 15]]
>>> df['col3'], df['col4'] = additional_data
>>> df
   col_1  col_2  col3  col4
0      0      4     8    12
1      1      5     9    13
2      2      6    10    14
3      3      7    11    15
It's also possible to do something like this, but it creates a new DataFrame rather than modifying the existing one in place:
>>> import numpy as np
>>> additional_header = ['col_3', 'col_4']
>>> additional_data = [[8, 9, 10, 11], [12, 13, 14, 15]]
>>> df = pd.DataFrame(data=np.concatenate((df.values.T, additional_data)).T, columns=np.concatenate((df.columns, additional_header)))
>>> df
   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15
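Going through df.values puts all the data into a single NumPy array with one dtype, which is harmless here because every column holds integers. As a possible alternative that keeps each column's dtype, the sketch below swaps in pd.concat with axis=1; this is not part of the original answer, and the names extra and combined are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]})
additional_header = ['col_3', 'col_4']
additional_data = [[8, 9, 10, 11], [12, 13, 14, 15]]

# Pair each header with its list of values and build the extra columns
# on the same index as df, so the rows line up one to one
extra = pd.DataFrame(dict(zip(additional_header, additional_data)),
                     index=df.index)

# Concatenate column-wise; like the NumPy version, this returns a new DataFrame
combined = pd.concat([df, extra], axis=1)
print(combined)
```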
Answer by Zaros
All you need to do is create the new columns with data from the additional dataframe.
data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}
df = pd.DataFrame(data)
df2 = pd.DataFrame(additional_data)
df[df2.columns] = df2
df now looks like:
   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15
The index of the original DataFrame is used, as if you had performed an in-place left join. Columns in the original DataFrame whose names match columns in the additional DataFrame are overwritten. For example:
data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
additional_data = {'col_2': [8, 9, 10, 11],
                   'col_3': [12, 13, 14, 15]}
df = pd.DataFrame(data)
df2 = pd.DataFrame(additional_data, index=[0, 1, 2, 4])
df[df2.columns] = df2
df now looks like:
   col_1  col_2  col_3
0      0    8.0   12.0
1      1    9.0   13.0
2      2   10.0   14.0
3      3    NaN    NaN
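Because the assignment aligns on the index, a mismatch can silently introduce NaN values and turn integer columns into floats. The following sketch, assuming a reasonably recent pandas version, checks the indexes before assigning and inspects the result afterwards.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]})
df2 = pd.DataFrame({'col_2': [8, 9, 10, 11],
                    'col_3': [12, 13, 14, 15]}, index=[0, 1, 2, 4])

# A quick check before assigning: do the two frames share the same index?
same_index = df.index.equals(df2.index)
print(same_index)  # False, since df2 has label 4 where df has label 3

df[df2.columns] = df2

# Label 3 has no match in df2, so it gets NaN; label 4 from df2 is dropped
print(df['col_3'].isna().tolist())
print(df.dtypes)
```

If the rows are meant to line up by position rather than by label, one option is to build df2 with index=df.index, as the question itself does.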