Original URL: http://stackoverflow.com/questions/39050539/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to add multiple columns to pandas dataframe in one assignment?
Asked by runningbirds
I'm new to pandas and trying to figure out how to add multiple columns to pandas simultaneously. Any help here is appreciated. Ideally I would like to do this in one step rather than multiple repeated steps...
import numpy as np
import pandas as pd
df = {'col_1': [0, 1, 2, 3],
      'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)
df[['column_new_1', 'column_new_2', 'column_new_3']] = [np.nan, 'dogs', 3]  # thought this would work here...
Answered by Matthias Fripp
I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...), pandas requires that the right hand side be a DataFrame (note that it doesn't actually matter if the columns of the DataFrame have the same names as the columns you are creating).
Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments, or to create a suitable DataFrame for the right-hand side.
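The two routes described above can be sketched side by side. This is only an illustration under stated assumptions: the names a, b, new_cols and new_vals are introduced here and are not part of the original answer, and the right-hand-side DataFrame is built from a dict of scalars that shares df's index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]})
new_cols = ["column_new_1", "column_new_2", "column_new_3"]  # illustrative names
new_vals = [np.nan, "dogs", 3]

# Route 1: several single-column assignments.
# Each scalar is broadcast down the whole index.
a = df.copy()
for name, value in zip(new_cols, new_vals):
    a[name] = value

# Route 2: one column-list assignment with a DataFrame on the right-hand side.
# The DataFrame is built from scalars and reuses df's index, so every row
# receives the same values.
b = df.copy()
b[new_cols] = pd.DataFrame(dict(zip(new_cols, new_vals)), index=df.index)

print(a)
print(b)
```

Both copies should end up with the same three extra columns; which route reads better is largely a matter of taste.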
Here are several approaches that will work:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'col_1': [0, 1, 2, 3],
'col_2': [4, 5, 6, 7]
})
Then one of the following:
1) Three assignments in one, using list unpacking:
df['column_new_1'], df['column_new_2'], df['column_new_3'] = [np.nan, 'dogs', 3]
2) DataFrame conveniently expands a single row to match the index, so you can do this:
df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)
3) Make a temporary data frame with new columns, then combine with the original data frame later:
df = pd.concat(
[
df,
pd.DataFrame(
[[np.nan, 'dogs', 3]],
index=df.index,
columns=['column_new_1', 'column_new_2', 'column_new_3']
)
], axis=1
)
4) Similar to the previous, but using join instead of concat (may be less efficient):
df = df.join(pd.DataFrame(
[[np.nan, 'dogs', 3]],
index=df.index,
columns=['column_new_1', 'column_new_2', 'column_new_3']
))
5) Using a dict is a more "natural" way to create the new data frame than the previous two, but the new columns will be sorted alphabetically (at least before Python 3.6 or 3.7):
df = df.join(pd.DataFrame(
{
'column_new_1': np.nan,
'column_new_2': 'dogs',
'column_new_3': 3
}, index=df.index
))
6) Use .assign() with multiple column arguments.
I like this variant on @zero's answer a lot, but like the previous one, the new columns will always be sorted alphabetically, at least with early versions of Python:
df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)
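To see how the ordering caveat plays out on your own setup, a small check like the one below may help. It is a sketch, not part of the original answer: the keyword arguments are deliberately passed in non-alphabetical order, and the name out is introduced here. On Python 3.6+ with pandas 0.23 or later, the pandas documentation states that keyword argument order is maintained, so the new columns should appear in the order given.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]})

# assign() returns a new DataFrame; df itself is left untouched.
# The keywords are intentionally not in alphabetical order.
out = df.assign(column_new_3=3, column_new_1=np.nan, column_new_2="dogs")

print(list(df.columns))   # the original frame still has only its two columns
print(list(out.columns))  # inspect the order in which the new columns arrive
```

Because assign() does not modify the frame in place, the result has to be bound to a name (or back to df) to be kept.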
7) This is interesting (based on https://stackoverflow.com/a/44951376/3830997), but I don't know when it would be worth the trouble:
new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols) # add empty cols
df[new_cols] = new_vals # multi-column assignment works for existing cols
8) In the end it's hard to beat three separate assignments:
df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3
Note: many of these options have already been covered in other answers: "Add multiple columns to DataFrame and set them equal to an existing column", "Is it possible to add several columns at once to a pandas DataFrame?", and "Add multiple empty columns to pandas DataFrame".
Answered by Zero
You could use assign with a dict of column names and values.
In [1069]: df.assign(**{'col_new_1': np.nan, 'col2_new_2': 'dogs', 'col3_new_3': 3})
Out[1069]:
col_1 col_2 col2_new_2 col3_new_3 col_new_1
0 0 4 dogs 3 NaN
1 1 5 dogs 3 NaN
2 2 6 dogs 3 NaN
3 3 7 dogs 3 NaN
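One situation where the dict form seems particularly handy is when the new column names are not valid Python identifiers and so cannot be written as keyword arguments. A minimal sketch, with made-up column names ("new col 1", "new-col-2") that do not come from the original answer:

```python
import pandas as pd

df = pd.DataFrame({"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]})

# Hypothetical names containing a space and hyphens; they could not be
# passed as plain keyword arguments, but dict unpacking accepts them.
new_values = {"new col 1": 0.5, "new-col-2": "dogs"}
out = df.assign(**new_values)

print(out)
```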
Answered by Nehal J Wani
With the use of concat:
In [128]: df
Out[128]:
col_1 col_2
0 0 4
1 1 5
2 2 6
3 3 7
In [129]: pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
Out[129]:
col_1 col_2 column_new_1 column_new_2 column_new_3
0 0.0 4.0 NaN NaN NaN
1 1.0 5.0 NaN NaN NaN
2 2.0 6.0 NaN NaN NaN
3 3.0 7.0 NaN NaN NaN
Not very sure of what you wanted to do with [np.nan, 'dogs',3]. Maybe now set them as default values?
In [142]: df1 = pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
In [143]: df1[[ 'column_new_1', 'column_new_2','column_new_3']] = [np.nan, 'dogs', 3]
In [144]: df1
Out[144]:
col_1 col_2 column_new_1 column_new_2 column_new_3
0 0.0 4.0 NaN dogs 3
1 1.0 5.0 NaN dogs 3
2 2.0 6.0 NaN dogs 3
3 3.0 7.0 NaN dogs 3
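In the output above, col_1 and col_2 are shown as floats after the row-wise concat. If keeping the original integer columns matters, one alternative is to concatenate column-wise with a frame that already holds the default values. This is a sketch that swaps in axis=1 and a dict of scalars; the names new and out are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]})

# Build the new columns from scalars, reusing df's index so the rows line up.
new = pd.DataFrame(
    {"column_new_1": np.nan, "column_new_2": "dogs", "column_new_3": 3},
    index=df.index,
)

# Column-wise concat places the new columns next to the existing ones.
out = pd.concat([df, new], axis=1)

print(out)
print(out.dtypes)  # check whether col_1 and col_2 kept an integer dtype
```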
Answered by piRSquared
Answered by Prometheus
I am defining the columns using the columns parameter. Here column1 and column2 are column names.
df = pd.DataFrame(columns = ['column1', 'column2'])
Answered by A. Rabus
If adding a lot of missing columns (a, b, c, ...) with the same value, here 0, I did this:
new_cols = ["a", "b", "c" ]
df[new_cols] = pd.DataFrame([[0] * len(new_cols)], index=df.index)
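For the same "many columns, one value" case, a possible alternative that avoids building an intermediate DataFrame is to combine assign with dict.fromkeys. This is a sketch rather than the answerer's method; the name out is introduced here.

```python
import pandas as pd

df = pd.DataFrame({"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]})
new_cols = ["a", "b", "c"]

# dict.fromkeys maps every new column name to the same scalar (0 here);
# assign then broadcasts that scalar down each new column.
out = df.assign(**dict.fromkeys(new_cols, 0))

print(out)
```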
It's based on the second variant of the accepted answer.
Answered by Markus Dutschke
If you just want to add empty new columns, reindex will do the job:
df
col_1 col_2
0 0 4
1 1 5
2 2 6
3 3 7
df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
col_1 col_2 column_new_1 column_new_2 column_new_3
0 0 4 NaN NaN NaN
1 1 5 NaN NaN NaN
2 2 6 NaN NaN NaN
3 3 7 NaN NaN NaN
Full code example:
import numpy as np
import pandas as pd
df = {'col_1': [0, 1, 2, 3],
'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)
print('df',df, sep='\n')
print()
df=df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
print('''df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)''',df, sep='\n')
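If the new columns should start with a value other than NaN, reindex also takes a fill_value argument. A small sketch under that assumption, using 0 as an arbitrary placeholder value and introducing the names new_cols and out:

```python
import pandas as pd

df = pd.DataFrame({"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]})
new_cols = ["column_new_1", "column_new_2", "column_new_3"]

# Columns that do not exist yet are created and filled with fill_value
# instead of NaN; existing columns keep their data.
out = df.reindex(columns=list(df.columns) + new_cols, fill_value=0)

print(out)
```

Note that a single fill_value applies to every new column, so this only fits the case where all new columns share one default.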
Answered by Alex
I am not comfortable using "Index" and so on, so I came up with the following:
df.columns
Index(['A123', 'B123'], dtype='object')
df=pd.concat([df,pd.DataFrame(columns=list('CDE'))])
df.rename(columns={
'C':'C123',
'D':'D123',
'E':'E123'
},inplace=True)
df.columns
Index(['A123', 'B123', 'C123', 'D123', 'E123'], dtype='object')
Answered by halfmoonhalf
Just want to point out that option 2 in @Matthias Fripp's answer
(2) I wouldn't necessarily expect DataFrame to work this way, but it does
df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)
is already documented in pandas' own documentation http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics
You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner. You may find this useful for applying a transform (in-place) to a subset of the columns.
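The documented behaviour quoted above can be illustrated with columns that already exist. This is a minimal sketch: the multiplication by 10 is an arbitrary transform chosen for the example, and "missing" is a made-up column name used only to trigger the exception.

```python
import pandas as pd

df = pd.DataFrame({"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]})
cols = ["col_1", "col_2"]

# Setting several existing columns at once: transform a subset in place.
df[cols] = df[cols] * 10
print(df)

# Selecting a column that is not in the DataFrame raises an exception.
try:
    df[["col_1", "missing"]]
    raised = False
except KeyError:
    raised = True
print(raised)
```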