Python: How to add an empty column to a dataframe?

Notice: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/16327055/

Date: 2020-08-18 22:18:00  Source: igfitidea

How to add an empty column to a dataframe?

python pandas

Asked by kjo

What's the easiest way to add an empty column to a pandas DataFrame object? The best I've stumbled upon is something like

df['foo'] = df.apply(lambda _: '', axis=1)

Is there a less perverse method?

Accepted answer by DSM

If I understand correctly, assignment should fill:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
>>> df
   A  B
0  1  2
1  2  3
2  3  4
>>> df["C"] = ""
>>> df["D"] = np.nan
>>> df
   A  B C   D
0  1  2   NaN
1  2  3   NaN
2  3  4   NaN
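
The printed frame does not show it, but the two new columns are not the same kind of "empty". A minimal sketch of the difference, reusing the frame above (the variable names c_missing, d_missing and d_dtype are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
df["C"] = ""       # broadcast an empty string to every row
df["D"] = np.nan   # broadcast NaN to every row

# "C" holds empty strings, which pandas does not count as missing values
c_missing = int(df["C"].isna().sum())
# "D" is a float column in which every entry is missing
d_missing = int(df["D"].isna().sum())
d_dtype = str(df["D"].dtype)

print(c_missing, d_missing, d_dtype)
```

So if you plan to use isna(), dropna() or fillna() on the new column later, np.nan is probably the safer placeholder.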

Answer by emunsing

To add to DSM's answer and building on this associated question, I'd split the approach into two cases:

  • Adding a single column: Just assign empty values to the new column, e.g. df['C'] = np.nan

  • Adding multiple columns: I'd suggest using the .reindex(columns=[...]) method of pandas to add the new columns to the dataframe's column index. This also works for adding multiple new rows with .reindex(index=[...]). Note that newer versions of pandas (v>0.20) allow you to specify an axis keyword rather than explicitly assigning to columns or index.

Here is an example adding multiple columns:

mydf = mydf.reindex(columns = mydf.columns.tolist() + ['newcol1','newcol2'])

or

mydf = mydf.reindex(mydf.columns.tolist() + ['newcol1','newcol2'], axis=1)  # version > 0.20.0
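
For anyone who wants to try it, here is a small self-contained sketch of both spellings. The sample frame and the names new_cols, by_keyword and by_axis are made up for the example; the axis form assumes a pandas version that accepts axis= in reindex.

```python
import pandas as pd

mydf = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
new_cols = ["newcol1", "newcol2"]  # hypothetical column names

# keyword form: pass the full list of wanted column labels
by_keyword = mydf.reindex(columns=mydf.columns.tolist() + new_cols)

# axis form: same labels, with the axis given explicitly
by_axis = mydf.reindex(mydf.columns.tolist() + new_cols, axis=1)

# existing columns are kept, the new ones are filled with NaN
print(by_keyword)
```

Both calls return a new frame and leave mydf untouched, which is why the answer assigns the result back to mydf.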

You can also always concatenate a new (empty) dataframe to the existing dataframe, but that doesn't feel as pythonic to me :)
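
In case it helps, the concatenation route could look roughly like this. It is only a sketch: the frame, the column names and the variable names empty and combined are assumptions for the example.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
new_cols = ["newcol1", "newcol2"]  # hypothetical column names

# build an all-NaN frame that shares the index of the existing frame
empty = pd.DataFrame(np.nan, index=mydf.index, columns=new_cols)

# glue the two frames together side by side
combined = pd.concat([mydf, empty], axis=1)
print(combined)
```

Sharing the index matters here: with a different index, concat would align on the union of labels and add rows as well.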

Answer by Nickil Maveli

Starting with v0.16.0, DF.assign() can be used to assign new columns (single/multiple) to a DF. In the versions this was written for, these columns were inserted in alphabetical order at the end of the DF; since pandas 0.23 on Python 3.6+, they keep the order of the keyword arguments.

This becomes advantageous compared to simple assignment in cases wherein you want to perform a series of chained operations directly on the returned dataframe.

Consider the same DF sample demonstrated by @DSM:

df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
df
Out[18]:
   A  B
0  1  2
1  2  3
2  3  4

df.assign(C="",D=np.nan)
Out[21]:
   A  B C   D
0  1  2   NaN
1  2  3   NaN
2  3  4   NaN

Note that this returns a copy with all the previous columns along with the newly created ones. In order for the original DF to be modified accordingly, use it like df = df.assign(...), as it does not currently support an inplace operation.
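
To illustrate the chaining point, here is a minimal sketch. The rename step is an arbitrary stand-in for whatever you would chain next, and result is just a name chosen for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# assign returns a new frame, so it can sit in the middle of a method chain
result = (
    df.assign(C="", D=np.nan)
      .rename(columns={"A": "a"})   # arbitrary follow-up step
)

print(result)
print(df.columns.tolist())  # the original frame is unchanged
```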

Answer by edge-case

@emunsing's answer is really cool for adding multiple columns, but I couldn't get it to work for me in Python 2.7. Instead, I found this works:

mydf = mydf.reindex(columns = np.append(mydf.columns.values, ['newcol1','newcol2']))

Answer by liana

An even simpler solution is:

df = df.reindex(columns = header_list)                

where "header_list" is a list of the headers you want to appear.

Any header included in the list that is not already in the dataframe will be added as a new column with blank (NaN) cells.

So if

header_list = ['a','b','c', 'd']

then c and d will be added as columns with blank (NaN) cells, assuming a and b already exist in the dataframe.
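
One thing to watch with this approach: reindex makes the columns match the list exactly, so anything not in the list is dropped. A small sketch, where the starting frame and the extra column z are invented for the example:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "z": [5, 6]})
header_list = ["a", "b", "c", "d"]

# columns are conformed to header_list: c and d appear, z disappears
df = df.reindex(columns=header_list)
print(df)
```

If you want to keep every existing column, build the list from df.columns plus the new names instead.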

Answer by Joy Mazumder

If you want to add column names from a list:

df=pd.DataFrame()
a=['col1','col2','col3','col4']
for i in a:
    df[i]=np.nan
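
The same loop also works on a frame that already has data. A hedged variation, assuming you do not want to overwrite columns that already exist (the sample frame and the name wanted are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3]})
wanted = ["col1", "col2", "col3", "col4"]

for name in wanted:
    # skip existing columns so their data is not replaced with NaN
    if name not in df.columns:
        df[name] = np.nan

print(df)
```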

Answer by Carsten

I like:

df['new'] = pd.Series()

This makes sure that a df with zero rows stays with zero rows.
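
A quick way to check that claim, as a sketch only. I pass dtype="float64" explicitly, which is my own addition (some pandas versions warn when an empty Series is created without a dtype); the frame names are made up.

```python
import pandas as pd

# a frame with columns but no rows
empty_df = pd.DataFrame(columns=["A", "B"])
empty_df["new"] = pd.Series(dtype="float64")

# a frame with rows: the empty Series aligns on the index and yields NaN
full_df = pd.DataFrame({"A": [1, 2, 3]})
full_df["new"] = pd.Series(dtype="float64")

print(len(empty_df), len(full_df))
```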

Answer by moys

The code below addresses the question "How do I add n empty columns to my existing dataframe?". In the interest of keeping solutions to similar problems in one place, I am adding it here.

Approach 1 (to create 64 additional columns with column names from 1-64)

m = list(range(1,65,1)) 
dd=pd.DataFrame(columns=m)
df.join(dd).replace(np.nan,'') #df is the dataframe that already exists

Approach 2 (to create 64 additional columns with column names from 1-64)

df.reindex(df.columns.tolist() + list(range(1,65,1)), axis=1).replace(np.nan,'')
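
Note that both approaches return a new frame, and that .replace(np.nan,'') also blanks out any NaN already present in the original columns. If that is not what you want, one possible variation is reindex with fill_value, which only fills the newly created cells. This is a sketch under that assumption; the sample frame and the name out are made up.

```python
import numpy as np
import pandas as pd

# existing frame that already contains a genuine missing value
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [2, 3, 4]})

new_names = list(range(1, 65))  # column names 1..64

# fill_value applies only to cells introduced by the reindex
out = df.reindex(columns=df.columns.tolist() + new_names, fill_value="")

print(out.shape)
```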

Answer by Bharath_Raja

You can do

df['column'] = None #This works. This will create a new column with None type
df.column = None #This will work only when the column is already present in the dataframe 
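
A short sketch of the difference between the two lines, in case it is not obvious. The frame and the attribute name other are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# bracket assignment creates the column
df["column"] = None

# attribute assignment on a name that is not a column only sets a
# plain Python attribute on the object; no column is created
df.other = None

# attribute assignment on an existing column does update that column
df.column = 0

print(df.columns.tolist())
```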

Answer by Usman Ahmad

One can use df.insert(index_to_insert_at, column_header, init_value) to insert a new column at a specific index.

cost_tbl.insert(1, "col_name", "") 

The above statement would insert an empty column after the first column.
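
A runnable version, with a made-up cost_tbl so the resulting column order is visible. Keep in mind that insert modifies the frame in place and returns None, so do not assign its result back.

```python
import pandas as pd

# hypothetical table standing in for cost_tbl
cost_tbl = pd.DataFrame({"item": ["x", "y"], "cost": [1.0, 2.0]})

# position 1 means "right after the first column"
cost_tbl.insert(1, "col_name", "")

print(cost_tbl.columns.tolist())
```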
