Efficient way to add a new column to a pandas DataFrame

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/52289488/

Time: 2020-09-14 06:01:40  Source: igfitidea

python pandas

Asked by thelogicalkoan

I know two ways of adding a new column to a pandas DataFrame:

df_new = df.assign(new_column=default_value)

and

df['new_column'] = default_value

The first one does not add the column in place, but the second one does. So, which one is more efficient to use?
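
A minimal sketch of the difference between the two forms, assuming a small three-row frame made up purely for illustration (it is not part of the original question):

```python
import pandas as pd

# Hypothetical toy data, only used to illustrate the semantics.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0]})
default_value = 10

# assign returns a new DataFrame; the original is left untouched.
df_new = df.assign(new_column=default_value)
assign_left_original_unchanged = "new_column" not in df.columns

# Item assignment adds the column to the existing object.
df["new_column"] = default_value
setitem_modified_original = "new_column" in df.columns

print(assign_left_original_unchanged, setitem_modified_original)
```

With assign, the caller has to keep the returned object (here df_new) to see the new column; with item assignment, every reference to df sees it.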

Apart from these two, is there an even more efficient method?

Answered by jezrael

I think the second one; assign is used when you want tidy code that chains all function calls into one line. Timings for both:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.rand(10000)})

default_value = 10

In [114]: %timeit df_new = df.assign(new_column=default_value)
228 μs ± 4.26 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [115]: %timeit df['new_column'] = default_value
86.1 μs ± 654 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
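
To illustrate the chaining use case mentioned at the start of this answer, here is a rough sketch. The column name total and the filter threshold are hypothetical, chosen only to have something to chain:

```python
import numpy as np
import pandas as pd

# Hypothetical input: one column holding 0.0 .. 5.0.
df = pd.DataFrame({"A": np.arange(6, dtype=float)})

# Each step returns a new DataFrame, so the calls can be chained
# without intermediate variables and without modifying df.
result = (
    df
    .assign(new_column=10)
    .assign(total=lambda d: d["A"] + d["new_column"])  # uses the column added above
    .query("total > 12")
    .reset_index(drop=True)
)

print(result)
```

Writing the same pipeline with df['new_column'] = ... would need separate statements and would modify df along the way, which is the trade-off against the faster item assignment.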

I used perfplot for plotting:

[Figure: perfplot timing plot of chained vs. no_chained against len(df)]

import numpy as np
import pandas as pd
import perfplot

default_value = 10

def chained(df):
    # assign returns a new DataFrame with the extra column
    df = df.assign(new_column=default_value)
    return df

def no_chained(df):
    # item assignment adds the column to the passed DataFrame in place
    df['new_column'] = default_value
    return df

def make_df(n):
    # one column of n random floats
    df = pd.DataFrame({'A': np.random.rand(n)})
    return df

perfplot.show(
    setup=make_df,
    kernels=[chained, no_chained],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
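
One thing to keep in mind with in-place timings: df['new_column'] = default_value modifies the frame it is given, so when the same frame is reused between repetitions (as in the %timeit call above), later runs overwrite an existing column rather than add a new one. The following is a rough sketch, not the method used in this answer, that rebuilds the frame before every run using only time.perf_counter. The helper name best_time and its default parameters are assumptions made for illustration:

```python
import time

import numpy as np
import pandas as pd

DEFAULT_VALUE = 10

def add_with_assign(df):
    return df.assign(new_column=DEFAULT_VALUE)

def add_with_setitem(df):
    df["new_column"] = DEFAULT_VALUE
    return df

def best_time(add_column, n_rows=10_000, repeats=200):
    """Best wall-clock time in seconds for a single call of add_column.

    A fresh DataFrame is built before every run (outside the timed
    region), so the column is always genuinely new.
    """
    timings = []
    for _ in range(repeats):
        df = pd.DataFrame({"A": np.random.rand(n_rows)})
        start = time.perf_counter()
        add_column(df)
        timings.append(time.perf_counter() - start)
    return min(timings)

t_assign = best_time(add_with_assign)
t_setitem = best_time(add_with_setitem)
print(f"assign:  {t_assign * 1e6:.1f} us")
print(f"setitem: {t_setitem * 1e6:.1f} us")
```

The absolute numbers depend on the machine and the pandas version, so they are not comparable with the figures quoted above.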