Efficient way to add new column to pandas dataframe
Original question: http://stackoverflow.com/questions/52289488/
Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Asked by thelogicalkoan
I know two ways of adding a new column to a pandas DataFrame:
df_new = df.assign(new_column=default_value)
and
df[new_column] = default_value
The first one does not add the column in place, but the second one does. So, which one is more efficient to use?
Apart from these two, is there an even more efficient method?
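To make the in-place distinction concrete, here is a minimal sketch on a small example frame (the three-row data is an assumption chosen for illustration, not part of the question):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})
default_value = 10

# assign returns a new DataFrame and leaves df untouched
df_new = df.assign(new_column=default_value)
print("new_column" in df.columns)      # False
print("new_column" in df_new.columns)  # True

# Item assignment adds the column to df itself
df["new_column"] = default_value
print("new_column" in df.columns)      # True
```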
Answered by jezrael
I think the second one is more efficient. assign is useful if you want clean code that chains all the function calls into a single expression.
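As a rough illustration of the chaining style that assign enables, the sketch below filters, adds columns, and aggregates in one expression. The frame and the names B and double_a are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"A": np.arange(6), "B": list("aabbcc")})

result = (
    df.loc[lambda d: d["A"] > 0]           # keep rows where A is positive
      .assign(new_column=10,               # constant column, as in the question
              double_a=lambda d: d["A"] * 2)  # column derived from the filtered frame
      .groupby("B", as_index=False)["double_a"]
      .sum()
)
print(result)
```

Because assign returns a new DataFrame, df itself is left unchanged by the whole chain.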
import numpy as np
import pandas as pd
# Timings on a 10,000-row DataFrame (IPython session)
df = pd.DataFrame({'A':np.random.rand(10000)})
default_value = 10
In [114]: %timeit df_new = df.assign(new_column=default_value)
228 μs ± 4.26 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [115]: %timeit df['new_column'] = default_value
86.1 μs ± 654 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
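Outside IPython, a similar comparison can be sketched with the standard-library timeit module. The helper names with_assign and with_setitem are assumptions for this example, and absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.random.rand(10000)})
default_value = 10

def with_assign():
    # Builds and returns a new DataFrame on every call
    return df.assign(new_column=default_value)

def with_setitem():
    # Modifies df in place; after the first call this overwrites the column
    df["new_column"] = default_value

n = 1000
t_assign = timeit.timeit(with_assign, number=n) / n
t_setitem = timeit.timeit(with_setitem, number=n) / n
print(f"assign:  {t_assign * 1e6:.1f} µs per call")
print(f"setitem: {t_setitem * 1e6:.1f} µs per call")
```

Note that, as with the %timeit run, the in-place version only adds the column on its first iteration; later iterations overwrite an existing column.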
I use perfplot for plotting:
import numpy as np
import pandas as pd
import perfplot

default_value = 10

def chained(df):
    # assign returns a new DataFrame with the extra column
    df = df.assign(new_column=default_value)
    return df

def no_chained(df):
    # item assignment adds the column in place
    df['new_column'] = default_value
    return df

def make_df(n):
    # random single-column frame with n rows
    df = pd.DataFrame({'A':np.random.rand(n)})
    return df

perfplot.show(
    setup=make_df,
    kernels=[chained, no_chained],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')