Efficient way to add a new column to a Pandas dataframe

Question

Asked by thelogicalkoan

I know two ways of adding a new column to a pandas DataFrame:

df_new = df.assign(new_column=default_value)

and

df['new_column'] = default_value

The first one does not add the column in place, but the second one does. So which one is more efficient?
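To make the in-place difference concrete, here is a minimal sketch on a tiny frame. The column name 'A' and the value 10 are just illustrative. It shows that assign() leaves the original frame untouched and returns a new one, while item assignment changes the frame itself:

```python
import pandas as pd

default_value = 10
df = pd.DataFrame({'A': [1.0, 2.0, 3.0]})

# assign() builds and returns a new DataFrame; df is not modified
df_new = df.assign(new_column=default_value)
print(list(df.columns))      # ['A']
print(list(df_new.columns))  # ['A', 'new_column']

# Item assignment adds the column to df itself (in place)
df['new_column'] = default_value
print(list(df.columns))      # ['A', 'new_column']
```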

Apart from these two, is there an even more efficient method?

Answer 1

Answered by jezrael

I think the second one is more efficient. assign is used when you want nice code that chains all the calls into a single line:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.rand(10000)})

default_value = 10

In [114]: %timeit df_new = df.assign(new_column=default_value)
228 μs ± 4.26 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [115]: %timeit df['new_column'] = default_value
86.1 μs ± 654 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
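%timeit only works in IPython/Jupyter. A rough equivalent for a plain Python script uses the standard library timeit module. This is a sketch, not the answerer's exact setup. The helper names with_assign and with_setitem are made up for illustration, and the absolute numbers will differ from machine to machine:

```python
import timeit

import numpy as np
import pandas as pd

default_value = 10
df = pd.DataFrame({'A': np.random.rand(10000)})

def with_assign():
    # Returns a new frame on every call; df itself keeps only column 'A'
    return df.assign(new_column=default_value)

def with_setitem():
    # Mutates df; after the first call this overwrites the existing column,
    # just like repeated runs of the %timeit line above
    df['new_column'] = default_value

n = 1000
t_assign = timeit.timeit(with_assign, number=n) / n
t_setitem = timeit.timeit(with_setitem, number=n) / n
print(f"assign:  {t_assign * 1e6:.1f} µs per call")
print(f"setitem: {t_setitem * 1e6:.1f} µs per call")
```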

I used perfplot for plotting:

import numpy as np
import pandas as pd
import perfplot

default_value = 10

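def chained(df):
    # assign() returns a new DataFrame; the input frame is not modified
    df = df.assign(new_column=default_value)
    return df

def no_chained(df):
    # Item assignment adds the column to the input frame in place
    df['new_column'] = default_value
    return df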

def make_df(n):
    df = pd.DataFrame({'A':np.random.rand(n)})
    return df

perfplot.show(
    setup=make_df,
    kernels=[chained, no_chained],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
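Where assign really pays off is the chaining style mentioned above. Because each call returns a new DataFrame, several steps can be written as one expression. The sketch below is illustrative only. The extra column B and the query filter are hypothetical steps, and the lambda form relies on assign accepting a callable that receives the intermediate frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(5, dtype=float)})

# Each step returns a new DataFrame, so the whole pipeline is one expression
result = (
    df.assign(new_column=10)
      .assign(B=lambda d: d['A'] * 2)  # the callable receives the intermediate frame
      .query('B > 2')
)
print(result)
```

The original df still has only column A afterwards. With df['new_column'] = ... you would need a separate statement per step.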
