Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/22963263/

Date: 2020-08-19 02:03:37  Source: igfitidea

Creating a zero-filled pandas data frame

python, pandas, dataframe

Asked by niedakh

What is the best way to create a zero-filled pandas data frame of a given size?

I have used:

zero_data = np.zeros(shape=(len(data),len(feature_list)))
d = pd.DataFrame(zero_data, columns=feature_list)

Is there a better way to do it?

Accepted answer by Shravan

You can try this:

d = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)
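
As a minimal sketch of the one-liner above: the `data` and `feature_list` values below are made-up stand-ins for the asker's variables. One detail worth knowing is that filling with the integer `0` gives integer columns, while `0.0` (like `np.zeros`) gives float columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's `data` and `feature_list`.
data = [10, 20, 30]
feature_list = ["a", "b"]

# A scalar fill value is broadcast to every cell of the frame.
d_int = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)
d_float = pd.DataFrame(0.0, index=np.arange(len(data)), columns=feature_list)

print(d_int.dtypes)    # integer dtype for every column
print(d_float.dtypes)  # float dtype for every column
```

If the frame will later hold floats, starting from `0.0` avoids a dtype change on the first assignment.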

Answer by mtd

If you already have a dataframe, this is the fastest way:

In [1]: columns = ["col{}".format(i) for i in range(10)]
In [2]: orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)
In [3]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
10000 loops, best of 3: 60.2 μs per loop

Compare to:

In [4]: %timeit d = pd.DataFrame(0, index = np.arange(10), columns=columns)
10000 loops, best of 3: 110 μs per loop

In [5]: temp = np.zeros((10, 10))
In [6]: %timeit d = pd.DataFrame(temp, columns=columns)
10000 loops, best of 3: 95.7 μs per loop
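
Outside of a timing session, the same idea might look like the sketch below. The small `orig_df` is invented for illustration, and the call to `to_numpy()` is an assumption added here so that `np.zeros_like` receives a plain array. The timings quoted above are the answerer's own and will vary with pandas and NumPy versions.

```python
import numpy as np
import pandas as pd

# Made-up template frame with a non-default index.
orig_df = pd.DataFrame(
    np.ones((3, 2)), index=["x", "y", "z"], columns=["col0", "col1"]
)

# Build a zero array with the same shape and dtype as the underlying values,
# then reattach the original row and column labels.
zeros_df = pd.DataFrame(
    np.zeros_like(orig_df.to_numpy()),
    index=orig_df.index,
    columns=orig_df.columns,
)

print(zeros_df)
```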

Answer by Mark Horvath

This assumes you have a template DataFrame that you would like to copy with every value set to zero.

If you have no NaNs in your data set, multiplying by zero can be significantly faster (a NaN stays NaN when multiplied by zero, so such cells would not be zeroed):

In [19]: columns = ["col{}".format(i) for i in range(3000)]

In [20]: indices = range(2000)

In [21]: orig_df = pd.DataFrame(42.0, index=indices, columns=columns)

In [22]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
100 loops, best of 3: 12.6 ms per loop

In [23]: %timeit d = orig_df * 0.0
100 loops, best of 3: 7.17 ms per loop

The improvement depends on the DataFrame size, but I never found it slower.
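
The caveat about NaNs can be sketched as follows. The tiny frame is made up for illustration; the point is only that NaN (and infinity) times zero is NaN, so the multiplication trick is safe only on finite data.

```python
import numpy as np
import pandas as pd

# Made-up frame containing one NaN and one infinity.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.inf, 4.0]})

# NaN * 0 and inf * 0 are both NaN, so these cells are not zeroed.
zeroed = df * 0.0
print(zeroed)

# Constructing from a scalar gives zeros regardless of the input values.
safe = pd.DataFrame(0.0, index=df.index, columns=df.columns)
print(safe)
```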

And just for the heck of it:

In [24]: %timeit d = orig_df * 0.0 + 1.0
100 loops, best of 3: 13.6 ms per loop

In [25]: %timeit d = pd.eval('orig_df * 0.0 + 1.0')
100 loops, best of 3: 8.36 ms per loop

But:

In [24]: %timeit d = orig_df.copy()
10 loops, best of 3: 24 ms per loop

EDIT!!!

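Assuming you have a frame using float64, this will be the fastest by a huge margin! The idea is that replacing 0.0 with the desired fill number generates any value. Note, however, that + binds more tightly than >, so as written the expression compares against 1.7976931348623157e+308 + 0.0 and yields a frame of boolean False values rather than float zeros.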

In [23]: %timeit d = pd.eval('orig_df > 1.7976931348623157e+308 + 0.0')
100 loops, best of 3: 3.68 ms per loop

Depending on taste, you can define nan externally and get a general solution, irrespective of the particular float type:

In [39]: nan = np.nan
In [40]: %timeit d = pd.eval('orig_df > nan + 0.0')
100 loops, best of 3: 4.39 ms per loop

Answer by AlexG

In my opinion, it's best to do this with numpy:

import numpy as np
import pandas as pd
d = pd.DataFrame(np.zeros((N_rows, N_cols)))

Answer by WaveRider

Similar to @Shravan, but without the use of numpy:

  height = 10
  width = 20
  df_0 = pd.DataFrame(0, index=range(height), columns=range(width))

Then you can do whatever you want with it:

post_instantiation_fcn = lambda x: str(x)
df_ready_for_whatever = df_0.applymap(post_instantiation_fcn)
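
For the specific case of converting every cell to a string, a possible alternative is `astype`, sketched below with the same made-up dimensions. This is one option rather than the answerer's method; it sidesteps `applymap`, which newer pandas releases deprecate in favour of `DataFrame.map`.

```python
import pandas as pd

height = 10
width = 20

# Zero-filled frame built without numpy, as in the answer above.
df_0 = pd.DataFrame(0, index=range(height), columns=range(width))

# Convert every cell to its string form in one call.
df_str = df_0.astype(str)

print(df_str.iloc[0, 0])
```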

Answer by chakuRak

If you would like the new data frame to have the same index and columns as an existing data frame, you can just multiply the existing data frame by zero:

df_zeros = df * 0
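
A small sketch of what this preserves, using an invented two-column frame: the row labels, the column labels, and each column's dtype carry over to the result. It assumes every column is numeric and free of NaN and infinite values.

```python
import pandas as pd

# Made-up frame with one integer column and one float column.
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]}, index=["r1", "r2"])

# Multiplying by the integer 0 keeps labels and per-column dtypes.
df_zeros = df * 0

print(df_zeros)
print(df_zeros.dtypes)
```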