Python: Creating a zero-filled pandas DataFrame

Question

Asked by niedakh

What is the best way to create a zero-filled pandas data frame of a given size?

I have used:

zero_data = np.zeros(shape=(len(data),len(feature_list)))
d = pd.DataFrame(zero_data, columns=feature_list)

Is there a better way to do it?

Answer 1

Accepted answer by Shravan

You can try this:

d = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)
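
The two constructions discussed so far can be compared side by side. This is a minimal sketch, assuming small hypothetical stand-ins for `data` and `feature_list` (neither is defined in the question). One visible difference is the resulting dtype: `np.zeros` produces floats, while broadcasting the integer scalar `0` produces an integer frame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's `data` and `feature_list`.
data = ["row_a", "row_b", "row_c"]
feature_list = ["f1", "f2"]

# Question's approach: build a float array first, then wrap it.
d_from_array = pd.DataFrame(
    np.zeros(shape=(len(data), len(feature_list))), columns=feature_list
)

# Accepted answer's approach: broadcast the scalar 0 over index and columns.
d_from_scalar = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)

# Passing 0.0 instead of 0 gives a float frame with the same constructor.
d_float_scalar = pd.DataFrame(0.0, index=np.arange(len(data)), columns=feature_list)

print(d_from_array.dtypes)
print(d_from_scalar.dtypes)
```

If the frame will later hold fractional values, starting from `0.0` avoids a dtype change on first assignment.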

Answer 2

Answered by mtd

If you already have a DataFrame, this was the fastest approach in my timings:

In [1]: columns = ["col{}".format(i) for i in range(10)]
In [2]: orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)
In [3]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
10000 loops, best of 3: 60.2 μs per loop

Compare to:

In [4]: %timeit d = pd.DataFrame(0, index = np.arange(10), columns=columns)
10000 loops, best of 3: 110 μs per loop

In [5]: temp = np.zeros((10, 10))
In [6]: %timeit d = pd.DataFrame(temp, columns=columns)
10000 loops, best of 3: 95.7 μs per loop
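
The template-based approach above can be wrapped in a small helper. This is only a sketch: the name `zeros_like_frame` is hypothetical, and it uses `np.zeros(df.shape)` rather than the answer's `np.zeros_like(orig_df)`, so the result is always float64 regardless of the template's dtypes.

```python
import numpy as np
import pandas as pd


def zeros_like_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a float64 zero-filled frame with the same index and columns as `df`."""
    zeros = np.zeros(df.shape, dtype=np.float64)
    return pd.DataFrame(zeros, index=df.index, columns=df.columns)


# Same template as in the timing session above.
orig_df = pd.DataFrame(np.ones((10, 10)), columns=[f"col{i}" for i in range(10)])
d = zeros_like_frame(orig_df)
```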

Answer 3

Answered by Mark Horvath

This assumes you have a template DataFrame that you want to copy with every value set to zero.

If your data set contains no NaNs (or infinities, which also become NaN when multiplied by zero), multiplying by zero can be significantly faster:

In [19]: columns = ["col{}".format(i) for i in range(3000)]

In [20]: indices = range(2000)

In [21]: orig_df = pd.DataFrame(42.0, index=indices, columns=columns)

In [22]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
100 loops, best of 3: 12.6 ms per loop

In [23]: %timeit d = orig_df * 0.0
100 loops, best of 3: 7.17 ms per loop

The improvement depends on the DataFrame size, but I never found it to be slower.
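
Since timings like these vary by machine and pandas version, the comparison can be re-run locally. The following is a rough sketch using the standard `timeit` module instead of IPython's `%timeit`; the function names are hypothetical and the frame is smaller than the 2000 x 3000 one above so that it finishes quickly.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the frame in the answer so the sketch runs quickly.
orig_df = pd.DataFrame(42.0, index=range(200), columns=[f"col{i}" for i in range(300)])


def via_zeros_like(df):
    # Build a zero array of matching shape, then reattach the labels.
    return pd.DataFrame(np.zeros_like(df), index=df.index, columns=df.columns)


def via_multiply(df):
    # Only valid when df contains no NaN or infinite values.
    return df * 0.0


candidates = {"zeros_like": via_zeros_like, "multiply": via_multiply}
timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 20 calls each; report the per-call time.
    timings[name] = min(timeit.repeat(lambda: func(orig_df), number=20, repeat=3))
    print(f"{name}: {timings[name] / 20 * 1e3:.3f} ms per call")

# Both approaches should produce the same frame for NaN-free input.
same_result = via_zeros_like(orig_df).equals(via_multiply(orig_df))
```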

And just for the heck of it:

In [24]: %timeit d = orig_df * 0.0 + 1.0
100 loops, best of 3: 13.6 ms per loop

In [25]: %timeit d = pd.eval('orig_df * 0.0 + 1.0')
100 loops, best of 3: 8.36 ms per loop

But a plain copy is slower:

In [24]: %timeit d = orig_df.copy()
10 loops, best of 3: 24 ms per loop

EDIT:

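Assuming your frame uses float64, this was the fastest by a large margin in my timings. The idea is that no float64 value exceeds 1.7976931348623157e+308, so the comparison is False everywhere, and adding a number to it should give that number as the fill value. Note, however, that + binds more tightly than >, so the expression as written compares against (1.7976931348623157e+308 + 0.0) and yields an all-False Boolean frame; to get float zeros, or another fill value in place of 0.0, the comparison presumably needs its own parentheses.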

In [23]: %timeit d = pd.eval('orig_df > 1.7976931348623157e+308 + 0.0')
100 loops, best of 3: 3.68 ms per loop

Depending on taste, one can define nan externally and get a general solution that does not depend on the particular float type:

In [39]: nan = np.nan
In [40]: %timeit d = pd.eval('orig_df > nan + 0.0')
100 loops, best of 3: 4.39 ms per loop
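
The operator precedence in the comparison trick is easy to check on a small frame. This sketch uses plain pandas operators rather than `pd.eval`, on the assumption that `pd.eval` follows the same Python precedence rules; the 4 x 3 frame is a hypothetical stand-in for `orig_df`.

```python
import numpy as np
import pandas as pd

orig_df = pd.DataFrame(42.0, index=range(4), columns=["a", "b", "c"])

# Without parentheses, `+` binds tighter than `>`: this is orig_df > (nan + 0.0).
mask = orig_df > np.nan + 0.0  # Boolean frame, all False

# Parenthesizing the comparison turns False into 0.0 and then adds the fill value.
zeros = (orig_df > np.nan) + 0.0
sevens = (orig_df > np.nan) + 7.0

print(mask.dtypes.unique())
print(zeros.dtypes.unique())
```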

Answer 4

Answered by AlexG

In my opinion, it's best to do this with numpy:

import numpy as np
import pandas as pd
N_rows, N_cols = 5, 3  # placeholder dimensions; substitute your own
d = pd.DataFrame(np.zeros((N_rows, N_cols)))
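
One possible reason to go through numpy is control over the dtype. As a sketch, assuming hypothetical dimensions and column labels, `np.zeros` accepts a `dtype` argument and the DataFrame keeps it:

```python
import numpy as np
import pandas as pd

N_rows, N_cols = 4, 3  # hypothetical dimensions

# Integer zeros with a compact dtype and explicit column labels.
d = pd.DataFrame(
    np.zeros((N_rows, N_cols), dtype=np.int8),
    columns=[f"col{i}" for i in range(N_cols)],
)

print(d.dtypes)
```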

Answer 5

Answered by WaveRider

Similar to @Shravan's answer, but without using numpy:

height = 10
width = 20
df_0 = pd.DataFrame(0, index=range(height), columns=range(width))

Then you can do whatever you want with it:

post_instantiation_fcn = str  # any element-wise function
# applymap is deprecated since pandas 2.1; newer versions use df_0.map(...)
df_ready_for_whatever = df_0.applymap(post_instantiation_fcn)

Answer 6

Answered by chakuRak

If you would like the new DataFrame to have the same index and columns as an existing one, and the existing one is numeric with no NaN values, you can just multiply it by zero:

df_zeros = df * 0
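
The NaN caveat is worth seeing concretely. In this sketch, with a hypothetical three-row frame, NaN and infinite entries survive the multiplication as NaN, whereas building the frame from the labels alone does not depend on the existing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [np.inf, 5.0, 6.0]},
    index=["x", "y", "z"],
)

# NaN * 0 and inf * 0 are both NaN, so the product is not fully zero.
df_times_zero = df * 0

# Building from the labels alone ignores the existing values entirely.
df_zeros = pd.DataFrame(0.0, index=df.index, columns=df.columns)

print(df_times_zero)
print(df_zeros)
```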
