Python: creating a zero-filled pandas data frame
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/22963263/
Creating a zero-filled pandas data frame
Asked by niedakh
What is the best way to create a zero-filled pandas data frame of a given size?
I have used:
zero_data = np.zeros(shape=(len(data),len(feature_list)))
d = pd.DataFrame(zero_data, columns=feature_list)
Is there a better way to do it?
Accepted answer by Shravan
You can try this:
d = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)
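One detail worth knowing about the scalar form: the fill value decides the dtype. Below is a minimal sketch, where `data` and `feature_list` are small made-up stand-ins for the question's variables. Filling with the integer `0` gives integer columns, while `0.0` gives float columns like the `np.zeros` version in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's `data` and `feature_list`.
data = [10, 20, 30]
feature_list = ["a", "b"]

# Scalar fill with the integer 0: every column is integer-typed.
d_int = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)

# Scalar fill with 0.0: every column is float-typed, like np.zeros.
d_float = pd.DataFrame(0.0, index=np.arange(len(data)), columns=feature_list)

print(d_int.dtypes.tolist())
print(d_float.dtypes.tolist())
```

If the frame will later hold fractional values, starting from `0.0` avoids a dtype change on assignment.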
Answer by mtd
If you already have a dataframe, this is the fastest way:
In [1]: columns = ["col{}".format(i) for i in range(10)]
In [2]: orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)
In [3]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
10000 loops, best of 3: 60.2 μs per loop
Compare to:
In [4]: %timeit d = pd.DataFrame(0, index = np.arange(10), columns=columns)
10000 loops, best of 3: 110 μs per loop
In [5]: temp = np.zeros((10, 10))
In [6]: %timeit d = pd.DataFrame(temp, columns=columns)
10000 loops, best of 3: 95.7 μs per loop
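The `%timeit` lines above only run inside IPython. To repeat the comparison in a plain script, something like the following sketch could be used. The function names are made up for illustration, and `orig_df.to_numpy()` is passed to `np.zeros_like` just to make the array conversion explicit. Timings depend on the machine and on the pandas and numpy versions, so no particular ranking is implied.

```python
import timeit

import numpy as np
import pandas as pd

columns = ["col{}".format(i) for i in range(10)]
orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)


def from_template():
    # Zeros shaped like the template, reusing its index and columns.
    return pd.DataFrame(np.zeros_like(orig_df.to_numpy()),
                        index=orig_df.index, columns=orig_df.columns)


def from_scalar():
    # Scalar broadcast, as in the accepted answer.
    return pd.DataFrame(0, index=np.arange(10), columns=columns)


def from_zeros():
    # Fresh zero array, as in the question.
    return pd.DataFrame(np.zeros((10, 10)), columns=columns)


repeats = 200
for fn in (from_template, from_scalar, from_zeros):
    per_call = timeit.timeit(fn, number=repeats) / repeats
    print("{}: {:.1f} microseconds per call".format(fn.__name__, per_call * 1e6))
```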
Answer by Mark Horvath
Assuming you have a template DataFrame that you would like to copy with every value replaced by zero:
If you have no NaNs in your data set, multiplying by zero can be significantly faster:
In [19]: columns = ["col{}".format(i) for i in range(3000)]
In [20]: indices = range(2000)
In [21]: orig_df = pd.DataFrame(42.0, index=indices, columns=columns)
In [22]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
100 loops, best of 3: 12.6 ms per loop
In [23]: %timeit d = orig_df * 0.0
100 loops, best of 3: 7.17 ms per loop
The improvement depends on the DataFrame size, but I have never found it to be slower.
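The "no NaNs" condition matters because multiplying by zero does not erase missing values. Here is a small sketch with a made-up two-column frame. It also includes an infinity, since under IEEE 754 arithmetic `inf * 0` is NaN as well.

```python
import numpy as np
import pandas as pd

# Hypothetical template containing one NaN and one infinity.
orig_df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.inf, 4.0]})

# NaN * 0 and inf * 0 both evaluate to NaN, so these cells do not become zero.
times_zero = orig_df * 0.0

# Building from np.zeros ignores the template's values entirely.
rebuilt = pd.DataFrame(np.zeros(orig_df.shape),
                       index=orig_df.index, columns=orig_df.columns)

print(times_zero.isna().sum().sum())  # count of cells that are NaN, not zero
print(rebuilt.isna().sum().sum())
```

When the template may contain NaN or infinite values, the `np.zeros`-based construction is the safer of the two.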
And just for the heck of it:
In [24]: %timeit d = orig_df * 0.0 + 1.0
100 loops, best of 3: 13.6 ms per loop
In [25]: %timeit d = pd.eval('orig_df * 0.0 + 1.0')
100 loops, best of 3: 8.36 ms per loop
But:
In [24]: %timeit d = orig_df.copy()
10 loops, best of 3: 24 ms per loop
EDIT:
Assuming you have a frame using float64, this will be the fastest by a huge margin. It can also generate any value by replacing 0.0 with the desired fill number.
In [23]: %timeit d = pd.eval('orig_df > 1.7976931348623157e+308 + 0.0')
100 loops, best of 3: 3.68 ms per loop
Depending on taste, one can define nan externally and get a general solution, irrespective of the particular float type:
In [39]: nan = np.nan
In [40]: %timeit d = pd.eval('orig_df > nan + 0.0')
100 loops, best of 3: 4.39 ms per loop
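A caution on the expressions above: under Python's operator precedence, `a > b + c` parses as `a > (b + c)`, so as written they may produce a boolean frame rather than a float one. It is worth checking the result's dtype before relying on them. For comparison, here is a more conventional way to get a constant-filled frame that mirrors a template. This is a different technique from the `pd.eval` trick, and `filled_like` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def filled_like(template, fill_value=0.0):
    """Hypothetical helper: new frame with the template's index and columns,
    where every cell holds fill_value."""
    values = np.full(template.shape, fill_value)
    return pd.DataFrame(values, index=template.index, columns=template.columns)


orig_df = pd.DataFrame(42.0, index=range(4), columns=["x", "y", "z"])

zeros = filled_like(orig_df)        # all cells 0.0
sevens = filled_like(orig_df, 7.5)  # all cells 7.5

print(zeros.dtypes.tolist())
print(sevens.iloc[0].tolist())
```

Unlike the arithmetic-based approaches, this does not read the template's values at all, so NaNs in the template have no effect.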
Answer by AlexG
In my opinion, it's best to do this with numpy:
import numpy as np
import pandas as pd
N_rows, N_cols = 3, 4  # example sizes (assumed here; use your own)
d = pd.DataFrame(np.zeros((N_rows, N_cols)))
Answer by WaveRider
Similar to @Shravan, but without the use of numpy:
height = 10
width = 20
df_0 = pd.DataFrame(0, index=range(height), columns=range(width))
Then you can do whatever you want with it:
post_instantiation_fcn = lambda x: str(x)
df_ready_for_whatever = df_0.applymap(post_instantiation_fcn)
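Note that `applymap` is deprecated in recent pandas releases (from 2.1 it is replaced by `DataFrame.map`). For the specific string conversion shown here, a vectorized `astype` call is a possible alternative. The sketch below reuses the `height` and `width` values from this answer, and `df_as_text` is just an illustrative name.

```python
import pandas as pd

height = 10
width = 20
df_0 = pd.DataFrame(0, index=range(height), columns=range(width))

# Convert every cell to its string form in one call, without a per-cell lambda.
df_as_text = df_0.astype(str)

print(df_as_text.iloc[0, 0])
print(df_as_text.shape)
```

For transformations that are not simple type conversions, an element-wise function is still needed.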
Answer by chakuRak
If you would like the new data frame to have the same index and columns as an existing data frame, you can just multiply the existing data frame by zero:
df_zeros = df * 0
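A side effect of this approach is that each column keeps its dtype, as well as the index and column labels. The sketch below uses a made-up frame with one integer and one float column to show this. It assumes the frame is fully numeric and free of NaN, since NaN times zero stays NaN.

```python
import pandas as pd

# Hypothetical example frame with one integer and one float column.
df = pd.DataFrame({"count": [1, 2, 3], "price": [1.5, 2.5, 3.5]},
                  index=["r1", "r2", "r3"])

# Multiplying by the integer 0 zeroes every cell but preserves each dtype.
df_zeros = df * 0

print(df_zeros.dtypes)
print(df_zeros.index.tolist())
```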