Python pandas: efficiently setting DataFrame rows

Question

Asked by wdg

First I have the following empty DataFrame preallocated:

df=DataFrame(columns=range(10000),index=range(1000))

Then I want to update df row by row (efficiently), using a length-10000 NumPy array as the data for each row. My problem is that I don't know which DataFrame method I should use to accomplish this.

Thank you!

Answer 1

Accepted answer by Jeff

Here are three methods, benchmarked on a smaller frame of only 100 columns and 1000 rows:

In [5]: row = np.random.randn(100)

Row-wise assignment

In [6]: def method1():
   ...:     df = DataFrame(columns=range(100),index=range(1000))
   ...:     for i in range(len(df)):
   ...:         df.iloc[i] = row
   ...:     return df
   ...:

Build up the arrays in a list, create the frame all at once

In [9]: def method2():
   ...:     return DataFrame([ row for i in range(1000) ])
   ...:

Column-wise assignment (transposing before and after)

In [13]: def method3():
   ....:     df = DataFrame(columns=range(100),index=range(1000)).T
   ....:     for i in range(1000):
   ....:         df[i] = row
   ....:     return df.T
   ....:

All three produce the same output frame:

In [22]: (method2() == method1()).all().all()
Out[22]: True

In [23]: (method2() == method3()).all().all()
Out[23]: True


In [8]: %timeit method1()
1 loops, best of 3: 1.76 s per loop

In [10]: %timeit method2()
1000 loops, best of 3: 7.79 ms per loop

In [14]: %timeit method3()
1 loops, best of 3: 1.33 s per loop
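
For readers without IPython, a rough way to reproduce this comparison is the standard-library timeit module. This is only a sketch and not part of the original answer: the function names build_at_once and assign_rows are made up here, and the numbers it prints will differ from those quoted above depending on the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

row = np.random.randn(100)


def build_at_once():
    # Collect the rows in a list, then construct the frame in one call.
    return pd.DataFrame([row for _ in range(1000)])


def assign_rows():
    # Preallocate an empty frame and assign into it one row at a time.
    df = pd.DataFrame(columns=range(100), index=range(1000))
    for i in range(len(df)):
        df.iloc[i] = row
    return df


# Best of three single runs for each approach.
t_once = min(timeit.repeat(build_at_once, number=1, repeat=3))
t_assign = min(timeit.repeat(assign_rows, number=1, repeat=3))
print(f"build at once: {t_once:.4f} s, row assignment: {t_assign:.4f} s")
```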

Building up a list and then creating the frame all at once is clearly orders of magnitude faster (about 8 ms versus 1.3 to 1.8 s here) than any form of assignment. Each assignment involves copying, whereas building the frame in one step copies the data only once.
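
When the rows arrive one at a time, one way to apply the same idea is to fill a preallocated NumPy array and wrap it in a DataFrame once at the end. This is an assumption on my part rather than something the answer spells out. The names buf, n_rows and n_cols are illustrative, and the random rows stand in for the real data.

```python
import numpy as np
import pandas as pd

n_rows, n_cols = 1000, 100

# Preallocate a plain float array; assigning to buf[i] writes in place.
buf = np.empty((n_rows, n_cols), dtype=float)

rng = np.random.default_rng(0)
for i in range(n_rows):
    buf[i] = rng.standard_normal(n_cols)  # stand-in for the real row data

# Construct the frame a single time from the filled array.
df = pd.DataFrame(buf, index=range(n_rows), columns=range(n_cols))
```

The loop touches only the NumPy array, so pandas is involved in a single construction step rather than in every row update.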

Answer 2

Answer by DataByDavid

df=DataFrame(columns=range(10),index=range(10))
a = np.array( [9,9,9,9,9,9,9,9,9,9] )

Update row:

df.loc[2] = a
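
As a self-contained sketch of that update (the imports and the extra iloc line are additions for illustration, not part of the original answer): loc selects the row by index label, while iloc, used in Jeff's answer, selects it by position. With the default range(10) index the two coincide.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=range(10), index=range(10))
a = np.full(10, 9)

df.loc[2] = a   # label-based: the row whose index label is 2
df.iloc[3] = a  # position-based: the fourth row

print(df.head())
```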

Using Jeff's idea of building the frame all at once:

df2 = DataFrame(data=np.random.randn(10,10), index=np.arange(10))
df2.head().T

I have written up a notebook answering the question: https://www.wakari.io/sharing/bundle/hrojas/pandas%20efficient%20dataframe%20set%20row
