Python pandas: setting DataFrame rows efficiently
Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/18771963/
pandas efficient dataframe set row
Asked by wdg
First I have the following empty DataFrame preallocated:
df=DataFrame(columns=range(10000),index=range(1000))
Then I want to update df row by row (efficiently), using a length-10000 numpy array as the data for each row. My problem is: I don't even know which DataFrame method I should use to accomplish this task.
Thank you!
Accepted answer by Jeff
Here are 3 methods, tested with only 100 columns and 1000 rows:
In [5]: row = np.random.randn(100)
Row-wise assignment
In [6]: def method1():
   ...:     df = DataFrame(columns=range(100), index=range(1000))
   ...:     for i in range(len(df)):
   ...:         df.iloc[i] = row
   ...:     return df
   ...:
Build up the arrays in a list, create the frame all at once
In [9]: def method2():
   ...:     return DataFrame([row for i in range(1000)])
   ...:
Columnwise assignment (with transposes at both ends)
In [13]: def method3():
   ....:     df = DataFrame(columns=range(100), index=range(1000)).T
   ....:     for i in range(1000):
   ....:         df[i] = row
   ....:     return df.T
   ....:
These all produce the same output frame:
In [22]: (method2() == method1()).all().all()
Out[22]: True
In [23]: (method2() == method3()).all().all()
Out[23]: True
In [8]: %timeit method1()
1 loops, best of 3: 1.76 s per loop
In [10]: %timeit method2()
1000 loops, best of 3: 7.79 ms per loop
In [14]: %timeit method3()
1 loops, best of 3: 1.33 s per loop
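To reproduce this kind of comparison outside IPython, a rough sketch using the standard-library timeit module might look like the following. The function names assign_rows and build_once and the reduced sizes are assumptions made here to keep the run short; they are not from the original answer, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

N_ROWS, N_COLS = 200, 50  # assumed sizes, smaller than in the answer
row = np.random.randn(N_COLS)

def assign_rows():
    # Preallocate an empty frame, then fill it one row at a time.
    df = pd.DataFrame(columns=range(N_COLS), index=range(N_ROWS))
    for i in range(len(df)):
        df.iloc[i] = row
    return df

def build_once():
    # Gather the rows in a list and construct the frame in a single call.
    return pd.DataFrame([row for _ in range(N_ROWS)])

# Total time for 3 calls of each function, in seconds.
t_assign = timeit.timeit(assign_rows, number=3)
t_build = timeit.timeit(build_once, number=3)
print(f"row assignment: {t_assign:.4f} s, build once: {t_build:.4f} s")
```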
It is CLEAR that building up a list, THEN creating the frame all at once, is orders of magnitude faster than doing any form of assignment. Each assignment involves copying; building the frame all at once copies only once.
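As a minimal sketch of the build-once approach when every row is different, written for Python 3 and a current pandas: make_row and build_frame are hypothetical names introduced here for illustration, standing in for whatever actually produces each row.

```python
import numpy as np
import pandas as pd

N_ROWS, N_COLS = 1000, 100  # same sizes as in the timings above

def make_row(i, n_cols=N_COLS):
    # Hypothetical stand-in for whatever computes row i.
    return np.full(n_cols, float(i))

def build_frame(n_rows=N_ROWS, n_cols=N_COLS):
    # Collect every row first; no DataFrame is touched inside the loop.
    rows = [make_row(i, n_cols) for i in range(n_rows)]
    # One constructor call turns the list of 1-D arrays into a 2-D frame.
    return pd.DataFrame(rows, index=range(n_rows), columns=range(n_cols))

df = build_frame()
print(df.shape)
```

The list only holds references to the row arrays, so the per-row cost inside the loop is small; the frame itself is assembled in the single constructor call.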
Answer by DataByDavid
df=DataFrame(columns=range(10),index=range(10))
a = np.array( [9,9,9,9,9,9,9,9,9,9] )
Update row:
df.loc[2] = a
Using Jeff's idea...
df2 = DataFrame(data=np.random.randn(10,10), index=np.arange(10))
df2.head().T
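If the rows really do arrive one at a time, one possible variation on the same idea is to fill a preallocated NumPy array and wrap it in a DataFrame only at the end. This is a sketch under assumptions: buf and df3 are names chosen here, and the np.full call is just a placeholder for the real row data.

```python
import numpy as np
import pandas as pd

n_rows, n_cols = 10, 10  # same shape as df2 above

# Preallocate a plain NumPy buffer instead of an empty DataFrame.
buf = np.empty((n_rows, n_cols), dtype=float)
for i in range(n_rows):
    # Placeholder row data; writing into an ndarray row stores in place.
    buf[i] = np.full(n_cols, float(i))

# Construct the frame once, after all rows have been written.
df3 = pd.DataFrame(buf, index=np.arange(n_rows))
print(df3.shape)
```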
I have written up a notebook answering the question: https://www.wakari.io/sharing/bundle/hrojas/pandas%20efficient%20dataframe%20set%20row