numpy.ndarray vs pandas.DataFrame
Notice: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/25201143/
numpy.ndarray vs pandas.DataFrame
Asked by Adam Ryczkowski
I need to make a strategic decision about which data structure to use as the basis for holding statistical data frames in my program.
I store hundreds of thousands of records in one big table. Each field would be of a different type, including short strings. I'd perform multiple regression analysis and manipulations on the data, and these need to be done quickly, in real time. I also need to use something that is relatively popular and well supported.
I know about the following contestants:
list of array.array
That is the most basic thing to do. Unfortunately it doesn't support strings. And I need to use numpy anyway for its statistical part, so this one is out of the question.
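To illustrate the limitation mentioned above, here is a minimal sketch using only the standard library. The column names and values are made up for illustration; the point is that each array.array holds a single numeric type and rejects string values.

```python
from array import array

# One typed column per array; "d" is the typecode for C doubles.
grades = array("d", [3.5, 4.0, 2.75])

# A numeric array cannot store Python strings, so a column of names fails.
try:
    names = array("d", ["Ann", "Bob"])
    string_column_ok = True
except TypeError:
    string_column_ok = False
```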
numpy.ndarray
The ndarray has the ability to hold arrays of different types in each column (e.g. np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])). It seems a natural winner, but...
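As a rough sketch of what that dtype looks like in use, the snippet below builds a small structured array. The dtype is the one quoted in the question; the sample rows are invented purely for illustration.

```python
import numpy as np

# Record layout from the question: a 16-character name plus two float grades.
record = np.dtype([("name", np.str_, 16), ("grades", np.float64, (2,))])

# Sample rows are made up for illustration.
table = np.array(
    [("Sarah", (8.0, 7.0)), ("John", (6.0, 7.0))],
    dtype=record,
)

# Each field is accessed by name and comes back as an ordinary ndarray.
names = table["name"]
mean_grades = table["grades"].mean(axis=1)  # per-row mean of the two grades
```

Fields are addressed by name rather than by column position, so numeric work has to pull out the numeric fields first.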
pandas.DataFrame
This one is built with statistical use in mind, but is it efficient enough?
I read that the pandas.DataFrame is no longer based on the numpy.ndarray (although it shares the same interface). Can anyone shed some light on it? Or maybe there is an even better data structure out there?
Answered by daniel
pandas.DataFrame is awesome, and interacts very well with much of numpy. Much of the DataFrame is written in Cython and is quite optimized. I suspect the ease of use and the richness of the Pandas API will greatly outweigh any potential benefit you could obtain by rolling your own interfaces around numpy.
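As a small illustration of that interplay (a sketch, not a benchmark), the example below keeps mixed-type records in a DataFrame and hands the numeric columns to numpy for a least-squares fit. The column names and data are hypothetical, and np.linalg.lstsq is just one possible way to run the regression.

```python
import numpy as np
import pandas as pd

# Hypothetical records: one short-string field and two numeric fields.
df = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [3.0, 5.0, 7.0, 9.0],  # constructed as y = 2*x + 1
})

# Numeric columns convert to plain ndarrays for the linear algebra.
X = np.column_stack([np.ones(len(df)), df["x"].to_numpy()])
coef, *_ = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)
intercept, slope = coef

# pandas handles the label-based manipulation directly.
mean_y = df.groupby("group")["y"].mean()
```

Whether this is fast enough for real-time use depends on the workload, so it would be worth timing on data of realistic size.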

