Original question: http://stackoverflow.com/questions/23322025/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Julia Dataframes vs Python pandas
Asked by ccsv
I am currently using Python pandas and want to know if there is a way to output the data from pandas into Julia DataFrames and vice versa. (I think you can call Python from Julia with PyCall, but I am not sure if it works with dataframes.) Is there a way to call Julia from Python and have it take in pandas dataframes, without saving to another file format like CSV?
Apart from extremely large datasets and workloads with many loops (like neural networks), when would it be more advantageous to use Julia DataFrames than pandas?
Accepted answer by ccsv
There is a library developed for this. PyJulia is a library for interfacing with Julia from Python 2 and 3:
https://github.com/JuliaLang/pyjulia
It is experimental, but it works to some extent.
Secondly, Julia also has a front end for pandas, which is Pandas.jl:
https://github.com/malmaud/Pandas.jl
It looks to be just a wrapper for pandas, but you might be able to execute multiple functions using Julia's parallel features.
As for which is better, so far pandas has faster I/O, according to this question: "Reading csv in Julia is slow compared to Python".
Answer by Chase CB
I'm a novice at this sort of thing but have definitely been using both as of late. Truth be told, they seem quite comparable, but there is far more documentation, Stack Overflow questions, etc. pertaining to pandas, so I would give it a slight edge. Do not let that fact discourage you, however, because Julia has some amazing functionality that I'm only beginning to understand. With large datasets, say over a couple of gigabytes, both packages are pretty slow, but again pandas seems to have a slight edge (by no means would I consider my benchmarking to be definitive). Without a more nuanced understanding of what you are trying to achieve, it's difficult for me to envision a circumstance where you would even want to call a pandas function while working with a Julia DataFrame or vice versa. Unless you are doing something pretty cerebral or working with really large datasets, I can't see going too wrong with either. When you say "output the data", what do you mean? Couldn't you write the pandas data object to a file and then open and manipulate that file in a Julia DataFrame (as you mention)? Again, unless you have a really good machine, reading gigabytes of data into either pandas or a Julia DataFrame is tedious and can be prohibitively slow.
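For what it's worth, the file-based hand-off suggested above might look roughly like this on the Python side. This is only a sketch: the column names, the sample values, and the `exchange.csv` file name are made up for illustration, and a temporary directory stands in for wherever you would actually put the file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name for the exchange file.
    csv_path = Path(tmp_dir) / "exchange.csv"

    # index=False leaves out the pandas row index, so the file holds
    # only the data columns and a header row.
    df.to_csv(csv_path, index=False)

    # First line of the file is the header row.
    header = csv_path.read_text().splitlines()[0]

    # Reading the file back in pandas checks that the round trip
    # preserves the values.
    restored = pd.read_csv(csv_path)
```

On the Julia side, assuming the CSV.jl and DataFrames.jl packages are installed, the same file could then be loaded with something like `CSV.read("exchange.csv", DataFrame)`. Note that this does go through CSV, which the question was hoping to avoid, so treat it as the simple fallback rather than a direct in-memory transfer.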

