Original question: http://stackoverflow.com/questions/17269703/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
Is there a limit to the amount of rows Pandas read_csv can load?
Asked by d1337
I am trying to load a .csv file using the Pandas read_csv method; the file has 29,872,046 rows and its total size is 2.2 GB. I notice that most of the loaded rows are missing their values for a large number of columns. The csv file, when browsed from a shell, contains those values... Are there any limitations on loaded files? If not, how could this be debugged? Thanks
Answered by John 9631
@d1337,
I wonder if you have memory issues. There is a hint of this here.
Possibly this is relevant, or this.
If I were attempting to debug it, I would do the simple thing: cut the file in half and see what happens. If it's OK, go up 50%; if not, go down 50%, until you can identify the point where it happens. You might even want to start with 20 lines just to make sure it is size related.
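The manual bisection above can also be done in one pass by reading the file in chunks and counting missing values as you go. This is a sketch, not code from the original answer; the tiny in-memory CSV and the chunk size stand in for the real 2.2 GB file and a chunk size of, say, a million rows:

```python
import io
import pandas as pd

# A small in-memory CSV stands in for the real file; for the actual
# 2.2 GB file you would pass its path and a chunksize around 1_000_000.
csv_text = "a,b,c\n1,2,3\n4,,6\n7,8,\n"

# Reading in chunks keeps memory bounded; counting NaNs per chunk
# shows where values start going missing without bisecting by hand.
missing_per_chunk = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    missing_per_chunk.append(int(chunk.isna().sum().sum()))

print(missing_per_chunk)  # [1, 1]: one missing value in each 2-row chunk
```

If the missing-value counts jump sharply at some chunk, that region of the file is where to look for malformed lines (extra delimiters, stray quotes, or truncation).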
I'd also add OS and memory information, plus the version of Pandas you're using, to your post in case it's relevant (I'm running Pandas 11.0, Python 3.2, Linux Mint x64 with 16G of RAM, so I'd expect no issues, say). Also, possibly, you might post a link to your data so that someone else can test it.
Hope that helps.