Original question: http://stackoverflow.com/questions/17269703/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
Is there a limit to the amount of rows Pandas read_csv can load?
Asked by d1337
I am trying to load a .csv file using the Pandas read_csv method; the file has 29,872,046 rows and its total size is 2.2 GB. I notice that most of the loaded rows are missing their values for a large number of columns. The csv file, when browsed from a shell, contains those values... Are there any limitations on loaded files? If not, how could this be debugged? Thanks
Answered by John 9631
@d1337,
I wonder if you have memory issues. There is a hint of this here.
Possibly this is relevant, or this.
If I were attempting to debug it, I would do the simple thing: cut the file in half and see what happens. If it's OK, go up 50%; if not, go down 50%, until you can identify the point where it happens. You might even want to start with 20 lines just to make sure it is size related.
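The manual bisection above can also be done in one pass by reading the file in chunks and counting missing values as you go. This is a sketch, not code from the original answer; the tiny in-memory CSV and the chunk size stand in for the real 2.2 GB file and a chunk size of, say, a million rows:

```python
import io
import pandas as pd

# A small in-memory CSV stands in for the real file; for the actual
# 2.2 GB file you would pass its path and a chunksize around 1_000_000.
csv_text = "a,b,c\n1,2,3\n4,,6\n7,8,\n"

# Reading in chunks keeps memory bounded; counting NaNs per chunk
# shows where values start going missing without bisecting by hand.
missing_per_chunk = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    missing_per_chunk.append(int(chunk.isna().sum().sum()))

print(missing_per_chunk)  # [1, 1]: one missing value in each 2-row chunk
```

If the missing-value counts jump sharply at some chunk, that region of the file is where to look for malformed lines (extra delimiters, stray quotes, or truncation).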
I'd also add OS and memory information, plus the version of Pandas you're using, to your post in case it's relevant (I'm running Pandas 11.0, Python 3.2, Linux Mint x64 with 16G of RAM, so I'd expect no issues, say). Also, possibly, you might post a link to your data so that someone else can test it.
Hope that helps.