Error tokenizing data. C error: out of memory (pandas, Python, large CSV file)
Disclaimer: this page is an English translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/41303246/
Asked by Amal Kostali Targhi
I have a large CSV file of 3.5 GB and I want to read it using pandas.
This is my code:
import pandas as pd
tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', iterator=True, chunksize=20000000, low_memory = False)
df = pd.concat(tp, ignore_index=True)
I get this error:
pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:8771)()
pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)()
pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)()
pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:23325)()
CParserError: Error tokenizing data. C error: out of memory
My machine has 8 GB of RAM.
Answered by ??????
try this bro:
import pandas as pd

# Read the file 20,000 rows at a time, then stitch the pieces together.
mylist = []
for chunk in pd.read_csv('train_2011_2012_2013.csv', sep=';', chunksize=20000):
    mylist.append(chunk)
big_data = pd.concat(mylist, axis=0)
del mylist
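If even the concatenated DataFrame does not fit in 8 GB of RAM, you can go one step further and process each chunk as it arrives instead of keeping them all. A minimal sketch, assuming the end goal is a per-value row count; the column name 'some_column' is hypothetical and not from the original post:
import pandas as pd

# Fold each 20,000-row chunk into a running tally so only one
# chunk is ever held in memory at a time.
counts = None
for chunk in pd.read_csv('train_2011_2012_2013.csv', sep=';', chunksize=20000):
    part = chunk['some_column'].value_counts()
    counts = part if counts is None else counts.add(part, fill_value=0)
print(counts)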
Answered by Dutse I
You may try setting error_bad_lines=False when reading the CSV file, i.e.
import pandas as pd
df = pd.read_csv('my_big_file.csv', error_bad_lines = False)
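Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0. On recent versions the equivalent option is on_bad_lines; a sketch, assuming pandas 1.3 or later:
import pandas as pd

# Skip malformed lines instead of raising a parser error.
df = pd.read_csv('my_big_file.csv', on_bad_lines='skip')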
Answered by Justas
This error can also be caused by chunksize=20000000: the chunk size is a number of rows, so this asks the parser to tokenize 20 million rows at once, which by itself can exhaust 8 GB of RAM. Decreasing it fixed the issue in my case. In ??????'s solution the chunksize is also much smaller, which is probably what did the trick.
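For reference, here is the asker's original call with only the chunk size reduced; the figure of 200,000 rows below is an arbitrary example, not a value from the thread:
import pandas as pd

# Same iterator-plus-concat approach as the question, but each chunk
# is 200,000 rows instead of 20,000,000, so peak memory stays low.
tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', iterator=True, chunksize=200000)
df = pd.concat(tp, ignore_index=True)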