Error tokenizing data. C error: out of memory (pandas, Python, large CSV file)
Disclaimer: this page is an English translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/41303246/
Asked by Amal Kostali Targhi
I have a large CSV file of 3.5 GB and I want to read it using pandas.
This is my code:
import pandas as pd
tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', iterator=True, chunksize=20000000, low_memory = False)
df = pd.concat(tp, ignore_index=True)
I get this error:
pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:8771)()
pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)()
pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)()
pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:23325)()
CParserError: Error tokenizing data. C error: out of memory
My machine has 8 GB of RAM.
Answered by ??????
try this bro:
import pandas as pd

# Read the file 20,000 rows at a time, then stitch the pieces together.
mylist = []
for chunk in pd.read_csv('train_2011_2012_2013.csv', sep=';', chunksize=20000):
    mylist.append(chunk)
big_data = pd.concat(mylist, axis=0)
del mylist
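If even the concatenated DataFrame does not fit in 8 GB of RAM, you can go one step further and process each chunk as it arrives instead of keeping them all. A minimal sketch, assuming the end goal is a per-value row count; the column name 'some_column' is hypothetical and not from the original post:
import pandas as pd

# Fold each 20,000-row chunk into a running tally so only one
# chunk is ever held in memory at a time.
counts = None
for chunk in pd.read_csv('train_2011_2012_2013.csv', sep=';', chunksize=20000):
    part = chunk['some_column'].value_counts()
    counts = part if counts is None else counts.add(part, fill_value=0)
print(counts)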
Answered by Dutse I
You may try setting error_bad_lines=False when reading the CSV file, i.e.
import pandas as pd
df = pd.read_csv('my_big_file.csv', error_bad_lines = False)
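Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0. On recent versions the equivalent option is on_bad_lines; a sketch, assuming pandas 1.3 or later:
import pandas as pd

# Skip malformed lines instead of raising a parser error.
df = pd.read_csv('my_big_file.csv', on_bad_lines='skip')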
Answered by Justas
This error can also be caused by chunksize=20000000: the chunk size is a number of rows, so this asks the parser to tokenize 20 million rows at once, which by itself can exhaust 8 GB of RAM. Decreasing it fixed the issue in my case. In ??????'s solution the chunksize is also much smaller, which is probably what did the trick.
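For reference, here is the asker's original call with only the chunk size reduced; the figure of 200,000 rows below is an arbitrary example, not a value from the thread:
import pandas as pd

# Same iterator-plus-concat approach as the question, but each chunk
# is 200,000 rows instead of 20,000,000, so peak memory stays low.
tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', iterator=True, chunksize=200000)
df = pd.concat(tp, ignore_index=True)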