Original source: http://stackoverflow.com/questions/53256091/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
How can I fix "Error tokenizing data" on pandas csv reader
Asked by user9191983
I'm trying to read a CSV file with pandas. The file actually has only one row, but it raises an error whenever I try to read it. Something seems to go wrong at line 8, but I can't find an 8th line, since there's clearly only one row in the file.
Here is what I do:
import codecs
import pandas as pd

with codecs.open("path_to_file", "rU", "Shift-JIS", "ignore") as file:
    df = pd.read_csv(file, header=None, sep="\t")
df
Then I get:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 3
I don't understand what's really going on, so any advice would be appreciated.
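The error message says the parser expected 1 field but saw 3 on a later line. A rough way to check that on a file is to count the fields per row with the standard csv module. This is only a sketch: the sample string, the field_counts helper, and the assumption that the expected width comes from the first row are illustrative, not taken from the question's actual file.

```python
import csv
import io

# Hypothetical sample standing in for the real file:
# row 3 has 3 tab-separated fields, every other row has 1.
raw = "a\nb\nc\td\te\nf\n"

def field_counts(text, sep="\t"):
    """Return (row_number, field_count) pairs as the csv module sees them."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    return [(i, len(row)) for i, row in enumerate(reader, start=1)]

counts = field_counts(raw)
# Assumption: with header=None the expected width is taken from the first row.
expected = counts[0][1]
odd = [(n, k) for n, k in counts if k != expected]
print(odd)
```

Rows listed in odd are the ones whose width differs from the first row, which is where a tokenizing error like the one above would be expected to show up.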
Answered by Po Xin
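Try df = pd.read_csv(file, header=None, error_bad_lines=False). This skips lines with too many fields instead of raising. Note that error_bad_lines was deprecated in pandas 1.3 in favor of on_bad_lines="skip" and removed in pandas 2.0.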
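A minimal sketch of what skipping bad lines looks like on a small in-memory sample. It uses on_bad_lines="skip", which needs pandas 1.3 or later; the sample data and variable names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample: the first line has 1 field, the third line has 3.
raw = "a\nb\nc\td\te\nf\n"

# Lines with more fields than expected are dropped rather than raising ParserError.
df = pd.read_csv(io.StringIO(raw), header=None, sep="\t", on_bad_lines="skip")
print(df)
```

The three-field line is discarded entirely, so this is only appropriate when losing those rows is acceptable.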
Answered by Adam Zeldin
The existing answer drops these additional lines from your dataframe. If you'd like your dataframe to be as wide as its widest row instead, you can use the following:
import pandas as pd
delimiter = ','
# Fields in the widest row = its delimiter count + 1 (naive: ignores quoted delimiters).
with open(path_name, 'r') as f:
    max_columns = max(line.count(delimiter) for line in f) + 1
df = pd.read_csv(path_name, header=None, skiprows=1, names=list(range(max_columns)))
Set skiprows = 1 only if the file actually has a header row; you can always retrieve the header column names later. You can also identify rows that have more columns populated than the number of column names in the original header.
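The widest-row idea, and the check for rows that spill past the header, can be sketched on an in-memory sample as follows. The sample text and the names header and wide_rows are hypothetical, and the width count is naive: it assumes no quoted field contains the delimiter.

```python
import io
import pandas as pd

# Hypothetical sample: a 2-column header followed by rows of varying width.
raw = "x,y\n1,2\n3,4,5,6\n7\n"
delimiter = ","

lines = raw.splitlines()
header = lines[0].split(delimiter)
# Widest row = most delimiters on any line, plus one.
max_columns = max(line.count(delimiter) for line in lines) + 1

# Skip the header row and give pandas enough column names for the widest row.
df = pd.read_csv(io.StringIO(raw), header=None, skiprows=1,
                 names=list(range(max_columns)))

# Rows holding a value in any column beyond the header's width.
wide_rows = df[df.iloc[:, len(header):].notna().any(axis=1)]
print(df.shape)
print(wide_rows.index.tolist())
```

Shorter rows are padded with NaN, so nothing is dropped, and wide_rows points at the records that need a closer look.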