Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original URL: http://stackoverflow.com/questions/34714070/

Error tokenizing data. C error: EOF following escape character

Tags: python, csv, pandas, text-files, eof

Asked by Bill

I'm trying to load a CSV text file that I created with an OS X app written in Objective-C (using Xcode). The text file (temp2.csv) looks fine in an editor, but something is wrong with it: I get this error when reading it into a Pandas dataframe. If I copy the data into a fresh text file (temp.csv) and save that, it works fine! The two text files are clearly different (one is 74 bytes, the other is 150), perhaps because of invisible characters, but it's very annoying as I want the Python code to load the text files produced by the C code. Files are attached for reference.

temp.csv

-3.132700,0.355885,9.000000,0.444416
-3.128256,0.444416,9.000000,0.532507

temp2.csv

-3.132700,0.355885,9.000000,0.444416
-3.128256,0.444416,9.000000,0.532507

(I can't find any help on this specific error on StackExchange).

Python 2.7.11 |Anaconda 2.2.0 (x86_64)| (default, Dec  6 2015, 18:57:58) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import pandas as pd
>>> df = pd.read_csv("temp2.csv", header=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 498, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 275, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 590, in __init__
    self._make_engine(self.engine)
  File "/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 731, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1103, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 515, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4948)
  File "pandas/parser.pyx", line 717, in pandas.parser.TextReader._get_header (pandas/parser.c:7496)
  File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838)
  File "pandas/parser.pyx", line 1833, in pandas.parser.raise_parser_error (pandas/parser.c:22649)
pandas.parser.CParserError: Error tokenizing data. C error: EOF following escape character
>>> df = pd.read_csv("temp.csv", header=None)
>>> df
          0         1  2         3
0 -3.132700  0.355885  9  0.444416
1 -3.128256  0.444416  9  0.532507

Footnote: I think I located the problem.

>>> f = open('temp2.csv')
>>> contents = f.read()
>>> print contents
??-3.132700,0.355885,9.000000,0.444416
-3.128256,0.444416,9.000000,0.532507
>>> contents
'\xff\xfe-\x003\x00.\x001\x003\x002\x007\x000\x000\x00,\x000\x00.\x003\x005\x005\x008\x008\x005\x00,\x009\x00.\x000\x000\x000\x000\x000\x000\x00,\x000\x00.\x004\x004\x004\x004\x001\x006\x00\n\x00-\x003\x00.\x001\x002\x008\x002\x005\x006\x00,\x000\x00.\x004\x004\x004\x004\x001\x006\x00,\x009\x00.\x000\x000\x000\x000\x000\x000\x00,\x000\x00.\x005\x003\x002\x005\x000\x007\x00'

It's full of escape characters! How to remove them?
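
The \xff\xfe at the start of the dump and the \x00 after every character are how Python displays the raw bytes of UTF-16 text with a byte order mark (BOM), which would also account for the 74 versus 150 byte sizes. A minimal sketch of that check follows; it assumes each row ends with a newline, and guess_bom_encoding is a hypothetical helper, not part of pandas or the standard library.

```python
import codecs

# Assumed file contents: the two rows shown above, each ending in a newline.
text = (
    "-3.132700,0.355885,9.000000,0.444416\n"
    "-3.128256,0.444416,9.000000,0.532507\n"
)

# Single-byte encoding: one byte per character.
plain_bytes = text.encode("ascii")

# UTF-16 little-endian with a BOM: two BOM bytes, then two bytes per character.
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def guess_bom_encoding(raw):
    """Hypothetical helper: return a codec name if raw starts with a known BOM, else None."""
    # The UTF-32 LE BOM begins with the UTF-16 LE BOM, so the longer marks are tested first.
    for bom, name in (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ):
        if raw.startswith(bom):
            return name
    return None


print(len(plain_bytes))                 # 74
print(len(utf16_bytes))                 # 150
print(guess_bom_encoding(plain_bytes))  # None
print(guess_bom_encoding(utf16_bytes))  # utf-16
```

A file with no BOM returns None here, so this only identifies encodings that announce themselves; it is not a general encoding detector.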

Answered by jezrael

You need to add the encoding parameter to read_csv, because the file is encoded as UTF-16:

import pandas as pd

# Raw bytes of the problem file: a UTF-16 little-endian BOM (\xff\xfe), then two bytes per character.
contents = b'\xff\xfe-\x003\x00.\x001\x003\x002\x007\x000\x000\x00,\x000\x00.\x003\x005\x005\x008\x008\x005\x00,\x009\x00.\x000\x000\x000\x000\x000\x000\x00,\x000\x00.\x004\x004\x004\x004\x001\x006\x00\n\x00-\x003\x00.\x001\x002\x008\x002\x005\x006\x00,\x000\x00.\x004\x004\x004\x004\x001\x006\x00,\x009\x00.\x000\x000\x000\x000\x000\x000\x00,\x000\x00.\x005\x003\x002\x005\x000\x007\x00'

# Write the bytes unchanged to recreate the file (the test/ directory must already exist).
with open("test/file1.csv", "wb") as text_file:
    text_file.write(contents)

# Tell pandas the file is UTF-16 so the bytes are decoded before tokenizing.
df = pd.read_csv("test/file1.csv", header=None, encoding='utf-16')
print(df)

          0         1  2         3
0 -3.132700  0.355885  9  0.444416
1 -3.128256  0.444416  9  0.532507
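
If the data has to go through a tool that takes no encoding argument, one option is to decode the bytes first and re-save them as UTF-8. The sketch below uses only the standard library and works in memory; the raw value is built to mirror the bytes shown in the answer, and the variable names are illustrative assumptions.

```python
import codecs
import csv
import io

# Bytes built to mirror the answer's example: UTF-16 LE BOM, then two rows of UTF-16 LE text.
raw = codecs.BOM_UTF16_LE + (
    "-3.132700,0.355885,9.000000,0.444416\n"
    "-3.128256,0.444416,9.000000,0.532507"
).encode("utf-16-le")

# The "utf-16" codec uses the BOM to pick the byte order and drops it from the result.
decoded = raw.decode("utf-16")

# Parse the decoded text as CSV and convert every field to float.
reader = csv.reader(io.StringIO(decoded, newline=""))
rows = [[float(value) for value in row] for row in reader]

# Re-encode as UTF-8, which stores these characters in one byte each.
utf8_bytes = decoded.encode("utf-8")

print(rows)
print(len(raw), len(utf8_bytes))
```

Passing encoding='utf-16' to read_csv, as in the answer, does the same decoding step inside pandas, so the re-encoding route is only needed when the reader cannot be told the encoding.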