How can I disable quoting in the Python 2.4 CSV reader?

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow, original: http://stackoverflow.com/questions/494054/

Date: 2020-11-03 20:13:55  Source: igfitidea

Tags: python, csv

Asked by Carl Meyer

I am writing a Python utility that needs to parse a large, regularly updated CSV file that I don't control. The utility must run on a server where only Python 2.4 is available. The CSV file does not quote field values at all, but the Python 2.4 version of the csv library does not seem to offer any way to turn quoting off; it only lets me set the quote character (dialect.quotechar = '"' or whatever). If I try setting the quote character to None or the empty string, I get an error.

I can sort of work around this by setting dialect.quotechar to some "rare" character, but this is brittle: there is no ASCII character that I can absolutely guarantee will never appear in a field value (except the delimiter, but if I set dialect.quotechar = dialect.delimiter, things predictably go haywire).
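
Because the file never quotes anything, one alternative that sidesteps the quotechar question entirely (a different technique from the csv module, not something proposed in this thread) is to split each line on the delimiter yourself. The sketch below assumes no field ever contains the delimiter or an embedded newline; split_unquoted is a hypothetical name. It avoids post-2.4 syntax such as the with statement, so it is intended to run on Python 2.4 as well, but that portability is an assumption.

```python
def split_unquoted(lines, delimiter=","):
    """Yield one list of fields per line, splitting only on the delimiter.

    Assumes the file never quotes or escapes fields, so no character other
    than the delimiter has special meaning. Trailing carriage returns and
    newlines are stripped; a blank line yields an empty list, as
    csv.reader does.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            # "".split(",") would give [''], so special-case blank lines.
            yield []
        else:
            yield line.split(delimiter)


sample = ['1,2,3,4,"5\n', "1,2,3,4,5\n"]
rows = list(split_unquoted(sample))
# rows == [['1', '2', '3', '4', '"5'], ['1', '2', '3', '4', '5']]
```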

In Python 2.5 and later, if I set dialect.quoting to csv.QUOTE_NONE, the CSV reader respects it and does not interpret any character as a quote character. Is there any way to duplicate this behavior in Python 2.4?
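
For reference, a minimal sketch of the Python 2.5+ behavior described above, written for Python 3 (hence io.StringIO instead of the Python 2 StringIO module). read_unquoted is an illustrative name, and the sample row is the stray-quote case from the update below.

```python
import csv
import io


def read_unquoted(text, delimiter=","):
    """Parse delimited text, treating every quote character as ordinary data.

    With quoting=csv.QUOTE_NONE the reader never starts a quoted field,
    so a stray '"' is kept as part of the field value.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_unquoted('1,2,3,4,"5\n1,2,3,4,5\n')
print(rows)  # [['1', '2', '3', '4', '"5'], ['1', '2', '3', '4', '5']]
```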

Python 2.5 及更高版本中,如果我设置dialect.quotingcsv.QUOTE_NONE,CSV 阅读器会尊重它并且不会将任何字符解释为引号字符。有没有办法在 Python 2.4 中复制这种行为?

UPDATE: Thanks Triptych and Mark Roddy for helping to narrow the problem down. Here's a simplest-case demonstration:

>>> import csv
>>> import StringIO
>>> data = """
... 1,2,3,4,"5
... 1,2,3,4,5
... """
>>> reader = csv.reader(StringIO.StringIO(data))
>>> for i in reader: print i
... 
[]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
_csv.Error: newline inside string

The problem only occurs when there's a single double-quote character in the final column of a row. Unfortunately, this situation exists in my dataset. I've accepted Tanj's solution: manually assign a non-printing character ("\x07", or BEL) as the quotechar. This is hacky, but it works, and I haven't yet seen another solution that does. Here's a demo of the solution in action:

>>> import csv
>>> import StringIO
>>> class MyDialect(csv.Dialect):
...     quotechar = '\x07'
...     delimiter = ','
...     lineterminator = '\n'
...     doublequote = False
...     skipinitialspace = False
...     quoting = csv.QUOTE_NONE
...     escapechar = '\\'
... 
>>> dialect = MyDialect()
>>> data = """
... 1,2,3,4,"5
... 1,2,3,4,5
... """
>>> reader = csv.reader(StringIO.StringIO(data), dialect=dialect)
>>> for i in reader: print i
... 
[]
['1', '2', '3', '4', '"5']
['1', '2', '3', '4', '5']
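
Since the failure is described as occurring only when a row's final column contains a single double-quote character, it may help to scan the file for such rows before settling on a workaround. A rough Python 3 sketch; find_stray_quote_lines is a hypothetical helper, and it assumes the file is otherwise unquoted, so splitting on the delimiter yields the real fields.

```python
def find_stray_quote_lines(lines, delimiter=",", quotechar='"'):
    """Return 1-based line numbers whose final field contains exactly one
    quote character, the case reported above as breaking the reader."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(delimiter)
        if fields[-1].count(quotechar) == 1:
            hits.append(lineno)
    return hits


data = '\n1,2,3,4,"5\n1,2,3,4,5\n'
print(find_stray_quote_lines(data.splitlines()))  # [2]
```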

In Python 2.5+, setting quoting to csv.QUOTE_NONE would be sufficient, and the value of quotechar would then be irrelevant. (I'm actually getting my initial dialect via a csv.Sniffer and then overriding the quotechar value, not by subclassing csv.Dialect, but I don't want that to distract from the real issue; the two sessions above demonstrate that Sniffer isn't the problem.)
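
A minimal Python 3 sketch of the Sniffer-then-override approach mentioned above, assuming 2.5+ semantics: csv.Sniffer().sniff() returns a dialect class, and setting its quoting attribute to csv.QUOTE_NONE (instead of changing quotechar) keeps the stray quote as data. The sample text and the restriction to comma delimiters are assumptions for illustration.

```python
import csv
import io

sample = '1,2,3,4,"5\n1,2,3,4,5\n1,2,3,4,5\n'

# Let the Sniffer guess the dialect, restricting the delimiter to commas.
dialect = csv.Sniffer().sniff(sample, delimiters=",")

# sniff() returns a csv.Dialect subclass; override its quoting so that no
# character is treated as a quote (the Python 2.5+ / Python 3 behavior).
dialect.quoting = csv.QUOTE_NONE

rows = list(csv.reader(io.StringIO(sample), dialect))
# rows[0] == ['1', '2', '3', '4', '"5']
```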

Answer by Tanj

I don't know whether Python would allow it, but could you use a non-printable ASCII code such as BEL or BS (backspace)? I would expect these to be extremely rare.
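
A Python 3 sketch of this suggestion, passing quotechar directly to csv.reader rather than subclassing csv.Dialect. The explicit check is an addition, not part of the answer: it refuses input that contains the BEL sentinel, since that is the one case the trick cannot handle. read_with_sentinel_quote and SENTINEL_QUOTE are hypothetical names.

```python
import csv
import io

SENTINEL_QUOTE = "\x07"  # BEL; assumed never to occur in the data


def read_with_sentinel_quote(text, delimiter=","):
    """Use a non-printing character as quotechar so that '"' is read
    literally. Raise if the sentinel itself appears in the input."""
    if SENTINEL_QUOTE in text:
        raise ValueError("input contains the sentinel quote character")
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quotechar=SENTINEL_QUOTE)
    return list(reader)


rows = read_with_sentinel_quote('1,2,3,4,"5\n1,2,3,4,5\n')
# rows == [['1', '2', '3', '4', '"5'], ['1', '2', '3', '4', '5']]
```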

Answer by Triptych

I tried a few examples using Python 2.4.3, and it seemed to be smart enough to detect that the fields were unquoted.

I know you've already accepted a (slightly hacky) answer, but have you tried just leaving the reader.dialect.quotechar value alone? What happens if you do?

Any chance we could get example input?

Answer by Mark Roddy

+1 for Triptych

Confirmation that csv.reader automatically handles CSV files without quotes:

>>> import StringIO
>>> import csv
>>> data="""
... 1,2,3,4,5
... 1,2,3,4,5
... 1,2,3,4,5
... """
>>> reader=csv.reader(StringIO.StringIO(data))
>>> for i in reader:
...     print i
... 
[]
['1', '2', '3', '4', '5']
['1', '2', '3', '4', '5']
['1', '2', '3', '4', '5']
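
The same check in Python 3, extended to show that on quote-free input the default dialect and csv.QUOTE_NONE produce identical rows; per the question's update, the two only diverge once a field contains a stray double quote.

```python
import csv
import io

data = "1,2,3,4,5\n1,2,3,4,5\n1,2,3,4,5\n"

# Default dialect (quotechar '"', QUOTE_MINIMAL) versus quoting disabled.
default_rows = list(csv.reader(io.StringIO(data)))
no_quote_rows = list(csv.reader(io.StringIO(data), quoting=csv.QUOTE_NONE))

# Both settings agree when no field contains a quote character.
assert default_rows == no_quote_rows == [["1", "2", "3", "4", "5"]] * 3
```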