Python _csv.Error: field larger than field limit (131072)

Note: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must keep the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/15063936/

Date: 2020-08-18 13:13:24  Source: igfitidea

_csv.Error: field larger than field limit (131072)

python, csv

Asked by user1251007

I have a script that reads in a csv file with very large fields:


# example from http://docs.python.org/3.3/library/csv.html?highlight=csv%20dictreader#examples
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

However, this throws the following error on some csv files:


_csv.Error: field larger than field limit (131072)

How can I analyze csv files with huge fields? Skipping the lines with huge fields is not an option as the data needs to be analyzed in subsequent steps.


Accepted answer by user1251007

The csv file might contain very large fields, therefore increase the field_size_limit:


import sys
import csv

csv.field_size_limit(sys.maxsize)
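
As a quick illustration, here is a minimal sketch that combines the raised limit with the reader loop from the question (same some.csv as above; see the update below for a caveat about OverflowError on some platforms):

import sys
import csv

csv.field_size_limit(sys.maxsize)  # raise the parser's field size limit as high as possible

with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)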

sys.maxsize works for Python 2.x and 3.x. sys.maxint would only work with Python 2.x (SO: what-is-sys-maxint-in-python-3).


Update


As Geoff pointed out, the code above might result in the following error: OverflowError: Python int too large to convert to C long. To circumvent this, you could use the following quick and dirty code (which should work on every system with Python 2 and Python 3):


import sys
import csv
maxInt = sys.maxsize

while True:
    # decrease the maxInt value by factor 10 
    # as long as the OverflowError occurs.

    try:
        csv.field_size_limit(maxInt)
        break
    except OverflowError:
        maxInt = int(maxInt/10)
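
If several scripts need this, the retry loop can be wrapped in a small helper. This is just a sketch of the loop above; the function name is made up:

import csv
import sys

def set_max_field_size_limit():
    """Raise csv's field size limit as far as the platform allows (hypothetical helper)."""
    max_int = sys.maxsize
    while True:
        try:
            csv.field_size_limit(max_int)
            return max_int
        except OverflowError:
            # shrink until the value fits into a C long
            max_int = int(max_int / 10)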

Answer by CSP

This could be because your CSV file has embedded single or double quotes. If your CSV file is tab-delimited, try opening it as:


c = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
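
A self-contained sketch of that suggestion (data.tsv is just a placeholder file name):

import csv

with open('data.tsv', newline='') as f:
    # QUOTE_NONE makes the reader treat quote characters as ordinary data,
    # so a stray quote cannot swallow the rest of the file into one huge field
    reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        print(row)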

Answer by Ahmet Erkan ÇELİK

Sometimes a row contains a column with a double quote. When the csv reader tries to read such a row, it cannot find the end of the column and raises this error. The solution is below:


reader = csv.reader(cf, quoting=csv.QUOTE_MINIMAL)

Answer by Tad

First, check the current limit:


csv.field_size_limit()

Out[20]: 131072


Then increase the limit by adding this to your code:


csv.field_size_limit(100000000)

Try checking the limit again


csv.field_size_limit()

Out[22]: 100000000


Now you won't get the error "_csv.Error: field larger than field limit (131072)"
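
Put together as a script instead of an interactive session, the same steps might look like this (100000000 is just the value chosen above, not a required number):

import csv

print(csv.field_size_limit())    # default: 131072
csv.field_size_limit(100000000)  # raise the limit
print(csv.field_size_limit())    # now: 100000000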


Answer by Abdul Waseh

Find the cqlshrc file, usually located in the .cassandra directory.


In that file, append:


[csv]
field_size_limit = 1000000000

Answer by CristiFati

csv field sizes are controlled via [Python 3.Docs]: csv.field_size_limit([new_limit]):


Returns the current maximum field size allowed by the parser. If new_limit is given, this becomes the new limit.


It is set by default to 128k or 0x20000 (131072), which should be enough for any decent .csv:


>>> import csv
>>>
>>> limit0 = csv.field_size_limit()
>>> limit0
131072
>>> "0x{0:016X}".format(limit0)
'0x0000000000020000'

However, when dealing with a .csv file (with the correct quoting and delimiter) that has (at least) one field longer than this size, the error pops up.
To get rid of the error, the size limit should be increased (to avoid any worries, the maximum possible value is attempted).


Behind the scenes (check [GitHub]: python/cpython - (master) cpython/Modules/_csv.c for implementation details), the variable that holds this value is a C long ([Wikipedia]: C data types), whose size varies depending on CPU architecture and OS (data model: LP64 vs. LLP64). The classic difference: for a 64-bit OS (and Python build), the long type size (in bits) is:


  • Nix: 64
  • Win: 32
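
A quick way to see which case applies on a given build is to ask ctypes for the size of a C long; this is only an illustrative check, not part of the original answer:

import ctypes

# 64 on most 64-bit Unix-like builds, 32 on Windows (and on any 32-bit build)
print(ctypes.sizeof(ctypes.c_long) * 8)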

When attempting to set it, the new value is checked against the long type's boundaries; that's why in some cases another exception pops up (this case is common on Win):


>>> import sys
>>>
>>> sys.platform, sys.maxsize
('win32', 9223372036854775807)
>>>
>>> csv.field_size_limit(sys.maxsize)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

To avoid running into this problem, set the (maximum possible) limit (LONG_MAX) using a small ctypes trick (thanks to [Python 3.Docs]: ctypes - A foreign function library for Python). It should work on Python 3 and Python 2, on any CPU / OS.


>>> import ctypes as ct
>>>
>>> csv.field_size_limit(int(ct.c_ulong(-1).value // 2))
131072
>>> limit1 = csv.field_size_limit()
>>> limit1
2147483647
>>> "0x{0:016X}".format(limit1)
'0x000000007FFFFFFF'

64-bit Python on a Nix-like OS:


>>> import sys, csv, ctypes as ct
>>>
>>> sys.platform, sys.maxsize
('linux', 9223372036854775807)
>>>
>>> csv.field_size_limit()
131072
>>>
>>> csv.field_size_limit(int(ct.c_ulong(-1).value // 2))
131072
>>> limit1 = csv.field_size_limit()
>>> limit1
9223372036854775807
>>> "0x{0:016X}".format(limit1)
'0x7FFFFFFFFFFFFFFF'

For 32-bit Python, things are uniform: it's the behavior encountered on Win.
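
Wrapped up as a small portable helper, the ctypes trick might look like this (the function name is made up; the computation is the same ct.c_ulong(-1).value // 2 shown above):

import csv
import ctypes as ct

def set_field_size_limit_to_long_max():
    """Set csv's field size limit to the platform's LONG_MAX (hypothetical helper)."""
    long_max = int(ct.c_ulong(-1).value // 2)  # ULONG_MAX // 2 == LONG_MAX on this platform
    csv.field_size_limit(long_max)
    return long_max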



Answer by Steffen Winkler

I just had this happen to me on a 'plain' CSV file. Some people might call it an invalidly formatted file: no escape characters, no double quotes, and the delimiter was a semicolon.


A sample line from this file would look like this:


First cell; Second " Cell with one double quote and leading space;'Partially quoted' cell;Last cell


The single stray double quote in the second cell would throw the parser off its rails. What worked was:


csv.reader(inputfile, delimiter=';', doublequote=False, quotechar=None, quoting=csv.QUOTE_NONE)
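
For reference, a minimal sketch of that call using the sample line from above as an in-memory string (note doublequote=False as a boolean and quotechar=None rather than an empty string, which current Python versions reject):

import csv
import io

sample = 'First cell; Second " Cell with one double quote and leading space;\'Partially quoted\' cell;Last cell'
reader = csv.reader(io.StringIO(sample), delimiter=';', doublequote=False,
                    quotechar=None, quoting=csv.QUOTE_NONE)
print(next(reader))  # four fields, quotes kept as plain characters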

Answer by 0x01h

You can use read_csv from pandas to skip these lines.


import pandas as pd

data_df = pd.read_csv('data.csv', error_bad_lines=False)
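
Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; on recent versions the equivalent option is on_bad_lines='skip'. A sketch for newer pandas:

import pandas as pd

# skip malformed lines instead of raising (pandas >= 1.3)
data_df = pd.read_csv('data.csv', on_bad_lines='skip')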