Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/1278749/

Time: 2020-11-03 21:50:47  Source: igfitidea

How do I detect missing fields in a CSV file in a Pythonic way?

Tags: python, error-handling, csv

Asked by bedwyr

I'm trying to parse a CSV file using Python's csv module (specifically, the DictReader class). Is there a Pythonic way to detect empty or missing fields and throw an error?

Here's a sample file using the following headers: NAME, LABEL, VALUE

foo,bar,baz
yes,no
x,y,z

When parsing, I'd like the second line to throw an error since it's missing the VALUE field.

Here's a code snippet which shows how I'm approaching this (disregard the hard-coded strings...they're only present for brevity):

import csv

HEADERS = ["name", "label", "value"]

# The with block closes the file even if a check below raises
with open('configFile', newline='') as fileH:
    reader = csv.DictReader(fileH, HEADERS)
    for row in reader:
        if row["name"] is None or row["name"] == "":
            raise ValueError("missing name")
        if row["label"] is None or row["label"] == "":
            raise ValueError("missing label")
        ...

Is there a cleaner way of checking for fields in the CSV file without having a bunch of if statements? If I need to add more fields, I'll also need more conditionals, which I would like to avoid if possible.

Answered by balpha

if any(row[key] in (None, "") for key in row):
    raise ValueError("row has an empty or missing field")

Edit: Even better:

# row.values() on Python 3; row.itervalues() on Python 2
if any(val in (None, "") for val in row.values()):
    raise ValueError("row has an empty or missing field")
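
As a rough illustration of the any() test in context, here is a minimal, self-contained sketch. It assumes Python 3, and the SAMPLE string and the incomplete_lines helper are hypothetical stand-ins introduced only for this example (the question reads from a real file). It relies on DictReader padding short rows with its restval default of None.

```python
import csv
import io

HEADERS = ["name", "label", "value"]

# Hypothetical in-memory stand-in for the sample file from the question.
SAMPLE = "foo,bar,baz\nyes,no\nx,y,z\n"


def incomplete_lines(text, headers=HEADERS):
    """Return the line numbers of rows with a missing or empty field."""
    reader = csv.DictReader(io.StringIO(text), headers)
    bad = []
    for row in reader:
        # Short rows are padded with restval (None by default),
        # so missing and empty fields can share one membership test.
        if any(val in (None, "") for val in row.values()):
            bad.append(reader.line_num)
    return bad


print(incomplete_lines(SAMPLE))  # [2]
```

Collecting line numbers instead of raising immediately is just one option; swapping the append for a raise gives the fail-fast behaviour the question asks for.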

Answered by Triptych

Since None and empty strings are both falsy (they evaluate to False in a boolean context), you should consider this:

for row in reader:
    for header in HEADERS:
        if not row[header]:
            raise ValueError(f"missing or empty field: {header}")

Note that, unlike some other answers, you will still have the option of raising an informative, header-specific error.
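
One way such a header-specific error might look is sketched below. MissingFieldError and parse are hypothetical names made up for this example, Python 3 is assumed, and the input is passed as a string purely to keep the sketch self-contained.

```python
import csv
import io

HEADERS = ["name", "label", "value"]


class MissingFieldError(ValueError):
    """Hypothetical exception naming the offending header and line."""

    def __init__(self, header, line_num):
        super().__init__(f"line {line_num}: missing field {header!r}")
        self.header = header
        self.line_num = line_num


def parse(text):
    """Return all rows, or raise on the first empty or missing field."""
    reader = csv.DictReader(io.StringIO(text), HEADERS)
    rows = []
    for row in reader:
        for header in HEADERS:
            if not row[header]:
                raise MissingFieldError(header, reader.line_num)
        rows.append(row)
    return rows


try:
    parse("foo,bar,baz\nyes,no\nx,y,z\n")
except MissingFieldError as err:
    print(err)  # line 2: missing field 'value'
```

Keeping the header and line number as attributes lets a caller decide how to report the problem without parsing the message.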

Answered by John Millikin

This code will provide, for each row, a list of field names which are not present (or are empty) for that row. You could then provide a more detailed exception, such as "Missing fields: foo, baz".

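def missing(row):
    # Names of the headers that are absent or empty in this row
    return [h for h in HEADERS if not row.get(h)]

for row in reader:
    m = missing(row)
    if m:  # test the returned list; the function object itself is always truthy
        raise ValueError("Missing fields: " + ", ".join(m))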
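
If the goal is to see every problem in the file rather than stop at the first one, the same missing() helper can feed a small report. The sketch below assumes Python 3; report is a hypothetical function name and the inline sample text is invented for illustration.

```python
import csv
import io

HEADERS = ["name", "label", "value"]


def missing(row):
    # Same helper as in the answer: names of absent or empty fields.
    return [h for h in HEADERS if not row.get(h)]


def report(text):
    """Map each bad line number to the list of fields it lacks."""
    reader = csv.DictReader(io.StringIO(text), HEADERS)
    problems = {}
    for row in reader:
        m = missing(row)
        if m:
            problems[reader.line_num] = m
    return problems


problems = report("foo,bar,baz\nyes,no\n,y,\n")
for line_num, fields in problems.items():
    print(f"line {line_num}: Missing fields: {', '.join(fields)}")
```

For this input the report covers line 2 (value) and line 3 (name and value).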

Answered by retracile

Something like this?

...
for row in reader:
    for column, value in row.items():
        if value is None or value == "":
            raise ValueError(f"missing or empty field: {column}")

You may be able to use 'if not value:' as your test instead of the more explicit test you gave.
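
A quick way to see what 'if not value:' does and does not catch is to try it on a few values. In the sketch below, is_blank is a hypothetical helper (not part of the csv module) that also treats whitespace-only strings as blank. Because the csv module returns fields as strings by default, a field containing "0" is a non-empty string and is not mistaken for a missing value.

```python
def is_blank(value):
    """Hypothetical helper: True for None, '' and whitespace-only strings."""
    return value is None or not value.strip()


# Columns printed: the value, the plain truthiness test, the stricter helper.
for value in (None, "", "   ", "0", "baz"):
    print(repr(value), not value, is_blank(value))
```

The two tests differ only on the whitespace-only string, which 'not value' lets through; whether that matters depends on how the file is produced.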

Answered by dalloliogm

If you use matplotlib.mlab.csv2rec, it already reads the content of the file into an array and raises an error if one of the values is missing.

>>> from matplotlib.mlab import csv2rec
>>> content_array = csv2rec('file.txt')
IndexError: list index out of range

The problem is that there is no simple way to customize this behaviour or to supply a default value for missing rows. Moreover, the error message is not very explanatory (it might be worth filing a bug report about this).

P.S. Since csv2rec saves the content of the file into a NumPy record array, it will be easier to get the values equal to None.