Python: strip whitespace from a CSV file

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, link to the original and attribute it to the original authors (not me): StackOverflow, http://stackoverflow.com/questions/14885908/

Date: 2020-08-18 12:45:19  Source: igfitidea

Tags: python, csv, data-munging

Question by BAI

I need to strip the whitespace from a CSV file that I read:

import csv

def read_rows(filename):
    aList = []
    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
        for row in reader:
            # I need to strip the extra white space from each string in the row
            aList.append(row)
    return aList

Answer by sapi

Inside the loop, replace aList.append(row) with:

aList.append([element.strip() for element in row])

Answer by mgilson

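import csv
def read_rows(filename):
    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
        # Strip every cell of every row in one nested list comprehension
        return [[x.strip() for x in row] for row in reader]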

Answer by CaraW

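There's also the built-in formatting parameter skipinitialspace (default False): when set to True, the reader ignores spaces immediately following the delimiter. It does not remove trailing spaces. http://docs.python.org/2/library/csv.html#csv-fmt-params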

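import csv

def read_rows(filename):
    aList = []
    with open(filename, 'r', newline='') as f:
        # skipinitialspace=True drops the spaces that follow each delimiter
        reader = csv.reader(f, skipinitialspace=True, delimiter=',', quoting=csv.QUOTE_NONE)
        for row in reader:
            aList.append(row)
    return aList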
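To see exactly what skipinitialspace does and does not remove, here is a small sketch that parses an in-memory string (io.StringIO stands in for a real file; the sample data is made up). Spaces after each comma disappear, but spaces before a comma survive, so a strip() pass is still needed for fully clean cells.

```python
import csv
import io

# Made-up sample: spaces both before and after the delimiters
data = "a, b ,c\n1,  2 , 3\n"

rows = list(csv.reader(io.StringIO(data), skipinitialspace=True))
print(rows)  # leading spaces are gone, trailing ones ("b ", "2 ") remain
```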

Answer by daniel kullmann

You can create a wrapper object around your file that strips away the spaces before the CSV reader sees them. This way, you can even use the file with csv.DictReader.

import re

class CSVSpaceStripper:
    """Iterate over a file's lines, removing whitespace around ';' and at both ends of each line."""

    def __init__(self, filename):
        self.fh = open(filename, "r")
        self.surroundingWhiteSpace = re.compile(r"\s*;\s*")
        self.leadingOrTrailingWhiteSpace = re.compile(r"^\s*|\s*$")

    def close(self):
        self.fh.close()
        self.fh = None

    def __iter__(self):
        return self

    # Python 3 iterator protocol (Python 2 used next() and self.fh.next())
    def __next__(self):
        line = next(self.fh)
        line = self.surroundingWhiteSpace.sub(";", line)
        line = self.leadingOrTrailingWhiteSpace.sub("", line)
        return line

Then use it like this:

import csv

o = csv.reader(CSVSpaceStripper(filename), delimiter=";")
o = csv.DictReader(CSVSpaceStripper(filename), delimiter=";")

I hardcoded ";" as the delimiter. Generalising the code to any delimiter is left as an exercise for the reader.
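One possible answer to that exercise, sketched as a generator function rather than a class (the name strip_delimited is hypothetical, not part of the original answer): build the pattern with re.escape so any single-character delimiter is matched literally. It assumes a non-whitespace delimiter (with "\t" the pattern would swallow empty fields between consecutive tabs), and, like the original, it also changes whitespace inside quoted fields when it touches a delimiter.

```python
import csv
import io
import re

def strip_delimited(lines, delimiter=","):
    """Hypothetical helper: strip whitespace around a (non-whitespace)
    delimiter and at both ends of every line before csv parses it."""
    around = re.compile(r"\s*" + re.escape(delimiter) + r"\s*")
    for line in lines:
        yield around.sub(delimiter, line).strip()

# Made-up sample data; any iterable of lines (e.g. an open file) works
data = " a , b ,c \n1 ,2, 3\n"
rows = list(csv.reader(strip_delimited(io.StringIO(data), ",")))
print(rows)
```

Because it only needs an iterable of lines, the same function can wrap an open file and be passed to csv.DictReader as well.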

Answer by CivFan

In my case, I only cared about stripping the whitespace from the field names (aka column headers, aka dictionary keys) when using csv.DictReader.

Create a class based on csv.DictReader, and override the fieldnames property to strip out the whitespace from each field name (aka column header, aka dictionary key).

Do this by getting the regular list of field names, building a new list with the whitespace stripped from each name, and assigning that list to the underlying _fieldnames attribute.

import csv

class DictReaderStrip(csv.DictReader):
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # Let DictReader read the header row into self._fieldnames.
            # Calling the parent getter explicitly was needed in Python 2, where
            # DictReader was an old-style class; in Python 3, super() also works.
            csv.DictReader.fieldnames.fget(self)
            if self._fieldnames is not None:
                self._fieldnames = [name.strip() for name in self._fieldnames]
        return self._fieldnames
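On Python 3, the same override can be written with super(), since DictReader is a regular class there. A minimal sketch with a usage example; io.StringIO and the sample header are stand-ins for a real CSV file. Note that only the header names are stripped here, not the cell values.

```python
import csv
import io

class DictReaderStrip(csv.DictReader):
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # The parent property reads the header row on first access
            names = super().fieldnames
            if names is not None:
                self._fieldnames = [name.strip() for name in names]
        return self._fieldnames

# Made-up sample with padded header names
reader = DictReaderStrip(io.StringIO(" name , age \nalice,30\n"))
rows = list(reader)
print(reader.fieldnames, rows)
```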

Answer by Finger Picking Good

Read a CSV (or Excel file) using Pandas and trim it using this custom function.

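# Definition for stripping whitespace from every string cell (other values unchanged)
def trim(dataset):
    trim_cell = lambda x: x.strip() if isinstance(x, str) else x
    # DataFrame.applymap is deprecated in favour of DataFrame.map since pandas 2.1
    return dataset.applymap(trim_cell)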

You can now apply trim() to a DataFrame read from a CSV or Excel file (e.g. as part of a loop):

import pandas as pd
# csv_path / excel_path are placeholders for your own file paths
dataset = trim(pd.read_csv(csv_path))
dataset = trim(pd.read_excel(excel_path))

Answer by Nuno André

The most memory-efficient way to format the cells after parsing is with generators, which process one row at a time instead of building the whole list in memory. Something like:

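import csv

def stripped_rows(filename):
    # Lazily yield each row as a generator of stripped cells
    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
        for row in reader:
            yield (cell.strip() for cell in row)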

It may be worth moving the cleaning into a function, so you can keep munging the data in the same pass instead of iterating over it again later. For instance:

nulls = {'NULL', 'null', 'None', ''}

def clean(reader):
    # Strip each cell and map null-like markers to None, lazily, row by row
    def clean_row(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    for row in reader:
        yield clean_row(row)

Or it can be used as a factory that turns each row into a dict keyed by the header row:

def factory(reader):
    fields = next(reader)  # the first row holds the column names

    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    # Yield one dict per data row, keyed by column name
    for row in reader:
        yield dict(zip(fields, clean(row)))
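A usage sketch for the factory approach, made self-contained by repeating the nulls set and the generator; io.StringIO and the sample rows are made-up stand-ins for a real file. Each record is built only when the consumer asks for it, so the data is stripped and null-converted in a single pass.

```python
import csv
import io

nulls = {'NULL', 'null', 'None', ''}

def factory(reader):
    fields = next(reader)  # header row (used as-is, not stripped)

    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    for row in reader:
        yield dict(zip(fields, clean(row)))

# Made-up sample: padded cells, a NULL marker and an empty field
data = "name,age,city\n alice , 30 , NULL\n bob ,, paris \n"
records = list(factory(csv.reader(io.StringIO(data))))
print(records)
```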