Python: strip whitespace from a CSV file

Question

Asked by BAI

I need to strip the whitespace from the fields of a CSV file that I read:

import csv

aList=[]
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    # I need to strip the extra white space from each string in the row
    return(aList)

Answer 1

Answered by sapi

You can do:

aList.append([element.strip() for element in row])

Answer 2

Answered by mgilson

with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]

Answer 3

Answered by CaraW

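There's also a built-in formatting parameter, skipinitialspace (the default is False). When set to True, the reader ignores whitespace immediately following the delimiter; trailing whitespace is not removed. See http://docs.python.org/2/library/csv.html#csv-fmt-params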

aList=[]
with open(self.filename, 'r') as f:
    # skipinitialspace=True drops whitespace that directly follows a delimiter
    reader = csv.reader(f, skipinitialspace=True, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    return(aList)
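
To make the limits of skipinitialspace concrete, here is a minimal sketch on made-up sample data (the `sample` string is an assumption, not from the question). It suggests that only the whitespace right after a delimiter disappears, so a strip() pass is still needed for trailing whitespace:

```python
import csv
import io

# Hypothetical sample data: spaces both before and after some values.
sample = "a, b ,c \n1,  2 , 3\n"

reader = csv.reader(io.StringIO(sample), skipinitialspace=True)
rows = list(reader)
# Whitespace right after each comma is gone; trailing whitespace remains.
print(rows)

# Combine with strip() if trailing whitespace must go as well.
stripped = [[cell.strip() for cell in row] for row in rows]
print(stripped)
```

If the file only ever has a space after each comma, skipinitialspace alone may be enough; otherwise the list comprehension from the earlier answers covers both sides.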

Answer 4

Answered by daniel kullmann

You can create a wrapper object around your file that strips away the spaces before the CSV reader sees them. This way, you can even use the CSV file with csv.DictReader.

import re

class CSVSpaceStripper:
  def __init__(self, filename):
    self.fh = open(filename, "r")
    # Raw strings keep "\s" from being treated as a string escape
    self.surroundingWhiteSpace = re.compile(r"\s*;\s*")
    self.leadingOrTrailingWhiteSpace = re.compile(r"^\s*|\s*$")

  def close(self):
    self.fh.close()
    self.fh = None

  def __iter__(self):
    return self

  def __next__(self):
    # Raises StopIteration at end of file, which ends the iteration
    line = next(self.fh)
    line = self.surroundingWhiteSpace.sub(";", line)
    line = self.leadingOrTrailingWhiteSpace.sub("", line)
    return line

  next = __next__  # Python 2 iterator protocol

Then use it like this:

o = csv.reader(CSVSpaceStripper(filename), delimiter=";")
o = csv.DictReader(CSVSpaceStripper(filename), delimiter=";")

I hardcoded ";" as the delimiter. Generalising the code to any delimiter is left as an exercise for the reader.
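
One possible way to generalise this, sketched under assumptions: the helper name `strip_around_delimiter` and the sample data are hypothetical, and the wrapper is written as a generator over any iterable of lines rather than as a class. re.escape is used so that delimiters such as "|" are matched literally. It is not suitable for whitespace delimiters such as tabs, and, like the original, it also rewrites delimiters inside quoted fields.

```python
import csv
import io
import re


def strip_around_delimiter(lines, delimiter=";"):
    """Yield each line with whitespace removed around the delimiter and at both ends.

    Caveat: this is a plain text substitution, so it also touches delimiters
    that appear inside quoted fields.
    """
    # re.escape lets characters such as "|" or "." serve as the delimiter.
    around = re.compile(r"\s*" + re.escape(delimiter) + r"\s*")
    for line in lines:
        yield around.sub(delimiter, line.strip())


# Hypothetical sample data using "|" as the delimiter.
sample = io.StringIO(" name | age \n Ann |  31\n")
rows = list(csv.DictReader(strip_around_delimiter(sample, "|"), delimiter="|"))
print(rows)
```

Because the helper accepts any iterable of strings, it can wrap an open file object just as well as the StringIO used here.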

Answer 5

Answered by CivFan

In my case, I only cared about stripping the whitespace from the field names (aka column headers, aka dictionary keys) when using csv.DictReader.

Create a class based on csv.DictReader, and override the fieldnames property to strip the whitespace from each field name.

Do this by getting the regular list of field names, building a new list with the whitespace stripped from each name, and setting the underlying _fieldnames attribute to this new list.

import csv

class DictReaderStrip(csv.DictReader):
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # Initialize self._fieldnames
            # Note: in Python 2, DictReader is an old-style class, so super()
            # can't be used; call the parent property's getter directly
            csv.DictReader.fieldnames.fget(self)
            if self._fieldnames is not None:
                self._fieldnames = [name.strip() for name in self._fieldnames]
        return self._fieldnames
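
For Python 3, where super() is available, a simplified variant might look like the sketch below. It is an assumption-laden restatement, not the answer's exact code: it strips the names on every access instead of caching them in _fieldnames, and the sample data is made up. Only the header names are stripped; the cell values are left alone.

```python
import csv
import io


class DictReaderStrip(csv.DictReader):
    """Python 3 variant: strip whitespace from the header names only."""

    @property
    def fieldnames(self):
        # The parent property reads the header row on first access.
        names = super().fieldnames
        if names is None:
            return None
        return [name.strip() for name in names]


# Hypothetical sample data with padded column headers.
sample = io.StringIO(" id , name \n1,Ann\n")
reader = DictReaderStrip(sample)
rows = list(reader)
print(reader.fieldnames)
print(rows)
```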

Answer 6

Answered by Finger Picking Good

Read a CSV (or Excel file) using Pandas and trim it using this custom function.

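import pandas as pd

# Strip whitespace from every string cell; leave other values unchanged
def trim(dataset):
    strip_cell = lambda x: x.strip() if isinstance(x, str) else x
    # pandas 2.1+ deprecates applymap in favor of DataFrame.map
    return dataset.applymap(strip_cell)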

You can now apply trim() to a DataFrame read from a CSV or Excel file, like so (as part of a loop, etc.):

dataset = trim(pd.read_csv(dataset))
dataset = trim(pd.read_excel(dataset))

Answer 7

Answered by Nuno André

The most memory-efficient method to format the cells after parsing is through generators. Something like:

def rows(self):  # hypothetical method name; yield must live inside a function
    with open(self.filename, 'r') as f:
        reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
        for row in reader:
            yield (cell.strip() for cell in row)

But it may be worth moving the cleanup into a function that you can extend with further munging, so the data does not have to be iterated again later. For instance:

nulls = {'NULL', 'null', 'None', ''}

def clean(reader):
    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    for row in reader:
        yield clean(row)

Or it can be used as a factory that builds a dict for each row:

def factory(reader):
    fields = next(reader)

    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    for row in reader:
        yield dict(zip(fields, clean(row)))
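
As a rough usage sketch, the factory can be fed a csv.reader and consumed lazily. The function is restated here only so the example runs on its own, and the sample data is an assumption. Note that the header row is taken as-is: only the data cells are stripped and null-checked.

```python
import csv
import io

nulls = {'NULL', 'null', 'None', ''}


def factory(reader):
    fields = next(reader)  # header row, used as dict keys

    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    for row in reader:
        yield dict(zip(fields, clean(row)))


# Hypothetical sample data with padded cells and a NULL marker.
sample = io.StringIO("name,age\n Ann , 31 \nBob, NULL\n")
records = factory(csv.reader(sample))

first = next(records)   # nothing is parsed until a record is requested
rest = list(records)    # consume the remaining rows
print(first, rest)
```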
