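Python: strip whitespace from a CSV file
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/14885908/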
Strip white spaces from CSV file
Question by BAI
I need to strip the whitespace from a CSV file that I read:
import csv

# (inside a method of a class that has a `filename` attribute)
aList = []
with open(self.filename, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
        # I need to strip the extra white space from each string in the row
return aList
Answer by sapi
Inside the loop, you can strip each element as you append the row:
aList.append([element.strip() for element in row])
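For context, here is a minimal sketch of the question's loop with that line dropped in. It reads from an in-memory io.StringIO instead of self.filename only so the example runs on its own; the function name read_stripped_rows and the sample data are hypothetical.

```python
import csv
import io

def read_stripped_rows(f):
    """Return every row from the CSV file object f, with each cell stripped."""
    aList = []
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        # str.strip() removes leading and trailing whitespace (spaces, tabs, newlines)
        aList.append([element.strip() for element in row])
    return aList

sample = io.StringIO("name , age\n  Alice,  30 \nBob ,41\n")
print(read_stripped_rows(sample))
# [['name', 'age'], ['Alice', '30'], ['Bob', '41']]
```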
Answer by mgilson
# (inside a method; returns a list of rows with every cell stripped)
with open(self.filename, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]
Answer by CaraW
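There's also the built-in formatting parameter skipinitialspace (default False). When it is True, whitespace immediately following the delimiter is ignored; trailing whitespace is not removed. See http://docs.python.org/2/library/csv.html#csv-fmt-params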
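import csv

aList = []
with open(self.filename, 'r', newline='') as f:
    # skipinitialspace=True (not the default False) skips spaces after each delimiter
    reader = csv.reader(f, skipinitialspace=True, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
return aList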
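A quick way to see what skipinitialspace=True does and does not do: spaces right after each delimiter are dropped, but trailing spaces survive, so a strip() pass is still needed for those. The sample line is made up for illustration.

```python
import csv
import io

data = "a,  b ,c\n"

# skipinitialspace=True drops spaces at the start of each field only
row = next(csv.reader(io.StringIO(data), skipinitialspace=True))
print(row)  # ['a', 'b ', 'c']

# Stripping afterwards also removes the trailing space in 'b '
print([cell.strip() for cell in row])  # ['a', 'b', 'c']
```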
Answer by daniel kullmann
You can create a wrapper object around your file that strips away the spaces before the CSV reader sees them. This way, you can even use the file with csv.DictReader.
import re

class CSVSpaceStripper:
    def __init__(self, filename):
        self.fh = open(filename, "r", newline="")
        # Whitespace on either side of the ";" delimiter
        self.surroundingWhiteSpace = re.compile(r"\s*;\s*")
        # Whitespace at the start or end of the line (including the newline)
        self.leadingOrTrailingWhiteSpace = re.compile(r"^\s*|\s*$")

    def close(self):
        self.fh.close()
        self.fh = None

    def __iter__(self):
        return self

    def __next__(self):  # Python 3 iterator protocol (Python 2 used next())
        line = next(self.fh)
        line = self.surroundingWhiteSpace.sub(";", line)
        line = self.leadingOrTrailingWhiteSpace.sub("", line)
        return line
Then use it like this:
o = csv.reader(CSVSpaceStripper(filename), delimiter=";")
o = csv.DictReader(CSVSpaceStripper(filename), delimiter=";")
I hardcoded ";" as the delimiter. Generalising the code to any delimiter is left as an exercise for the reader.
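One possible take on that exercise, as a sketch rather than daniel kullmann's own code: a generator that accepts any single-character delimiter and uses re.escape so characters such as "|" are matched literally. The name strip_around_delimiter and the sample data are hypothetical. Like the class above, it does not understand quoting, so spaces next to a delimiter inside a quoted field are removed as well.

```python
import csv
import io
import re

def strip_around_delimiter(lines, delimiter=","):
    """Yield each line with whitespace removed around every delimiter and at both ends."""
    # re.escape makes regex metacharacters such as "|" or "." match literally
    around = re.compile(r"\s*" + re.escape(delimiter) + r"\s*")
    for line in lines:
        # A function replacement avoids backslash handling in the repl string;
        # note this also touches delimiters inside quoted fields
        yield around.sub(lambda _match: delimiter, line).strip()

sample = io.StringIO("id | name \n 1 |  Alice \n")
rows = list(csv.reader(strip_around_delimiter(sample, "|"), delimiter="|"))
print(rows)  # [['id', 'name'], ['1', 'Alice']]
```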
Answer by CivFan
In my case, I only cared about stripping the whitespace from the field names (aka column headers, aka dictionary keys) when using csv.DictReader.
Create a class based on csv.DictReader and override the fieldnames property to strip the whitespace from each field name (aka column header, aka dictionary key).
Do this by getting the regular list of field names, building a new list with the whitespace stripped from each name, and assigning that new list to the underlying _fieldnames attribute.
import csv

class DictReaderStrip(csv.DictReader):
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # Let the base class read the header row into self._fieldnames.
            # (Explicit fget call kept from the Python 2 version, where
            # DictReader was an old-style class and super() was unavailable.)
            csv.DictReader.fieldnames.fget(self)
            if self._fieldnames is not None:
                self._fieldnames = [name.strip() for name in self._fieldnames]
        return self._fieldnames
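If you would rather not subclass, a lighter alternative (a swapped-in technique, not part of CivFan's answer) is to read the header once through DictReader's own fieldnames property and assign a stripped copy back through its setter, which Python 3's csv.DictReader provides. The sample data is made up for illustration.

```python
import csv
import io

f = io.StringIO(" id , name \n1,Alice\n")
reader = csv.DictReader(f)
# Reading .fieldnames consumes the header row; assign back a stripped copy
reader.fieldnames = [name.strip() for name in reader.fieldnames]
rows = list(reader)
print(list(rows[0]))  # ['id', 'name']
```

The trade-off: this only works on a reader instance you control, whereas the subclass applies the stripping everywhere it is used.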
Answer by Finger Picking Good
Read a CSV (or Excel) file using pandas and trim it using this custom function:
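import pandas as pd

# Definition for stripping whitespace
def trim(dataset):
    # Strip strings; leave numbers, NaN and other non-string cells untouched
    strip_cell = lambda x: x.strip() if isinstance(x, str) else x
    # Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map
    return dataset.applymap(strip_cell)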
You can now apply trim() to the DataFrame returned by read_csv or read_excel in your code (for example, as part of a loop):
dataset = trim(pd.read_csv(filename))
dataset = trim(pd.read_excel(filename))
Answer by Nuno André
A memory-efficient way to format the cells after parsing is to use generators, which process one row at a time instead of building the whole list in memory. Something like:
# Inside a method: the yield turns it into a generator
with open(self.filename, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield (cell.strip() for cell in row)
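Because each yielded row is itself a generator expression, it can be consumed only once and prints as a generator object; wrap it in list() when you need the actual cells. A self-contained sketch, where the hypothetical iter_stripped_rows takes any iterable of lines instead of self.filename:

```python
import csv

def iter_stripped_rows(lines):
    """Lazily yield each CSV row as a generator of stripped cells."""
    reader = csv.reader(lines, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield (cell.strip() for cell in row)

lines = [" a , b \n", "c,  d\n"]
for row in iter_stripped_rows(lines):
    # Materialize the inner generator to see (or reuse) the cells
    print(list(row))
# ['a', 'b']
# ['c', 'd']
```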
It may be worth moving this into a function, so that you can keep adding cleanup steps there and apply them all in a single pass instead of iterating over the data again later. For instance:
nulls = {'NULL', 'null', 'None', ''}

def clean(reader):
    def clean_row(row):
        # Strip each cell and map null-like markers to None
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell
    for row in reader:
        yield clean_row(row)
Or it can be used to build records keyed by the header row (much like csv.DictReader does):
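def factory(reader):
    # Use the header row as dict keys, stripped like the cells
    fields = [name.strip() for name in next(reader)]
    def clean_row(row):
        # `nulls` is the set defined in the previous snippet
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell
    for row in reader:
        yield dict(zip(fields, clean_row(row)))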
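A usage sketch with made-up sample data. To run on its own it repeats nulls and factory (with the header names stripped too, and the inner helper named clean_row); blank cells and "NULL" both come out as None.

```python
import csv
import io

nulls = {'NULL', 'null', 'None', ''}

def factory(reader):
    """Yield one dict per data row, keyed by the stripped header row."""
    fields = [name.strip() for name in next(reader)]

    def clean_row(row):
        # Strip each cell and map null-like markers to None
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    for row in reader:
        yield dict(zip(fields, clean_row(row)))

sample = io.StringIO("id , name , email\n1, Alice , NULL\n2,Bob,\n")
records = list(factory(csv.reader(sample)))
print(records)
# [{'id': '1', 'name': 'Alice', 'email': None}, {'id': '2', 'name': 'Bob', 'email': None}]
```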

