Original source: http://stackoverflow.com/questions/18877484/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Changing strings to Floats in an imported .csv
Asked by userNaN
Quick question for an issue I haven't managed to solve quickly:
I'm working with a .csv file and can't seem to find a simple way to convert strings to floats. Here's my code,
    import csv

    def readLines():
        with open('testdata.csv', 'rU') as data:
            reader = csv.reader(data)
            row = list(reader)
            for x in row:
                for y in x:
                    print type(float(y)),

    readLines()
As you can see, it currently prints the type of every element y in each list x in the variable row; this produces a long list of "<type 'float'>". But this doesn't actually change each element to a float, nor does setting the for loop to execute float(y) work (a type test still returns 'string' for each element).
I also tried literal_eval, but that failed as well. The only way to change the list elements to floats is to create a new list, either with list comprehension or manually, but that loses the original formatting of each list (as lists of a set amount of elements within one larger list).
I suppose the overall question is really just "What's the easiest way to read, organize, and synthesize data in .csv or excel format using Python?"
Thanks in advance to those courteous/knowledgeable enough to help.
Accepted answer by smci
You are correct that Python's builtin csv module is very primitive at handling mixed data types. It does all its type conversion at import time, and even then offers a very restrictive menu of options, which will mangle most real-world datasets (inconsistent quoting and escaping, missing or incomplete values in Booleans and factors, mismatched Unicode encoding resulting in phantom quote or escape characters inside fields, incomplete lines causing exceptions). Fixing csv import is one of the countless benefits of pandas. So your ultimate answer is indeed to stop using the builtin csv import and start using pandas. But let's start with the literal answer to your question.
First you asked "How to convert strings to floats, on csv import". The answer to that is to open the reader with csv.reader(..., quoting=csv.QUOTE_NONNUMERIC), as per the csv doc:
csv.QUOTE_NONNUMERIC: Instructs the reader to convert all non-quoted fields to type float.
That works if you're ok with all unquoted fields (integer, float, text, Boolean etc.) being converted to float, which is generally a bad idea for many reasons (missing or NA values in Booleans or factors will get silently squelched). Moreover, it will obviously fail (throw an exception) on unquoted text fields. So it's brittle and needs to be protected with try/except.
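As a minimal sketch of the behaviour described above, written for Python 3 (the thread itself uses Python 2): the helper name read_numeric and the sample strings are hypothetical, and an in-memory io.StringIO stands in for the real file.

```python
import csv
import io

def read_numeric(text):
    """Parse CSV text, converting every unquoted field to float."""
    reader = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC)
    return [row for row in reader]

# Quoted fields stay strings; unquoted fields become floats.
print(read_numeric('"name",1,2.5\n"other",3,4\n'))

# An unquoted text field cannot be parsed as a float, so guard the call.
try:
    read_numeric('abc,1\n')
except ValueError as exc:
    print("conversion failed:", exc)
```

The try/except here only reports the failure; what to do with a bad row (skip it, keep it as text, abort) depends on the data.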
Then you asked: 'I suppose the overall question is really just "What's the easiest way to read, organize, and synthesize data in .csv or excel format using Python?"', to which the crappy csv.reader solution is to open with csv.reader(..., quoting=csv.QUOTE_NONNUMERIC).
But as @geoffspear correctly replied 'The answer to your "overall question" may be "Pandas", although it's a bit vague.'
Answer by Antti Haapala
Try something like the following:
    import csv

    def read_lines():
        with open('testdata.csv', 'rU') as data:
            reader = csv.reader(data)
            for row in reader:
                yield [ float(i) for i in row ]

    for i in read_lines():
        print(i)

    # to get a list, instead of a generator, use
    xy = list(read_lines())
As for the easiest way, I suggest you look at the xlrd and xlwt modules; personally, I always have a hard time with all the varying CSV formats.
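The question worried that building a new list loses the row grouping. A nested list comprehension keeps it, as this Python 3 sketch suggests; the function name rows_as_floats and the sample data are hypothetical, and io.StringIO stands in for open('testdata.csv', newline='').

```python
import csv
import io

def rows_as_floats(lines):
    """Return one list of floats per CSV row, keeping the row grouping."""
    return [[float(cell) for cell in row] for row in csv.reader(lines)]

# Hypothetical stand-in for the real file object.
sample = io.StringIO("1,2,3\n4.5,5.5,6.5\n")
print(rows_as_floats(sample))  # [[1.0, 2.0, 3.0], [4.5, 5.5, 6.5]]
```

This assumes every cell is numeric; a non-numeric cell raises ValueError.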
Answer by dawg
When converting a bunch of strings to floats, you should use a try/except to catch errors:
    def conv(s):
        try:
            s = float(s)
        except ValueError:
            pass
        return s

    print([conv(s) for s in ['1.1', 'bls', '1', 'nan', 'not a float']])
    # [1.1, 'bls', 1.0, nan, 'not a float']
Notice that the strings that cannot be converted are simply passed through unchanged.
A csv file IS a text file, so you should use a similar functionality:
    import csv

    def readLines():
        def conv(s):
            # Return s as a float if possible, otherwise unchanged.
            try:
                s = float(s)
            except ValueError:
                pass
            return s

        with open('testdata.csv', 'rU') as data:
            reader = csv.reader(data)
            for row in reader:
                for cell in row:
                    y = conv(cell)
                    # do whatever with the single float
                # OR
                # yield [conv(cell) for cell in row] if you want to write a generator...
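The generator variant mentioned in the last comment could look roughly like this in Python 3. The name read_rows and the sample data are hypothetical, and io.StringIO stands in for the opened file.

```python
import csv
import io

def conv(s):
    """Return s as a float if possible, otherwise unchanged."""
    try:
        return float(s)
    except ValueError:
        return s

def read_rows(lines):
    """Yield each CSV row with numeric-looking cells converted to float."""
    for row in csv.reader(lines):
        yield [conv(cell) for cell in row]

sample = io.StringIO("id,score\na,1.5\nb,2\n")
print(list(read_rows(sample)))
# [['id', 'score'], ['a', 1.5], ['b', 2.0]]
```

Header cells and other text pass through as strings, so a header row needs no special handling.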
Answer by Paul Becotte
    for y in x:
        print type(float(y)),
float(y) takes the value of y and returns a float based on it. It does not modify y; it returns a new object.
    y = float(y)
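is more like what you are looking for: you have to assign the result back. Note that rebinding the loop variable y alone does not change the list x; to update the list itself, assign by index (x[i] = float(y)) or build a new list.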
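A short sketch of converting in place, assuming the rows are already in memory as lists of strings (the sample data is hypothetical, shaped like what list(csv.reader(...)) returns):

```python
# Hypothetical data, as list(csv.reader(...)) would return it.
rows = [["1", "2.5"], ["3", "4"]]

for x in rows:
    for i, y in enumerate(x):
        x[i] = float(y)  # assign back by index so the list itself changes

print(rows)  # [[1.0, 2.5], [3.0, 4.0]]
```

The outer list and each inner list keep their identity and shape; only the elements are replaced.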