Python: converting comma decimal separators to dots within a DataFrame

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/31700691/

Date: 2020-08-19 10:26:57  Source: igfitidea

Convert commas decimal separators to dots within a Dataframe

Tags: python, pandas, csv, delimiter, separator

Asked by Nautilius

I am importing a CSV file like the one below, using pandas.read_csv:

df = pd.read_csv(Input, delimiter=";")

Example of CSV file:

10;01.02.2015 16:58;01.02.2015 16:58;-0.59;0.1;-4.39;NotApplicable;0.79;0.2
11;01.02.2015 16:58;01.02.2015 16:58;-0.57;0.2;-2.87;NotApplicable;0.79;0.21

The problem is that when I later on in my code try to use these values I get this error: TypeError: can't multiply sequence by non-int of type 'float'

The error occurs because the number I'm trying to use is not written with a dot (.) as the decimal separator but with a comma (,). After manually changing the commas to dots, my program works.
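
A minimal sketch of why the error appears, assuming the value was read in as a string such as "0,59" (an illustrative value, not taken from the file above). Multiplying a string by a float raises the TypeError quoted above, while replacing the comma and converting to float makes the arithmetic work:

```python
# Hypothetical value: a number with a decimal comma, read in as a string
value = "0,59"

try:
    value * 2.0  # str * float is not defined
except TypeError as exc:
    message = str(exc)
    print(message)

# Replace the decimal comma with a dot, then convert to float
fixed = float(value.replace(",", ".")) * 2.0
print(fixed)
```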

I can't change the format of my input, so I have to replace the commas in my DataFrame for my code to work, and I want Python to do this rather than doing it manually. Do you have any suggestions?

Accepted answer by stellasia

pandas.read_csv has a decimal parameter for this (see the pandas documentation).

I.e. try with:

df = pd.read_csv(Input, delimiter=";", decimal=",")
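
A self-contained sketch of the same call, assuming a small in-memory CSV in place of the real input file. The column names and the comma-decimal values are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data with decimal commas
raw = "id;value;label\n10;-0,59;NotApplicable\n11;-0,57;NotApplicable\n"

# decimal="," tells the parser to treat the comma as the decimal point
df = pd.read_csv(io.StringIO(raw), delimiter=";", decimal=",")

print(df.dtypes)          # 'value' should now be a float column
print(df["value"] * 2.0)  # arithmetic works without a TypeError
```

Text columns such as label are left as strings; only values that parse as numbers are converted.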

Answer by Lo_

I think the previously mentioned answer of including decimal="," in pandas read_csv is the preferred option.

However, I found it to be incompatible with the Python parsing engine. For example, when using skiprows=, read_csv falls back to this engine, so as far as I know you can't use skiprows= and decimal= in the same read_csv statement. Also, I haven't actually been able to get the decimal= argument to work (though that is probably down to me).

The long way round I used to achieve the same result is with list comprehensions, .replace and .astype. The major downside of this method is that it needs to be done one column at a time:

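import pandas as pd

df = pd.DataFrame({'a': ['120,00', '42,00', '18,00', '23,00'],
                   'b': ['51,23', '18,45', '28,90', '133,00']})

# Swap the decimal comma for a dot in each string of column 'a'
df['a'] = [x.replace(',', '.') for x in df['a']]

# Convert the cleaned strings to floats
df['a'] = df['a'].astype(float)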

Now, column a will have float type cells. Column b still contains strings.

Note that the .replace used here is not pandas' but rather Python's built-in string method. Pandas' version requires the string to be an exact match or a regex.
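
One possible way around the column-at-a-time limitation, sketched under the assumption that every column in the frame holds comma-decimal strings: pandas' Series.str.replace does substring replacement, so it can be applied to each column in one pass. This is an alternative to the list comprehension above, not the answer author's method:

```python
import pandas as pd

df = pd.DataFrame({'a': ['120,00', '42,00', '18,00', '23,00'],
                   'b': ['51,23', '18,45', '28,90', '133,00']})

# For each column: replace the literal comma with a dot, then cast to float.
# regex=False makes the pattern a plain substring rather than a regex.
df = df.apply(lambda col: col.str.replace(',', '.', regex=False).astype(float))

print(df.dtypes)
```

If only some columns need converting, the same expression can be applied to a subset such as df[['a', 'b']].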

Answer by hhh

This answers the question of how to change the decimal comma to a decimal dot with Python pandas.

$ cat test.py 
import pandas as pd
df = pd.read_csv("test.csv", quotechar='"', decimal=",")
df.to_csv("test2.csv", sep=',', encoding='utf-8', quotechar='"', decimal='.')

Here the decimal separator is specified as a comma when reading and as a dot when writing. So:

$ cat test.csv 
header,header2
1,"2,1"
3,"4,0"
$ cat test2.csv 
,header,header2
0,1,2.1
1,3,4.0

Here you can see that the decimal separator has changed to a dot.
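
The output above also gained an unnamed index column. A small variation, sketched with in-memory buffers standing in for test.csv and test2.csv, passes index=False to to_csv so that only the original columns are written:

```python
import io

import pandas as pd

# Same content as test.csv above, held in memory for the sketch
csv_in = 'header,header2\n1,"2,1"\n3,"4,0"\n'

# Read with a comma as the decimal separator
df = pd.read_csv(io.StringIO(csv_in), quotechar='"', decimal=",")

# Write with a dot as the decimal separator and without the index column
out = io.StringIO()
df.to_csv(out, sep=",", decimal=".", index=False)

result_lines = out.getvalue().splitlines()
print(result_lines)
```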