as.numeric with comma decimal separators?
Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you reuse it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/15236440/
Question by Fhnuzoag
I have a large vector of strings of the form:
Input = c("1,223", "12,232", "23,0")
etc. That is, the decimals are separated by commas instead of periods. I want to convert this vector into a numeric vector. Unfortunately, as.numeric(Input) just outputs NA.
My first instinct would be to reach for strsplit, but I suspect this would be very slow. Does anyone know of a faster option?
There is an existing question that suggests read.csv2, but my strings are not read in directly that way.
Answer by adibender
as.numeric(sub(",", ".", Input, fixed = TRUE)) should work.
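A minimal, self-contained check of this one-liner on the example vector from the question. With fixed = TRUE, sub() treats "," as a literal string rather than a regular expression, and sub() only replaces the first comma in each string, which is enough when every value has exactly one decimal comma.

```r
# Example vector from the question: comma used as the decimal separator
Input <- c("1,223", "12,232", "23,0")

# Replace the (first) comma with a period, matching it literally
# (fixed = TRUE), then convert the result to numeric
result <- as.numeric(sub(",", ".", Input, fixed = TRUE))

print(result)
```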
Answer by sebastian-c
scan(text=Input, dec=",")
## [1] 1.223 12.232 23.000
But it depends on how long your vector is. When I used rep(Input, 1e6) to make a long vector, my machine just hung; 1e4 is fine, though. @adibender's solution is much faster. On the 1e4 version it is a lot faster:
Unit: milliseconds
         expr        min         lq     median         uq        max neval
  adibender()   6.777888   6.998243   7.119136   7.198374   8.149826   100
 sebastianc() 504.987879 507.464611 508.757161 510.732661 517.422254   100
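The timings above look like microbenchmark output, and the bodies of adibender() and sebastianc() are not shown. The sketch below assumes they were thin wrappers around the two one-liners (the wrappers are hypothetical). It uses only base R's system.time() instead of the microbenchmark package, first checks that both methods agree, then times each once on rep(Input, 1e4). Absolute numbers will vary by machine.

```r
Input <- c("1,223", "12,232", "23,0")
long_input <- rep(Input, 1e4)

# Hypothetical wrappers: the original answer does not show their bodies
adibender <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))
sebastianc <- function(x) scan(text = x, dec = ",", quiet = TRUE)

# Both approaches should produce the same numeric vector
stopifnot(isTRUE(all.equal(adibender(long_input), sebastianc(long_input))))

# Time each approach once (elapsed seconds); results vary by machine
t_sub  <- system.time(adibender(long_input))[["elapsed"]]
t_scan <- system.time(sebastianc(long_input))[["elapsed"]]
cat("sub():", t_sub, "s   scan():", t_scan, "s\n")
```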
Answer by Ricardo Saporta
Also, if you are reading in the raw data, read.table and all its associated functions have a dec argument, e.g.:
read.table("file.txt", dec=",")
When all else fails, gsub and sub are your friends.
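read.table() can also read from a character string through its text argument, which makes the dec argument easy to try without a file. A minimal sketch; the inline data is made up to mirror the question's values and stands in for file.txt:

```r
# Inline data standing in for file.txt: one value per line, comma decimals
raw <- "1,223\n12,232\n23,0"

# dec = "," tells read.table that the comma is the decimal mark,
# so column V1 should come back as numeric
df <- read.table(text = raw, dec = ",")

str(df)
```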
Answer by tspano
The readr package has a function, parse_number(), for parsing numbers from strings. You can set many options via the locale argument.
For comma as decimal separator you can write:
readr::parse_number(Input, locale = readr::locale(decimal_mark = ","))
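A sketch of parse_number() on the question's vector, guarded so it only runs if readr is installed. When only decimal_mark = "," is given, readr's locale() switches the grouping mark to ".", so a value such as "1.234,5" is read as 1234.5; parse_number() also drops non-numeric text before or after the number. The "1.234,5 EUR" string is a made-up illustration.

```r
if (requireNamespace("readr", quietly = TRUE)) {
  Input <- c("1,223", "12,232", "23,0")

  # decimal_mark = "," implies grouping_mark = "." in readr's locale()
  loc <- readr::locale(decimal_mark = ",")

  parsed <- readr::parse_number(Input, locale = loc)
  print(parsed)

  # Grouping marks and surrounding text are handled too
  print(readr::parse_number("1.234,5 EUR", locale = loc))
}
```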
Answer by Deena
Building on @adibender's solution:
input = '23,67'
as.numeric(gsub(
  # ONLY for strings of the form: digits, comma, digits
  "^([0-9]+),([0-9]+)$",
  # Substitute by the first part, dot, second part
  # (backslashes are doubled so the regex engine sees \1 and \2)
  "\\1.\\2",
  input
))
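I think that is a safer match: strings that are not exactly digits-comma-digits are left unchanged instead of being partially rewritten.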
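To see what the anchored pattern buys, here is a small check with made-up inputs: only strings of the exact form digits-comma-digits are rewritten, so anything else (several commas, letters) is left as-is and becomes NA in as.numeric(). The replacement must be written "\\1.\\2" in R source so that the backreferences reach the regex engine.

```r
inputs <- c("23,67", "1,223,765", "abc")

# as.numeric() warns when it produces NA, so the warning is suppressed here
converted <- suppressWarnings(
  as.numeric(gsub("^([0-9]+),([0-9]+)$", "\\1.\\2", inputs))
)
print(converted)
```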
Answer by Emiliano
As stated by , it's much easier to do this while importing a file. The recently released readr package has a very useful feature, locale, well explained here, that lets you import numbers with a comma decimal mark by passing locale = locale(decimal_mark = ",") as an argument.
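A minimal sketch of importing a file with readr and a comma decimal mark, guarded so it only runs if readr is installed. The temporary one-column file, its column name x, and the ";" delimiter are assumptions made for illustration; col_types pins the column to double so readr does not have to guess.

```r
if (requireNamespace("readr", quietly = TRUE)) {
  # Write a small one-column file that uses comma decimals
  tmp <- tempfile(fileext = ".txt")
  writeLines(c("x", "1,223", "12,232", "23,0"), tmp)

  df <- readr::read_delim(
    tmp,
    delim = ";",
    locale = readr::locale(decimal_mark = ","),
    col_types = readr::cols(x = readr::col_double())
  )
  print(df$x)
  unlink(tmp)  # clean up the temporary file
}
```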
Answer by Sextus Empiricus
The answer by adibender does not work when there are multiple commas.
In that case, the suggestion from use554546 and the answer from Deena can be used.
Input = c("1,223,765", "122,325,000", "23,054")
as.numeric(gsub(",", "", Input))
output:
[1] 1223765 122325000 23054
The function gsub replaces all occurrences. The function sub replaces only the first.
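A quick illustration of the sub/gsub difference, using one value from the example above. Here the commas are thousands separators, which is why they are removed rather than turned into periods.

```r
x <- "1,223,765"

sub(",", "", x)   # only the first comma is removed: "1223,765"
gsub(",", "", x)  # every comma is removed: "1223765"

# Only the gsub() result converts cleanly to a number
as.numeric(gsub(",", "", x))
```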