Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/15236440/

Time: 2020-09-09 01:50:01  Source: igfitidea

as.numeric with comma decimal separators?

Tags: string, r, number-formatting

Asked by Fhnuzoag

I have a large vector of strings of the form:

Input = c("1,223", "12,232", "23,0")

etc. That is, the decimals are separated by commas instead of periods. I want to convert this vector into a numeric vector. Unfortunately, as.numeric(Input) just outputs NA.

My first instinct would be to go to strsplit, but it seems to me that this will likely be very slow. Does anyone have any idea of a faster option?

There's an existing question that suggests read.csv2, but the strings in question are not directly read in that way.

Answer by adibender

as.numeric(sub(",", ".", Input, fixed = TRUE))

should work.
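
A minimal sketch of this one-liner applied to the question's vector. With fixed = TRUE, sub() treats "," as a literal string rather than a regular expression, and it replaces only the first comma in each element, which is enough when each value has a single decimal comma.

```r
# The example vector from the question: decimal commas instead of points
Input <- c("1,223", "12,232", "23,0")

# Replace the first comma in each string with a point (fixed = TRUE: literal
# match, no regex), then convert the resulting strings to numbers
Output <- as.numeric(sub(",", ".", Input, fixed = TRUE))
print(Output)
# [1]  1.223 12.232 23.000
```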

Answer by sebastian-c

scan(text=Input, dec=",")
## [1]  1.223 12.232 23.000

But it depends on how long your vector is. I used rep(Input, 1e6) to make a long vector and my machine just hangs. 1e4 is fine, though. @adibender's solution is much faster. If we run on 1e4, it is a lot faster:

Unit: milliseconds
         expr        min         lq     median         uq        max neval
  adibender()   6.777888   6.998243   7.119136   7.198374   8.149826   100
 sebastianc() 504.987879 507.464611 508.757161 510.732661 517.422254   100
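
A base R sketch for checking that the two answers agree and for timing them on your own machine, without an extra benchmarking package. The wrapper names via_sub and via_scan are hypothetical, and quiet = TRUE only silences scan()'s "Read N items" message. Timings will differ from the table above.

```r
Input <- c("1,223", "12,232", "23,0")
long_input <- rep(Input, 1e4)  # 30,000 strings, the size used in the timing above

# Approach 1 (adibender): swap the comma for a point, then convert
via_sub <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))

# Approach 2 (sebastian-c): scan() reads each element as a line of text,
# with "," declared as the decimal mark
via_scan <- function(x) scan(text = x, dec = ",", quiet = TRUE)

# Both approaches should produce the same numbers
stopifnot(isTRUE(all.equal(via_sub(long_input), via_scan(long_input))))

# Rough elapsed times in seconds; expect the scan() version to be slower
print(system.time(via_sub(long_input)))
print(system.time(via_scan(long_input)))
```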

Answer by Ricardo Saporta

Also, if you are reading in the raw data, read.table and all the associated functions have a dec argument, e.g.:

read.table("file.txt", dec=",")

When all else fails, gsub and sub are your friends.
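
A self-contained sketch of the dec argument. Instead of "file.txt" it passes the data inline through read.table's text argument; the sample data and the column name value are illustrative. read.csv2(), the function mentioned in the question, already defaults to dec = "," (and sep = ";").

```r
# Inline data standing in for "file.txt"
raw <- "1,223\n12,232\n23,0"

# dec = "," declares the comma as the decimal mark, so the column
# is converted to numeric instead of being left as character
df <- read.table(text = raw, dec = ",")
print(df$V1)
# [1]  1.223 12.232 23.000

# read.csv2() uses dec = "," and sep = ";" by default, with a header row
df2 <- read.csv2(text = "value\n1,223\n12,232\n23,0")
print(df2$value)
```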

Answer by tspano

The readr package has a function to parse numbers from strings. You can set many options via the locale argument.

For comma as decimal separator you can write:

readr::parse_number(Input, locale = readr::locale(decimal_mark = ","))
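
A small sketch wrapping that call, assuming the readr package is installed (the helper name parse_decimal_comma is hypothetical). When only decimal_mark = "," is given, readr's locale() switches the grouping mark to ".", so parse_number() also skips "." as a thousands separator.

```r
# Hypothetical helper; requires the readr package at call time
parse_decimal_comma <- function(x) {
  # decimal_mark = "," implies grouping_mark = "." in readr::locale()
  readr::parse_number(x, locale = readr::locale(decimal_mark = ","))
}

Input <- c("1,223", "12,232", "23,0")
if (requireNamespace("readr", quietly = TRUE)) {
  print(parse_decimal_comma(Input))
  # "." is treated as a grouping mark here, so "1.234,5" parses as 1234.5
  print(parse_decimal_comma("1.234,5"))
}
```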

Answer by Deena

Building on @adibender's solution:

input = '23,67'
as.numeric(gsub(
  # Only match strings of the form: digits, comma, digits
  "^([0-9]+),([0-9]+)$",
  # Replace with the first group, a dot, then the second group
  # (the backslashes must be doubled inside an R string)
  "\\1.\\2",
  input
))

I guess that is a safer match...
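
A sketch of what "safer" means in practice, using an illustrative vector: strings that are not exactly digits, comma, digits (several commas, a minus sign) are left unchanged by the substitution and therefore become NA, rather than being silently converted to a wrong value. The warnings from as.numeric() are suppressed on purpose.

```r
Input <- c("1,223", "12,232", "23,0", "1,223,765", "-4,5")

# Only strings made of digits, one comma, digits are rewritten; anything else
# is left as-is, so as.numeric() turns it into NA
converted <- suppressWarnings(
  as.numeric(gsub("^([0-9]+),([0-9]+)$", "\\1.\\2", Input))
)
print(converted)
# [1]  1.223 12.232 23.000     NA     NA
```

If negative values are expected, the pattern would need an optional leading sign, for example "^(-?[0-9]+),([0-9]+)$".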

Answer by Emiliano

As stated by , it's much easier to do this while importing a file. The recently released readr package has a very useful feature, locale, well explained here, that lets the user import numbers with a comma decimal mark by passing locale = locale(decimal_mark = ",") as an argument.
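
A sketch of that import path, assuming readr is installed. The temporary file, the column name value, and the helper read_decimal_comma are all illustrative; the explicit col_types just makes the numeric column type visible instead of relying on type guessing.

```r
# A small semicolon-separated file with decimal commas, standing in for real data
path <- tempfile(fileext = ".csv")
writeLines(c("value", "1,223", "12,232", "23,0"), path)

# Hypothetical helper; requires the readr package at call time
read_decimal_comma <- function(file) {
  readr::read_delim(
    file,
    delim = ";",                                  # field separator
    locale = readr::locale(decimal_mark = ","),   # comma is the decimal mark
    col_types = readr::cols(value = readr::col_double())
  )
}

if (requireNamespace("readr", quietly = TRUE)) {
  df <- read_decimal_comma(path)
  print(df$value)
}
```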

Answer by Sextus Empiricus

The answer by adibender does not work when there are multiple commas.

In that case the suggestion from use554546 and answer from Deena can be used.

Input = c("1,223,765", "122,325,000", "23,054")
as.numeric(gsub(",", "", Input))

Output:

[1] 1223765 122325000 23054

The function gsub replaces all occurrences. The function sub replaces only the first.
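
A small illustration of that difference, plus one caveat worth checking against your own data: stripping every comma treats commas as thousands separators, which is a different convention from the question's decimal commas.

```r
x <- "1,223,765"

sub(",", "", x)    # only the first comma is removed
# [1] "1223,765"
gsub(",", "", x)   # every comma is removed
# [1] "1223765"

# With decimal commas this gives the wrong value: "23,0" becomes 230, not 23
as.numeric(gsub(",", "", "23,0"))
# [1] 230
```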