Original question: http://stackoverflow.com/questions/4931545/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) on Stack Overflow.
Converting string to numeric
Asked by eliavs
I've imported a test file and tried to make a histogram:
pichman <- read.csv(file="picman.txt", header=TRUE, sep="/t")
hist <- as.numeric(pichman$WS)
However, I get numbers that differ from the values in my dataset. Originally I thought this was because the column contained text, so I deleted the text:
table(pichman$WS)
ws <- pichman$WS[pichman$WS!="Down" & pichman$WS!="NoData"]
However, I am still getting very high numbers. Does anyone have an idea?
Answered by csgillespie
I suspect you are having a problem with factors. For example,
> x = factor(4:8)
> x
[1] 4 5 6 7 8
Levels: 4 5 6 7 8
> as.numeric(x)
[1] 1 2 3 4 5
> as.numeric(as.character(x))
[1] 4 5 6 7 8
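To see why this happens, it can help to look at how a factor is stored: as integer codes that index into a vector of levels. The following is a minimal sketch, where ws_demo is a made-up stand-in for the WS column (the values are invented for illustration, not taken from the asker's file):

```r
# Invented values standing in for pichman$WS (not the asker's real data)
ws_demo <- factor(c("12.5", "3.1", "Down", "3.1", "NoData", "7"))

levels(ws_demo)      # the distinct values, sorted as text
as.integer(ws_demo)  # the internal codes: positions in levels(), not the values

# Going through character recovers the numbers; non-numeric text becomes NA
# (suppressWarnings hides the "NAs introduced by coercion" warning)
ws_num <- suppressWarnings(as.numeric(as.character(ws_demo)))
ws_num
```

Calling as.numeric directly on the factor returns the codes, which is one plausible explanation for numbers that do not match the dataset.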
Some comments:
- You mention that your vector contains the characters "Down" and "NoData". What do you expect/want as.numeric to do with these values?
- In read.csv, try using the argument stringsAsFactors=FALSE.
- Are you sure it's sep="/t" and not sep="\t"?
- Use the command head(pichman) to check the first few rows of your data.
- Also, it's very tricky to guess what your problem is when you don't provide data. A minimal working example is always preferable. For example, I can't run the command pichman <- read.csv(file="picman.txt", header=TRUE, sep="/t") since I don't have access to the data set.
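These suggestions might be combined as follows. This is a sketch under stated assumptions: the file contents are invented and passed through the text argument so the example runs without picman.txt, the column name Hour is hypothetical, and na.strings is an extra argument not mentioned in the answer that asks read.csv to treat "Down" and "NoData" as missing values.

```r
# Invented tab-separated data standing in for picman.txt
raw <- "Hour\tWS\n1\t4.2\n2\tDown\n3\t7.5\n4\tNoData\n5\t3.1"

pichman <- read.csv(text = raw, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE,
                    na.strings = c("Down", "NoData"))

head(pichman)  # check the first few rows
str(pichman)   # WS should now be numeric, with NA where the text was

# plot = FALSE only computes the bins; omit it to draw the histogram.
# hist() ignores the NA values.
h <- hist(pichman$WS, plot = FALSE)
h$counts
```

With a real file, the text argument would be replaced by file="picman.txt".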
Answered by Joris Meys
As csgillespie said, stringsAsFactors defaults to TRUE (in R versions before 4.0.0), which converts any text column to a factor. So even after deleting the text, you still have a factor in your data frame.
Regarding the conversion, there is a more efficient way to do it, so I put it here as a reference:
> x <- factor(sample(4:8,10,replace=T))
> x
[1] 6 4 8 6 7 6 8 5 8 4
Levels: 4 5 6 7 8
> as.numeric(levels(x))[x]
[1] 6 4 8 6 7 6 8 5 8 4
The output above shows that it works.
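One way to read the expression as.numeric(levels(x))[x]: the levels are converted to numbers once, and the factor is then used as an index, which R treats as its integer codes. A small sketch with a fixed vector (chosen here for reproducibility instead of sample):

```r
x <- factor(c(6, 4, 8, 6, 7))

lev_num <- as.numeric(levels(x))  # one conversion per level, not per element
codes   <- as.integer(x)          # position of each element in levels(x)

# Indexing by the codes gives the same result as indexing by the factor itself
via_codes  <- lev_num[codes]
via_factor <- as.numeric(levels(x))[x]
identical(via_codes, via_factor)
```

Because only the handful of levels is parsed from text, the work no longer grows with the length of the vector in the way as.numeric(as.character(x)) does.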
The timings:
> x <- factor(sample(4:8,500000,replace=T))
> system.time(as.numeric(as.character(x)))
user system elapsed
0.11 0.00 0.11
> system.time(as.numeric(levels(x))[x])
user system elapsed
0 0 0
It's a big improvement, although the conversion is not always the bottleneck. It does become important if you have a big data frame and many columns to convert.
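For the many-columns case, one possible approach is to apply the conversion to every factor column at once. This is a sketch, assuming an invented data frame df and a helper named factor_to_numeric that is introduced here for illustration (it is not part of base R):

```r
# Hypothetical helper: convert a factor whose levels are numbers to numeric
factor_to_numeric <- function(f) as.numeric(levels(f))[f]

# Invented data frame with two factor columns and one character column
df <- data.frame(a = factor(c("4", "8", "6")),
                 b = factor(c("1.5", "2.5", "1.5")),
                 id = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

is_fac <- vapply(df, is.factor, logical(1))  # which columns are factors
df[is_fac] <- lapply(df[is_fac], factor_to_numeric)

str(df)  # a and b are now numeric; id is untouched
```

Note that this converts every factor column, so a factor holding genuine text labels would turn into NA values with a warning; restricting is_fac to the columns you know are numeric avoids that.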