string Converting a string to a number

Question

Asked by eliavs

I've imported a test file and tried to make a histogram

pichman <- read.csv(file="picman.txt", header=TRUE, sep="/t")   
hist <- as.numeric(pichman$WS)

However, I get different numbers from the values in my dataset. Originally I thought that this was because I had text, so I deleted the text:

table(pichman$WS)    
ws <- pichman$WS[pichman$WS!="Down" & pichman$WS!="NoData"]

However, I am still getting very high numbers. Does anyone have an idea why?

Answer 1

Answered by csgillespie

I suspect you are having a problem with factors. For example,

> x = factor(4:8)
> x
[1] 4 5 6 7 8
Levels: 4 5 6 7 8
> as.numeric(x)
[1] 1 2 3 4 5
> as.numeric(as.character(x))
[1] 4 5 6 7 8

Some comments:

You mention that your vector contains the strings "Down" and "NoData". What do you expect/want as.numeric to do with these values?
In read.csv, try using the argument stringsAsFactors=FALSE.
Are you sure it's sep="/t" and not sep="\t"?
Use the command head(pichman) to check the first few rows of your data.
Also, it's very tricky to guess what your problem is when you don't provide data. A minimal working example is always preferable. For example, I can't run the command pichman <- read.csv(file="picman.txt", header=TRUE, sep="/t") since I don't have access to the data set.
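
The comments above can be combined into one small, reproducible sketch. The data here is invented as a stand-in for picman.txt (only the WS column name and the "Down"/"NoData" markers come from the question), and treating those markers as missing values through na.strings is an assumption about what the asker wants, not something the question states.

```r
# Hypothetical stand-in for picman.txt: a tab-separated table with a WS column.
lines <- c("ID\tWS", "1\t4.5", "2\tDown", "3\t12", "4\tNoData", "5\t7.25")

pichman <- read.csv(text = paste(lines, collapse = "\n"),
                    header = TRUE,
                    sep = "\t",                        # a tab, not "/t"
                    stringsAsFactors = FALSE,          # keep text columns as character
                    na.strings = c("Down", "NoData"))  # assumption: markers mean "missing"

head(pichman)     # check the first few rows
str(pichman$WS)   # WS should now be numeric, with NA for the markers

# Drop the missing values before binning.
ws <- pichman$WS[!is.na(pichman$WS)]
bins <- hist(ws, plot = FALSE)  # remove plot = FALSE to draw the histogram
```

With a real file, replace the text= argument with file="picman.txt"; the other arguments stay the same.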

Answer 2

Answered by Joris Meys

As csgillespie said, stringsAsFactors defaults to TRUE (in R versions before 4.0.0), which converts any text column to a factor. So even after deleting the text, you still have a factor in your data frame.
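
A minimal sketch of that point, using an invented WS vector rather than the asker's data: filtering out the text values leaves the factor and its levels in place, so as.numeric still returns level codes.

```r
# Hypothetical WS column, as read.csv would produce it with stringsAsFactors = TRUE.
WS <- factor(c("10", "Down", "4", "NoData", "25"))

# The same filter as in the question.
ws <- WS[WS != "Down" & WS != "NoData"]

is.factor(ws)                 # still a factor after subsetting
levels(ws)                    # "Down" and "NoData" are still listed as levels
as.numeric(ws)                # internal level codes, not the data values
as.numeric(as.character(ws))  # the data values themselves
```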

Regarding the conversion, there is a more efficient way to do it, so I put it here as a reference:

> x <- factor(sample(4:8,10,replace=T))
> x
 [1] 6 4 8 6 7 6 8 5 8 4
Levels: 4 5 6 7 8
> as.numeric(levels(x))[x]
 [1] 6 4 8 6 7 6 8 5 8 4

To show it works.

The timings:

> x <- factor(sample(4:8,500000,replace=T))
> system.time(as.numeric(as.character(x)))
   user  system elapsed 
   0.11    0.00    0.11 
> system.time(as.numeric(levels(x))[x])
   user  system elapsed 
      0       0       0

It's a big improvement, although this conversion is not always the bottleneck. It becomes important, however, if you have a big data frame and a lot of columns to convert.
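
For the many-columns case, one possible sketch is to wrap the levels-based conversion in a helper and apply it to every factor column. The helper name factor_to_numeric and the example data frame are hypothetical, and the sketch assumes that every factor column holds numbers; levels that are not numeric would come out as NA with a warning.

```r
# Hypothetical helper: convert a factor whose levels are numbers to numeric.
factor_to_numeric <- function(f) as.numeric(levels(f))[f]

# Hypothetical data frame; stringsAsFactors = TRUE mimics the old read.csv default.
df <- data.frame(a = c("4", "8", "6"),
                 b = c("1.5", "2.5", "1.5"),
                 id = 1:3,
                 stringsAsFactors = TRUE)

# Find the factor columns and convert only those.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], factor_to_numeric)

str(df)  # a and b are now numeric; id is untouched
```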
