Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) and keep the same license. Original StackOverflow question: http://stackoverflow.com/questions/16516593/

Time: 2020-09-09 01:55:20  Source: igfitidea

Convert from lowercase to uppercase all values in all character variables in dataframe

Tags: r, string, uppercase

Asked by user702432

I have a mixed data frame of character and numeric variables:

city,hs_cd,sl_no,col_01,col_02,col_03
Austin,1,2,,46,Female
Austin,1,3,,32,Male
Austin,1,4,,27,Male
Austin,1,5,,20,Female
Austin,2,2,,42,Female
Austin,2,1,,52,Male
Austin,2,3,,25,Male
Austin,2,4,,22,Female
Austin,3,3,,30,Female
Austin,3,1,,65,Female

I want to convert all the lower-case characters in the dataframe to uppercase. Is there any way to do this in one shot without doing it repeatedly over each character-variable?

Answered by juba

Starting with the following sample data:

df <- data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],stringsAsFactors=FALSE)

  v1 v2 v3
1  a  1  j
2  b  2  k
3  c  3  l
4  d  4  m
5  e  5  n

You can use:

# stringsAsFactors = FALSE keeps the uppercased columns as character on R < 4.0
data.frame(lapply(df, function(v) {
  if (is.character(v)) toupper(v) else v
}), stringsAsFactors = FALSE)

Which gives:

  v1 v2 v3
1  A  1  J
2  B  2  K
3  C  3  L
4  D  4  M
5  E  5  N
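
A variant of the same idea, given here as a minimal sketch and not as part of the original answer: assigning into `df[]` keeps the object a data frame, so the outer `data.frame()` call is not needed. The helper name `upper_chars` is hypothetical.

```r
# Hypothetical helper: uppercase character columns, leave all others untouched.
upper_chars <- function(d) {
  # d[] <- ... replaces the columns but keeps the data frame structure
  d[] <- lapply(d, function(v) if (is.character(v)) toupper(v) else v)
  d
}

df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = letters[10:14],
                 stringsAsFactors = FALSE)
res <- upper_chars(df)
str(res)
```

Because the non-character columns are returned unchanged, `v2` should still be an integer column afterwards.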

Answered by Trenton Hoffman

With the dplyr package you can also use the mutate_all() function in combination with toupper(). This affects character and factor columns alike (factors come back as character), and because toupper() always returns character, numeric columns are converted to character as well.

library(dplyr)
df <- mutate_all(df, .funs = toupper)  # the argument is named .funs, not funs

Answered by Shalini Baranwal

It's simple with the apply function in R:

df <- apply(df, 2, toupper)

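There is no need to check whether a column is character or any other type. Note, however, that apply() first coerces the data frame to a matrix, so the result is a character matrix in which numeric columns have become strings.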
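
The coercion can be seen in a short sketch. The three-row sample data and the name `df_up` are assumptions made for illustration only.

```r
df <- data.frame(v1 = letters[1:3], v2 = 1:3, stringsAsFactors = FALSE)

m <- apply(df, 2, toupper)
is.matrix(m)     # the result is a matrix, no longer a data frame
is.character(m)  # every cell, including the former integers, is now a string

# Wrapping the result restores a data frame, but v2 stays character
# unless it is converted back explicitly.
df_up <- as.data.frame(m, stringsAsFactors = FALSE)
df_up$v2 <- as.integer(df_up$v2)
str(df_up)
```

If the numeric columns matter, one of the selective approaches (checking `is.character()` per column) avoids this round trip.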

Answered by OFish

A side comment for those using any of these answers. Juba's answer is great, as it is very selective when your variables are either numeric or character strings. If, however, you have a combination (e.g. a1, b1, a2, b2), it will not convert the characters properly.

As @Trenton Hoffman notes,

library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; mutate_all() replaces them
df <- mutate_all(df, toupper)

affects both character and factor classes and works for "mixed variables": if your variable contains both a letter and a number (e.g. a1), the whole value is converted. Note that toupper() returns a character vector, so factor columns come back as character. Overall this isn't much of a concern, but it matters if you end up wanting to match data frames, for example:

df3 <- df1[df1$v1 %in% df2$v1,]

where df1 has been converted and df2 is a non-converted data frame or similar, this may cause problems. The workaround is to convert df2 as well before matching:

library(dplyr)
df2 <- df2 %>% mutate_at(vars(v1), toupper)  # only column v1
# or
df2 <- df2 %>% mutate_all(toupper)           # every column
# and then
df3 <- df1[df1$v1 %in% df2$v1, ]

Knowing this comes in handy if you work with genomic data.
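
The matching problem can be reproduced in base R with a small sketch. The sample values below are invented for illustration, and normalizing the key on both sides at comparison time is an alternative I am assuming here, not something the answer itself proposes.

```r
df1 <- data.frame(v1 = c("A1", "B1", "C2"), stringsAsFactors = FALSE)  # already uppercase
df2 <- data.frame(v1 = c("a1", "c2", "d3"), stringsAsFactors = FALSE)  # not converted

# Case mismatch: %in% compares strings exactly, so nothing matches.
n_naive <- nrow(df1[df1$v1 %in% df2$v1, , drop = FALSE])

# Uppercasing the key on both sides makes the comparison case-insensitive
# without modifying either data frame.
df3 <- df1[toupper(df1$v1) %in% toupper(df2$v1), , drop = FALSE]
df3$v1
```

This leaves df2 untouched, which may be preferable when the original casing has to be preserved elsewhere.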

Answered by mmann1123

If you need to deal with data frames that include factors, you can use:

df = data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],v4=as.factor(letters[1:5]),v5=runif(5),stringsAsFactors=FALSE)

df
  v1 v2 v3 v4        v5
1  a  1  j  a 0.1774909
2  b  2  k  b 0.4405019
3  c  3  l  c 0.7042878
4  d  4  m  d 0.8829965
5  e  5  n  e 0.9702505


sapply(df,class)
         v1          v2          v3          v4          v5
"character"   "integer" "character"    "factor"   "numeric"

Use mutate_if() (the replacement for the now-deprecated mutate_each_() and funs()) to convert factors to character, then convert all character columns to uppercase:

library(dplyr)
# convert factor to character, then uppercase every character column
upper_it <- function(X) {
  X %>%
    mutate_if(is.factor, as.character) %>%
    mutate_if(is.character, toupper)
}

This gives:

upper_it(df)
  v1 v2 v3 v4        v5
1  A  1  J  A 0.1774909
2  B  2  K  B 0.4405019
3  C  3  L  C 0.7042878
4  D  4  M  D 0.8829965
5  E  5  N  E 0.9702505

while the column classes are now:

sapply( upper_it(df),class)
         v1          v2          v3          v4          v5
"character"   "integer" "character" "character"   "numeric"

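If the factor columns should stay factors instead of becoming character, one possible approach (an assumption on my part, not part of the answer above) is to uppercase the factor levels. The helper name `upper_keep_class` is hypothetical.

```r
# Hypothetical helper: uppercase character columns and factor levels,
# keeping each column's class.
upper_keep_class <- function(d) {
  d[] <- lapply(d, function(v) {
    if (is.character(v)) {
      toupper(v)
    } else if (is.factor(v)) {
      levels(v) <- toupper(levels(v))  # relabel the levels; v stays a factor
      v
    } else {
      v
    }
  })
  d
}

df <- data.frame(v1 = letters[1:5], v2 = 1:5,
                 v4 = factor(letters[1:5]), stringsAsFactors = FALSE)
out <- upper_keep_class(df)
sapply(out, class)
```

One thing to keep in mind: if two levels differ only by case (say "a" and "A"), relabeling them to the same value merges them into one level.
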
Answered by André Cordeiro Valério

Another alternative is to combine mutate_if() from dplyr with str_to_upper() from stringr, both of which are part of the tidyverse:

df %>% mutate_if(is.character, str_to_upper) -> df

This converts all string variables in the data frame to upper case. str_to_lower() does the opposite.
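
In dplyr 1.0.0 and later, the scoped verbs such as mutate_if() are superseded by across(). A minimal sketch of the equivalent call, assuming dplyr >= 1.0.0 is installed and using base R's toupper() in place of str_to_upper():

```r
library(dplyr)

df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = letters[10:14],
                 stringsAsFactors = FALSE)

# across() applies the function to the selected columns;
# where(is.character) selects only the character columns.
df <- df %>% mutate(across(where(is.character), toupper))
str(df)
```

stringr::str_to_upper() can be dropped in instead of toupper() if locale-aware case conversion is needed.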

Answered by Vaibhav Kabdwal

Alternatively, if you just want to convert one particular column to uppercase (here the first one), use the code below:

df[[1]] <- toupper(df[[1]])
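
The same idea extends to a handful of columns selected by name. This is a sketch under the assumption that the chosen columns are character; the vector `cols` is a hypothetical selection.

```r
df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = letters[10:14],
                 stringsAsFactors = FALSE)

cols <- c("v1", "v3")                  # hypothetical choice of columns
df[cols] <- lapply(df[cols], toupper)  # only these columns are touched
str(df)
```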
