Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23413331/

Date: 2020-09-09 02:16:22  Source: igfitidea

How to remove last n characters from every element in the R vector

Tags: r, string

Asked by LucasSeveryn

I am very new to R, and I could not find a simple example online of how to remove the last n characters from every element of a vector (array?).

I come from a Java background, so what I would like to do is iterate over every element of a$data and remove the last 3 characters from each one.

How would you go about it?

Answered by nfmcclure

Here is an example of what I would do. I hope it's what you're looking for.

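char_array = c("foo_bar","bar_foo","apple","beer")
# stringsAsFactors=FALSE keeps the column as character (before R 4.0 the default was factor)
a = data.frame("data"=char_array,"data2"=1:4, stringsAsFactors=FALSE)
# substr() and nchar() are vectorized, so no explicit loop is needed
a$data = substr(a$data,1,nchar(a$data)-3)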

a should now contain:

  data data2
1 foo_     1
2 bar_     2
3   ap     3
4    b     4
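
As a minimal sketch of how the same idea generalizes to any n, the substr()/nchar() call above can be wrapped in a small function. The name drop_last_n is hypothetical (it is not part of the answer or of base R), and the as.character() step is an added assumption so that factor input also works:

```r
# Hypothetical helper (not from the original answer): drop the last n
# characters from every element of a vector of strings.
drop_last_n <- function(x, n) {
  x <- as.character(x)          # assumption: coerce factors to character first
  substr(x, 1, nchar(x) - n)    # vectorized over all elements of x
}

char_array <- c("foo_bar", "bar_foo", "apple", "beer")
result <- drop_last_n(char_array, 3)
result
# [1] "foo_" "bar_" "ap"   "b"
```

Because substr() returns an empty string when the stop position is before the start position, elements shorter than n come back as "" rather than raising an error.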

Answered by Matthew Plourde

Here's a way with gsub:

cs <- c("foo_bar","bar_foo","apple","beer")
gsub('.{3}$', '', cs)
# [1] "foo_" "bar_" "ap"   "b"
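
If the number of characters to cut is held in a variable, the pattern can be built with sprintf() instead of being hard-coded. This is only a sketch: the variable names n, pattern, and trimmed are illustrative, and it assumes n is a non-negative whole number. Since the pattern is anchored at the end of the string, at most one match is possible, so sub() behaves the same as gsub() here:

```r
# Assumption: n is a non-negative whole number chosen by the caller.
n <- 3
cs <- c("foo_bar", "bar_foo", "apple", "beer")

pattern <- sprintf(".{%d}$", n)   # builds the string ".{3}$"
trimmed <- sub(pattern, "", cs)   # remove exactly n characters before the end
trimmed
# [1] "foo_" "bar_" "ap"   "b"
```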

Answered by Blaszard

Although this is mostly the same as the answer by @nfmcclure, I prefer using the stringr package, as it provides a set of functions whose names are more consistent and descriptive than those in base R (in fact, I always google "how to get the number of characters in R" because I can't remember the name nchar()).

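library(stringr)
# A negative end counts from the end of the string: -4 keeps everything
# up to the fourth-from-last character, dropping the last 3.
str_sub(iris$Species, end = -4)
# or, equivalently:
str_sub(iris$Species, 1, str_length(iris$Species) - 3)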

This removes the last 3 characters from each value in the Species column.

Answered by gagolews

The same may be achieved with the stringi package:

library('stringi')
char_array <- c("foo_bar","bar_foo","apple","beer")
a <- data.frame("data"=char_array, "data2"=1:4)
(a$data <- stri_sub(a$data, 1, -4)) # keep the first through the fourth-from-last character
## [1] "foo_" "bar_" "ap"   "b" 

Answered by krads

Similar to @Matthew_Plourde's answer using gsub, but with a pattern that trims to zero characters, i.e. returns "" when the original string is shorter than the number of characters to cut:

cs <- c("foo_bar","bar_foo","apple","beer","so","a")
gsub('.{0,3}$', '', cs)
# [1] "foo_" "bar_" "ap"   "b"    ""    ""

The difference is that the {0,3} quantifier allows 0 to 3 matches, whereas {3} requires exactly 3 matches; if fewer than 3 characters are available, no match is found and gsub returns the original, unmodified string.
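
To make the contrast concrete, here is a small side-by-side sketch that runs both patterns on strings shorter than three characters. The variable names short, exact, and up_to are illustrative only:

```r
short <- c("beer", "so", "a")

# {3}: needs exactly 3 characters before the end, so shorter strings are left alone
exact <- gsub(".{3}$", "", short)
# {0,3}: accepts 0 to 3 characters before the end, so shorter strings are emptied
up_to <- gsub(".{0,3}$", "", short)

exact   # "b" "so" "a"
up_to   # "b" ""   ""
```

Which behaviour is preferable depends on whether a too-short string should be kept as-is or reduced to an empty string.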

N.B. using {,3} would be equivalent to {0,3}; I simply prefer the latter notation.

See here for more information on regex quantifiers: https://www.regular-expressions.info/refrepeat.html
