Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/21003311/

Date: 2020-09-09 02:12:15  Source: igfitidea

How to combine multiple character columns into a single column in an R data frame

stringr

Asked by SamanthaDS

I am working with Census data and I need to combine four character columns into a single column.

Example:

LOGRECNO STATE COUNTY  TRACT BLOCK
    60    01    001  021100  1053
    61    01    001  021100  1054
    62    01    001  021100  1055
    63    01    001  021100  1056
    64    01    001  021100  1057
    65    01    001  021100  1058

I want to create a new column that adds the strings of STATE, COUNTY, TRACT, and BLOCK together into a single string. Example:

LOGRECNO STATE COUNTY  TRACT BLOCK  BLOCKID
    60    01    001  021100  1053   010010211001053
    61    01    001  021100  1054   010010211001054
    62    01    001  021100  1055   010010211001055
    63    01    001  021100  1056   010010211001056
    64    01    001  021100  1057   010010211001057
    65    01    001  021100  1058   010010211001058

I've tried:

AL_Blocks$BLOCK_ID<- paste(c(AL_Blocks$STATE, AL_Blocks$County, AL_Blocks$TRACT,    AL_Blocks$BLOCK), collapse = "")

But this combines all rows of all four columns into a single string.

Accepted answer by JAponte

Try this:

AL_Blocks$BLOCK_ID<- with(AL_Blocks, paste0(STATE, COUNTY, TRACT, BLOCK))

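There was a typo in County; it should have been COUNTY. Also, you don't need the collapse parameter: paste0 already works element by element, so it returns one string per row, whereas collapse joins all of those results into a single string.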
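
To see why the original attempt produced one long string, it may help to compare the two calls on a small made-up data frame. The name toy and its values are assumptions for illustration, not the asker's data:

```r
# Toy data, assumed for illustration only
toy <- data.frame(STATE = c("01", "01"),
                  COUNTY = c("001", "003"),
                  stringsAsFactors = FALSE)

# c() first stacks both columns into one vector of length 4;
# collapse then glues every element into a single string
paste(c(toy$STATE, toy$COUNTY), collapse = "")
# [1] "0101001003"

# paste0 is vectorized: it pairs the columns element by element,
# giving one string per row
paste0(toy$STATE, toy$COUNTY)
# [1] "01001" "01003"
```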

I hope that helps.

Answer by A5C1D2H2I1M1N2O1R2T1

You can use do.call and paste0. Try:

AL_Blocks$BLOCK_ID <- do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])

Example output:

do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
do.call(paste0, AL_Blocks[2:5])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
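
The do.call form works because a data frame is a list of columns, so each selected column is handed to paste0 as a separate argument. This is mainly convenient when the column names live in a variable. A minimal sketch on made-up data (toy and id_cols are hypothetical names):

```r
# Toy data, assumed for illustration only
toy <- data.frame(STATE = c("01", "02"), COUNTY = c("001", "013"),
                  TRACT = c("021100", "000200"),
                  stringsAsFactors = FALSE)

# Column names held in a variable (hypothetical name id_cols)
id_cols <- c("STATE", "COUNTY", "TRACT")

# do.call passes each selected column to paste0 as its own argument
via_do_call <- do.call(paste0, toy[id_cols])

# The same call written out by hand
by_hand <- paste0(toy$STATE, toy$COUNTY, toy$TRACT)

identical(via_do_call, by_hand)
# [1] TRUE
```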

You can also use unite from "tidyr", like this:

library(tidyr)
library(dplyr)
AL_Blocks %>% 
  unite(BLOCK_ID, STATE, COUNTY, TRACT, BLOCK, sep = "", remove = FALSE)
#   LOGRECNO        BLOCK_ID STATE COUNTY  TRACT BLOCK
# 1       60 010010211001053    01    001 021100  1053
# 2       61 010010211001054    01    001 021100  1054
# 3       62 010010211001055    01    001 021100  1055
# 4       63 010010211001056    01    001 021100  1056
# 5       64 010010211001057    01    001 021100  1057
# 6       65 010010211001058    01    001 021100  1058

where "AL_Blocks" is provided as:

AL_Blocks <- structure(list(LOGRECNO = c("60", "61", "62", "63", "64", "65"), 
    STATE = c("01", "01", "01", "01", "01", "01"), COUNTY = c("001", "001", 
    "001", "001", "001", "001"), TRACT = c("021100", "021100", "021100", 
    "021100", "021100", "021100"), BLOCK = c("1053", "1054", "1055", "1056",
    "1057", "1058")), .Names = c("LOGRECNO", "STATE", "COUNTY", "TRACT", 
    "BLOCK"), class = "data.frame", row.names = c(NA, -6L))

Answer by Kou

You can try this too:

AL_Blocks <- transform(AL_Blocks, BLOCKID = paste(STATE, COUNTY,
                       TRACT, BLOCK, sep = ""))

Answer by Sophia J

You can use the tidyverse package. Set sep = "" because the default separator of unite is an underscore:

library(tidyverse)
DF %>% unite(new_var, STATE, COUNTY, TRACT, BLOCK, sep = "")

Answer by Freda K

Or try this:

# LOGRECNO is left out because the requested ID only combines
# STATE, COUNTY, TRACT and BLOCK
DF$BLOCKID <-
  paste(DF$STATE, DF$COUNTY, 
        DF$TRACT, DF$BLOCK, sep = "")

(Here is a method to set up the dataframe for people coming into this discussion later)

DF <- 
  data.frame(LOGRECNO = c(60, 61, 62, 63, 64, 65),
             STATE = c(1, 1, 1, 1, 1, 1),
             COUNTY = c(1, 1, 1, 1, 1, 1), 
             TRACT = c(21100, 21100, 21100, 21100, 21100, 21100), 
             BLOCK = c(1053, 1054, 1055, 1056, 1057, 1058))
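
Note that the data frame above stores the codes as numbers, so leading zeros such as the 01 in STATE are gone and a plain paste would not reproduce the 15-character ID from the question. One possible workaround, assuming the fixed widths seen in the question's example (2, 3, 6 and 4 digits), is to pad each field with sprintf. The name num_df is hypothetical:

```r
# Numeric codes, as in the setup above (leading zeros are lost)
num_df <- data.frame(STATE = c(1, 1), COUNTY = c(1, 1),
                     TRACT = c(21100, 21100), BLOCK = c(1053, 1054))

# Assumed widths: STATE 2, COUNTY 3, TRACT 6, BLOCK 4 digits.
# "%02.0f" prints a number without decimals, zero-padded to width 2.
num_df$BLOCKID <- with(num_df,
  sprintf("%02.0f%03.0f%06.0f%04.0f", STATE, COUNTY, TRACT, BLOCK))

num_df$BLOCKID
# [1] "010010211001053" "010010211001054"
```

If the data can be re-read, importing these columns as character in the first place avoids the padding step entirely.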

Answer by user11300405

You can both write and read text files with any specified separator string, not necessarily a single-character separator. This is useful when the data already contains practically every common delimiter character, so that no single character can safely serve as a separator. Here are examples of read and write functions:

Write out text with a special separator:

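writeSepText <- function(df, fileName, separator) {
    # Join the fields of each row with the separator string
    lines <- apply(df, 1, paste, collapse = separator)
    con <- file(fileName, open = "w")
    on.exit(close(con))  # close the connection even if writing fails
    writeLines(lines, con)
    invisible(NULL)
}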

Test writing out a text file separated by the string "<break>":

writeSepText(df=as.data.frame(Titanic), fileName="/Users/user/break_sep.txt", separator="<break>")

Read in text files with a special separator string:

readSepText <- function(fileName, separator) {
    con <- file(fileName, open = "r")
    on.exit(close(con))  # close the connection even if reading fails
    lines <- readLines(con)
    # fixed = TRUE treats the separator as a literal string, not a regex
    records <- strsplit(lines, split = separator, fixed = TRUE)
    dataFrame <- data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
    rownames(dataFrame) <- seq_len(nrow(dataFrame))
    dataFrame
}

Test reading in the text file separated by "<break>":

df <- readSepText(fileName="/Users/user/break_sep.txt", separator="<break>"); df
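
The same idea can be checked as a round trip without the fixed path used above. The following is only a sketch on made-up data: toy, sep and path are hypothetical names, and tempfile() stands in for the real file location:

```r
# Small made-up table; "|", "," and ";" appear in the data itself
toy <- data.frame(id = c("a|1", "b,2"), note = c("x;y", "p q"),
                  stringsAsFactors = FALSE)
sep <- "<break>"
path <- tempfile(fileext = ".txt")  # temporary file instead of a fixed path

# Write: one line per row, fields joined by the separator string
writeLines(do.call(paste, c(toy, sep = sep)), path)

# Read: split each line on the literal separator and rebuild the columns
fields <- strsplit(readLines(path), split = sep, fixed = TRUE)
back <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(back) <- names(toy)

all(back == toy)
# [1] TRUE
```

One caveat with this approach: strsplit drops an empty field at the end of a line, so rows whose last column is empty would need extra handling.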