R equivalent of SQL SELECT DISTINCT on two or more fields/variables

Question

Asked by wahalulu

Say I have a data frame df with two or more columns. Is there an easy way to use unique() or another R function to create a subset of the unique combinations of two or more columns?

I know I can use sqldf() and write a simple "SELECT DISTINCT var1, var2, ... varN" query, but I am looking for an R way of doing this.

It occurred to me to try ftable coerced to a data frame and use the field names, but then I also get the cross tabulations of combinations that don't exist in the dataset:

uniques <- as.data.frame(ftable(df$var1, df$var2))
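To make the problem concrete, here is a minimal sketch in base R. The sample data frame and the names crossed and observed are made up for illustration. It shows that the ftable route returns every level of one column paired with every level of the other, and that filtering on the Freq column is one possible workaround.

```r
# Hypothetical sample data: three rows, two combinations actually observed
df <- data.frame(
  var1 = c("a", "a", "b"),
  var2 = c("x", "x", "y")
)

# ftable() tabulates every value of var1 against every value of var2,
# so pairs that never occur (a/y and b/x) appear with a count of zero
crossed <- as.data.frame(ftable(df$var1, df$var2))
print(crossed)

# Keeping only the rows with a non-zero count leaves the observed pairs
observed <- crossed[crossed$Freq > 0, ]
print(observed)
```

This still carries the Freq column and generic column names, which is part of why a more direct approach is preferable.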

Answer 1

Answered by Marek

unique works on a data.frame, so unique(df[c("var1","var2")]) should be what you want.
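As a small worked example (the data below is invented for illustration), selecting the two columns first and then calling unique() gives one row per observed combination:

```r
# Hypothetical sample data with a repeated (var1, var2) pair
df <- data.frame(
  var1  = c("a", "a", "b", "b"),
  var2  = c("x", "x", "y", "z"),
  value = 1:4
)

# Subset to the columns of interest, then drop duplicate rows
combos <- unique(df[c("var1", "var2")])
print(combos)

# The result keeps the row names of the first occurrence in df;
# reset them if a clean 1..n index is preferred
rownames(combos) <- NULL
```

Only var1 and var2 are returned here; the value column is dropped by the subsetting step.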

Another option is distinct from the dplyr package:

df %>% distinct(var1, var2) # or distinct(df, var1, var2)

Note:

For older versions of dplyr (< 0.5.0, 2016-06-24), distinct required an additional step:

df %>% select(var1, var2) %>% distinct

(or, in the older nested style, distinct(select(df, var1, var2))).

Answer 2

Answered by Tjebo

@Marek's answer is obviously correct, but may be outdated. The dplyr version current at the time of writing (0.7.4) allows for even simpler code:

Simply use:

df %>% distinct(var1, var2)

If you want to keep all columns, add .keep_all = TRUE:

df %>% distinct(var1, var2, .keep_all = TRUE)
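A short sketch of the difference between the two calls, assuming dplyr is installed; the sample data and the names pairs_only and first_rows are invented for illustration. With .keep_all = TRUE, the other columns come from the first row of each combination.

```r
library(dplyr)

# Hypothetical sample data: the (a, x) pair occurs twice with different values
df <- data.frame(
  var1  = c("a", "a", "b"),
  var2  = c("x", "x", "y"),
  value = c(10, 20, 30)
)

# Only the two named columns are returned
pairs_only <- df %>% distinct(var1, var2)
print(pairs_only)

# All columns are kept; for (a, x) the first row (value 10) is retained
first_rows <- df %>% distinct(var1, var2, .keep_all = TRUE)
print(first_rows)
```

If a different row per combination is wanted, sort the data frame before calling distinct.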

Answer 3

Answered by sbaniwal

To keep all other variables in df, use this:

unique_rows <- !duplicated(df[c("var1","var2")])

unique.df <- df[unique_rows,]
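For illustration, here is how the logical index behaves on a small made-up data frame. The fromLast variant at the end is not part of the answer above; it is a standard argument of duplicated() shown as one way to keep the last occurrence instead of the first.

```r
# Hypothetical sample data: the (a, x) pair occurs twice
df <- data.frame(
  var1  = c("a", "a", "b"),
  var2  = c("x", "x", "y"),
  value = c(10, 20, 30)
)

# duplicated() marks every repeat of an earlier row as TRUE,
# so negating it selects the first occurrence of each combination
unique_rows <- !duplicated(df[c("var1", "var2")])
unique.df <- df[unique_rows, ]
print(unique.df)

# Keep the last occurrence of each combination instead
last.df <- df[!duplicated(df[c("var1", "var2")], fromLast = TRUE), ]
print(last.df)
```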

Another, less recommended, method uses row.names() (see David's comment below):

unique_rows <- row.names(unique(df[c("var1","var2")]))

unique.df <- df[unique_rows,]

Answer 4

Answered by Zaki

In addition to the answers above, here is the data.table version:

setDT(df)

unique_dt = unique(df, by = c('var1', 'var2'))
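A minimal sketch of the same idea on made-up data, assuming data.table is installed. Note that setDT(df) converts df to a data.table in place; the sketch uses as.data.table() instead, which makes a copy and leaves the original data frame untouched. The names dt and pairs_dt are invented for illustration.

```r
library(data.table)

# Hypothetical sample data: the (a, x) pair occurs twice
df <- data.frame(
  var1  = c("a", "a", "b"),
  var2  = c("x", "x", "y"),
  value = c(10, 20, 30)
)

# Copy into a data.table rather than converting df by reference
dt <- as.data.table(df)

# 'by' restricts the uniqueness check to var1 and var2;
# all columns are kept, taken from the first row of each combination
unique_dt <- unique(dt, by = c("var1", "var2"))
print(unique_dt)

# To return only the two columns, select them before calling unique()
pairs_dt <- unique(dt[, .(var1, var2)])
print(pairs_dt)
```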
