Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/48131812/

Time: 2020-09-14 05:01:16  Source: igfitidea

Get unique values of multiple columns as a new dataframe in pandas

python, pandas, pandas-groupby

Asked by Ofek Ron

Given a pandas DataFrame df with at least the columns C1, C2, and C3, how would you get all the unique (C1, C2, C3) combinations as a new DataFrame?

In other words, something similar to:

SELECT C1,C2,C3
FROM T
GROUP BY C1,C2,C3

I tried this:

print(df.groupby(by=['C1','C2','C3']))

but I'm getting:

<pandas.core.groupby.DataFrameGroupBy object at 0x000000000769A9E8>

Answered by jezrael

I believe you need drop_duplicates if you want all unique triples:

df = df.drop_duplicates(subset=['C1','C2','C3'])
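
As a rough illustration of the drop_duplicates approach, here is a minimal sketch on a made-up frame. The sample values and the extra column `other` are assumptions for illustration; only C1, C2, C3 come from the question. Note that `subset` only controls which columns are compared: the remaining columns are kept, so selecting the three columns first is one way to get a frame that contains nothing but the triples.

```python
import pandas as pd

# Hypothetical sample data; only the column names C1, C2, C3 come from the question.
df = pd.DataFrame({
    "C1": [1, 1, 2, 2],
    "C2": ["a", "a", "b", "b"],
    "C3": [True, True, False, True],
    "other": [10, 20, 30, 40],
})

# Keep the first row for each distinct (C1, C2, C3); the "other" column is retained.
first_rows = df.drop_duplicates(subset=["C1", "C2", "C3"])

# Select only the three columns, then drop repeated rows and renumber the index.
triples = df[["C1", "C2", "C3"]].drop_duplicates().reset_index(drop=True)

print(first_rows)
print(triples)
```

The second form is probably the closer match to the SQL in the question, since that query returns only the grouped columns.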

If you want to use groupby, add first:

df = df.groupby(by=['C1','C2','C3'], as_index=False).first()
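
The groupby variant can be sketched the same way. Again the sample data and the `other` column are hypothetical. Two differences from drop_duplicates are worth keeping in mind: by default groupby sorts the result by the key columns and drops rows whose key contains NaN, and `first()` returns the first non-null value of each non-key column within a group.

```python
import pandas as pd

# Hypothetical sample data; only the column names C1, C2, C3 come from the question.
df = pd.DataFrame({
    "C1": [2, 1, 1, 2],
    "C2": ["b", "a", "a", "b"],
    "C3": ["y", "x", "x", "y"],
    "other": [10, 20, 30, 40],
})

keys = ["C1", "C2", "C3"]

# One row per distinct key; as_index=False keeps the keys as ordinary columns.
grouped = df.groupby(by=keys, as_index=False).first()

# Counting rows per key, roughly what GROUP BY with COUNT(*) would give in SQL.
counts = df.groupby(by=keys).size().reset_index(name="n")

print(grouped)
print(counts)
```

If only the distinct triples are needed, drop_duplicates on the three columns is usually the more direct choice; groupby becomes useful once an aggregate per group is wanted as well.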