Original source: http://stackoverflow.com/questions/48131812/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Get unique values of multiple columns as a new dataframe in pandas
Asked by Ofek Ron
Given a pandas DataFrame df with at least the columns C1, C2, C3, how would you get all the unique (C1, C2, C3) combinations as a new DataFrame?
In other words, something similar to:
SELECT C1,C2,C3
FROM T
GROUP BY C1,C2,C3
I tried this:
print(df.groupby(by=['C1','C2','C3']))
but I'm getting:
<pandas.core.groupby.DataFrameGroupBy object at 0x000000000769A9E8>
Answered by jezrael
I believe you need drop_duplicates if you want all unique triples:
df = df.drop_duplicates(subset=['C1','C2','C3'])
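As a small illustration of the line above, here is a sketch on made-up data (the column D and all values are assumptions, not part of the question). It suggests that passing subset keeps the first full row for each unique triple, including any other columns, while selecting the three columns first gives something closer to the SQL query:

```python
import pandas as pd

# Hypothetical sample data; column D is an extra column not in the question.
df = pd.DataFrame({
    "C1": [1, 1, 2, 2],
    "C2": ["a", "a", "b", "b"],
    "C3": [True, True, False, True],
    "D": [10, 20, 30, 40],
})

# Keeps the first row of each unique (C1, C2, C3) triple, with all columns.
with_all_columns = df.drop_duplicates(subset=["C1", "C2", "C3"])

# Selects the three columns first, so only they appear in the result,
# then renumbers the rows from 0.
only_keys = df[["C1", "C2", "C3"]].drop_duplicates().reset_index(drop=True)

print(with_all_columns)
print(only_keys)
```

If only the key columns matter, the second form is probably the closer match to SELECT C1,C2,C3 ... GROUP BY C1,C2,C3.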
If you want to use groupby, add first:
df = df.groupby(by=['C1','C2','C3'], as_index=False).first()
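The groupby variant can behave a little differently from drop_duplicates, which the following sketch on made-up data tries to show (column D and its values are assumptions for illustration). With default settings, groupby sorts the result by the key columns, and first() takes the first non-null value of each remaining column per group rather than the first row as a whole:

```python
import pandas as pd

# Hypothetical sample data; D has a missing value in the first (1, "a", True) row.
df = pd.DataFrame({
    "C1": [2, 1, 1, 2],
    "C2": ["b", "a", "a", "b"],
    "C3": [True, True, True, False],
    "D": [30.0, float("nan"), 20.0, 40.0],
})

# One row per unique triple; keys come back as ordinary columns (as_index=False)
# and rows are sorted by the keys (sort=True is the default).
grouped = df.groupby(by=["C1", "C2", "C3"], as_index=False).first()

# For comparison: keeps the first full row per triple in original order,
# so the NaN in D survives for the (1, "a", True) triple.
deduped = df.drop_duplicates(subset=["C1", "C2", "C3"])

print(grouped)
print(deduped)
```

When the original row order or exact row contents matter, drop_duplicates is likely the safer choice; groupby is more natural when an aggregation per group is wanted anyway.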