Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/47933213/
List unique values in a Pandas dataframe
Asked by Aerin
I know that
df.name.unique()
will give the unique values in ONE column, 'name'.
For example:
name report year
Coch Jason 2012
Pima Molly 2012
Santa Tina 2013
Mari Jake 2014
Yuma Amy 2014
array(['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], dtype=object)
However, let's say I have ~1000 columns and I want to see the unique values of all columns together.
How do I do it?
Accepted answer by root
Use a dictionary comprehension with unique:
pd.Series({c: df[c].unique() for c in df})
The resulting output:
name [Coch, Pima, Santa, Mari, Yuma]
report [Jason, Molly, Tina, Jake, Amy]
year [2012, 2013, 2014]
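For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame construction is my assumption, reconstructed from the table in the question; only the dictionary comprehension comes from the answer.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "name": ["Coch", "Pima", "Santa", "Mari", "Yuma"],
    "report": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "year": [2012, 2012, 2013, 2014, 2014],
})

# Iterating over a DataFrame yields its column labels, so this builds
# one entry per column holding that column's unique values.
uniques = pd.Series({c: df[c].unique() for c in df})
print(uniques)
```

Since the result is indexed by column name, uniques["year"] retrieves the unique values of a single column afterwards.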
Answer by Bryce Ramgovind
If you would like your results in a list, you can do something like this:
[df[col_name].unique() for col_name in df.columns]
Out:
[array(['Coch', 'Pima', 'Santa', 'Mari', 'Yuma'], dtype=object),
array(['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], dtype=object),
array([2012, 2013, 2014])]
This creates a list of arrays, where each element is the array of unique values for one column, in column order.
If you would like a list of lists instead, you can modify the above to:
[df[i].unique().tolist() for i in df.columns]
Out:
[['Coch', 'Pima', 'Santa', 'Mari', 'Yuma'],
['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
[2012, 2013, 2014]]
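A plain list loses the column names, which may matter with ~1000 columns. As a possible variation (not part of the original answer), the same expression can be keyed by column name. The sample df is again an assumption rebuilt from the question's table, and unique_lists is just an illustrative name.

```python
import json

import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "name": ["Coch", "Pima", "Santa", "Mari", "Yuma"],
    "report": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "year": [2012, 2012, 2013, 2014, 2014],
})

# Same idea as the list comprehension, but keyed by column name.
# .tolist() converts each array of unique values into a plain Python list.
unique_lists = {col: df[col].unique().tolist() for col in df.columns}

# Plain lists of native Python values can be serialized directly.
as_json = json.dumps(unique_lists)
print(as_json)
```

The dictionary keeps the columns in their original order, and each list keeps values in order of first appearance.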
Answer by YOBEN_S
You can use set:
list(map(set,df.values.T))
Out[978]:
[{'Coch', 'Mari', 'Pima', 'Santa', 'Yuma'},
{'Amy', 'Jake', 'Jason', 'Molly', 'Tina'},
{2012, 2013, 2014}]
After putting the result into a Series:
pd.Series(list(map(set,df.values.T)),index=df.columns)
Out[980]:
name {Santa, Pima, Yuma, Coch, Mari}
report {Jason, Amy, Jake, Tina, Molly}
year {2012, 2013, 2014}
dtype: object
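One thing to keep in mind: df.values converts the whole frame into a single NumPy array with a common dtype (object for mixed columns like these), and sets are unordered, so the printed order can vary. A hedged alternative, not from the original answer, builds each set from its own column instead. The sample df is an assumption rebuilt from the question's table.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "name": ["Coch", "Pima", "Santa", "Mari", "Yuma"],
    "report": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "year": [2012, 2012, 2013, 2014, 2014],
})

# Build one set per column directly from the column,
# without first converting the whole frame to a single array.
unique_sets = {c: set(df[c]) for c in df}

# Optionally wrap the result in a Series indexed by column name.
as_series = pd.Series(unique_sets)
print(as_series)
```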
Answer by Álvaro Salgado
I did the following. This collects the unique values from all columns of a dataframe into one set.
unique_values = set()
for col in df:
    # Add every value of this column; the set discards duplicates.
    unique_values.update(df[col])
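Here is a runnable sketch of that loop, together with an equivalent one-liner using set().union, which is my own variation rather than part of the answer. The sample df is an assumption rebuilt from the question's table.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "name": ["Coch", "Pima", "Santa", "Mari", "Yuma"],
    "report": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "year": [2012, 2012, 2013, 2014, 2014],
})

# Loop version: merge every column's values into a single set.
unique_values = set()
for col in df:
    unique_values.update(df[col])

# Equivalent one-liner: union of all columns, each passed as an iterable.
all_values = set().union(*(df[c] for c in df))

print(len(unique_values))
```

Note that this mixes values from every column, so the information about which column a value came from is lost.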