Python: create a set from a Series in pandas

Notice: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/39551566/

Time: 2020-08-19 22:27:15  Source: igfitidea

Create a set from a series in pandas

python pandas dataframe series kaggle

Asked by Julio Arriaga

I have a dataframe extracted from Kaggle's San Francisco Salaries dataset (https://www.kaggle.com/kaggle/sf-salaries), and I want to create a set of the values of one column, for instance 'Status'.

This is what I have tried, but it returns a list of all the records instead of the set (sf is the name of my data frame).

a = set(sf['Status'])
print(a)

According to this page, this should work: "How to construct a set out of list items in python?"

Answered by grechut

If you only need the unique values, you can use the unique method, which returns them as a NumPy array. If you want a Python set, use set(some_series).

In [1]: s = pd.Series([1, 2, 3, 1, 1, 4])

In [2]: s.unique()
Out[2]: array([1, 2, 3, 4])

In [3]: set(s)
Out[3]: {1, 2, 3, 4}

If you have a DataFrame, first select the series out of it (some_data_frame['<col_name>']).
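As a minimal sketch of the steps in this answer, the snippet below selects one column from a DataFrame and builds both the unique-value array and a Python set. The DataFrame contents here are hypothetical stand-ins, not the actual Kaggle data; only the names sf and 'Status' come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle data; the real file has many more
# rows and columns, and its 'Status' values may differ.
sf = pd.DataFrame({"Status": ["FT", "PT", "FT", "FT", "PT"]})

status_series = sf["Status"]             # selecting one column gives a Series
unique_values = status_series.unique()   # NumPy array, in order of first appearance
status_set = set(status_series)          # plain Python set (unordered)

print(unique_values)
print(status_set)
```

Use unique() when the order of first appearance matters or an array is convenient, and set(...) when you need set operations such as membership tests or intersections.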

Answered by Adrien Pacifico

For a large series with many duplicates, the execution time of set(some_series) grows quickly with the size of the series, because every element, duplicates included, is hashed individually in Python.

A better practice is to use set(some_series.unique()), which lets pandas remove the duplicates before the set is built.

A simple example showing a 16x difference in execution time (screenshot in the original answer).
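Since the original timing screenshot is not available here, the following is a rough sketch of how the two approaches could be compared with timeit. The test data (one million integers drawn from 100 distinct values) is an assumption chosen for illustration; the measured ratio depends on the machine, the pandas version, and the data, so it will not necessarily match the 16x figure.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test data: one million integers with only 100 distinct values.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 100, size=1_000_000))

# Time each approach over a few repetitions.
t_direct = timeit.timeit(lambda: set(s), number=3)
t_unique = timeit.timeit(lambda: set(s.unique()), number=3)

print(f"set(s):          {t_direct:.3f} s")
print(f"set(s.unique()): {t_unique:.3f} s")

# Both approaches should yield the same elements.
same_result = set(s) == set(s.unique())
print(same_result)
```

The second form hands the set constructor at most 100 values instead of a million, which is where any saving comes from.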