Original question: http://stackoverflow.com/questions/22342568/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Getting the unique values of every column in a pandas dataframe, to help me create smaller, more manageable dataframes to perform metrics on
Asked by yoshiserry
I started off wanting to turn a column from a pandas dataframe into a list and then get its unique values, with the aim of iterating over those unique values in a for loop and creating a few smaller dataframes, i.e. one for each cluster. Then I want to store these smaller dataframes in a dictionary.
@ben suggested I start a new question and ask how to use the groupby method of pandas dataframes to perform this task.
My original post is over here: get list from pandas dataframe column
My Data: 
cluster load_date   budget  actual  fixed_price
A   1/1/2014    1000    4000    Y
A   2/1/2014    12000   10000   Y
A   3/1/2014    36000   2000    Y
B   4/1/2014    15000   10000   N
B   4/1/2014    12000   11500   N
B   4/1/2014    90000   11000   N
C   7/1/2014    22000   18000   N
C   8/1/2014    30000   28960   N
C   9/1/2014    53000   51200   N
For example: for item in cluster_list (where cluster_list is the unique set of values in cluster),
create a dataframe for cluster A where budget > X, etc.
Then do the same for the other clusters, and put them in a dictionary.
Then be able to get a certain dataframe out of the dictionary, say only the dataframe for cluster B where budget > X
def get_df(key):
    # cluster_dfs is the dictionary mapping each cluster name to its dataframe
    return cluster_dfs[key]
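The loop described in the question could be sketched roughly as follows. This is only an illustration of the idea, not code from the original post: the names `min_budget` and `cluster_dfs` are made up here, the threshold value is arbitrary, and the data is a shortened version of the table above.

```python
import pandas as pd

# A shortened version of the sample data (only the columns the loop needs).
df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "C"],
    "budget": [1000, 12000, 15000, 90000, 22000],
})

min_budget = 10000  # hypothetical value for X

# Unique values of the cluster column, in order of first appearance.
cluster_list = df["cluster"].unique()

# One filtered dataframe per cluster, stored in a dictionary.
cluster_dfs = {}
for item in cluster_list:
    mask = (df["cluster"] == item) & (df["budget"] > min_budget)
    cluster_dfs[item] = df[mask]

# Look up the dataframe for a single cluster.
cluster_b = cluster_dfs["B"]
```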
Thanks in advance
Answered by Andy Hayden
There are two parts to this question. First, filter the rows where budget > X (here X = 10000):
In [11]: df1 = df[df['budget'] > 10000]
In [12]: df1
Out[12]:
  cluster load_date  budget  actual fixed_price
1       A  2/1/2014   12000   10000           Y
2       A  3/1/2014   36000    2000           Y
3       B  4/1/2014   15000   10000           N
4       B  4/1/2014   12000   11500           N
5       B  4/1/2014   90000   11000           N
6       C  7/1/2014   22000   18000           N
7       C  8/1/2014   30000   28960           N
8       C  9/1/2014   53000   51200           N
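To reproduce this step outside the interactive session, the sample data can be rebuilt by hand. The following is a minimal sketch that assumes the table from the question is typed in as a dictionary of columns; the name `threshold` is introduced here for illustration only.

```python
import pandas as pd

# Sample data from the question, entered column by column.
df = pd.DataFrame({
    "cluster": list("AAABBBCCC"),
    "load_date": ["1/1/2014", "2/1/2014", "3/1/2014",
                  "4/1/2014", "4/1/2014", "4/1/2014",
                  "7/1/2014", "8/1/2014", "9/1/2014"],
    "budget": [1000, 12000, 36000, 15000, 12000, 90000, 22000, 30000, 53000],
    "actual": [4000, 10000, 2000, 10000, 11500, 11000, 18000, 28960, 51200],
    "fixed_price": list("YYYNNNNNN"),
})

threshold = 10000  # the X from the question; value chosen to match the answer

# df["budget"] > threshold is a boolean Series; indexing with it keeps
# only the rows where the condition holds and preserves the original index.
df1 = df[df["budget"] > threshold]
```

Only the first row (budget 1000) fails the condition, so `df1` keeps the other eight rows with their original index labels.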
Now you can group by cluster and get the individual groups:
In [13]: g = df1.groupby('cluster')
In [14]: g.get_group('A')
Out[14]:
  cluster load_date  budget  actual fixed_price
1       A  2/1/2014   12000   10000           Y
2       A  3/1/2014   36000    2000           Y
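A groupby object can also be iterated directly, which yields one (name, sub-dataframe) pair per cluster. Here is a small self-contained sketch; the data below is the already filtered frame reduced to two columns, and the name `sizes` is made up for the example.

```python
import pandas as pd

# The filtered rows (budget > 10000), reduced to two columns for brevity.
df1 = pd.DataFrame({
    "cluster": list("AABBBCCC"),
    "budget": [12000, 36000, 15000, 12000, 90000, 22000, 30000, 53000],
})

g = df1.groupby("cluster")

# Pull out a single group by its key.
group_a = g.get_group("A")

# Iterating over the groupby yields (key, sub-dataframe) pairs.
sizes = {}
for name, group in g:
    sizes[name] = len(group)
```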
Note: if you really want a dictionary, then you can use:
In [15]: d = dict(iter(g))
In [16]: d['A']
Out[16]:
  cluster load_date  budget  actual fixed_price
1       A  2/1/2014   12000   10000           Y
2       A  3/1/2014   36000    2000           Y
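An equivalent way to build the dictionary, which some readers may find more explicit, is a dict comprehension over the groupby. This is a sketch under the same assumptions as above; the name `frames` is illustrative and not part of the original answer.

```python
import pandas as pd

# The filtered rows (budget > 10000), reduced to two columns for brevity.
df1 = pd.DataFrame({
    "cluster": list("AABBBCCC"),
    "budget": [12000, 36000, 15000, 12000, 90000, 22000, 30000, 53000],
})

# Each (name, group) pair from the groupby becomes one dictionary entry.
frames = {name: group for name, group in df1.groupby("cluster")}

# Plain dictionary lookups now return the per-cluster dataframes.
cluster_b = frames["B"]

# dict.get avoids a KeyError for a cluster that is not present.
missing = frames.get("Z")
```

Each value in the dictionary is an ordinary dataframe, so any further filtering or metrics can be applied to it as usual.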

