Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/22342568/

Time: 2020-09-13 21:48:24  Source: igfitidea

Getting the unique values of every column in a pandas dataframe - to help me create smaller, more manageable dataframes to perform metrics on

python pandas

Asked by yoshiserry

I started off wanting to turn a column from a pandas dataframe into a list, and then get the unique values, with the aim of iterating over those unique values in a for loop, and creating a few smaller dataframes. I.e. one for each cluster. Then I want to store these smaller dataframes in a dictionary object.

@ben suggested I start a new question and ask whether the GroupBy method of pandas dataframes can be used to perform this task.

My original post is over here: get list from pandas dataframe column

My Data: 
cluster load_date   budget  actual  fixed_price
A   1/1/2014    1000    4000    Y
A   2/1/2014    12000   10000   Y
A   3/1/2014    36000   2000    Y
B   4/1/2014    15000   10000   N
B   4/1/2014    12000   11500   N
B   4/1/2014    90000   11000   N
C   7/1/2014    22000   18000   N
C   8/1/2014    30000   28960   N
C   9/1/2014    53000   51200   N

For example: for item in cluster_list (where cluster_list is the unique set of values in the cluster column):

create a dataframe for cluster A, where budget > X, etc.

Then do the same for the other clusters, and put them in a dictionary.
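
The loop described above could be sketched roughly as follows. This is only an illustration of the hand-written approach: the data is copied from the table in the question, while the threshold `X = 10000` and the name `cluster_dfs` are assumptions made for the example.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "cluster": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "load_date": ["1/1/2014", "2/1/2014", "3/1/2014",
                  "4/1/2014", "4/1/2014", "4/1/2014",
                  "7/1/2014", "8/1/2014", "9/1/2014"],
    "budget": [1000, 12000, 36000, 15000, 12000, 90000, 22000, 30000, 53000],
    "actual": [4000, 10000, 2000, 10000, 11500, 11000, 18000, 28960, 51200],
    "fixed_price": ["Y", "Y", "Y", "N", "N", "N", "N", "N", "N"],
})

X = 10000  # hypothetical budget threshold, not given in the question

# Unique cluster labels, in order of first appearance.
cluster_list = df["cluster"].unique()

# One smaller dataframe per cluster, keeping only rows with budget > X.
cluster_dfs = {}
for item in cluster_list:
    mask = (df["cluster"] == item) & (df["budget"] > X)
    cluster_dfs[item] = df[mask]
```

With this approach every cluster gets a key, even if none of its rows pass the filter (the value is then an empty dataframe).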

Then I want to be able to get a certain dataframe out of the dictionary, say only the dataframe for cluster B where budget > X.

def GetDf(key):
    # cluster_dfs is the dictionary of per-cluster dataframes (name assumed)
    return cluster_dfs[key]

Thanks in advance

Answered by Andy Hayden

There are two parts to this question. First, filter to those rows where budget > X (here X = 10000):

In [11]: df1 = df[df['budget'] > 10000]

In [12]: df1
Out[12]:
  cluster load_date  budget  actual fixed_price
1       A  2/1/2014   12000   10000           Y
2       A  3/1/2014   36000    2000           Y
3       B  4/1/2014   15000   10000           N
4       B  4/1/2014   12000   11500           N
5       B  4/1/2014   90000   11000           N
6       C  7/1/2014   22000   18000           N
7       C  8/1/2014   30000   28960           N
8       C  9/1/2014   53000   51200           N

Now you can group by cluster, and get the groups:

In [13]: g = df1.groupby('cluster')

In [14]: g.get_group('A')
Out[14]:
  cluster load_date  budget  actual fixed_price
1       A  2/1/2014   12000   10000           Y
2       A  3/1/2014   36000    2000           Y
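
Besides `get_group`, iterating over the GroupBy object yields `(name, group)` pairs, so there is no need to build the list of unique values by hand. A minimal sketch, assuming a trimmed-down copy of `df1` with only the `cluster` and `budget` columns; the `sizes` dictionary is just for illustration:

```python
import pandas as pd

# Trimmed-down stand-in for df1 (the rows with budget > 10000).
df1 = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "B", "C", "C", "C"],
    "budget": [12000, 36000, 15000, 12000, 90000, 22000, 30000, 53000],
})

g = df1.groupby("cluster")

# Each iteration gives the cluster label and the sub-dataframe for it.
sizes = {}
for name, group in g:
    sizes[name] = len(group)
```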

Note: if you really want a dictionary then you can use:

In [15]: d = dict(iter(g))

In [16]: d['A']
Out[16]:
  cluster load_date  budget  actual fixed_price
1       A  2/1/2014   12000   10000           Y
2       A  3/1/2014   36000    2000           Y
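
Putting both parts together, the lookup the question asks for might be wrapped up like this. It is a sketch under assumptions: `split_by_cluster`, `get_df` and the `min_budget` parameter are hypothetical names, and only the `cluster` and `budget` columns of the sample data are reproduced.

```python
import pandas as pd

# Cluster and budget columns copied from the sample data in the question.
df = pd.DataFrame({
    "cluster": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "budget": [1000, 12000, 36000, 15000, 12000, 90000, 22000, 30000, 53000],
})


def split_by_cluster(frame, min_budget):
    """Return {cluster: rows of that cluster with budget > min_budget}."""
    filtered = frame[frame["budget"] > min_budget]
    return {name: group for name, group in filtered.groupby("cluster")}


d = split_by_cluster(df, 10000)


def get_df(key):
    # Plain dictionary lookup; raises KeyError for an unknown cluster.
    return d[key]
```

One thing to keep in mind: a cluster whose rows are all filtered out never forms a group, so it will be missing from the dictionary rather than mapped to an empty dataframe. If that can happen, `d.get(key)` avoids the `KeyError`.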