Get unique values for each column in a pandas dataframe - help me create smaller, more manageable dataframes to perform metrics on

Question

Asked by yoshiserry

I started off wanting to turn a column from a pandas dataframe into a list and then get its unique values, with the aim of iterating over those unique values in a for loop and creating a few smaller dataframes, i.e. one for each cluster. Then I want to store these smaller dataframes in a dictionary.

@ben suggested I start a new question and ask how to use the GroupBy method of pandas dataframes to perform this task.

My original post is over here: get list from pandas dataframe column

My Data: 
cluster load_date   budget  actual  fixed_price
A   1/1/2014    1000    4000    Y
A   2/1/2014    12000   10000   Y
A   3/1/2014    36000   2000    Y
B   4/1/2014    15000   10000   N
B   4/1/2014    12000   11500   N
B   4/1/2014    90000   11000   N
C   7/1/2014    22000   18000   N
C   8/1/2014    30000   28960   N
C   9/1/2014    53000   51200   N

For example: for each item in cluster_list (where cluster_list is the set of unique values in the cluster column),

create a dataframe for cluster A, where budget > X, etc.

Then do the same for the other clusters, and put them in a dictionary.

Then I want to be able to get a particular dataframe out of the dictionary, for example only the dataframe for cluster B where budget > X:

def get_df(frames, key):
    # frames: dictionary mapping each cluster to its dataframe
    return frames[key]

Thanks in advance

Answer 1

Answered by Andy Hayden

There are two parts to this question. First, filter the rows where budget > X (here X = 10000):

In [11]: df1 = df[df['budget'] > 10000]

In [12]: df1
Out[12]:
  cluster load_date  budget  actual fixed_price
1       A  2/1/2014   12000   10000           Y
2       A  3/1/2014   36000    2000           Y
3       B  4/1/2014   15000   10000           N
4       B  4/1/2014   12000   11500           N
5       B  4/1/2014   90000   11000           N
6       C  7/1/2014   22000   18000           N
7       C  8/1/2014   30000   28960           N
8       C  9/1/2014   53000   51200           N
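
To reproduce the filtering step end to end, here is a minimal sketch that rebuilds the sample data from the question by hand. The names `threshold` (standing in for X) and `clusters` are illustrative assumptions, not part of the original answer; the last line also shows the unique cluster values the question originally asked about.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster": list("AAABBBCCC"),
    "load_date": ["1/1/2014", "2/1/2014", "3/1/2014",
                  "4/1/2014", "4/1/2014", "4/1/2014",
                  "7/1/2014", "8/1/2014", "9/1/2014"],
    "budget": [1000, 12000, 36000, 15000, 12000, 90000, 22000, 30000, 53000],
    "actual": [4000, 10000, 2000, 10000, 11500, 11000, 18000, 28960, 51200],
    "fixed_price": list("YYYNNNNNN"),
})

threshold = 10000  # hypothetical name for X

# Boolean mask: keep only the rows whose budget exceeds the threshold.
df1 = df[df["budget"] > threshold]

# Unique cluster labels that remain after filtering, as a plain list.
clusters = df1["cluster"].unique().tolist()
print(clusters)
```

The filter keeps the original index labels, which is why the output above starts at 1 rather than 0.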

Now you can group by cluster and get the individual groups:

In [13]: g = df1.groupby('cluster')

In [14]: g.get_group('A')
Out[14]:
  cluster load_date  budget  actual fixed_price
1       A  2/1/2014   12000   10000           Y
2       A  3/1/2014   36000    2000           Y

Note: if you really want a dictionary then you can use:

In [15]: d = dict(iter(g))

In [16]: d['A']
Out[16]:
  cluster load_date  budget  actual fixed_price
1       A  2/1/2014   12000   10000           Y
2       A  3/1/2014   36000    2000           Y
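
Putting both parts together, the following is one possible sketch of the workflow the question describes: filter, split by cluster, store the pieces in a dictionary, and look one up by key. The function names `split_by_cluster` and `get_df` and the parameter `min_budget` are hypothetical, chosen only for illustration; the sample data is retyped from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster": list("AAABBBCCC"),
    "load_date": ["1/1/2014", "2/1/2014", "3/1/2014",
                  "4/1/2014", "4/1/2014", "4/1/2014",
                  "7/1/2014", "8/1/2014", "9/1/2014"],
    "budget": [1000, 12000, 36000, 15000, 12000, 90000, 22000, 30000, 53000],
    "actual": [4000, 10000, 2000, 10000, 11500, 11000, 18000, 28960, 51200],
    "fixed_price": list("YYYNNNNNN"),
})


def split_by_cluster(frame, min_budget):
    """Return {cluster label: sub-dataframe} for rows with budget > min_budget."""
    filtered = frame[frame["budget"] > min_budget]
    # Iterating over a GroupBy object yields (key, sub-dataframe) pairs.
    return {key: group for key, group in filtered.groupby("cluster")}


frames = split_by_cluster(df, 10000)


def get_df(key):
    """Look up the dataframe stored for one cluster."""
    return frames[key]


print(get_df("B"))
```

The dictionary comprehension is equivalent to `dict(iter(g))` above; it is spelled out here only to make the (key, group) pairs visible.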
