Original question: http://stackoverflow.com/questions/40498463/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Python - splitting dataframe into multiple dataframes based on column values and naming them with those values
Asked by Sarah
I have a large dataset listing competitor products on sale in different regions across the country. I am looking to split this dataframe into several others based on the region via an iterative process using the column values within the names of those new dataframes, so that I can work with each separately - e.g. to sort information in each region by price to understand what the market looks like in each. I've given a simplified version of the data below:
Competitor Region ProductA ProductB
Comp1 A £10 £15
Comp1 B £11 £16
Comp1 C £11 £15
Comp2 A £9 £16
Comp2 B £12 £14
Comp2 C £14 £17
Comp3 A £11 £16
Comp3 B £10 £15
Comp3 C £12 £15
I can create a list of the regions using the below:
region_list=df['Region'].unique().tolist()
Which I was hoping to use in an iterative loop that produced a number of dataframes, e.g.
df_A :
Competitor Region ProductA ProductB
Comp1 A £10 £15
Comp2 A £9 £16
Comp3 A £11 £16
I could do this manually for each region, with the code
df_A = df.loc[df['Region'] == 'A']
but the reality is that this dataset has a large number of areas which would make this code tedious. Is there a way of creating an iterative loop that would replicate this? There is a similar question that asks about splitting dataframes, but the answer does not show how to label outputs based on each column value.
I'm quite new to Python and still learning, so if there is actually a different, more sensible method of approaching this problem I'm very open to suggestions.
Answered by maxymoo
Subsetting by distinct values is called a groupby. If you simply want to iterate through the groups with a for loop, the syntax is:
for region, df_region in df.groupby('Region'):
    print(df_region)
Competitor Region ProductA ProductB
0 Comp1 A £10 £15
3 Comp2 A £9 £16
6 Comp3 A £11 £16
Competitor Region ProductA ProductB
1 Comp1 B £11 £16
4 Comp2 B £12 £14
7 Comp3 B £10 £15
Competitor Region ProductA ProductB
2 Comp1 C £11 £15
5 Comp2 C £14 £17
8 Comp3 C £12 £15
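To get at each region's dataframe by name, one common approach is to collect the groups in a dictionary keyed by the region value, rather than creating separate variables such as df_A, df_B and so on. The following is a minimal sketch under that assumption, not necessarily what the original answer goes on to recommend. The names frames_by_region and ProductA_num are hypothetical, and the sample data is retyped from the question with prices kept as strings.

```python
import pandas as pd

# Sample data retyped from the question; prices are strings with a pound sign.
df = pd.DataFrame({
    "Competitor": ["Comp1", "Comp1", "Comp1",
                   "Comp2", "Comp2", "Comp2",
                   "Comp3", "Comp3", "Comp3"],
    "Region": ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
    "ProductA": ["£10", "£11", "£11", "£9", "£12", "£14", "£11", "£10", "£12"],
    "ProductB": ["£15", "£16", "£15", "£16", "£14", "£17", "£16", "£15", "£15"],
})

# Build one DataFrame per region, keyed by the region value.
# (frames_by_region is a hypothetical name chosen for this sketch.)
frames_by_region = {region: group for region, group in df.groupby("Region")}

# Look up a single region by its value instead of using a variable named df_A.
df_A = frames_by_region["A"]

# Sort region A by the ProductA price. The prices are strings, so strip the
# pound sign and convert to integers before sorting.
price_a = df_A["ProductA"].str.replace("£", "", regex=False).astype(int)
df_A_sorted = df_A.assign(ProductA_num=price_a).sort_values("ProductA_num")
print(df_A_sorted)
```

A dictionary keeps the region names as data, so the same loop or lookup works however many regions the real dataset has.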