Python - Split a dataframe into multiple dataframes based on column values and name them with those values

Question

Asked by Sarah

I have a large dataset listing competitor products on sale in different regions across the country. I am looking to split this dataframe into several others based on the region via an iterative process using the column values within the names of those new dataframes, so that I can work with each separately - e.g. to sort information in each region by price to understand what the market looks like in each. I've given a simplified version of the data below:

Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp1       B       £11       £16
Comp1       C       £11       £15
Comp2       A       £9        £16
Comp2       B       £12       £14
Comp2       C       £14       £17
Comp3       A       £11       £16
Comp3       B       £10       £15
Comp3       C       £12       £15

I can create a list of the regions using the below:

region_list=df['Region'].unique().tolist()

Which I was hoping to use in an iterative loop that produced a number of dataframes, e.g.

df_A :

Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp2       A       £9        £16
Comp3       A       £11       £16

I could do this manually for each region, with the code

df_A = df.loc[df['Region'] == 'A']

but the reality is that this dataset has a large number of areas which would make this code tedious. Is there a way of creating an iterative loop that would replicate this? There is a similar question that asks about splitting dataframes, but the answer does not show how to label outputs based on each column value.

I'm quite new to Python and still learning, so if there is actually a different, more sensible method of approaching this problem I'm very open to suggestions.

Answer 1

Answered by maxymoo

Subsetting by distinct values is called a groupby. If you simply want to iterate through the groups with a for loop, the syntax is:

for region, df_region in df.groupby('Region'):
    print(df_region)

  Competitor Region ProductA ProductB
0      Comp1      A      £10      £15
3      Comp2      A       £9      £16
6      Comp3      A      £11      £16
  Competitor Region ProductA ProductB
1      Comp1      B      £11      £16
4      Comp2      B      £12      £14
7      Comp3      B      £10      £15
  Competitor Region ProductA ProductB
2      Comp1      C      £11      £15
5      Comp2      C      £14      £17
8      Comp3      C      £12      £15
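
The loop above prints each group but does not keep the pieces under a name. One common way to get "named" dataframes, sketched here as a suggestion rather than as part of the original answer, is to collect the groups in a dictionary keyed by region value instead of creating separate variables such as df_A. The name `frames` is hypothetical, and the prices are stored as plain numbers (an assumption; the question shows them with a currency symbol) so that sorting by price behaves numerically.

```python
import pandas as pd

# Simplified data from the question. Prices are plain integers here
# (an assumption) so that sorting compares numbers, not strings.
df = pd.DataFrame({
    "Competitor": ["Comp1", "Comp1", "Comp1",
                   "Comp2", "Comp2", "Comp2",
                   "Comp3", "Comp3", "Comp3"],
    "Region": ["A", "B", "C"] * 3,
    "ProductA": [10, 11, 11, 9, 12, 14, 11, 10, 12],
    "ProductB": [15, 16, 15, 16, 14, 17, 16, 15, 15],
})

# One dataframe per region, keyed by the region value.
# `frames` is a hypothetical name chosen for this sketch.
frames = {region: df_region for region, df_region in df.groupby("Region")}

# Look up a single region by its value, then sort it by price.
df_A = frames["A"]
cheapest_first = df_A.sort_values("ProductA")
print(cheapest_first)
```

With this layout, frames["B"] plays the role that a variable called df_B would have played, and looping over frames.items() visits every region without having to know the region names in advance.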
