Original URL: http://stackoverflow.com/questions/21800004/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas for loop on a group by
Asked by user3302483
I have a dataset which has a category field, 'City' and 2 metrics, Age and Weight. I want to plot a scatterplot for each City using a loop. However I'm struggling to combine the group by and loop that I need in a single statement. If I just use a for loop I end up with a chart for each record and if I do a group by I get the right number of charts but with no values.
Here is my code using just the for loop with my group by commented out:
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
d = {  'City': pd.Series(['London','New York', 'New York', 'London', 'Paris',
                        'Paris','New York', 'New York', 'London','Paris']),
       'Age' : pd.Series([36., 42., 6., 66., 38.,18.,22.,43.,34.,54]),
     'Weight': pd.Series([225,454,345,355,234,198,400, 256,323,310])
}
df = pd.DataFrame(d)
#for C in df.groupby('City'):
for C in df.City:
    fig = plt.figure(figsize=(5, 4))
    # Create an Axes object.
    ax = fig.add_subplot(1,1,1) # one row, one column, first plot
    # Plot the data.
    ax.scatter(df.Age,df.Weight, df.City == C, color="red", marker="^")
Accepted answer by unutbu
Do not call plt.figure more than once, as each call creates a new figure (roughly speaking, a new window).
import pandas as pd
import matplotlib.pyplot as plt
d = {'City': ['London', 'New York', 'New York', 'London', 'Paris',
              'Paris', 'New York', 'New York', 'London', 'Paris'],
     'Age': [36., 42., 6., 66., 38., 18., 22., 43., 34., 54],
     'Weight': [225, 454, 345, 355, 234, 198, 400, 256, 323, 310]}
df = pd.DataFrame(d)
fig, ax = plt.subplots(figsize=(5, 4))    # 1
# Plot each city's rows in its own color. Passing a single color per call
# avoids a mismatch between a color list and the number of points in a group.
for (city, grp), color in zip(df.groupby('City'), ['red', 'blue', 'green']):
    grp.plot(kind='scatter', x='Age', y='Weight',
             ax=ax,          # 2
             color=color)
plt.show()


- (# 1) plt.subplots returns a figure, fig, and an axes, ax.
- (# 2) If you pass ax=ax to pandas' plot method, then all the plots will show up on the same axes.
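As a quick check of the two notes above, here is a minimal sketch that draws every city onto one shared axes and then counts the open figures. The Agg backend, the color list, and the names n_figures and n_scatters are assumptions made for illustration; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'City': ['London', 'New York', 'New York', 'London', 'Paris',
             'Paris', 'New York', 'New York', 'London', 'Paris'],
    'Age': [36., 42., 6., 66., 38., 18., 22., 43., 34., 54],
    'Weight': [225, 454, 345, 355, 234, 198, 400, 256, 323, 310],
})

plt.close("all")                         # start with no open figures
fig, ax = plt.subplots(figsize=(5, 4))   # the only call that creates a figure

colors = ['red', 'blue', 'green']        # illustrative choice, one per city
for (city, grp), color in zip(df.groupby('City'), colors):
    # Reusing the same ax puts every group's points on one axes.
    grp.plot(kind='scatter', x='Age', y='Weight', ax=ax, color=color)

n_figures = len(plt.get_fignums())   # number of figures currently open
n_scatters = len(ax.collections)     # one scatter collection per plot call
print(n_figures, n_scatters)
```

If plt.figure or plt.subplots were called inside the loop instead, n_figures would grow with each iteration, which is the behavior the answer warns about.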
To make a separate figure for each city:
import pandas as pd
import matplotlib.pyplot as plt
d = {'City': ['London', 'New York', 'New York', 'London', 'Paris',
              'Paris', 'New York', 'New York', 'London', 'Paris'],
     'Age': [36., 42., 6., 66., 38., 18., 22., 43., 34., 54],
     'Weight': [225, 454, 345, 355, 234, 198, 400, 256, 323, 310]}
df = pd.DataFrame(d)
groups = df.groupby('City')
for city, grp in groups:                           # 1
    # A new figure and axes for each city.
    fig, ax = plt.subplots(figsize=(5, 4))
    grp.plot(kind='scatter', x='Age', y='Weight',  # 2
             ax=ax, title=city)
    plt.show()
- (# 1) This is perhaps all you were missing. When you iterate over a GroupBy object, it yields a 2-tuple: the groupby key and the sub-DataFrame.
- (# 2) Use grp, the sub-DataFrame, instead of df inside the for-loop.
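To see what the loop variables hold, the following sketch iterates over the GroupBy object without plotting anything. The dictionary name seen is hypothetical and only there to record the result.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['London', 'New York', 'New York', 'London', 'Paris',
             'Paris', 'New York', 'New York', 'London', 'Paris'],
    'Age': [36., 42., 6., 66., 38., 18., 22., 43., 34., 54],
    'Weight': [225, 454, 345, 355, 234, 198, 400, 256, 323, 310],
})

seen = {}  # hypothetical helper: maps each group key to its row count
for city, grp in df.groupby('City'):
    # city is the group key; grp is a DataFrame with only that city's rows.
    seen[city] = len(grp)
    print(city, type(grp).__name__, len(grp))
```

Grouping by the column name 'City' gives plain string keys here. As far as I know, recent pandas versions yield 1-tuples such as ('London',) when you group by a one-element list like ['City'], so the string form is the safer choice if you use the key in a title or a comparison.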
Answer by user3302483
I took the groupby from the other answer and inserted it into my code to generate a chart for each group:
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
d = {  'City': pd.Series(['London','New York', 'New York', 'London','Paris',
                        'Paris','New York', 'New York', 'London','Paris']),
       'Age' : pd.Series([36., 42., 6., 66., 38.,18.,22.,43.,34.,54]) ,
     'Weight': pd.Series([225,454,345,355,234,198,400, 256,323,310])
}
df = pd.DataFrame(d)
groups = df.groupby('City')
for city, grp in groups:
    fig = plt.figure(figsize=(5, 4))
    # Create an Axes object.
    ax = fig.add_subplot(1,1,1) # one row, one column, first plot
    # Plot only this city's rows (grp), not the whole DataFrame.
    ax.scatter(grp.Age, grp.Weight, color="red", marker="^")
    ax.set_title(city)

