How to iterate over a Pandas DataFrameGroupBy and select all entries of a specific column for each grouped variable?

Question

Asked by Server Khalilov

Let's assume there is a table like this:

Id | Type | Guid

On such a table I perform the following operation:

df = df.groupby('Id')

Now I would like to iterate through the first n rows and, for each specific Id, print all the corresponding entries from column Guid as a list. Please help me with a solution.

Answer 1

Answered by Scott Boston

I think I would do it like this:

Create some data for testing

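import numpy as np
import pandas as pd

# 100 random rows: Id in 1..9, Type in A-D, five-digit Guid
df = pd.DataFrame({'Id':np.random.randint(1,10,100),'Type':np.random.choice(list('ABCD'),100),'Guid':np.random.randint(10000,99999,100)})

print(df.head())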
   Id Type   Guid
0   2    A  89247
1   4    B  39262
2   3    C  45522
3   1    B  99724
4   4    C  51322

Choose n, the number of records to return per group, and group by Id:

n = 5
df_groups = df.groupby('Id')

Iterate through df_groups with a for loop and print the first n rows of each group:

for name,group in df_groups:
    print('ID: ' + str(name))
    print(group.head(n))
    print("\n")

Output:

ID: 1
    Id Type   Guid
3    1    B  99724
5    1    B  74182
37   1    D  49219
47   1    B  81464
65   1    C  84925


ID: 2
    Id Type   Guid
0    2    A  89247
6    2    A  16499
7    2    A  79956
34   2    C  56393
40   2    A  49883
.
.
.

EDIT: To print all the Guids for each ID as a list, you can use the following:

for name,group in df_groups:
    print('ID: ' + str(name))
    print(group.Guid.tolist())
    print("\n")

Output:

ID: 1
[99724, 74182, 49219, 81464, 84925, 67834, 43275, 35743, 36478, 94662, 21183]


ID: 2
[89247, 16499, 79956, 56393, 49883, 97633, 11768, 14639, 88591, 31263, 98729]


ID: 3
[45522, 13971, 75882, 96489, 58414, 22051, 80304, 46144, 22481, 11278, 84622, 61145]


ID: 4
[39262, 51322, 76930, 83740, 60152, 90735, 42039, 22114, 76077, 83234, 96134, 93559, 87903, 98199, 76096, 64378]


ID: 5
[13444, 55762, 13206, 94768, 19665, 75761, 90755, 45737, 23506, 89345, 94912, 81200, 91868]
.
.
.
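
If only the Guid values are needed, a more compact variant (a sketch, not part of the original answer) is to select the column on the GroupBy object and aggregate each group into a list with agg(list). The small hand-made frame below is hypothetical test data with the question's columns, and the names n and guids_per_id are illustrative; head(n) on the result limits the printout to the first n Ids.

```python
import pandas as pd

# Hypothetical sample data with the same columns as in the question
df = pd.DataFrame({
    'Id':   [2, 4, 3, 1, 4, 1, 2],
    'Type': ['A', 'B', 'C', 'B', 'C', 'B', 'A'],
    'Guid': [89247, 39262, 45522, 99724, 51322, 74182, 16499],
})

n = 3  # number of Ids to print (illustrative)

# Select the Guid column per group and collect each group's values into a list.
# The result is a Series indexed by Id, sorted ascending because sort=True by default.
guids_per_id = df.groupby('Id')['Guid'].agg(list)

# Print only the first n Ids
for id_, guids in guids_per_id.head(n).items():
    print('ID: ' + str(id_))
    print(guids)
```

Unlike the loop above, this builds all the lists in one step, so the result can also be kept and reused (for example via guids_per_id.to_dict()) instead of only printed.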

Answer 2

Answered by Andy Hayden

I like to use get_group for this. First, you can pull out the keys:

In [11]: df
Out[11]:
   A  B
0  1  2
1  1  4
2  2  6
3  3  8

In [12]: g = df.groupby("A")

In [13]: g.groups.keys()
Out[13]: dict_keys([1, 2, 3])

You can iterate through the keys:

In [14]: for k in g.groups.keys():
             print(g.get_group(k))
             print("\n")
   A  B
0  1  2
1  1  4

   A  B
2  2  6

   A  B
3  3  8
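
Applied to the question, the same get_group pattern can select just one column of each group and turn it into a list. A minimal sketch, reusing the toy frame from this answer with column B standing in for the question's Guid column (an assumption); itertools.islice limits the loop to the first n keys, and the collected dict is an illustrative name.

```python
import itertools

import pandas as pd

# Toy frame from this answer; column B stands in for the question's Guid column
df = pd.DataFrame({'A': [1, 1, 2, 3], 'B': [2, 4, 6, 8]})
g = df.groupby('A')

n = 2  # number of group keys to visit (illustrative)

collected = {}
# g.groups maps each key to the row labels of its group; take only the first n keys
for k in itertools.islice(g.groups.keys(), n):
    values = g.get_group(k)['B'].tolist()  # all B entries for this key, as a list
    collected[k] = values
    print(k, values)
```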

To get the first n items of a DataFrame, you can use head:

In [21]: df.head(3)  # or g.get_group(k).head(n)
Out[21]:
   A  B
0  1  2
1  1  4
2  2  6

Note: the GroupBy object also has a head method, which takes the first n rows of each group:

In [21]: g.head(1)
Out[21]:
   A  B
0  1  2
2  2  6
3  3  8
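
Because g.head(n) returns an ordinary DataFrame (original row order and index), it can be grouped again to get, for each key, a list of at most n values from one column. This is a sketch combining the two ideas, not something the answer itself shows; it reuses the toy frame, and first_rows and b_lists are illustrative names.

```python
import pandas as pd

# Toy frame from this answer
df = pd.DataFrame({'A': [1, 1, 2, 3], 'B': [2, 4, 6, 8]})
g = df.groupby('A')

n = 1  # rows to keep per group, matching g.head(1) above

# g.head(n) keeps the first n rows of every group, in original row order
first_rows = g.head(n)

# Group the trimmed frame again and collect column B for each key into a list
b_lists = first_rows.groupby('A')['B'].agg(list)
print(b_lists)
```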
