pandas: an efficient way to get group names

Question

Asked by swopnilnep

I have a .csv file with around 300,000 rows. I have set it to group by a particular column, with each group having around 140 members (2138 total groups).

I am trying to generate a numpy array of the group names. So far I have used a for loop to collect the names, but it takes a while to process everything.

import numpy as np
import pandas as pd

df = pd.read_csv('file.csv')
grouped = df.groupby('col1')
group_names = []
for name, group in grouped:
    group_names.append(name)
group_names = np.array(group_names, dtype=object)

I am wondering if there is a more efficient way to do this, whether by using a pandas feature or by converting the names directly into a numpy array.

Answer 1

Answered by EdChum

groupby objects have a .groups attribute:

groups = df.groupby('col1').groups

This returns a dict that maps each group name to the row labels belonging to that group.

Example:

In[257]:
df = pd.DataFrame({'a':list('aabcccc'), 'b':np.random.randn(7)})
groups = df.groupby('a').groups
groups

Out[257]: 
{'a': Int64Index([0, 1], dtype='int64'),
 'b': Int64Index([2], dtype='int64'),
 'c': Int64Index([3, 4, 5, 6], dtype='int64')}

In[258]:
groups.keys()
Out[258]: dict_keys(['a', 'b', 'c'])
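Since the question asks for a numpy array rather than a dict view, the keys can be collected into one. The following is only a minimal sketch: the DataFrame is a made-up stand-in for the asker's file, and the column name 'a' is taken from the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; 'a' is the grouping column.
df = pd.DataFrame({'a': list('aabcccc'), 'b': np.arange(7.0)})
grouped = df.groupby('a')

# .groups maps group name -> row labels, so iterating it yields the names.
group_names = np.array(list(grouped.groups), dtype=object)
print(group_names)
```

This avoids iterating over the groups themselves, although building the .groups mapping still has to compute the row labels for every group.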

Answer 2

Answered by sacuL

The fastest way is most likely to call unique on the column you are grouping by, which returns all of its unique values. The output is an array of your group names.

group_names = df.col1.unique()
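One caveat worth keeping in mind: unique returns values in order of first appearance and keeps missing values, whereas groupby by default sorts the group keys and drops missing ones. The sketch below uses made-up data to show how the unique result could be aligned with the groupby defaults if that matters; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up data: unsorted keys plus one missing value.
df = pd.DataFrame({'col1': ['b', 'a', np.nan, 'b', 'c']})

# Order of first appearance; the missing value is kept as its own entry.
unique_names = df['col1'].unique()

# Drop missing values and sort, to mirror groupby's default key handling.
group_like = np.array(sorted(df['col1'].dropna().unique()), dtype=object)

print(len(unique_names), list(group_like))
```

If the column has no missing values and the order of the names is irrelevant, the plain unique call is enough.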
