Python: How to merge pandas value_counts() into a dataframe or use it to subset a dataframe

Question

Asked by user2476665

I used pandas df.value_counts() to find the number of occurrences of particular brands. I want to merge those value counts with the respective brands in the initial dataframe.

 df has many columns including one named 'brands'
 brands = df.brands.value_counts()

 brand1   143
 brand2   21
 brand3   101
 etc.

How do I merge the value counts with the original dataframe such that each brand's corresponding count is in a new column, say "brand_count"?

Is it possible to assign headers to these columns? The names function doesn't work with a Series, and I was unable to convert it to a DataFrame so that I could merge the data that way. value_counts() outputs a Series of dtype int64 (the brand names should be strings), which means I cannot do the following:

 df2 = pd.DataFrame({'brands': list(brands_all[0]), "brand_count":
 list(brands_all[1])})
 (merge with df)

Ultimately, I want to obtain this:

 col1  col2  col3  brands  brand_count ... col150
                   A        30
                   C        140
                   A        30
                   B        111

Answer 1

Answered by MaxU

Is this what you want?

import numpy as np
import pandas as pd

# generating random DataFrame
brands_list = ['brand{}'.format(i) for i in range(10)]
a = pd.DataFrame({'brands': np.random.choice(brands_list, 100)})
b = pd.DataFrame(np.random.randint(0,10,size=(100, 3)), columns=list('ABC'))
df = pd.concat([a, b], axis=1)
print(df.head())

# generate 'brands' DF
brands = pd.DataFrame(df.brands.value_counts().reset_index())
brands.columns = ['brands', 'count']
print(brands)

# merge 'df' & 'brands'; how='left' keeps the rows of 'df' in their original order
merged = pd.merge(df, brands, on='brands', how='left')
print(merged)

PS: the first large part just generates a sample DataFrame.

The part that matters for you starts at the # generate 'brands' DF comment.

Answer 2

Answered by Alexander

You want to use transform.

import numpy as np
import pandas as pd

np.random.seed(0)

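# Create dummy data.
# np.random.random_integers is deprecated; randint(0, 6) draws the same inclusive 0-5 range.
df = pd.DataFrame({'brands': ['brand{0}'.format(n)
                   for n in np.random.randint(0, 6, 10)]})

# transform returns one count per original row, aligned on df's index.
df['brand_count'] = df.groupby('brands')['brands'].transform('count')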

>>> df
   brands brand_count
0  brand4           1
1  brand5           2
2  brand0           1
3  brand3           4
4  brand3           4
5  brand3           4
6  brand1           1
7  brand3           4
8  brand5           2
9  brand2           1

For reference:

>>> df.brands.value_counts()
brand3    4
brand5    2
brand4    1
brand0    1
brand1    1
brand2    1
Name: brands, dtype: int64
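
As a quick check on the transform approach, here is a minimal sketch on a small hand-written frame (the sample data is made up for illustration and is not from the question). It compares the per-row counts from transform with a lookup into value_counts().

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({'brands': ['A', 'C', 'A', 'B', 'C', 'A']})

# One count per original row, aligned on df's index.
df['brand_count'] = df.groupby('brands')['brands'].transform('count')

# Look up each row's brand in the value_counts() result for comparison.
expected = df['brands'].map(df['brands'].value_counts())

print(df)
print((df['brand_count'] == expected).all())
```

Both routes should give the same column; transform just avoids building the intermediate counts Series yourself.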

Answer 3

Answered by Egos

I think the best way is to use map:

df['brand_count'] = df['brand'].map(df['brand'].value_counts())

This is much faster than the groupby method, for example (a factor of about 500 on a 15,000-row DataFrame in my case), and it takes only one line.
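
To make the lookup explicit, here is a small self-contained sketch of the map approach. The sample data and the price column are assumptions for illustration. The idea is that value_counts() returns a Series indexed by brand, and map uses that index as a lookup table for each row.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({'brand': ['A', 'C', 'A', 'B'],
                   'price': [10, 20, 30, 40]})

# Series indexed by brand name, with the occurrence count as the value.
counts = df['brand'].value_counts()

# For every row, look up that row's brand in the counts Series.
df['brand_count'] = df['brand'].map(counts)

print(df)
```

Row order and the other columns are left untouched, since only a new column is assigned.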

Answer 4

Answered by pomber

df = ...
key_col = "brand"
count_col = "brand_count"

result = (
    df.join(
        df[key_col].value_counts().rename(count_col), 
        how="left", 
        on=key_col)
)

If you need to join the counts to a different dataframe remember to fill NaNs with zeros:

df = ...
other = ...
key_col = "brand"
count_col = "brand_count"

result = (
    other.join(
        df[key_col].value_counts().rename(count_col), 
        how="left", 
        on=key_col)
    .fillna({count_col: 0})
)
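
A worked sketch of the second case, with made-up data: the frame `other` contains a brand ("Z") that never occurs in `df`, so the left join yields NaN for it until fillna replaces that with zero. The region column and the final cast to int are additions for illustration only.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({'brand': ['A', 'A', 'B']})
other = pd.DataFrame({'brand': ['A', 'B', 'Z'],
                      'region': ['north', 'south', 'east']})

# Counts computed on df, named so the joined column is 'brand_count'.
counts = df['brand'].value_counts().rename('brand_count')

result = (
    other.join(counts, how='left', on='brand')
    .fillna({'brand_count': 0})        # brand 'Z' has no match in df
    .astype({'brand_count': int})      # the NaN made the column float; cast back
)

print(result)
```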

Answer 5

Answered by Michael H.

Pandas DataFrame's merge and value_counts methods are pretty fast, so I would combine the two.

# Name the counts Series explicitly so the new column is 'brand_count'
# regardless of how the pandas version names the value_counts() output.
df.merge(df['brand'].value_counts().rename('brand_count'), how='left',
         left_on='brand', right_index=True)
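
The question title also asks about using value_counts() to subset a dataframe, which the answers above do not show. One possible sketch, assuming the goal is to keep only rows whose brand occurs at least some number of times (the threshold `min_count` and the sample data are hypothetical):

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({'brand': ['A', 'C', 'A', 'B', 'C', 'A']})
min_count = 2  # hypothetical threshold

counts = df['brand'].value_counts()

# Brands that occur at least min_count times.
frequent = counts[counts >= min_count].index

# Keep only the rows whose brand is in that set.
subset = df[df['brand'].isin(frequent)]

print(subset)
```

If the count column has already been added with one of the methods above, filtering on it directly (for example `df[df['brand_count'] >= min_count]`) gives the same rows.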