Original source: http://stackoverflow.com/questions/35809098/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to merge pandas value_counts() to dataframe or use it to subset a dataframe
Asked by user2476665
I used pandas df.value_counts() to find the number of occurrences of particular brands. I want to merge those value counts with the respective brands in the initial dataframe.
df has many columns including one named 'brands'
brands = df.brands.value_counts()
brand1 143
brand2 21
brand3 101
etc.
How do I merge the value counts with the original dataframe such that each brand's corresponding count is in a new column, say "brand_count"?
Is it possible to assign headers to these columns? The names function won't work with a Series, and I was unable to convert it to a DataFrame to merge the data that way. value_counts outputs a Series of dtype int64 (the brand names should be strings), which means I cannot do the following:
df2 = pd.DataFrame({'brands': list(brands_all[0]), "brand_count": list(brands_all[1])})
(merge with df)
Ultimately, I want to obtain this:
col1  col2  col3  brands  brand_count  ...  col150
                  A       30
                  C       140
                  A       30
                  B       111
Answered by MaxU
Is this what you want?
import numpy as np
import pandas as pd
# generating random DataFrame
brands_list = ['brand{}'.format(i) for i in range(10)]
a = pd.DataFrame({'brands': np.random.choice(brands_list, 100)})
b = pd.DataFrame(np.random.randint(0,10,size=(100, 3)), columns=list('ABC'))
df = pd.concat([a, b], axis=1)
print(df.head())
# generate 'brands' DF
brands = pd.DataFrame(df.brands.value_counts().reset_index())
brands.columns = ['brands', 'count']
print(brands)
# merge 'df' & 'brands'
merged = pd.merge(df, brands, on='brands')
print(merged)
P.S. The first large part just generates a sample DataFrame. The part of interest to you starts at the "# generate 'brands' DF" comment.
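To see the merge approach on data small enough to check by hand, here is a minimal sketch. The brand labels and the extra column 'A' are made up for illustration, and the count column is named 'brand_count' as in the question rather than 'count'.

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative only.
df = pd.DataFrame({
    'brands': ['A', 'C', 'A', 'B', 'A'],
    'A': [1, 2, 3, 4, 5],
})

# Turn the value counts into a two-column DataFrame.
# Assigning the column names explicitly avoids relying on the names
# that reset_index() produces, which differ between pandas versions.
brands = df['brands'].value_counts().reset_index()
brands.columns = ['brands', 'brand_count']

# A left merge keeps every row of df in its original order.
merged = df.merge(brands, on='brands', how='left')
print(merged)
```

Using how='left' here is a design choice, not part of the answer above: it guarantees that no row of df is dropped if a key is ever missing from the counts.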
Answered by Alexander
You want to use transform.
import numpy as np
import pandas as pd
np.random.seed(0)
# Create dummy data.
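# randint's upper bound is exclusive, so (0, 6) draws from 0..5;
# it replaces the deprecated np.random.random_integers(0, 5, 10).
df = pd.DataFrame({'brands': ['brand{0}'.format(n)
                              for n in np.random.randint(0, 6, 10)]})
# Broadcast each group's count back onto its rows.
df['brand_count'] = df.groupby('brands')['brands'].transform('count')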
>>> df
brands brand_count
0 brand4 1
1 brand5 2
2 brand0 1
3 brand3 4
4 brand3 4
5 brand3 4
6 brand1 1
7 brand3 4
8 brand5 2
9 brand2 1
For reference:
>>> df.brands.value_counts()
brand3 4
brand5 2
brand4 1
brand0 1
brand1 1
brand2 1
Name: brands, dtype: int64
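The useful property of transform is that its result has the same index as the original frame, so it can be assigned straight back as a column. The sketch below illustrates this on made-up data with a deliberately non-default index; it is an illustration under those assumptions, not the answer's exact code.

```python
import pandas as pd

# Hypothetical data with a non-default index to show that alignment is kept.
df = pd.DataFrame(
    {'brands': ['brand3', 'brand5', 'brand3', 'brand1', 'brand3']},
    index=[10, 11, 12, 13, 14],
)

# transform('count') returns one value per original row,
# indexed like df, so plain assignment lines up correctly.
df['brand_count'] = df.groupby('brands')['brands'].transform('count')
print(df)
```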
Answered by Egos
I think the best way is to use map.
df['brand_count'] = df.brand.map(df.brand.value_counts())
This is much faster than the groupby method, for example (by a factor of 500 on a 15,000-row df), and it takes only one line.
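The mapped column also covers the second half of the question title, subsetting the dataframe by count. A minimal sketch, assuming a column named 'brand' as in this answer; the data and the min_count threshold are hypothetical.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({'brand': ['A', 'C', 'A', 'B', 'A', 'C']})

# value_counts() is indexed by brand, so map() looks up each row's count.
counts = df['brand'].value_counts()
df['brand_count'] = df['brand'].map(counts)

# Keep only the rows whose brand occurs at least min_count times.
min_count = 2  # assumed threshold, chosen for illustration
frequent = df[df['brand_count'] >= min_count]
print(frequent)
```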
Answered by pomber
df = ...
key_col = "brand"
count_col = "brand_count"
result = (
df.join(
df[key_col].value_counts().rename(count_col),
how="left",
on=key_col)
)
If you need to join the counts to a different dataframe, remember to fill NaNs with zeros:
df = ...
other = ...
key_col = "brand"
count_col = "brand_count"
result = (
other.join(
df[key_col].value_counts().rename(count_col),
how="left",
on=key_col)
.fillna({count_col: 0})
)
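A small worked case may make the NaN issue concrete. In the sketch below, both frames and the 'price' column are invented for illustration; brand 'Z' appears only in other, so its count is missing after the join. The final astype step is an addition of this sketch: filling NaN leaves a float column, and casting restores integers.

```python
import pandas as pd

# Hypothetical frames: 'Z' exists in `other` but never occurs in `df`.
df = pd.DataFrame({'brand': ['A', 'A', 'B']})
other = pd.DataFrame({'brand': ['A', 'B', 'Z'], 'price': [10, 20, 30]})

key_col = 'brand'
count_col = 'brand_count'

result = (
    other.join(
        df[key_col].value_counts().rename(count_col),
        how='left',
        on=key_col)
    .fillna({count_col: 0})       # 'Z' has no count, so it becomes 0
    .astype({count_col: int})     # NaN forced floats; cast back to int
)
print(result)
```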
Answered by Michael H.
The pandas merge and value_counts methods are pretty fast, so I would combine the two.
# Naming the counts up front avoids depending on the default column name,
# which changed in pandas 2.0.
df.merge(df['brand'].value_counts().rename('brand_count').to_frame(), how='left',
         left_on='brand', right_index=True)
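As a sanity check, the approaches in this thread should all produce the same counts per row. The sketch below compares them on made-up data, assuming a column named 'brand'; it checks agreement only and says nothing about relative speed.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({'brand': ['A', 'C', 'A', 'B', 'A', 'C']})
counts = df['brand'].value_counts().rename('brand_count')

# The same column computed four ways.
via_map = df['brand'].map(counts)
via_transform = df.groupby('brand')['brand'].transform('count')
via_join = df.join(counts, on='brand', how='left')['brand_count']
via_merge = df.merge(counts.to_frame(), how='left',
                     left_on='brand', right_index=True)['brand_count']

# Compare raw values so differing Series names do not matter.
for name, col in [('transform', via_transform),
                  ('join', via_join),
                  ('merge', via_merge)]:
    assert (col.to_numpy() == via_map.to_numpy()).all(), name

print(via_map.tolist())
```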