Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/17709270/

Time: 2020-08-19 08:58:21  Source: igfitidea

I want to create a column of value_counts in my pandas dataframe

python merge pandas

Asked by user2592989

I am more familiar with R, but I wanted to see if there was a way to do this in pandas. I want to count the occurrences of each unique value in one of my dataframe columns and then add a new column with those counts to my original dataframe. I've tried a couple of different things. I created a pandas Series and then calculated counts with the value_counts method. I tried to merge these values back into my original dataframe, but the keys that I want to merge on are in the index (ix/loc). Any suggestions or solutions would be appreciated.

Color Value
Red   100
Red   150
Blue  50

and I want to return something like:

Color Value Counts
Red   100   2
Red   150   2 
Blue  50    1
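
The merge described in the question can be made to work even though the keys sit in the index of the counts. The sketch below is one possible way, assuming a pandas version in which merge accepts a named Series; the names `counts` and `merged` are illustrative and not from the question.

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'Value': [100, 150, 50]})

# value_counts() returns a Series whose index holds the unique colors.
# Renaming it gives the new column a predictable name (assumed: 'Counts').
counts = df['Color'].value_counts().rename('Counts')

# Match the 'Color' column on the left against the index on the right.
merged = df.merge(counts, how='left', left_on='Color', right_index=True)
print(merged)
```

A left merge keeps the rows of the original frame in their original order, so each row simply gains the count for its color.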

Answered by unutbu

df['Counts'] = df.groupby(['Color'])['Value'].transform('count')

For example,

In [102]: df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]})

In [103]: df
Out[103]: 
  Color  Value
0   Red    100
1   Red    150
2  Blue     50

In [104]: df['Counts'] = df.groupby(['Color'])['Value'].transform('count')

In [105]: df
Out[105]: 
  Color  Value  Counts
0   Red    100       2
1   Red    150       2
2  Blue     50       1

Note that transform('count') ignores NaNs. If you want to count NaNs, use transform(len).
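
To make the difference concrete, here is a small sketch with one missing Value. The data is made up for illustration, and the variable names `non_null` and `all_rows` are not from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second Red row has a missing Value.
df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'],
                   'Value': [100, np.nan, 50]})

grouped = df.groupby('Color')['Value']

# 'count' only counts non-missing values within each group.
non_null = grouped.transform('count')

# len counts every row in the group, missing or not.
all_rows = grouped.transform(len)

print(pd.DataFrame({'Color': df['Color'],
                    'non_null': non_null,
                    'all_rows': all_rows}))
```

Here the Red group has two rows but only one non-missing Value, so the two columns differ for the Red rows and agree for Blue.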

To the anonymous editor: If you are getting an error while using transform('count'), it may be because your version of pandas is too old. The above works with pandas version 0.15 or newer.

Answered by Steven C. Howell

My initial thought was to use a list comprehension as shown below but, as was pointed out in the comments, this is slower than the groupby and transform method. I will leave this answer to demonstrate WHAT NOT TO DO:

In [94]: df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]})
In [95]: df['Counts'] = [sum(df['Color'] == df['Color'][i]) for i in range(len(df))]
In [96]: df
Out[96]: 
  Color  Value  Counts
0   Red    100       2
1   Red    150       2
2  Blue     50       1

[3 rows x 3 columns]

@unutbu's method gets complicated for DataFrames with several columns, which makes this approach simpler to code. If you are working with a small data frame, this is faster (see below), but otherwise you should NOT use this.

In [97]: %timeit df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]}); df['Counts'] = df.groupby(['Color']).transform('count')
100 loops, best of 3: 2.87 ms per loop
In [98]: %timeit df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]}); df['Counts'] = [sum(df['Color'] == df['Color'][i]) for i in range(len(df))]
1000 loops, best of 3: 1.03 ms per loop
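
The timings above use a three-row frame. To check how the two approaches compare on a larger frame, something like the following sketch could be used. The frame size, the random data, and the helper names `with_transform` and `with_list_comprehension` are assumptions for illustration. No timings are quoted here because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 1000 rows with three possible colors.
rng = np.random.default_rng(0)
big = pd.DataFrame({
    'Color': rng.choice(['Red', 'Blue', 'Green'], size=1000),
    'Value': rng.integers(0, 200, size=1000),
})


def with_transform(frame):
    # One grouped pass over the data.
    return frame.groupby('Color')['Value'].transform('count')


def with_list_comprehension(frame):
    # One full column comparison per row, so the work grows quadratically.
    return [sum(frame['Color'] == frame['Color'][i]) for i in range(len(frame))]


t_transform = timeit.timeit(lambda: with_transform(big), number=1)
t_listcomp = timeit.timeit(lambda: with_list_comprehension(big), number=1)
print('transform:          %.4f s' % t_transform)
print('list comprehension: %.4f s' % t_listcomp)
```

Both functions should return the same counts, so the comparison is only about speed.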

Answered by 1''

df['Counts'] = df.Color.groupby(df.Color).transform('count')

You can do this with any series: group it by itself and call transform('count'):

>>> series = pd.Series(['Red', 'Red', 'Blue'])
>>> series.groupby(series).transform('count')
0    2
1    2
2    1
dtype: int64

Answered by ZakS

One other option:

z = df['Color'].value_counts()  # counts per unique value, indexed by color

z1 = z.to_dict()  # converts to dictionary

df['Count_Column'] = df['Color'].map(z1)

This option will give you a column with repeated values of the counts, corresponding to the frequency of each value in the 'Color' column.
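
As a possible shortening of the same idea, Series.map also accepts a Series and looks values up by its index, so the dictionary step can be skipped. A minimal sketch, reusing the example frame from the question:

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'Value': [100, 150, 50]})

# value_counts() is indexed by color, so map can use it directly as a lookup.
df['Count_Column'] = df['Color'].map(df['Color'].value_counts())
print(df)
```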