Disclaimer: This page is a bilingual Chinese-English translation of a popular StackOverFlow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/31076698/

Date: 2020-09-13 23:32:32  Source: igfitidea

Create a pandas dataframe of counts

python pandas

Asked by Tchotchke

I want to create a pandas dataframe with two columns: the first holds the unique values of one of my columns, and the second holds the count of each unique value.

I have seen many posts (such as here) that describe how to get the counts, but the issue I'm running into is that when I try to create a dataframe, the column values become my index.

Sample data: df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']}). I want to end up with a dataframe like:

   Color Count
0   Red  2
1  Blue  1

I have tried the following, but in every case Color ends up as the index and Count is the only column in the dataframe.

Attempt 1:

df2 = pd.DataFrame(data=df['Color'].value_counts())
# And resetting the index just gets rid of Color, which I want to keep
df2 = df2.reset_index(drop=True)
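
In Attempt 1 the colors live in the index, and reset_index(drop=True) throws that index away instead of turning it into a column. A minimal sketch (not part of the original question), assuming the sample df above: calling reset_index() without drop=True keeps the colors, and assigning the column names explicitly avoids depending on the default names, which differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})

df2 = pd.DataFrame(data=df['Color'].value_counts())
# Without drop=True, the index (the colors) becomes an ordinary column
df2 = df2.reset_index()
# Default column names from reset_index() vary by pandas version, so set them explicitly
df2.columns = ['Color', 'Count']
print(df2)
```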

Attempt 2:

df3 = df['Color'].value_counts()
df3 = pd.DataFrame(data=df3, index=range(df3.shape[0]))
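
Attempt 2 fails for a different reason: when pd.DataFrame receives a Series together with an index argument, it aligns the Series to that index by label. The counts are labelled 'Red' and 'Blue', not 0 and 1, so nothing matches and every value becomes NaN. A short sketch reproducing this with the sample data:

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})

counts = df['Color'].value_counts()  # index labels: 'Red', 'Blue'
df3 = pd.DataFrame(data=counts, index=range(counts.shape[0]))
# The Series is reindexed to [0, 1]; no label matches, so all values are NaN
print(df3)
```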

Attempt 3:

df4 = df.groupby('Color')
df4 = pd.DataFrame(df4['Color'].count())
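
In Attempt 3, df4['Color'].count() returns a Series that is both named Color and indexed by Color, so a later reset_index() would fail because a column called Color already exists. A minimal sketch of one alternative (not taken from the answers below), assuming the sample df: GroupBy.size() counts the rows in each group, and Series.reset_index(name=...) moves the group keys into a column while naming the counts.

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})

# size() counts rows per group and returns a Series indexed by Color;
# reset_index(name='Count') turns that index into a column and names the counts
df4 = df.groupby('Color').size().reset_index(name='Count')
print(df4)  # groupby sorts the keys, so Blue comes before Red
```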

Answer by Phillip Cloud

Another way to do this, using value_counts:

In [10]: df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})

In [11]: df.Color.value_counts().reset_index().rename(columns={'index': 'Color', 0: 'count'})
Out[11]:
  Color  count
0   Red      2
1  Blue      1
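
The column names that value_counts().reset_index() produces depend on the pandas version. Before pandas 2.0, the counts column took the Series name (Color) and the labels went into a column called index. From 2.0 on, the counts column is named count and the labels keep the original column name. The rename mapping above is therefore version-sensitive. A sketch that should give the same two columns either way, naming the index with rename_axis and the counts with Series.reset_index(name=...):

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})

res = (
    df['Color']
    .value_counts()
    .rename_axis('Color')       # name the index explicitly (already 'Color' on pandas >= 2.0)
    .reset_index(name='count')  # move the labels into a column and name the counts
)
print(res)
```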

Answer by mdurant

Essentially equivalent to setting the column names, but using the rename method instead:

df.groupby('Color').count().reset_index().rename(columns={'State': 'Count'})
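
One detail about the count() route: GroupBy.count() counts the non-missing values in each column, whereas GroupBy.size() counts rows. With the sample data they agree. If State had missing entries, though, the Count column above would undercount. A small sketch using a hypothetical variant of the sample data with one missing State:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with one missing State
df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', np.nan, 'PA']})

by_count = df.groupby('Color').count().reset_index().rename(columns={'State': 'Count'})
by_size = df.groupby('Color').size().reset_index(name='Count')

print(by_count)  # Red counted once: only one non-missing State
print(by_size)   # Red counted twice: two rows
```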

Answer by jpp

One readable solution is to use the to_frame and rename_axis methods:

res = df['Color'].value_counts()\
                 .to_frame('count').rename_axis('Color')\
                 .reset_index()

print(res)

  Color  count
0   Red      2
1  Blue      1

Answer by khammel

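# Assumes df has exactly one column besides 'Color' (here 'State'),
# so the grouped result has two columns after reset_index()
df = df.groupby('Color').count().reset_index()
df.columns = ['Color', 'Count']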

Answer by letterjung

# Assumes `score` (a sequence of numbers) and `data` (a DataFrame with one row per score) already exist
label_sentiment = []
for s in score:
    if s == 0:
        label_sentiment.append('NEUTRAL')
    elif s > 0:
        label_sentiment.append('POSITIVE')
    elif s < 0:
        label_sentiment.append('NEGATIVE')
data['label_sentiment'] = label_sentiment
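
This answer labels sentiment scores rather than counting values, and it assumes score and data already exist. The following is a hedged sketch with hypothetical score values. It writes the same three-way labelling without an explicit loop using np.select, and then counts the resulting labels with value_counts() as the other answers do. The default label UNKNOWN for scores that match no condition (for example NaN) is an assumption. The original loop appends nothing for such scores, which would leave the list shorter than the frame.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs standing in for the answer's `data` and `score`
data = pd.DataFrame({'text': ['meh', 'great', 'awful']})
score = np.array([0.0, 0.8, -0.5])

data['label_sentiment'] = np.select(
    [score == 0, score > 0, score < 0],
    ['NEUTRAL', 'POSITIVE', 'NEGATIVE'],
    default='UNKNOWN',  # hypothetical label for scores matching no condition, e.g. NaN
)
print(data['label_sentiment'].value_counts())
```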
