pandas: creating a dataframe of counts
Note: this page is a side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/31076698/
Create a pandas dataframe of counts
Asked by Tchotchke
I want to create a pandas dataframe with two columns, the first being the unique values of one of my columns and the second being the count of each unique value.
I have seen many posts (such as here) that describe how to get the counts, but the issue I'm running into is that when I try to create a dataframe, the column values become my index.
Sample data:
df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})
I want to end up with a dataframe like:
Color Count
0 Red 2
1 Blue 1
I have tried the following, but in all cases Color ends up as the index and Count is the only column in the dataframe.
Attempt 1:
df2 = pd.DataFrame(data=df['Color'].value_counts())
# And resetting the index just gets rid of Color, which I want to keep
df2 = df2.reset_index(drop=True)
Attempt 2:
df3 = df['Color'].value_counts()
df3 = pd.DataFrame(data=df3, index=range(df3.shape[0]))
Attempt 3:
df4 = df.groupby('Color')
df4 = pd.DataFrame(df4['Color'].count())
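What the three attempts have in common is that value_counts() and groupby(...).count() place the unique values in the index, and reset_index(drop=True) then discards that index instead of keeping it. A minimal sketch of this behaviour on the sample data (the names `counts`, `dropped`, and `kept` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})

# value_counts() returns a Series whose index holds the unique colors
counts = df['Color'].value_counts()

# drop=True throws the index away, so the colors are lost
dropped = counts.reset_index(drop=True)

# without drop=True the index is moved into a regular column instead
kept = counts.reset_index()

print(list(counts.index))  # the colors live in the index
print(kept.shape)          # two columns: the colors and their counts
```

The column labels of `kept` depend on the installed pandas version, so they still need to be set explicitly.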
Answered by Phillip Cloud
Another way to do this, using value_counts:
In [10]: df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})
In [11]: df.Color.value_counts().reset_index().rename(columns={'index': 'Color', 0: 'count'})
Out[11]:
Color count
0 Red 2
1 Blue 1
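The column labels produced by value_counts().reset_index() have changed between pandas releases. As far as I know, pandas 2.0 and later name the counts `count` and name the index after the original column, so the rename mapping above may not match on a newer install. One way to sidestep this, offered as a sketch rather than as the answer's method, is to assign the labels by position:

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})

# reset_index() yields two columns: the unique values first, the counts second
res = df['Color'].value_counts().reset_index()

# set the labels positionally so the result does not depend on
# whatever default names this pandas version produced
res.columns = ['Color', 'Count']

print(res)
```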
Answered by mdurant
Essentially equivalent to setting the column names, but using the rename method instead:
df.groupby('Color').count().reset_index().rename(columns={'State': 'Count'})
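Note that count() counts the non-null entries of each remaining column, so the line above relies on State having no missing values. A sketch of an alternative that counts rows regardless of missing values, using groupby(...).size() and the `name` argument of Series.reset_index (the name `by_size` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})

# size() counts rows per group; name= labels the resulting count column
by_size = df.groupby('Color').size().reset_index(name='Count')

# groupby sorts the group keys (Blue before Red), so sort by count
# to match the ordering asked for in the question
by_size = by_size.sort_values('Count', ascending=False).reset_index(drop=True)

print(by_size)
```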
Answered by jpp
One readable solution is to use the to_frame and rename_axis methods:
res = df['Color'].value_counts()\
        .to_frame('count').rename_axis('Color')\
        .reset_index()
print(res)
Color count
0 Red 2
1 Blue 1
Answered by khammel
# count the rows per color, then move Color from the index back into a column
df = df.groupby('Color').count().reset_index()
df.columns = ['Color', 'Count']
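Assigning exactly two names works here because the sample frame has only one column besides Color; count() returns one count column per remaining column, so a wider frame would raise a length mismatch. A sketch for that case, assuming a hypothetical extra column `Size`, selects a single column before counting:

```python
import pandas as pd

wide = pd.DataFrame({
    'Color': ['Red', 'Red', 'Blue'],
    'State': ['MA', 'PA', 'PA'],
    'Size': [1, 2, 3],  # hypothetical extra column, not in the original data
})

# pick one column to count so the result has a single count column
counts = wide.groupby('Color')['State'].count().reset_index(name='Count')

print(counts)
```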
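Answered by letterjung
# Assumes `score` (a sequence of numeric scores) and `data` (a DataFrame
# with one row per score) are already defined.
label_sentiment = []
for i in range(len(score)):
    if score[i] == 0:
        label_sentiment.append('NEUTRAL')
    elif score[i] > 0:
        label_sentiment.append('POSITIVE')
    elif score[i] < 0:
        label_sentiment.append('NEGATIVE')
data['label_sentiment'] = label_sentiment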
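The loop above depends on `score` and `data`, which the answer does not define. A vectorized sketch of the same labelling with numpy.select follows; the scores are made up and the `score` column name is an assumption:

```python
import numpy as np
import pandas as pd

# made-up scores purely for illustration
data = pd.DataFrame({'score': [0.5, 0.0, -0.2, 0.9]})

# conditions are checked in order; rows matching none get the default
conditions = [data['score'] > 0, data['score'] < 0]
labels = ['POSITIVE', 'NEGATIVE']
data['label_sentiment'] = np.select(conditions, labels, default='NEUTRAL')

print(data)
```

Unlike the loop, this labels a missing score as NEUTRAL rather than skipping it.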

