Python: counting and sorting with Pandas

Question

Asked by Rubans

I have a dataframe of values from a file, which I have grouped by two columns, returning a count from the aggregation. Now I want to sort by the max count value, but I get the following error:

KeyError: 'count'

It looks like the groupby/agg count column is some sort of index, so I'm not sure how to do this. I'm a beginner with Python and pandas. Here's the actual code; please let me know if you need more detail:

def answer_five():
    df = census_df#.set_index(['STNAME'])
    df = df[df['SUMLEV'] == 50]
    df = df[['STNAME','CTYNAME']].groupby(['STNAME']).agg(['count']).sort(['count'])
    #df.set_index(['count'])
    print(df.index)
    # get sorted count max item
    return df.head(5)

Answer 1

Answered by jezrael

I think you need to add reset_index, then pass the parameter ascending=False to sort_values, because sort returns:

FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....) .sort_values(['count'], ascending=False)

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                             .count() \
                             .reset_index(name='count') \
                             .sort_values(['count'], ascending=False) \
                             .head(5)

Sample:

df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                   'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})

print (df)
    CTYNAME STNAME
0         4      a
1         5      b
2         6      s
3         5      c
4         6      s
5         2      c
6         3      b
7         4      c
8         5      d
9         6      b
10        4      c
11        5      s
12        4      s
13        3      c
14        6      a
15        5      e

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                             .count() \
                             .reset_index(name='count') \
                             .sort_values(['count'], ascending=False) \
                             .head(5)

print (df)
  STNAME  count
2      c      5
5      s      4
1      b      3
0      a      2
3      d      1
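For context on the original KeyError, here is a minimal sketch using the same sample data. It assumes the cause is that agg(['count']) produces two-level (MultiIndex) column labels such as ('CTYNAME', 'count'), so the plain label 'count' does not exist as a column. The names counted and by_count are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'STNAME': list('abscscbcdbcsscae'),
                   'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5]})

# agg with a list of functions yields MultiIndex columns: (column, function)
counted = df.groupby('STNAME').agg(['count'])
print(counted.columns.tolist())

# Sorting by the bare label 'count' fails, because no such column exists
try:
    counted.sort_values('count')
    raised = False
except KeyError:
    raised = True
print(raised)

# Sorting by the full two-level label works
by_count = counted.sort_values(('CTYNAME', 'count'), ascending=False)
print(by_count.head(5))
```

Calling count() on a single selected column and then reset_index(name='count'), as in the answer above, avoids the two-level labels altogether.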

But it seems you need Series.nlargest:

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].count().nlargest(5)

or:

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].size().nlargest(5)

The difference between size and count is:
size counts NaN values, count does not.
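A small sketch of that difference, using made-up data with some missing CTYNAME values (the frame below is an assumption for illustration, not the census data from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'STNAME': ['a', 'a', 'b', 'b', 'b'],
                   'CTYNAME': [1.0, np.nan, 2.0, np.nan, np.nan]})

grouped = df.groupby('STNAME')['CTYNAME']

sizes = grouped.size()    # rows per group, NaN included
counts = grouped.count()  # non-NaN values per group

print(sizes)
print(counts)
```

Here size reports 2 and 3 rows for groups a and b, while count reports 1 for each, since only one value per group is not NaN.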

Sample:

df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                   'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})

print (df)
    CTYNAME STNAME
0         4      a
1         5      b
2         6      s
3         5      c
4         6      s
5         2      c
6         3      b
7         4      c
8         5      d
9         6      b
10        4      c
11        5      s
12        4      s
13        3      c
14        6      a
15        5      e

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                             .size() \
                             .nlargest(5) \
                             .reset_index(name='top5')
print (df)
  STNAME  top5
0      c     5
1      s     4
2      b     3
3      a     2
4      d     1
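Note that in this sample d and e are tied at 1, and only d makes the top five. As a hedged sketch of how ties could be handled, Series.nlargest accepts a keep parameter; the variable names sizes, top5, and top5_all are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'STNAME': list('abscscbcdbcsscae'),
                   'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5]})

sizes = df.groupby('STNAME')['CTYNAME'].size()

# Default keep='first': exactly 5 rows, the tie between d and e is cut off
top5 = sizes.nlargest(5)

# keep='all': rows tied with the last kept value are included as well
top5_all = sizes.nlargest(5, keep='all')

print(top5)
print(top5_all)
```

With keep='all' the result can contain more than n rows, which may or may not be what the caller wants.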

Answer 2

Answered by Christoph Schranz

I don't know exactly what your df looks like. But if you need to sort several categories by how often they occur, it is easier to slice a Series from the df and sort the Series:

series = df.count().sort_values(ascending=False)
series.head()

Note that this Series will use the category names as its index!
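One way to get such a Series of per-category frequencies is Series.value_counts, sketched here on the sample data from the first answer. This assumes the categories live in a single column (STNAME here); it is not necessarily what the answer's author had in mind.

```python
import pandas as pd

df = pd.DataFrame({'STNAME': list('abscscbcdbcsscae'),
                   'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5]})

# value_counts returns counts sorted in descending order by default;
# the category values (state names) become the index
series = df['STNAME'].value_counts()
print(series.head())
```

Because the categories are the index, series.index[0] gives the most frequent category and series.iloc[0] its count.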

Answer 3

Answered by Angelin Nadar

I agree with @Christoph Schranz: slice a Series from the DataFrame.

df[['STNAME','CTYNAME']].groupby('STNAME')['CTYNAME'].count().nlargest(3)
