Pandas count(distinct) equivalent
Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/15411158/
Asked by Adriano Almeida
I am using pandas as a database substitute, as I have multiple databases (Oracle, MSSQL, etc.) and am unable to translate a sequence of commands into a SQL equivalent.
I have a table loaded in a DataFrame with some columns:
YEARMONTH, CLIENTCODE, SIZE, .... etc etc
In SQL, counting the number of distinct clients per year would be:
SELECT count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH;
And the result would be:
201301     5000
201302    13245
How can I do that in pandas?
Accepted answer by Dan Allan
I believe this is what you want:
table.groupby('YEARMONTH').CLIENTCODE.nunique()
Example:
In [2]: table
Out[2]:
   CLIENTCODE  YEARMONTH
0           1     201301
1           1     201301
2           2     201301
3           1     201302
4           2     201302
5           2     201302
6           3     201302

In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique()
Out[3]:
YEARMONTH
201301    2
201302    3
Answered by Roman Kh
Interestingly enough, very often len(unique()) is a few times (3x-15x) faster than nunique().
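Applied per group, a minimal sketch of both variants on the accepted answer's table (the actual speedup will depend on your data):

# count distinct per group with the built-in method
table.groupby('YEARMONTH')['CLIENTCODE'].nunique()

# the len(unique()) variant, applied per group via apply
table.groupby('YEARMONTH')['CLIENTCODE'].apply(lambda s: len(s.unique()))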
Answered by Enthusiast
Here is another, much simpler method. Let's say your DataFrame name is daat and the column name is YEARMONTH:
daat.YEARMONTH.value_counts()
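For reference, value_counts counts how many rows carry each value rather than distinct CLIENTCODE values; on the accepted answer's example table it would give:

daat.YEARMONTH.value_counts()
# 201302    4
# 201301    3
# Name: YEARMONTH, dtype: int64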
Answered by YOBEN_S
Using crosstab, this will return more information than groupby nunique:
pd.crosstab(df.YEARMONTH, df.CLIENTCODE)
Out[196]:
CLIENTCODE  1  2  3
YEARMONTH
201301      2  1  0
201302      1  2  1
After a little modification, this yields the result:
pd.crosstab(df.YEARMONTH, df.CLIENTCODE).ne(0).sum(1)
Out[197]:
YEARMONTH
201301    2
201302    3
dtype: int64
Answered by Gangaraju
I am also using nunique, but it is very helpful if you have to use an aggregate function like 'min', 'max', 'count' or 'mean':
df.groupby('YEARMONTH')['CLIENTCODE'].transform('nunique')  # count(distinct)
df.groupby('YEARMONTH')['CLIENTCODE'].transform('min')      # min
df.groupby('YEARMONTH')['CLIENTCODE'].transform('max')      # max
df.groupby('YEARMONTH')['CLIENTCODE'].transform('mean')     # average
df.groupby('YEARMONTH')['CLIENTCODE'].transform('count')    # count
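Note that transform returns a value for every original row (the per-group result broadcast back), whereas agg or the groupby methods return one row per group. A minimal sketch of the difference, assuming df is the question's table (n_clients is a hypothetical column name):

# one value per original row, handy for adding a new column
df['n_clients'] = df.groupby('YEARMONTH')['CLIENTCODE'].transform('nunique')

# one row per group instead
per_group = df.groupby('YEARMONTH')['CLIENTCODE'].agg('nunique')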
Answered by Vivek Ananthan
With newer pandas versions, it is easy to get the result as a DataFrame:
unique_count = df.groupby('YEARMONTH', as_index=False).agg(uniq_CLIENTCODE=('CLIENTCODE', pd.Series.nunique))
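Named aggregation like this requires pandas 0.25 or newer. On the accepted answer's example table, the result would be a regular DataFrame (a sketch):

print(unique_count)
#    YEARMONTH  uniq_CLIENTCODE
# 0     201301                2
# 1     201302                3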
Answered by Ramon
Here is an approach to count distinct over multiple columns. Let's have some data:
import pandas as pd

data = {'CLIENT_CODE': [1, 1, 2, 1, 2, 2, 3],
        'YEAR_MONTH': [201301, 201301, 201301, 201302, 201302, 201302, 201302],
        'PRODUCT_CODE': [100, 150, 220, 400, 50, 80, 100]
       }
table = pd.DataFrame(data)
table
   CLIENT_CODE  YEAR_MONTH  PRODUCT_CODE
0            1      201301           100
1            1      201301           150
2            2      201301           220
3            1      201302           400
4            2      201302            50
5            2      201302            80
6            3      201302           100
Now, list the columns of interest and use groupby in a slightly modified syntax:
columns = ['YEAR_MONTH', 'PRODUCT_CODE']
table[columns].groupby(table['CLIENT_CODE']).nunique()
We obtain:
             YEAR_MONTH  PRODUCT_CODE
CLIENT_CODE
1                     2             3
2                     2             3
3                     1             1
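Conversely, to group by YEAR_MONTH and count distinct values of the other columns, a sketch on the same data:

table.groupby('YEAR_MONTH')[['CLIENT_CODE', 'PRODUCT_CODE']].nunique()
#             CLIENT_CODE  PRODUCT_CODE
# YEAR_MONTH
# 201301                2             3
# 201302                3             4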
Answered by Vivek Payasi
Distinct count of a column, along with aggregations on other columns
To get the distinct number of values for any column (CLIENTCODE in your case), we can use nunique. We can pass the input as a dictionary to the agg function, along with aggregations on other columns:
grp_df = df.groupby('YEARMONTH').agg({'CLIENTCODE': ['nunique'],
                                      'other_col_1': ['sum', 'count']})
# to flatten the multi-level columns
grp_df.columns = ["_".join(col).strip() for col in grp_df.columns.values]
# if you wish to reset the index
grp_df.reset_index(inplace=True)
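A runnable sketch of the same pattern, using the question's SIZE column in place of the placeholder other_col_1 (the data below is hypothetical):

import pandas as pd

df = pd.DataFrame({'YEARMONTH': [201301, 201301, 201302, 201302],
                   'CLIENTCODE': [1, 1, 2, 3],
                   'SIZE': [10, 20, 30, 40]})

grp_df = df.groupby('YEARMONTH').agg({'CLIENTCODE': ['nunique'],
                                      'SIZE': ['sum', 'count']})
# flatten the multi-level columns, then restore YEARMONTH as a column
grp_df.columns = ["_".join(col).strip() for col in grp_df.columns.values]
grp_df.reset_index(inplace=True)
# grp_df now has the flat columns: YEARMONTH, CLIENTCODE_nunique, SIZE_sum, SIZE_count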

