Python pandas: unique combinations of values in selected columns of a data frame, with counts

Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/35268817/

Time: 2020-08-19 16:12:59  Source: igfitidea

unique combinations of values in selected columns in pandas data frame and count

Tags: python, pandas

Asked by Ratchainant Thammasudjarit

I have my data in a pandas data frame as follows:

import pandas as pd

df1 = pd.DataFrame({'A':['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                   'B':['yes','no','no','no','yes','yes','no','yes','yes','no']})

So my data looks like this:

----------------------------
index         A        B
0           yes      yes
1           yes       no
2           yes       no
3           yes       no
4            no      yes
5            no      yes
6           yes       no
7           yes      yes
8           yes      yes
9            no       no
-----------------------------

I would like to transform it into another data frame. The expected output can be built with the following Python script:

output = pd.DataFrame({'A':['no','no','yes','yes'],'B':['no','yes','no','yes'],'count':[1,2,4,3]})

So my expected output looks like this:

--------------------------------------------
index      A       B       count
--------------------------------------------
0         no       no        1
1         no      yes        2
2        yes       no        4
3        yes      yes        3
--------------------------------------------

Actually, I can find all the combinations and count them with the following command: mytable = df1.groupby(['A','B']).size()

However, it turns out that the combinations end up in a single column. I would like to separate each value of a combination into its own column and add one more column for the count. Is it possible to do that? May I have your suggestions? Thank you in advance.

Accepted answer by EdChum

You can groupby on columns 'A' and 'B', call size, then reset_index and rename the generated column:

In [26]:

df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3
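
As a small variation that is not part of the original answer, `Series.reset_index` accepts a `name` argument that labels the values column directly, so the separate `rename` step can be skipped. A minimal sketch, assuming the same `df1` as in the question:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
                    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no']})

# size() returns a Series indexed by (A, B); name= labels the counts
# column while the index levels are turned back into regular columns.
counts = df1.groupby(['A', 'B']).size().reset_index(name='count')
print(counts)
```

This should produce the same three columns (A, B, count) as the chained `rename` version.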

update

A little explanation: grouping on the two columns puts rows with the same A and B values into one group. We then call size, which returns the number of rows in each unique group:

In[202]:
df1.groupby(['A','B']).size()

Out[202]: 
A    B  
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
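
The output above is a Series whose index has two levels (A and B), which is why the combinations appear to sit in a single column. The following sketch, assuming the same `df1`, inspects that structure and looks up one count by its (A, B) pair; the variable name `sizes` is only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
                    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no']})

sizes = df1.groupby(['A', 'B']).size()

# The group keys live in a two-level index, not in columns.
print(type(sizes).__name__)
print(list(sizes.index.names))

# A single count can be looked up with a tuple of key values.
print(sizes.loc[('yes', 'no')])
```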

To restore the grouped columns as regular columns, we call reset_index:

In[203]:
df1.groupby(['A','B']).size().reset_index()

Out[203]: 
     A    B  0
0   no   no  1
1   no  yes  2
2  yes   no  4
3  yes  yes  3

This moves the group keys from the index back into columns, but the size aggregation ends up in a generated column named 0, so we have to rename it:

In[204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})

Out[204]: 
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3

groupby does accept the argument as_index, which we could have set to False so that the grouped columns do not become the index. With size, however, this still generates a Series (at least in the pandas version used for this answer), so you would still have to restore the indices and so on:

In[205]:
df1.groupby(['A','B'], as_index=False).size()

Out[205]: 
A    B  
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
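
This behavior depends on the pandas version: as far as I know, from pandas 1.1 onward `size()` on a groupby created with `as_index=False` returns a DataFrame with a column named `size` rather than a Series. A hedged sketch that copes with either behavior, assuming the same `df1`:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
                    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no']})

counts = df1.groupby(['A', 'B'], as_index=False).size()

# Older pandas returns a Series here; convert it so that both
# cases end up as a DataFrame with a 'size' column.
if isinstance(counts, pd.Series):
    counts = counts.reset_index(name='size')

counts = counts.rename(columns={'size': 'count'})
print(counts)
```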

Answer by Martin Alexandersson

Slightly related: I was looking for the unique combinations, and I came up with this method:

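def unique_columns(df, columns):
    # For each row, record whether its combination of `columns` occurs exactly once.
    result = pd.Series(index=df.index, dtype=object)

    # Group the data frame that was passed in by the selected columns.
    groups = df.groupby(by=columns)
    for name, group in groups:
        is_unique = len(group) == 1
        result.loc[group.index] = is_unique

    # Fails if any row was not covered by a group
    # (e.g. NaN keys, which groupby drops by default).
    assert not result.isnull().any()

    return result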

And if you only want to assert that all combinations are unique:

df1.set_index(['A','B']).index.is_unique
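
The same two checks can probably be written without an explicit loop by using `DataFrame.duplicated`. This is a sketch of an alternative, not the answerer's method, and the names `mask_unique` and `all_unique` are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
                    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no']})

# keep=False marks every row of a repeated combination as a duplicate,
# so the negation is True only for combinations that occur exactly once.
mask_unique = ~df1.duplicated(subset=['A', 'B'], keep=False)

# True only if no (A, B) combination is repeated anywhere in the frame.
all_unique = not df1.duplicated(subset=['A', 'B']).any()

print(mask_unique)
print(all_unique)
```

For the example data, only the row with A == 'no' and B == 'no' has a combination that occurs once, so `all_unique` is False.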

Answer by Paul Rougieux

This places @EdChum's very nice answer into a function, count_unique_index. The unique method only works on pandas Series, not on data frames. The function below reproduces the behavior of the unique function in R:

unique returns a vector, data frame or array like x but with duplicate elements/rows removed.

And adds a count of the occurrences as requested by the OP.

df1 = pd.DataFrame({'A':['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                    'B':['yes','no','no','no','yes','yes','no','yes','yes','no']})
def count_unique_index(df, by):
    return df.groupby(by).size().reset_index().rename(columns={0:'count'})

count_unique_index(df1, ['A','B'])
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3
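
For comparison, pandas also offers built-in methods that cover similar ground: `DataFrame.drop_duplicates` removes repeated rows (roughly what R's unique does for a data frame), and `DataFrame.value_counts` counts each distinct row. A minimal sketch, assuming pandas 1.1 or later (where `DataFrame.value_counts` is available) and the same `df1`:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
                    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no']})

# Distinct (A, B) rows, in order of first appearance.
unique_rows = df1[['A', 'B']].drop_duplicates().reset_index(drop=True)

# Count of each distinct (A, B) row, sorted by the key columns
# so the layout matches the groupby-based result.
counts = (df1.value_counts(subset=['A', 'B'])
             .reset_index(name='count')
             .sort_values(['A', 'B'])
             .reset_index(drop=True))

print(unique_rows)
print(counts)
```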