Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/47136436/

Python Pandas: Convert ".value_counts" output to dataframe

python pandas dataframe

Asked by s900n

Hi, I want to get the counts of the unique values in a dataframe column. value_counts does this, but I want to use its output somewhere else. How can I convert the .value_counts output to a pandas dataframe? Here is some example code:

import pandas as pd
df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)
print(value_counts)
print(type(value_counts))

The output is:

2    3
1    2
Name: a, dtype: int64
<class 'pandas.core.series.Series'>

What I need is a dataframe like this:

unique_values  counts
2              3
1              2

Thank you.

Answered by jezrael

Use rename_axis to name the index (which becomes the new column) and then reset_index:

# value_counts is the Series computed in the question
df = value_counts.rename_axis('unique_values').reset_index(name='counts')
print(df)
   unique_values  counts
0              2       3
1              1       2

Or, if you need a one-column DataFrame, use Series.to_frame:

# value_counts is the Series computed in the question
df = value_counts.rename_axis('unique_values').to_frame('counts')
print(df)
               counts
unique_values        
2                   3
1                   2

Answered by WY Hsu

I just ran into the same problem, so I'll share my thoughts here.

Warning

When you work with pandas data structures, you have to be aware of the return type.
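
As a rough illustration of that point, the sketch below prints the types involved. It assumes the same example data as in the question; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})

column = df['a']               # single brackets select a pd.Series
sub_frame = df[['a']]          # double brackets select a pd.DataFrame
counts = column.value_counts() # value_counts on a Series returns a pd.Series
counts_frame = counts.to_frame()  # to_frame converts it to a pd.DataFrame

for obj in (column, sub_frame, counts, counts_frame):
    print(type(obj))
```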

Another solution

As @jezrael mentioned above, pandas does provide the pd.Series.to_frame API.

Step 1

You can also wrap the pd.Series in a pd.DataFrame by simply doing

df_value_counts = pd.DataFrame(value_counts) # wrap pd.Series to pd.DataFrame

Then you have a pd.DataFrame with column name 'a', and your first column becomes the index. (In pandas 2.0 and later, value_counts names its result 'count', so the column is 'count' and the index is named 'a'.)

Input:  print(df_value_counts.index.values)
Output: [2 1]

Input:  print(df_value_counts.columns)
Output: Index(['a'], dtype='object')

Step 2

What now?

If you want to add new column names to this pd.DataFrame, you can first reset the index with reset_index().

Then change the column names by assigning a list to df.columns

df_value_counts = df_value_counts.reset_index()
df_value_counts.columns = ['unique_values', 'counts']

Then you get what you need

Output:

       unique_values    counts
    0              2         3
    1              1         2

Full answer

import pandas as pd

df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)

# solution here
df_value_counts = pd.DataFrame(value_counts)
df_value_counts = df_value_counts.reset_index()
df_value_counts.columns = ['unique_values', 'counts'] # change column names
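
The same reset-then-rename steps can also be chained into one expression. This is a sketch, not part of the original answer: it swaps the columns assignment for DataFrame.set_axis, which assigns the labels by position and so should not depend on what the intermediate columns happen to be called.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)

# reset_index moves the unique values into a column;
# set_axis then relabels both columns by position
df_value_counts = (
    value_counts
    .reset_index()
    .set_axis(['unique_values', 'counts'], axis=1)
)
print(df_value_counts)
```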

Answered by Constantino

I'll throw my hat in as well. This is essentially the same as @wy-hsu's solution, but in function form:

import pandas as pd


def value_counts_df(df, col):
    """
    Returns df[col].value_counts() as a DataFrame

    Parameters
    ----------
    df : Pandas Dataframe
        Dataframe on which to run value_counts(), must have column `col`.
    col : str
        Name of column in `df` for which to generate counts

    Returns
    -------
    Pandas Dataframe
        Returned dataframe will have a single column named "count" which contains the value_counts()
        for each unique value of df[col]. The index name of this dataframe is `col`.

    Example
    -------
    >>> value_counts_df(pd.DataFrame({'a':[1, 1, 2, 2, 2]}), 'a')
       count
    a
    2      3
    1      2
    """
    df = pd.DataFrame(df[col].value_counts())
    df.index.name = col
    df.columns = ['count']
    return df
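
A more compact variant of the same idea can be sketched with the rename_axis and to_frame calls from the first answer. The function name value_counts_df_compact is hypothetical and is only used here for illustration.

```python
import pandas as pd


def value_counts_df_compact(df, col):
    """Hypothetical compact variant: counts of df[col] as a one-column DataFrame."""
    # rename_axis names the index after the column; to_frame names the count column
    return df[col].value_counts().rename_axis(col).to_frame('count')


result = value_counts_df_compact(pd.DataFrame({'a': [1, 1, 2, 2, 2]}), 'a')
print(result)
```

Calling reset_index() on the result would turn the index into a regular column if a flat frame is preferred.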