Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/47136436/
Python Pandas: Convert ".value_counts" output to dataframe
Question by s900n
Hi, I want to get the counts of the unique values in a DataFrame column. value_counts() does this, but I want to use its output somewhere else. How can I convert the .value_counts() output to a pandas DataFrame? Here is some example code:
import pandas as pd
df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)
print(value_counts)
print(type(value_counts))
The output is:
2 3
1 2
Name: a, dtype: int64
<class 'pandas.core.series.Series'>
What I need is a dataframe like this:
unique_values counts
2 3
1 2
Thank you.
Answer by jezrael
Use rename_axis to name the index (this name becomes the column name), then reset_index to turn the index into a column:
df1 = df['a'].value_counts().rename_axis('unique_values').reset_index(name='counts')
print (df1)
unique_values counts
0 2 3
1 1 2
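To make each step of the chain above visible, here is a minimal sketch that splits it into intermediate variables. It assumes the example DataFrame from the question; the names `counts`, `named` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})

# Step 1: value_counts() returns a Series whose index holds the unique values
counts = df['a'].value_counts()

# Step 2: rename_axis() sets the name of that index
named = counts.rename_axis('unique_values')

# Step 3: reset_index() moves the named index into a regular column;
# name= sets the label of the column holding the counts
result = named.reset_index(name='counts')
print(result)
```

Because both column labels are set explicitly, the result has the same layout regardless of how the intermediate Series happens to be named.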
Or, if you need a one-column DataFrame, use Series.to_frame:
df2 = df['a'].value_counts().rename_axis('unique_values').to_frame('counts')
print (df2)
counts
unique_values
2 3
1 2
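As a sketch of how the two variants relate (assuming the question's example data; `framed` and `flat` are illustrative names), the to_frame result keeps the unique values in the index, and calling reset_index() on it yields the same two-column layout as the first approach:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})

# to_frame('counts') wraps the Series in a one-column DataFrame;
# the unique values stay in the index, now named 'unique_values'
framed = df['a'].value_counts().rename_axis('unique_values').to_frame('counts')

# reset_index() moves the named index into a column of its own
flat = framed.reset_index()

print(framed)
print(flat)
```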
Answer by WY Hsu
I just ran into the same problem, so I'll share my approach here.
Warning
When you work with pandas data structures, you have to be aware of the return type of each operation.
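A quick way to see this is to check the type each call returns. The following minimal sketch uses the question's example data; `counts` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})
counts = df['a'].value_counts()

# value_counts() on a single column returns a Series, not a DataFrame
print(type(counts))
# Series.reset_index() returns a DataFrame (the index becomes a column)
print(type(counts.reset_index()))
# ...but with drop=True it discards the index and stays a Series
print(type(counts.reset_index(drop=True)))
# Series.to_frame() always returns a DataFrame
print(type(counts.to_frame()))
```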
Another solution here
As @jezrael mentioned, pandas provides pd.Series.to_frame.
Step 1
You can also wrap the pd.Series in a pd.DataFrame by simply doing:
df_value_counts = pd.DataFrame(value_counts) # wrap the pd.Series in a pd.DataFrame
Then you have a pd.DataFrame with a single column named 'a', and the unique values become its index (in pandas 2.0 and later, value_counts() names the Series 'count', so the column is named 'count' instead):
Input: print(df_value_counts.index.values)
Output: [2 1]
Input: print(df_value_counts.columns)
Output: Index(['a'], dtype='object')
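Since the column label comes from the Series name, which changed in pandas 2.0, here is a minimal sketch that checks the wrap without assuming a particular pandas version. It uses the question's example data.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)

# Wrapping a Series in pd.DataFrame gives one column labelled with the Series name
df_value_counts = pd.DataFrame(value_counts)

print(df_value_counts.index.values)  # the unique values, most frequent first
print(df_value_counts.columns)       # 'a' before pandas 2.0, 'count' from 2.0 on
```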
Step 2
What now?
If you want to give the DataFrame new column names, first move the index back into a regular column with reset_index().
Then rename the columns by assigning a list to df.columns:
df_value_counts = df_value_counts.reset_index()
df_value_counts.columns = ['unique_values', 'counts']
Then you get what you need:
Output:
unique_values counts
0 2 3
1 1 2
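The two steps can also be written as a single chain. This is a sketch of an equivalent variant, not part of the original answer: it swaps the df.columns assignment for DataFrame.set_axis, which replaces all column labels at once and returns a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})

df_value_counts = (
    pd.DataFrame(df['a'].value_counts())
    .reset_index()                                    # index -> first column
    .set_axis(['unique_values', 'counts'], axis=1)    # relabel both columns
)
print(df_value_counts)
```

Because every label is overwritten positionally, this works whatever names value_counts() gave the intermediate columns.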
Full answer:
import pandas as pd
df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)
# solution here
df_value_counts = pd.DataFrame(value_counts)
df_value_counts = df_value_counts.reset_index()
df_value_counts.columns = ['unique_values', 'counts'] # change column names
Answer by Constantino
I'll throw my hat in the ring as well. This is essentially the same as @wy-hsu's solution, but as a function:
import pandas as pd

def value_counts_df(df, col):
    """
    Returns pd.value_counts() as a DataFrame

    Parameters
    ----------
    df : Pandas DataFrame
        DataFrame on which to run value_counts(), must have column `col`.
    col : str
        Name of column in `df` for which to generate counts

    Returns
    -------
    Pandas DataFrame
        Returned dataframe will have a single column named "count" which contains the value_counts()
        for each unique value of df[col]. The index name of this dataframe is `col`.

    Example
    -------
    >>> value_counts_df(pd.DataFrame({'a':[1, 1, 2, 2, 2]}), 'a')
       count
    a
    2      3
    1      2
    """
    df = pd.DataFrame(df[col].value_counts())
    df.index.name = col
    df.columns = ['count']
    return df
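If you prefer the flat two-column layout from the question, the same idea can be wrapped around jezrael's rename_axis/reset_index chain. `value_counts_flat` and its `value_name`/`count_name` parameters are hypothetical names chosen for this sketch.

```python
import pandas as pd

def value_counts_flat(df: pd.DataFrame, col: str,
                      value_name: str = 'unique_values',
                      count_name: str = 'counts') -> pd.DataFrame:
    """Hypothetical helper: counts of df[col] as a flat two-column DataFrame."""
    return (
        df[col]
        .value_counts()
        .rename_axis(value_name)        # name the index holding the unique values
        .reset_index(name=count_name)   # move it into a column and name the counts
    )

flat = value_counts_flat(pd.DataFrame({'a': [1, 1, 2, 2, 2]}), 'a')
print(flat)
```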

