Python Pandas: Convert ".value_counts" output to dataframe
Source: http://stackoverflow.com/questions/47136436/ (StackOverflow). This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by s900n
Hi, I want to get the counts of the unique values in a dataframe column. value_counts does this, but I want to use its output somewhere else. How can I convert the .value_counts output to a pandas dataframe? Here is some example code:
import pandas as pd
df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)
print(value_counts)
print(type(value_counts))
The output is:
2 3
1 2
Name: a, dtype: int64
<class 'pandas.core.series.Series'>
What I need is a dataframe like this:
unique_values counts
2 3
1 2
Thank you.
Answer by jezrael
Use rename_axis to name the column that is created from the index, then reset_index:
df = df['a'].value_counts().rename_axis('unique_values').reset_index(name='counts')
print(df)
unique_values counts
0 2 3
1 1 2
Or, if you need a one-column DataFrame, use Series.to_frame:
df = df['a'].value_counts().rename_axis('unique_values').to_frame('counts')
print(df)
counts
unique_values
2 3
1 2
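To make the difference between the two variants above concrete, here is a minimal side-by-side sketch. The variable names (counts, flat, indexed) are only illustrative and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})
counts = df['a'].value_counts()  # Series: index = unique values, values = counts

# Variant 1: the unique values become an ordinary column next to 'counts',
# and the frame gets a fresh 0..n-1 index.
flat = counts.rename_axis('unique_values').reset_index(name='counts')

# Variant 2: the unique values stay in the index; 'counts' is the only column.
indexed = counts.rename_axis('unique_values').to_frame('counts')

print(list(flat.columns))     # ['unique_values', 'counts']
print(list(indexed.columns))  # ['counts']
print(indexed.index.name)     # unique_values
```

Variant 1 is probably the better fit if the unique values are needed as regular data later (for merging or plotting), while variant 2 keeps them available for label-based lookups such as indexed.loc[2, 'counts'].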
Answer by WY Hsu
I just ran into the same problem, so I'll share my thoughts here.
Warning
When you work with Pandas data structures, you have to be aware of the return type.
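As a quick illustration of that point, the sketch below prints the type at each step, using the same toy data as the question. The names column, counts and frame are just placeholders for this example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})

column = df['a']                # single brackets select one column as a Series
counts = column.value_counts()  # value_counts() on a Series returns a Series
frame = counts.to_frame()       # to_frame() turns that Series into a DataFrame

# Print the class name of each intermediate object.
for name, obj in [('column', column), ('counts', counts), ('frame', frame)]:
    print(name, type(obj).__name__)
```

Checking the type like this is a cheap way to find out whether a DataFrame-only attribute such as .columns is available on the object you are holding.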
Another solution here
As @jezrael mentioned, Pandas does provide the API pd.Series.to_frame.
Step 1
You can also wrap the pd.Series in a pd.DataFrame by just doing
df_value_counts = pd.DataFrame(value_counts) # wrap pd.Series to pd.DataFrame
Then you have a pd.DataFrame with column name 'a', and your first column becomes the index
Input: print(df_value_counts.index.values)
Output: [2 1]
Input: print(df_value_counts.columns)
Output: Index(['a'], dtype='object')
Step 2
What now?
If you want to add new column names here, then since this is a pd.DataFrame you can simply reset the index with reset_index().
Then change the column names by assigning a list to df.columns
df_value_counts = df_value_counts.reset_index()
df_value_counts.columns = ['unique_values', 'counts']
Then you get what you need
Output:
unique_values counts
0 2 3
1 1 2
Full Answer here
import pandas as pd
df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)
# solution here
df_value_counts = pd.DataFrame(value_counts)
df_value_counts = df_value_counts.reset_index()
df_value_counts.columns = ['unique_values', 'counts'] # change column names
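One caveat worth checking on your own install: as far as I know, from pandas 2.0 on, value_counts() names its result 'count' and moves the original column name onto the index, so the intermediate column names can differ from the ones shown in this answer. The sketch below prints them instead of assuming them, and relies on the fact that assigning a list to .columns renames by position.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)

df_value_counts = pd.DataFrame(value_counts).reset_index()
# The intermediate names depend on the pandas version (for example
# ['index', 'a'] on older releases, ['a', 'count'] on 2.x), so inspect them.
print(list(df_value_counts.columns))

# Assigning a list renames by position, whatever the intermediate names were.
df_value_counts.columns = ['unique_values', 'counts']
print(df_value_counts)
```

Because the final rename is positional, the two-step approach from this answer should give the same result on either side of that change.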
Answer by Constantino
I'll throw my hat in as well. This is essentially the same as @wy-hsu's solution, but in function format:
def value_counts_df(df, col):
    """
    Returns pd.value_counts() as a DataFrame

    Parameters
    ----------
    df : Pandas Dataframe
        Dataframe on which to run value_counts(), must have column `col`.
    col : str
        Name of column in `df` for which to generate counts

    Returns
    -------
    Pandas Dataframe
        Returned dataframe will have a single column named "count" which contains the value_counts()
        for each unique value of df[col]. The index name of this dataframe is `col`.

    Example
    -------
    >>> value_counts_df(pd.DataFrame({'a':[1, 1, 2, 2, 2]}), 'a')
       count
    a
    2      3
    1      2
    """
    df = pd.DataFrame(df[col].value_counts())
    df.index.name = col
    df.columns = ['count']
    return df