Pandas 'describe' is not returning summary of all columns
Original source: http://stackoverflow.com/questions/24524104/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and cite the original source.
Asked by user2808117
I am running 'describe()' on a dataframe and getting summaries of only the int columns (pandas 0.14.0).
The documentation says that for object columns, the frequency of the most common value and additional statistics would be returned. What could be wrong? (No error message is returned, by the way.)
Edit:
I think this is simply how the function is set to behave on mixed column types in a dataframe, although the documentation fails to mention it.
Example code:
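import pandas as pd

df_test = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})
df_test.dtypes
df_test.describe()                          # both columns are summarized
df_test['$a'] = df_test['$a'].astype(str)   # '$a' is no longer numeric
df_test.describe()                          # only '$b' is summarized
df_test['$a'].describe()                    # count, unique, top, freq
df_test['$b'].describe()                    # count, mean, std, min, quartiles, max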
My ugly workaround in the meantime:
def my_df_describe(df):
    objects = []
    numerics = []
    for c in df:
        if df[c].dtype == object:
            objects.append(c)
        else:
            numerics.append(c)
    return df[numerics].describe(), df[objects].describe()
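The same split can be sketched with select_dtypes instead of a manual loop. This is only an illustration under assumptions: describe_by_kind is a hypothetical helper name, and the frame is assumed to contain at least one numeric and one non-numeric column (describe() raises an error on a frame with no columns).

```python
import numpy as np
import pandas as pd


def describe_by_kind(df):
    """Hypothetical helper: summarize numeric and non-numeric columns separately."""
    numeric = df.select_dtypes(include=[np.number])
    other = df.select_dtypes(exclude=[np.number])
    return numeric.describe(), other.describe()


df_test = pd.DataFrame({'$a': ['1', '2'], '$b': [10, 20]})
numeric_summary, other_summary = describe_by_kind(df_test)
print(numeric_summary)
print(other_summary)
```

Unlike the dtype == object check, this version treats every non-numeric dtype (not only object) as belonging to the second group.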
Accepted answer by ilyas patanam
As of pandas v0.15.0, use the include parameter, DataFrame.describe(include = 'all'), to get a summary of all the columns when the dataframe has mixed column types. The default behavior is to only provide a summary for the numerical columns.
Example:
In[1]:
df = pd.DataFrame({'$a':['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})
df.describe(include = 'all')
Out[1]:
         $a        $b
count     5  5.000000
unique    4       NaN
top       a       NaN
freq      2       NaN
mean    NaN  2.000000
std     NaN  1.581139
min     NaN  0.000000
25%     NaN  1.000000
50%     NaN  2.000000
75%     NaN  3.000000
max     NaN  4.000000
The numerical columns will have NaNs for summary statistics pertaining to objects (strings) and vice versa.
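If the NaN padding gets in the way, one possible way to read a single column out of the combined table is to drop the NaN entries from that column of the summary. This is a small sketch rather than part of the original answer; the variable name summary is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': ['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})
summary = df.describe(include='all')

# Keep only the statistics that apply to each column.
a_stats = summary['$a'].dropna()   # count, unique, top, freq
b_stats = summary['$b'].dropna()   # count, mean, std, min, quartiles, max
print(a_stats)
print(b_stats)
```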
Summarizing only numerical or object columns
- To call describe() on just the numerical columns, use describe(include = [np.number]).
- To call describe() on just the objects (strings), use describe(include = ['O']).
In[2]:
df.describe(include = [np.number])
Out[2]:
             $b
count  5.000000
mean   2.000000
std    1.581139
min    0.000000
25%    1.000000
50%    2.000000
75%    3.000000
max    4.000000
In[3]:
df.describe(include = ['O'])
Out[3]:
       $a
count   5
unique  4
top     a
freq    2
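describe() also accepts an exclude parameter, so the non-numeric columns can be selected by exclusion instead of naming the object dtype. A minimal sketch using the same example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': ['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})

# Summarize everything that is not numeric.
non_numeric = df.describe(exclude=[np.number])
print(non_numeric)
```

This may be convenient when the non-numeric columns have several different dtypes and listing each one in include would be tedious.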
Answer by RJT
By default, 'describe()' on a DataFrame only summarizes numeric columns. If you think you have a numeric variable and it doesn't show up in 'describe()', change the type with:
df[['col1', 'col2']] = df[['col1', 'col2']].astype(float)
You could also create new columns for handling the numeric part of a mixed-type column, or convert strings to numbers using a dictionary and the map() function.
'describe()' on a non-numerical Series will give you some statistics (like count, unique and the most frequently occurring value).
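The two conversions mentioned in this answer might look like the sketch below. The column names, the sample values, and the grade_codes mapping are all made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'grade': ['low', 'high', 'low'],   # categorical strings
    'price': ['1.5', '2.0', '3.5'],    # numbers stored as strings
})

# Convert strings to numbers with a dictionary and map().
grade_codes = {'low': 0, 'high': 1}    # hypothetical mapping
df['grade_code'] = df['grade'].map(grade_codes)

# Convert numeric strings with astype(float).
df['price'] = df['price'].astype(float)

summary = df.describe()                # now covers 'price' and 'grade_code'
print(summary)
```

Note that map() yields NaN for any value missing from the dictionary, so the mapping needs to cover every category.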
Answer by Taras
In addition to DataFrame.describe(include = 'all'), one can also use Series.value_counts() for each categorical column:
In[1]:
df = pd.DataFrame({'$a':['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})
df['$a'].value_counts()
Out[1]:
$a
a 2
d 1
b 1
c 1
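To apply this to every categorical column at once, one option is to loop over the non-numeric columns. A minimal sketch, where selecting columns via select_dtypes is an assumption of this example and not part of the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': ['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})

# Collect value counts for each non-numeric column, keyed by column name.
categorical_columns = df.select_dtypes(exclude=[np.number]).columns
counts = {col: df[col].value_counts() for col in categorical_columns}

for col, col_counts in counts.items():
    print(col)
    print(col_counts)
```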
Answer by Shubham Chopra
You can execute df_test.info() to get the list of data types your data frame contains. If your data frame contains only numerical columns, then df_test.describe() will work perfectly fine, since by default it provides a summary of the numerical values. If you want a summary of your object (string) features, you can use df_test.describe(include=['O']).
Or in short, you can just use df_test.describe(include='all') to get a summary of all the feature columns when your data frame has columns of various data types.
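Put together, the check and the two describe() calls could look like this sketch, which reuses the question's df_test with '$a' stored as strings:

```python
import pandas as pd

df_test = pd.DataFrame({'$a': ['1', '2'], '$b': [10, 20]})

df_test.info()                                   # prints each column's dtype

default_summary = df_test.describe()             # numeric columns only
full_summary = df_test.describe(include='all')   # every column

print(default_summary)
print(full_summary)
```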
Answer by MoeChen
pd.options.display.max_columns = DATA.shape[1] will work.
Here DATA is a 2D matrix, and the code above will display the stats vertically.
Answer by Jasper
In addition to the data type issues discussed in the other answers, you might also have too many columns to display. If there are too many columns, the middle columns will be replaced with three dots (...).
Other answers have pointed out that the include='all' parameter of describe can help with the data type issue. Another question asked, "How do I expand the output display to see more columns?" The solution is to modify the display.max_columns setting, which can even be done temporarily. For example, to display up to 40 columns of output from a single describe statement:
with pd.option_context('display.max_columns', 40):
    print(df.describe(include='all'))
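To see that the change really is temporary, the option can be read before, inside, and after the with block. A small sketch with a made-up 30-column frame (the names wide, previous, inside, and restored are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: 30 numeric columns named c0..c29.
wide = pd.DataFrame(np.zeros((3, 30)), columns=[f'c{i}' for i in range(30)])

previous = pd.get_option('display.max_columns')
with pd.option_context('display.max_columns', 40):
    inside = pd.get_option('display.max_columns')   # 40 while in the block
    print(wide.describe())
restored = pd.get_option('display.max_columns')     # back to the old value
```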
Answer by Anoop
My dataframe is named "data". The code below works for me to show all the features when using data.describe():
with pd.option_context('display.max_columns', 40):
    print(data.describe(include = 'all'))