Python pandas "describe" is not returning a summary of all columns

Question

Asked by user2808117

I am running 'describe()' on a dataframe and getting summaries of only the int columns (pandas 0.14.0).

The documentation says that for object columns the most common value and its frequency, along with additional statistics, would be returned. What could be wrong? (No error message is returned, by the way.)

Edit:

I think this is how the function is designed to behave on a dataframe with mixed column types, although the documentation fails to mention it.

Example code:

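import pandas as pd

df_test = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})
df_test.dtypes                              # both columns are int64
df_test.describe()                          # summarises both $a and $b
df_test['$a'] = df_test['$a'].astype(str)   # $a now holds strings
df_test.describe()                          # only $b is summarised
df_test['$a'].describe()                    # count / unique / top / freq
df_test['$b'].describe()                    # count / mean / std / min / quartiles / max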

My ugly workaround in the meantime:

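def my_df_describe(df):
    # Split the column names into object (string) and non-object groups
    objects = []
    numerics = []
    for c in df:
        if df[c].dtype == object:
            objects.append(c)
        else:
            numerics.append(c)

    # Describe each group separately so both kinds of summary are returned
    return df[numerics].describe(), df[objects].describe()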
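A more compact version of the same workaround can be sketched with DataFrame.select_dtypes, assuming a pandas version that has it (it appeared in 0.14.1, just after the version in the question). The helper name describe_by_kind is hypothetical. It splits numeric from non-numeric columns rather than object from non-object, and it returns None for an empty group because describe() raises on a frame with no columns.

```python
import numpy as np
import pandas as pd


def describe_by_kind(df):
    """Hypothetical helper: return (numeric_summary, non_numeric_summary).

    Either element is None when df has no columns of that kind,
    because DataFrame.describe() raises on a frame without columns.
    """
    numeric = df.select_dtypes(include=[np.number])
    non_numeric = df.select_dtypes(exclude=[np.number])
    num_summary = numeric.describe() if numeric.shape[1] else None
    obj_summary = non_numeric.describe() if non_numeric.shape[1] else None
    return num_summary, obj_summary


df_test = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})
df_test['$a'] = df_test['$a'].astype(str)   # make $a a string column

num_summary, obj_summary = describe_by_kind(df_test)
print(num_summary)   # statistics for $b
print(obj_summary)   # count / unique / top / freq for $a
```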

Answer 1

Accepted answer by ilyas patanam

As of pandas v0.15.0, use the include parameter, DataFrame.describe(include='all'), to get a summary of all the columns when the dataframe has mixed column types. The default behavior is to provide a summary of the numerical columns only.

Example:

In[1]:

import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': ['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})
df.describe(include='all')

Out[1]:

         $a        $b
count     5  5.000000
unique    4       NaN
top       a       NaN
freq      2       NaN
mean    NaN  2.000000
std     NaN  1.581139
min     NaN  0.000000
25%     NaN  1.000000
50%     NaN  2.000000
75%     NaN  3.000000
max     NaN  4.000000

The numerical columns will have NaNs for summary statistics pertaining to objects (strings) and vice versa.

Summarizing only numerical or object columns

To call describe() on just the numerical columns, use describe(include=[np.number]).

To call describe() on just the object (string) columns, use describe(include=['O']).

In[2]:

df.describe(include = [np.number])

Out[2]:

         $b
count   5.000000
mean    2.000000
std     1.581139
min     0.000000
25%     1.000000
50%     2.000000
75%     3.000000
max     4.000000

In[3]:

df.describe(include = ['O'])

Out[3]:

    $a
count   5
unique  4
top     a
freq    2
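describe() also takes an exclude parameter, added together with include in pandas 0.15.0. A minimal sketch, assuming the same example frame: excluding numeric dtypes is another way to get the non-numeric summary, and together with include=[np.number] it covers every column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': ['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})

# Numeric columns only ($b)
numeric = df.describe(include=[np.number])
# Everything that is not numeric ($a)
non_numeric = df.describe(exclude=[np.number])

print(numeric)
print(non_numeric)
```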

Answer 2

Answer by RJT

By default, 'describe()' on a DataFrame only summarizes numeric columns. If you think you have a numeric variable and it doesn't show up in 'describe()', change its type with:

df[['col1', 'col2']] = df[['col1', 'col2']].astype(float)

You could also create new columns to hold the numeric part of a mixed-type column, or convert strings to numbers using a dictionary and the map() function.
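Both ideas can be sketched as follows. The column names and the label-to-number dictionary are hypothetical, and pd.to_numeric(errors='coerce') (available since pandas 0.17) is one possible way to pull the numeric part out of a mixed column, turning unparseable strings into NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'size': ['small', 'large', 'medium', 'small'],   # labels stored as strings
    'reading': ['1.5', '2.0', 'n/a', '4'],            # numbers mixed with a placeholder
})

# Hypothetical mapping from labels to numbers; adjust to your own data
size_codes = {'small': 1, 'medium': 2, 'large': 3}
df['size_num'] = df['size'].map(size_codes)

# New column with the numeric part of the mixed column; 'n/a' becomes NaN
df['reading_num'] = pd.to_numeric(df['reading'], errors='coerce')

# The default describe() now picks up the two new numeric columns
summary = df.describe()
print(summary)
```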

'describe()' on a non-numerical Series will give you some statistics (like count, unique and the most frequently occurring value).
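For example, with a small hypothetical Series of strings, describe() returns the count, the number of unique values, the most frequent value (top) and how often it occurs (freq):

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'red', 'green'], name='color')

# Non-numeric Series: count, unique, top, freq
stats = colors.describe()
print(stats)
```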

Answer 3

Answer by Taras

In addition to DataFrame.describe(include='all'), one can also use Series.value_counts() for each categorical column:

In[1]:

import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': ['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})
df['$a'].value_counts()

Out[1]:
$a
a    2
d    1
b    1
c    1
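To apply value_counts() to every categorical column at once, one option is to loop over the non-numeric columns picked out by select_dtypes. This is a sketch; the extra column $c is hypothetical and only there to show more than one categorical column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': ['a', 'b', 'c', 'd', 'a'],
                   '$b': np.arange(5),
                   '$c': ['x', 'y', 'x', 'x', 'y']})

# One frequency table per non-numeric column, keyed by column name
counts = {col: df[col].value_counts()
          for col in df.select_dtypes(exclude=[np.number]).columns}

for vc in counts.values():
    print(vc)
    print()
```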

Answer 4

Answer by Shubham Chopra

You can execute df_test.info() to get the list of data types your data frame contains. If your data frame contains only numerical columns, df_test.describe() will work perfectly fine, because by default it provides a summary of numerical values. If you want a summary of your object (string) features, you can use df_test.describe(include=['O']).

Or, in short, you can just use df_test.describe(include='all') to get a summary of all the feature columns when your data frame has columns of various data types.

Answer 5

Answer by MoeChen

Setting pd.options.display.max_columns = DATA.shape[1] will work.

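Here DATA is the 2-D data frame being described; raising the column limit to its number of columns lets the describe() output show the statistics for every column instead of eliding some of them.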
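A minimal sketch of this approach, assuming a hypothetical 30-column frame named DATA (wider than the column limit pandas often applies by default, e.g. 20 outside a terminal). It raises the limit, renders the summary, and then restores the default with pd.reset_option so later output is unaffected.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: 3 rows x 30 numeric columns
DATA = pd.DataFrame(np.arange(90).reshape(3, 30),
                    columns=[f'c{i}' for i in range(30)])

# Allow as many columns as DATA has, then render the summary
pd.options.display.max_columns = DATA.shape[1]
summary = DATA.describe()
rendered = repr(summary)   # no "..." placeholder column at this setting
print(rendered)

# Put the option back to its default value
pd.reset_option('display.max_columns')
```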

Answer 6

Answer by Jasper

In addition to the data type issues discussed in the other answers, you might also have too many columns to display. If there are too many columns, the middle columns will be replaced with an ellipsis (...).

Other answers have pointed out that the include='all' parameter of describe can help with the data type issue. Another question asks, "How do I expand the output display to see more columns?" The solution is to modify the display.max_columns setting, which can even be done temporarily. For example, to display up to 40 columns of output from a single describe statement:

with pd.option_context('display.max_columns', 40):
    print(df.describe(include='all'))

Answer 7

Answer by Anoop

With my dataframe named "data", the code below works for me to show all the features when using data.describe():

with pd.option_context('display.max_columns', 40):
    print(data.describe(include = 'all'))
