Python: compute summary statistics of columns in a dataframe

Question

Asked by Tyler Wood

I have a dataframe of the following form (for example)

shopper_num,is_martian,number_of_items,count_pineapples,birth_country,tranpsortation_method
1,FALSE,0,0,MX,
2,FALSE,1,0,MX,
3,FALSE,0,0,MX,
4,FALSE,22,0,MX,
5,FALSE,0,0,MX,
6,FALSE,0,0,MX,
7,FALSE,5,0,MX,
8,FALSE,0,0,MX,
9,FALSE,4,0,MX,
10,FALSE,2,0,MX,
11,FALSE,0,0,MX,
12,FALSE,13,0,MX,
13,FALSE,0,0,CA,
14,FALSE,0,0,US,

How can I use Pandas to calculate summary statistics of each column (column data types are variable, and some columns have no information)?

And then return a dataframe of the form:

columnname, max, min, median,

is_martian, NA, NA, FALSE

And so on for the remaining columns.

Answer 1

Accepted answer by EdChum

describe may give you everything you want; otherwise you can perform aggregations using groupby and pass a list of agg functions: http://pandas.pydata.org/pandas-docs/stable/groupby.html#applying-multiple-functions-at-once
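
As a rough sketch of the groupby route mentioned above (not part of the original answer), the snippet below groups a few rows taken from the question's sample by birth_country and passes a list of function names to agg. The variable names csv_text and per_country, and the choice of grouping column, are illustrative assumptions.

```python
import io

import pandas as pd

# A subset of the rows from the question, kept short for illustration.
csv_text = """shopper_num,is_martian,number_of_items,count_pineapples,birth_country,tranpsortation_method
2,FALSE,1,0,MX,
4,FALSE,22,0,MX,
7,FALSE,5,0,MX,
13,FALSE,0,0,CA,
14,FALSE,0,0,US,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Group by one column and apply several aggregations to another at once.
per_country = df.groupby("birth_country")["number_of_items"].agg(
    ["min", "max", "median"]
)
print(per_country)
```

The result has one row per group and one column per aggregation, so it can be indexed like any other dataframe, e.g. per_country.loc["MX", "max"].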

In [43]:

df.describe()

Out[43]:

       shopper_num is_martian  number_of_items  count_pineapples
count      14.0000         14        14.000000                14
mean        7.5000          0         3.357143                 0
std         4.1833          0         6.452276                 0
min         1.0000      False         0.000000                 0
25%         4.2500          0         0.000000                 0
50%         7.5000          0         0.000000                 0
75%        10.7500          0         3.500000                 0
max        14.0000      False        22.000000                 0

[8 rows x 4 columns]

Note that some columns cannot be summarised because there is no logical way to summarise them, for instance columns containing string data.

You can transpose the result if you prefer:

In [47]:

df.describe().transpose()

Out[47]:

                 count      mean       std    min   25%  50%    75%    max
shopper_num         14       7.5    4.1833      1  4.25  7.5  10.75     14
is_martian          14         0         0  False     0    0      0  False
number_of_items     14  3.357143  6.452276      0     0    0    3.5     22
count_pineapples    14         0         0      0     0    0      0      0

[4 rows x 8 columns]
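
If only the max, min and median asked for in the question are needed, one possible sketch (an assumption about what the asker wants, not something from the original answer) is to restrict the frame to numeric columns and pass a list of function names to DataFrame.agg, then transpose. The names numeric and summary are illustrative, and the data is a subset of the question's rows.

```python
import io

import pandas as pd

# A subset of the rows from the question, kept short for illustration.
csv_text = """shopper_num,is_martian,number_of_items,count_pineapples,birth_country,tranpsortation_method
2,FALSE,1,0,MX,
4,FALSE,22,0,MX,
7,FALSE,5,0,MX,
13,FALSE,0,0,CA,
14,FALSE,0,0,US,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only numeric columns; boolean and string columns are left out here.
numeric = df.select_dtypes(include="number")

# One row per statistic, then transpose so each original column becomes a row.
summary = numeric.agg(["max", "min", "median"]).T
summary.index.name = "columnname"
print(summary)
```

Columns that hold no data show up as NaN in such a summary, if they are read as numeric at all.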

Answer 2

Answer by Ken Wallace

To clarify one point in @EdChum's answer: per the documentation, you can include the object columns by using df.describe(include='all'). It won't provide many statistics for them, but it will provide a few pieces of information, including the count, the number of unique values, and the top value. This may be a new feature; I don't know, as I am a relatively new user.
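
A small sketch of that call, using a few rows from the question's sample (the variable names csv_text and full are illustrative, not from the answer):

```python
import io

import pandas as pd

# A subset of the rows from the question, kept short for illustration.
csv_text = """shopper_num,is_martian,number_of_items,count_pineapples,birth_country,tranpsortation_method
2,FALSE,1,0,MX,
4,FALSE,22,0,MX,
7,FALSE,5,0,MX,
13,FALSE,0,0,CA,
14,FALSE,0,0,US,
"""
df = pd.read_csv(io.StringIO(csv_text))

# include="all" adds rows such as unique, top and freq for non-numeric columns.
full = df.describe(include="all")
print(full.loc[["count", "unique", "top", "freq"], "birth_country"])
```

Statistics that do not apply to a given column (for example mean for a string column, or unique for a numeric one) are reported as NaN.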

Answer 3

Answer by akilat90

Now there is the pandas_profiling package, which is a more complete alternative to df.describe().

If your pandas dataframe is df, the code below returns a complete analysis, including warnings about missing values, skewness, etc. It presents histograms and correlation plots as well.

import pandas_profiling
pandas_profiling.ProfileReport(df)

See the example notebook detailing the usage.
