How do I print whole numbers from the describe() function in Python?

Question

Asked by catris25

I am doing some statistical work using Python's pandas, and I have the following code to print out the data description (mean, count, median, etc.).

import pandas

# input_file is the path to the CSV file being analysed
data = pandas.read_csv(input_file)
print(data.describe())

But my data is pretty big (around 4 million rows) and each row holds very small values. So, inevitably, the count is big and the mean is pretty small, and pandas prints them in scientific notation.

I just want to print these numbers in full, for ease of use and understanding; for example, 4393476 instead of 4.393476e+06. I have googled around, and the most I could find is "Display a float with two decimal places in Python" and some other similar posts. But that only works if I already have the numbers in a variable, which is not my case. The numbers are created by the describe() function, so I don't know in advance what numbers I will get.

Sorry if this seems like a very basic question; I am still new to Python. Any response is appreciated. Thanks.

Answer 1

Answered by juanpa.arrivillaga

Suppose you have the following DataFrame:

Edit

I checked the docs, and you should probably use the pandas.set_option API to do this:

In [13]: df
Out[13]: 
              a             b             c
0  4.405544e+08  1.425305e+08  6.387200e+08
1  8.792502e+08  7.135909e+08  4.652605e+07
2  5.074937e+08  3.008761e+08  1.781351e+08
3  1.188494e+07  7.926714e+08  9.485948e+08
4  6.071372e+08  3.236949e+08  4.464244e+08
5  1.744240e+08  4.062852e+08  4.456160e+08
6  7.622656e+07  9.790510e+08  7.587101e+08
7  8.762620e+08  1.298574e+08  4.487193e+08
8  6.262644e+08  4.648143e+08  5.947500e+08
9  5.951188e+08  9.744804e+08  8.572475e+08

In [14]: pd.set_option('float_format', '{:f}'.format)

In [15]: df
Out[15]: 
                 a                b                c
0 440554429.333866 142530512.999182 638719977.824965
1 879250168.522411 713590875.479215  46526045.819487
2 507493741.709532 300876106.387427 178135140.583541
3  11884941.851962 792671390.499431 948594814.816647
4 607137206.305609 323694879.619369 446424361.522071
5 174424035.448168 406285189.907148 445616045.754137
6  76226556.685384 979050957.963583 758710090.127867
7 876261954.607558 129857447.076183 448719292.453509
8 626264394.999419 464814260.796770 594750038.747595
9 595118819.308896 974480400.272515 857247528.610996

In [16]: df.describe()
Out[16]: 
                     a                b                c
count        10.000000        10.000000        10.000000
mean  479461624.877280 522785202.100082 536344333.626082
std   306428177.277935 320806568.078629 284507176.411675
min    11884941.851962 129857447.076183  46526045.819487
25%   240956633.919592 306580799.695412 445818124.696121
50%   551306280.509214 435549725.351959 521734665.600552
75%   621482597.825966 772901261.744377 728712562.052142
max   879250168.522411 979050957.963583 948594814.816647

End of edit

In [7]: df
Out[7]: 
              a             b             c
0  4.405544e+08  1.425305e+08  6.387200e+08
1  8.792502e+08  7.135909e+08  4.652605e+07
2  5.074937e+08  3.008761e+08  1.781351e+08
3  1.188494e+07  7.926714e+08  9.485948e+08
4  6.071372e+08  3.236949e+08  4.464244e+08
5  1.744240e+08  4.062852e+08  4.456160e+08
6  7.622656e+07  9.790510e+08  7.587101e+08
7  8.762620e+08  1.298574e+08  4.487193e+08
8  6.262644e+08  4.648143e+08  5.947500e+08
9  5.951188e+08  9.744804e+08  8.572475e+08

In [8]: df.describe()
Out[8]: 
                  a             b             c
count  1.000000e+01  1.000000e+01  1.000000e+01
mean   4.794616e+08  5.227852e+08  5.363443e+08
std    3.064282e+08  3.208066e+08  2.845072e+08
min    1.188494e+07  1.298574e+08  4.652605e+07
25%    2.409566e+08  3.065808e+08  4.458181e+08
50%    5.513063e+08  4.355497e+08  5.217347e+08
75%    6.214826e+08  7.729013e+08  7.287126e+08
max    8.792502e+08  9.790510e+08  9.485948e+08

You need to fiddle with the pandas.options.display.float_format attribute. Note that in my code I've used import pandas as pd. A quick fix is something like:

In [29]: pd.options.display.float_format = "{:.2f}".format

In [10]: df
Out[10]: 
             a            b            c
0 440554429.33 142530513.00 638719977.82
1 879250168.52 713590875.48  46526045.82
2 507493741.71 300876106.39 178135140.58
3  11884941.85 792671390.50 948594814.82
4 607137206.31 323694879.62 446424361.52
5 174424035.45 406285189.91 445616045.75
6  76226556.69 979050957.96 758710090.13
7 876261954.61 129857447.08 448719292.45
8 626264395.00 464814260.80 594750038.75
9 595118819.31 974480400.27 857247528.61

In [11]: df.describe()
Out[11]: 
                 a            b            c
count        10.00        10.00        10.00
mean  479461624.88 522785202.10 536344333.63
std   306428177.28 320806568.08 284507176.41
min    11884941.85 129857447.08  46526045.82
25%   240956633.92 306580799.70 445818124.70
50%   551306280.51 435549725.35 521734665.60
75%   621482597.83 772901261.74 728712562.05
max   879250168.52 979050957.96 948594814.82
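
Setting the option as above changes how every float is displayed for the rest of the session. If the fixed-point format is only wanted for one describe() printout, pandas also offers the pd.option_context context manager. The following is a minimal sketch, assuming a small made-up DataFrame rather than the one shown above:

```python
import pandas as pd

# Small illustrative frame (values are made up for this sketch).
df = pd.DataFrame({"a": [440554429.33, 879250168.52, 11884941.85]})

# The float format applies only inside the `with` block.
with pd.option_context("display.float_format", "{:.2f}".format):
    text = str(df.describe())
    print(text)

# Outside the block the previous setting is restored.
print(pd.get_option("display.float_format"))
```

Note that the count row is still shown with decimals, because it is stored as a float like every other row.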

Answer 2

Answered by unutbu

import numpy as np
import pandas as pd
np.random.seed(2016)
N = 4393476
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(N,3)), columns=list('ABC'))

desc = df.describe()
desc.loc['count'] = desc.loc['count'].astype(int).astype(str)
desc.iloc[1:] = desc.iloc[1:].applymap('{:.6f}'.format)
print(desc)

yields

              A         B         C
count   4393476   4393476   4393476
mean   0.050039  0.050056  0.050057
std    0.028834  0.028836  0.028849
min    0.000100  0.000100  0.000100
25%    0.025076  0.025081  0.025065
50%    0.050047  0.050050  0.050037
75%    0.074987  0.075027  0.075055
max    0.100000  0.100000  0.100000

Under the hood, DataFrames are organized in columns. The values in a column can only have one data type (the column's dtype). The DataFrame returned by df.describe() has columns of floating-point dtype:

In [116]: df.describe().info()
<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, count to max
Data columns (total 3 columns):
A    8 non-null float64
B    8 non-null float64
C    8 non-null float64
dtypes: float64(3)
memory usage: 256.0+ bytes

DataFrames do not allow you to treat one row as integers and the other rows as floats. However, if you change the contents of the DataFrame to strings, then you have full control over the way the values are displayed, since all the values are just strings.
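
To see the dtype constraint in action, here is a small sketch on a made-up two-column frame (not the data from the question). Assigning integers to the count row does not change how it is stored, because each column remains float64:

```python
import pandas as pd

# Tiny illustrative frame; the values are arbitrary.
df = pd.DataFrame({"A": [0.1, 0.2, 0.3], "B": [1.0, 2.0, 3.0]})
desc = df.describe()
print(desc.dtypes)  # every column is float64

# Try to store the counts as integers in the float columns.
desc.loc["count"] = desc.loc["count"].astype(int)

# The columns are still float64, so the counts are still floats.
print(desc.dtypes)
print(desc.loc["count", "A"])
```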

Thus, to create a DataFrame in the desired format, you could use

desc.loc['count'] = desc.loc['count'].astype(int).astype(str)

to convert the count row to integers (by calling astype(int)), and then convert the integers to strings (by calling astype(str)). Then

desc.iloc[1:] = desc.iloc[1:].applymap('{:.6f}'.format)

converts the rest of the floats to strings, using the str.format method to format the floats to 6 digits after the decimal point.
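
In recent pandas releases, applymap is deprecated in favour of DataFrame.map, and writing strings into float columns may produce a warning or an error. The sketch below is one possible variant of the same idea that builds a separate all-string DataFrame instead of modifying desc in place. The name formatted and the smaller row count are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2016)
# 1000 rows instead of millions, just to keep the sketch fast.
df = pd.DataFrame(rng.uniform(1e-4, 0.1, size=(1000, 3)), columns=list("ABC"))

desc = df.describe()

# Format every value as a string, column by column (Series.map
# is available across pandas versions).
formatted = desc.apply(lambda col: col.map("{:.6f}".format))

# Overwrite the count row with integer-looking strings.
formatted.loc["count"] = desc.loc["count"].astype(int).astype(str)
print(formatted)
```

Since formatted holds only strings, desc itself stays numeric and can still be used for calculations.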

Alternatively, you could use

import numpy as np
import pandas as pd
np.random.seed(2016)
N = 4393476
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(N,3)), columns=list('ABC'))

desc = df.describe().T
desc['count'] = desc['count'].astype(int)
print(desc)

which yields

     count      mean       std     min       25%       50%       75%  max
A  4393476  0.050039  0.028834  0.0001  0.025076  0.050047  0.074987  0.1
B  4393476  0.050056  0.028836  0.0001  0.025081  0.050050  0.075027  0.1
C  4393476  0.050057  0.028849  0.0001  0.025065  0.050037  0.075055  0.1

By transposing the desc DataFrame, the counts are now in their own column. So now the problem can be solved by converting that column's dtype to int.

One advantage of doing it this way is that the values in desc remain numerical. So further calculations based on the numeric values can still be done.
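
As a small illustration of that point, the sketch below derives an extra column from the transposed summary. The cv column (standard deviation divided by mean) is a hypothetical example of a follow-up calculation, and the data is randomly generated:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(1e-4, 0.1, size=(1000, 3)), columns=list("ABC"))

desc = df.describe().T
desc["count"] = desc["count"].astype(int)  # counts become a true integer column

# The other statistics are still floats, so arithmetic works directly.
desc["cv"] = desc["std"] / desc["mean"]
print(desc.dtypes)
print(desc[["count", "mean", "std", "cv"]])
```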

I think this solution is preferable, provided that the transposed format is acceptable.
