Python: rounding entries in a Pandas DataFrame

Notice: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/19100540/

Date: 2020-08-19 12:52:02  Source: igfitidea

Rounding entries in a Pandas DataFrame

Tags: python, numpy, pandas

Asked by dartdog

Using:

newdf3.pivot_table(rows=['Quradate'],aggfunc=np.mean)

which yields:

            Alabama_exp  Credit_exp  Inventory_exp  National_exp  Price_exp  Sales_exp
Quradate
2010-01-15     0.568003    0.404481       0.488601      0.483097   0.431211   0.570755
2010-04-15     0.543620    0.385417       0.455078      0.468750   0.408203   0.564453

I'd like to get the decimal numbers rounded to two digits and multiplied by 100, e.g. .568003 should become 57. I've been fiddling with it for a while to no avail; I tried this:

newdf3.pivot_table(rows=['Quradate'],aggfunc=np.mean).apply(round(2)) #and got:
TypeError: ("'float' object is not callable", u'occurred at index Alabama_exp')

I tried a number of other approaches to no avail; most complain about the item not being a float. I see that the Pandas Series object has a round method but DataFrame does not. I tried using df.apply, but it complained about the float issue.

Accepted answer by ely

Just use numpy.round, e.g.:

100 * np.round(newdf3.pivot_table(rows=['Quradate'], aggfunc=np.mean), 2) 

As long as round is appropriate for all column types, this works on a DataFrame.

With some data:

In [9]: dfrm
Out[9]:
          A         B         C
0 -1.312700  0.760710  1.044006
1 -0.792521 -0.076913  0.087334
2 -0.557738  0.982031  1.365357
3  1.013947  0.345896 -0.356652
4  1.278278 -0.195477  0.550492
5  0.116599 -0.670163 -1.290245
6 -1.808143 -0.818014  0.713614
7  0.233726  0.634349  0.561103
8  2.344671 -2.331232 -0.759296
9 -1.658047  1.756503 -0.996620

In [10]: 100*np.round(dfrm, 2)
Out[10]:
     A    B    C
0 -131   76  104
1  -79   -8    9
2  -56   98  137
3  101   35  -36
4  128  -20   55
5   12  -67 -129
6 -181  -82   71
7   23   63   56
8  234 -233  -76
9 -166  176 -100
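
As a self-contained sketch of the same idea, the snippet below builds a small frame that stands in for the pivoted table in the question and applies np.round to the whole frame at once. The `df` and `pct` names and the two columns chosen are illustrative assumptions, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the pivoted frame shown in the question.
df = pd.DataFrame(
    {"Alabama_exp": [0.568003, 0.543620], "Credit_exp": [0.404481, 0.385417]},
    index=pd.Index(["2010-01-15", "2010-04-15"], name="Quradate"),
)

# Round every numeric column to two decimals, then scale to percentages.
# The result is still a DataFrame of floats.
pct = 100 * np.round(df, 2)
print(pct)
```

The result keeps the original index and column labels, so it can be used wherever the unrounded frame was.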

Answer by Dan Allan

I'm leaving this here for the explanation of why the OP's approach threw an error, but subsequent solutions are better.

The best solution is to simply use the Series round method:

In [11]: s
Out[11]: 
0    0.026574
1    0.304801
2    0.057819
dtype: float64

In [12]: 100*s.round(2)
Out[12]:  
0     3
1    30
2     6
dtype: float64

You might tack .astype('int') on there as well, depending on what you want to do next.
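
One caveat if you do cast to integers: the cast truncates rather than rounds, so any floating-point error left over from the multiplication can matter. The sketch below uses illustrative values and a hypothetical `safe` name, and shows an ordering (scale first, round to zero decimals, then cast) that avoids the problem.

```python
import pandas as pd

s = pd.Series([0.568003, 0.304801, 0.057819])  # illustrative values

# astype(int) truncates toward zero, so a product such as 100 * 0.57,
# which may not be exactly 57 in binary floating point, can end up one
# below the intended value.
truncated = (100 * s.round(2)).astype(int)

# Scaling first and rounding to zero decimals gives exact whole numbers
# before the cast.
safe = (100 * s).round().astype(int)
print(safe.tolist())  # [57, 30, 6]
```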

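To understand why your approach didn't work, remember that the function round takes two arguments here: the data to be rounded and the number of decimal places. Writing round(2) calls round immediately on the number 2 and hands the resulting float, not a function, to apply, which is why the error says a float is not callable. In general, to apply functions that take two arguments, you can "curry" the function like so: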

In [13]: s.apply(lambda x: round(x, 2))
Out[13]: 
0    0.03
1    0.30
2    0.06
dtype: float64

As DSM points out in the comments, for this case one actually needs the currying approach, because there is no round method for DataFrames. df.applymap(...) is the way to go.
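
A minimal sketch of both points, using a small illustrative frame. One assumption to note: recent pandas releases expose the element-wise method as DataFrame.map and deprecate applymap, so the hypothetical `elementwise` helper below picks whichever is available.

```python
import pandas as pd

df = pd.DataFrame({"A": [0.568003, 0.543620], "B": [0.404481, 0.385417]})

# round(2) is evaluated immediately and yields a plain number, so
# df.apply(round(2)) would ask pandas to call a number, not a function.
not_a_function = round(2)
print(callable(not_a_function))  # False

# Wrapping round in a lambda fixes the second argument and leaves a
# one-argument function that can be applied to each element.
elementwise = df.map if hasattr(df, "map") else df.applymap
rounded = elementwise(lambda x: round(x, 2))
print(rounded)
```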

Answer by Phillip Cloud

Even for a modestly sized DataFrame, applymap will be horrendously slow, since it applies a Python function element by element in Python (i.e., there's no Cython speeding this up). It's faster to use apply with functools.partial:

In [22]: from functools import partial

In [23]: df = DataFrame(randn(100000, 20))

In [24]: f = partial(Series.round, decimals=2)

In [25]: timeit df.applymap(lambda x: round(x, 2))
1 loops, best of 3: 2.52 s per loop

In [26]: timeit df.apply(f)
10 loops, best of 3: 33.4 ms per loop

You could even make a function that returns a partial function that you can apply:

In [27]: def column_round(decimals):
   ....:     return partial(Series.round, decimals=decimals)
   ....:

In [28]: df.apply(column_round(2))

As @EMS suggests, you can use np.round as well, since DataFrame implements the __array__ attribute and automatically wraps many of numpy's ufuncs. It's also about twice as fast as the apply version with the frame shown above:

In [47]: timeit np.round(df, 2)
100 loops, best of 3: 17.4 ms per loop
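
For readers who want to run the comparison outside an IPython session, here is a self-contained sketch with the imports spelled out. The frame size and the random seed are arbitrary choices for illustration, and no timings are claimed; the point is only that the column-wise and whole-frame approaches agree.

```python
from functools import partial

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility
df = pd.DataFrame(rng.standard_normal((1000, 5)))

def column_round(decimals):
    """Return a function that rounds one column (a Series) at a time."""
    return partial(pd.Series.round, decimals=decimals)

by_column = df.apply(column_round(2))   # one vectorized call per column
whole_frame = np.round(df, 2)           # a single call on the whole frame

print(by_column.equals(whole_frame))
```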

If you have non-numeric columns you can do this:

In [12]: df = DataFrame(randn(100000, 20))

In [13]: df['a'] = tm.choice(['a', 'b'], size=len(df))

In [14]: dfnum = df._get_numeric_data()

In [15]: np.round(dfnum)

to avoid the cryptic error thrown by numpy when you try to round a column of strings.
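
The same filtering can be sketched with the public select_dtypes method instead of the private _get_numeric_data helper. This is a swapped-in alternative rather than what the answer used, and the column names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0.568003, 0.543620], "label": ["a", "b"]})

# Pick out only the numeric columns, round those, and write them back;
# the string column is left untouched.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = np.round(df[numeric_cols], 2)
print(df)
```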

Answer by Tickon

Since Pandas 0.17, DataFrames have a round method:

df = newdf3.pivot_table(rows=['Quradate'], aggfunc=np.mean)
df.round()

which even allows you to have a different precision for each column:

df.round({'Alabama_exp':2, 'Credit_exp':3})
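
A runnable sketch of the per-column form follows. The `raw` frame is made-up data chosen so that the means match the first row of the question's table, and the pivot uses the `index=` keyword, which replaced `rows=` in later pandas releases.

```python
import pandas as pd

# Hypothetical raw data; the asker's newdf3 is not shown in the question.
raw = pd.DataFrame({
    "Quradate": ["2010-01-15", "2010-01-15", "2010-04-15"],
    "Alabama_exp": [0.5, 0.636006, 0.543620],
    "Credit_exp": [0.4, 0.408962, 0.385417],
})

pivoted = raw.pivot_table(index="Quradate", aggfunc="mean")

# A dict maps column names to the number of decimals for that column;
# columns not listed are left unrounded.
rounded = pivoted.round({"Alabama_exp": 2, "Credit_exp": 3})
print(rounded)
```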