Rounding entries in a Pandas DataFrame
Original question: http://stackoverflow.com/questions/19100540/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by dartdog
Using:
newdf3.pivot_table(rows=['Quradate'],aggfunc=np.mean)
which yields:
            Alabama_exp  Credit_exp  Inventory_exp  National_exp  Price_exp  Sales_exp
Quradate
2010-01-15     0.568003    0.404481       0.488601      0.483097   0.431211   0.570755
2010-04-15     0.543620    0.385417       0.455078      0.468750   0.408203   0.564453
I'd like to round the decimal numbers to two digits and multiply them by 100, e.g. .568003 should become 57. I've been fiddling with it for a while to no avail. I tried this:
newdf3.pivot_table(rows=['Quradate'],aggfunc=np.mean).apply(round(2)) #and got:
TypeError: ("'float' object is not callable", u'occurred at index Alabama_exp')
I tried a number of other approaches to no avail; most complain about the item not being a float. I see that the Pandas Series object has a round method but DataFrame does not. I tried using df.apply, but it complained about the float issue.
Accepted answer by ely
Just use numpy.round, e.g.:
100 * np.round(newdf3.pivot_table(rows=['Quradate'], aggfunc=np.mean), 2)
As long as round is appropriate for all column types, this works on a DataFrame.
With some data:
In [9]: dfrm
Out[9]:
          A         B         C
0 -1.312700  0.760710  1.044006
1 -0.792521 -0.076913  0.087334
2 -0.557738  0.982031  1.365357
3  1.013947  0.345896 -0.356652
4  1.278278 -0.195477  0.550492
5  0.116599 -0.670163 -1.290245
6 -1.808143 -0.818014  0.713614
7  0.233726  0.634349  0.561103
8  2.344671 -2.331232 -0.759296
9 -1.658047  1.756503 -0.996620
In [10]: 100*np.round(dfrm, 2)
Out[10]:
     A    B    C
0 -131   76  104
1  -79   -8    9
2  -56   98  137
3  101   35  -36
4  128  -20   55
5   12  -67 -129
6 -181  -82   71
7   23   63   56
8  234 -233  -76
9 -166  176 -100
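For readers who want something to paste into a script, here is a minimal sketch of the same idea. It is not part of the original answer: the frame `df` and its values are made up to resemble the question's pivot table, and the second variant assumes a pandas version that provides `DataFrame.round`, which was added after this answer was written.

```python
import numpy as np
import pandas as pd

# Hypothetical data modelled on the question's pivot table output.
df = pd.DataFrame(
    {"Alabama_exp": [0.568003, 0.543620], "Credit_exp": [0.404481, 0.385417]},
    index=["2010-01-15", "2010-04-15"],
)

# Round to two decimals, scale to percentages, then round once more before
# the integer cast to guard against float results such as 56.99999999.
via_numpy = (100 * np.round(df, 2)).round().astype(int)

# The same computation using the DataFrame method instead of the NumPy function.
via_method = (100 * df.round(2)).round().astype(int)

print(via_numpy)
```

The extra `.round()` before `.astype(int)` matters because `astype(int)` truncates rather than rounds.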
Answer by Dan Allan
I'm leaving this here for the explanation of why the OP's approach threw an error, but subsequent solutions are better.
The best solution is to simply use the Series round method:
In [11]: s
Out[11]:
0 0.026574
1 0.304801
2 0.057819
dtype: float64
In [12]: 100*s.round(2)
Out[12]:
0 3
1 30
2 6
dtype: float64
You might tack .astype('int') on there as well, depending on what you want to do next.
To understand why your approach didn't work, remember that the function round needs two arguments: the number of decimal places and the data to be rounded. In general, to apply functions that take two arguments, you can "curry" the function like so:
In [13]: s.apply(lambda x: round(x, 2))
Out[13]:
0 0.03
1 0.30
2 0.06
dtype: float64
As DSM points out in the comments, for this case one actually needs the currying approach, because there is no round method for DataFrames. df.applymap(...) is the way to go.
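To make the failure concrete, the sketch below shows why `.apply(round(2))` cannot work and two ways to defer the call. It is an illustration only: the Series values are copied from the session above, and `round_to` is a hypothetical helper introduced here, not something from the original answer.

```python
from functools import partial

import pandas as pd

s = pd.Series([0.026574, 0.304801, 0.057819])

# round(2) runs immediately and returns a plain number, so apply() would be
# handed a number where it expects a function.
print(callable(round(2)))  # False

# A lambda defers the call until apply() supplies each value.
with_lambda = s.apply(lambda x: round(x, 2))


def round_to(x, ndigits):
    """Hypothetical helper: round a single value to ndigits decimals."""
    return round(x, ndigits)


# functools.partial fixes the ndigits argument ahead of time.
with_partial = s.apply(partial(round_to, ndigits=2))
```

Both variants still call a Python function once per element, so they are better suited to explanation than to large frames.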
Answer by Phillip Cloud
For a modestly sized DataFrame, applymap will be horrendously slow, since it is applying a Python function element by element in Python (i.e., there's no Cython speeding this up). It's faster to use apply with functools.partial:
In [22]: from functools import partial
In [23]: df = DataFrame(randn(100000, 20))
In [24]: f = partial(Series.round, decimals=2)
In [25]: timeit df.applymap(lambda x: round(x, 2))
1 loops, best of 3: 2.52 s per loop
In [26]: timeit df.apply(f)
10 loops, best of 3: 33.4 ms per loop
You could even make a function that returns a partial function that you can apply:
In [27]: def column_round(decimals):
   ....:     return partial(Series.round, decimals=decimals)
   ....:
In [28]: df.apply(column_round(2))
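The session above relies on `DataFrame`, `Series` and `randn` already being imported into the namespace. A self-contained sketch of the same approach might look like the following. The frame size, the seed and the names `by_column` and `by_element` are arbitrary choices made here for illustration, and no timings are claimed.

```python
from functools import partial

import numpy as np
import pandas as pd


def column_round(decimals):
    # Returns a function that rounds one whole column (a Series) per call.
    return partial(pd.Series.round, decimals=decimals)


rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
df = pd.DataFrame(rng.standard_normal((1000, 5)))

# One vectorized Series.round call per column.
by_column = df.apply(column_round(2))

# For contrast: one Python-level round() call per value.
by_element = df.apply(lambda col: col.map(lambda x: round(x, 2)))
```

The column-wise version makes five function calls for this frame, while the element-wise version makes five thousand, which is the source of the speed gap described in the answer.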
As @EMS suggests, you can use np.round as well, since DataFrame implements the __array__ attribute and automatically wraps many of numpy's ufuncs. It's also about twice as fast with the frame shown above:
In [47]: timeit np.round(df, 2)
100 loops, best of 3: 17.4 ms per loop
If you have non-numeric columns you can do this:
In [12]: df = DataFrame(randn(100000, 20))
In [13]: df['a'] = tm.choice(['a', 'b'], size=len(df))
In [14]: dfnum = df._get_numeric_data()
In [15]: np.round(dfnum)
to avoid the cryptic error thrown by numpy when you try to round a column of strings.
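Since `_get_numeric_data()` is a private method and `tm.choice` comes from pandas' internal testing utilities, a sketch using only public APIs may be easier to reuse. The assumptions here are that `select_dtypes(include="number")` is an acceptable substitute for the private helper, and that the column names and seed are made up for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
df = pd.DataFrame(rng.standard_normal((6, 3)), columns=["x", "y", "z"])
df["label"] = rng.choice(["a", "b"], size=len(df))  # non-numeric column

# Public counterpart to the private _get_numeric_data() helper.
numeric = df.select_dtypes(include="number")

# Round only the numeric columns and write them back beside the strings.
rounded = df.copy()
rounded[numeric.columns] = np.round(numeric, 2)

print(rounded)
```

Writing the rounded values back into a copy keeps the string column in the result, whereas the session above returns only the numeric part.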