Notice: this page reproduces a popular Stack Overflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/16968433/

Time: 2020-09-13 20:52:57  Source: igfitidea

Error when trying to apply log method to pandas data frame column in Python

Tags: python, numpy, pandas, dataframe

Asked by user2460677

So, I am very new to Python and Pandas (and programming in general), but am having trouble with a seemingly simple function. So I created the following dataframe using data pulled with a SQL query (if you need to see the SQL query, let me know and I'll paste it)

spydata = pd.DataFrame(row,columns=['date','ticker','close', 'iv1m', 'iv3m'])
tickerlist = unique(spydata[spydata['date'] == '2013-05-31'])

After that, I have written a function to create some new columns in the dataframe using the data already held in it:

def demean(arr):
    arr['retlog'] = log(arr['close']/arr['close'].shift(1))

    arr['10dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))  
    arr['60dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))  
    arr['90dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))  
    arr['1060rat'] = arr['10dvol']/arr['60dvol']
    arr['1090rat'] = arr['10dvol']/arr['90dvol']
    arr['60dis'] = (arr['1060rat'] - arr['1060rat'].mean())/arr['1060rat'].std()
    arr['90dis'] = (arr['1090rat'] - arr['1090rat'].mean())/arr['1090rat'].std()
    return arr

The only part that I'm having a problem with is the first line of the function:

arr['retlog'] = log(arr['close']/arr['close'].shift(1))

When I run it with this command, I get an error:

result = spydata.groupby(['ticker']).apply(demean)

Error:

    ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-196-4a66225e12ea> in <module>()
----> 1 result = spydata.groupby(['ticker']).apply(demean)
      2 results2 = result[result.date == result.date.max()]
      3 

C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in apply(self, func, *args, **kwargs)
    323         func = _intercept_function(func)
    324         f = lambda g: func(g, *args, **kwargs)
--> 325         return self._python_apply_general(f)
    326 
    327     def _python_apply_general(self, f):

C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in _python_apply_general(self, f)
    326 
    327     def _python_apply_general(self, f):
--> 328         keys, values, mutated = self.grouper.apply(f, self.obj, self.axis)
    329 
    330         return self._wrap_applied_output(keys, values,

C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in apply(self, f, data, axis, keep_internal)
    632             # group might be modified
    633             group_axes = _get_axes(group)
--> 634             res = f(group)
    635             if not _is_indexed_like(res, group_axes):
    636                 mutated = True

C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in <lambda>(g)
    322         """
    323         func = _intercept_function(func)
--> 324         f = lambda g: func(g, *args, **kwargs)
    325         return self._python_apply_general(f)
    326 

<ipython-input-195-47b6faa3f43c> in demean(arr)
      1 def demean(arr):
----> 2     arr['retlog'] = log(arr['close']/arr['close'].shift(1))
      3     arr['10dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))
      4     arr['60dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))
      5     arr['90dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))

AttributeError: log

I have tried changing the function to np.log as well as math.log, in which case I get the error

TypeError: only length-1 arrays can be converted to Python scalars

I've tried looking this up, but haven't found anything directly applicable. Any clues?

Answered by Dan Allan

This happens when the datatype of the column is not numeric. Try

arr['retlog'] = log(arr['close'].astype('float64')/arr['close'].astype('float64').shift(1))

I suspect that the numbers are stored as generic 'object' types, which I know causes log to throw that error. Here is a simple illustration of the problem:

In [15]: np.log(Series([1,2,3,4], dtype='object'))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-25deca6462b7> in <module>()
----> 1 np.log(Series([1,2,3,4], dtype='object'))

AttributeError: log

In [16]: np.log(Series([1,2,3,4], dtype='float64'))
Out[16]: 
0    0.000000
1    0.693147
2    1.098612
3    1.386294
dtype: float64
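
To check whether this is what is happening in your own data, it may help to look at the column's dtype before taking the log. The snippet below is a minimal sketch, not part of the original session: the sample values are made up, and the exception type is caught broadly because, as far as I know, newer NumPy versions report this failure as a TypeError rather than the bare AttributeError shown above. It also shows pd.to_numeric as an alternative to astype when the column might contain entries that cannot be parsed.

```python
import numpy as np
import pandas as pd

# Hypothetical column: numeric values stored with the generic object dtype.
close = pd.Series([1, 2, 3, 4], dtype="object")
print(close.dtype)  # object

# np.log has no loop for object arrays, so it falls back to looking for
# a .log() method on each element and fails for plain Python numbers.
try:
    np.log(close)
    failed = False
except (AttributeError, TypeError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# Converting to a real float dtype first makes the vectorized log work.
logged = np.log(close.astype("float64"))
print(logged)

# If some entries are not parseable numbers (assumed example: "n/a"),
# pd.to_numeric with errors="coerce" turns them into NaN instead of raising.
messy = pd.Series(["1.5", "2.5", "n/a"], dtype="object")
cleaned = pd.to_numeric(messy, errors="coerce")
print(cleaned)
```

With astype, a value that cannot be converted raises an error, which is often what you want; errors="coerce" trades that strictness for NaN entries you then have to handle.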

Your attempt with math.log did not work because that function is designed for single numbers (scalars) only, not lists or arrays.
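
The scalar-versus-array difference can be seen in isolation with a small sketch like the following. The array values are arbitrary and chosen only for illustration.

```python
import math

import numpy as np

ratios = np.array([1.0, 2.0, 4.0])

# math.log first tries to turn its argument into one Python float,
# which is not possible for an array holding several values.
try:
    math.log(ratios)
    scalar_only = False
except TypeError as exc:
    scalar_only = True
    print("math.log failed:", exc)

# np.log is a ufunc: it is applied to every element of the array.
elementwise = np.log(ratios)
print(elementwise)

# math.log is fine once it is handed a single value.
single = math.log(ratios[1])
print(single)
```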

For what it's worth, I think this is a confusing error message; it once stumped me for a while, anyway. I wonder if it can be improved.
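
Putting the fix into the question's setting, here is a minimal sketch rather than the asker's actual pipeline. The sample prices are invented, and using decimal.Decimal to produce an object column is only an assumption about how the data might have arrived, since the question does not show what the SQL query returns. The per-ticker log return is computed with groupby(...).shift instead of groupby.apply, which is a substitution of mine and not something the answer prescribes.

```python
from decimal import Decimal

import numpy as np
import pandas as pd

# Hypothetical stand-in for the SQL result: Decimal prices give an object column.
spydata = pd.DataFrame({
    "ticker": ["SPY", "SPY", "SPY", "QQQ", "QQQ", "QQQ"],
    "close": [Decimal("100"), Decimal("110"), Decimal("121"),
              Decimal("50"), Decimal("55"), Decimal("60.5")],
})
print(spydata["close"].dtype)  # object

# Convert once, up front, so every later calculation sees real floats.
spydata["close"] = spydata["close"].astype("float64")

# Previous close within each ticker; the first row of each ticker has none.
prev_close = spydata.groupby("ticker")["close"].shift(1)

# Vectorized log return, NaN on the first row of each ticker.
spydata["retlog"] = np.log(spydata["close"] / prev_close)
print(spydata)
```

Converting the column once right after loading, instead of inside the function, avoids repeating astype in every expression that touches close.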