Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/34934511/

Cannot interpolate dataframe even if most of the data is filled

python pandas

Asked by Mincong Huang

I tried to interpolate the NaN values in my DataFrame using the interpolate() method. However, the method failed with the error:

Cannot interpolate with all NaNs.

Here's the code:

try:
    df3.interpolate(method='index', inplace=True)
    processor._arma(df3['TCA'])
except Exception as e:
    sys.stderr.write('%s: [%s] %s\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, e))
    sys.stderr.write('%s: [%s] len=%d\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, len(df3.index)))
    sys.stderr.write('%s: [%s] %s\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, df3.to_string()))

This is strange, because most of the data is already filled, as you can see in log 1 or log 2. The length of the DataFrame is 20, and all of its data is shown below. Even when every cell is filled, I still can't use the interpolate method. BTW, df3 is a global variable; I'm not sure whether that could be a problem.

log 1

2016-01-21 22:06:11: [ESIG_node_003_400585511] Cannot interpolate with all NaNs.
2016-01-21 22:06:11: [ESIG_node_003_400585511] len=20
2016-01-21 22:06:11: [ESIG_node_003_400585511]
                     TCA TCB TCC
2016-01-21 20:06:22  19  17  18
2016-01-21 20:06:23  19  17  18
2016-01-21 20:06:24  18  18  18
2016-01-21 20:06:25  18  17  18
2016-01-21 20:06:26  18  18  18
2016-01-21 20:06:27  19  18  18
2016-01-21 20:06:28  19  17  18
2016-01-21 20:06:29  18  18  18
2016-01-21 20:06:30  18  17  18
2016-01-21 20:06:31  19  17  18
2016-01-21 20:06:32  18  17  18
2016-01-21 20:06:33  18  18  18
2016-01-21 20:06:34  19  18  18
2016-01-21 20:06:35  18  17  18
2016-01-21 20:06:36  19  18  18
2016-01-21 20:06:37  18  18  18
2016-01-21 20:06:38  18  18  18
2016-01-21 20:06:39  19  18  18
2016-01-21 20:06:40  18  17  18
2016-01-21 20:06:41  18  18  18

log 2

2016-01-21 22:06:14: [ESIG_node_003_400585511] Cannot interpolate with all NaNs.
2016-01-21 22:06:14: [ESIG_node_003_400585511] len=20
2016-01-21 22:06:14: [ESIG_node_003_400585511]
                      TCA  TCB  TCC
2016-01-21 20:06:33   18   18   18
2016-01-21 20:06:34   19   18   18
2016-01-21 20:06:35   18   17   18
2016-01-21 20:06:36   19   18   18
2016-01-21 20:06:37   18   18   18
2016-01-21 20:06:38   18   18   18
2016-01-21 20:06:39   19   18   18
2016-01-21 20:06:40   18   17   18
2016-01-21 20:06:41   18   18   18
2016-01-21 20:06:42  NaN  NaN  NaN
2016-01-21 20:06:43  NaN  NaN  NaN
2016-01-21 20:06:44  NaN  NaN  NaN
2016-01-21 20:06:45  NaN  NaN  NaN
2016-01-21 20:06:46   19   18   18
2016-01-21 20:06:47   18   17   18
2016-01-21 20:06:48   18   18   18
2016-01-21 20:06:49   19   18   18
2016-01-21 20:06:50   18   17   18
2016-01-21 20:06:51   18   18   18
2016-01-21 20:06:52   19   17   18

Answered by unutbu

Check that your DataFrame has numeric dtypes, not object dtypes. The TypeError: Cannot interpolate with all NaNs can occur if the DataFrame contains columns of object dtype. For example, if

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':np.array([1,np.nan,30], dtype='O')}, 
                  index=['2016-01-21 20:06:22', '2016-01-21 20:06:23', 
                         '2016-01-21 20:06:24'])

then df.interpolate() raises the TypeError.

To check if your DataFrame has columns with object dtype, look at df3.dtypes:

In [92]: df.dtypes
Out[92]: 
A    object
dtype: object
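
When a DataFrame has many columns, it can be easier to list only the offending ones. The following is a minimal sketch, not part of the original answer; it rebuilds the toy frame from above, and the name object_cols is purely illustrative.

```python
import numpy as np
import pandas as pd

# Same toy frame as above: one column stored with object dtype.
df = pd.DataFrame({'A': np.array([1, np.nan, 30], dtype='O')})

# Boolean mask over df.dtypes picks out the columns whose dtype is object.
object_cols = df.columns[df.dtypes == object].tolist()
print(object_cols)
```

Any column named in that list is a candidate for conversion before calling interpolate().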

To fix the problem, you need to ensure the DataFrame has numeric columns with native NumPy dtypes. Obviously, it would be best to build the DataFrame correctly from the very beginning. So the best solution depends on how you are building the DataFrame.
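
As one possible illustration of building the DataFrame correctly from the start, the sketch below passes an explicit float dtype and a parsed datetime index to the constructor. The variables raw and stamps are hypothetical stand-ins; the question does not show how df3 is actually constructed, so this may not match the real data source.

```python
import numpy as np
import pandas as pd

# Hypothetical raw readings; np.nan marks a missing sample.
raw = [1, np.nan, 30]
stamps = ['2016-01-21 20:06:22', '2016-01-21 20:06:23',
          '2016-01-21 20:06:24']

# Ask for float64 up front and parse the timestamps into a DatetimeIndex,
# so no object-dtype column is ever created.
df = pd.DataFrame({'A': raw}, index=pd.to_datetime(stamps), dtype='float64')
print(df.dtypes)
```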

A less appealing patch-up fix would be to use pd.to_numeric to convert the object arrays to numeric arrays after the fact:

for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce')

With errors='coerce', any value that cannot be converted to a number is converted to NaN. After calling pd.to_numeric on each column, notice that the dtype is now float64:

In [94]: df.dtypes
Out[94]: 
A    float64
dtype: object
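
To make the effect of errors='coerce' concrete, here is a small sketch on a made-up Series that mixes a number, a numeric string, and junk text. The values are illustrative only and do not come from the question's data.

```python
import pandas as pd

# Hypothetical object-dtype column: an int, a numeric string, and junk text.
s = pd.Series([18, '19', 'n/a'], dtype='object')

# Values that parse become floats; the unparseable 'n/a' becomes NaN.
converted = pd.to_numeric(s, errors='coerce')
print(converted.dtype)
print(converted.isna().tolist())
```

Note that coercion silently turns bad values into NaN, so it is worth checking how many NaNs appear after the conversion.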

Once the DataFrame has numeric dtypes and a DatetimeIndex, df.interpolate(method='time') will work:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':np.array([1,np.nan,30], dtype='O')}, 
                  index=['2016-01-21 20:06:22', '2016-01-21 20:06:23', 
                         '2016-01-21 20:06:24'])

for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce')
df.index = pd.DatetimeIndex(df.index)
df = df.interpolate(method='time')
print(df)

yields

                        A
2016-01-21 20:06:22   1.0
2016-01-21 20:06:23  15.5
2016-01-21 20:06:24  30.0
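
The same approach can be tried on a gap like the one in log 2. The sketch below is an assumption-laden reconstruction: it copies only the TCA values around the four NaN rows from the log and assumes the column is already float64 with a DatetimeIndex, which the question does not confirm.

```python
import numpy as np
import pandas as pd

# Six rows modeled on log 2: a reading, four missing seconds, a reading.
stamps = ['2016-01-21 20:06:41', '2016-01-21 20:06:42',
          '2016-01-21 20:06:43', '2016-01-21 20:06:44',
          '2016-01-21 20:06:45', '2016-01-21 20:06:46']
gap = pd.DataFrame({'TCA': [18, np.nan, np.nan, np.nan, np.nan, 19]},
                   index=pd.to_datetime(stamps), dtype='float64')

# With numeric data and a DatetimeIndex, the gap is filled linearly in time.
filled = gap.interpolate(method='time')
print(filled)
```

Because the timestamps are evenly spaced, the filled values rise in equal steps from 18 to 19.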