Original URL: http://stackoverflow.com/questions/34934511/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Cannot interpolate dataframe even if most of the data is filled
Question by Mincong Huang
I tried to interpolate the NaNs in my DataFrame using the interpolate() method. However, the method failed with the error:
Cannot interpolate with all NaNs.
Here's the code:
import sys
import time

# df3, processor and nid3 are defined elsewhere in the program (df3 is a global).
try:
    df3.interpolate(method='index', inplace=True)
    processor._arma(df3['TCA'])
except Exception as e:
    sys.stderr.write('%s: [%s] %s\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, e))
    sys.stderr.write('%s: [%s] len=%d\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, len(df3.index)))
    sys.stderr.write('%s: [%s] %s\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, df3.to_string()))
This is strange, because most of the data is already filled, as you can see in log 1 or log 2. The DataFrame has 20 rows, all of which are shown below. Even when every cell is filled (log 1), I still can't use the interpolate method. BTW, df3 is a global variable; I'm not sure whether that could be a problem.
log 1
2016-01-21 22:06:11: [ESIG_node_003_400585511] Cannot interpolate with all NaNs.
2016-01-21 22:06:11: [ESIG_node_003_400585511] len=20
2016-01-21 22:06:11: [ESIG_node_003_400585511]
TCA TCB TCC
2016-01-21 20:06:22 19 17 18
2016-01-21 20:06:23 19 17 18
2016-01-21 20:06:24 18 18 18
2016-01-21 20:06:25 18 17 18
2016-01-21 20:06:26 18 18 18
2016-01-21 20:06:27 19 18 18
2016-01-21 20:06:28 19 17 18
2016-01-21 20:06:29 18 18 18
2016-01-21 20:06:30 18 17 18
2016-01-21 20:06:31 19 17 18
2016-01-21 20:06:32 18 17 18
2016-01-21 20:06:33 18 18 18
2016-01-21 20:06:34 19 18 18
2016-01-21 20:06:35 18 17 18
2016-01-21 20:06:36 19 18 18
2016-01-21 20:06:37 18 18 18
2016-01-21 20:06:38 18 18 18
2016-01-21 20:06:39 19 18 18
2016-01-21 20:06:40 18 17 18
2016-01-21 20:06:41 18 18 18
log 2
2016-01-21 22:06:14: [ESIG_node_003_400585511] Cannot interpolate with all NaNs.
2016-01-21 22:06:14: [ESIG_node_003_400585511] len=20
2016-01-21 22:06:14: [ESIG_node_003_400585511]
TCA TCB TCC
2016-01-21 20:06:33 18 18 18
2016-01-21 20:06:34 19 18 18
2016-01-21 20:06:35 18 17 18
2016-01-21 20:06:36 19 18 18
2016-01-21 20:06:37 18 18 18
2016-01-21 20:06:38 18 18 18
2016-01-21 20:06:39 19 18 18
2016-01-21 20:06:40 18 17 18
2016-01-21 20:06:41 18 18 18
2016-01-21 20:06:42 NaN NaN NaN
2016-01-21 20:06:43 NaN NaN NaN
2016-01-21 20:06:44 NaN NaN NaN
2016-01-21 20:06:45 NaN NaN NaN
2016-01-21 20:06:46 19 18 18
2016-01-21 20:06:47 18 17 18
2016-01-21 20:06:48 18 18 18
2016-01-21 20:06:49 19 18 18
2016-01-21 20:06:50 18 17 18
2016-01-21 20:06:51 18 18 18
2016-01-21 20:06:52 19 17 18
Answer by unutbu
Check that your DataFrame has numeric dtypes, not object dtypes. The error "TypeError: Cannot interpolate with all NaNs" can occur if the DataFrame contains columns of object dtype. For example, if
import numpy as np
import pandas as pd
# Column 'A' holds numbers, but they are stored with object dtype ('O').
df = pd.DataFrame({'A': np.array([1, np.nan, 30], dtype='O')},
                  index=['2016-01-21 20:06:22', '2016-01-21 20:06:23',
                         '2016-01-21 20:06:24'])
then df.interpolate() raises the TypeError.
To check whether your DataFrame has columns with object dtype, look at df3.dtypes:
In [92]: df.dtypes
Out[92]:
A object
dtype: object
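The printed frame alone cannot settle the question. In log 2, whole numbers such as 19 appear in the same column as NaN; a float64 column would normally display them as 19.0, and an int64 column cannot hold NaN, which suggests the columns were object dtype all along. A minimal sketch for listing such columns with DataFrame.select_dtypes, using hypothetical values under the question's column names (object_columns is an illustrative helper, not a pandas function):

```python
import numpy as np
import pandas as pd


def object_columns(df):
    """Return the names of the columns stored with the generic object dtype."""
    return list(df.select_dtypes(include=['object']).columns)


# Hypothetical data: TCA and TCB hold numbers but are stored as Python objects,
# while TCC is a proper float64 column.
df = pd.DataFrame({
    'TCA': np.array([19, 18, np.nan], dtype='O'),
    'TCB': np.array([17, 18, np.nan], dtype='O'),
    'TCC': [18.0, 18.0, np.nan],
})

print(df.dtypes)
print(object_columns(df))  # columns to convert before calling interpolate()
```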
To fix the problem, you need to ensure the DataFrame has numeric columns with native NumPy dtypes. Obviously, it would be best to build the DataFrame correctly from the very beginning. So the best solution depends on how you are building the DataFrame.
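How to build the frame correctly depends on the data source, which the question does not show. As one hedged illustration, assuming the readings arrive as whitespace-separated text lines shaped like the log rows (raw_lines and parse_rows are hypothetical names, and the rows are an excerpt of log 2), converting each field with float() at parse time yields float64 columns and a DatetimeIndex from the start:

```python
import pandas as pd

# Hypothetical raw input: a few rows copied from log 2.
raw_lines = [
    "2016-01-21 20:06:40 18 17 18",
    "2016-01-21 20:06:41 18 18 18",
    "2016-01-21 20:06:42 NaN NaN NaN",
    "2016-01-21 20:06:46 19 18 18",
]


def parse_rows(lines):
    """Split each line into a timestamp string and a list of float readings."""
    stamps, values = [], []
    for line in lines:
        date, clock, *readings = line.split()
        stamps.append(date + " " + clock)
        # float("NaN") returns a real float NaN, so missing readings stay numeric.
        values.append([float(v) for v in readings])
    return stamps, values


stamps, values = parse_rows(raw_lines)
df = pd.DataFrame(values, columns=["TCA", "TCB", "TCC"],
                  index=pd.to_datetime(stamps))
print(df.dtypes)  # all three columns should be float64
```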
A less appealing patch-up fix would be to use pd.to_numeric to convert the object arrays to numeric arrays after the fact:
for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce')
With errors='coerce', any value that could not be converted to a number is converted to NaN. After calling pd.to_numeric on each column, notice that the dtype is now float64:
In [94]: df.dtypes
Out[94]:
A float64
dtype: object
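The per-column loop can also be written as a single DataFrame.apply call, which forwards the keyword argument errors='coerce' to pd.to_numeric for every column. The sketch below also contrasts coercion with the default errors='raise'; column B and its unparseable entry 'err' are hypothetical, added only to show what coercion does:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': np.array([1, np.nan, 30], dtype='O'),
    # Hypothetical column: numeric text plus one unparseable entry ('err').
    'B': np.array(['18', 'err', '17'], dtype='O'),
})

# With the default errors='raise', the unparseable 'err' stops the conversion.
try:
    pd.to_numeric(df['B'])
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# DataFrame.apply calls pd.to_numeric on each column and forwards errors='coerce',
# so 'err' becomes NaN instead of raising.
converted = df.apply(pd.to_numeric, errors='coerce')
print(converted.dtypes)  # every column should now have a numeric dtype
```

Coercion is convenient but silent: a typo in the input turns into a NaN that interpolate() will later fill, so it can be worth counting the NaNs before and after conversion.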
Once the DataFrame has numeric dtypes and a DatetimeIndex, df.interpolate(method='time') will work:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.array([1, np.nan, 30], dtype='O')},
                  index=['2016-01-21 20:06:22', '2016-01-21 20:06:23',
                         '2016-01-21 20:06:24'])
# Step 1: convert the object columns to numeric (float64) columns.
for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce')
# Step 2: parse the string index into a DatetimeIndex so method='time' can use it.
df.index = pd.DatetimeIndex(df.index)
df = df.interpolate(method='time')
print(df)
yields
A
2016-01-21 20:06:22 1.0
2016-01-21 20:06:23 15.5
2016-01-21 20:06:24 30.0
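Applied to the question's own data, the same recipe fills the four-row gap in log 2. A minimal sketch, assuming only the TCA column around the gap (values copied from log 2) already converted to float64 and indexed by a DatetimeIndex:

```python
import numpy as np
import pandas as pd

# TCA values around the gap in log 2 (20:06:41 to 20:06:46).
index = pd.to_datetime([
    '2016-01-21 20:06:41', '2016-01-21 20:06:42', '2016-01-21 20:06:43',
    '2016-01-21 20:06:44', '2016-01-21 20:06:45', '2016-01-21 20:06:46',
])
tca = pd.Series([18, np.nan, np.nan, np.nan, np.nan, 19],
                index=index, dtype='float64')

# method='time' weights each fill by the elapsed time between known points.
filled = tca.interpolate(method='time')
print(filled)
```

Because the rows are exactly one second apart, the time weighting matches evenly spaced linear interpolation here: the four missing seconds are filled with 18.2, 18.4, 18.6 and 18.8.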