pandas cannot interpolate a DataFrame even though most of the data is filled in

Question

Asked by Mincong Huang

I tried to interpolate the NaNs in my DataFrame using the interpolate() method. However, the method fails with the error:

Cannot interpolate with all NaNs.

Here's the code:

import sys
import time

try:
    # Fill the NaN rows using the numeric values of the (datetime) index
    df3.interpolate(method='index', inplace=True)
    processor._arma(df3['TCA'])
except Exception as e:
    # Log the error, the frame length and the full frame contents
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    sys.stderr.write('%s: [%s] %s\n' % (stamp, nid3, e))
    sys.stderr.write('%s: [%s] len=%d\n' % (stamp, nid3, len(df3.index)))
    sys.stderr.write('%s: [%s] %s\n' % (stamp, nid3, df3.to_string()))

This is strange, because most of the data is already filled in, as you can see in log 1 or log 2. The DataFrame has 20 rows, all of which are shown below. Even when every cell is filled (as in log 1), I still can't use the interpolate method. By the way, df3 is a global variable; I'm not sure whether that could be a problem.

log 1

2016-01-21 22:06:11: [ESIG_node_003_400585511] Cannot interpolate with all NaNs.
2016-01-21 22:06:11: [ESIG_node_003_400585511] len=20
2016-01-21 22:06:11: [ESIG_node_003_400585511]
                     TCA TCB TCC
2016-01-21 20:06:22  19  17  18
2016-01-21 20:06:23  19  17  18
2016-01-21 20:06:24  18  18  18
2016-01-21 20:06:25  18  17  18
2016-01-21 20:06:26  18  18  18
2016-01-21 20:06:27  19  18  18
2016-01-21 20:06:28  19  17  18
2016-01-21 20:06:29  18  18  18
2016-01-21 20:06:30  18  17  18
2016-01-21 20:06:31  19  17  18
2016-01-21 20:06:32  18  17  18
2016-01-21 20:06:33  18  18  18
2016-01-21 20:06:34  19  18  18
2016-01-21 20:06:35  18  17  18
2016-01-21 20:06:36  19  18  18
2016-01-21 20:06:37  18  18  18
2016-01-21 20:06:38  18  18  18
2016-01-21 20:06:39  19  18  18
2016-01-21 20:06:40  18  17  18
2016-01-21 20:06:41  18  18  18

log 2

2016-01-21 22:06:14: [ESIG_node_003_400585511] Cannot interpolate with all NaNs.
2016-01-21 22:06:14: [ESIG_node_003_400585511] len=20
2016-01-21 22:06:14: [ESIG_node_003_400585511]
                      TCA  TCB  TCC
2016-01-21 20:06:33   18   18   18
2016-01-21 20:06:34   19   18   18
2016-01-21 20:06:35   18   17   18
2016-01-21 20:06:36   19   18   18
2016-01-21 20:06:37   18   18   18
2016-01-21 20:06:38   18   18   18
2016-01-21 20:06:39   19   18   18
2016-01-21 20:06:40   18   17   18
2016-01-21 20:06:41   18   18   18
2016-01-21 20:06:42  NaN  NaN  NaN
2016-01-21 20:06:43  NaN  NaN  NaN
2016-01-21 20:06:44  NaN  NaN  NaN
2016-01-21 20:06:45  NaN  NaN  NaN
2016-01-21 20:06:46   19   18   18
2016-01-21 20:06:47   18   17   18
2016-01-21 20:06:48   18   18   18
2016-01-21 20:06:49   19   18   18
2016-01-21 20:06:50   18   17   18
2016-01-21 20:06:51   18   18   18
2016-01-21 20:06:52   19   17   18

Answer 1

Answered by unutbu

Check that your DataFrame has numeric dtypes, not object dtypes. The error TypeError: Cannot interpolate with all NaNs can occur if the DataFrame contains columns of object dtype. For example, if

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':np.array([1,np.nan,30], dtype='O')}, 
                  index=['2016-01-21 20:06:22', '2016-01-21 20:06:23', 
                         '2016-01-21 20:06:24'])

then df.interpolate() raises the TypeError.
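
This also explains log 1, where no cell is NaN at all. The sketch below shows that a frame with no missing values still fails when its column has object dtype, while the same values stored as integers interpolate without complaint. It assumes (the question does not confirm it) that df3 ended up with object columns. try_interpolate is a hypothetical helper, and the exact error wording differs between pandas versions.

```python
import numpy as np
import pandas as pd


def try_interpolate(df):
    """Hypothetical helper: return the interpolated frame, or the TypeError text."""
    try:
        return df.interpolate()
    except TypeError as exc:
        return "TypeError: %s" % exc


# Every cell is filled (as in log 1), but the column is stored with object dtype.
filled_object = pd.DataFrame({'TCA': np.array([19, 19, 18], dtype=object)})
# The same values stored with a native integer dtype.
filled_numeric = pd.DataFrame({'TCA': [19, 19, 18]})

print(filled_object.isna().sum().sum())  # 0: nothing is missing
print(try_interpolate(filled_object))    # a TypeError message; wording depends on the pandas version
print(try_interpolate(filled_numeric))   # the frame comes back unchanged
```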

To check if your DataFrame has columns with object dtype, look at df3.dtypes:

In [92]: df.dtypes
Out[92]: 
A    object
dtype: object
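
When a frame has many columns, it may be easier to list only the offending ones. This minimal sketch uses DataFrame.select_dtypes. The helper name object_columns and the sample frame are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def object_columns(df):
    """Hypothetical helper: names of the columns whose dtype is object."""
    return list(df.select_dtypes(include='object').columns)


df = pd.DataFrame({
    'TCA': np.array([19, np.nan, 18], dtype=object),  # object, like the example above
    'TCB': [17.0, np.nan, 18.0],                      # already float64
})
print(df.dtypes)
print(object_columns(df))  # ['TCA']
```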

To fix the problem, you need to ensure the DataFrame has numeric columns with native NumPy dtypes. Obviously, it would be best to build the DataFrame correctly from the very beginning. So the best solution depends on how you are building the DataFrame.
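
Here is one example of building the frame correctly from the start. It assumes, hypothetically, that the readings arrive as comma-separated text; the question does not say how df3 is built. pd.read_csv infers numeric dtypes on its own, and with index_col plus parse_dates=True it produces a DatetimeIndex. The rows come from log 2, with the four missing seconds written as empty fields.

```python
import io

import pandas as pd

# Rows from log 2; the missing readings are left as empty fields.
raw = """time,TCA,TCB,TCC
2016-01-21 20:06:41,18,18,18
2016-01-21 20:06:42,,,
2016-01-21 20:06:43,,,
2016-01-21 20:06:44,,,
2016-01-21 20:06:45,,,
2016-01-21 20:06:46,19,18,18
"""

# read_csv infers float64 (the empty cells become NaN), and
# parse_dates=True turns the 'time' index column into a DatetimeIndex.
df = pd.read_csv(io.StringIO(raw), index_col='time', parse_dates=True)
print(df.dtypes)

filled = df.interpolate(method='time')
print(filled)
```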

A less appealing patch-up fix would be to use pd.to_numeric to convert the object arrays to numeric arrays after the fact:

for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce')

With errors='coerce', any value that could not be converted to a number is converted to NaN. After calling pd.to_numeric on each column, notice that the dtype is now float64:

In [94]: df.dtypes
Out[94]: 
A    float64
dtype: object
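
This small sketch shows what errors='coerce' does with values that are not numbers. It also shows a one-line alternative to the for-loop using DataFrame.apply, which forwards the errors keyword to pd.to_numeric. The mixed sample values are made up for illustration.

```python
import pandas as pd

# Made-up object column: an int, a numeric string, a non-numeric string and a float.
s = pd.Series([18, '19', 'ERR', 17.5], dtype=object)

converted = pd.to_numeric(s, errors='coerce')
print(converted.tolist())  # [18.0, 19.0, nan, 17.5]
print(converted.dtype)     # float64

# One-line equivalent of the for-loop: apply pd.to_numeric to every column.
df = pd.DataFrame({'TCA': s, 'TCB': s})
df = df.apply(pd.to_numeric, errors='coerce')
print(df.dtypes)
```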

Once the DataFrame has numeric dtypes and a DatetimeIndex, df.interpolate(method='time') will work:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':np.array([1,np.nan,30], dtype='O')}, 
                  index=['2016-01-21 20:06:22', '2016-01-21 20:06:23', 
                         '2016-01-21 20:06:24'])

for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce')
df.index = pd.DatetimeIndex(df.index)
df = df.interpolate(method='time')
print(df)

yields

                        A
2016-01-21 20:06:22   1.0
2016-01-21 20:06:23  15.5
2016-01-21 20:06:24  30.0
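
The rows in the logs are one second apart, so 'time' and the default 'linear' give the same result there. The two methods only differ when the timestamps are unevenly spaced. The sketch below uses made-up values to show this. It also shows that method='index', which the question uses, weights by the DatetimeIndex values in the same way as 'time'.

```python
import numpy as np
import pandas as pd

# Made-up values: a 1 s gap followed by a 3 s gap, so the index is unevenly spaced.
idx = pd.DatetimeIndex(['2016-01-21 20:06:22',
                        '2016-01-21 20:06:23',
                        '2016-01-21 20:06:26'])
df = pd.DataFrame({'A': [1.0, np.nan, 30.0]}, index=idx)

linear = df.interpolate(method='linear')   # treats rows as equally spaced
timed = df.interpolate(method='time')      # weights by elapsed time
by_index = df.interpolate(method='index')  # weights by the index values themselves

print(linear['A'].iloc[1])    # 15.5
print(timed['A'].iloc[1])     # ≈ 8.25
print(by_index['A'].iloc[1])  # ≈ 8.25
```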
