Python: How to plot and work with NaN values in matplotlib

Question

Asked by Kudrat

I have hourly data consisting of a number of columns. The first column is a date (date_log), and the remaining columns contain different sample points. The trouble is that the sample points are logged at different times, even on an hourly basis, so every column has at least a couple of NaNs. If I plot using the first code it works nicely, but I want gaps where there is no logger data for a day or so, and I do not want those points to be joined. If I use the second code I can see the gaps, but because of the NaN points the data points are not joined. In the example below, I'm just plotting the first three columns.

When there is a big gap, as with the blue points (01/06-01/07/2015), I want to leave a gap and then have the points joined again afterwards. The second example does not join the points at all. I like the first chart, but I want to create gaps like the second method when there are no sample data points for, say, a 24-hour range, leaving longer stretches of missing data as a gap.

Is there any workaround? Thanks

Method-1:

Log_1a_mask = np.isfinite(Log_1a) # Log_1a is column 2 data points
Log_1b_mask = np.isfinite(Log_1b) # Log_1b is column 3 data points

plt.plot_date(date_log[Log_1a_mask], Log_1a[Log_1a_mask], linestyle='-', marker='', color='r')
plt.plot_date(date_log[Log_1b_mask], Log_1b[Log_1b_mask], linestyle='-', marker='', color='b')
plt.show()

Method-2:

plt.plot_date(date_log, Log_1a, '-r*', markersize=2, markeredgewidth=0, color='r') # Log_1a contains raw data with NaN
plt.plot_date(date_log, Log_1b, '-r*', markersize=2, markeredgewidth=0, color='r') # Log_1b contains raw data with NaN
plt.show()

Method-1 output:

Method-2 output:

Answer 1

Answered by Joe Kington

If I'm understanding you correctly, you have a dataset with lots of small gaps (single NaNs) that you want filled and larger gaps that you don't.

Using `pandas` to "forward-fill" gaps

One option is to use pandas `fillna` with a limited number of fill values.

As a quick example of how this works:

In [1]: import pandas as pd; import numpy as np

In [2]: x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

In [3]: x.fillna(method='ffill', limit=1)
Out[3]:
0     1
1     1
2     2
3     2
4   NaN
5     3
6     3
7   NaN
8   NaN
9     4
dtype: float64

In [4]: x.fillna(method='ffill', limit=2)
Out[4]:
0     1
1     1
2     2
3     2
4     2
5     3
6     3
7     3
8   NaN
9     4
dtype: float64
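
Recent pandas releases deprecate the `method=` argument of `fillna` (as far as I know, since version 2.1) in favour of the dedicated `ffill` method. A minimal sketch of the same limited forward fill, assuming a pandas version where `Series.ffill(limit=...)` is available:

```python
import numpy as np
import pandas as pd

x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

# Fill at most `limit` consecutive NaNs with the last valid value;
# anything beyond that in a longer run stays NaN.
filled_1 = x.ffill(limit=1)
filled_2 = x.ffill(limit=2)

print(filled_1.tolist())
print(filled_2.tolist())
```

The results should match the `fillna(method='ffill', limit=...)` sessions shown above.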

As an example of using this for something similar to your case:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Use pandas with a limited forward fill
# You may want to adjust the `limit` here. This will fill 2 nan gaps.
filled = pd.Series(x).fillna(limit=2, method='ffill')

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()

Using `numpy` to interpolate gaps

Alternatively, we can do this using only numpy. It's possible (and more efficient) to do a "forward fill" identical to the pandas method above, but I'll show another method to give you more options than just repeating values.

Instead of repeating the last value through the "gap", we can perform linear interpolation of the values in the gap. This is less efficient computationally (and I'm going to make it even less efficient by interpolating everywhere), but for most datasets you won't notice a major difference.

As an example, let's define an `interpolate_gaps` function:

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        filled[invalid] = np.nan

    return filled

Note that we'll get interpolated values, unlike the previous `pandas` version:

In [11]: values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]

In [12]: interpolate_gaps(values, limit=1)
Out[12]:
array([ 1.        ,  1.5       ,  2.        ,         nan,  2.66666667,
        3.        ,         nan,         nan,  3.75      ,  4.        ])
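
In the output above, the last `limit` points of each longer gap are still filled (index 4 and index 8). If a gap longer than the limit should instead be left completely empty, one possible variant is sketched below. The name `interpolate_short_gaps` and its `max_gap` parameter are hypothetical, not part of numpy or of the function above, and the sketch assumes the input contains at least one finite value.

```python
import numpy as np

def interpolate_short_gaps(values, max_gap):
    """Linearly interpolate NaN runs of length <= max_gap; leave longer runs as NaN."""
    values = np.asarray(values, dtype=float)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    # Interpolate everywhere first (np.interp holds the edge values at the ends)
    filled = np.interp(i, i[valid], values[valid])

    # Locate each run of consecutive invalid points as [start, end)
    padded = np.concatenate(([False], ~valid, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)

    # Blank out every run that is longer than the allowed gap
    for start, end in zip(starts, ends):
        if end - start > max_gap:
            filled[start:end] = np.nan
    return filled

values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
short_1 = interpolate_short_gaps(values, max_gap=1)
short_2 = interpolate_short_gaps(values, max_gap=2)
```

With `max_gap=1` only the single NaN at index 1 is filled; with `max_gap=2` the two-point gap is filled as well, while the three-point gap stays empty.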

In the plotting example, if we replace the line:

filled = pd.Series(x).fillna(limit=2, method='ffill')

With:

filled = interpolate_gaps(x, limit=2)

We'll get a visually identical plot:

As a complete, stand-alone example:

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        filled[invalid] = np.nan

    return filled

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Interpolate small gaps using numpy
filled = interpolate_gaps(x, limit=2)

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()
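
The examples above measure gaps in number of samples. The question describes gaps in terms of elapsed time (about 24 hours), so another option might be to drop the NaNs and then re-insert a single NaN wherever two consecutive samples are further apart than some threshold. The sketch below assumes the data is a pandas Series with a unique `DatetimeIndex`; the helper name `break_at_time_gaps` and its `max_gap` parameter are hypothetical.

```python
import numpy as np
import pandas as pd

def break_at_time_gaps(series, max_gap="1D"):
    """Drop NaNs, then insert one NaN inside every time gap longer than max_gap."""
    max_gap = pd.Timedelta(max_gap)
    s = series.dropna()
    # Time elapsed since the previous valid sample (NaT for the first one)
    deltas = s.index.to_series().diff()
    after_gap = s.index[(deltas > max_gap).to_numpy()]
    # A timestamp that is guaranteed to fall inside each long gap
    break_times = after_gap - max_gap / 2
    # Reindexing on the union adds the new timestamps with NaN values
    return s.reindex(s.index.union(break_times))

index = pd.to_datetime([
    "2015-01-01 00:00", "2015-01-01 01:00",
    "2015-01-01 02:00", "2015-01-05 00:00",
])
log = pd.Series([1.0, np.nan, 2.0, 3.0], index=index)
result = break_at_time_gaps(log, max_gap="1D")
print(result)
```

Because matplotlib does not draw a line through NaN, plotting `result.index` against `result.values` should join the closely spaced points and leave a break across the multi-day gap.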

Note: I originally completely mis-read the question. See version history for my original answer.

Answer 2

Answered by Lenar Hoyt

I simply use this loop, which replaces each NaN with the previous value:

import math
for i in range(1,len(data)):
  if math.isnan(data[i]):
    data[i] = data[i-1]
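
The loop above fills every gap regardless of its length and modifies `data` in place. A possible variant that returns a new list and stops filling after a given number of consecutive NaNs is sketched here; the function name `forward_fill` and the `limit` parameter are my own additions, not part of the original answer.

```python
import math

def forward_fill(data, limit=None):
    """Return a copy of data with NaNs replaced by the previous value.

    At most `limit` consecutive NaNs are filled (all of them if limit is None).
    """
    out = list(data)
    run = 0  # number of consecutive NaNs seen so far
    for i in range(1, len(out)):
        if math.isnan(out[i]):
            run += 1
            if limit is None or run <= limit:
                out[i] = out[i - 1]
        else:
            run = 0
    return out

nan = float("nan")
data = [1, nan, 2, nan, nan, 3, nan, nan, nan, 4]
filled_all = forward_fill(data)
filled_one = forward_fill(data, limit=1)
```

As in the original loop, the first element is never inspected, so a leading NaN is left as it is.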
