How to plot and work with NaN values in matplotlib
Source: http://stackoverflow.com/questions/36455083/
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverFlow
Asked by Kudrat
I have hourly data consisting of a number of columns. The first column is a date (date_log), and the rest of the columns contain different sample points. The trouble is that the sample points are logged at different times, even on an hourly basis, so every column has at least a couple of NaN values. If I plot using the first code, it works nicely, but I want gaps where there is no logger data for a day or so, and I do not want those points to be joined. If I use the second code, I can see the gaps, but because of the NaN points the data points are not joined. In the example below, I'm just plotting the first three columns.
When there is a big gap, like the one in the blue points (01/06-01/07/2015), I want a gap to be shown rather than the points being joined. The second example does not join the points at all. I like the first chart, but I want it to show gaps like the second method when there are no sample data points for, say, a 24-hour range, leaving longer stretches of missing data as gaps.
Is there any work around? Thanks
Method-1:
Log_1a_mask = np.isfinite(Log_1a) # Log_1a is column 2 data points
Log_1b_mask = np.isfinite(Log_1b) # Log_1b is column 3 data points
plt.plot_date(date_log[Log_1a_mask], Log_1a[Log_1a_mask], linestyle='-', marker='',color='r',)
plt.plot_date(date_log[Log_1b_mask], Log_1b[Log_1b_mask], linestyle='-', marker='', color='b')
plt.show()
Method-2:
plt.plot_date(date_log, Log_1a, '-r*', markersize=2, markeredgewidth=0, color='r') # Log_1a contains raw data with NaN
plt.plot_date(date_log, Log_1b, '-r*', markersize=2, markeredgewidth=0, color='r') # Log_1b contains raw data with NaN
plt.show()
Answered by Joe Kington
If I'm understanding you correctly, you have a dataset with lots of small gaps (single NaNs) that you want filled and larger gaps that you don't.
Using pandas to "forward-fill" gaps
One option is to use pandas fillna with a limited number of fill values.
As a quick example of how this works:
In [1]: import pandas as pd; import numpy as np
In [2]: x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])
In [3]: x.fillna(method='ffill', limit=1)
Out[3]:
0 1
1 1
2 2
3 2
4 NaN
5 3
6 3
7 NaN
8 NaN
9 4
dtype: float64
In [4]: x.fillna(method='ffill', limit=2)
Out[4]:
0 1
1 1
2 2
3 2
4 2
5 3
6 3
7 3
8 NaN
9 4
dtype: float64
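Newer pandas releases deprecate the method= argument of fillna in favour of the dedicated ffill and bfill methods. As a minimal sketch, assuming a recent pandas version, the same limited forward fill can be written with Series.ffill (the names filled_1 and filled_2 are only illustrative):

```python
import numpy as np
import pandas as pd

x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

# Fill at most one consecutive NaN after each valid value
filled_1 = x.ffill(limit=1)

# Fill at most two consecutive NaNs after each valid value
filled_2 = x.ffill(limit=2)

print(filled_1.tolist())
print(filled_2.tolist())
```

As in the session above, a gap longer than limit is only partly filled: the first limit positions get the last valid value and the rest stay NaN.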
As an example of using this for something similar to your case:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)
x = np.random.normal(0, 1, 1000).cumsum()
# Set every third value to NaN
x[::3] = np.nan
# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan
# Use pandas with a limited forward fill
# You may want to adjust the `limit` here. This will fill 2 nan gaps.
filled = pd.Series(x).fillna(limit=2, method='ffill')
# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')
axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')
plt.show()
Using numpy to interpolate gaps
Alternatively, we can do this using only numpy. It's possible (and more efficient) to do a "forward fill" identical to the pandas method above, but I'll show another method to give you more options than just repeating values.
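For reference, the numpy-only forward fill mentioned above might look roughly like the sketch below. This is not code from the answer: the function name forward_fill and its limit parameter are assumptions chosen to mirror the pandas call, and the approach (tracking the index of the most recent valid sample with np.maximum.accumulate) is just one way to do it.

```python
import numpy as np

def forward_fill(values, limit=None):
    """
    Hypothetical helper: repeat the last valid value through each gap,
    filling at most `limit` consecutive NaNs (all of them if limit is None).
    """
    values = np.asarray(values, dtype=float)
    idx = np.arange(values.size)
    valid = np.isfinite(values)

    # Index of the most recent valid sample at or before each position
    # (-1 where no valid sample has been seen yet)
    last_valid = np.maximum.accumulate(np.where(valid, idx, -1))

    # Only fill invalid positions that have a valid value before them
    can_fill = ~valid & (last_valid >= 0)
    if limit is not None:
        # Distance to the last valid sample must not exceed the limit
        can_fill &= (idx - last_valid) <= limit

    filled = values.copy()
    filled[can_fill] = values[last_valid[can_fill]]
    return filled

values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
result = forward_fill(values, limit=1)
print(result)
```

With limit=1 this should reproduce the values from the pandas example with limit=1. Leading NaNs are left untouched because there is nothing to carry forward.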
Instead of repeating the last value through the "gap", we can perform linear interpolation of the values in the gap. This is less efficient computationally (and I'm going to make it even less efficient by interpolating everywhere), but for most datasets you won't notice a major difference.
As an example, let's define an interpolate_gaps function:
def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        filled[invalid] = np.nan

    return filled
Note that we'll get interpolated values, unlike the previous pandas version:
In [11]: values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
In [12]: interpolate_gaps(values, limit=1)
Out[12]:
array([ 1. , 1.5 , 2. , nan, 2.66666667,
3. , nan, nan, 3.75 , 4. ])
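As the output shows, a gap longer than limit is still partly filled: its last limit positions receive interpolated values. If you would rather leave such gaps completely empty (for example, so that a day-long outage shows as a clean break in the plot), one possible variant is sketched below. It is not part of the answer: the name interpolate_short_gaps is hypothetical, and the sketch assumes the input contains at least one finite value.

```python
import numpy as np

def interpolate_short_gaps(values, limit):
    """
    Hypothetical variant: linearly interpolate only the gaps whose length
    is at most `limit`; longer gaps are left entirely as NaN.
    """
    values = np.asarray(values, dtype=float)
    i = np.arange(values.size)
    valid = np.isfinite(values)

    # Interpolate everywhere first, then blank out the long gaps again
    filled = np.interp(i, i[valid], values[valid])

    # Locate runs of consecutive invalid samples: `starts` holds the first
    # index of each run, `stops` the index just past its end
    padded = np.concatenate(([False], ~valid, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)

    for start, stop in zip(starts, stops):
        if stop - start > limit:
            filled[start:stop] = np.nan

    return filled

values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
short_filled = interpolate_short_gaps(values, limit=2)
print(short_filled)
```

Here the one- and two-sample gaps are interpolated, while the three-sample gap stays NaN in full. Note that np.interp extends the first and last valid values outward, so a short gap at either end of the array is filled with a constant rather than extrapolated.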
In the plotting example, if we replace the line:
filled = pd.Series(x).fillna(limit=2, method='ffill')
With:
filled = interpolate_gaps(x, limit=2)
We'll get a visually identical plot:
As a complete, stand-alone example:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)
def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        filled[invalid] = np.nan

    return filled
x = np.random.normal(0, 1, 1000).cumsum()
# Set every third value to NaN
x[::3] = np.nan
# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan
# Interpolate small gaps using numpy
filled = interpolate_gaps(x, limit=2)
# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')
axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')
plt.show()
Note: I originally completely mis-read the question. See version history for my original answer.
Answered by Lenar Hoyt
I simply use this function:
import math
# Replace each NaN with the previous value (simple forward fill, in place)
for i in range(1,len(data)):
    if math.isnan(data[i]):
        data[i] = data[i-1]