Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/13603181/

Plot pandas dataframe containing NaNs

Tags: pandas, ipython, data-analysis

Asked by ajt

I have GPS data of ice speed from three different GPS receivers. The data are in a pandas dataframe with an index of julian day (incremental from the start of 2009).

This is a subset of the data (the main dataset is 3487235 rows...):

                    R2          R7         R8
1235.000000 116.321959  100.805197  96.519977
1235.000116 NaN         100.771133  96.234957
1235.000231 NaN         100.584559  97.249262
1235.000347 118.823610  100.169055  96.777833
1235.000463 NaN         99.753551   96.598350
1235.000579 NaN         99.338048   95.283989
1235.000694 113.995003  98.922544   95.154067

The dataframe has form:

Index: 6071320 entries, 127.67291667 to 1338.51805556
Data columns:
R2    3487235  non-null values
R7    3875864  non-null values
R8    1092430  non-null values
dtypes: float64(3)

R2 is sampled at a different rate from R7 and R8, hence the NaNs, which appear systematically at that spacing.

Trying df.plot() to plot the whole dataframe (or indexed row locations thereof) works fine for R7 and R8, but doesn't plot R2. Similarly, just doing df.R2.plot() doesn't work either. The only way to plot R2 is to do df.R2.dropna().plot(), but this also removes the NaNs which signify periods of no data (rather than just a coarser sampling frequency than the other receivers).

Has anyone else come across this? Any ideas on the problem would be gratefully received :)

Accepted answer by Rutger Kassies

The reason you're not seeing anything is that the default plot style is a line only. The line is interrupted at NaNs, so only runs of two or more consecutive values get plotted, and that never happens in your case. You need to change the plotting style, which depends on what you want to see.

For starters, try adding:

.plot(marker='o')

That should make all data points appear as circles. It easily gets cluttered, so adjusting markersize, edgecolor etc. might be useful. I'm not fully used to how Pandas uses matplotlib, so I often switch to matplotlib myself if plots get more complicated, e.g.:

# to_pydatetime() assumes a DatetimeIndex; with a numeric index, pass df.R2.index directly
plt.plot(df.R2.index.to_pydatetime(), df.R2, 'o-')
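
To illustrate the point about interrupted lines, here is a minimal sketch (not part of the original answer) that rebuilds the R2 column from the question's subset, counts how many line segments a line-only style could draw, and then plots with markers only. The `Agg` backend and the `markersize` value are my own assumptions for a headless script; drop the backend line in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook

import numpy as np
import pandas as pd

# R2 from the question's subset: values only at every third row.
idx = [1235.000000, 1235.000116, 1235.000231, 1235.000347,
       1235.000463, 1235.000579, 1235.000694]
r2 = pd.Series([116.321959, np.nan, np.nan, 118.823610,
                np.nan, np.nan, 113.995003], index=idx, name="R2")

# A line segment needs two adjacent non-NaN samples.
valid = r2.notna()
segments = int((valid & valid.shift(fill_value=False)).sum())
print("drawable line segments:", segments)

# Markers do not depend on neighbouring samples, so every valid point shows up.
ax = r2.plot(marker="o", markersize=3, linestyle="none")
```

With no two valid samples next to each other, a line-only plot has nothing to draw, which matches the empty R2 plot described in the question.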

Answer by Ed Rushton

Given that you want to draw a straight line between the points where you do have data, you can get Pandas to fill in the gaps via interpolation, and then plot:

.interpolate(method='linear').plot()
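
Note that plain interpolation also bridges the long no-data periods the question wants to keep visible. One possible workaround, sketched below under my own assumptions, is to interpolate everything and then restore NaN wherever the original run of NaNs was longer than some threshold. The helper name `interpolate_short_gaps` and the `max_gap` parameter are hypothetical, not pandas API. Also keep in mind that `method='linear'` treats samples as equally spaced and ignores the index values.

```python
import numpy as np
import pandas as pd

def interpolate_short_gaps(s: pd.Series, max_gap: int) -> pd.Series:
    """Linearly fill NaN runs of at most max_gap samples; leave longer runs as NaN.

    Hypothetical helper for illustration only.
    """
    is_nan = s.isna()
    # Give each run of consecutive NaN / non-NaN samples its own id, then measure it.
    run_id = (is_nan != is_nan.shift()).cumsum()
    run_len = is_nan.groupby(run_id).transform("size")
    # Fill every interior gap first; leading and trailing NaNs stay untouched.
    filled = s.interpolate(method="linear", limit_area="inside")
    # Put NaN back wherever the original gap was longer than max_gap.
    return filled.where(~(is_nan & (run_len > max_gap)))

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, np.nan, np.nan, np.nan, 9.0])
out = interpolate_short_gaps(s, max_gap=2)
```

Calling `out.plot()` on the result would then draw a continuous line across the short sampling gaps and still break at the long one.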

Answer by kowpow

I found that the same issue occurred even when the df was indexed by DateTime. One solution that ensures all data points are respected, with no gaps in the lines, is to plot each df column separately after dropping its NaNs.

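    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()  # create the axes used below
    for col in df.columns:
        plot_data = df[col].dropna()  # drop NaNs so the line joins every remaining point
        ax.plot(plot_data.index.values, plot_data.values, label=col)
    ax.legend()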
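
Dropping every NaN also joins the line across genuine no-data periods, which the question wanted to avoid. A rough sketch of one way around that, assuming a numeric index such as the julian day used in the question: drop the NaNs, then put a single NaN back wherever two consecutive remaining samples are further apart than a chosen step. The name `dropna_keep_gaps` and the `max_step` threshold are hypothetical and would need tuning to the receiver's sampling interval.

```python
import numpy as np
import pandas as pd

def dropna_keep_gaps(s: pd.Series, max_step: float) -> pd.Series:
    """Drop NaNs, then re-insert one NaN inside every index gap wider than max_step.

    Hypothetical helper; assumes a sorted numeric index.
    """
    kept = s.dropna()
    positions = kept.index.to_numpy(dtype=float)
    # Index distance between each pair of consecutive remaining samples.
    steps = np.diff(positions)
    gap_after = np.flatnonzero(steps > max_step)
    if gap_after.size == 0:
        return kept
    # Place one NaN at the midpoint of each wide gap so the plotted line breaks there.
    midpoints = (positions[gap_after] + positions[gap_after + 1]) / 2
    breaks = pd.Series(np.nan, index=midpoints, name=s.name)
    return pd.concat([kept, breaks]).sort_index()

s = pd.Series([1.0, np.nan, 2.0, np.nan, 3.0, 4.0], index=[0, 1, 2, 3, 10, 11])
out = dropna_keep_gaps(s, max_step=3)
```

In the loop above, `plot_data = dropna_keep_gaps(df[col], max_step)` could then stand in for the plain `dropna()` call.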

Answer by Vlad Lee

Here is another way, which plots the number of NaNs in each column as a bar chart:

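import matplotlib.pyplot as plt

# Count the NaNs in each column of the dataframe (called `dataset` here)
nan_columns = []
nan_values = []

for column in dataset.columns:
    nan_columns.append(column)
    nan_values.append(dataset[column].isnull().sum())

# Bar chart of the NaN count per column
fig, ax = plt.subplots(figsize=(30, 10))
ax.bar(nan_columns, nan_values)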
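
The same count can probably be written more compactly with pandas alone. This is a small sketch on made-up toy values, not the question's data; the `Agg` backend line is an assumption for running headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook

import numpy as np
import pandas as pd

# Toy frame with made-up values, standing in for `dataset`.
dataset = pd.DataFrame({
    "R2": [116.3, np.nan, np.nan, 118.8],
    "R7": [100.8, 100.7, 100.5, 100.1],
    "R8": [96.5, np.nan, 97.2, 96.7],
})

# isna() marks missing cells; sum() then counts them per column.
nan_counts = dataset.isna().sum()

# One bar per column, labelled with the column name.
ax = nan_counts.plot.bar()
```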