Plotting data from multiple pandas data frames in one plot

Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow. Original: http://stackoverflow.com/questions/44729498/

Date: 2020-09-14 03:52:19  Source: igfitidea

Tags: python, pandas, plot

Question by K22

I am interested in plotting a time series with data from several different pandas DataFrames. I know how to plot data for a single time series and I know how to make subplots, but how can I plot data from several different DataFrames in a single plot? My code is below. Basically, I scan a folder of JSON files and parse each JSON file into a pandas DataFrame so that I can plot it. When I run this code, it only plots data from one of the DataFrames instead of all ten that are created. I know that 10 DataFrames are created because a print statement confirms they are all correct.

import sys, re
import numpy as np
import smtplib
import matplotlib.pyplot as plt
from random import randint
import csv
import pylab as pl
import math
import pandas as pd
from pandas.plotting import scatter_matrix  # scatter_matrix lives in pandas.plotting in current pandas
import argparse
import matplotlib.patches as mpatches
import os
import json



parser = argparse.ArgumentParser()
parser.add_argument('-file', '--f', help = 'folder where JSON files are stored')
if len(sys.argv) == 1:
    parser.print_help()
    sys.exit(1)
args = parser.parse_args()


dat = {}
i = 0

direc = args.f
directory = os.fsencode(direc)

fig1 = plt.figure()
ax1 = fig1.add_subplot(111)

for files in os.listdir(direc):
    filename = os.fsdecode(files)
    if filename.endswith(".json"):
        path = os.path.join(direc, filename)
        with open(path, 'r') as data_file:
            data = json.load(data_file)
            for r in data["commits"]:
                dat[i] = (r["author_name"], r["num_deletions"], r["num_insertions"], r["num_lines_changed"],
                          r["num_files_changed"], r["author_date"])
                name = "df" + str(i).zfill(2)
                i = i + 1
                name = pd.DataFrame.from_dict(dat, orient='index').reset_index()
                name.columns = ["index", "author_name", "num_deletions",
                                          "num_insertions", "num_lines_changed",
                                          "num_files_changed",  "author_date"]
                del name['index']
                name['author_date'] = name['author_date'].astype(int)
                name['author_date'] =  pd.to_datetime(name['author_date'], unit='s')
                ax1.plot(name['author_date'], name['num_lines_changed'], '*',c=np.random.rand(3,))
                print(name)
                continue

    else:
        continue
plt.xticks(rotation=35)  # rotate the date tick labels by 35 degrees
plt.title('Number of Lines Changed vs. Author Date')
plt.show()
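One likely reason the result looks like a single DataFrame: dat and i are created once, outside the file loop, and a new DataFrame is rebuilt from all of dat after every commit. Each frame therefore holds every commit read so far, and the last ax1.plot call redraws all the points in one color on top of the earlier ones. The following is a minimal sketch of a per-file alternative, not the asker's code. It assumes each JSON file has the "commits" list and field names used above, with author_date in Unix seconds. load_frames and COLUMNS are hypothetical names, and the frames are kept in a dict keyed by filename instead of dynamically named variables like df00.

```python
import json
import os

import pandas as pd

# Column order matches the tuple built in the question's loop.
COLUMNS = ["author_name", "num_deletions", "num_insertions",
           "num_lines_changed", "num_files_changed", "author_date"]


def load_frames(folder):
    """Map each .json filename in folder to its own DataFrame.

    Hypothetical helper: assumes every file has a "commits" list whose
    entries carry the keys in COLUMNS, with author_date as Unix seconds.
    """
    frames = {}
    for filename in sorted(os.listdir(folder)):
        if not filename.endswith(".json"):
            continue
        with open(os.path.join(folder, filename)) as fh:
            data = json.load(fh)
        # A fresh row list per file, so commits from earlier files never leak in.
        rows = [[commit[col] for col in COLUMNS] for commit in data["commits"]]
        df = pd.DataFrame(rows, columns=COLUMNS)
        df["author_date"] = pd.to_datetime(df["author_date"].astype(int), unit="s")
        frames[filename] = df
    return frames
```

Each frame can then be drawn exactly once on the shared ax1, for example: for name, df in load_frames(args.f).items(): ax1.plot(df['author_date'], df['num_lines_changed'], '*', label=name), followed by ax1.legend() before plt.show().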

Answer by omdv

Quite straightforward, actually. Don't let pandas confuse you: underneath, every column is just a NumPy array, so you can call plot on the same Axes once per DataFrame.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

fig1 = plt.figure()
ax1 = fig1.add_subplot(111)

ax1.plot(df1['A'])  # column A of the first DataFrame
ax1.plot(df2['B'])  # column B of the second DataFrame, on the same Axes
plt.show()
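For time series like the ones in the question, the frames usually cover different dates. Passing the date column explicitly as x lets matplotlib place every frame on one shared date axis instead of lining rows up by position. A minimal sketch with made-up sample data; the column names date and value are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Two hypothetical frames with different, partly overlapping date ranges.
df1 = pd.DataFrame({"date": pd.date_range("2017-01-01", periods=5, freq="D"),
                    "value": np.arange(5)})
df2 = pd.DataFrame({"date": pd.date_range("2017-01-03", periods=5, freq="D"),
                    "value": np.arange(5) * 2})

fig, ax = plt.subplots()
# Explicit x values: each frame is positioned by its own dates, not by row number.
ax.plot(df1["date"], df1["value"], label="df1")
ax.plot(df2["date"], df2["value"], label="df2")
ax.legend()
fig.autofmt_xdate()  # rotate date tick labels so they do not overlap
# Call plt.show() here when running this as a script.
```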

Answer by Sergey Sergienko

The pd.DataFrame.plot method has an ax argument for exactly this:

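import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.subplot(111)
df1['Col1'].plot(ax=ax)  # df1, df2: your existing DataFrames
df2['Col2'].plot(ax=ax)  # same ax, so both lines share one plot
plt.show()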
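With more than two frames, for example the ten from the question, the same ax= argument can be used in a loop. A sketch with hypothetical sample frames (the names frames, day and lines are made up for illustration): every call draws onto the shared Axes, and label= sets the legend entry for that frame.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical input: several frames with the same column layout,
# e.g. one per JSON file as in the question.
frames = {
    f"file{k}": pd.DataFrame({"day": np.arange(4), "lines": np.arange(4) * (k + 1)})
    for k in range(3)
}

fig, ax = plt.subplots()
for name, df in frames.items():
    # ax=ax keeps every call on the same Axes; label= names the legend entry.
    df.plot(x="day", y="lines", ax=ax, label=name)
# Call plt.show() here when running this as a script.
```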

Answer by Scott Boston

If you use pandas plotting, DataFrame.plot returns a matplotlib Axes object, so you can pass that Axes to the next DataFrame.plot call with ax=.

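import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'Frame 1': np.arange(5) * 2}, index=np.arange(5))
df2 = pd.DataFrame({'Frame 2': np.arange(5) * .5}, index=np.arange(5))

ax = df1.plot(label='df1')  # DataFrame.plot returns a matplotlib Axes
df2.plot(ax=ax)             # draw the second frame on that same Axes
plt.show()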

Or, if your DataFrames have the same index, you can combine them with pd.concat and plot the result in one call:

pd.concat([df1,df2],axis=1).plot()
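When the frames share a column name (as the frames built in the question would), concatenating them side by side produces duplicate column names. One way around this, sketched here with hypothetical sample data, is to pass a dict of Series to pd.concat: with axis=1 the dict keys become the column names, so the legend shows which frame each line came from. Rows are aligned on the union of the indexes, with NaN where a frame has no value.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frames that share a column name but only partly share an index.
df1 = pd.DataFrame({"value": [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({"value": [10, 20, 30]}, index=[1, 2, 3])

# Dict keys become column names; rows are aligned on the union of both indexes.
combined = pd.concat({"df1": df1["value"], "df2": df2["value"]}, axis=1)
ax = combined.plot()
# Call plt.show() here when running this as a script.
```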

Answer by idleCoder

Trust me, @omdv's answer is the only solution I have found so far. The pandas DataFrame plot function doesn't show any plot at all when you pass ax to it.

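import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# f_hd and pt_f_hd: CSV file paths defined elsewhere in the answerer's code
df_hdf = pd.read_csv(f_hd, header=None, names=['degree', 'rank', 'hits'],
            dtype={'degree': np.int32, 'rank': np.float32, 'hits': np.float32})
df_hdf_pt = pd.read_csv(pt_f_hd, header=None, names=['degree', 'rank', 'hits'],
            dtype={'degree': np.int32, 'rank': np.float32, 'hits': np.float32})
ax = plt.subplot()
ax.plot(df_hdf_pt['hits'])  # both 'hits' columns are drawn on the same Axes
ax.plot(df_hdf['hits'])
plt.show()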
