Python - What is the process to create pdf reports with charts from a DB?

Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/51864730/

Date: 2020-08-19 19:57:27  Source: igfitidea


Tags: python, pandas, matplotlib, jinja2, plotly

Question by ozo

I have a database generated by a survey that evaluates university professors. What I want is a Python script that takes the information from that database, generates a table of the data to be graphed for each user, creates charts for each user, and then renders them in a template so the result can be exported to a PDF.


What does the database look like?


User    Professor_evaluated  Category       Question    Answer
_________________________________________________________________
Mike    Professor Criss       respect           1         3
Mike    Professor Criss       respect           2         4
Mike    Professor Criss       wisdom            3         5
Mike    Professor Criss       wisdom            4         3
Charles Professor Criss       respect           1         3
Charles Professor Criss       respect           2         4
Charles Professor Criss       wisdom            3         5
Charles Professor Criss       wisdom            4         3

Each teacher has several categories to be evaluated on (respect, wisdom, etc.), and each category has its own associated questions; in other words, a category contains several questions. Each row of the DB is one student's answer to one question about a teacher.
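The table above is in long format: one row per answer. As a minimal sketch, here are the same sample rows as a pandas DataFrame (typed in by hand; in practice they would be read from the database), with a quick check that each category groups several questions:

```python
import pandas as pd

# The sample rows from the question, one row per answer (long format).
rows = [
    ("Mike", "Professor Criss", "respect", 1, 3),
    ("Mike", "Professor Criss", "respect", 2, 4),
    ("Mike", "Professor Criss", "wisdom", 3, 5),
    ("Mike", "Professor Criss", "wisdom", 4, 3),
    ("Charles", "Professor Criss", "respect", 1, 3),
    ("Charles", "Professor Criss", "respect", 2, 4),
    ("Charles", "Professor Criss", "wisdom", 3, 5),
    ("Charles", "Professor Criss", "wisdom", 4, 3),
]
answers = pd.DataFrame(
    rows, columns=["User", "Professor_evaluated", "Category", "Question", "Answer"]
)

# Each category owns several questions: list the distinct questions per category.
questions_per_category = answers.groupby("Category")["Question"].unique()
print(questions_per_category)
```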


What do I need?


I need to create a script that automatically generates PDF reports summarizing this information through charts: for example, a chart with the overall score of each teacher, another chart with the score of each teacher by category, another chart with the average of each student, etc. In the end, every teacher would get their own report. I want a report like this example.
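Each of the example charts corresponds to a pandas group-by aggregation. A sketch, assuming the long-format columns shown above and assuming a "score" is simply the mean of the Answer values (the question does not say how scores are combined):

```python
import pandas as pd

answers = pd.DataFrame({
    "User": ["Mike"] * 4 + ["Charles"] * 4,
    "Professor_evaluated": ["Professor Criss"] * 8,
    "Category": ["respect", "respect", "wisdom", "wisdom"] * 2,
    "Question": [1, 2, 3, 4] * 2,
    "Answer": [3, 4, 5, 3, 3, 4, 5, 3],
})

# Overall score of each professor: mean of every answer about them.
overall = answers.groupby("Professor_evaluated")["Answer"].mean()

# Score of each professor by category (one row per professor, one column per category).
by_category = (
    answers.groupby(["Professor_evaluated", "Category"])["Answer"].mean().unstack()
)

# Average answer given by each student, per professor (one column per student).
by_student = (
    answers.groupby(["Professor_evaluated", "User"])["Answer"].mean().unstack()
)

print(overall, by_category, by_student, sep="\n\n")
```

Each result is a small table that can be passed straight to a plotting call, one chart per table.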


What is my question?


My question is about which Python packages and modules I would need for this task, and what the general process would be. I don't need code, since I know the answer is very general; what I want is to understand how I could do it.


For example: you would first need to process the information with pandas, to create a table that summarizes the information you want to graph, then plot it, then create a template of your report with XYZ module and then export it to pdf with XYZ module.
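The "summarize with pandas, then fill a report template" part of this process can be sketched with Jinja2, one of the question's tags. Everything here is an illustrative assumption: the template text, the hypothetical `render_report` helper, and HTML as the intermediate format. Turning that HTML into a PDF would be a separate step with an HTML-to-PDF tool (WeasyPrint is one option), not shown here.

```python
import pandas as pd
from jinja2 import Template

# Hypothetical report template. Template() does not autoescape by default,
# so the HTML table produced by pandas is inserted as-is.
REPORT_TEMPLATE = Template(
    "<h1>Evaluation report: {{ professor }}</h1>\n"
    "<p>Overall score: {{ '%.2f' | format(overall) }}</p>\n"
    "{{ table_html }}\n"
)


def render_report(answers: pd.DataFrame, professor: str) -> str:
    """Summarize one professor's answers and render them into the template."""
    rows = answers[answers["Professor_evaluated"] == professor]
    by_category = rows.groupby("Category")["Answer"].mean().to_frame("Score")
    return REPORT_TEMPLATE.render(
        professor=professor,
        overall=rows["Answer"].mean(),
        table_html=by_category.to_html(),
    )


answers = pd.DataFrame({
    "User": ["Mike"] * 4 + ["Charles"] * 4,
    "Professor_evaluated": ["Professor Criss"] * 8,
    "Category": ["respect", "respect", "wisdom", "wisdom"] * 2,
    "Question": [1, 2, 3, 4] * 2,
    "Answer": [3, 4, 5, 3, 3, 4, 5, 3],
})

# One rendered report per professor.
reports = {
    professor: render_report(answers, professor)
    for professor in answers["Professor_evaluated"].unique()
}
print(reports["Professor Criss"])
```

Keeping the layout in a template and the numbers in pandas means the per-professor loop stays the same when the report design changes.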


Answer by patrickjlong1

There are a lot of options for creating a pdf in python. Some of these options are ReportLab, pydf2, pdfdocument and FPDF.


The FPDF library is fairly straightforward to use, and it is what I've used in this example; see the FPDF documentation for details.


It's also worth thinking about which Python modules you want to use to create graphs and tables. In my example, I use matplotlib for the chart, and pandas to build a DataFrame with pandas.DataFrame().
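As a sketch of the charting step, pandas can draw a grouped bar chart through matplotlib directly from a wide table. Assumptions: the non-interactive Agg backend (so it runs on a server without a display) and saving to an in-memory PNG buffer rather than a file; the scores are the same subset used in the example code.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Scores per question for each student (wide format: one column per student).
scores = pd.DataFrame(
    {"Charles": [3, 4, 5, 3], "Mike": [3, 3, 4, 4]},
    index=["Q1", "Q2", "Q3", "Q4"],
)

# DataFrame.plot.bar draws one group of bars per row and one bar per column.
ax = scores.plot.bar(rot=0, ylim=(0, 6))
ax.set_title("Professor Criss's Ratings by Users")
ax.set_xlabel("Question Number")
ax.set_ylabel("Score")

# Save to an in-memory PNG; passing a filename such as "barchart.png" writes a file instead.
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)
png_bytes = buffer.getvalue()
```

Letting pandas position the grouped bars avoids computing bar offsets by hand.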


I've posted a rather lengthy but fully reproducible example below, using pandas, matplotlib and fpdf. The data are a subset of what the OP provided in the question. I loop through the dataframe in my example to create the table, but there are alternative and perhaps more efficient ways to do this.


import pandas as pd
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render the chart without a display
import matplotlib.pyplot as plt
from fpdf import FPDF

# Sample data: one column of scores per user for Professor Criss
df = pd.DataFrame()
df['Question'] = ["Q1", "Q2", "Q3", "Q4"]
df['Charles'] = [3, 4, 5, 3]
df['Mike'] = [3, 3, 4, 4]

# --- Bar chart ---
plt.title("Professor Criss's Ratings by Users")
plt.xlabel('Question Number')
plt.ylabel('Score')

c = [2.0, 4.0, 6.0, 8.0]     # x positions of Charles's bars (also used for the tick labels)
m = [x - 0.5 for x in c]     # Mike's bars sit half a unit to the left

plt.xticks(c, df['Question'])

plt.bar(m, df['Mike'], width=0.5, color="#91eb87", label="Mike")
plt.bar(c, df['Charles'], width=0.5, color="#eb879c", label="Charles")

plt.legend()
plt.axis([0, 10, 0, 8])
plt.savefig('barchart.png')

# --- PDF report ---
pdf = FPDF()
pdf.add_page()
pdf.set_xy(0, 0)
pdf.set_font('arial', 'B', 12)
pdf.cell(60)  # empty cell used to move the cursor right
pdf.cell(75, 10, "A Tabular and Graphical Report of Professor Criss's Ratings by Users Charles and Mike", 0, 2, 'C')
pdf.cell(90, 10, " ", 0, 2, 'C')
pdf.cell(-40)  # a negative width moves the cursor back to the left
# Table header
pdf.cell(50, 10, 'Question', 1, 0, 'C')
pdf.cell(40, 10, 'Charles', 1, 0, 'C')
pdf.cell(40, 10, 'Mike', 1, 2, 'C')
pdf.cell(-90)
pdf.set_font('arial', '', 12)
# Table body: value columns must follow the header order (Charles, then Mike)
for i in range(0, len(df)):
    pdf.cell(50, 10, str(df['Question'].iloc[i]), 1, 0, 'C')
    pdf.cell(40, 10, str(df['Charles'].iloc[i]), 1, 0, 'C')
    pdf.cell(40, 10, str(df['Mike'].iloc[i]), 1, 2, 'C')
    pdf.cell(-90)
pdf.cell(90, 10, " ", 0, 2, 'C')
pdf.cell(-30)
pdf.image('barchart.png', x=None, y=None, w=0, h=0)  # place the chart below the table
pdf.output('test.pdf')  # write the PDF to a file

Expected test.pdf:

[Image: screenshot of the expected test.pdf]

Update (April 2020): I edited the original answer in April 2020 to replace pandas.DataFrame.ix, which is deprecated (and was removed in pandas 1.0), with pandas.DataFrame.iloc. The output is the same as before.
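The example above types the per-question table in by hand. Starting from the long-format rows shown in the question, the same Question × User layout could instead be derived with pandas `pivot_table`; a sketch, where `question_table` is a hypothetical helper and "mean" is only a formality because each student answers each question once. Note that the question's rows give Mike different answers from the hand-typed subset in the example.

```python
import pandas as pd

answers = pd.DataFrame({
    "User": ["Mike"] * 4 + ["Charles"] * 4,
    "Professor_evaluated": ["Professor Criss"] * 8,
    "Category": ["respect", "respect", "wisdom", "wisdom"] * 2,
    "Question": [1, 2, 3, 4] * 2,
    "Answer": [3, 4, 5, 3, 3, 4, 5, 3],
})


def question_table(answers: pd.DataFrame, professor: str) -> pd.DataFrame:
    """One row per question, one column per student, for a single professor."""
    rows = answers[answers["Professor_evaluated"] == professor]
    table = rows.pivot_table(
        index="Question", columns="User", values="Answer", aggfunc="mean"
    )
    table.index = [f"Q{q}" for q in table.index]  # label rows Q1, Q2, ...
    return table


table = question_table(answers, "Professor Criss")
print(table)
```

Calling the helper once per professor yields one table per report without typing any values by hand.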


Answer by Fernando Garcia

In my case:


  • Connect to the Oracle database and extract data using the cx_Oracle library
  • Use pandas DataFrames for data manipulation
  • Use Matplotlib to generate graphs
  • Use ExcelWriter and ReportLab for output in Excel or PDF format
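A sketch of the first two steps, using Python's built-in sqlite3 module as a stand-in for cx_Oracle (both follow the DB-API, and both accept named `:name` placeholders). The table name `answers` and the in-memory database are assumptions for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the real survey database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (User TEXT, Professor_evaluated TEXT, "
    "Category TEXT, Question INTEGER, Answer INTEGER)"
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?, ?, ?)",
    [
        ("Mike", "Professor Criss", "respect", 1, 3),
        ("Mike", "Professor Criss", "respect", 2, 4),
        ("Mike", "Professor Criss", "wisdom", 3, 5),
        ("Mike", "Professor Criss", "wisdom", 4, 3),
        ("Charles", "Professor Criss", "respect", 1, 3),
        ("Charles", "Professor Criss", "respect", 2, 4),
        ("Charles", "Professor Criss", "wisdom", 3, 5),
        ("Charles", "Professor Criss", "wisdom", 4, 3),
    ],
)

# Extract one professor's rows with a named bind parameter (:prof),
# the same placeholder style cx_Oracle uses.
query = "SELECT * FROM answers WHERE Professor_evaluated = :prof"
df = pd.read_sql_query(query, conn, params={"prof": "Professor Criss"})

# Manipulate with pandas: average answer per category.
by_category = df.groupby("Category")["Answer"].mean()
print(by_category)
```

For the Excel output, the ExcelWriter in the list is presumably pandas.ExcelWriter, which (with an engine such as openpyxl installed) lets several summaries be written to one workbook via DataFrame.to_excel.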

Hope this helps.
