Python: How to run Flask with Gunicorn in multithreaded mode

Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me) and link the original: http://stackoverflow.com/questions/35837786/

Date: 2020-08-19 17:02:56  Source: igfitidea

How to run Flask with Gunicorn in multithreaded mode

python flask machine-learning gunicorn

Asked by neel

I have a web application written in Flask. As everyone suggests, I can't use Flask's built-in development server in production, so I am considering running Flask under Gunicorn.


In the Flask application I load some machine learning models, about 8 GB in total. Concurrency on my web application can go up to 1000 requests, and the machine has 15 GB of RAM.
So what is the best way to run this application?


Answered by molivier

With Gunicorn you can start your app with multiple worker processes, multiple threads per worker, or async workers.


Flask server.py


from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()

Gunicorn with gevent async worker


gunicorn server:app -k gevent --worker-connections 1000

Gunicorn with 1 worker and 12 threads:


gunicorn server:app -w 1 --threads 12

Gunicorn with 4 workers (multiprocessing):


gunicorn server:app -w 4
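The same options can also live in a Gunicorn config file instead of the command line. A sketch (the counts are just the example values from the commands above; recent Gunicorn versions pick up `gunicorn.conf.py` from the working directory automatically, otherwise pass it with `-c`):

```python
# gunicorn.conf.py -- Gunicorn configuration file
workers = 4    # number of worker processes (same as the -w flag)
threads = 12   # threads per worker (same as the --threads flag)
```

Then `gunicorn server:app` starts with these settings applied.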

More information on Flask concurrency is in this post: How many concurrent requests does a single Flask process receive?


Answered by slushi

The best thing to do is to use pre-fork mode (the `preload_app=True` setting, or the `--preload` flag). This initializes your code in a "master" process and then simply forks off worker processes to handle requests. If you are running on Linux, and assuming your model is read-only, the OS is smart enough to share the physical memory amongst all the processes via copy-on-write.

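For preloading to help, the expensive load has to happen at module import time, not inside a request handler or the `if __name__ == "__main__":` block, so that it runs once in the master before the fork. A minimal sketch (`load_models` and its contents are placeholders standing in for the real 8 GB model load):

```python
from flask import Flask

def load_models():
    # placeholder for the real 8 GB machine learning model load
    return {"classifier": "weights"}

# Runs once at import time; with --preload that means once in the
# master process, before the workers are forked.
MODELS = load_models()

app = Flask(__name__)

@app.route("/")
def predict():
    # Workers only read MODELS; read-only pages stay shared after fork.
    return str(MODELS["classifier"])
```

Run it with e.g. `gunicorn server:app -w 4 --preload`. With the numbers from the question, this is what makes multiple workers feasible at all: four workers each loading their own 8 GB copy would need roughly 32 GB, while one preloaded copy shared copy-on-write fits within 15 GB.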