python - How can I profile a SQLAlchemy powered application?

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute the original authors (not me). Original question: http://stackoverflow.com/questions/1171166/

Tags: python, sqlalchemy, profiler

Asked by goxe

Does anyone have experience profiling a Python/SQLAlchemy app? What are the best ways to find bottlenecks and design flaws?

We have a Python application where the database layer is handled by SQLAlchemy. The application uses a batch design, so a lot of database requests are issued sequentially within a limited timespan. It currently takes a bit too long to run, so some optimization is needed. We don't use the ORM functionality, and the database is PostgreSQL.

Answered by zzzeek

Sometimes just plain SQL logging (enabled via Python's logging module or via the echo=True argument on create_engine()) can give you an idea how long things are taking. For example, if you log something right after a SQL operation, you'd see something like this in your log:

17:37:48,325 INFO  [sqlalchemy.engine.base.Engine.0x...048c] SELECT ...
17:37:48,326 INFO  [sqlalchemy.engine.base.Engine.0x...048c] {<params>}
17:37:48,660 DEBUG [myapp.somemessage] 

If you logged myapp.somemessage right after the operation, you'd know it took 334ms to complete the SQL part of things.

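For reference, here is a minimal sketch of wiring this up; the in-memory SQLite URL and the logger configuration are illustrative assumptions, not part of the original answer:

import logging

from sqlalchemy import create_engine, text

logging.basicConfig(format="%(asctime)s %(levelname)s [%(name)s] %(message)s")

# either of these enables SQL statement logging (doing both is redundant):
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
engine = create_engine("sqlite://", echo=True)

log = logging.getLogger("myapp.somemessage")
log.setLevel(logging.DEBUG)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
    # the gap between the engine's last log line and this one
    # approximates the time the SQL part of things took
    log.debug("operation complete")
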
Logging SQL will also illustrate whether dozens or hundreds of queries are being issued that could be better organized into far fewer queries via joins. When using the SQLAlchemy ORM, the "eager loading" feature is provided to partially (contains_eager()) or fully (eagerload(), eagerload_all()) automate this activity, but without the ORM it just means using joins, so that results across multiple tables can be loaded in one result set instead of the number of queries multiplying as more depth is added (i.e. r + r*r2 + r*r2*r3 ...).

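As a sketch of that difference (assuming SQLAlchemy 1.4+ Core, an open connection conn, and two hypothetical Table objects parent_table and child_table):

from sqlalchemy import select

# N+1 pattern: one query for the parents, then one more per parent row
parents = conn.execute(select(parent_table)).fetchall()
for parent in parents:
    children = conn.execute(
        select(child_table).where(child_table.c.parent_id == parent.id)
    ).fetchall()

# join pattern: both tables arrive in a single result set, one round trip
rows = conn.execute(
    select(parent_table, child_table).select_from(
        parent_table.join(child_table,
                          child_table.c.parent_id == parent_table.c.id)
    )
).fetchall()
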
If logging reveals that individual queries are taking too long, you'd need a breakdown of how much time was spent within the database processing the query, sending results over the network, being handled by the DBAPI, and finally being received by SQLAlchemy's result set and/or ORM layer. Each of these stages can present their own individual bottlenecks, depending on specifics.

For that you need to use a profiler, such as cProfile or hotshot. Here is a decorator I use:

import cProfile as profiler
import gc, pstats, time

def profile(fn):
    def wrapper(*args, **kw):
        # stats are written to, then re-read from, this file
        elapsed, stat_loader, result = _profile("foo.txt", fn, *args, **kw)
        stats = stat_loader()
        stats.sort_stats('cumulative')
        stats.print_stats()
        # uncomment this to see who's calling what
        # stats.print_callers()
        return result
    return wrapper

def _profile(filename, fn, *args, **kw):
    load_stats = lambda: pstats.Stats(filename)
    # collect garbage up front so stale objects don't skew the numbers
    gc.collect()

    began = time.time()
    # run fn under the profiler, dumping the stats to `filename`
    profiler.runctx('result = fn(*args, **kw)', globals(), locals(),
                    filename=filename)
    ended = time.time()

    return ended - began, load_stats, locals()['result']

To profile a section of code, place it in a function with the decorator:

@profile
def go():
    return Session.query(FooClass).filter(FooClass.somevalue==8).all()
myfoos = go()

The output of profiling can be used to give an idea of where time is being spent. If for example you see all the time being spent within cursor.execute(), that's the low-level DBAPI call to the database, and it means your query should be optimized, either by adding indexes or by restructuring the query and/or the underlying schema. For that task I would recommend using pgadmin along with its graphical EXPLAIN utility to see what kind of work the query is doing.

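For example, a quick way to zero in on the DBAPI call within the stats file written by the decorator above (the "execute" argument is simply a regex restriction on the report):

import pstats

stats = pstats.Stats("foo.txt")
stats.sort_stats("cumulative")
# restrict the report to functions whose name matches "execute";
# a large cumulative time here points at the query itself
stats.print_stats("execute")
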
If you see many thousands of calls related to fetching rows, it may mean your query is returning more rows than expected - a cartesian product as a result of an incomplete join can cause this issue. Yet another issue is time spent within type handling - a SQLAlchemy type such as Unicode will perform string encoding/decoding on bind parameters and result columns, which may not be needed in all cases.

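A trivial sanity check along those lines (query and parent_table are hypothetical, standing in for the suspect statement and its driving table):

from sqlalchemy import func, select

rows = conn.execute(query).fetchall()
expected = conn.execute(select(func.count()).select_from(parent_table)).scalar()
# a result set vastly larger than the driving table suggests a
# cartesian product from an incomplete join condition
if len(rows) > expected:
    print("suspicious: %d rows fetched vs %d in the driving table"
          % (len(rows), expected))
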
The output of a profile can be a little daunting, but after some practice it is very easy to read. There was once someone on the mailing list claiming slowness, and after having him post the profiling results, I was able to demonstrate that the speed problems were due to network latency - the time spent within cursor.execute() as well as all Python methods was very fast, whereas the majority of time was spent on socket.receive().

If you're feeling ambitious, there's also a more involved example of SQLAlchemy profiling within the SQLAlchemy unit tests, if you poke around http://www.sqlalchemy.org/trac/browser/sqlalchemy/trunk/test/aaa_profiling. There, we have tests using decorators that assert a maximum number of method calls for particular operations, so that if something inefficient gets checked in, the tests will reveal it (it is important to note that in Python, function calls have the highest overhead of any operation, and the call count is more often than not nearly proportional to time spent). Of note are the "zoomark" tests, which use a fancy "SQL capturing" scheme that cuts the overhead of the DBAPI out of the equation - although that technique isn't really necessary for garden-variety profiling.

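That call-counting idea can be approximated outside the SQLAlchemy test suite as well; here is a rough sketch (the budget of 5000 is an arbitrary illustration, not a number from those tests):

import cProfile
import pstats

def assert_max_calls(fn, budget):
    """Run fn under cProfile and fail if it exceeds the call budget."""
    pr = cProfile.Profile()
    result = pr.runcall(fn)
    total = pstats.Stats(pr).total_calls
    assert total <= budget, "made %d calls, budget was %d" % (total, budget)
    return result

# e.g., guard the query from the earlier example (without @profile applied)
myfoos = assert_max_calls(
    lambda: Session.query(FooClass).filter(FooClass.somevalue == 8).all(),
    5000,
)
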
Answered by dbr

There's an extremely useful profiling recipe on the SQLAlchemy wiki

With a couple of minor modifications,

from sqlalchemy import event
from sqlalchemy.engine import Engine
import time
import logging

logging.basicConfig()
logger = logging.getLogger("myapp.sqltime")
logger.setLevel(logging.DEBUG)

@event.listens_for(Engine, "before_cursor_execute")
def before_cursor_execute(conn, cursor, statement, 
                        parameters, context, executemany):
    context._query_start_time = time.time()
    logger.debug("Start Query:\n%s" % statement)
    # Modification for StackOverflow answer:
    # Show parameters, which might be too verbose, depending on usage..
    logger.debug("Parameters:\n%r" % (parameters,))


@event.listens_for(Engine, "after_cursor_execute")
def after_cursor_execute(conn, cursor, statement, 
                        parameters, context, executemany):
    total = time.time() - context._query_start_time
    logger.debug("Query Complete!")

    # Modification for StackOverflow: times in milliseconds
    logger.debug("Total Time: %.02fms" % (total*1000))

if __name__ == '__main__':
    from sqlalchemy import *

    engine = create_engine('sqlite://')

    m1 = MetaData(engine)
    t1 = Table("sometable", m1, 
            Column("id", Integer, primary_key=True),
            Column("data", String(255), nullable=False),
        )

    conn = engine.connect()
    m1.create_all(conn)

    conn.execute(
        t1.insert(), 
        [{"data": "entry %d" % x} for x in range(100000)]
    )

    conn.execute(
        t1.select().where(t1.c.data.between("entry 25", "entry 7800")).order_by(desc(t1.c.data))
    )

Output is something like:

DEBUG:myapp.sqltime:Start Query:
SELECT sometable.id, sometable.data 
FROM sometable 
WHERE sometable.data BETWEEN ? AND ? ORDER BY sometable.data DESC
DEBUG:myapp.sqltime:Parameters:
('entry 25', 'entry 7800')
DEBUG:myapp.sqltime:Query Complete!
DEBUG:myapp.sqltime:Total Time: 410.46ms

Then if you find an oddly slow query, you could take the query string, format in the parameters (this can be done with the % string-formatting operator, for psycopg2 at least), prefix it with "EXPLAIN ANALYZE", and shove the query plan output into http://explain.depesz.com/ (found via this good article on PostgreSQL performance).

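A sketch of that workflow with psycopg2 (the DSN, table, and values are hypothetical; cursor.mogrify() performs the parameter interpolation):

import psycopg2

conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()

query = "SELECT * FROM sometable WHERE data BETWEEN %s AND %s"
params = ("entry 25", "entry 7800")

# mogrify() returns the statement with the parameters bound, as bytes
sql = cur.mogrify(query, params).decode("utf-8")
cur.execute("EXPLAIN ANALYZE " + sql)
for (line,) in cur.fetchall():
    print(line)  # paste the collected plan into http://explain.depesz.com/
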
Answered by David Raznick

I have had some success using cProfile and looking at the results in RunSnakeRun. This at least told me which functions and calls were taking a long time and whether the database was the issue. The documentation is here. You need wxPython. The presentation on it is good to get you started.
It's as easy as

import cProfile

command = """foo.run()"""
cProfile.runctx(command, globals(), locals(), filename="output.profile")

Then

python runsnake.py output.profile

If you are looking to optimise your queries, you will need PostgreSQL profiling.

It is also worth turning on logging to record the queries, but there is no parser for this that I know of to extract the long-running queries (and it won't be useful for concurrent requests).

import logging

sqlhandler = logging.FileHandler("sql.log")
sqllogger = logging.getLogger('sqlalchemy.engine')
sqllogger.setLevel(logging.INFO)
sqllogger.addHandler(sqlhandler)

and making sure your create_engine() call has echo=True.

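For completeness, the engine side of that might look like this (the connection URL is hypothetical):

from sqlalchemy import create_engine

# echo=True raises the SQL logger to INFO, so the FileHandler
# configured above will receive every statement
engine = create_engine("postgresql://localhost/mydb", echo=True)
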
When I did this, it was actually my code that was the main issue, so the cProfile step helped.

Answered by Daniel Waltrip

I've just discovered the library sqltap (https://github.com/inconshreveable/sqltap). It produces nicely styled HTML pages that help with inspecting and profiling SQL queries that are generated by SQLAlchemy.

An example usage:

import sqltap

profiler = sqltap.start()
run_some_queries()
statistics = profiler.collect()
sqltap.report(statistics, "report.html")

The library hasn't been updated in two years; however, it seemed to be working just fine when I tested it with my application earlier today.
