What's the best strategy for unit-testing database-driven applications?

Question

Asked by friedo

I work with a lot of web applications that are driven by databases of varying complexity on the backend. Typically, there's an ORM layer separate from the business and presentation logic. This makes unit-testing the business logic fairly straightforward; things can be implemented in discrete modules and any data needed for the test can be faked through object mocking.

But testing the ORM and database itself has always been fraught with problems and compromises.

Over the years, I have tried a few strategies, none of which completely satisfied me.

Load a test database with known data. Run tests against the ORM and confirm that the right data comes back. The disadvantage here is that your test DB has to keep up with any schema changes in the application database, and might get out of sync. It also relies on artificial data, and may not expose bugs that occur due to stupid user input. Finally, if the test database is small, it won't reveal inefficiencies like a missing index. (OK, that last one isn't really what unit testing should be used for, but it doesn't hurt.)
Load a copy of the production database and test against that. The problem here is that you may have no idea what's in the production DB at any given time; your tests may need to be rewritten if data changes over time.

Some people have pointed out that both of these strategies rely on specific data, and a unit test should test only functionality. To that end, I've seen suggested:

Use a mock database server, and check only that the ORM is sending the correct queries in response to a given method call.
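Here is a minimal sketch of the third strategy in plain JDBC. It assumes the code under test takes a `java.sql.Connection`. The test gives that code a fake connection built with `java.lang.reflect.Proxy`. Instead of talking to a server, the fake records the SQL text and the bound parameters, and the assertions only check which queries were sent. `UserDao` and `RecordingConnection` are hypothetical names. With an ORM, the fake would have to sit underneath it, at the JDBC level.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical DAO under test: all it does is build and send SQL.
class UserDao {
    private final Connection conn;

    UserDao(Connection conn) { this.conn = conn; }

    void rename(long id, String newName) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE users SET name = ? WHERE id = ?")) {
            ps.setString(1, newName);
            ps.setLong(2, id);
            ps.executeUpdate();
        }
    }
}

// Hypothetical stand-in for a database server: records SQL text and bound
// parameters instead of executing anything.
class RecordingConnection {
    final List<String> sql = new ArrayList<>();
    final List<Object> params = new ArrayList<>();

    Connection connection() {
        InvocationHandler onConnection = (proxy, method, args) -> {
            if (method.getName().equals("prepareStatement")) {
                sql.add((String) args[0]);
                return statement();
            }
            return defaultValue(method.getReturnType());
        };
        return (Connection) Proxy.newProxyInstance(
                RecordingConnection.class.getClassLoader(),
                new Class<?>[] {Connection.class}, onConnection);
    }

    private PreparedStatement statement() {
        InvocationHandler onStatement = (proxy, method, args) -> {
            // setXxx(index, value): remember the value; executeUpdate: pretend 1 row changed.
            if (method.getName().startsWith("set") && args != null && args.length == 2) {
                params.add(args[1]);
            }
            if (method.getName().equals("executeUpdate")) {
                return 1;
            }
            return defaultValue(method.getReturnType());
        };
        return (PreparedStatement) Proxy.newProxyInstance(
                RecordingConnection.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class}, onStatement);
    }

    // Proxies must not return null for primitive return types.
    private static Object defaultValue(Class<?> type) {
        if (type == boolean.class) return false;
        if (type == int.class) return 0;
        if (type == long.class) return 0L;
        return null;
    }
}
```

This only pins down the SQL string. It says nothing about whether the query does the right thing against a real schema.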

What strategies have you used for testing database-driven applications, if any? What has worked the best for you?

Answer 1

Accepted answer by Mark Roddy

I've actually used your first approach with quite some success, but in a slightly different way that I think would solve some of your problems:

Keep the entire schema and scripts for creating it in source control so that anyone can create the current database schema after a check out. In addition, keep sample data in data files that get loaded by part of the build process. As you discover data that causes errors, add it to your sample data to check that errors don't re-emerge.
Use a continuous integration server to build the database schema, load the sample data, and run tests. This is how we keep our test database in sync (rebuilding it at every test run). Although this requires the CI server to have access to, and ownership of, its own dedicated database instance, I'd say that having our DB schema built three times a day has dramatically helped us find errors that probably would not have been found until just before delivery (if not later). I can't say that I rebuild the schema before every commit. Does anybody? With this approach you don't have to (well, maybe we should, but it's not a big deal if someone forgets).
For my group, user input is done at the application level (not db) so this is tested via standard unit tests.
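Here is a rough sketch of the "rebuild schema, then load sample data" step that a CI job might run. It assumes plain JDBC and SQL scripts checked into source control; the `SchemaRebuilder` class and the script order are hypothetical. The statement splitter is deliberately naive. It skips whole-line `--` comments and splits on `;`, so it will break on semicolons inside string literals or stored-procedure bodies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class SchemaRebuilder {
    // Naive splitter: drops whole-line "--" comments and splits on ';'.
    static List<String> splitStatements(String script) {
        StringBuilder body = new StringBuilder();
        for (String line : script.split("\n")) {
            if (!line.trim().startsWith("--")) {
                body.append(line).append('\n');
            }
        }
        List<String> out = new ArrayList<>();
        for (String part : body.toString().split(";")) {
            String stmt = part.trim();
            if (!stmt.isEmpty()) {
                out.add(stmt);
            }
        }
        return out;
    }

    // Executes every statement of every script, in the order the scripts are given
    // (e.g. schema scripts first, then sample-data scripts).
    static void rebuild(Connection conn, List<Path> scriptsInOrder)
            throws IOException, SQLException {
        try (Statement st = conn.createStatement()) {
            for (Path script : scriptsInOrder) {
                for (String sql : splitStatements(Files.readString(script))) {
                    st.execute(sql);
                }
            }
        }
    }
}
```

A CI job would call `rebuild` against its dedicated database instance, passing the schema scripts followed by the sample-data scripts. Regression data discovered later just becomes one more data script.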

Loading Production Database Copy:
This was the approach used at my last job. It was a huge pain because of a couple of issues:

The copy would fall out of date relative to the production version
Changes would be made to the copy's schema and wouldn't get propagated to the production systems. At this point we'd have diverging schemas. Not fun.

Mocking Database Server:
We also do this at my current job. After every commit we execute unit tests against the application code that have mock db accessors injected. Then three times a day we execute the full db build described above. I definitely recommend both approaches.

Answer 2

Answer by Aaron Digulla

I'm always running tests against an in-memory DB (HSQLDB or Derby) for these reasons:

It makes you think which data to keep in your test DB and why. Just hauling your production DB into a test system translates to "I have no idea what I'm doing or why and if something breaks, it wasn't me!!" ;)
It makes sure the database can be recreated with little effort in a new place (for example when we need to replicate a bug from production)
It helps enormously with the quality of the DDL files.

The in-memory DB is loaded with fresh data once the tests start, and after most tests I invoke ROLLBACK to keep it stable. ALWAYS keep the data in the test DB stable! If the data changes all the time, you can't test.
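The "ROLLBACK after each test" idea can be captured in a small helper, sketched here with plain JDBC. `RollbackRunner` and `DbTest` are hypothetical names; they are not part of HSQLDB or Derby. The helper turns off auto-commit, runs the test body, and always rolls back. This assumes the test itself never commits and only issues statements the database treats transactionally. Many databases implicitly commit DDL.

```java
import java.sql.Connection;

class RollbackRunner {
    // Hypothetical callback: the body of one test that uses the shared connection.
    interface DbTest {
        void run(Connection conn) throws Exception;
    }

    // Runs the test inside a transaction and always rolls it back, so the
    // preloaded fixture data looks the same to the next test.
    static void runAndRollback(Connection conn, DbTest test) throws Exception {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            test.run(conn);
        } finally {
            conn.rollback();
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```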

The data is loaded from SQL, a template DB or a dump/backup. I prefer dumps if they are in a readable format because I can put them in VCS. If that doesn't work, I use a CSV file or XML. If I have to load enormous amounts of data ... I don't. You never have to load enormous amounts of data :) Not for unit tests. Performance tests are another issue and different rules apply.
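For the CSV option, a minimal loader might look like the following. It assumes a simple CSV with no quoted fields, whose header row names the table's columns; `CsvFixture` is a hypothetical name. Values are bound with `setString`, which assumes the database will convert them to the column types. The table name is concatenated into the SQL, so it must come from trusted test code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CsvFixture {
    final List<String> columns;
    final List<List<String>> rows;

    CsvFixture(List<String> columns, List<List<String>> rows) {
        this.columns = columns;
        this.rows = rows;
    }

    // Parses simple CSV: the first non-blank line is the header; no quoting or escaped commas.
    static CsvFixture parse(String csv) {
        List<String> lines = new ArrayList<>();
        for (String line : csv.split("\n")) {
            if (!line.trim().isEmpty()) {
                lines.add(line.trim());
            }
        }
        List<String> header = Arrays.asList(lines.get(0).split(",", -1));
        List<List<String>> data = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            data.add(Arrays.asList(line.split(",", -1)));
        }
        return new CsvFixture(header, data);
    }

    // Builds "INSERT INTO t (a, b) VALUES (?, ?)" from the header columns.
    String insertSql(String table) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + placeholders + ")";
    }

    // Inserts all rows as one JDBC batch, binding every value with setString.
    void load(Connection conn, String table) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql(table))) {
            for (List<String> row : rows) {
                for (int i = 0; i < row.size(); i++) {
                    ps.setString(i + 1, row.get(i));
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

Because the CSV is plain text, it can live in VCS next to the tests, which matches the preference for readable dumps.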

Answer 3

Answer by kolrie

I have been asking this question for a long time, but I think there is no silver bullet for that.

What I currently do is mock the DAO objects and keep an in-memory representation of a good collection of objects that represent interesting cases of data that could live in the database.
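Here is a minimal sketch of that style, using hypothetical `Account`, `AccountDao`, and `TransferService` types. The business logic depends only on the DAO interface, and the test injects an in-memory implementation preloaded with interesting cases. As noted in the next paragraph, the real DAO and its SQL are never exercised this way.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain type and DAO interface seen by the business layer.
record Account(long id, String owner, long balanceCents) {}

interface AccountDao {
    Optional<Account> find(long id);
    void save(Account account);
}

// In-memory stand-in for the real DAO, seeded with interesting data cases.
class InMemoryAccountDao implements AccountDao {
    private final Map<Long, Account> rows = new HashMap<>();

    InMemoryAccountDao(List<Account> seed) { seed.forEach(this::save); }

    public Optional<Account> find(long id) { return Optional.ofNullable(rows.get(id)); }

    public void save(Account a) { rows.put(a.id(), a); }
}

// Business logic under test; it never touches JDBC directly.
class TransferService {
    private final AccountDao dao;

    TransferService(AccountDao dao) { this.dao = dao; }

    void transfer(long fromId, long toId, long cents) {
        Account from = dao.find(fromId)
                .orElseThrow(() -> new IllegalArgumentException("no account " + fromId));
        Account to = dao.find(toId)
                .orElseThrow(() -> new IllegalArgumentException("no account " + toId));
        if (from.balanceCents() < cents) {
            throw new IllegalStateException("insufficient funds");
        }
        dao.save(new Account(from.id(), from.owner(), from.balanceCents() - cents));
        dao.save(new Account(to.id(), to.owner(), to.balanceCents() + cents));
    }
}
```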

The main problem I see with that approach is that you're covering only the code that interacts with your DAO layer, but never testing the DAO itself, and in my experience I see that a lot of errors happen on that layer as well. I also keep a few unit tests that run against the database (for the sake of using TDD or quick testing locally), but those tests are never run on my continuous integration server, since we don't keep a database for that purpose and I think tests that run on CI server should be self-contained.

Another approach I find very interesting, though not always worth it since it is a little time-consuming, is to create the same schema you use in production on an embedded database that runs only within the unit tests.

Even though this approach undoubtedly improves your coverage, it has a few drawbacks: you have to stay as close as possible to ANSI SQL so that your queries work both with your current DBMS and with the embedded replacement.

No matter what you think is more relevant for your code, there are a few projects out there that may make it easier, like DbUnit.

Answer 4

Answer by Lukas Eder

Even if there are tools that allow you to mock your database in one way or another (e.g. jOOQ's MockConnection, which can be seen in this answer; disclaimer: I work for jOOQ's vendor), I would advise not to mock larger databases with complex queries.

Even if you just want to integration-test your ORM, beware that an ORM issues a very complex series of queries to your database, which may vary in

syntax
complexity
order (!)

Mocking all that to produce sensible dummy data is quite hard, unless you're actually building a little database inside your mock that interprets the transmitted SQL statements. Instead, use a well-known integration-test database that you can easily reset to well-known data, and run your integration tests against it.

Answer 5

Answer by Dave Sherohman

I use the first (running the code against a test database). The only substantive issue I see you raising with this approach is the possibility of schemas getting out of sync. I deal with that by keeping a version number in my database and making all schema changes via a script that applies the changes for each version increment.

I also make all changes (including to the database schema) against my test environment first, so it ends up being the other way around: After all tests pass, apply the schema updates to the production host. I also keep a separate pair of testing vs. application databases on my development system so that I can verify there that the db upgrade works properly before touching the real production box(es).
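The version-number scheme can be sketched as a small migration runner. It assumes a hypothetical `VersionedDatabase` interface that exposes the stored schema version and can execute SQL; `Migrator` is also a made-up name. Script N upgrades the schema from version N-1 to N, and only scripts newer than the current version are applied, in order. Running the same migrator on the test database first and on production afterwards gives the ordering described in this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal view of a database that records its own schema version.
interface VersionedDatabase {
    int schemaVersion();
    void execute(String sql);
    void setSchemaVersion(int version);
}

class Migrator {
    // Upgrade scripts keyed by the version they produce: script N upgrades N-1 -> N.
    private final TreeMap<Integer, String> scripts = new TreeMap<>();

    Migrator add(int version, String sql) {
        scripts.put(version, sql);
        return this;
    }

    // Applies every script newer than the database's current version, in ascending
    // order, storing the new version after each one. Returns the versions applied.
    List<Integer> migrate(VersionedDatabase db) {
        List<Integer> applied = new ArrayList<>();
        for (Map.Entry<Integer, String> e
                : scripts.tailMap(db.schemaVersion(), false).entrySet()) {
            db.execute(e.getValue());
            db.setSchemaVersion(e.getKey());
            applied.add(e.getKey());
        }
        return applied;
    }
}
```

In a real setup, each script and its version bump would ideally run in one transaction, where the database supports transactional DDL.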

Answer 6

Answer by Roman Konoval

I'm using the first approach, but with a few differences that address the problems you mentioned.

Everything needed to run the DAO tests is in source control. That includes the schema and the scripts to create the DB (Docker is very good for this). If an embedded DB can be used, I use it for speed.

The important difference from the other approaches described here is that the data required by a test is not loaded from SQL scripts or XML files. Everything (except some dictionary data that is effectively constant) is created through the application code, using utility functions/classes.

The main purpose is to make the data used by a test:

very close to the test
explicit (using SQL files for data makes it very hard to see which piece of data is used by which test)
isolated from unrelated changes.

In practice, these utilities let a test declaratively specify, in the test itself, only the things essential to it, and omit everything irrelevant.

To give some idea of what this means in practice, consider a test for a DAO that works with Comments on Posts written by Authors. To test CRUD operations for such a DAO, some data must be created in the DB. The test would look like:

@Test
public void savedCommentCanBeRead() {
    // Builder is needed to declaratively specify the entity with all attributes relevant
    // for this specific test
    // Missing attributes are generated with reasonable values
    // factory's responsibility is to create the entity in the DB, together with
    // all entities it requires (in our example, the Author)
    Post post = factory.create(PostBuilder.post());

    Comment comment = CommentBuilder.comment().forPost(post).build();

    sut.save(comment);

    Comment savedComment = sut.get(comment.getId());

    // this checks fields that are directly stored
    assertThat(saveComment, fieldwiseEqualTo(comment));
    // if there are some fields that are generated during save check them separately
    assertThat(saveComment.getGeneratedField(), equalTo(expectedValue));        
}

This has several advantages over SQL scripts or XML files with test data:

Maintaining the code is much easier (for example, adding a mandatory column to an entity referenced in many tests, like Author, does not require changing lots of files/records, only the builder and/or factory)
The data required by specific test is described in the test itself and not in some other file. This proximity is very important for test comprehensibility.
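Here is a minimal sketch of the builder half of this idea, using hypothetical `Author`/`Post` records. The builder methods `named`, `by`, and `titled` are made up for illustration; only `PostBuilder.post()` appears in the test earlier in this answer. Every attribute gets a unique or reasonable default, so a test states only what matters to it. The `factory` that persists the entity and its dependencies is omitted.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical entities, reduced to a few fields.
record Author(String name, String email) {}
record Post(Author author, String title, String body) {}

// Generates unique values so independently built entities do not collide.
final class Defaults {
    private static final AtomicLong SEQ = new AtomicLong();

    static String unique(String prefix) { return prefix + "-" + SEQ.incrementAndGet(); }
}

final class AuthorBuilder {
    private String name = Defaults.unique("author");
    private String email; // derived from the name unless set explicitly

    static AuthorBuilder author() { return new AuthorBuilder(); }

    AuthorBuilder named(String name) { this.name = name; return this; }

    AuthorBuilder withEmail(String email) { this.email = email; return this; }

    Author build() { return new Author(name, email != null ? email : name + "@example.test"); }
}

final class PostBuilder {
    private AuthorBuilder author = AuthorBuilder.author();
    private String title = Defaults.unique("title");
    private String body = "body";

    static PostBuilder post() { return new PostBuilder(); }

    PostBuilder by(AuthorBuilder author) { this.author = author; return this; }

    PostBuilder titled(String title) { this.title = title; return this; }

    Post build() { return new Post(author.build(), title, body); }
}
```

If `Author` gains a mandatory column, only `AuthorBuilder.build()` needs to supply a default for it. Tests that don't care about that column don't mention it.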

Rollback vs Commit

I find it more convenient for tests to actually commit when they are executed. Firstly, some effects (for example, DEFERRED CONSTRAINTS) cannot be checked if a commit never happens. Secondly, when a test fails, the data can be examined in the DB, since it is not reverted by a rollback.

Of course, this has the downside that a test may produce broken data, which will lead to failures in other tests. To deal with this, I try to isolate the tests. In the example above, every test may create a new Author, and all other entities are created in relation to it, so collisions are rare. Some invariants can be broken but cannot be expressed as DB-level constraints. For these, I use programmatic checks for erroneous conditions that can be run after every single test (they are run in CI but usually switched off locally for performance reasons).
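The post-test invariant checks could be organized roughly like this. `InvariantCheck`, `InvariantRunner`, and the `db.invariants.check` system property are hypothetical. Each check would typically run a query for rows that violate a rule the schema can't express. The runner is called from the test framework's after-each hook; it is enabled in CI and off by default locally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check for an invariant that is not a DB constraint.
// Returns human-readable descriptions of any violations found.
interface InvariantCheck {
    List<String> violations();
}

class InvariantRunner {
    private final List<InvariantCheck> checks;
    private final boolean enabled;

    InvariantRunner(List<InvariantCheck> checks, boolean enabled) {
        this.checks = checks;
        this.enabled = enabled;
    }

    // True only when the JVM was started with -Ddb.invariants.check=true (e.g. in CI).
    static boolean enabledFromProperty() {
        return Boolean.getBoolean("db.invariants.check");
    }

    // Intended to be called after each test; fails the test if any invariant is broken.
    void verify(String testName) {
        if (!enabled) {
            return;
        }
        List<String> all = new ArrayList<>();
        for (InvariantCheck c : checks) {
            all.addAll(c.violations());
        }
        if (!all.isEmpty()) {
            throw new AssertionError("Invariants broken after " + testName + ": " + all);
        }
    }
}
```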

Answer 7

Answer by cchantep

For a JDBC-based project (directly or indirectly, e.g. JPA, EJB, ...), you can mock up not the entire database (in that case it would be better to use a test DB on a real RDBMS), but only the JDBC level.

The advantage is the abstraction this brings: JDBC data (result sets, update counts, warnings, ...) looks the same whatever the backend is, whether your prod DB, a test DB, or just some mockup data provided for each test case.

With the JDBC connection mocked up for each case, there is no need to manage a test DB (cleanup, one test at a time, reloading fixtures, ...). Every mock connection is isolated, so there is nothing to clean up. Each test case provides only the minimal fixtures required to mock up the JDBC exchange, which helps avoid the complexity of managing a whole test DB.

Acolyte is my framework which includes a JDBC driver and utility for this kind of mockup: http://acolyte.eu.org.
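To illustrate JDBC-level mocking without relying on Acolyte's own API (which is not shown here), the sketch below uses only the JDK. It builds a `java.lang.reflect.Proxy`-based `Connection` whose statements return a canned `ResultSet`, so plain JDBC code can be tested with no database at all. `TitleDao` and `CannedJdbc` are hypothetical names. The fake supports only `next()` and `getString(String)`, and it returns the same rows for every query.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical DAO code under test: ordinary JDBC, unaware that no database exists.
class TitleDao {
    static List<String> titlesByAuthor(Connection conn, long authorId) throws SQLException {
        List<String> titles = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT title FROM post WHERE author_id = ? ORDER BY title")) {
            ps.setLong(1, authorId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
            }
        }
        return titles;
    }
}

// Fake JDBC objects: every query yields the same canned rows (column label -> value).
class CannedJdbc {
    static Connection connectionReturning(List<? extends Map<String, ?>> rows) {
        return proxy(Connection.class, (p, m, a) -> m.getName().equals("prepareStatement")
                ? statement(rows) : fallback(m.getReturnType()));
    }

    private static PreparedStatement statement(List<? extends Map<String, ?>> rows) {
        return proxy(PreparedStatement.class, (p, m, a) -> m.getName().equals("executeQuery")
                ? resultSet(rows) : fallback(m.getReturnType()));
    }

    private static ResultSet resultSet(List<? extends Map<String, ?>> rows) {
        int[] cursor = {-1}; // like a real ResultSet, start before the first row
        return proxy(ResultSet.class, (p, m, a) -> {
            switch (m.getName()) {
                case "next":
                    return ++cursor[0] < rows.size();
                case "getString": { // only the getString(String columnLabel) overload
                    Object v = rows.get(cursor[0]).get((String) a[0]);
                    return v == null ? null : v.toString();
                }
                default:
                    return fallback(m.getReturnType());
            }
        });
    }

    @SuppressWarnings("unchecked")
    private static <T> T proxy(Class<T> type, InvocationHandler h) {
        return (T) Proxy.newProxyInstance(CannedJdbc.class.getClassLoader(), new Class<?>[] {type}, h);
    }

    // Proxies must not return null for primitive return types.
    private static Object fallback(Class<?> type) {
        if (type == boolean.class) return false;
        if (type == int.class) return 0;
        if (type == long.class) return 0L;
        return null;
    }
}
```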
