Python 在烧瓶迁移或蒸馏迁移中创建种子数据
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/19334604/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Creating seed data in a flask-migrate or alembic migration
提问by Mark Richman
How can I insert some seed data in my first migration? If the migration is not the best place for this, then what is the best practice?
如何在第一次迁移中插入一些种子数据?如果迁移不是最好的地方,那么最佳实践是什么?
"""empty message
Revision ID: 384cfaaaa0be
Revises: None
Create Date: 2013-10-11 16:36:34.696069
"""
# revision identifiers, used by Alembic.
revision = '384cfaaaa0be'
down_revision = None
from alembic import op
import sqlalchemy as sa
def upgrade():
### commands auto generated by Alembic - please adjust! ###
op.create_table('list_type',
sa.Column('id', sa.Integer(), nullable=False),
sa.Column('name', sa.String(length=80), nullable=False),
sa.PrimaryKeyConstraint('id'),
sa.UniqueConstraint('name')
)
op.create_table('job',
sa.Column('id', sa.Integer(), nullable=False),
sa.Column('list_type_id', sa.Integer(), nullable=False),
sa.Column('record_count', sa.Integer(), nullable=False),
sa.Column('status', sa.Integer(), nullable=False),
sa.Column('sf_job_id', sa.Integer(), nullable=False),
sa.Column('created_at', sa.DateTime(), nullable=False),
sa.Column('compressed_csv', sa.LargeBinary(), nullable=True),
sa.ForeignKeyConstraint(['list_type_id'], ['list_type.id'], ),
sa.PrimaryKeyConstraint('id')
)
### end Alembic commands ###
# ==> INSERT SEED DATA HERE <==
def downgrade():
### commands auto generated by Alembic - please adjust! ###
op.drop_table('job')
op.drop_table('list_type')
### end Alembic commands ###
采纳答案by Mark Hildreth
Alembic has, as one of its operation, bulk_insert()
. The documentation gives the following example (with some fixes I've included):
作为其操作之一,Alembic 具有bulk_insert()
. 文档给出了以下示例(我已经包含了一些修复程序):
from datetime import date
from sqlalchemy.sql import table, column
from sqlalchemy import String, Integer, Date
from alembic import op
# Create an ad-hoc table to use for the insert statement.
accounts_table = table('account',
column('id', Integer),
column('name', String),
column('create_date', Date)
)
op.bulk_insert(accounts_table,
[
{'id':1, 'name':'John Smith',
'create_date':date(2010, 10, 5)},
{'id':2, 'name':'Ed Williams',
'create_date':date(2007, 5, 27)},
{'id':3, 'name':'Wendy Jones',
'create_date':date(2008, 8, 15)},
]
)
Note too that the alembic has an execute()
operation, which is just like the normal execute()
function in SQLAlchemy: you can run any SQL you wish, as the documentation example shows:
还要注意,alembic 有一个execute()
操作,就像execute()
SQLAlchemy 中的普通函数一样:您可以运行任何您希望的 SQL,如文档示例所示:
from sqlalchemy.sql import table, column
from sqlalchemy import String
from alembic import op
account = table('account',
column('name', String)
)
op.execute(
account.update().\
where(account.c.name==op.inline_literal('account 1')).\
values({'name':op.inline_literal('account 2')})
)
Notice that the table that is being used to create the metadata that is used in the update
statement is defined directly in the schema. This might seem like it breaks DRY(isn't the table already defined in your application), but is actually quite necessary. If you were to try to use the table or model definition that is part of your application, you would break this migration when you make changes to your table/model in your application. Your migration scripts should be set in stone: a change to a future version of your models should not change migrations scripts. Using the application models will mean that the definitions will change depending on what version of the models you have checked out (most likely the latest). Therefore, you need the table definition to be self-contained in the migration script.
请注意,用于创建update
语句中使用的元数据的表是直接在架构中定义的。这似乎打破了DRY(该表不是已经在您的应用程序中定义的),但实际上是非常必要的。如果您尝试使用作为应用程序一部分的表或模型定义,则在对应用程序中的表/模型进行更改时会中断此迁移。您的迁移脚本应该一成不变:对模型未来版本的更改不应更改迁移脚本。使用应用程序模型意味着定义将根据您检出的模型版本(很可能是最新的)而改变。因此,您需要在迁移脚本中独立包含表定义。
Another thing to talk about is whether you should put your seed data into a script that runs as its own command (such as using a Flask-Script command, as shown in the other answer). This can be used, but you should be careful about it. If the data you're loading is test data, then that's one thing. But I've understood "seed data" to mean data that is required for the application to work correctly. For example, if you need to set up records for "admin" and "user" in the "roles" table. This data SHOULD be inserted as part of the migrations. Remember that a script will only work with the latest version of your database, whereas a migration will work with the specific version that you are migrating to or from. If you wanted a script to load the roles info, you could need a script for every version of the database with a different schema for the "roles" table.
要讨论的另一件事是您是否应该将种子数据放入作为其自己的命令运行的脚本中(例如使用 Flask-Script 命令,如另一个答案所示)。这可以使用,但你应该小心。如果您加载的数据是测试数据,那是一回事。但我已经将“种子数据”理解为应用程序正常工作所需的数据。例如,如果您需要在“roles”表中为“admin”和“user”设置记录。此数据应该作为迁移的一部分插入。请记住,脚本仅适用于最新版本的数据库,而迁移适用于您要迁移到或从中迁移的特定版本。如果你想要一个脚本来加载角色信息,
Also, by relying on a script, you would make it more difficult for you to run the script between migrations (say migration 3->4 requires that the seed data in the initial migration to be in the database). You now need to modify Alembic's default way of running to run these scripts. And that's still not ignoring the problems with the fact that these scripts would have to change over time, and who knows what version of your application you have checked out from source control.
此外,通过依赖脚本,您将更难在迁移之间运行脚本(例如迁移 3->4 要求初始迁移中的种子数据位于数据库中)。您现在需要修改 Alembic 的默认运行方式来运行这些脚本。而且这仍然没有忽略这些脚本必须随着时间的推移而改变的问题,谁知道您从源代码管理中检出的应用程序的哪个版本。
回答by Miguel
Migrations should be limited to schema changes only, and not only that, it is important that when a migration up or down is applied that data that existed in the database from before is preserved as much as possible. Inserting seed data as part of a migration may mess up pre-existing data.
迁移应仅限于模式更改,不仅如此,重要的是在应用向上或向下迁移时,尽可能多地保留数据库中以前存在的数据。在迁移过程中插入种子数据可能会弄乱预先存在的数据。
As most things with Flask, you can implement this in many ways. Adding a new command to Flask-Script is a good way to do this, in my opinion. For example:
与 Flask 的大多数事情一样,您可以通过多种方式实现这一点。在我看来,向 Flask-Script 添加一个新命令是一个很好的方法。例如:
@manager.command
def seed():
"Add seed data to the database."
db.session.add(...)
db.session.commit()
So then you run:
那么你运行:
python manager.py seed
回答by melchtheitroad55
MarkHildreth has supplied an excellent explanation of how alembic can handle this. However, the OP was specifically about how to modify a flask-migration migration script. I'm going to post an answer to that below to save people the time of having to look into alembic at all.
MarkHildreth 对 alembic 如何处理这个问题提供了一个很好的解释。但是,OP 专门针对如何修改烧瓶迁移迁移脚本。我将在下面发布一个答案,以节省人们根本不必研究 alembic 的时间。
WarningMiguel's answer is accurate with respect to normal database information. That is to say, one should follow his advice and absolutely not use this approach to populate a database with "normal" rows. This approach is specifically for database rows which are required for the application to function, a kind of data which I think of as "seed" data.
警告Miguel 的回答对于正常的数据库信息是准确的。也就是说,应该听从他的建议,绝对不要使用这种方法用“正常”行填充数据库。这种方法专门用于应用程序运行所需的数据库行,一种我认为是“种子”数据的数据。
OP's script modified to seed data:
OP 的脚本修改为种子数据:
"""empty message
Revision ID: 384cfaaaa0be
Revises: None
Create Date: 2013-10-11 16:36:34.696069
"""
# revision identifiers, used by Alembic.
revision = '384cfaaaa0be'
down_revision = None
from alembic import op
import sqlalchemy as sa
def upgrade():
### commands auto generated by Alembic - please adjust! ###
list_type_table = op.create_table('list_type',
sa.Column('id', sa.Integer(), nullable=False),
sa.Column('name', sa.String(length=80), nullable=False),
sa.PrimaryKeyConstraint('id'),
sa.UniqueConstraint('name')
)
op.create_table('job',
sa.Column('id', sa.Integer(), nullable=False),
sa.Column('list_type_id', sa.Integer(), nullable=False),
sa.Column('record_count', sa.Integer(), nullable=False),
sa.Column('status', sa.Integer(), nullable=False),
sa.Column('sf_job_id', sa.Integer(), nullable=False),
sa.Column('created_at', sa.DateTime(), nullable=False),
sa.Column('compressed_csv', sa.LargeBinary(), nullable=True),
sa.ForeignKeyConstraint(['list_type_id'], ['list_type.id'], ),
sa.PrimaryKeyConstraint('id')
)
### end Alembic commands ###
op.bulk_insert(
list_type_table,
[
{'name':'best list'},
{'name': 'bester list'}
]
)
def downgrade():
### commands auto generated by Alembic - please adjust! ###
op.drop_table('job')
op.drop_table('list_type')
### end Alembic commands ###
Context for those new to flask_migrate
那些不熟悉flask_migrate 的人的上下文
Flask migrate generates migration scripts at migrations/versions
. These scripts are run in order on a database in order to bring it up to the latest version. The OP includes an example of one of these auto-generated migration scripts. In order to add seed data, one must manually modify the appropriate auto-generated migration file. The code I have posted above is an example of that.
Flask migrate 在migrations/versions
. 这些脚本在数据库上按顺序运行,以便将其更新到最新版本。OP 包括这些自动生成的迁移脚本之一的示例。为了添加种子数据,必须手动修改适当的自动生成的迁移文件。我上面发布的代码就是一个例子。
What changed?
发生了什么变化?
Very little. You will note that in the new file I am storing the table returned from create_table
for list_type
in a variable called list_type_table
. We then operate on that table using op.bulk_insert
to create a few example rows.
很少。您会注意到,在新文件中,我将从create_table
for返回的表存储在list_type
名为list_type_table
. 然后我们对该表进行操作op.bulk_insert
以创建一些示例行。
回答by Braden Holt
You can also use Python's faker library which may be a bit quicker as you don't need to come up with any data yourself. One way of configuring it would be to put a method in a class that you wanted to generate data for as shown below.
您还可以使用 Python 的 faker 库,它可能会更快一些,因为您不需要自己提供任何数据。配置它的一种方法是将一个方法放在您想要为其生成数据的类中,如下所示。
from extensions import bcrypt, db
class User(db.Model):
# this config is used by sqlalchemy to store model data in the database
__tablename__ = 'users'
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(150))
email = db.Column(db.String(100), unique=True)
password = db.Column(db.String(100))
def __init__(self, name, email, password, fav_movie):
self.name = name
self.email = email
self.password = password
@classmethod
def seed(cls, fake):
user = User(
name = fake.name(),
email = fake.email(),
password = cls.encrypt_password(fake.password()),
)
user.save()
@staticmethod
def encrypt_password(password):
return bcrypt.generate_password_hash(password).decode('utf-8')
def save(self):
db.session.add(self)
db.session.commit()
And then implement a method that calls the seed method which could look something like this:
然后实现一个调用种子方法的方法,它看起来像这样:
from faker import Faker
from users.models import User
fake = Faker()
for _ in range(100):
User.seed(fake)