How to write a Pandas DataFrame to a Django model

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/34425607/

Date: 2020-09-14 00:25:29  Source: igfitidea

Tags: python, django, postgresql, pandas, dataframe

Asked by Avagut

I have been using pandas in Python, and I usually write a dataframe to my database table as shown below. I am now migrating to Django. How can I write the same dataframe to a table through a model called MyModel? Any assistance is really appreciated.

# Original pandas code
from sqlalchemy import create_engine

engine = create_engine('postgresql://myuser:mypassword@localhost:5432/mydb', echo=False)
mydataframe.to_sql('mytable', engine, if_exists='append', index=True)

Answered by Jon Hannah

I'm just going through the same exercise at the moment. The approach I've taken is to create a list of new objects from the DataFrame and then bulk create them:

bulk_create(objs, batch_size=None)

This method inserts the provided list of objects into the database in an efficient manner (generally only 1 query, no matter how many objects there are)

An example might look like this:

# Not able to iterate directly over the DataFrame
df_records = df.to_dict('records')

model_instances = [MyModel(
    field_1=record['field_1'],
    field_2=record['field_2'],
) for record in df_records]

MyModel.objects.bulk_create(model_instances)
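
The conversion step in the example above can be tried without a Django project. The following is a minimal sketch, assuming a small made-up DataFrame and a hypothetical dataclass named RecordStandIn that stands in for MyModel. It shows why the DataFrame is not iterated directly, and what to_dict('records') hands to the model constructor.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class RecordStandIn:
    """Hypothetical stand-in for MyModel so the sketch runs without Django."""
    field_1: int
    field_2: str


# Made-up sample data; the column names mirror the field names used above.
df = pd.DataFrame({"field_1": [1, 2], "field_2": ["a", "b"]})

# Iterating a DataFrame directly yields its column labels, not its rows.
column_labels = list(df)

# to_dict('records') yields one dict per row, keyed by column name.
df_records = df.to_dict("records")

# When column names match field names, each dict can be unpacked
# as keyword arguments instead of listing every field by hand.
instances = [RecordStandIn(**record) for record in df_records]
```

With a real model, the same unpacking would presumably read MyModel(**record), and the resulting list would go to MyModel.objects.bulk_create, optionally with the batch_size argument quoted above to cap the number of rows per query.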

Answered by bakkal

Use your own pandas code alongside a Django model that is mapped to the same SQL table

I am not aware of any explicit support for writing a pandas dataframe to a Django model. However, in a Django app, you can still use your own code to read from or write to the database, in addition to using the ORM (e.g. through your Django model).

And given that you most likely have data in the database that was previously written by pandas' to_sql, you can keep using the same database and the same pandas code, and simply create a Django model that can access that table.

e.g. if your pandas code was writing to SQL table mytable, simply create a model like this:

from django.db.models import Model

class MyModel(Model):
    class Meta:
        db_table = 'mytable' # This tells Django where the SQL table is
        managed = False # Use this if table already exists
                        # and doesn't need to be managed by Django

    field_1 = ...
    field_2 = ...

Now you can use this model from Django side by side with your existing pandas code (possibly in a single Django app).

Django database settings

To get the same DB credentials into the pandas SQL functions, simply read the fields from the Django settings, e.g.:

from django.conf import settings
from sqlalchemy import create_engine

user = settings.DATABASES['default']['USER']
password = settings.DATABASES['default']['PASSWORD']
database_name = settings.DATABASES['default']['NAME']
# host = settings.DATABASES['default']['HOST']
# port = settings.DATABASES['default']['PORT']

database_url = 'postgresql://{user}:{password}@localhost:5432/{database_name}'.format(
    user=user,
    password=password,
    database_name=database_name,
)

engine = create_engine(database_url, echo=False)
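
The snippet above hard-codes localhost:5432 and inserts the password into the URL as-is. As a possible refinement, here is a minimal sketch of a helper that also reads HOST and PORT and percent-encodes the credentials, so that characters such as @ or / in a password do not break the URL. The function name build_postgres_url is hypothetical (not a Django or SQLAlchemy API), the sample dict is made up, and the fallback to localhost:5432 for empty values is an assumption that mirrors the values hard-coded above.

```python
from urllib.parse import quote_plus


def build_postgres_url(db_settings):
    """Build a PostgreSQL URL from a dict shaped like settings.DATABASES['default'].

    Hypothetical helper for illustration only.
    """
    # Percent-encode credentials so special characters stay inside their URL part.
    user = quote_plus(db_settings["USER"])
    password = quote_plus(db_settings["PASSWORD"])
    # Assumption: fall back to the host and port hard-coded in the original snippet.
    host = db_settings.get("HOST") or "localhost"
    port = db_settings.get("PORT") or 5432
    name = db_settings["NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


# Made-up settings dict standing in for settings.DATABASES['default'].
example_settings = {
    "USER": "myuser",
    "PASSWORD": "p@ss/word",
    "HOST": "",
    "PORT": "",
    "NAME": "mydb",
}
database_url = build_postgres_url(example_settings)
```

In a Django project the call would presumably be build_postgres_url(settings.DATABASES['default']), with the result passed to create_engine as before.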

The alternative is not recommended as it's inefficient

I don't really see a way besides reading the dataframe row by row, creating a model instance for each row, and saving it, which is really slow. You might get away with some batch insert operation, but why bother when pandas' to_sql already does that for us? And reading Django querysets into a pandas dataframe is just inefficient when pandas can do that faster for us too.

# Doing it like this is slow: one INSERT query per row
for index, row in df.iterrows():
    model = MyModel()
    model.field_1 = row['field_1']
    model.save()
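
For contrast, the pandas-only path this answer recommends can be sketched as a single write and a single read. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the PostgreSQL database in the question and a made-up DataFrame. The table name mytable matches the one used above.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")

# Made-up sample data with the same column names as the model fields.
df = pd.DataFrame({"field_1": [1, 2, 3], "field_2": ["a", "b", "c"]})

# One call writes every row; pandas creates the table if it does not exist.
df.to_sql("mytable", conn, if_exists="append", index=False)

# Reading back goes straight into a DataFrame, without the ORM.
round_trip = pd.read_sql_query(
    "SELECT field_1, field_2 FROM mytable ORDER BY field_1", conn
)
conn.close()
```

A Django model declared with db_table = 'mytable' and managed = False, as shown earlier, could then query the same rows through the ORM.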