How to write a Pandas DataFrame to an existing Django model

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/41507845/

Time: 2020-09-14 02:43:57  Source: igfitidea

How to write a Pandas Dataframe to existing Django model

Tags: python, django, sqlite, pandas

Asked by Greg Brown

I am trying to insert data from a Pandas DataFrame into an existing Django model, Agency, that uses a SQLite backend. However, following the answers to "How to write a Pandas Dataframe to Django model" and "Saving a Pandas DataFrame to a Django Model" leads to the whole SQLite table being replaced, which breaks the Django code. Specifically, the Django auto-generated id primary key column is replaced by an index column, and this causes errors when rendering templates (no such column: agency.id).

Here is the code and the result of using Pandas to_sql on the SQLite table, agency.

In models.py:

from django.db import models

class Agency(models.Model):
    name = models.CharField(max_length=128)

In myapp/management/commands/populate.py:

import pandas as pd
from django.conf import settings
from django.core.management.base import BaseCommand
from sqlalchemy import create_engine


class Command(BaseCommand):

    def handle(self, *args, **options):

        # Open a SQLAlchemy connection to the database Django uses
        database_name = settings.DATABASES['default']['NAME']
        database_url = 'sqlite:///{}'.format(database_name)
        engine = create_engine(database_url, echo=False)

        # Insert data (if_exists="replace" is what replaces the whole table)
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})
        agencies.to_sql("agency", con=engine, if_exists="replace")

Calling 'python manage.py populate' successfully adds the three agencies into the table:

index    name
0        Agency 1
1        Agency 2
2        Agency 3

However, doing so has changed the DDL of the table from:

CREATE TABLE "agency" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(128) NOT NULL)

to:

CREATE TABLE agency (
  "index" BIGINT, 
  name TEXT
);
CREATE INDEX ix_agency_index ON agency ("index")

How can I add the DataFrame to the model managed by Django and keep the Django ORM intact?

Accepted answer by Greg Brown

To answer my own question, since I now import data into Django with Pandas quite often: the mistake I was making was using Pandas' built-in SQLAlchemy support (to_sql), which was modifying the underlying database table definition. In the context above, you can simply use the Django ORM to connect and insert the data:

import pandas as pd
from django.core.management.base import BaseCommand

from myapp.models import Agency


class Command(BaseCommand):

    def handle(self, *args, **options):

        # Process data with Pandas
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})

        # Iterate over the DataFrame rows and create one model object per row
        for row in agencies.itertuples():
            Agency.objects.create(name=row.name)

However, you may often want to import data using an external script rather than a management command (as above) or Django's shell. In this case you must first connect to the Django ORM by calling django.setup():

import os, sys

import django
import pandas as pd

sys.path.append('../..') # add path to project root dir
os.environ["DJANGO_SETTINGS_MODULE"] = "myproject.settings"

# for more sophisticated setups, if you need to change connection settings (e.g. when using django-environ):
#os.environ["DATABASE_URL"] = "postgres://myuser:mypassword@localhost:54324/mydb"

# Connect to Django ORM
django.setup()

# process data
from myapp.models import Agency
Agency.objects.create(name='MyAgency')
  • Here I have set the DJANGO_SETTINGS_MODULE environment variable to my settings module, myproject.settings, so that django.setup() can pick up the project settings.

  • Depending on where you run the script from, you may need to add the project root to the system path so that Django can find the settings module. In this case, I run my script two directories below my project root.

  • You can modify any settings before calling setup. This is useful if your script needs to connect to the DB differently from what is configured in settings, for example when running a script locally against Django/postgres Docker containers.

Note: the example above uses django-environ to specify the DB settings.
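
For comparison, the table replacement described in the question comes from passing if_exists="replace" to to_sql, which drops and recreates the table. The following is a minimal sketch, not part of the original answer, of what happens with if_exists="append" and index=False instead. It assumes an in-memory SQLite database and a hand-written CREATE TABLE statement standing in for the table Django generated; rows written this way bypass the Django ORM entirely, so model validation, save() overrides, and signals do not run.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory database stands in for the project's SQLite file.
conn = sqlite3.connect(":memory:")

# Assumption: this DDL mirrors the Django-generated table shown in the question.
conn.execute(
    'CREATE TABLE "agency" ('
    '"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, '
    '"name" varchar(128) NOT NULL)'
)

agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})

# "append" inserts into the existing table instead of recreating it;
# index=False stops the DataFrame index being written as an "index" column.
agencies.to_sql("agency", con=conn, if_exists="append", index=False)

# Inspect the column names and the stored rows after the insert.
columns = [row[1] for row in conn.execute('PRAGMA table_info("agency")')]
rows = conn.execute('SELECT id, name FROM "agency" ORDER BY id').fetchall()
print(columns)
print(rows)
```

In this sketch the id column is still present after the insert and SQLite fills it in automatically, which is the property the Django model relies on.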

Answer by Thomas

There is an error in the itertuples call: it is missing its round brackets.

It should be:

for agency in agencies.itertuples():
    agency = Agency.objects.create(name=agency.name)

Thank you for sharing your answer.

Reference: pandas 0.22.0 documentation, pandas.DataFrame.itertuples.
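
To illustrate the point about the brackets, here is a small sketch using the same three-row DataFrame as the question. It shows that calling itertuples() yields one namedtuple per row, with columns available as attributes, and that omitting the brackets is not rejected by the parser but fails at run time, because a bound method is not iterable. The variable names are illustrative only.

```python
import pandas as pd

agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})

# With the brackets, each row is a namedtuple and columns become attributes.
names = [row.name for row in agencies.itertuples(index=False)]
print(names)

# Without the brackets, the loop receives the method object itself,
# which cannot be iterated, so Python raises TypeError.
try:
    for row in agencies.itertuples:
        pass
    missing_brackets_failed = False
except TypeError as exc:
    missing_brackets_failed = True
    print("TypeError:", exc)
```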

Answer by Jeff Browning

For those looking for a more performant and up-to-date solution, I would suggest instantiating the Django model instances without saving them individually, and then passing them to the manager's bulk_create method.

model_instances = [Agency(name=agency.name) for agency in agencies.itertuples()]
Agency.objects.bulk_create(model_instances)

Note that bulk_create does not send signals or call custom save() methods, so if you have custom saving logic or signal hooks for the Agency model, they will not be triggered. The full list of caveats is in the documentation linked below.

Documentation: https://docs.djangoproject.com/en/3.0/ref/models/querysets/#bulk-create