Pandas to_sql fails on duplicate primary key

Question

Asked by Ryan Tuck

I'd like to append to an existing table using the pandas df.to_sql() function.

I set if_exists='append', but my table has primary keys.

I'd like to do the equivalent of INSERT IGNORE when appending to the existing table, so that I avoid a duplicate-entry error.

Is this possible with pandas, or do I need to write an explicit query?

Answer 1

Answered by NFern

Unfortunately, there is no option to specify INSERT IGNORE. This is how I got around that limitation and inserted only the rows that were not already in the database (the DataFrame is named df):

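from sqlalchemy.exc import IntegrityError

# Engine is an existing SQLAlchemy engine connected to the target database.
# Insert one row at a time so a duplicate key only rejects that single row.
for i in range(len(df)):
    try:
        df.iloc[i:i+1].to_sql(name="Table_Name", if_exists='append', con=Engine)
    except IntegrityError:
        pass  # or any other action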
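
Inserting one row per call can be slow for large frames. One alternative, sketched below and not part of the original answer, is to bulk-write the frame to a scratch table and let the database skip the duplicates. The sketch assumes an in-memory SQLite database and hypothetical names (items, items_staging, id_key, price). SQLite spells the statement INSERT OR IGNORE, while MySQL spells it INSERT IGNORE.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id_key INTEGER PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO items VALUES (1, 9.99)")
conn.commit()

# New data: id_key 1 already exists, id_key 2 and 3 are new.
df = pd.DataFrame({"id_key": [1, 2, 3], "price": [5.00, 7.50, 3.25]})

# 1) Write the whole frame to a scratch table in one bulk call.
df.to_sql("items_staging", conn, if_exists="replace", index=False)

# 2) Let the database skip rows whose primary key is already present.
#    SQLite syntax shown here; MySQL would use INSERT IGNORE instead.
conn.execute(
    "INSERT OR IGNORE INTO items (id_key, price) "
    "SELECT id_key, price FROM items_staging"
)
conn.execute("DROP TABLE items_staging")
conn.commit()

rows = conn.execute("SELECT id_key, price FROM items ORDER BY id_key").fetchall()
print(rows)
```

The existing row for id_key 1 keeps its original price, and only the two new keys are added.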

Answer 2

Answered by user8557323

Please note that if_exists='append' is about the existence of the table: it says what to do when the table already exists, and the table is created if it does not. if_exists has nothing to do with the content of the table. See the documentation here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html

if_exists : {'fail', 'replace', 'append'}, default 'fail'
    fail: If table exists, do nothing.
    replace: If table exists, drop it, recreate it, and insert data.
    append: If table exists, insert data. Create if does not exist.
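
To see that if_exists reacts to the table and not to its rows, here is a minimal sketch, assuming an in-memory SQLite database and a hypothetical items table with a primary key. In the pandas versions I am aware of, 'fail' raises a ValueError when the table exists, and 'append' passes the INSERT through, so the duplicate key is rejected by the database rather than by pandas.

```python
import sqlite3

import pandas as pd

# Hypothetical table with a primary key and one existing row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id_key INTEGER PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO items VALUES (1, 9.99)")
conn.commit()

df = pd.DataFrame({"id_key": [1], "price": [5.00]})

# if_exists='fail' only looks at whether the table exists.
try:
    df.to_sql("items", conn, if_exists="fail", index=False)
    fail_outcome = "inserted"
except ValueError:
    fail_outcome = "ValueError"

# if_exists='append' goes ahead with the INSERT, so the duplicate key
# is rejected by the database itself.
try:
    df.to_sql("items", conn, if_exists="append", index=False)
    append_outcome = "inserted"
except Exception as exc:  # exact exception class depends on the driver
    append_outcome = type(exc).__name__

row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(fail_outcome, append_outcome, row_count)
```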

Answer 3

Answered by gies0r

Pandas currently has no option for this, but there is a GitHub issue for it. If you need this feature too, just upvote it.

Answer 4

Answered by Al-Mothafar

In my case, I was trying to insert new data into an empty table, but some of the rows were duplicated, which is almost the same issue as here. I could fetch the existing data, merge it with the new data, and continue from there, but that is not optimal and may only work for small data, not huge tables.

As pandas does not provide any handling for this situation right now, I looked for a suitable workaround and ended up making my own. I am not sure whether it will work for you, but I decided to take control of my data first instead of relying on luck. What I did was remove duplicates before calling .to_sql, so that if any error happens, I know more about my data and what is going on:

import pandas as pd


def write_to_table(table_name, data):
    # engine is assumed to be an existing SQLAlchemy engine
    df = pd.DataFrame(data)
    # Sort by price so that keep='first' retains the lowest-priced row per id_key
    df.sort_values('price', inplace=True)
    df.drop_duplicates(subset=['id_key'], keep='first', inplace=True)
    # Append the de-duplicated rows to the existing table
    df.to_sql(table_name, engine, index=False, if_exists='append', schema='public')

So in my case, I wanted to keep the row with the lowest price for each key (by the way, I was passing a list of dicts for data), so I sorted first. The sort is not strictly necessary, but it is an example of what I mean by controlling which data I want to keep.
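
As a small worked example of the sort-then-drop idea, here is a sketch on made-up sample data (the values are hypothetical, and no database is involved). Sorting by price first means keep='first' retains the cheapest row for each id_key.

```python
import pandas as pd

# Hypothetical sample: id_key 1 appears twice with different prices.
data = [
    {"id_key": 1, "price": 12.0},
    {"id_key": 2, "price": 8.0},
    {"id_key": 1, "price": 10.0},
]

df = (
    pd.DataFrame(data)
    .sort_values("price")                              # cheapest rows first
    .drop_duplicates(subset=["id_key"], keep="first")  # one row per key
    .sort_values("id_key")
    .reset_index(drop=True)
)
print(df)
```

Here id_key 1 ends up with price 10.0 and id_key 2 with price 8.0.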

I hope this helps someone in a similar situation.

Answer 5

Answered by manglano

Pandas doesn't support editing the actual SQL syntax of the .to_sql method, so you might be out of luck. There are some experimental programmatic workarounds (say, read the DataFrame into a SQLAlchemy object with CALCHIPAN and use SQLAlchemy for the transaction), but you may be better served by writing your DataFrame to a CSV and loading it with an explicit MySQL function.

CALCHIPAN repo: https://bitbucket.org/zzzeek/calchipan/

Answer 6

Answered by kztd

I had trouble with the approach above because I was still getting the IntegrityError. Strange, but I just took the above and worked it backwards, checking for the key before inserting:

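# Check each key with a SELECT first, and only insert rows that are not there yet.
# Note: formatting values into SQL is only safe for trusted keys.
for i, (_, row) in enumerate(df.iterrows()):
    sql = "SELECT * FROM `Table_Name` WHERE `key` = '{}'".format(row.Key)
    found = pd.read_sql(sql, con=Engine)
    if len(found) == 0:
        # i is the row position, so iloc works for any index
        df.iloc[i:i+1].to_sql(name="Table_Name", if_exists='append', con=Engine)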
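
The same check-before-insert idea can be done with a single query instead of one SELECT per row. This is a minimal sketch, assuming an in-memory SQLite database and hypothetical names (items, id_key, price). It is not safe against another writer inserting the same keys between the read and the append.

```python
import sqlite3

import pandas as pd

# Hypothetical table with one existing row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id_key INTEGER PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO items VALUES (1, 9.99)")
conn.commit()

df = pd.DataFrame({"id_key": [1, 2, 3], "price": [5.00, 7.50, 3.25]})

# One query for all existing keys instead of one SELECT per row.
existing = pd.read_sql("SELECT id_key FROM items", conn)["id_key"]

# Keep only rows whose key is not in the table yet, then append in bulk.
new_rows = df[~df["id_key"].isin(existing)]
new_rows.to_sql("items", conn, if_exists="append", index=False)

keys = [r[0] for r in conn.execute("SELECT id_key FROM items ORDER BY id_key")]
print(keys)
```

For very large tables, fetching every key may be too expensive, in which case filtering the query to the keys present in df is one option.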
