Pandas to_sql fails on duplicate primary key
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/30337394/
Asked by Ryan Tuck
I'd like to append to an existing table using the pandas df.to_sql() function.
I set if_exists='append', but my table has primary keys.
I'd like to do the equivalent of INSERT IGNORE when appending to the existing table, so that I avoid a duplicate entry error.
Is this possible with pandas, or do I need to write an explicit query?
Answered by NFern
There is unfortunately no option to specify "INSERT IGNORE". This is how I got around that limitation to insert only the rows that were not duplicates into the database (the DataFrame is named df):
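from sqlalchemy.exc import IntegrityError

# Insert one row at a time, so a duplicate key only rejects that single row
for i in range(len(df)):
    try:
        df.iloc[i:i+1].to_sql(name="Table_Name", if_exists='append', con=Engine)
    except IntegrityError:
        pass  # or any other action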
Answered by user8557323
Please note that if_exists='append' concerns the existence of the table and what to do if the table does not exist.
The if_exists parameter is not related to the content of the table.
See the documentation here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
if_exists : {'fail', 'replace', 'append'}, default 'fail'
fail: If table exists, do nothing.
replace: If table exists, drop it, recreate it, and insert data.
append: If table exists, insert data. Create if does not exist.
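To illustrate the point that if_exists only looks at the table and not at its rows, here is a minimal sketch. It assumes an in-memory SQLite database accessed through the standard sqlite3 module (not the MySQL setup from the question), and the table name items and its columns are made up for the example. With a plain sqlite3 connection the duplicate key should surface as sqlite3.IntegrityError; other drivers raise their own error types.

```python
import sqlite3

import pandas as pd

# Hypothetical table with a primary key and one existing row
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id_key INTEGER PRIMARY KEY, price REAL)")
con.execute("INSERT INTO items VALUES (1, 10.0)")
con.commit()

# This row reuses the existing primary key 1
duplicate = pd.DataFrame({"id_key": [1], "price": [99.0]})

try:
    # 'append' only means "the table may already exist"; it does not skip duplicates
    duplicate.to_sql("items", con, if_exists="append", index=False)
    outcome = "inserted"
except sqlite3.IntegrityError:
    outcome = "rejected"

rows = con.execute("SELECT id_key, price FROM items").fetchall()
print(outcome, rows)
```

The original row stays in place, and nothing from the DataFrame is written.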
Answered by gies0r
Pandas currently has no option for this, but here is the GitHub issue. If you need this feature too, upvote it.
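While there is no built-in INSERT IGNORE flag, newer pandas versions let to_sql take a method callable with the signature (pd_table, conn, keys, data_iter), which can issue its own SQL. The following is only a sketch under assumptions: it targets SQLite through the standard sqlite3 module and uses SQLite's INSERT OR IGNORE syntax (MySQL would need INSERT IGNORE and %s-style placeholders instead), and the function name insert_or_ignore and the items table are invented for the example.

```python
import sqlite3

import pandas as pd


def insert_or_ignore(pd_table, conn, keys, data_iter):
    """Custom to_sql insertion method that skips rows violating a key (SQLite syntax)."""
    columns = ", ".join('"{}"'.format(k) for k in keys)
    placeholders = ", ".join("?" for _ in keys)
    sql = 'INSERT OR IGNORE INTO "{}" ({}) VALUES ({})'.format(
        pd_table.name, columns, placeholders
    )
    # With a sqlite3 connection, conn supports executemany with "?" placeholders
    conn.executemany(sql, list(data_iter))


# Hypothetical table with one existing row
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id_key INTEGER PRIMARY KEY, price REAL)")
con.execute("INSERT INTO items VALUES (1, 10.0)")

# id_key 1 already exists, id_key 2 is new
new_rows = pd.DataFrame({"id_key": [1, 2], "price": [99.0, 20.0]})
new_rows.to_sql("items", con, if_exists="append", index=False, method=insert_or_ignore)

stored = con.execute("SELECT id_key, price FROM items ORDER BY id_key").fetchall()
print(stored)
```

The duplicate row is skipped silently and the existing price is kept, so this fits only when dropping conflicting rows is the desired behavior.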
Answered by Al-Mothafar
In my case, I was trying to insert new data into an empty table, but some of the rows were duplicated, which is almost the same issue as here. I considered fetching the existing data, merging it with the new data, and continuing from there, but that is not optimal and may only work for small data sets, not huge tables.
As pandas does not provide any handling for this situation right now, I looked for a suitable workaround and ended up making my own. I'm not sure whether it will work for you, but I decided to control my data first instead of relying on luck. So I remove duplicates before calling .to_sql; that way, if any error happens, I know more about my data and what is going on:
import pandas as pd

def write_to_table(table_name, data):
    # Sort by price first, so that keeping the first duplicate keeps the lowest price
    data.sort(key=lambda row: row['price'])
    df = pd.DataFrame(data)
    # Drop rows that repeat id_key, keeping the first (lowest-priced) one
    df.drop_duplicates(subset=['id_key'], keep='first', inplace=True)
    # engine is an SQLAlchemy engine created elsewhere
    df.to_sql(table_name, engine, index=False, if_exists='append', schema='public')
So in my case, I wanted to keep the row with the lowest price (by the way, I was passing a list of dicts for data), so I sorted first. Sorting is not required, but it is an example of what I mean by controlling which data I want to keep.
I hope this helps someone in a situation similar to mine.
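The other idea mentioned in this answer, fetching what is already stored and comparing it with the new data, could look roughly like the sketch below. It is an illustration under assumptions rather than the answerer's code: it uses an in-memory SQLite database via sqlite3, and the helper name append_new_rows and the items table are invented. As the answer says, it reads every existing key, so it suits small tables only, and it does not protect against another writer inserting the same key between the read and the write.

```python
import sqlite3

import pandas as pd


def append_new_rows(df, table_name, key, con):
    """Append only the rows of df whose key is not already stored; return the count."""
    existing = pd.read_sql('SELECT "{}" FROM "{}"'.format(key, table_name), con)
    # Keep rows whose key is absent from the table
    fresh = df[~df[key].isin(existing[key])]
    # Also drop duplicates inside the new data itself, keeping the first occurrence
    fresh = fresh.drop_duplicates(subset=[key], keep="first")
    if not fresh.empty:
        fresh.to_sql(table_name, con, if_exists="append", index=False)
    return len(fresh)


# Hypothetical table with one existing row
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id_key INTEGER PRIMARY KEY, price REAL)")
con.execute("INSERT INTO items VALUES (1, 10.0)")

incoming = pd.DataFrame({"id_key": [1, 2, 2, 3], "price": [99.0, 20.0, 25.0, 30.0]})
inserted = append_new_rows(incoming, "items", "id_key", con)

rows = con.execute("SELECT id_key, price FROM items ORDER BY id_key").fetchall()
print(inserted, rows)
```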
Answered by manglano
Pandas doesn't support editing the actual SQL syntax of the .to_sql method, so you might be out of luck. There are some experimental programmatic workarounds (say, read the DataFrame into a SQLAlchemy object with CALCHIPAN and use SQLAlchemy for the transaction), but you may be better served by writing your DataFrame to a CSV and loading it with an explicit MySQL function.
CALCHIPAN repo: https://bitbucket.org/zzzeek/calchipan/
Answered by kztd
I had trouble where I was still getting the IntegrityError
...strange, but I just took the above and worked it backwards:
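import pandas as pd

# Check whether each key already exists, and insert the row only if it does not
for pos, (_, row) in enumerate(df.iterrows()):
    sql = "SELECT * FROM `Table_Name` WHERE `key` = '{}'".format(row.Key)
    found = pd.read_sql(sql, con=Engine)
    if len(found) == 0:
        # iloc needs the row position, which differs from the index label in general
        df.iloc[pos:pos+1].to_sql(name="Table_Name", if_exists='append', con=Engine)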

