Insert or update if exists in MySQL using pandas

Warning: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/40223927/

Time: 2020-09-14 02:16:46  Source: igfitidea

Tags: mysql, python-3.x, pandas

Question by r0xette

I am trying to insert data from an xlsx file into a MySQL table. I want to insert the data into the table, and if there is a duplicate on the primary keys, update the existing row instead of inserting a new one. I have already written a script for this, but I realized it is too much work, and pandas should make it much quicker. How can I achieve this with pandas?

#!/usr/bin/env python3

import pandas as pd
import sqlalchemy

# encoding= was removed from create_engine() in SQLAlchemy 2.0; set the charset in the URL instead
engine_str = 'mysql+pymysql://admin:mypass@localhost/mydb?charset=utf8mb4'
engine = sqlalchemy.create_engine(engine_str, echo=False)

file_name = "tmp/results.xlsx"
df = pd.read_excel(file_name)

Answer by cd98

I can think of two options, but number 1 might be cleaner/faster:

1) Make SQL decide on the update/insert. Check this other question. You can iterate over the rows of your 'df', from i = 0 to n - 1 (the default pandas index starts at 0). Inside the loop, the insertion can be written like this:

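# 'table' is a reserved word in MySQL, so use your real table name here
query = sqlalchemy.text("""INSERT INTO your_table (id, name, age) VALUES (:id, :name, :age)
ON DUPLICATE KEY UPDATE name = :name, age = :age""")
# engine.execute() was removed in SQLAlchemy 2.0; execute on a connection instead
with engine.begin() as conn:
    conn.execute(query, {"id": df.id[i], "name": df.name[i], "age": df.age[i]})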
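Option 1 can also be generalized so the statement is built from the DataFrame's columns instead of being written by hand. The sketch below is a minimal illustration under stated assumptions: build_mysql_upsert is a hypothetical helper (not a pandas or SQLAlchemy function), column names are assumed to be plain identifiers usable as :name placeholders for sqlalchemy.text(), and the update clause uses MySQL's VALUES(col) form so each parameter appears only once (MySQL 8.0.20+ still accepts this form but reports it as deprecated).

```python
def build_mysql_upsert(table, columns, key_columns):
    """Build an INSERT ... ON DUPLICATE KEY UPDATE statement (hypothetical helper).

    Placeholders use the :name style expected by sqlalchemy.text().
    Key columns are inserted but never updated; every other column is
    overwritten with the incoming value when the primary/unique key collides.
    Assumes column names are simple identifiers (letters, digits, underscores).
    """
    def quote(name):
        # Backtick-quote identifiers so reserved words like `table` are safe.
        return "`" + name.replace("`", "``") + "`"

    col_list = ", ".join(quote(c) for c in columns)
    placeholders = ", ".join(":" + c for c in columns)
    # VALUES(col) refers to the value that would have been inserted for col.
    updates = ", ".join(
        f"{quote(c)} = VALUES({quote(c)})" for c in columns if c not in key_columns
    )
    if not updates:
        raise ValueError("need at least one non-key column to update")
    return (
        f"INSERT INTO {quote(table)} ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )
```

With the engine from the question, the whole DataFrame could then be sent in a single executemany call instead of a Python loop, e.g. sql = build_mysql_upsert("your_table", list(df.columns), ["id"]) followed by conn.execute(sqlalchemy.text(sql), df.to_dict("records")) inside a with engine.begin() as conn: block.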

2) Define a Python function that returns True or False depending on whether the record exists, and then use it in your loop:

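def check_existence(user_id):
    query = sqlalchemy.text("SELECT EXISTS (SELECT 1 FROM your_table WHERE user_id_str = :user_id)")
    with engine.connect() as conn:  # engine.execute() is gone in SQLAlchemy 2.0
        return conn.execute(query, {"user_id": user_id}).scalar() == 1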

You could iterate over the rows and do this check before each write, running an UPDATE when the record exists and an INSERT otherwise.
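A self-contained sketch of option 2's loop follows. To keep it runnable without a MySQL server, it uses Python's built-in sqlite3 module as a stand-in (so placeholders are ? rather than %s), and plain tuples stand in for DataFrame rows (for a real df, df.itertuples(index=False) yields comparable tuples). The table name, column names and upsert_rows are hypothetical. Note that this check-then-write pattern costs an extra query per row and is not safe against concurrent writers, which is one reason option 1 is usually preferable.

```python
import sqlite3


def upsert_rows(conn, rows):
    """Insert each (user_id, name, age) row, or update it if the key already exists."""
    cur = conn.cursor()
    for user_id, name, age in rows:
        # Same idea as check_existence(): ask the database whether the key is present.
        exists = cur.execute(
            "SELECT EXISTS (SELECT 1 FROM your_table WHERE user_id_str = ?)", (user_id,)
        ).fetchone()[0] == 1
        if exists:
            cur.execute(
                "UPDATE your_table SET name = ?, age = ? WHERE user_id_str = ?",
                (name, age, user_id),
            )
        else:
            cur.execute(
                "INSERT INTO your_table (user_id_str, name, age) VALUES (?, ?, ?)",
                (user_id, name, age),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (user_id_str TEXT PRIMARY KEY, name TEXT, age INTEGER)")
upsert_rows(conn, [("u1", "Ann", 30), ("u2", "Bob", 25)])
upsert_rows(conn, [("u1", "Ann", 31), ("u3", "Cy", 40)])  # u1 is updated, u3 is inserted
print(conn.execute("SELECT * FROM your_table ORDER BY user_id_str").fetchall())
# [('u1', 'Ann', 31), ('u2', 'Bob', 25), ('u3', 'Cy', 40)]
```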

Please also check the solutions in this question and this one too, which might work in your case.

Answer by Wim Folkerts

When using pandas, no row-by-row iteration is needed. Isn't that faster?

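# csv_file is the path to your input file; engine is the SQLAlchemy engine from the question
df = pd.read_csv(csv_file, sep=';', names=['column'])

# 'overwrite' is not a valid if_exists value and raises a ValueError;
# 'replace' drops and recreates the whole table rather than updating matching rows
df.to_sql('table', con=engine, if_exists='replace', index=False, chunksize=20000)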