REPLACE rows in a MySQL database table with a pandas DataFrame

Note: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow

Original question: http://stackoverflow.com/questions/34661318/
Asked by Yogesh Yadav
Python Version - 2.7.6
Pandas Version - 0.17.1
MySQLdb Version - 1.2.5
In my database (PRODUCT), I have a table (XML_FEED). The table XML_FEED is huge (millions of records). I have a pandas.DataFrame() (PROCESSED_DF). The dataframe has thousands of rows.
Now I need to run this:
REPLACE INTO PRODUCT.XML_FEED
(COL1, COL2, COL3, COL4, COL5)
VALUES (PROCESSED_DF.values)
Question:
Is there a way to run REPLACE INTO TABLE in pandas? I already checked pandas.DataFrame.to_sql(), but that is not what I need. I would prefer not to read the XML_FEED table into pandas, because it is very large.
Accepted answer by Yogesh Yadav
As of this version (0.17.1), I am unable to find any direct way to do this in pandas, so I reported a feature request for it. In my project I did this by executing some queries using MySQLdb and then using DataFrame.to_sql(if_exists='append').
Suppose:
1) product_id is my primary key in table PRODUCT.
2) feed_id is my primary key in table XML_FEED.
SIMPLE VERSION
import MySQLdb
import sqlalchemy
import pandas

con = MySQLdb.connect('localhost', 'root', 'my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password@localhost/database_name'
engine = sqlalchemy.create_engine(con_str)  # because I am using MySQL

df = pandas.read_sql('SELECT * from PRODUCT', con=engine)
df_product_id = df['product_id']
product_id_str = (str(list(df_product_id.values))).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
cur = con.cursor()
cur.execute(delete_str)
con.commit()

# Append the dataframe after the matching rows have been deleted.
# (You can use flavor='mysql' if you do not want to create a sqlalchemy
# engine, but that option is deprecated.)
df.to_sql('XML_FEED', if_exists='append', con=engine)
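For illustration, here is how the IN (...) list for that DELETE is built. This is a standalone sketch with made-up ids in place of df['product_id'], so it needs no database:

```python
# Standalone sketch of the DELETE string construction above,
# using made-up ids instead of df['product_id'].
ids = [101, 102, 103]
product_id_str = str(list(ids)).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
print(delete_str)  # DELETE FROM XML_FEED WHERE feed_id IN (101, 102, 103)
```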
Please note: The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.
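These delete-then-insert semantics can be demonstrated without a MySQL server, since SQLite happens to accept the same REPLACE INTO syntax. A minimal sketch with a made-up table:

```python
import sqlite3

# SQLite also accepts REPLACE INTO, so we can demonstrate the
# delete-then-insert semantics with an in-memory database.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE xml_feed (feed_id INTEGER PRIMARY KEY, val TEXT)')
con.execute("INSERT INTO xml_feed VALUES (1, 'old')")

# Same primary key: the old row is removed and the new row inserted.
con.execute("REPLACE INTO xml_feed (feed_id, val) VALUES (1, 'new')")

rows = con.execute('SELECT feed_id, val FROM xml_feed').fetchall()
print(rows)  # [(1, 'new')]
```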
Answered by dbc
I needed a generic solution to this problem, so I built on shiva's answer--maybe it will be helpful to others. This is useful in situations where you grab a table from a MySQL database (whole or filtered), update/add some rows, and want to perform a REPLACE INTO statement with df.to_sql().
It finds the table's primary keys, performs a delete statement on the MySQL table with all keys from the pandas dataframe, and then inserts the dataframe into the MySQL table.
def to_sql_update(df, engine, schema, table):
    df.reset_index(inplace=True)

    # Find the table's primary-key columns in information_schema.
    sql = '''SELECT column_name FROM information_schema.columns
             WHERE table_schema = '{schema}' AND table_name = '{table}'
                 AND COLUMN_KEY = 'PRI';
          '''.format(schema=schema, table=table)
    id_cols = [x[0] for x in engine.execute(sql).fetchall()]
    id_vals = [df[col_name].tolist() for col_name in id_cols]

    # Delete any existing rows whose primary key appears in the dataframe.
    sql = 'DELETE FROM {schema}.{table} WHERE 0 '.format(schema=schema, table=table)
    for row in zip(*id_vals):
        sql_row = ' AND '.join(["{}='{}'".format(n, v) for n, v in zip(id_cols, row)])
        sql += ' OR ({}) '.format(sql_row)
    engine.execute(sql)

    # Insert the (updated) dataframe.
    df.to_sql(table, engine, schema=schema, if_exists='append', index=False)
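To make the string-building step concrete, here is a standalone sketch of the DELETE statement the function assembles, with hypothetical key columns and values standing in for the information_schema lookup and the dataframe:

```python
# Hypothetical primary-key columns and values standing in for the
# information_schema query and the df[col].tolist() calls above.
id_cols = ['feed_id']
id_vals = [[10, 11]]

sql = 'DELETE FROM PRODUCT.XML_FEED WHERE 0 '
for row in zip(*id_vals):
    sql_row = ' AND '.join("{}='{}'".format(n, v) for n, v in zip(id_cols, row))
    sql += ' OR ({}) '.format(sql_row)
print(sql)  # one OR clause per dataframe row, matching on every key column
```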
Answered by devnull
With the release of pandas 0.24.0, there is now an official way to achieve this by passing a custom insert method to the to_sql function.
I was able to achieve the behavior of REPLACE INTO by passing this callable to to_sql:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql.expression import Insert

    @compiles(Insert)
    def replace_string(insert, compiler, **kw):
        # Compile the INSERT statement normally, then swap the keyword.
        s = compiler.visit_insert(insert, **kw)
        s = s.replace("INSERT INTO", "REPLACE INTO")
        return s

    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(table.table.insert(replace_string=""), data)
You would pass it like so:
df.to_sql(db, if_exists='append', method=mysql_replace_into)
Alternatively, if you want the behavior of INSERT ... ON DUPLICATE KEY UPDATE ... instead, you can use this:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert

    data = [dict(zip(keys, row)) for row in data_iter]
    stmt = insert(table.table).values(data)
    update_stmt = stmt.on_duplicate_key_update(**dict(zip(stmt.inserted.keys(),
                                                          stmt.inserted.values())))
    conn.execute(update_stmt)
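MySQL's ON DUPLICATE KEY UPDATE behavior can likewise be illustrated offline: SQLite's upsert clause (INSERT ... ON CONFLICT ... DO UPDATE) updates the conflicting row in place, roughly like the statement above. A sketch with a made-up table:

```python
import sqlite3

# SQLite's upsert updates the existing row in place, which is roughly
# what MySQL's INSERT ... ON DUPLICATE KEY UPDATE does.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)')
con.execute("INSERT INTO t VALUES (1, 'old')")
con.execute(
    "INSERT INTO t (id, val) VALUES (1, 'new') "
    'ON CONFLICT(id) DO UPDATE SET val = excluded.val'
)
rows = con.execute('SELECT id, val FROM t').fetchall()
print(rows)  # [(1, 'new')]
```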
Credits to https://stackoverflow.com/a/11762400/1919794 for the compile method.
Answered by Imran Malek
If you use to_sql you should be able to define it so that you replace values if they exist, so for a table named 'mydb' and a dataframe named 'df', you'd use:

df.to_sql('mydb', if_exists='replace')
That should replace values if they already exist, but I am not 100% sure if that's what you're looking for.