Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/28735213/

pandas' read_sql with a list of values for WHERE condition

python mysql pandas

Asked by Om Nom

Suppose a dataframe scoreDF:

          date       time      score
sec_code
1048      2015-02-25 09:21:00     28
2888      2015-02-25 09:21:00     25
945       2015-02-25 09:21:00     23
4         2015-02-25 09:21:00     22
669       2015-02-25 09:21:00     15

I need to make a MySQL query that retrieves all rows whose sec_code column matches one of the values in scoreDF.index.

Normally I'd go for a loop:

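    import pandas as pd
    from pandas.io import sql

    # One query per sec_code; con is an open database connection.
    frames = []
    for code in scoreDF.index:
        queryString = 'SELECT * FROM tableA WHERE sec_code = ' + str(code)
        frames.append(sql.read_sql(queryString, con))

    # Combine the per-code results into a single DataFrame.
    finalResultDF = pd.concat(frames)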

Is it possible to do this without a loop, by passing a list of values (i.e. scoreDF.index) as the WHERE condition? I Googled for hours, and some posts mention a 'parameter' argument to read_sql, but I couldn't figure it out.

Answered by vks

You can actually do this without any loop.

queryString = 'SELECT * FROM tableA WHERE sec_code in ' + str(tuple(scoreDF.index.tolist()))

This gives the results directly in a single query. It assumes scoreDF.index is a list; if it is already a tuple, no conversion to a tuple is required.
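
To see the SQL text this approach produces, here is a small sketch. The list sec_codes is only a stand-in for scoreDF.index.tolist(), and the tuple has to be turned into a string with str() before it can be concatenated. It also shows one case worth checking for: a single value renders with a trailing comma.

```python
# Stand-in for scoreDF.index.tolist() (values taken from the example frame).
sec_codes = [1048, 2888, 945, 4, 669]

# str() of a tuple of ints happens to look like an SQL value list.
in_clause = str(tuple(sec_codes))
queryString = 'SELECT * FROM tableA WHERE sec_code in ' + in_clause
print(queryString)

# A one-element tuple renders as "(1048,)", which is not valid SQL,
# so a single value needs separate handling.
single = str(tuple([1048]))
print(single)
```

Because the values are pasted into the SQL text, this is probably best kept to trusted numeric values such as these codes.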

Answered by LiamKelly

As bolec_kolec suggested, I think the best practice is to use params when calling read_sql. Here's how I generally do it (Python 3.7):

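scoreIndex = scoreDF.index.tolist()
# Named placeholder; "= ANY(<list>)" depends on the driver adapting a Python list
# to an SQL array (psycopg2 with PostgreSQL does this).
queryString = 'SELECT * FROM tableA WHERE sec_code = ANY(%(scoreIndex)s)'

queryParams = {'scoreIndex': scoreIndex}
# A positional argument cannot follow a keyword argument, so con is passed by name.
queryResultDF = sql.read_sql(sql=queryString, con=con, params=queryParams)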
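
If the driver in use does not accept a list for a single placeholder, one possible alternative is to generate one placeholder per value and pass the values separately. The sketch below is only an illustration under stated assumptions: build_in_query is a hypothetical helper (not part of pandas), an in-memory SQLite database stands in for the real tableA, and the placeholder is SQLite's "?". Many MySQL drivers use "%s" instead, so the placeholder argument would need to change accordingly.

```python
import sqlite3


def build_in_query(base_sql, values, placeholder="?"):
    """Hypothetical helper: return (sql, params) with one placeholder per value."""
    values = list(values)
    if not values:
        # "IN ()" is not valid SQL, so refuse an empty list explicitly.
        raise ValueError("at least one value is required for an IN clause")
    placeholders = ", ".join([placeholder] * len(values))
    return f"{base_sql} IN ({placeholders})", values


# Assumed stand-in for the real table: a tiny in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableA (sec_code INTEGER, name TEXT)")
con.executemany(
    "INSERT INTO tableA VALUES (?, ?)",
    [(1048, "a"), (2888, "b"), (945, "c"), (7, "d")],
)

# Stand-in for scoreDF.index.tolist().
sec_codes = [1048, 2888, 945, 4, 669]

query, params = build_in_query("SELECT * FROM tableA WHERE sec_code", sec_codes)
rows = con.execute(query + " ORDER BY sec_code", params).fetchall()

print(query)
print(rows)
```

The same (query, params) pair can then be handed to pandas as read_sql(query, con, params=params), so the values are bound by the driver rather than pasted into the SQL string.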