使用参数从 SQL Server 读取:pandas(或 pyodbc)无法正常运行
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/37051297/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Reading from SQL Server with params: pandas (or pyodbc) not functioning properly
提问by mburke05
I'm using a query in SQL Server that requires a range to check whether a number is in that range (e.g. in the below to check if DemographicGroupDimID
is either (1,2 or 3) . After doing some googling the only way I found to be able to do that was the below:
我在 SQL Server 中使用一个查询,它需要一个范围来检查一个数字是否在该范围内(例如,在下面检查是否DemographicGroupDimID
是 (1,2 或 3) 。在做了一些谷歌搜索之后我发现了唯一的方法能够做到这一点如下:
SQL
查询语句
DECLARE @adults table (Id int)
INSERT INTO @adults VALUES (1), (2), (3)
SELECT [date], [station], [impression] = SUM([impressions]) / COUNT(DISTINCT [datetime] )
FROM
(SELECT [datetime] = DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), [date] = ddt.DateKey, [station] = nd.Name, [impressions] = SUM(naf.Impression)
FROM [Nielsen].[dbo].[NielsenAnalyticsFact] as naf
LEFT JOIN [dbo].[DateDim] AS ddt
ON naf.StartDateDimID = ddt.DateDimID
LEFT JOIN [dbo].NetworkDim as nd
ON naf.NetworkDimID = nd.NetworkDimID
LEFT JOIN [dbo].TimeDim as td
ON naf.QuarterHourDimID = td.TimeDimID
WHERE (naf.NielsenMarketDimID = 1
AND naf.RecordTypeDimID = 2
AND naf.AudienceEstimateTypeDimID = 1
AND naf.DailyOrWeeklyDimID = 1
AND naf.RecordSequenceCodeDimID = 5
AND naf.ViewingTypeDimID = 4
AND naf.QuarterHourDimID IS NOT NULL
AND naf.DemographicGroupDimID < 31
AND nd.Affiliation = 'Cable'
AND naf.NetworkDimID != 1278
AND naf.DemographicGroupDimID in (SELECT Id FROM @adults))
GROUP BY DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), nd.Name, ddt.DateKey)
AS grouped_table
GROUP BY [date], [station]
ORDER BY [date], [station]
If I need to dynamically do this, with different ranges, this fails, like so:
如果我需要使用不同的范围动态执行此操作,则会失败,如下所示:
Pandas query
Pandas查询
from queries import DB_CREDENTIALS
import pyodbc
import pandas as pd
sql_ = """DECLARE @adults table (Id int)
INSERT INTO @adults VALUES ?
SELECT [date], [station], [impression] = SUM([impressions]) / COUNT(DISTINCT [datetime] )
FROM
(SELECT [datetime] = DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), [date] = ddt.DateKey, [station] = nd.Name, [impressions] = SUM(naf.Impression)
FROM [Nielsen].[dbo].[NielsenAnalyticsFact] as naf
LEFT JOIN [dbo].[DateDim] AS ddt
ON naf.StartDateDimID = ddt.DateDimID
LEFT JOIN [dbo].NetworkDim as nd
ON naf.NetworkDimID = nd.NetworkDimID
LEFT JOIN [dbo].TimeDim as td
ON naf.QuarterHourDimID = td.TimeDimID
WHERE (naf.NielsenMarketDimID = 1
AND naf.RecordTypeDimID = 2
AND naf.AudienceEstimateTypeDimID = 1
AND naf.DailyOrWeeklyDimID = 1
AND naf.RecordSequenceCodeDimID = 5
AND naf.ViewingTypeDimID = 4
AND naf.QuarterHourDimID IS NOT NULL
AND naf.DemographicGroupDimID < 31
AND nd.Affiliation = 'Cable'
AND naf.NetworkDimID != 1278
AND naf.DemographicGroupDimID in (SELECT Id FROM @adults))
GROUP BY DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), nd.Name, ddt.DateKey)
AS grouped_table
GROUP BY [date], [station]
ORDER BY [date], [station]"""
with pyodbc.connect(DB_CREDENTIALS) as cnxn:
df = pd.read_sql(sql=sql_, con=cnxn, params=['(30)'])
The error:
错误:
---------------------------------------------------------------------------
DatabaseError Traceback (most recent call last)
<ipython-input-5-4b63847d007f> in <module>()
1 with pyodbc.connect(DB_CREDENTIALS) as cnxn:
----> 2 df = pd.read_sql(sql=sql_, con=cnxn, params=['(30)'])
C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
497 sql, index_col=index_col, params=params,
498 coerce_float=coerce_float, parse_dates=parse_dates,
--> 499 chunksize=chunksize)
500
501 try:
C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize)
1593
1594 args = _convert_params(sql, params)
-> 1595 cursor = self.execute(*args)
1596 columns = [col_desc[0] for col_desc in cursor.description]
1597
C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in execute(self, *args, **kwargs)
1570 ex = DatabaseError(
1571 "Execution failed on sql '%s': %s" % (args[0], exc))
-> 1572 raise_with_traceback(ex)
1573
1574 @staticmethod
C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in execute(self, *args, **kwargs)
1558 cur.execute(*args, **kwargs)
1559 else:
-> 1560 cur.execute(*args)
1561 return cur
1562 except Exception as exc:
DatabaseError: Execution failed on sql 'DECLARE @adults table (Id int)
INSERT INTO @adults VALUES ?
SELECT [date], [station], [impression] = SUM([impressions]) / COUNT(DISTINCT [datetime] )
FROM
(SELECT [datetime] = DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), [date] = ddt.DateKey, [station] = nd.Name, [impressions] = SUM(naf.Impression)
FROM [Nielsen].[dbo].[NielsenAnalyticsFact] as naf
LEFT JOIN [dbo].[DateDim] AS ddt
ON naf.StartDateDimID = ddt.DateDimID
LEFT JOIN [dbo].NetworkDim as nd
ON naf.NetworkDimID = nd.NetworkDimID
LEFT JOIN [dbo].TimeDim as td
ON naf.QuarterHourDimID = td.TimeDimID
WHERE (naf.NielsenMarketDimID = 1
AND naf.RecordTypeDimID = 2
AND naf.AudienceEstimateTypeDimID = 1
AND naf.DailyOrWeeklyDimID = 1
AND naf.RecordSequenceCodeDimID = 5
AND naf.ViewingTypeDimID = 4
AND naf.QuarterHourDimID IS NOT NULL
AND naf.DemographicGroupDimID < 31
AND nd.Affiliation = 'Cable'
AND naf.NetworkDimID != 1278
AND naf.DemographicGroupDimID in (SELECT Id FROM @adults))
GROUP BY DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), nd.Name, ddt.DateKey)
AS grouped_table
GROUP BY [date], [station]
ORDER BY [date], [station]': ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near '@P1'. (102) (SQLExecDirectW); [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)")
Is this because the declare statement needs to be within the bounds of the select statement itself? I'm not sure how pandas
handles the pyodbc
cursor object so I'm unsure where this error stems from.
这是因为声明语句需要在选择语句本身的范围内吗?我不确定如何pandas
处理pyodbc
游标对象,所以我不确定这个错误的根源。
Edit: Just to note, the param I passed in this instance was (30)
just to use the simple case of when there is only one number in the range which fails. It of course also fails for more complex strings like (1), (2), (3)
as was the case with the example above.
编辑:请注意,我在此实例中传递的参数(30)
只是为了使用范围内只有一个数字失败的简单情况。对于更复杂的字符串,它当然也会失败,就像(1), (2), (3)
上面的例子一样。
采纳答案by MaxU
If you use prepared statementsin your SQL, you can't put multiple values for one placeholder/parameter/bind variable!
如果您在 SQL 中使用准备好的语句,则不能为一个占位符/参数/绑定变量放置多个值!
Beside this you can use placeholders/parameters/bind variables only in place of literals, you can't use it for part of SQL statement which is nota literal.
除此之外,您只能使用占位符/参数/绑定变量代替文字,您不能将它用于不是文字的 SQL 语句的一部分。
In your case you tried to put (
and )
which is part of SQL, but nota literal as parameters.
在您的情况下,您尝试将(
和放入)
SQL 的一部分,而不是将文字作为参数。
Using parameters/prepared statements/bind variable will also protect you from some SQL injections.
使用参数/准备好的语句/绑定变量也将保护您免受一些 SQL 注入。
that said, try to change your code as follows:
也就是说,尝试按如下方式更改您的代码:
change
改变
INSERT INTO @adults VALUES ?
to
到
INSERT INTO @adults VALUES (?)
and
和
df = pd.read_sql(sql=sql_, con=cnxn, params=['(30)'])
to
到
df = pd.read_sql(sql=sql_, con=cnxn, params=['30'])
UPDATE:
更新:
you can prepare your SQL this way:
您可以通过以下方式准备 SQL:
In [9]: vals = [20,30,40]
In [32]: vals
Out[32]: [20, 30, 40]
In [33]: ' (?)' * len(vals)
Out[33]: ' (?) (?) (?)'
then:
然后:
In [14]: sql_ = """DECLARE @adults table (Id int)
....: INSERT INTO @adults VALUES {}
....:
....: SELECT [date],
....: """
In [15]: sql_.format(' (?)' * len(vals))
Out[15]: 'DECLARE @adults table (Id int)\nINSERT INTO @adults VALUES (?) (?) (?)\n\nSELECT [date],\n'
Pay attention at generated (?) (?) (?)
注意生成 (?) (?) (?)
and finally call your SQL:
最后调用你的 SQL:
df = pd.read_sql(sql=sql_.format(' (?)' * len(vals)), con=cnxn, params=vals)