Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/139926/
Regular expression to match common SQL syntax?
Asked by Omar Kooheji
I was writing some Unit tests last week for a piece of code that generated some SQL statements.
I was trying to figure out a regex to match SELECT, INSERT and UPDATE syntax so I could verify that my methods were generating valid SQL, and after 3-4 hours of searching and messing around with various regex editors I gave up.
I managed to get partial matches but because a section in quotes can contain any characters it quickly expands to match the whole statement.
Any help would be appreciated. I'm not very good with regular expressions, but I'd like to learn more about them.
By the way it's C# RegEx that I'm after.
Clarification
I don't want to need access to a database, as this is part of a unit test and I don't want to have to maintain a database to test my code, which may live longer than the project.
Answer by Pablo Marambio
Regular expressions can only match languages that a finite state automaton can parse, which is a very limited class, whereas SQL has a full grammar with nested constructs such as subqueries. It can be demonstrated that you can't validate SQL with a regex. So, you can stop trying.
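To make the limitation concrete, here is a small illustration (not a proof) in Python, chosen only for convenience since the question targets C#. The pattern ONE_LEVEL and the helper balanced are hypothetical names, and the example ignores quoted strings entirely. A pattern written for one level of parentheses breaks as soon as a subquery is nested one level deeper, while a plain counter, which is something a classic regular expression has no way to keep, copes with any depth.

```python
import re

# Hypothetical pattern: accepts text whose parentheses are nested at most one level deep.
ONE_LEVEL = re.compile(r"(?:[^()]|\([^()]*\))*")

def balanced(text):
    """Check parenthesis balance with a counter, which works for any nesting depth."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its "("
                return False
    return depth == 0

flat = "SELECT a FROM (SELECT a FROM t)"
nested = "SELECT a FROM (SELECT a FROM (SELECT a FROM t))"

print(ONE_LEVEL.fullmatch(flat) is not None)    # True
print(ONE_LEVEL.fullmatch(nested) is not None)  # False: one level deeper than the pattern allows
print(balanced(nested))                         # True
```

The pattern could be extended to two levels, then three, but every fixed pattern stops at some depth, and SQL subqueries can nest further than that.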
Answer by Constantin
SQL is described by a type-2 (context-free) grammar, which is too powerful to be described by regular expressions. It's the same as if you decided to generate C# code and then validate it without invoking a compiler. A database engine is in general too complex to be easily stubbed.
That said, you may try ANTLR's SQL grammars.
Answer by George Mauer
I had the same problem. An approach that would work for most standard SQL statements is to spin up an in-memory SQLite database and issue the query against it; if you get back a "table does not exist" error, then your query parsed properly.
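A minimal sketch of this approach, using Python's built-in sqlite3 module for convenience (the question targets C#, where a SQLite binding would play the same role). The helper name parses_ok is made up for illustration. Note that SQLite words the missing-table error as "no such table: <name>", and that SQLite's dialect differs from other engines, so this only covers fairly standard statements.

```python
import sqlite3

def parses_ok(sql):
    """Return True if SQLite accepts the statement's syntax (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")  # empty throwaway database
    try:
        conn.execute(sql)
    except sqlite3.OperationalError as exc:
        # The parser succeeded if the only complaint is the missing table.
        return str(exc).startswith("no such table")
    finally:
        conn.close()
    return True

print(parses_ok("SELECT * FROM non_existent_table"))  # True
print(parses_ok("SELECT * FRM non_existent_table"))   # False
```

In a unit test this could be used as a plain assertion on each generated statement.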
Answer by jason saldo
As far as I know, this is beyond regex, and you're getting close to the dark arts of BNF and compilers.
The same thing happens to people who want to do correct syntax highlighting. You start cramming things into regexes and then you end up writing a compiler...
Answer by JeeBee
Off the top of my head: couldn't you pass each generated SQL statement to a database, use EXPLAIN on it, and catch any exceptions, which would indicate poorly formed SQL?
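As a rough sketch of the EXPLAIN idea, the following uses SQLite through Python's sqlite3 module (an assumption; the answer does not name an engine, and EXPLAIN output and behavior vary between databases). The users table and the explain_error helper are hypothetical. In SQLite, prefixing a statement with EXPLAIN compiles it and returns the execution plan without running it, so an INSERT checked this way leaves the table untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, only so the statements have something to refer to.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

def explain_error(conn, sql):
    """Return None if the engine can compile the statement, else the error text."""
    try:
        conn.execute("EXPLAIN " + sql).fetchall()
    except sqlite3.Error as exc:
        return str(exc)
    return None

good = "INSERT INTO users (name) VALUES ('Omar')"
bad = "INSRT INTO users (name) VALUES ('Omar')"

print(explain_error(conn, good))  # None
print(explain_error(conn, bad))   # a syntax error message from SQLite
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(row_count)  # 0, the INSERT was only explained, never executed
```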
Answer by Pop Catalin
To validate the queries, just run them with SET NOEXEC ON; that is how SQL Server's Enterprise Manager does it when you parse a query without executing it.
Besides, if you are using a regex to validate SQL queries, you can be almost certain that you will miss some corner cases, or that a query will be invalid for other reasons even if it is syntactically correct.
Answer by Marcin
I suggest creating a database with the same schema, possibly using an embedded SQL engine, and passing the SQL to that.
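One way this could look, sketched with SQLite as the embedded engine via Python's sqlite3 module. The schema, the statements, and the find_errors helper are all invented for illustration; a real test would load the project's own DDL, and the embedded engine's dialect may not match the production database exactly.

```python
import sqlite3

# Hypothetical schema, standing in for the real one.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
"""

def find_errors(schema_sql, statements):
    """Run each statement against a scratch database; map failing SQL to its error text."""
    conn = sqlite3.connect(":memory:")
    errors = {}
    try:
        conn.executescript(schema_sql)
        for sql in statements:
            try:
                conn.execute(sql)
            except sqlite3.Error as exc:
                errors[sql] = str(exc)
            finally:
                conn.rollback()  # discard any changes the statement made
    finally:
        conn.close()
    return errors

generated = [
    "SELECT name FROM customers WHERE id = 1",
    "UPDATE orders SET total = 0 WHERE customer_id = 1",
    "SELECT nme FROM customers",             # misspelled column
    "INSERT INTO invoices (id) VALUES (1)",  # table not in the schema
]
errors = find_errors(SCHEMA, generated)
for sql, message in errors.items():
    print(f"{message}  <-  {sql}")
```

Because the schema exists, this catches wrong table and column names as well as syntax errors.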
Answer by Orion Adrian
Have you tried lazy quantifiers? Rather than matching as much as possible, they match as little as possible, which is probably what you need for quotes.
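A short illustration of the difference, written in Python for convenience; the same quantifier syntax (`.*` versus `.*?`) also works in .NET regular expressions. The sample statements are invented. The last pattern is an extra suggestion that is not part of the answer: it treats a doubled quote as an escaped quote, which a lazy quantifier alone gets wrong.

```python
import re

sql = "INSERT INTO notes (title, body) VALUES ('first', 'second')"

# Greedy: runs from the first quote to the last one in the statement.
greedy = re.findall(r"'.*'", sql)
# Lazy: stops at the nearest closing quote.
lazy = re.findall(r"'.*?'", sql)

print(greedy)  # ["'first', 'second'"]
print(lazy)    # ["'first'", "'second'"]

# SQL escapes a quote inside a string literal by doubling it.
escaped = "SELECT 'it''s' AS s"
lazy_escaped = re.findall(r"'.*?'", escaped)
quote_aware = re.findall(r"'(?:[^']|'')*'", escaped)

print(lazy_escaped)  # ["'it'", "'s'"]
print(quote_aware)   # ["'it''s'"]
```

This helps with pulling string literals out of a statement, but it does not address the wider problem of validating the statement as a whole.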
Answer by MattMcKnight
There are ANTLR grammars to parse SQL. It's really a better idea to use an in-memory database or a very lightweight database such as SQLite. It seems wasteful to me to test whether the SQL is valid from a parsing standpoint, and much more useful to check the table and column names and the specifics of your query.
Answer by David Aldridge
I don't think that you even need to have the schema created to be able to validate the statement, because the system will not try to resolve object_name etc until it has successfully parsed the statement.
With Oracle as an example, you would certainly get an error if you did:
select * from non_existant_table;
In this case, "ORA-00942: table or view does not exist".
However if you execute:
select * frm non_existant_table;
Then you'll get a syntax error, "ORA-00923: FROM keyword not found where expected".
It ought to be possible to classify errors into parsing errors that indicate incorrect syntax and errors relating to table names, permissions, etc.
Add to that the problem of different RDBMSs, and even different versions, allowing different syntaxes, and I think you really have to go to the db engine for this task.
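The error classification described in this answer might be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module rather than Oracle, so it keys on SQLite's message texts instead of the ORA codes quoted above, and the classify helper and its category names are made up. With another engine the same shape would apply, with that engine's own error codes in place of the string checks.

```python
import sqlite3

def classify(sql):
    """Classify a statement as 'ok', 'syntax', 'unresolved-name' or 'other' (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")  # no schema created, as the answer suggests
    try:
        # EXPLAIN makes SQLite compile the statement without running it.
        conn.execute("EXPLAIN " + sql)
    except sqlite3.OperationalError as exc:
        message = str(exc)
        if "syntax error" in message or message.startswith("incomplete input"):
            return "syntax"
        if message.startswith(("no such table", "no such column")):
            return "unresolved-name"
        return "other"
    finally:
        conn.close()
    return "ok"

print(classify("select * from non_existant_table"))  # unresolved-name
print(classify("select * frm non_existant_table"))   # syntax
print(classify("select 1"))                          # ok
```

For a test that only cares about syntax, "ok" and "unresolved-name" would both count as a pass.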