Regex to validate JSON

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/2583472/

Time: 2020-09-03 17:27:01  Source: igfitidea


regex json

Asked by Shard

I am looking for a regex that allows me to validate JSON.

I am very new to regexes, and I know enough to understand that parsing with regex is bad, but can it be used to validate?

Answered by mario

Yes, a complete regex validation is possible.


Most modern regex implementations allow for recursive regular expressions, which can verify a complete JSON serialized structure. The json.org specification makes it quite straightforward.

$pcre_regex = '
  /
  (?(DEFINE)
     (?<number>   -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )    
     (?<boolean>   true | false | null )
     (?<string>    " ([^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
     (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
     (?<pair>      \s* (?&string) \s* : (?&json)  )
     (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
     (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
  )
  \A (?&json) \Z
  /six   
';

It works quite well in PHP with the PCRE functions. It should work unmodified in Perl, and can certainly be adapted for other languages. It also succeeds with the JSON test cases.

Simpler RFC4627 verification


A simpler approach is the minimal consistency check specified in RFC4627, section 6. It is, however, intended only as a security test and a basic precaution against invalid input:

  var my_JSON_object = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
         text.replace(/"(\\.|[^"\\])*"/g, ''))) &&
     eval('(' + text + ')');
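
To make the two steps of that check easier to see, here is a minimal Ruby sketch of the same idea. The method name `rfc4627_plausible?` is made up for this example, and the `eval` step is deliberately left out, so it only shows the character test, not a real validation.

```ruby
# Hypothetical helper: a port of the character check from RFC 4627, section 6.
# It is a plausibility test, not a validator.
def rfc4627_plausible?(text)
  # Step 1: drop every double-quoted string literal, honouring backslash escapes.
  without_strings = text.gsub(/"(\\.|[^"\\])*"/, '')
  # Step 2: what remains may only contain JSON punctuation, digits, whitespace
  # and the letters that occur in true, false, null and exponents.
  !without_strings.match?(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/)
end

puts rfc4627_plausible?('{"a": [1, true, null]}')
puts rfc4627_plausible?('alert("x")')
puts rfc4627_plausible?('{,,,}')
```

The first call passes and the second is rejected because of the parentheses. The third call also passes even though `{,,,}` is not JSON, which is why this check is only a precaution before real parsing.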

Answered by Hrant Khachatrian

Yes, it's a common misconception that Regular Expressions can match only regular languages. In fact, the PCRE functions can match much more than regular languages; they can even match some non-context-free languages! Wikipedia's article on RegExps has a special section about it.

JSON can be recognized using PCRE in several ways! @mario showed one great solution using named subpatterns and back-references. Then he noted that there should be a solution using recursive patterns (?R). Here is an example of such a regexp written in PHP:

$regexString = '"([^"\\\\]*|\\\\["\\\\bfnrt\/]|\\\\u[0-9a-f]{4})*"';
$regexNumber = '-?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?';
$regexBoolean= 'true|false|null'; // these are actually copied from Mario's answer
$regex = '/\A('.$regexString.'|'.$regexNumber.'|'.$regexBoolean.'|';    //string, number, boolean
$regex.= '\[(?:(?1)(?:,(?1))*)?\s*\]|'; //arrays
$regex.= '\{(?:\s*'.$regexString.'\s*:(?1)(?:,\s*'.$regexString.'\s*:(?1))*)?\s*\}';    //objects
$regex.= ')\Z/is';

I'm using (?1) instead of (?R) because the latter references the entire pattern, but we have \A and \Z sequences that should not be used inside subpatterns. (?1) refers to the regexp marked by the outermost parentheses (this is why the outermost ( ) does not start with ?:). So, the RegExp becomes 268 characters long :)

/\A("([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"|-?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?|true|false|null|\[(?:(?1)(?:,(?1))*)?\s*\]|\{(?:\s*"([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"\s*:(?1)(?:,\s*"([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"\s*:(?1))*)?\s*\})\Z/is

Anyway, this should be treated as a "technology demonstration", not as a practical solution. In PHP I'll validate the JSON string by calling the json_decode() function (just like @Epcylon noted). If I'm going to use that JSON (if it's validated), then this is the best method.
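
The same "just let the parser decide" approach can be sketched in Ruby, the other language used on this page. This is only an illustration: `valid_json?` is a made-up helper name, and it relies on `JSON.parse` from Ruby's standard `json` library raising `JSON::ParserError` on malformed input.

```ruby
require 'json'

# Hypothetical helper: true when the standard library parser accepts the text.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

puts valid_json?('{"a": [1, 2, 3]}')
puts valid_json?('{"a": [1, 2, 3')
```

In real code you would normally keep the parsed result instead of throwing it away, since parsing twice is wasted work.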

Answered by Bart Kiers

Because of the recursive nature of JSON (nested {...}-s), regex is not suited to validate it. Sure, some regex flavours can recursively match patterns* (and can therefore match JSON), but the resulting patterns are horrible to look at, and should never ever be used in production code, IMO!

* Beware though, many regex implementations do not support recursive patterns. Of the popular programming languages, these support recursive patterns: Perl, .NET, PHP and Ruby 1.9.2
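
To show what "recursively match patterns" means in practice, here is a small Ruby sketch. It is not a JSON validator: the constant name `NESTED_ARRAYS` is invented for this example, and the pattern only recognizes nested, comma-separated empty arrays. The point is the `\g<arr>` call, which makes the group refer to itself.

```ruby
# Hypothetical pattern: an array is "[", optionally a comma-separated
# list of arrays (the group calls itself via \g<arr>), then "]".
NESTED_ARRAYS = /\A(?<arr>\[(?:\g<arr>(?:,\g<arr>)*)?\])\z/

puts NESTED_ARRAYS.match?('[[],[[]]]')  # balanced brackets
puts NESTED_ARRAYS.match?('[[]')        # one bracket never closed
```

A flavour without this kind of self-reference cannot check arbitrary nesting depth with a single pattern.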

Answered by Gino Pane

I tried @mario's answer, but it didn't work for me: I downloaded the test suite from JSON.org (archive) and there were 4 failed tests (fail1.json, fail18.json, fail25.json, fail27.json).

I investigated the errors and found that fail1.json is actually correct (according to the manual's note and RFC-7159, a valid string is also valid JSON). File fail18.json should not fail either, because it contains correct, deeply nested JSON:

[[[[[[[[[[[[[[[[[[[["Too deep"]]]]]]]]]]]]]]]]]]]]

So two files are left: fail25.json and fail27.json:

["  tab character   in  string  "]

and

["line
break"]

Both contain invalid characters. So I've updated the pattern like this (string subpattern updated):

$pcreRegex = '/
          (?(DEFINE)
             (?<number>   -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )
             (?<boolean>   true | false | null )
             (?<string>    " ([^"\n\r\t\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
             (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
             (?<pair>      \s* (?&string) \s* : (?&json)  )
             (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
             (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
          )
          \A (?&json) \Z
          /six';

So now all legal tests from json.org can be passed.
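
The effect of excluding raw control characters can be tried in isolation with a string-only pattern. The Ruby sketch below is an illustration, not the pattern from this answer: `JSON_STRING` is a made-up name, and it goes a step further by excluding the whole range U+0000 through U+001F, which is the set the JSON RFC requires to be escaped.

```ruby
# Hypothetical constant: matches exactly one JSON string literal.
# Raw control characters (U+0000..U+001F) are excluded from the plain
# character class, so they are only accepted in escaped form.
JSON_STRING = /\A"(?:[^"\\\x00-\x1f]|\\["\\\/bfnrt]|\\u[0-9a-fA-F]{4})*"\z/

puts JSON_STRING.match?("\"tab\tinside\"")  # contains a raw tab character
puts JSON_STRING.match?('"tab\tinside"')    # contains backslash followed by t
```

The first input mirrors fail25.json and is rejected; the second uses the escaped form and is accepted.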

Answered by cjbarth

Looking at the documentation for JSON, it seems that the regex can simply be three parts if the goal is just to check for fitness:


  1. The string starts and ends with either [] or {}
    • [{\[]{1}...[}\]]{1}
  2. and
    1. The character is an allowed JSON control character (just one)
      • ...[,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]...
    2. or the set of characters contained in a ""
      • ...".*?"...

All together: [{\[]{1}([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]{1}


If the JSON string contains newline characters, then you should use the singleline switch on your regex flavor so that . matches newline. Please note that this will not fail on all bad JSON, but it will fail if the basic JSON structure is invalid, which is a straightforward way to do a basic sanity validation before passing it to a parser.
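
Here is how that combined pattern might look in Ruby, as a rough sketch. A few things are my assumptions rather than part of the answer: the name `BASIC_SHAPE`, the `\A` and `\z` anchors, and dropping the redundant `{1}` quantifiers. In Ruby the switch that lets `.` match newlines is `/m`.

```ruby
# Hypothetical constant: the "basic fitness" pattern, anchored to the whole input.
# /m makes . match newline characters inside quoted strings.
BASIC_SHAPE = /\A[{\[]([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]\z/m

puts BASIC_SHAPE.match?("{\n  \"a\": 1\n}")  # pretty-printed object
puts BASIC_SHAPE.match?('{"a": 1')           # missing closing brace
puts BASIC_SHAPE.match?('{,,,}')             # not JSON, but the shape check passes
puts BASIC_SHAPE.match?('{}')                # valid JSON, but the + needs one item
```

The last two calls show both kinds of misjudgement: `{,,,}` is accepted and the valid empty object `{}` is rejected, so this really is only a sanity check before a parser sees the data.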

Answered by pmarreck

I created a Ruby implementation of Mario's solution, which does work:


# encoding: utf-8

module Constants
  JSON_VALIDATOR_RE = /(
         # define subtypes and build up the json syntax, BNF-grammar-style
         # The {0} is a hack to simply define them as named groups here but not match on them yet
         # I added some atomic grouping to prevent catastrophic backtracking on invalid inputs
         (?<number>  -?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?){0}
         (?<boolean> true | false | null ){0}
         (?<string>  " (?>[^"\\]* | \\ ["\\bfnrt\/] | \\ u [0-9a-f]{4} )* " ){0}
         (?<array>   \[ (?> \g<json> (?: , \g<json> )* )? \s* \] ){0}
         (?<pair>    \s* \g<string> \s* : \g<json> ){0}
         (?<object>  \{ (?> \g<pair> (?: , \g<pair> )* )? \s* \} ){0}
         (?<json>    \s* (?> \g<number> | \g<boolean> | \g<string> | \g<array> | \g<object> ) \s* ){0}
       )
    \A \g<json> \Z
    /uix
end

########## inline test running
if __FILE__==$PROGRAM_NAME

  # support
  class String
    def unindent
      gsub(/^#{scan(/^(?!\n)\s*/).min_by{|l|l.length}}/u, "")
    end
  end

  require 'test/unit' unless defined? Test::Unit
  class JsonValidationTest < Test::Unit::TestCase
    include Constants

    def setup

    end

    def test_json_validator_simple_string
      assert_not_nil %s[ {"somedata": 5 }].match(JSON_VALIDATOR_RE)
    end

    def test_json_validator_deep_string
      long_json = <<-JSON.unindent
      {
          "glossary": {
              "title": "example glossary",
          "GlossDiv": {
                  "id": 1918723,
                  "boolean": true,
                  "title": "S",
            "GlossList": {
                      "GlossEntry": {
                          "ID": "SGML",
                "SortAs": "SGML",
                "GlossTerm": "Standard Generalized Markup Language",
                "Acronym": "SGML",
                "Abbrev": "ISO 8879:1986",
                "GlossDef": {
                              "para": "A meta-markup language, used to create markup languages such as DocBook.",
                  "GlossSeeAlso": ["GML", "XML"]
                          },
                "GlossSee": "markup"
                      }
                  }
              }
          }
      }
      JSON

      assert_not_nil long_json.match(JSON_VALIDATOR_RE)
    end

  end
end

Answered by user117529

A trailing comma in a JSON array caused my Perl 5.16 to hang, possibly because it kept backtracking. I had to add a backtrack-terminating directive:


(?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) )(*PRUNE) \s* )
                                                                                   ^^^^^^^^

This way, once it identifies a construct that is not 'optional' (* or ?), it shouldn't try backtracking over it to try to identify it as something else.
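
The "do not backtrack over what was already matched" idea can also be seen with atomic groups `(?>...)`, which the Ruby answer on this page uses. The toy Ruby sketch below is not the JSON pattern; the constant names are made up, and it only contrasts a normal quantifier with an atomic one.

```ruby
# Hypothetical patterns for illustration only.
GREEDY = /\Aa+ab\z/       # a+ may give back one "a" so that "ab" can match
ATOMIC = /\A(?>a+)ab\z/   # the atomic group keeps every "a" it consumed

puts GREEDY.match?('aaab')  # => true
puts ATOMIC.match?('aaab')  # => false
```

On valid input both styles behave the same; the difference only shows on input that fails, where the atomic version gives up instead of retrying every split.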

Answered by Mikaeru

For "strings and numbers", I think that the partial regular expression for numbers:


-?(?:0|[1-9]\d*)(?:\.\d+)(?:[eE][+-]\d+)?

should be instead:


-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?

since the decimal part of the number is optional, and also it is probably safer to escape the - symbol in [+-] since it has a special meaning between brackets.
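
A quick way to check the corrected number pattern is to anchor it and try a few inputs. This Ruby sketch assumes the made-up constant name `JSON_NUMBER` and adds `\A` and `\z` so the whole input has to be a number.

```ruby
# Hypothetical constant: the corrected number pattern, anchored to the whole input.
JSON_NUMBER = /\A-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?\z/

%w[0 -12.5e+3 1E10 42].each do |text|
  puts "#{text}: #{JSON_NUMBER.match?(text)}"   # all of these are accepted
end

%w[01 1. .5 +1].each do |text|
  puts "#{text}: #{JSON_NUMBER.match?(text)}"   # all of these are rejected
end
```

With the uncorrected version, plain integers such as 42 would be rejected because the fractional part was mandatory.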

Answered by Ravi Nandasana



Regex that validates a simple JSON object, not a JSON array

It validates key(string):value(string,integer,[{key:value},{key:value}],{key:value})

^\{(\s|\n\s)*(("\w*"):(\s)*("\w*"|\d*|(\{(\s|\n\s)*(("\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))((,(\s|\n\s)*"\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))*(\s|\n\s)*\}){1}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d*|(\{(\s|\n\s)*(("\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))((,(\s|\n\s)*"\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))*(\s|\n\s)*\}){1}))*(\s|\n)*\}$

Sample data validated by this regex:

{
"key":"string",
"key": 56,
"key":{
        "attr":"integer",
        "attr": 12
        },
"key":{
        "key":[
            {
                "attr": 4,
                "attr": "string"
            }
        ]
     }
}

Answered by exside

As was written above, if the language you use has a JSON-library coming with it, use it to try decoding the string and catch the exception/error if it fails! If the language does not (just had such a case with FreeMarker) the following regex could at least provide some very basic validation (it's written for PHP/PCRE to be testable/usable for more users). It's not as foolproof as the accepted solution, but also not that scary =):


~^\{\s*\".*\}$|^\[\n?\{\s*\".*\}\n?\]$~s

short explanation:


// we have two possibilities in case the string is JSON
// 1. the string passed is "just" a JSON object, e.g. {"item": [], "anotheritem": "content"}
// this can be matched by the following regex which makes sure there is at least a {" at the
// beginning of the string and a } at the end of the string, whatever is inbetween is not checked!

^\{\s*\".*\}$

// OR (character "|" in the regex pattern)
// 2. the string passed is a JSON array, e.g. [{"item": "value"}, {"item": "value"}]
// which would be matched by the second part of the pattern above

^\[\n?\{\s*\".*\}\n?\]$

// the s modifier is used to make "." also match newline characters (can happen in prettyfied JSON)
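
For readers who want to try the two alternatives explained above, here is a rough Ruby transcription. The name `ROUGH_JSON` is invented, and I swapped ^ and $ for `\A` and `\z` because in Ruby ^ and $ always work per line; `/m` is Ruby's equivalent of the s modifier.

```ruby
# Hypothetical constant: object form OR array-of-objects form, nothing checked in between.
ROUGH_JSON = /\A\{\s*".*\}\z|\A\[\n?\{\s*".*\}\n?\]\z/m

puts ROUGH_JSON.match?('{"item": [], "anotheritem": "content"}')
puts ROUGH_JSON.match?('[{"item": "value"}, {"item": "value"}]')
puts ROUGH_JSON.match?('[1, 2]')       # valid JSON, but not an array of objects
puts ROUGH_JSON.match?('{"a" oops}')   # not JSON, but starts with {" and ends with }
```

The first two inputs are accepted. The third is rejected although it is valid JSON, and the fourth is accepted although it is not, which matches the "very basic validation" caveat.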

If I missed something that would break this unintentionally, I'm grateful for comments!