C#: Regular expression to parse an array of JSON objects?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/408570/

Time: 2020-08-04 02:14:17  Source: igfitidea

Regular expression to parse an array of JSON objects?

c# .net regex json

Asked by Dan Herbert

I'm trying to parse an array of JSON objects into an array of strings in C#. I can extract the array from the JSON object, but I can't split the array string into an array of individual objects.


What I have is this test string:


string json = "{items:[{id:0,name:\"Lorem Ipsum\"},{id:1,name" 
            + ":\"Lorem Ipsum\"},{id:2,name:\"Lorem Ipsum\"}]}";

I'm currently using the following regular expressions to split the items into individual objects. For now they're 2 separate regular expressions until I fix the problem with the second one:


Regex arrayFinder = new Regex(@"\{items:\[(?<items>[^\]]*)\]\}"
                                 , RegexOptions.ExplicitCapture);
Regex arrayParser = new Regex(@"((?<items>\{[^\}]\}),?)+"
                                 , RegexOptions.ExplicitCapture);

The arrayFinder regex works the way I'd expect, but for reasons I don't understand, the arrayParser regex doesn't work at all. All I want it to do is split the individual items into their own strings so I get a list like this:


{id:0,name:"Lorem Ipsum"}
{id:1,name:"Lorem Ipsum"}
{id:2,name:"Lorem Ipsum"}


Whether this list is a string[] array or a Group or Match collection doesn't matter, but I'm stumped as to how to get the objects split. Using the arrayParser and the json string declared above, I've tried this code, which I assumed would work, with no luck:


string json = "{items:[{id:0,name:\"Lorem Ipsum\"},{id:1,name" 
            + ":\"Lorem Ipsum\"},{id:2,name:\"Lorem Ipsum\"}]}";

Regex arrayFinder = new Regex(@"\{items:\[(?<items>[^\]]*)\]\}"
                                 , RegexOptions.ExplicitCapture);
Regex arrayParser = new Regex(@"((?<items>\{[^\}]\}),?)+"
                                 , RegexOptions.ExplicitCapture);

string array = arrayFinder.Match(json).Groups["items"].Value;
// At this point the 'array' variable contains: 
// {id:0,name:"Lorem Ipsum"},{id:1,name:"Lorem Ipsum"},{id:2,name:"Lorem Ipsum"}

// I would have expected one of these 2 lines to return 
// the array of matches I'm looking for
CaptureCollection c = arrayParser.Match(array).Captures;
GroupCollection g = arrayParser.Match(array).Groups;

Can anybody see what it is I'm doing wrong? I'm totally stuck on this.


Accepted answer by user51099

Balanced parentheses are literally a textbook example of a language that cannot be processed with regular expressions. JSON is essentially balanced parentheses plus a bunch of other stuff, with the parens replaced by braces and brackets. In the hierarchy of formal languages, JSON is a context-free language. Regular expressions, in the formal sense, can't parse context-free languages in general.


Some systems offer extensions to regular expressions that kinda-sorta handle balanced expressions. However they're all ugly hacks, they're all unportable, and they're all ultimately the wrong tool for the job.
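As one illustration of such an extension, the .NET regex engine has balancing groups. The sketch below is not from the original answer; the class name BalancedBraceFinder and the group name Depth are made up for illustration. It pulls out top-level brace-delimited chunks, and it shows the weaknesses mentioned above: the pattern is hard to read, it only works in .NET, and it also counts braces that appear inside quoted strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BalancedBraceFinder
{
    // Each nested '{' pushes an empty capture onto the "Depth" stack and each
    // nested '}' pops one. (?(Depth)(?!)) rejects the match if anything is
    // still on the stack when the closing '}' is reached.
    // Caveat: braces inside quoted strings are counted as well.
    private static readonly Regex TopLevelObject = new Regex(
        @"\{(?>[^{}]+|\{(?<Depth>)|\}(?<-Depth>))*(?(Depth)(?!))\}");

    public static string[] Find(string text)
    {
        var results = new List<string>();
        foreach (Match m in TopLevelObject.Matches(text))
        {
            results.Add(m.Value);
        }
        return results.ToArray();
    }

    public static void Main()
    {
        // Hypothetical input with one nested object.
        foreach (string chunk in Find("{a:{b:1}},{c:2}"))
        {
            Console.WriteLine(chunk);
        }
    }
}
```

This is only a sketch of the mechanism; it does not validate the content of each chunk in any way.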


In professional work, you would almost always use an existing JSON parser. If you want to roll your own for educational purposes then I'd suggest starting with a simple arithmetic grammar that supports + - * / ( ). (JSON has some escaping rules which, while not complex, will make your first attempt harder than it needs to be.) Basically, you'll need to:


  1. Decompose the language into an alphabet of symbols
  2. Write a context-free grammar in terms of those symbols that recognizes the language
  3. Convert the grammar into Chomsky normal form, or near enough to make step 5 easy
  4. Write a lexer that converts raw text into your input alphabet
  5. Write a recursive descent parser that takes your lexer's output, parses it, and produces some kind of output

This is a typical third-year CS assignment at just about any university.
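Steps 4 and 5 of the list above could look roughly like the following for the suggested arithmetic grammar. This is a minimal sketch, not the answerer's code: the class and method names are hypothetical, the grammar is written informally in a comment rather than in Chomsky normal form, and the parser evaluates the expression directly instead of building a tree.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class Arithmetic
{
    // Lexer: turns raw text into tokens (numbers, operators, parentheses).
    public static List<string> Lex(string text)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (char.IsWhiteSpace(c)) { i++; }
            else if (char.IsDigit(c) || c == '.')
            {
                int start = i;
                while (i < text.Length && (char.IsDigit(text[i]) || text[i] == '.')) i++;
                tokens.Add(text.Substring(start, i - start));
            }
            else if ("+-*/()".IndexOf(c) >= 0) { tokens.Add(c.ToString()); i++; }
            else throw new FormatException("Unexpected character '" + c + "' at " + i);
        }
        return tokens;
    }

    // Recursive descent parser: one method per grammar rule.
    //   expr   := term (('+' | '-') term)*
    //   term   := factor (('*' | '/') factor)*
    //   factor := NUMBER | '(' expr ')' | '-' factor
    public static double Evaluate(string text)
    {
        List<string> tokens = Lex(text);
        int pos = 0;
        double result = ParseExpr(tokens, ref pos);
        if (pos != tokens.Count) throw new FormatException("Unexpected token '" + tokens[pos] + "'");
        return result;
    }

    private static double ParseExpr(List<string> t, ref int pos)
    {
        double value = ParseTerm(t, ref pos);
        while (pos < t.Count && (t[pos] == "+" || t[pos] == "-"))
        {
            string op = t[pos++];
            double rhs = ParseTerm(t, ref pos);
            value = op == "+" ? value + rhs : value - rhs;
        }
        return value;
    }

    private static double ParseTerm(List<string> t, ref int pos)
    {
        double value = ParseFactor(t, ref pos);
        while (pos < t.Count && (t[pos] == "*" || t[pos] == "/"))
        {
            string op = t[pos++];
            double rhs = ParseFactor(t, ref pos);
            value = op == "*" ? value * rhs : value / rhs;
        }
        return value;
    }

    private static double ParseFactor(List<string> t, ref int pos)
    {
        if (pos >= t.Count) throw new FormatException("Unexpected end of input");
        string tok = t[pos++];
        if (tok == "-") return -ParseFactor(t, ref pos);
        if (tok == "(")
        {
            // The recursion here is what a plain regular expression cannot express.
            double inner = ParseExpr(t, ref pos);
            if (pos >= t.Count || t[pos] != ")") throw new FormatException("Expected ')'");
            pos++;
            return inner;
        }
        return double.Parse(tok, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(Evaluate("2 + 3 * (4 - 1)")); // 11
    }
}
```

Each level of parentheses adds a ParseExpr call to the call stack, so very deeply nested input can exhaust it.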


The next step is to find out how complex a JSON string you need to trigger a stack overflow in your recursive parser. Then look at the other types of parsers that can be written, and you'll understand why anyone who has to parse a context-free language in the real world uses a tool like yacc or antlr instead of writing a parser by hand.


If that's more learning than you were looking for, then you should feel free to go use an off-the-shelf JSON parser, satisfied that you learned something important and useful: the limits of regular expressions.


Answer by casperOne

Are you using .NET 3.5? If so, you can use the DataContractJsonSerializer to parse this out. There is no reason to do this yourself.
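A rough sketch of the DataContractJsonSerializer route follows. The Item, ItemList and JsonDemo types are hypothetical names invented for this example, and the input is assumed to be strict JSON with quoted property names, which the question's test string (unquoted id and name) is not.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical types mirroring the shape of the question's data.
[DataContract]
public class Item
{
    [DataMember(Name = "id")]
    public int Id { get; set; }

    [DataMember(Name = "name")]
    public string Name { get; set; }
}

[DataContract]
public class ItemList
{
    [DataMember(Name = "items")]
    public Item[] Items { get; set; }
}

public static class JsonDemo
{
    public static ItemList Parse(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(ItemList));
        // ReadObject works on a stream, so the string is encoded to bytes first.
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            return (ItemList)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        // Assumption: property names are quoted, unlike in the question's string.
        string json = "{\"items\":[{\"id\":0,\"name\":\"Lorem Ipsum\"},"
                    + "{\"id\":1,\"name\":\"Lorem Ipsum\"}]}";
        foreach (Item item in Parse(json).Items)
        {
            Console.WriteLine(item.Id + ": " + item.Name);
        }
    }
}
```

The result is a typed array of objects rather than an array of raw strings, which is usually what the splitting was meant to lead to anyway.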


If you are not using .NET 3.5, you can use Jayrock.


Answer by yfeldblum

JSON cannot typically be parsed with regular expressions (certain extremely simplified variants of JSON can, but then they are not JSON but something else).
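The question's test string is one such simplified case: the objects are flat and contain no braces of their own. Under that assumption (and only that assumption), a sketch like the following can split the array body. The class name FlatObjectSplitter is made up for illustration. Compared with the question's arrayParser, the character class has a * quantifier, so an object body of any length can match, and Regex.Matches is used so that each object comes back as its own match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FlatObjectSplitter
{
    // Matches one brace-delimited object that contains no nested braces.
    // Assumption: neither '{' nor '}' appears inside an object, not even
    // within a quoted string value.
    private static readonly Regex FlatObject = new Regex(@"\{[^{}]*\}");

    public static string[] Split(string arrayBody)
    {
        return FlatObject.Matches(arrayBody)
                         .Cast<Match>()
                         .Select(m => m.Value)
                         .ToArray();
    }

    public static void Main()
    {
        string array = "{id:0,name:\"Lorem Ipsum\"},{id:1,name:\"Lorem Ipsum\"},"
                     + "{id:2,name:\"Lorem Ipsum\"}";
        foreach (string item in Split(array))
        {
            Console.WriteLine(item);
        }
    }
}
```

As soon as an object contains a nested object, or a string value with a brace in it, this breaks, which is the point the answer is making.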


You need an actual parser to properly parse JSON.


And anyway, why are you trying to parse JSON at all? There are numerous libraries out there which can do it for you, and much better than your code would. Why reinvent the wheel, when there's a wheel factory around the corner with the words FOSS over the door?


Answer by yfeldblum

Balanced parentheses are literally a textbook example of a language that cannot be processed with regular expressions


bla bla bla ... check this out:


Regex arrayParser = new Regex(@"""(?<Key>[\w]+)"":""?(?<Value>([\s\w\d\.\\\-/:_]+(,[,\s\w\d\.\\\-/:_]+)?)+)""?");

this works for me


if you want to match empty values change last '+' to '*'


Answer by Брайков

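// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
public Dictionary<string, string> ParseJSON(string s)
{
    // Verbatim string (@"...") so that \w, \s, \d reach the regex engine intact;
    // a doubled "" stands for one literal quote character in the pattern.
    Regex r = new Regex(@"""(?<Key>[\w]*)"":""?(?<Value>([\s\w\d\.\\\-/:_\+]+(,[,\s\w\d\.\\\-/:_\+]*)?)*)""?");
    MatchCollection mc = r.Matches(s);

    Dictionary<string, string> json = new Dictionary<string, string>();

    foreach (Match k in mc)
    {
        // Note: Add throws an ArgumentException if the same key occurs twice.
        json.Add(k.Groups["Key"].Value, k.Groups["Value"].Value);
    }
    return json;
}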

This function implements Lukasz's regular expression. I only added the + character to the value group (because I am using it to parse a Live Connect auth token).
