Regular expression to extract JSON string from text

Note: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/20916699/

Date: 2020-09-03 20:08:52  Source: igfitidea

regular expression to extract JSON string from text

regex json expression

Asked by isaaijan

I'm looking for a regex to extract JSON strings from text. I have the text below, which contains JSON strings (mTitle, mPoster, mYear, mDate) like this:

{"999999999":"138138138","020202020202":{"846":{"mTitle":"\u0430","mPoster":{"
small":"\/upload\/ms\/b_248.jpg","middle":"600.jpg","big":"400.jpg"},"mYear"
:"2013","mDate":"2014-01-01"},"847":{"mTitle":"\u043a","mPoster":"small":"\/upload\/ms\/241.jpg","middle":"600.jpg","big":"
138.jpg"},"mYear":"2013","mDate":"2013-12-26"},"848":{"mTitle":"\u041f","mPoster":{"small":"\/upload\/movies\/2
40.jpg","middle":"138.jpg","big":"131.jpg"},"mYear":"2013","mDate":"2013-12-19"}}}

To parse the JSON, I first need to extract it from the text. So my question is: could you help me get only the JSON strings from the text?

I've tried this regular expression with no success:


{"mTitle":(\w|\W)*"mDate":(\w|\W)*}

Answered by acdcjunior

The following regex should work:


\{\s*"mTitle"\s*:\s*(.+?)\s*,\s*"mPoster":\s*(.+?)\s*,\s*"mYear"\s*:\s*(.+?)\s*,\s*"mDate"\s*:\s*(.+?)\s*\}
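As a minimal sketch of how this pattern could be applied, the Python snippet below finds every match and hands the matched text to json.loads. The sample text and the helper name extract_records are made up for illustration and are not part of the original question or answer.

```python
import json
import re

# The pattern from the answer, unchanged (split over two lines for readability).
PATTERN = re.compile(
    r'\{\s*"mTitle"\s*:\s*(.+?)\s*,\s*"mPoster":\s*(.+?)\s*,'
    r'\s*"mYear"\s*:\s*(.+?)\s*,\s*"mDate"\s*:\s*(.+?)\s*\}'
)

# Hypothetical input: two JSON objects surrounded by unrelated text.
sample_text = (
    r'prefix {"mTitle":"\u0430","mPoster":{"small":"a.jpg","big":"b.jpg"},'
    r'"mYear":"2013","mDate":"2014-01-01"} noise '
    r'{"mTitle":"\u043a","mPoster":{"small":"c.jpg","big":"d.jpg"},'
    r'"mYear":"2013","mDate":"2013-12-26"} suffix'
)


def extract_records(text):
    """Return each matched object, parsed with json.loads."""
    # group(0) is the whole match, i.e. one complete {...} object.
    return [json.loads(m.group(0)) for m in PATTERN.finditer(text)]


records = extract_records(sample_text)
print(len(records))
print(records[1]["mDate"])
```

Note that in Python the . metacharacter does not match a newline by default, so text that is wrapped across lines would need the re.DOTALL flag or would have to be unwrapped first.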

Check the demo here.

The main difference from your regex is the .+? part which, broken down, means:

  • Match any character (.)
  • One or more times (+)
  • As little as possible (?)

The ? operator after the + is very important here, because if you removed it, the first .+ (in \{\s*"mTitle"\s*:\s*(.+?)) would match as much of the text as it can, not just the text up to the first "mPoster" word, which is what you want.
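The difference between the lazy and the greedy quantifier can be seen in a small Python sketch. The input string and the shortened patterns here are invented for illustration; only the .+? versus .+ contrast comes from the answer.

```python
import re

# Hypothetical input with two objects on one line.
text = '{"mTitle":"a","mPoster":"p1"} {"mTitle":"b","mPoster":"p2"}'

# Lazy: stops at the first ,"mPoster" that follows.
lazy = re.search(r'"mTitle":(.+?),"mPoster"', text).group(1)

# Greedy: runs to the end of the line, then backtracks to the last ,"mPoster".
greedy = re.search(r'"mTitle":(.+),"mPoster"', text).group(1)

print(lazy)    # "a"
print(greedy)  # "a","mPoster":"p1"} {"mTitle":"b"
```

The greedy capture swallows the end of the first object and the start of the second, which is why the ? matters.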

Notice that it is just a more elaborate version of \{"mTitle":(.+?),"mPoster":(.+?),"mYear":(.+?),"mDate":(.+?)\} (with \s* added to match whitespace, which JSON allows between tokens).
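To illustrate what the \s* parts buy, here is a short Python comparison of the two patterns on a compact and a whitespace-padded object. Both sample strings are hypothetical. Since the answer's pattern has no \s* before the colon after "mPoster", the padded sample keeps that colon tight.

```python
import re

# Simplified pattern from the answer (no whitespace tolerance).
SIMPLE = re.compile(
    r'\{"mTitle":(.+?),"mPoster":(.+?),"mYear":(.+?),"mDate":(.+?)\}'
)

# Full pattern from the answer, with \s* between tokens.
FULL = re.compile(
    r'\{\s*"mTitle"\s*:\s*(.+?)\s*,\s*"mPoster":\s*(.+?)\s*,'
    r'\s*"mYear"\s*:\s*(.+?)\s*,\s*"mDate"\s*:\s*(.+?)\s*\}'
)

# Hypothetical samples: same data, different spacing.
compact = '{"mTitle":"a","mPoster":"p","mYear":"2013","mDate":"2014-01-01"}'
spaced = '{ "mTitle" : "a" , "mPoster": "p" , "mYear" : "2013" , "mDate" : "2014-01-01" }'

simple_on_compact = SIMPLE.search(compact)
full_on_compact = FULL.search(compact)
simple_on_spaced = SIMPLE.search(spaced)  # None: the spaces break the match
full_on_spaced = FULL.search(spaced)

print(full_on_spaced.groups())
```

On the compact input both patterns capture the same four values; only the full pattern still matches once whitespace is added.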