Java JsonParseException:无法识别的标记“http”:期待(“真”、“假”或“空”)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/28604363/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 06:29:46  来源:igfitidea点击:

JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')

javaHymansoncloudera

提问by Fanooos

We have the following string which is a valid JSON written to a file on HDFS.

我们有以下字符串,它是写入 HDFS 文件的有效 JSON。

{  
  "id":"tag:search.twitter.com,2005:564407444843950080",
  "objectType":"activity",
  "actor":{  
    "objectType":"person",
    "id":"id:twitter.com:2302910022",
    "link":"http%3A%2F%2Fwww.twitter.com%2Fme7me4610012",
    "displayName":"",
    "postedTime":"2014-01-21T11:06:06.000Z",
    "image":"https%3A%2F%2Fpbs.twimg.com%2Fprofile_images%2F563125491159162881%2FfypkHK3M_normal.jpeg",
    "summary":"???????????????????? ????????????? ??????????????? ???? ???????????????? ??????  ????? ??????????? lloooo_20",
    "links":[  
      {  
        "href":null,
        "rel":"me"
      }
    ],
    "friendsCount":10503,
    "followersCount":10325,
    "listedCount":12,
    "statusesCount":84957,
    "twitterTimeZone":null,
    "verified":false,
    "utcOffset":null,
    "preferredUsername":"me7me4610012",
    "languages":[  
      "ar"
    ],
    "favoritesCount":17
  },
  "verb":"share",
  "postedTime":"2015-02-08T12:56:35.000Z",
  "generator":{  
    "displayName":"Twitter for Android",
    "link":"http%3A%2F%2Ftwitter.com%2Fdownload%2Fandroid"
  },
  "provider":{  
    "objectType":"service",
    "displayName":"Twitter",
    "link":"http%3A%2F%2Fwww.twitter.com"
  },
  "link":"http%3A%2F%2Ftwitter.com%2Fme7me4610012%2Fstatuses%2F564407444843950080",
  "body":"RT @sckud1: ?????: ???? ???? ???? ?????? ??? ??? ???? ??? ???? ?? ????? ???? ????? ?????: ?????  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIln…",
  "object":{  
    "id":"tag:search.twitter.com,2005:564407126526013440",
    "objectType":"activity",
    "actor":{  
      "objectType":"person",
      "id":"id:twitter.com:462268717",
      "link":"http%3A%2F%2Fwww.twitter.com/sckud1",
      "displayName":"??? ?????",
      "postedTime":"2012-01-12T19:24:17.000Z",
      "image":"https%3A%2F%2Fpbs.twimg.com%2Fprofile_images%2F508424482885615616%2FmPBGZBPx_normal.jpeg",
      "summary":"?????? ?? ??? ?????? ???? ?? ?????? ??? ???? ?? ????? ????? ?????  http%3A%2F%2Fmarketgulf.com",
      "links":[  
        {  
          "href":"http%3A%2F%2Fmarketgulf.com",
          "rel":"me"
        }
      ],
      "friendsCount":435237,
      "followersCount":464951,
      "listedCount":708,
      "statusesCount":1071685,
      "twitterTimeZone":"Riyadh",
      "verified":false,
      "utcOffset":"10800",
      "preferredUsername":"sckud1",
      "languages":[  
        "ar"
      ],
      "location":{  
        "objectType":"place",
        "displayName":"Made in K S A"
      },
      "favoritesCount":77
    },
    "verb":"post",
    "postedTime":"2015-02-08T12:55:19.000Z",
    "generator":{  
      "displayName":"Tweet Old Post",
      "link":"http%3A%2F%2Fwww.ajaymatharu.com%2F"
    },
    "provider":{  
      "objectType":"service",
      "displayName":"Twitter",
      "link":"http%3A%2F%2Fwww.twitter.com"
    },
    "link":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatuses%2F564407126526013440",
    "body":"?????: ???? ???? ???? ?????? ??? ??? ???? ??? ???? ?? ????? ???? ????? ?????: ?????  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
    "object":{  
      "objectType":"note",
      "id":"object:search.twitter.com,2005:564407126526013440",
      "summary":"?????: ???? ???? ???? ?????? ??? ??? ???? ??? ???? ?? ????? ???? ????? ?????: ?????  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
      "link":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatuses%2F564407126526013440",
      "postedTime":"2015-02-08T12:55:19.000Z"
    },
    "favoritesCount":0,
    "twitter_entities":{  
      "hashtags":[  

      ],
      "trends":[  

      ],
      "urls":[  
        {  
          "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
          "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
          "display_url":"hasterya.com/archives/34688…",
          "indices":[  
            85,
            107
          ]
        }
      ],
      "user_mentions":[  

      ],
      "symbols":[  

      ],
      "media":[  
        {  
          "id":564407126341468160,
          "id_str":"564407126341468160",
          "indices":[  
            108,
            130
          ],
          "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
          "display_url":"pic.twitter.com/t5TjIlnZgN",
          "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
          "type":"photo",
          "sizes":{  
            "large":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "thumb":{  
              "w":150,
              "h":150,
              "resize":"crop"
            },
            "small":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "medium":{  
              "w":320,
              "h":180,
              "resize":"fit"
            }
          }
        }
      ]
    },
    "twitter_extended_entities":{  
      "media":[  
        {  
          "id":564407126341468160,
          "id_str":"564407126341468160",
          "indices":[  
            108,
            130
          ],
          "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
          "display_url":"pic.twitter.com/t5TjIlnZgN",
          "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
          "type":"photo",
          "sizes":{  
            "large":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "thumb":{  
              "w":150,
              "h":150,
              "resize":"crop"
            },
            "small":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "medium":{  
              "w":320,
              "h":180,
              "resize":"fit"
            }
          }
        }
      ]
    },
    "twitter_filter_level":"low",
    "twitter_lang":"ar"
  },
  "favoritesCount":0,
  "twitter_entities":{  
    "hashtags":[  

    ],
    "trends":[  

    ],
    "urls":[  
      {  
        "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
        "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
        "display_url":"hasterya.com/archives/34688…",
        "indices":[  
          97,
          119
        ]
      }
    ],
    "user_mentions":[  
      {  
        "screen_name":"sckud1",
        "name":"??? ?????",
        "id":462268717,
        "id_str":"462268717",
        "indices":[  
          3,
          10
        ]
      }
    ],
    "symbols":[  

    ],
    "media":[  
      {  
        "id":564407126341468160,
        "id_str":"564407126341468160",
        "indices":[  
          139,
          140
        ],
        "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "display_url":"pic.twitter.com/t5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "type":"photo",
        "sizes":{  
          "large":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "thumb":{  
            "w":150,
            "h":150,
            "resize":"crop"
          },
          "small":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "medium":{  
            "w":320,
            "h":180,
            "resize":"fit"
          }
        },
        "source_status_id":564407126526013440,
        "source_status_id_str":"564407126526013440"
      }
    ]
  },
  "twitter_extended_entities":{  
    "media":[  
      {  
        "id":564407126341468160,
        "id_str":"564407126341468160",
        "indices":[  
          139,
          140
        ],
        "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "display_url":"pic.twitter.com/t5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "type":"photo",
        "sizes":{  
          "large":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "thumb":{  
            "w":150,
            "h":150,
            "resize":"crop"
          },
          "small":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "medium":{  
            "w":320,
            "h":180,
            "resize":"fit"
          }
        },
        "source_status_id":564407126526013440,
        "source_status_id_str":"564407126526013440"
      }
    ]
  },
  "twitter_filter_level":"low",
  "twitter_lang":"ar",
  "retweetCount":1,
  "gnip":{  
    "matching_rules":[  
      {  
        "tag":"ISIS66"
      }
    ],
    "urls":[  
      {  
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "expanded_status":200
      },
      {  
        "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
        "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
        "expanded_status":200
      }
    ],
    "klout_score":50,
    "language":{  
      "value":"ar"
    }
  }
}

EDIT

编辑

We configure a flume agent that reads the data from that file and pass it to Solr sink but unfortunately this exception in the title is throw.

我们配置了一个从该文件读取数据并将其传递给 Solr 接收器的水槽代理,但不幸的是,标题中的这个异常是抛出。

and here is the stack trace

这是堆栈跟踪

org.kitesdk.morphline.api.MorphlineRuntimeException: com.fasterxml.Hymanson.core.JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@20d7aa52; line: 1, column: 9]
    at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:98)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.kitesdk.morphline.stdlib.TryRulesBuilder$TryRules.doProcess(TryRulesBuilder.java:120)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(MorphlineHandlerImpl.java:128)
    at org.apache.flume.sink.solr.morphline.MorphlineSink.process(MorphlineSink.java:141)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:744)
Caused by: com.fasterxml.Hymanson.core.JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@20d7aa52; line: 1, column: 9]
    at com.fasterxml.Hymanson.core.JsonParser._constructError(JsonParser.java:1524)
    at com.fasterxml.Hymanson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:557)
    at com.fasterxml.Hymanson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3095)
    at com.fasterxml.Hymanson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2340)
    at com.fasterxml.Hymanson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:818)
    at com.fasterxml.Hymanson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:698)
    at com.fasterxml.Hymanson.databind.MappingIterator.hasNextValue(MappingIterator.java:159)
    at org.kitesdk.morphline.json.ReadJsonBuilder$ReadJson.doProcess(ReadJsonBuilder.java:109)
    at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:96)
    ... 10 more

采纳答案by Stephen C

We have the following string which is a valid JSON ...

我们有以下字符串,它是一个有效的 JSON ...

Clearly the JSON parser disagrees!

显然 JSON 解析器不同意!

However, the exception says that the error is at "line 1: column 9", and there is no "http" token near the beginning of the JSON. So I suspect that the parser is trying to parse something different than this string when the error occurs.

但是,异常表示错误位于“第 1 行:第 9 列”,并且 JSON 开头附近没有“http”标记。所以我怀疑解析器在错误发生时试图解析与这个字符串不同的东西。

You need to find what JSON is actuallybeing parsed. Run the application within a debugger, set a breakpoint on the relevant constructor for JsonParseException... then find out what is in the ByteArrayInputStreamthat it is attempting to parse.

您需要找到实际解析的JSON 。在调试器中运行应用程序,在相关构造函数上设置断点JsonParseException...然后找出ByteArrayInputStream它试图解析的内容。

回答by Fanooos

I faced this exception for a long time and was not able to pinpoint the problem. The exception says line 1 column 9. The mistake I did is to get the first line of the file which flume is processing.

我很长一段时间都面临这个异常,但无法查明问题所在。例外是第 1 行第 9 列。我犯的错误是获取flume 正在处理的文件的第一行。

Apache flume process the content of the file in patches. So, when flume throws this exception and says line 1, it means the first line in the current patch.

Apache flume 以补丁的形式处理文件的内容。所以,当flume抛出这个异常并说第1行时,它意味着当前补丁中的第一行。

If your flume agent is configured to use batch size = 100, and (for example) the file contains 400 lines, this means the exception is thrown in one of the following lines 1, 101, 201,301.

如果您的水槽代理配置为使用批处理大小 = 100,并且(例如)文件包含 400 行,这意味着在以下第 1、101、201,301 行之一中抛出异常。

How to discover the line which causes the problem?

如何发现导致问题的线路?

You have three ways to do that.

你有三种方法可以做到这一点。

1- pull the source code and run the agent in debug mode. If you are an average developer like me and do not know how to make this, check the other two options.

1- 拉取源代码并在调试模式下运行代理。如果您是像我这样的普通开发人员并且不知道如何进行此操作,请检查其他两个选项。

2- Try to split the file based on the batch size and run the flume agent again. If you split the file into 4 files, and the invalid json exists between lines 301 and 400, the flume agent will process the first 3 files and stop at the fourth file. Take the fourth file and again split it into more smaller files. continue the process until you reach a file with only one line and flume fails while processing it.

2- 尝试根据批量大小拆分文件并再次运行水槽代理。如果将文件拆分为 4 个文件,并且第 301 行和第 400 行之间存在无效的 json,flume 代理将处理前 3 个文件并在第四个文件处停止。取第四个文件并再次将其拆分为更小的文件。继续这个过程,直到你得到一个只有一行的文件,并且在处理它时水槽失败。

3- Reduce the batch size of the flume agent to only one and compare the number of processed events in the output of the sink you are using. For example, in my case I am using Solr sink. The file contains 400 lines. The flume agent is configured with batch size=100. When I run the flume agent, it fails at some point and throw that exception. At this point check how many documents are ingested in Solr. If the invalid json exists at line 346, the number of documents indexed into Solr will be 345, so the next line is the line which causes the problem.

3- 将水槽代理的批量大小减少到只有一个,并比较您正在使用的接收器输出中已处理事件的数量。例如,就我而言,我使用的是 Solr 接收器。该文件包含 400 行。水槽代理配置为批量大小=100。当我运行水槽代理时,它在某个时候失败并抛出该异常。此时检查在 Solr 中摄取了多少文档。如果第 346 行存在无效的 json,则索引到 Solr 的文档数将为 345,因此下一行是导致问题的行。

In my case I followed the third option and fortunately I pinpoint the line which causes the problem.

在我的情况下,我遵循了第三个选项,幸运的是我查明了导致问题的行。

This is a long answer but it actually does not solve the exception. How I overcome this exception?

这是一个很长的答案,但实际上并没有解决异常。我如何克服这个异常?

I have no idea why Hymanson library complain while parsing a json string contains escaped characters \n \r \t. I think (but I am not sure) the Hymanson parser is by default escaping these characters which cases the json string to be split into two lines (in case of \n) and then it deals each line as a separate json string.

我不知道为什么 Hymanson 库在解析包含转义字符的 json 字符串时会抱怨\n \r \t。我认为(但我不确定)Hymanson 解析器默认会转义这些字符,这\n会将json 字符串分成两行(在 的情况下),然后将每一行作为单独的 json 字符串处理。

In my case we used a customized interceptor to remove these characters before being processed by the flume agent. This is the way we solved this problem.

在我的例子中,我们使用定制的拦截器在被水槽代理处理之前删除这些字符。这就是我们解决这个问题的方法。

回答by Loch

It might be obvious, but make sure that you are sending to the parser URL object not a String containing www adress. This will notwork:

这可能很明显,但请确保您发送到解析器 URL 对象而不是包含 www 地址的字符串。这将工作:

    ObjectMapper mapper = new ObjectMapper();
    String www = "www.sample.pl";
    Weather weather = mapper.readValue(www, Weather.class);

But this will:

但这将:

    ObjectMapper mapper = new ObjectMapper();
    URL www = new URL("http://www.oracle.com/");
    Weather weather = mapper.readValue(www, Weather.class);