Strange Jackson Illegal character ((CTRL-CHAR, code 0)) Exception in Map Reduce Combiner

Disclaimer: this page reproduces a popular StackOverflow question and answer under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/24832614/

Date: 2020-08-14 15:04:39  Source: igfitidea

java, json, hadoop, jackson, marshalling

Asked by mle

I have a Map-Reduce job with a mapper that takes a record and converts it into an instance of MyObject, which is marshalled to JSON using Jackson and used as the output key. The value is just another Text field in the record.

The relevant piece of the mapper is something like the following:

// ObjectMapper is costly to build; in practice create it once (e.g. in setup()) and reuse it
ObjectMapper mapper = new ObjectMapper();
MyObject val = new MyObject();
val.setA(stringA);
val.setB(stringB);
// Serialize MyObject to a JSON string and use it as the Text key
key.set(mapper.writeValueAsString(val));

The outputs of the mapper are sent to a Combiner which unmarshalls the JSON object and aggregates key-value pairs. It is conceptually very simple and is something like:

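@Override
public void reduce(Text key, Iterable<IntWritable> values, Context cxt)
    throws IOException, InterruptedException {
    int count = 0;
    // Unmarshal the JSON key produced by the mapper (_mapper is a shared ObjectMapper field)
    MyObject x = _mapper.readValue(key.toString(), MyObject.class);
    for (IntWritable v : values) ++count;
    ...
    // Emit the aggregated pair (writing the count as the value is assumed here)
    cxt.write(key, new IntWritable(count));
}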

The MyObject class consists of two fields (both strings), get/set methods and a default constructor. One of the fields stores snippets of text based on a web crawl, but is always a string.

public class MyObject {
  private String A;
  private String B;

  public MyObject() {}

  public String getA() {
    return A;
  }
  public void setA(String A) {
    this.A = A;
  }
  public String getB() {
    return B;
  } 
  public void setB(String B) {
    this.B = B;
  }
}

My MapReduce job appears to be running fine until it reaches certain records, which I cannot easily access (because the mapper is generating the records from a crawl), and the following exception is being thrown:

Error: com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: java.io.StringReader@5ae2bee7; line: 1, column: 3]
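Code 0 is the NUL character (U+0000). Valid JSON never contains an unescaped character below U+0020 other than tab, line feed and carriage return, so one way to narrow down the bad records is to scan the key text before handing it to Jackson. The following is a minimal, JDK-only sketch; the class name `ControlCharFinder` and the sample input are hypothetical, and the check is a plain character scan, not something Jackson itself exposes.

```java
/**
 * Hypothetical helper for locating characters that can never appear
 * unescaped anywhere in valid JSON text.
 */
public final class ControlCharFinder {

    /**
     * Returns the 0-based index of the first character below 0x20 that is
     * not a tab, line feed or carriage return, or -1 if there is none.
     */
    static int firstIllegalControlChar(String json) {
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical corrupted key: a NUL (code 0) right after the opening brace
        String suspect = "{" + '\0' + "\"a\":\"x\"}";
        int idx = firstIllegalControlChar(suspect);
        if (idx >= 0) {
            System.out.println("control char code " + (int) suspect.charAt(idx) + " at index " + idx);
        } else {
            System.out.println("no illegal control characters");
        }
    }
}
```

In the combiner, one could call this on `key.toString()` and log or skip the record when the result is not -1, before calling `_mapper.readValue`, so the offending crawl record becomes visible.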

Would anyone have any suggestions about the cause of this?

Answer by Abdul Rahman

You can use StringEscapeUtils from Apache Commons Lang to escape the string: https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/src-html/org/apache/commons/lang/StringEscapeUtils.html#line.89
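If you would rather not add the Commons Lang dependency to the job, the same idea (turning control characters into visible backslash escapes before the string is set on MyObject) can be sketched with the JDK alone. This is a hand-rolled stand-in, not the Commons Lang implementation, and its output is not guaranteed to match `StringEscapeUtils.escapeJava` character for character. The class and method names are hypothetical, and tab, line feed and carriage return are deliberately left untouched so the crawled text keeps its line structure.

```java
import java.util.Locale;

public final class ControlCharEscaper {

    /**
     * Replaces each character below 0x20, except tab, line feed and
     * carriage return, with a backslash, the letter u and four hex digits
     * (NUL becomes the six characters backslash, u, 0, 0, 0, 0).
     */
    static String escapeControlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                // Locale.ROOT keeps the hex digits ASCII regardless of the JVM's default locale
                sb.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical crawled snippet containing a NUL character
        String crawled = "ab" + '\0' + "cd";
        System.out.println(escapeControlChars(crawled));
    }
}
```

In the mapper this would be applied per field, e.g. `val.setB(ControlCharEscaper.escapeControlChars(stringB));`. Note that the escape sequences then come back as literal text when the combiner deserializes the key.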

Alternatively, you can selectively replace the control characters in the string before JSON marshalling.
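A minimal JDK-only sketch of that selective replacement, assuming it is acceptable to lose the control characters: a regular expression matches every character below 0x20 except tab, line feed and carriage return, and replaces it with a caller-chosen string (the empty string strips them). `ControlCharStripper` and `replaceControlChars` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class ControlCharStripper {

    // Characters 0x00-0x1F except tab (0x09), line feed (0x0A) and carriage return (0x0D)
    private static final Pattern CONTROL_CHARS =
            Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]");

    /** Replaces every matched control character with the given replacement text. */
    static String replaceControlChars(String s, String replacement) {
        // quoteReplacement stops dollar signs or backslashes in the replacement
        // from being interpreted as group references
        return CONTROL_CHARS.matcher(s).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Hypothetical crawled snippet: a NUL that must go and a tab that may stay
        String crawled = "ab" + '\0' + "c\td";
        System.out.println(replaceControlChars(crawled, ""));   // NUL removed, tab kept
        System.out.println(replaceControlChars(crawled, " "));  // NUL replaced by a space
    }
}
```

In the mapper this would run on each field before marshalling, e.g. `val.setB(ControlCharStripper.replaceControlChars(stringB, ""));`. Compiling the pattern once as a static field avoids recompiling it for every record.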

You can also refer to this post: Illegal character - CTRL-CHAR
