Original source: http://stackoverflow.com/questions/24832614/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Strange Jackson Illegal character ((CTRL-CHAR, code 0)) Exception in Map Reduce Combiner
Asked by mle
I have a Map-Reduce job whose mapper takes a record and converts it into an object, an instance of MyObject, which is then marshalled to JSON using Jackson and used as the output key. The value is just another Text field in the record.
The relevant piece of the mapper is something like the following:
ObjectMapper mapper = new ObjectMapper();
MyObject val = new MyObject();
val.setA(stringA);
val.setB(stringB);
Writer strWriter = new StringWriter();
mapper.writeValue(strWriter, val);
key.set(strWriter.toString());
The outputs of the mapper are sent to a Combiner which unmarshalls the JSON object and aggregates key-value pairs. It is conceptually very simple and is something like:
public void reduce(Text key, Iterable<IntWritable> values, Context cxt)
        throws IOException, InterruptedException {
    int count = 0;
    // Unmarshal the JSON key back into a MyObject
    MyObject x = _mapper.readValue(key.toString(), MyObject.class);
    // Count the values seen for this key
    for (IntWritable v : values) ++count;
    ...
    emit (key, value)
}
The MyObject class consists of two fields (both strings), get/set methods and a default constructor. One of the fields stores snippets of text based on a web crawl, but is always a string.
public class MyObject {
    private String A;
    private String B;
    public MyObject() {}
    public String getA() {
        return A;
    }
    public void setA(String A) {
        this.A = A;
    }
    public String getB() {
        return B;
    }
    public void setB(String B) {
        this.B = B;
    }
}
My MapReduce job appears to be running fine until it reaches certain records, which I cannot easily access (because the mapper is generating the records from a crawl), and the following exception is being thrown:
Error: com.fasterxml.jackson.core.JsonParseException:
Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens
at [Source: java.io.StringReader@5ae2bee7; line: 1, column: 3]
Would anyone have any suggestions about the cause of this?
Answered by Abdul Rahman
You can use StringEscapeUtils from Apache Commons Lang to escape the string: https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/src-html/org/apache/commons/lang/StringEscapeUtils.html#line.89
Alternatively, you can selectively replace the control characters in the string before JSON marshalling.
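As a rough sketch of the "replace selectively" option, the helper below drops ISO control characters (including NUL, code 0) while keeping \r, \n and \t, which the exception message lists as allowed whitespace. The class name ControlCharSanitizer and the method stripControlChars are hypothetical and not part of Jackson or Hadoop. It uses only the JDK and has not been verified against the asker's data.

```java
// Hypothetical helper, not part of any library: strips control characters
// from crawled text before it is handed to the JSON mapper.
final class ControlCharSanitizer {

    private ControlCharSanitizer() {}

    /**
     * Removes ISO control characters (such as NUL, code 0) from the input,
     * keeping the whitespace characters \r, \n and \t.
     */
    static String stripControlChars(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean allowedWhitespace = c == '\r' || c == '\n' || c == '\t';
            if (allowedWhitespace || !Character.isISOControl(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Example input with an embedded NUL character, as a crawl might produce.
        String crawled = "foo\u0000bar\tbaz";
        System.out.println(stripControlChars(crawled));
    }
}
```

In the mapper from the question, this could be applied to each field before marshalling, for example val.setA(ControlCharSanitizer.stripControlChars(stringA)).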
You can also refer to this post: Illegal character - CTRL-CHAR
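Since the failing records are hard to get at, it may also help to log where the offending character sits in the key before calling readValue in the combiner. The sketch below is one possible diagnostic. ControlCharFinder and its methods are hypothetical names, and the code uses only the JDK. Note that the exception reports the character "between tokens", that is, outside any quoted JSON string, so seeing the surrounding text may show whether the key is still the JSON the mapper wrote.

```java
// Hypothetical diagnostic helper, not part of any library: locates the first
// control character that a JSON parser would reject between tokens.
final class ControlCharFinder {

    private ControlCharFinder() {}

    /**
     * Returns the zero-based index of the first control character other than
     * \r, \n and \t, or -1 if the string contains none.
     */
    static int indexOfIllegalControlChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isISOControl(c) && c != '\r' && c != '\n' && c != '\t') {
                return i;
            }
        }
        return -1;
    }

    /** Builds a short description of the first offending character for logging. */
    static String describe(String s) {
        int i = indexOfIllegalControlChar(s);
        if (i < 0) {
            return "no illegal control character found";
        }
        return "control character code " + (int) s.charAt(i) + " at index " + i;
    }

    public static void main(String[] args) {
        // A NUL right after the opening brace and quote, at zero-based index 2.
        System.out.println(describe("{\"\u0000a\":\"x\"}"));
        // prints: control character code 0 at index 2
    }
}
```

Calling ControlCharFinder.describe(key.toString()) in a catch block around readValue would put the position and character code into the task logs. The index here is zero-based, so it will not necessarily equal the column number in the exception.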