Java: Why does DataOutputStream.writeUTF() add additional 2 bytes at the beginning?
Original question: http://stackoverflow.com/questions/7630242/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Xeno Lupus
When I was trying to parse XML using SAX over sockets, I came across a strange occurrence. Upon analysing the traffic, I noticed that DataOutputStream adds 2 bytes in front of my data.
Message sent by DataOutputStream:
0020 50 18 00 20 0f df 00 00 00 9d 3c 3f 78 6d 6c 20 P.. .... ..<?xml
0030 76 65 72 73 69 6f 6e 3d 22 31 2e 30 22 3f 3e 3c version= "1.0"?><
0040 63 6f 6d 70 61 6e 79 3e 3c 73 74 61 66 66 3e 3c company> <staff><
0050 66 69 72 73 74 6e 61 6d 65 3e 79 6f 6e 67 3c 2f firstnam e>yong</
0060 66 69 72 73 74 6e 61 6d 65 3e 3c 6c 61 73 74 6e firstnam e><lastn
0070 61 6d 65 3e 6d 6f 6f 6b 20 6b 69 6d 3c 2f 6c 61 ame>mook kim</la
0080 73 74 6e 61 6d 65 3e 3c 6e 69 63 6b 6e 61 6d 65 stname>< nickname
0090 3e c2 a7 3c 2f 6e 69 63 6b 6e 61 6d 65 3e 3c 73 >..</nic kname><s
00a0 61 6c 61 72 79 3e 31 30 30 30 30 30 3c 2f 73 61 alary>10 0000</sa
00b0 6c 61 72 79 3e 3c 2f 73 74 61 66 66 3e 3c 2f 63 lary></s taff></c
00c0 6f 6d 70 61 6e 79 3e ompany>
Message sent using Transformer:
0020 50 18 00 20 b6 b1 00 00 3c 3f 78 6d 6c 20 76 65 P.. .... <?xml ve
0030 72 73 69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f rsion="1 .0" enco
0040 64 69 6e 67 3d 22 75 74 66 2d 38 22 3f 3e 3c 63 ding="ut f-8"?><c
0050 6f 6d 70 61 6e 79 3e 3c 73 74 61 66 66 3e 3c 66 ompany>< staff><f
0060 69 72 73 74 6e 61 6d 65 3e 79 6f 6e 67 3c 2f 66 irstname >yong</f
0070 69 72 73 74 6e 61 6d 65 3e 3c 6c 61 73 74 6e 61 irstname ><lastna
0080 6d 65 3e 6d 6f 6f 6b 20 6b 69 6d 3c 2f 6c 61 73 me>mook kim</las
0090 74 6e 61 6d 65 3e 3c 6e 69 63 6b 6e 61 6d 65 3e tname><n ickname>
00a0 c2 a7 3c 2f 6e 69 63 6b 6e 61 6d 65 3e 3c 73 61 ..</nick name><sa
00b0 6c 61 72 79 3e 31 30 30 30 30 30 3c 2f 73 61 6c lary>100 000</sal
00c0 61 72 79 3e 3c 2f 73 74 61 66 66 3e 3c 2f 63 6f ary></st aff></co
00d0 6d 70 61 6e 79 3e mpany>
As one might notice, DataOutputStream adds two bytes (00 9d) in front of the message, so the SAX parser throws the exception "org.xml.sax.SAXParseException: Content is not allowed in prolog.". When I skip over these 2 bytes, the SAX parser works just fine. Additionally, I noticed that DataInputStream is unable to read the Transformer message.
My question is: Why does DataOutputStream add these bytes, and why doesn't the Transformer?
For those who are interested in replicating the problem, here is some code:
Server using DataOutputStream:
String data = "<?xml version=\"1.0\"?><company><staff><firstname>yong</firstname><lastname>mook kim</lastname><nickname>§</nickname><salary>100000</salary></staff></company>";
ServerSocket server = new ServerSocket(60000);
Socket socket = server.accept();
DataOutputStream os = new DataOutputStream(socket.getOutputStream());
os.writeUTF(data);
os.close();
socket.close();
Server using Transformer:
ServerSocket server = new ServerSocket(60000);
Socket socket = server.accept();
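Document doc = createDocument();
OutputStream os = socket.getOutputStream(); // raw socket stream, not wrapped in a DataOutputStream
printXML(doc, os);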
os.close();
socket.close();
public synchronized static void printXML(Document document, OutputStream stream) throws TransformerException
{
DOMSource domSource = new DOMSource(document);
StreamResult streamResult = new StreamResult(stream);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
serializer.setOutputProperty(OutputKeys.INDENT, "no");
serializer.transform(domSource, streamResult);
}
private static Document createDocument() throws ParserConfigurationException
{
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
Element company = document.createElement("company");
Element staff = document.createElement("staff");
Element firstname = document.createElement("firstname");
Element lastname = document.createElement("lastname");
Element nickname = document.createElement("nickname");
Element salary = document.createElement("salary");
Text firstnameText = document.createTextNode("yong");
Text lastnameText = document.createTextNode("mook kim");
Text nicknameText = document.createTextNode("§");
Text salaryText = document.createTextNode("100000");
document.appendChild(company);
company.appendChild(staff);
staff.appendChild(firstname);
staff.appendChild(lastname);
staff.appendChild(nickname);
staff.appendChild(salary);
firstname.appendChild(firstnameText);
lastname.appendChild(lastnameText);
nickname.appendChild(nicknameText);
salary.appendChild(salaryText);
return document;
}
Client using SAX Parser:
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new MyHandler();
Socket socket = new Socket("localhost", 60000);
InputSource is = new InputSource(new InputStreamReader(socket.getInputStream()));
is.setEncoding("UTF-8");
//socket.getInputStream().skip(2); // skip over the 2 bytes written by DataOutputStream.writeUTF()
saxParser.parse(is, handler);
Client using DataInputStream:
Socket socket = new Socket("localhost", 60000);
DataInputStream os = new DataInputStream(socket.getInputStream());
while(true) {
String data = os.readUTF();
System.out.println("Data: " + data);
}
Answered by Stephen Denne
The output of DataOutputStream.writeUTF() is a custom format, intended to be read by DataInputStream.readUTF().
The javadocs of the writeUTF method you are calling say:
Writes a string to the underlying output stream using modified UTF-8 encoding in a machine-independent manner.
First, two bytes are written to the output stream as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string. Following the length, each character of the string is output, in sequence, using the modified UTF-8 encoding for the character. If no exception is thrown, the counter written is incremented by the total number of bytes written to the output stream. This will be at least two plus the length of str, and at most two plus thrice the length of str.
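To make the length prefix concrete, here is a minimal sketch that writes the XML string from the question with writeUTF() into an in-memory buffer and inspects the first two bytes. The class and method names (WriteUtfPrefixDemo, writeUtfBytes, lengthPrefix, readUtf) are made up for illustration, and a ByteArrayOutputStream stands in for the socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
public class WriteUtfPrefixDemo {

    // The XML string from the question; \u00a7 is the section sign used as the nickname.
    public static final String DATA =
            "<?xml version=\"1.0\"?><company><staff><firstname>yong</firstname>"
            + "<lastname>mook kim</lastname><nickname>\u00a7</nickname>"
            + "<salary>100000</salary></staff></company>";

    /** Returns exactly the bytes that writeUTF() emits for the given string. */
    public static byte[] writeUtfBytes(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        }
        return buffer.toByteArray();
    }

    /** Decodes the first two bytes as an unsigned big-endian 16-bit length. */
    public static int lengthPrefix(byte[] framed) {
        return ((framed[0] & 0xFF) << 8) | (framed[1] & 0xFF);
    }

    /** Reads the framed bytes back with the matching readUTF() call. */
    public static String readUtf(byte[] framed) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] framed = writeUtfBytes(DATA);
        byte[] raw = DATA.getBytes(StandardCharsets.UTF_8);

        System.out.printf("prefix bytes: %02x %02x%n", framed[0] & 0xFF, framed[1] & 0xFF);
        System.out.println("prefix value: " + lengthPrefix(framed));
        System.out.println("raw UTF-8 length: " + raw.length);
        System.out.println("round trip ok: " + DATA.equals(readUtf(framed)));
    }
}
```

For this string the prefix bytes are 00 9d, i.e. 157, which is the number of encoded bytes that follow and matches the two extra bytes in the first capture of the question. Modified UTF-8 only differs from standard UTF-8 for the NUL character and for supplementary characters, so for this particular string the payload after the prefix is the same as the plain UTF-8 encoding.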
Answered by MeBigFatGuy
Always use the same type of stream when reading and writing data. If you are feeding the stream directly into a SAX parser, then you should not use a DataOutputStream.
Just use
BufferedOutputStream bos = new BufferedOutputStream(socket.getOutputStream());
bos.write(data.getBytes("UTF-8")); // data is the XML string, written as plain UTF-8 with no length prefix
bos.flush();
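As a rough illustration of the difference, the sketch below feeds both byte sequences to a SAX parser: the plain UTF-8 bytes written through a BufferedOutputStream, and the writeUTF() output with its 2-byte prefix. It is only a sketch under assumptions: the class and method names (RawBytesSaxDemo, rawUtf8, viaWriteUtf, parses) are invented here, an in-memory buffer replaces the socket, and the exact exception raised for the prefixed bytes may depend on the parser implementation, so the code only reports whether parsing succeeded.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original answer.
public class RawBytesSaxDemo {

    // The XML string from the question; \u00a7 is the section sign.
    public static final String DATA =
            "<?xml version=\"1.0\"?><company><staff><firstname>yong</firstname>"
            + "<lastname>mook kim</lastname><nickname>\u00a7</nickname>"
            + "<salary>100000</salary></staff></company>";

    /** Writes the XML as plain UTF-8 bytes, as suggested in the answer. */
    public static byte[] rawUtf8(String xml) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for socket.getOutputStream()
        BufferedOutputStream bos = new BufferedOutputStream(sink);
        bos.write(xml.getBytes(StandardCharsets.UTF_8));
        bos.flush();
        return sink.toByteArray();
    }

    /** Writes the XML with writeUTF(), which prepends the 2-byte length. */
    public static byte[] viaWriteUtf(String xml) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(xml);
        }
        return sink.toByteArray();
    }

    /** Returns true if a SAX parser accepts the bytes as a document. */
    public static boolean parses(byte[] bytes) throws ParserConfigurationException, SAXException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new ByteArrayInputStream(bytes)), new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            // Malformed input: reported either as a parse error or as a decoding error.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("raw UTF-8 parses: " + parses(rawUtf8(DATA)));
        System.out.println("writeUTF output parses: " + parses(viaWriteUtf(DATA)));
    }
}
```

The raw bytes start directly with the XML declaration, so the parser can read them. The writeUTF() output starts with the length bytes instead, which is why the parser in the question complained about content in the prolog.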