Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/15016004/

Date: 2020-09-06 13:53:43  Source: igfitidea

What's the correct way to encode CR-LF line breaks in text/xml values?

Tags: xml, xml-serialization

Asked by AlwaysLearning

As opposed to application/xml files, which could do anything, or normalizedString values, which convert all whitespace sequences to a single space character, I'm asking here specifically in the context of text/xml files with string values. For the sake of simplicity, let's say I'm only using ASCII characters in a UTF-8 encoded file.

Given the following two-line text string I wish to represent in XML:

Hello
World!

In memory, that is the following bytes:

0000: 48 65 6c 6c 6f 0d 0a 57 6f 72 6c 64 21 Hello..World!
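
For reference, a byte listing like the one above can be reproduced with a few lines of Java. This is only a sketch: the class name `HexDumpSketch` and the helper `toHex` are made up for illustration and are not part of the original question.

```java
import java.nio.charset.StandardCharsets;

class HexDumpSketch {
    // Formats the UTF-8 bytes of a string as space-separated lowercase hex pairs.
    // (Hypothetical helper, named here for illustration only.)
    static String toHex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: 48 65 6c 6c 6f 0d 0a 57 6f 72 6c 64 21
        System.out.println(toHex("Hello\r\nWorld!"));
    }
}
```

The `0d 0a` pair in the middle is the CR-LF line break that the rest of the question is about.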

According to RFC 2046, any text/* MIME type MUST (not should) represent a line break using a Carriage Return followed by a Linefeed (the CR LF sequence). In that light, the following XML fragment should be right:

<tag>Hello
World!</tag>

or

0000: 3c 74 61 67 3e 48 65 6c 6c 6f 0d 0a 57 6f 72 6c <tag>Hello..Worl
0010: 64 21 3c 2f 74 61 67 3e                         d!</tag>

But I regularly see files like the following:

<tag><![CDATA[Hello
World!]]></tag>

Or, even stranger:

<tag>Hello&#xD;
World!</tag>

Where the &#xD; character reference is followed by a single Linefeed character:

0000: 3c 74 61 67 3e 48 65 6c 6c 6f 26 23 78 44 3b 0a <tag>Hello&#xD;.
0010: 57 6f 72 6c 64 21 3c 2f 74 61 67 3e             World!</tag>

What am I missing here? What's the correct way to represent multiple lines of text in an XML string value so that it can come out the other end unmolested?

Accepted answer by AlwaysLearning

After writing NUnit tests in Mono and JUnit tests in Java, the answer would appear to be to use either <tag>Hello&#13;\nWorld!</tag> or <tag>Hello&#xd;\nWorld!</tag> as below...

Foo.cs:

using System.IO;
using System.Text;
using System.Xml.Serialization;

namespace XmlStringTests
{
    public class Foo
    {
        public string greeting;

        public static Foo DeserializeFromXmlString (string xml)
        {
            Foo result;
            using (MemoryStream memoryStream = new MemoryStream()) {
                byte[] buffer = Encoding.UTF8.GetBytes (xml);
                memoryStream.Write (buffer, 0, buffer.Length);
                memoryStream.Seek (0, SeekOrigin.Begin);
                XmlSerializer xs = new XmlSerializer (typeof(Foo));
                result = (Foo)xs.Deserialize (memoryStream);
            }
            return result;
        }
    }
}

XmlStringTests.cs:

using NUnit.Framework;

namespace XmlStringTests
{
    [TestFixture]
    public class XmlStringTests
    {
        const string expected = "Hello\u000d\u000aWorld!";

        [Test(Description="Fails")]
        public void Cdata ()
        {
            const string test = "<Foo><greeting><![CDATA[Hello\u000d\u000aWorld!]]></greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }

        [Test(Description="Fails")]
        public void CdataWithHash13 ()
        {
            const string test = "<Foo><greeting><![CDATA[Hello&#13;\u000aWorld!]]></greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }

        [Test(Description="Fails")]
        public void CdataWithHashxD ()
        {
            const string test = "<Foo><greeting><![CDATA[Hello&#xd;\u000aWorld!]]></greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }

        [Test(Description="Fails")]
        public void Simple ()
        {
            const string test = "<Foo><greeting>Hello\u000d\u000aWorld!</greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }

        [Test(Description="Passes")]
        public void SimpleWithHash13 ()
        {
            const string test = "<Foo><greeting>Hello&#13;\u000aWorld!</greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }

        [Test(Description="Passes")]
        public void SimpleWithHashxD ()
        {
            const string test = "<Foo><greeting>Hello&#xd;\u000aWorld!</greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }
    }
}

Foo.java:

import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

@XmlRootElement(name = "Foo")
@XmlType(propOrder = { "greeting" })
public class Foo {
    public String greeting;

    public static Foo DeserializeFromXmlString(String xml) {
        try {
            JAXBContext context = JAXBContext.newInstance(Foo.class);
            Unmarshaller unmarshaller = context.createUnmarshaller();
            Foo foo = (Foo) unmarshaller.unmarshal(new StringReader(xml));
            return foo;
        } catch (JAXBException e) {
            e.printStackTrace();
            return null;
        }
    }
}

XmlStringTests.java:

import static org.junit.Assert.*;
import org.junit.Test;


public class XmlStringTests {
    String expected = "Hello\r\nWorld!";

    @Test //Fails
    public void testCdata ()
    {
        String test = "<Foo><greeting><![CDATA[Hello\r\nWorld!]]></greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }

    @Test //Fails
    public void testCdataWithHash13 ()
    {
        String test = "<Foo><greeting><![CDATA[Hello&#13;\nWorld!]]></greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }

    @Test //Fails
    public void testCdataWithHashxD ()
    {
        String test = "<Foo><greeting><![CDATA[Hello&#xd;\nWorld!]]></greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }

    @Test //Fails
    public void testSimple ()
    {
        String test = "<Foo><greeting>Hello\r\nWorld!</greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }

    @Test //Passes
    public void testSimpleWithHash13 ()
    {
        String test = "<Foo><greeting>Hello&#13;\nWorld!</greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }

    @Test //Passes
    public void testSimpleWithHashxD ()
    {
        String test = "<Foo><greeting>Hello&#xd;\nWorld!</greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }
}

I hope this saves some people some time.
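
On the writing side, the same result suggests escaping each carriage return as a character reference before the text goes into the element. A minimal sketch, assuming you build the element text by hand; the class `CrEscapeSketch` and the helper `escapeText` are hypothetical names, not part of the tests above:

```java
class CrEscapeSketch {
    // Escapes XML markup characters and writes each CR as the character
    // reference &#13; so a parser does not fold it into a bare LF.
    // (Hypothetical helper, shown for illustration only.)
    static String escapeText(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '\r':
                    sb.append("&#13;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Produces the same element text as the passing SimpleWithHash13 test.
        System.out.println("<greeting>" + escapeText("Hello\r\nWorld!") + "</greeting>");
    }
}
```

The ampersand is handled in the same pass as the other characters, so the `&` in `&#13;` is never escaped a second time.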

Answer by Eric Galluzzo

A literal CR (#xD), LF (#xA), CRLF, or a few other combinations are all valid. As noted in the spec's section on end-of-line handling, the parser translates each of these to a single #xA character.
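
The normalization is easy to observe with the DOM parser that ships with the JDK. The following is a sketch rather than a reference implementation; the class `LineEndingSketch` and its helpers `rootText` and `show` are names invented for this example:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class LineEndingSketch {
    // Parses a small XML document and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    // Makes CR and LF visible when printing.
    static String show(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Literal CRLF is normalized to a single LF: prints Hello\nWorld!
        System.out.println(show(rootText("<tag>Hello\r\nWorld!</tag>")));
        // A character reference is not normalized: prints Hello\r\nWorld!
        System.out.println(show(rootText("<tag>Hello&#13;\nWorld!</tag>")));
        // Inside CDATA the reference is plain text: prints Hello&#13;\nWorld!
        System.out.println(show(rootText("<tag><![CDATA[Hello&#13;\nWorld!]]></tag>")));
    }
}
```

This matches the test results in the accepted answer: only the character reference outside CDATA delivers a real CR LF pair to the application.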
