Write text files without Byte Order Mark (BOM)? (VB.NET)
Original question: http://stackoverflow.com/questions/2437666/
Source: Stack Overflow, provided under the CC BY-SA 4.0 license. You are free to use and share this content, but you must attribute it to the original authors.
Asked by Vijay Balkawade
I am trying to create a text file using VB.NET with UTF-8 encoding, without a BOM. Can anybody help me with how to do this?
I can write a file with UTF-8 encoding, but how do I remove the byte order mark from it?
Edit 1: I have tried code like this:
' Parameterless constructor: this encoding does not emit a BOM.
Dim utf8 As New UTF8Encoding()
' Passing True makes this encoding emit a BOM.
Dim utf8EmitBOM As New UTF8Encoding(True)
Dim strW As New StreamWriter("c:\temp\bom\1.html", True, utf8EmitBOM)
strW.Write(utf8EmitBOM.GetPreamble())
strW.WriteLine("hi there")
strW.Close()
Dim strw2 As New StreamWriter("c:\temp\bom\2.html", True, utf8)
strw2.Write(utf8.GetPreamble())
strw2.WriteLine("hi there")
strw2.Close()
1.html gets created with UTF-8 encoding only, and 2.html gets created in ANSI encoding format.
Simplified approach - http://whatilearnttuday.blogspot.com/2011/10/write-text-files-without-byte-order.html
Answered by stakx - no longer contributing
In order to omit the byte order mark (BOM), your stream must use an instance of UTF8Encoding other than System.Text.Encoding.UTF8 (which is configured to generate a BOM). There are two easy ways to do this:
1. Explicitly specifying a suitable encoding:
Call the UTF8Encoding constructor with False for the encoderShouldEmitUTF8Identifier parameter.
Pass the UTF8Encoding instance to the stream constructor.
' VB.NET:
Dim utf8WithoutBom As New System.Text.UTF8Encoding(False)
Using sink As New StreamWriter("Foobar.txt", False, utf8WithoutBom)
    sink.WriteLine("...")
End Using
// C#:
var utf8WithoutBom = new System.Text.UTF8Encoding(false);
using (var sink = new StreamWriter("Foobar.txt", false, utf8WithoutBom))
{
    sink.WriteLine("...");
}
2. Using the default encoding:
If you do not supply an Encoding to StreamWriter's constructor at all, StreamWriter will by default use a UTF-8 encoding without BOM, so the following should work just as well:
' VB.NET:
Using sink As New StreamWriter("Foobar.txt")
    sink.WriteLine("...")
End Using
// C#:
using (var sink = new StreamWriter("Foobar.txt"))
{
    sink.WriteLine("...");
}
Finally, note that omitting the BOM is only permissible for UTF-8, not for UTF-16.
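To see the difference between the two encodings without touching the disk, here is a minimal C# sketch that writes the same text through a StreamWriter into memory and dumps the resulting bytes. The class and method names (BomDemo, WriteToBytes) are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: encodes text through a StreamWriter into memory
    // and returns the raw bytes, including any BOM the encoding emits.
    public static byte[] WriteToBytes(string text, Encoding encoding)
    {
        using (var buffer = new MemoryStream())
        {
            using (var writer = new StreamWriter(buffer, encoding))
            {
                writer.Write(text);
            } // disposing the writer flushes it into the buffer
            return buffer.ToArray(); // ToArray still works on a closed MemoryStream
        }
    }

    public static void Main()
    {
        byte[] withBom = WriteToBytes("hi", Encoding.UTF8);
        byte[] withoutBom = WriteToBytes("hi", new UTF8Encoding(false));

        Console.WriteLine(BitConverter.ToString(withBom));    // EF-BB-BF-68-69
        Console.WriteLine(BitConverter.ToString(withoutBom)); // 68-69
    }
}
```

The three extra bytes EF BB BF at the start of the first output are the UTF-8 BOM.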
Answered by Roman Nikitin
Try this:
Encoding outputEnc = new UTF8Encoding(false); // create encoding with no BOM
TextWriter file = new StreamWriter(filePath, false, outputEnc); // open file with encoding
// write data here
file.Close(); // save and close it
Answered by Joe.wang
Simply use the WriteAllText method from System.IO.File.
Please check the sample for File.WriteAllText.
This method uses UTF-8 encoding without a Byte-Order Mark (BOM), so using the GetPreamble method will return an empty byte array. If it is necessary to include a UTF-8 identifier, such as a byte order mark, at the beginning of a file, use the WriteAllText(String, String, Encoding) method overload with UTF8 encoding.
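As a rough illustration of the two overloads mentioned in that quote, the following C# sketch writes the same text with and without an explicit encoding and reads the bytes back. The helper name WriteAndReadBack and the temporary file name are assumptions made for this example only.

```csharp
using System;
using System.IO;
using System.Text;

public static class WriteAllTextDemo
{
    // Hypothetical helper: writes text with File.WriteAllText and returns
    // the bytes that ended up on disk.
    public static byte[] WriteAndReadBack(string path, string text, Encoding encoding = null)
    {
        if (encoding == null)
            File.WriteAllText(path, text);           // two-argument overload: UTF-8 without BOM
        else
            File.WriteAllText(path, text, encoding); // the encoding decides whether a BOM is written
        return File.ReadAllBytes(path);
    }

    public static void Main()
    {
        // Assumed scratch location, not taken from the thread.
        string path = Path.Combine(Path.GetTempPath(), "bom-demo.txt");

        Console.WriteLine(BitConverter.ToString(WriteAndReadBack(path, "hi")));                // 68-69
        Console.WriteLine(BitConverter.ToString(WriteAndReadBack(path, "hi", Encoding.UTF8))); // EF-BB-BF-68-69

        File.Delete(path);
    }
}
```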
Answered by JG in SD
If you do not specify an Encoding when creating a new StreamWriter, the default Encoding object used is UTF-8 No BOM, which is created via new UTF8Encoding(false, true).
So to create a text file without the BOM, use one of the constructors that do not require you to provide an encoding:
new StreamWriter(Stream)
new StreamWriter(String)
new StreamWriter(String, Boolean)
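One way to confirm this for yourself is to ask a default-constructed writer for its encoding and look at the preamble length. This is only a sketch. The class and method names are invented for the example, and it uses the StreamWriter(Stream) constructor from the list above with an in-memory stream.

```csharp
using System;
using System.IO;
using System.Text;

public static class DefaultWriterDemo
{
    // Hypothetical helper: returns the BOM length (in bytes) of the encoding
    // that StreamWriter picks when no Encoding is supplied.
    public static int DefaultPreambleLength()
    {
        using (var writer = new StreamWriter(new MemoryStream()))
        {
            return writer.Encoding.GetPreamble().Length;
        }
    }

    public static void Main()
    {
        Console.WriteLine(DefaultPreambleLength());               // 0
        Console.WriteLine(Encoding.UTF8.GetPreamble().Length);    // 3
    }
}
```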
Answered by Tao
Interesting note with respect to this: strangely, the static "CreateText()" method of the System.IO.File class creates UTF-8 files without BOM.
In general this is a source of bugs, but in your case it could have been the simplest workaround :)
Answered by jos
I think Roman Nikitin is right. The meaning of the constructor argument is flipped. False means no BOM and true means with BOM.
You get an ANSI encoding because a file without a BOM that contains only ASCII characters is byte-for-byte identical to an ANSI file. Try some special characters in your "hi there" string and you'll see the detected encoding change from ANSI to UTF-8 without BOM.
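The point about identical bytes can be checked with a short C# sketch. It compares BOM-less UTF-8 against ISO-8859-1, which is used here only as a stand-in for an "ANSI" code page (an assumption: the actual ANSI code page depends on the Windows locale). The names AnsiVsUtf8 and SameBytes are made up for the example.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AnsiVsUtf8
{
    // Hypothetical helper: true when the text encodes to the same bytes
    // in BOM-less UTF-8 and in Latin-1 (the assumed "ANSI" stand-in).
    public static bool SameBytes(string text)
    {
        Encoding utf8NoBom = new UTF8Encoding(false);
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return utf8NoBom.GetBytes(text).SequenceEqual(latin1.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine(SameBytes("hi there"));     // True: only ASCII characters
        Console.WriteLine(SameBytes("h\u00E9llo"));   // False: U+00E9 is two bytes in UTF-8, one in Latin-1
    }
}
```

With only ASCII text there is nothing in the file that an editor could use to tell the two encodings apart, so it falls back to reporting ANSI.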
Answered by Jerry Banasik
XML Encoding UTF-8 without BOM
We need to submit XML data to the EPA and their application that takes our input requires UTF-8 without BOM. Oh yes, plain UTF-8 should be acceptable for everyone, but not for the EPA. The answer to doing this is in the above comments. Thank you Roman Nikitin.
Here is a C# snippet of the code for the XML encoding:
Encoding utf8noBOM = new UTF8Encoding(false);
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = utf8noBOM;
…
using (XmlWriter xw = XmlWriter.Create(filePath, settings))
{
    xDoc.WriteTo(xw);
    xw.Flush();
}
Checking whether this actually removes the three leading bytes from the output file can be misleading. For example, if you use Notepad++ (www.notepad-plus-plus.org), it will report "Encode in ANSI". I guess most text editors count on the BOM to tell whether a file is UTF-8. The way to see this clearly is with a binary tool like WinHex (www.winhex.com). Since I was looking for a before-and-after difference, I used the Microsoft WinDiff application.
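If no hex tool is at hand, the same check can be done in a few lines of code. The following is a minimal sketch, not part of the original answer: it reads the first three bytes of each file passed on the command line and compares them with the UTF-8 BOM (EF BB BF). The class and method names are hypothetical.

```csharp
using System;
using System.IO;

public static class BomInspector
{
    // True when the buffer begins with the UTF-8 BOM bytes EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        return data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
    }

    // Reads at most the first three bytes of the file and checks them.
    public static bool FileStartsWithUtf8Bom(string path)
    {
        var head = new byte[3];
        int total = 0;
        using (var stream = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until done.
            while (total < head.Length)
            {
                int n = stream.Read(head, total, head.Length - total);
                if (n == 0) break; // end of file
                total += n;
            }
        }
        return total == head.Length && StartsWithUtf8Bom(head);
    }

    public static void Main(string[] args)
    {
        foreach (string path in args)
        {
            Console.WriteLine(path + ": " +
                (FileStartsWithUtf8Bom(path) ? "UTF-8 BOM present" : "no UTF-8 BOM"));
        }
    }
}
```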
Answered by Mwenyeji
Dim sWriter As IO.StreamWriter = New IO.StreamWriter(shareworklist & "\" & getfilename() & ".txt", False, Encoding.Default)
This gives you the results you want (I think).
Answered by Mwenyeji
It might be that your input text contains a byte order mark. In that case, you should remove it before writing.
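A minimal sketch of that removal step, assuming the input is already a .NET string: a BOM that survived decoding shows up as the character U+FEFF at position 0 (this can happen, for example, when bytes are decoded with Encoding.GetString). The helper name StripLeadingBom is invented for this example.

```csharp
using System;

public static class BomStripper
{
    // Hypothetical helper: drops a single leading U+FEFF character, if present.
    public static string StripLeadingBom(string text)
    {
        if (!string.IsNullOrEmpty(text) && text[0] == '\uFEFF')
        {
            return text.Substring(1);
        }
        return text;
    }

    public static void Main()
    {
        string input = "\uFEFFhi there";
        Console.WriteLine(input.Length);                  // 9
        Console.WriteLine(StripLeadingBom(input).Length); // 8
    }
}
```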