Remove Byte Order Mark from a File.ReadAllBytes (byte[])
Original question: http://stackoverflow.com/questions/288111/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by JC Grubbs
I have an HTTPHandler that reads in a set of CSS files, combines them, and then GZips them. However, some of the CSS files contain a byte order mark (BOM), due to a bug in the TFS 2005 auto merge, and Firefox reads the BOM as part of the actual content, which breaks my class names and so on. How can I strip out the BOM characters? Is there an easy way to do this without manually going through the byte array looking for "ï»¿"?
Answered by JaredPar
Expanding on Jon's comment with a sample.
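// Assumes the file starts with the 3-byte UTF-8 BOM (EF BB BF); Skip requires "using System.Linq;".
var name = GetFileName();
var bytes = System.IO.File.ReadAllBytes(name);
System.IO.File.WriteAllBytes(name, bytes.Skip(3).ToArray());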
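The sample above drops the first three bytes unconditionally, so a file without a BOM would lose real content. Since the question is about a byte[] already held in memory, a guarded variant might look like the following minimal sketch. The names BomHelper and StripUtf8Bom are made up for illustration and are not part of the original answer, and only the UTF-8 BOM is handled.

```csharp
using System;
using System.Linq;

static class BomHelper
{
    // UTF-8 byte order mark: EF BB BF.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Hypothetical helper: returns the input without a leading UTF-8 BOM.
    // Input that does not start with the BOM is returned unchanged.
    public static byte[] StripUtf8Bom(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));

        bool hasBom = bytes.Length >= Utf8Bom.Length
            && bytes.Take(Utf8Bom.Length).SequenceEqual(Utf8Bom);

        return hasBom ? bytes.Skip(Utf8Bom.Length).ToArray() : bytes;
    }
}
```

In the handler this could be applied to each file before concatenation, for example BomHelper.StripUtf8Bom(System.IO.File.ReadAllBytes(path)), so the files on disk do not have to be rewritten.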
Answered by Tim Bailey
Another way is to convert from UTF-8 to ASCII. This assumes the files contain only ASCII characters, because anything outside ASCII is lost in the conversion.
File.WriteAllText(filename, File.ReadAllText(filename, Encoding.UTF8), Encoding.ASCII);
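To see what the ASCII conversion does to characters outside the ASCII range, here is a small sketch. The class and method names (AsciiDemo, ThroughAscii) are made up for illustration; the method encodes and decodes a string with Encoding.ASCII, which is the encoding step that File.WriteAllText performs in the line above.

```csharp
using System;
using System.Text;

static class AsciiDemo
{
    // Hypothetical helper: round-trips text through Encoding.ASCII.
    // Characters that ASCII cannot represent come back as '?'.
    public static string ThroughAscii(string text)
    {
        byte[] encoded = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(encoded);
    }
}
```

For example, AsciiDemo.ThroughAscii("caf\u00e9") returns "caf?", while plain ASCII input such as ".menu { color: red; }" comes back unchanged. A stylesheet with non-ASCII text in a content property or a comment would therefore be altered by this approach.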
Answered by Tim Bailey
// File.ReadAllText detects the BOM and leaves it out of the string; UTF8Encoding(false) writes no BOM back.
var text = File.ReadAllText(args.SourceFileName);
using (var streamWriter = new StreamWriter(args.DestFileName, args.Append, new UTF8Encoding(false)))
{
    streamWriter.Write(text);
}
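The reason for passing new UTF8Encoding(false) rather than Encoding.UTF8 is the encoding's preamble, which is the byte sequence a writer may emit at the start of a file. The sketch below only inspects that preamble; PreambleDemo and PreambleLength are illustrative names, not part of the answer.

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Hypothetical helper: number of BOM bytes an encoding reports as its preamble.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }
}
```

PreambleDemo.PreambleLength(Encoding.UTF8) is 3 (the bytes EF BB BF), while PreambleDemo.PreambleLength(new UTF8Encoding(false)) is 0, so a writer created with the latter has no BOM to emit.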
Answered by Olivier de Rivoyre
Expanding JaredPar's sample to recurse over sub-directories:
using System.Linq;
using System.IO;

namespace BomRemover
{
    /// <summary>
    /// Remove UTF-8 BOM (EF BB BF) of all *.php files in current and sub-directories.
    /// </summary>
    class Program
    {
        private static void removeBoms(string filePattern, string directory)
        {
            foreach (string filename in Directory.GetFiles(directory, filePattern))
            {
                var bytes = System.IO.File.ReadAllBytes(filename);
                // Rewrite the file only when it actually starts with the UTF-8 BOM.
                if (bytes.Length > 2 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
                {
                    System.IO.File.WriteAllBytes(filename, bytes.Skip(3).ToArray());
                }
            }
            // Recurse into each sub-directory.
            foreach (string subDirectory in Directory.GetDirectories(directory))
            {
                removeBoms(filePattern, subDirectory);
            }
        }

        static void Main(string[] args)
        {
            string filePattern = "*.php";
            string startDirectory = Directory.GetCurrentDirectory();
            removeBoms(filePattern, startDirectory);
        }
    }
}
I needed this piece of C# code after discovering that a UTF-8 BOM corrupts the file when you try to do a basic PHP file download.
Answered by Ashokan Sivapragasam
For larger files, use the following code. It is memory efficient because it streams the file line by line instead of loading it all at once.
// Note: UnicodeEncoding writes the output as UTF-16 LE without a BOM;
// pass new UTF8Encoding(false) instead to keep the output in UTF-8.
using (var sr = new StreamReader(path: @"<Input_file_full_path_with_byte_order_mark>",
                                 detectEncodingFromByteOrderMarks: true))
using (var sw = new StreamWriter(path: @"<Output_file_without_byte_order_mark>",
                                 append: false,
                                 encoding: new UnicodeEncoding(bigEndian: false, byteOrderMark: false)))
{
    var lineNumber = 0;
    while (!sr.EndOfStream)
    {
        sw.WriteLine(sr.ReadLine());
        lineNumber += 1;
        // Report progress every 100,000 lines.
        if (lineNumber % 100000 == 0)
            Console.Write("\rLine# " + lineNumber.ToString("000000000000"));
    }
}
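The line-based loop above decodes and re-encodes the text (to UTF-16 with UnicodeEncoding) and rewrites line endings through WriteLine. If the goal is only to drop the BOM and leave every other byte untouched, a byte-level copy is an alternative. This is a minimal sketch under that assumption; BomStreamCopy and CopyWithoutUtf8Bom are made-up names, and only the UTF-8 BOM is recognized.

```csharp
using System;
using System.IO;

static class BomStreamCopy
{
    // Hypothetical helper: copies input to output, dropping a leading
    // UTF-8 BOM (EF BB BF) if one is present. All other bytes are copied as-is.
    public static void CopyWithoutUtf8Bom(Stream input, Stream output)
    {
        var head = new byte[3];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop
        // until three bytes are read or the stream ends.
        while (read < head.Length)
        {
            int n = input.Read(head, read, head.Length - read);
            if (n == 0) break;
            read += n;
        }

        bool hasBom = read == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
        if (!hasBom)
        {
            // Not a BOM: these bytes are real content and must be kept.
            output.Write(head, 0, read);
        }

        // Copy the remainder in buffered chunks, without loading the whole file.
        input.CopyTo(output);
    }
}
```

With files it could be used by opening the source with File.OpenRead and the destination with File.Create, each in a using block, and passing both streams to CopyWithoutUtf8Bom.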