C# Remove the byte order mark from File.ReadAllBytes (byte[])

Question

Asked by JC Grubbs

I have an HTTPHandler that reads in a set of CSS files, combines them, and then GZips them. However, some of the CSS files contain a Byte Order Mark (due to a bug in TFS 2005 auto merge), and in Firefox the BOM is read as part of the actual content, so it's screwing up my class names etc. How can I strip out the BOM characters? Is there an easy way to do this without manually going through the byte array looking for "???"?

Answer 1

Answered by JaredPar

Expanding on Jon's comment with a sample.

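// Requires: using System.Linq; (for Skip)
var name = GetFileName();
var bytes = System.IO.File.ReadAllBytes(name);
// Drops the first three bytes; this assumes the file starts with the UTF-8 BOM (EF BB BF).
System.IO.File.WriteAllBytes(name, bytes.Skip(3).ToArray());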
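The snippet above removes the first three bytes unconditionally, so it would damage a file that has no BOM. A minimal sketch that checks first and works directly on the byte[] returned by File.ReadAllBytes; the names BomStripper and StripUtf8Bom are hypothetical and not part of the answer:

```csharp
using System;
using System.Text;

// Hypothetical helper; the names are illustrative, not from the original answer.
public static class BomStripper
{
    // Returns the input without a leading UTF-8 BOM (EF BB BF),
    // or the input array unchanged if no BOM is present.
    public static byte[] StripUtf8Bom(byte[] bytes)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
        if (bytes.Length < bom.Length)
            return bytes;

        for (int i = 0; i < bom.Length; i++)
        {
            if (bytes[i] != bom[i])
                return bytes;
        }

        // Copy everything after the BOM into a new array.
        byte[] result = new byte[bytes.Length - bom.Length];
        Array.Copy(bytes, bom.Length, result, 0, result.Length);
        return result;
    }
}
```

In the handler from the question, this could presumably be applied to each file before concatenation, for example `var css = BomStripper.StripUtf8Bom(File.ReadAllBytes(path));`, where `path` is a placeholder.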

Answer 2

Answered by Tim Bailey

Another way, assuming you are converting UTF-8 to ASCII. Note that Encoding.ASCII replaces any non-ASCII character with '?'.

File.WriteAllText(filename, File.ReadAllText(filename, Encoding.UTF8), Encoding.ASCII);

Answer 3

Answered by Tim Bailey

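// File.ReadAllText detects the encoding from the BOM and does not include the BOM in the returned string.
var text = File.ReadAllText(args.SourceFileName);
// new UTF8Encoding(false) writes UTF-8 without emitting a BOM.
using (var streamWriter = new StreamWriter(args.DestFileName, args.Append, new UTF8Encoding(false)))
{
    streamWriter.Write(text);
}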

Answer 4

Answered by Olivier de Rivoyre

Expanding JaredPar's sample to recurse over sub-directories:

using System.Linq;
using System.IO;
namespace BomRemover
{
    /// <summary>
    /// Remove the UTF-8 BOM (EF BB BF) from all *.php files in the current directory and its sub-directories.
    /// </summary>
    class Program
    {
        private static void removeBoms(string filePattern, string directory)
        {
            foreach (string filename in Directory.GetFiles(directory, filePattern))
            {
                var bytes = System.IO.File.ReadAllBytes(filename);
                if(bytes.Length > 2 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
                {
                    System.IO.File.WriteAllBytes(filename, bytes.Skip(3).ToArray()); 
                }
            }
            foreach (string subDirectory in Directory.GetDirectories(directory))
            {
                removeBoms(filePattern, subDirectory);
            }
        }
        static void Main(string[] args)
        {
            string filePattern = "*.php";
            string startDirectory = Directory.GetCurrentDirectory();
            removeBoms(filePattern, startDirectory);            
        }       
    }
}

I needed this piece of C# code after discovering that the UTF-8 BOM corrupts the file when you try to do a basic PHP file download.

Answer 5

Answered by Ashokan Sivapragasam

For larger files, use the following code; it is memory efficient!

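// The reader detects the input encoding from the BOM and consumes the BOM.
StreamReader sr = new StreamReader(path: @"<Input_file_full_path_with_byte_order_mark>", 
                    detectEncodingFromByteOrderMarks: true);

// UnicodeEncoding(bigEndian: false, byteOrderMark: false) writes UTF-16 little-endian without a BOM.
// To keep the output as UTF-8, new UTF8Encoding(false) can be used instead.
StreamWriter sw = new StreamWriter(path: @"<Output_file_without_byte_order_mark>", 
                    append: false, 
                    encoding: new UnicodeEncoding(bigEndian: false, byteOrderMark: false));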

var lineNumber = 0;
while (!sr.EndOfStream)
{
    sw.WriteLine(sr.ReadLine());
    lineNumber += 1;
    if (lineNumber % 100000 == 0)
        Console.Write("\rLine# " + lineNumber.ToString("000000000000"));
}

sw.Flush();
sw.Close();
sr.Close();
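
Note that the line-by-line loop re-encodes the text and rewrites line endings. If the goal is only to drop a UTF-8 BOM from a large file while leaving every other byte untouched, a plain stream copy is one possible alternative. A rough sketch under that assumption; BomStreamCopy and CopyWithoutUtf8Bom are hypothetical names:

```csharp
using System.IO;

// Hypothetical helper; the names are illustrative, not from the original answer.
public static class BomStreamCopy
{
    // Copies input to output, omitting a leading UTF-8 BOM (EF BB BF) if there is one.
    public static void CopyWithoutUtf8Bom(Stream input, Stream output)
    {
        byte[] head = new byte[3];
        int read = 0;

        // Read may return fewer bytes than requested, so loop until
        // three bytes have arrived or the stream ends.
        while (read < head.Length)
        {
            int n = input.Read(head, read, head.Length - read);
            if (n == 0) break;
            read += n;
        }

        bool hasBom = read == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
        if (!hasBom)
            output.Write(head, 0, read); // not a BOM: keep these bytes

        input.CopyTo(output); // copies the remainder in buffered chunks
    }
}
```

With files, this could be called as `using (var src = File.OpenRead(inPath)) using (var dst = File.Create(outPath)) BomStreamCopy.CopyWithoutUtf8Bom(src, dst);`, where `inPath` and `outPath` are placeholders. Only a small buffer is held in memory at any time.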
