C# and Encoding.ASCII.GetString

Question

Asked by iasksillyquestions

byte[] header = new byte[]{255, 216}; 

string ascii =  Encoding.ASCII.GetString(header);

I expect ascii to be equal to FFD8 (the JPEG SOI marker).

Instead I get "????"

Answer 1

Accepted answer by Joe

In this case you would be better off comparing the byte arrays rather than converting to a string.

If you must convert to a string, I suggest using the Latin-1 encoding (also known as ISO-8859-1 or code page 28591), because it maps every byte value in the range 0-255 (0x00-0xFF) to the Unicode character with the same value, which is convenient for this scenario. Any of the following will get this encoding:

Encoding.GetEncoding(28591)
Encoding.GetEncoding("Latin1")
Encoding.GetEncoding("ISO-8859-1")
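
To make both suggestions concrete, here is a minimal sketch. The class and method names (HeaderChecks, StartsWithJpegSoi, ToLatin1String) are made up for illustration and are not part of the original answer; only Encoding.GetEncoding(28591) and the LINQ calls are real framework APIs.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class HeaderChecks
{
    // JPEG start-of-image (SOI) marker: 0xFF 0xD8.
    private static readonly byte[] JpegSoi = { 0xFF, 0xD8 };

    // Compares the leading bytes directly; no text encoding is involved.
    public static bool StartsWithJpegSoi(byte[] data)
    {
        if (data == null) throw new ArgumentNullException("data");
        return data.Length >= JpegSoi.Length
            && data.Take(JpegSoi.Length).SequenceEqual(JpegSoi);
    }

    // Decodes with Latin-1 (code page 28591), so each byte becomes the
    // character with the same numeric value.
    public static string ToLatin1String(byte[] data)
    {
        if (data == null) throw new ArgumentNullException("data");
        return Encoding.GetEncoding(28591).GetString(data);
    }
}
```

With the header from the question, StartsWithJpegSoi returns true, and ToLatin1String gives a two-character string whose characters are U+00FF and U+00D8. Note that this is still not the text "FFD8".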

Answer 2

Answer by Jon Skeet

Yes, that's because ASCII is only 7-bit: it doesn't define any values above 127. Encodings typically decode unknown binary values to '?' (although this can be changed using DecoderFallback).
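
As a rough illustration of the difference, the sketch below decodes the same bytes twice: once with the stock Encoding.ASCII, and once with an ASCII encoding that uses exception fallbacks. The class and method names (AsciiDecoding, DecodeLenient, DecodeStrict) are hypothetical; the Encoding.GetEncoding overload that takes fallbacks is a real API.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class AsciiDecoding
{
    // Default behaviour: each byte above 127 is replaced with '?'.
    public static string DecodeLenient(byte[] data)
    {
        return Encoding.ASCII.GetString(data);
    }

    // Strict variant: an ASCII encoding configured with exception fallbacks,
    // so invalid bytes raise DecoderFallbackException instead of becoming '?'.
    public static string DecodeStrict(byte[] data)
    {
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        return strictAscii.GetString(data);
    }
}
```

For the two bytes in the question, DecodeLenient returns "??" (one question mark per byte), while DecodeStrict throws, which makes the data loss visible instead of silent.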

If you're about to mention "extended ASCII", I suspect you actually want Encoding.Default, which is "the default code page for the operating system"... code page 1252 on most Western systems, I believe.

What characters were you expecting?

EDIT: As per the accepted answer (I suspect the question was edited after I added my answer; I don't recall seeing anything about JPEG originally) you shouldn't convert binary data to text unless it's genuinely encoded text data. JPEG data is binary data, so you should be checking the actual bytes against the expected bytes.

Any time you convert arbitrary binary data (such as images, music or video) into text using a "plain" text encoding (such as ASCII, UTF-8 etc) you risk data loss. If you have to convert it to text, use Base64, which is nice and safe. If you just want to compare it with expected binary data, however, it's best not to convert it to text at all.
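
For the Base64 route, a minimal sketch could look like the following. The wrapper class BinaryAsText is a made-up name; Convert.ToBase64String and Convert.FromBase64String are the real framework methods doing the work.

```csharp
using System;

// Hypothetical wrapper, for illustration only.
public static class BinaryAsText
{
    // Turns arbitrary bytes into text without losing information.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Recovers exactly the bytes that were encoded.
    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text);
    }
}
```

Encoding the two header bytes 255 and 216 gives "/9g=", and decoding that string returns the same two bytes, which is the round-trip guarantee a plain text encoding such as ASCII cannot offer.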

EDIT: Okay, here's a class to help with image detection for a given byte array. I haven't made it HTTP-specific; I'm not entirely sure whether you should really fetch the InputStream, read just a bit of it, and then fetch the stream again. I've ducked the issue by sticking to byte arrays :)

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

public sealed class SignatureDetector
{
    public static readonly SignatureDetector Png =
        new SignatureDetector(0x89, 0x50, 0x4e, 0x47);

    public static readonly SignatureDetector Bmp =
        new SignatureDetector(0x42, 0x4d);

    public static readonly SignatureDetector Gif =
        new SignatureDetector(0x47, 0x49, 0x46);

    public static readonly SignatureDetector Jpeg =
        new SignatureDetector(0xff, 0xd8);

    public static readonly IEnumerable<SignatureDetector> Images =
        new ReadOnlyCollection<SignatureDetector>(new[]{Png, Bmp, Gif, Jpeg});

    private readonly byte[] bytes;

    public SignatureDetector(params byte[] bytes)
    {
        if (bytes == null)
        {
            throw new ArgumentNullException("bytes");
        }
        this.bytes = (byte[]) bytes.Clone();
    }

    public bool Matches(byte[] data)
    {
        if (data == null)
        {
            throw new ArgumentNullException("data");
        }
        if (data.Length < bytes.Length)
        {
            return false;
        }
        for (int i = 0; i < bytes.Length; i++)
        {
            if (data[i] != bytes[i])
            {
                return false;
            }
        }
        return true;
    }    

    // Convenience method
    public static bool IsImage(byte[] data)
    {
        return Images.Any(detector => detector.Matches(data));
    }        
}
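
The class above deliberately sticks to byte arrays. If the data arrives as a stream instead, one possible approach (an assumption on my part, not something the answer prescribes) is to peek at the first few bytes and then rewind. The sketch below only works for seekable streams such as MemoryStream or FileStream; an HTTP input stream may not be seekable, in which case you would need to buffer it first. StreamSignature and StartsWith are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class StreamSignature
{
    // Reads up to signature.Length bytes from the current position, compares
    // them with the signature, then restores the original position.
    public static bool StartsWith(Stream stream, byte[] signature)
    {
        if (stream == null) throw new ArgumentNullException("stream");
        if (signature == null) throw new ArgumentNullException("signature");
        if (!stream.CanSeek)
        {
            throw new NotSupportedException("A seekable stream is required.");
        }

        long start = stream.Position;
        try
        {
            byte[] buffer = new byte[signature.Length];
            int total = 0;
            // Read may return fewer bytes than requested, so loop until done.
            while (total < buffer.Length)
            {
                int read = stream.Read(buffer, total, buffer.Length - total);
                if (read == 0)
                {
                    return false; // stream ended before the signature did
                }
                total += read;
            }
            for (int i = 0; i < signature.Length; i++)
            {
                if (buffer[i] != signature[i])
                {
                    return false;
                }
            }
            return true;
        }
        finally
        {
            // Rewind so the caller can still read the stream from where it was.
            stream.Position = start;
        }
    }
}
```

After the call, the stream position is back where it started, so the same stream can be handed on to whatever actually decodes the image.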

Answer 3

Answer by Philippe Leybaert

Are you sure "????" is the result?

What is the result of:

(int)ascii[0]
(int)ascii[1]

On the other hand, pure ASCII is 0-127 only...

Answer 4

Answer by James Curran

If you then wrote:

Console.WriteLine(ascii)

and expected "FFD8" to print out, that's not the way GetString works. For that, you would need:

 string ascii = String.Format("{0:X02}{1:X02}", header[0], header[1]);
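
The String.Format line handles exactly two bytes. For a header of any length, one alternative (my suggestion, not part of the answer) is BitConverter.ToString, which formats every byte as two uppercase hex digits separated by dashes. HexFormatting and ToHex are hypothetical names.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class HexFormatting
{
    public static string ToHex(byte[] data)
    {
        if (data == null) throw new ArgumentNullException("data");
        // BitConverter.ToString yields e.g. "FF-D8"; removing the dashes
        // leaves the plain hex text "FFD8".
        return BitConverter.ToString(data).Replace("-", "");
    }
}
```

For the header in the question, ToHex returns "FFD8", which is the text the asker was expecting from GetString.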

Answer 5

Answer by Joshua

I once wrote a custom encoder/decoder that encoded bytes 0-255 to Unicode characters 0-255 and back again.

It was only really useful for using string functions on something that isn't actually a string.
