windows: How does Windows Notepad interpret characters?

Question

Asked by nEAnnam

I was wondering how Windows interprets characters. For instance:

I made a file with a hex editor containing the 3 bytes E3 81 81. Those bytes are the character "ぁ" encoded as UTF-8.

I open the file in Notepad and it displays "ぁ".

I didn't specify the encoding of the file, I just created the bytes, and Notepad interprets them correctly.

Is Notepad guessing what the encoding probably is? Or is the hex editor saving those bytes with a specific encoding?

Answer 1

Accepted answer by Guffa

If the file only contains these three bytes, then there is no information at all about which encoding to use.

A byte is just a byte, and there is no way to include any encoding information in it. Besides, the hex editor doesn't even know that you intended to decode the data as text.
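
To make the "a byte is just a byte" point concrete, here is a minimal Python sketch that decodes the same three bytes from the question under different encodings. The choice of cp1251 and latin-1 as alternatives is only an illustration and is not taken from the answer.

```python
# The three bytes from the question, with no encoding information attached.
raw = bytes([0xE3, 0x81, 0x81])

as_utf8 = raw.decode("utf-8")      # one character: U+3041, the hiragana "ぁ"
as_cp1251 = raw.decode("cp1251")   # three Cyrillic characters
as_latin1 = raw.decode("latin-1")  # U+00E3 followed by two C1 control characters

# Print the code points each interpretation produces.
for name, text in [("utf-8", as_utf8), ("cp1251", as_cp1251), ("latin-1", as_latin1)]:
    print(name, [f"U+{ord(ch):04X}" for ch in text])
```

All three decodes succeed, so nothing in the bytes themselves says which reading was intended.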

Notepad normally uses ANSI encoding (the system's default code page), so if it reads the file as UTF-8, it has to be guessing the encoding from the data in the file.

If you save a file as UTF-8, Notepad will put the BOM (byte order mark) EF BB BF at the beginning of the file.
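
As a small illustration of what that marker looks like, the Python sketch below encodes the same character with and without a BOM. It uses Python's `utf-8-sig` codec as a stand-in for a BOM-writing save and says nothing about how Notepad itself is implemented.

```python
import codecs

text = "\u3041"  # the character "ぁ" from the question

plain = text.encode("utf-8")         # E3 81 81, no marker
with_bom = text.encode("utf-8-sig")  # EF BB BF followed by E3 81 81

print(plain.hex(" "))
print(with_bom.hex(" "))

# A reader can check for the marker before decoding.
has_bom = with_bom.startswith(codecs.BOM_UTF8)

# Decoding with "utf-8-sig" strips the marker if it is present.
round_trip = with_bom.decode("utf-8-sig")
```

A file made in a hex editor with only E3 81 81 has no such marker, which is why a reader has to fall back on guessing.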

Answer 2

Answer by Roland Illig

Notepad makes an educated guess. I don't know the details, but loading the first few kilobytes and trying to convert them from UTF-8 is very simple, so it probably does something similar to that.
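
The kind of guess described here could look roughly like the Python sketch below. This is not Notepad's actual algorithm: the function name `guess_and_decode`, the 4096-byte sample size, and the cp1252 fallback are all assumptions made for illustration.

```python
import codecs

def guess_and_decode(data: bytes, fallback: str = "cp1252", sample_size: int = 4096):
    """Hypothetical heuristic: BOM first, then a strict UTF-8 trial, then a fallback."""
    # 1. An explicit UTF-8 BOM settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig", data.decode("utf-8-sig")

    # 2. Try to decode the first few kilobytes strictly as UTF-8.
    sample = data[:sample_size]
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte sequence cut off by the sample boundary.
        decoder.decode(sample, final=len(data) <= sample_size)
    except UnicodeDecodeError:
        # 3. Not valid UTF-8: assume the legacy ("ANSI") code page.
        return fallback, data.decode(fallback, errors="replace")

    # The sample looked like UTF-8; bytes beyond it may still be invalid.
    return "utf-8", data.decode("utf-8", errors="replace")

print(guess_and_decode(bytes([0xE3, 0x81, 0x81])))  # the file from the question
print(guess_and_decode(b"caf\xe9"))                  # not valid UTF-8, falls back
```

Note that pure ASCII input also passes the UTF-8 trial, since ASCII is a subset of UTF-8; a guess like this can only rule encodings out, it cannot prove which one was intended.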

Answer 3

Answer by PhilHibbs

...and sometimes it gets it wrong... https://ychittaranjan.wordpress.com/2006/06/20/buggy-notepad/

Answer 4

Answer by mat2

There is an easy and efficient way to check whether a file is in UTF-8. See Wikipedia: http://en.wikipedia.org/w/index.php?title=UTF-8&oldid=581360767#Advantages, fourth bullet point. Notepad probably uses this.

Wikipedia claims that Notepad used the IsTextUnicode function, which checks whether a particular text is written in UTF-16 (it may have stopped using it in Windows Vista, which fixed the "Bush hid the facts" bug): http://en.wikipedia.org/wiki/Bush_hid_the_facts.
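
The ambiguity behind that bug can be reproduced at the byte level. The Python sketch below does not call or emulate IsTextUnicode; it only shows, under the assumption that the text is read as little-endian UTF-16, that the 18 ASCII bytes are also a well-formed sequence of 9 UTF-16 code units.

```python
# The ASCII text from the bug report: 18 bytes, one per character.
ascii_bytes = "Bush hid the facts".encode("ascii")

# Reading the same bytes as little-endian UTF-16 pairs them up into 9 code units.
misread = ascii_bytes.decode("utf-16-le")

print(len(ascii_bytes), len(misread))
print([f"U+{ord(ch):04X}" for ch in misread])

# Every resulting code point falls in the CJK Unified Ideographs block.
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread)
```

Because both readings are valid, a statistical guess between ANSI and UTF-16 can plausibly pick the wrong one for short inputs like this.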

Answer 5

Answer by sai

How do you identify which encoding a file is in?

Open the file and choose Save As; by default the dialog shows the file's encoding (the encoding format it was saved in).
