Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/6769311/

Time: 2020-09-15 17:30:55  Source: igfitidea

How Windows Notepad interprets characters

Tags: windows, encoding, utf-8, notepad, hex-editors

Asked by nEAnnam

I was wondering how Windows interprets characters. For instance:

I made a file with a hex editor containing the 3 bytes E3 81 81. Those bytes are the "ぁ" character encoded as UTF-8.
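
To make the byte values concrete, here is a minimal Python sketch (Python is my choice for illustration, not something the question used) that checks the three bytes really are the UTF-8 form of "ぁ":

```python
# The three bytes written with the hex editor.
data = bytes([0xE3, 0x81, 0x81])

# Decoding them as UTF-8 yields a single character.
text = data.decode("utf-8")
print(text, hex(ord(text)))

# Going the other way: encoding the character gives the same bytes back.
print("ぁ".encode("utf-8").hex(" "))
```

The file itself holds only these bytes; the mapping to "ぁ" exists only once a reader chooses to decode them as UTF-8.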

I open Notepad and it displays "ぁ".

I didn't specify the encoding of the file, I just created the bytes, and Notepad interprets them correctly.

Is Notepad guessing what the encoding probably is? Or is the hex editor saving those bytes with a specific encoding?

Accepted answer by Guffa

If the file only contains these three bytes, then there is no information at all about which encoding to use.

A byte is just a byte, and there is no way to include any encoding information in it. Besides, the hex editor doesn't even know that you intended to decode the data as text.

Notepad normally uses ANSI encoding, so if it reads the file as UTF-8 then it has to guess the encoding based on the data in the file.
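
As a rough illustration of why a guess is needed, the sketch below decodes the same three bytes two ways. It assumes Windows-1252 (`cp1252`) stands in for "ANSI"; the real ANSI code page depends on the system locale, and Python's codec is not Notepad's decoder.

```python
data = b"\xe3\x81\x81"

# Interpreted as UTF-8, the three bytes form one character.
as_utf8 = data.decode("utf-8")

# Interpreted byte-by-byte as Windows-1252 (assumed stand-in for "ANSI"),
# each byte becomes its own character; bytes Python's codec leaves
# undefined are replaced with U+FFFD because of errors="replace".
as_ansi = data.decode("cp1252", errors="replace")

print(repr(as_utf8))
print(repr(as_ansi))
```

Same bytes, different text: nothing in the file says which reading is intended.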

If you save a file as UTF-8, Notepad will put the BOM (byte order mark) EF BB BF at the beginning of the file.
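
For reference, a small Python sketch of what such a file looks like at the byte level. It uses Python's `utf-8-sig` codec, which writes the same EF BB BF prefix; this only mimics the result and is not Notepad's own code.

```python
import codecs

# "utf-8-sig" prepends the UTF-8 BOM when encoding.
payload = "ぁ".encode("utf-8-sig")
print(payload.hex(" "))

# A reader can check for the BOM before deciding how to decode.
has_bom = payload.startswith(codecs.BOM_UTF8)

# Decoding with "utf-8-sig" strips the BOM again.
decoded = payload.decode("utf-8-sig")
print(has_bom, decoded)
```

With the BOM present, a reader no longer has to guess: the first three bytes identify the file as UTF-8.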

Answer by Roland Illig

Notepad makes an educated guess. I don't know the details, but loading the first few kilobytes and trying to convert them from UTF-8 is very simple, so it probably does something similar to that.
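
A guess of that kind could look roughly like the following Python sketch. The function name `guess_is_utf8` and the 4096-byte sample size are hypothetical; this is not Notepad's actual algorithm, only the "try to decode the first few kilobytes" idea.

```python
import codecs


def guess_is_utf8(data: bytes, sample_size: int = 4096) -> bool:
    """Return True if the first sample_size bytes decode cleanly as UTF-8."""
    sample = data[:sample_size]
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # If the sample was cut short, a trailing partial character is
        # tolerated (final=False); if it is the whole input, it is not.
        decoder.decode(sample, final=len(data) <= sample_size)
    except UnicodeDecodeError:
        return False
    return True


print(guess_is_utf8(b"\xe3\x81\x81"))
print(guess_is_utf8("café".encode("cp1252")))
```

Note that pure ASCII input also passes, since ASCII is valid UTF-8; a real detector would need further rules to choose between encodings in that case.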

Answer by mat2

There is an easy and efficient way to check whether a file is in UTF-8. See Wikipedia: http://en.wikipedia.org/w/index.php?title=UTF-8&oldid=581360767#Advantages, fourth bullet point. Notepad probably uses this.
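
The check relies on UTF-8's rigid bit patterns: each lead byte announces how many continuation bytes (of the form 10xxxxxx) must follow, so text in other encodings rarely passes by accident. A hand-written sketch of that rule, with the hypothetical name `is_valid_utf8`, might look like this (in practice `bytes.decode("utf-8")` does the same job):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check the UTF-8 lead/continuation byte structure by hand."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                 # 0xxxxxxx: ASCII, one byte
            length, cp = 1, b
        elif 0xC2 <= b <= 0xDF:      # 110xxxxx: two-byte sequence
            length, cp = 2, b & 0x1F
        elif 0xE0 <= b <= 0xEF:      # 1110xxxx: three-byte sequence
            length, cp = 3, b & 0x0F
        elif 0xF0 <= b <= 0xF4:      # 11110xxx: four-byte sequence
            length, cp = 4, b & 0x07
        else:                        # stray continuation or invalid lead
            return False
        if i + length > n:           # sequence runs past end of data
            return False
        for j in range(1, length):
            c = data[i + j]
            if c & 0xC0 != 0x80:     # continuation must be 10xxxxxx
                return False
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong forms, surrogates, and out-of-range values.
        if length == 3 and (cp < 0x800 or 0xD800 <= cp <= 0xDFFF):
            return False
        if length == 4 and (cp < 0x10000 or cp > 0x10FFFF):
            return False
        i += length
    return True


print(is_valid_utf8(b"\xe3\x81\x81"))
print(is_valid_utf8(b"\x81\xe3"))
```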

Wikipedia claims that Notepad used the IsTextUnicode function, which checks whether a particular text is written in UTF-16 (it may have stopped using it in Windows Vista, which fixed the "Bush hid the facts" bug): http://en.wikipedia.org/wiki/Bush_hid_the_facts.
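
The sketch below does not call IsTextUnicode (a Windows-only API); it only shows, in Python, why the ASCII bytes of that sentence are ambiguous. The text has an even number of bytes, and every byte pair happens to form a valid UTF-16 code unit:

```python
# The plain ASCII text from the bug's name: 18 bytes, an even count.
data = b"Bush hid the facts"

# Read as ASCII, it is the English sentence.
as_ascii = data.decode("ascii")

# Read as little-endian UTF-16, each pair of bytes becomes one character.
misread = data.decode("utf-16-le")

print(as_ascii)
print(misread)
print([hex(ord(ch)) for ch in misread])
```

Both readings are valid decodings of the same bytes, which is why a heuristic detector can pick the wrong one.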

Answer by sai

How do you identify which encoding a file is in?

Open the file and choose Save As. The dialog shows the file's encoding by default (that is, the encoding format it was saved with).