Original URL: http://stackoverflow.com/questions/6769311/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
How Windows Notepad interprets characters
Question by nEAnnam
I was wondering how Windows interprets characters, for instance:
I made a file with a hex editor containing the 3 bytes E3 81 81. Those bytes are the "ぁ" character encoded as UTF-8.
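To see where the three bytes come from: "ぁ" is U+3041 (HIRAGANA LETTER SMALL A), and UTF-8 stores code points from U+0800 to U+FFFF in three bytes, a 1110xxxx lead byte followed by two 10xxxxxx continuation bytes. A small Python sketch that does the bit packing by hand and compares it with the built-in codec:

```python
ch = "ぁ"
cp = ord(ch)  # 0x3041

# Three-byte UTF-8 form: 1110xxxx 10xxxxxx 10xxxxxx
lead = 0xE0 | (cp >> 12)           # top 4 bits of the code point
cont1 = 0x80 | ((cp >> 6) & 0x3F)  # middle 6 bits
cont2 = 0x80 | (cp & 0x3F)         # low 6 bits
manual = bytes([lead, cont1, cont2])

# The built-in codec produces the same bytes
builtin = ch.encode("utf-8")

print(manual.hex(" "))    # e3 81 81
print(manual == builtin)  # True
```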
I opened Notepad and it displays "ぁ".
I didn't specify the encoding of the file; I just created the bytes, and Notepad interprets them correctly.
Is Notepad guessing what the encoding probably is, or is the hex editor saving those bytes with a specific encoding?
Accepted answer by Guffa
If the file only contains these three bytes, then there is no information at all about which encoding to use.
A byte is just a byte, and there is no way to include any encoding information in it. Besides, the hex editor doesn't even know that you intended to decode the data as text.
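To illustrate that the bytes themselves carry no encoding, here is a Python sketch decoding the same three bytes under two different assumptions. Latin-1 stands in for a single-byte "ANSI" code page because Python's cp1252 table leaves 0x81 undefined and would raise an error:

```python
data = bytes([0xE3, 0x81, 0x81])  # exactly what the hex editor wrote

as_utf8 = data.decode("utf-8")      # 1 character: "ぁ"
as_latin1 = data.decode("latin-1")  # 3 characters: "ã" and two C1 control codes

print(len(as_utf8), repr(as_utf8))
print(len(as_latin1), repr(as_latin1))
```

Both calls succeed; nothing in the data says which reading is the intended one.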
Notepad normally uses ANSI encoding, so if it reads the file as UTF-8 then it has to guess the encoding based on the data in the file.
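The kind of decision described here can be sketched in Python as follows. This is a hypothetical heuristic for illustration, not Notepad's documented algorithm; the function name and the returned labels are made up:

```python
def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic; the labels are illustrative, not Notepad's."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"        # UTF-8 BOM present: no guessing needed
    if data.isascii():
        return "ascii"        # identical in UTF-8 and in ANSI code pages
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
    except UnicodeDecodeError:
        return "ansi"         # fall back to the legacy code page
    return "utf-8"
```

With the question's three bytes this returns "utf-8", matching what Notepad displayed. A short ANSI file can also happen to be valid UTF-8 (for example the Windows-1252 text "Ã©" is the UTF-8 encoding of "é"), which is why any rule like this is only a guess.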
If you save a file as UTF-8, Notepad will put the BOM (byte order mark) EF BB BF at the beginning of the file.
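In Python, the "utf-8-sig" codec produces the same layout, so it is a convenient way to see what the BOM adds to the question's three bytes (this mirrors the file layout only, not Notepad's own code):

```python
import codecs

text = "ぁ"

# "utf-8-sig" writes the UTF-8 BOM (EF BB BF) before the encoded text
with_bom = text.encode("utf-8-sig")
print(with_bom.hex(" "))  # ef bb bf e3 81 81

# A reader can test for the BOM explicitly...
has_bom = with_bom.startswith(codecs.BOM_UTF8)

# ...and the same codec strips it again when decoding
round_trip = with_bom.decode("utf-8-sig")
print(has_bom, round_trip == text)  # True True
```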
Answer by Roland Illig
Notepad makes an educated guess. I don't know the details, but loading the first few kilobytes and trying to convert them from UTF-8 is very simple, so it probably does something similar to that.
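A Python sketch of the approach described above, decoding only a prefix of the data. The 4096-byte limit and the helper name are assumptions. An incremental decoder is used so that a multi-byte character cut off at the end of the sample is not mistaken for invalid data:

```python
import codecs

def prefix_is_utf8(data: bytes, limit: int = 4096) -> bool:
    """Hypothetical helper: do the first `limit` bytes decode as UTF-8?"""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False buffers an incomplete trailing sequence instead of raising
        decoder.decode(data[:limit], final=False)
    except UnicodeDecodeError:
        return False
    return True

# Typical use on a file:
# with open(path, "rb") as f:
#     print(prefix_is_utf8(f.read(4096)))
```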
Answer by PhilHibbs
...and sometimes it gets it wrong... https://ychittaranjan.wordpress.com/2006/06/20/buggy-notepad/
Answer by mat2
There is an easy and efficient way to check whether a file is in UTF-8. See Wikipedia: http://en.wikipedia.org/w/index.php?title=UTF-8&oldid=581360767#Advantages, fourth bullet point. Notepad probably uses this.
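The linked bullet point is not quoted here, but the general idea is that UTF-8's lead-byte and continuation-byte patterns are strict enough that text in other encodings rarely satisfies them by accident. A simplified Python sketch of such a byte-by-byte structural check (the function name is hypothetical; unlike a full validator, it does not reject every overlong or surrogate sequence):

```python
def looks_like_utf8(data: bytes) -> bool:
    """Simplified structural UTF-8 check (hypothetical helper)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:              # ASCII byte
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:     # lead byte of a 2-byte sequence
            need = 1
        elif 0xE0 <= b <= 0xEF:   # lead byte of a 3-byte sequence
            need = 2
        elif 0xF0 <= b <= 0xF4:   # lead byte of a 4-byte sequence
            need = 3
        else:                     # stray continuation byte or invalid lead byte
            return False
        if i + need >= n:         # sequence runs past the end of the data
            return False
        for k in range(1, need + 1):
            if (data[i + k] & 0xC0) != 0x80:  # continuation bytes are 10xxxxxx
                return False
        i += need + 1
    return True
```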
Wikipedia claims that Notepad used the IsTextUnicode function, which checks whether a particular text is written in UTF-16 (it may have stopped using it in Windows Vista, which fixed the "Bush hid the facts" bug): http://en.wikipedia.org/wiki/Bush_hid_the_facts.
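To see what that misreading looks like, the Python sketch below decodes the plain ASCII bytes of "Bush hid the facts" as UTF-16 little-endian. It does not call IsTextUnicode (a Windows-only API); it only shows why a wrong UTF-16 guess turns the text into CJK characters:

```python
# "Bush hid the facts" saved as plain ASCII: 18 bytes, no BOM
data = "Bush hid the facts".encode("ascii")

# Read as UTF-16 LE, each pair of ASCII bytes becomes one 16-bit code unit,
# e.g. "B" (0x42) + "u" (0x75) -> U+7542
misread = data.decode("utf-16-le")

print([hex(ord(c)) for c in misread])
```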
Answer by sai
How do you identify which encoding a file is in?
Open the file and choose Save As: the encoding field shows the file's encoding by default (i.e., the encoding format it was saved in).
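Save As only reports what Notepad concluded when it opened the file. To get similar information from code, the one unambiguous signal is a BOM. A minimal Python sketch (the names are hypothetical; a file without a BOM, like the question's three bytes, yields None, and then only a heuristic guess is possible):

```python
import codecs
from typing import Optional

# BOM signatures for encodings a text editor commonly writes
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]

def encoding_from_bom(data: bytes) -> Optional[str]:
    """Return the encoding a BOM announces, or None when there is no BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None  # no BOM: the encoding can only be guessed

# Typical use on a file:
# with open(path, "rb") as f:
#     print(encoding_from_bom(f.read(4)))
```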