Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/6153345/

Time: 2020-10-21 08:05:35  Source: igfitidea

Different UTF-8 encoding in filenames on OS X

Tags: macos, encoding, utf-8, filesystems

Asked by jm666

I have a small shell script in .x:

$ cat .x
u="Böhmáí"
touch "$u"
ls > .list
echo "$u" >.text

cat .list .text
diff .list .text
od -bc .list
od -bc .text

When I run this script with sh -x .x (-x only to show the commands):

$ sh -x .x
+ u=Böhmáí
+ touch Böhmáí
+ ls
+ echo Böhmáí
+ cat .list .text
Böhmáí
Böhmáí
+ diff .list .text
1c1
< Böhmáí
---
> Böhmáí
+ od -bc .list
0000000   102 157 314 210 150 155 141 314 201 151 314 201 012
           B   o   ̈  **   h   m   a   ́  **   i   ́  **  \n
0000015
+ od -bc .text
0000000   102 303 266 150 155 303 241 303 255 012
           B   ö  **   h   m   á  **   í  **  \n
0000012
0000012

The same string, Böhmáí, is encoded into different bytes in the filename than in the contents of a file. In the (UTF-8) terminal, the string looks the same in both variants.

Where is the rabbit?

Answered by Gordon Davisson

(This is mostly stolen from a previous answer of mine...)

Unicode allows some accented characters to be represented in several different ways: as a single "code point" representing the accented character, or as a series of code points representing the unaccented version of the character, followed by the accent(s). For example, "ä" could be represented either precomposed as U+00E4 (UTF-8 0xc3a4, Latin small letter a with diaeresis) or decomposed as U+0061 U+0308 (UTF-8 0x61cc88, Latin small letter a + combining diaeresis).
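
The two representations can be seen side by side in a minimal Python sketch using the standard unicodedata module. The variable names are only illustrative and are not part of the original answer.

```python
import unicodedata

precomposed = "\u00e4"   # "ä" as a single code point
decomposed = "a\u0308"   # "a" followed by combining diaeresis

# The two strings render identically but are different code point sequences.
print(precomposed == decomposed)            # False
print(precomposed.encode("utf-8").hex())    # c3a4
print(decomposed.encode("utf-8").hex())     # 61cc88

# NFD decomposes a string; NFC recomposes it.
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```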

OS X's HFS+ filesystem requires that all filenames be stored in the UTF-8 representation of their fully decomposed form. In an HFS+ filename, "ä" MUST be encoded as 0x61cc88, and "ö" MUST be encoded as 0x6fcc88.
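
As a rough cross-check against the od output in the question, the sketch below prints the UTF-8 bytes of the name in od-style octal, before and after Python's NFD normalization. It only approximates what the filesystem does: HFS+ uses Apple's own variant of decomposition, which differs from standard NFD for some character ranges. The helper octal_bytes is a hypothetical name introduced for illustration.

```python
import unicodedata

name = "B\u00f6hm\u00e1\u00ed"   # "Böhmáí", precomposed as in the script

def octal_bytes(text):
    """Return the UTF-8 bytes of text as od-style three-digit octal values."""
    return " ".join(f"{b:03o}" for b in text.encode("utf-8"))

# Precomposed form: the bytes found in .text (without the trailing newline).
print(octal_bytes(name))
# 102 303 266 150 155 303 241 303 255

# Decomposed form: the bytes found in .list (without the trailing newline).
print(octal_bytes(unicodedata.normalize("NFD", name)))
# 102 157 314 210 150 155 141 314 201 151 314 201
```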

So what's happening here is that your shell script contains "Böhmáí" in precomposed form, so it gets stored that way in the variable u, and stored that way in the .text file. But when you create a file with that name (with touch), the filesystem converts it to the decomposed form for the actual filename. And when you ls it, it shows the form the filesystem has: the decomposed form.
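
One common way to cope with the mismatch, offered here as an assumption rather than something from the original answer, is to normalize both strings to the same form before comparing them. The helper name same_name is hypothetical.

```python
import unicodedata

def same_name(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

from_script = "B\u00f6hm\u00e1\u00ed"    # precomposed, as typed in the script
from_ls = "Bo\u0308hma\u0301i\u0301"     # decomposed, as listed by ls

print(from_script == from_ls)             # False: different code points
print(same_name(from_script, from_ls))    # True: equal after normalization
```

Either NFC or NFD works for the comparison, as long as both sides use the same form.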