git, msysgit, accents, utf-8, the definitive answers
Original question: http://stackoverflow.com/questions/5854967/
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow.
Asked by Benjol
I've read in some places that there are problems with git (or just msysgit?) and character encoding - I believe it's only a problem in file names.
What I'd like is some 'definitive' (or at least authoritative) information about:
- What exactly are the 'problems'? (The symptoms)
- What are the causes? (Briefly)
- In what scenarios is this a show stopper?
- Is there any resolution in sight, or failing that any workarounds?
I hope this question isn't too vague, I think it would be good to have all of this information in one place to be able to point people to it...
Accepted answer by VonC
Update Feb. 2017 (Git 2.12): The character width table has been updated to match Unicode 9.0.
The update_unicode.sh script has been moved into contrib/update-unicode: see its README.
Update August 2014 (git 2.1): commit a67c821 (Torsten Bögershausen (tboegi)) adds support for Unicode 7.0.
Update April 2014: commit d813ab9 (Torsten Bögershausen (tboegi)) adds support for Unicode 6.3 (git 1.9.2):
Unicode 6.3 defines more code points as combining or accents.
For example, the character "ö" could be expressed as an "o" followed by U+0308 COMBINING DIAERESIS (aka umlaut, double-dot-above).
We should consider that such a sequence of two codepoints occupies one display column for the alignment purposes, and for that, git_wcwidth() should return 0 for them.
Affected codepoints are:
U+0358..U+035C
U+0487
U+05A2, U+05BA, U+05C5, U+05C7
U+0604, U+0616..U+061A, U+0659..U+065F
Earlier unicode standards had defined these as "reserved".
Only the range 0..U+07FF has been checked to see which codepoints need to be marked as 0-width while preparing for this commit; more updates may be needed.
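The zero-width rule described in that commit message can be illustrated with a short Python sketch. This is not Git's git_wcwidth() implementation: display_width is a hypothetical helper that relies on Python's unicodedata tables (whose Unicode version depends on the interpreter) and ignores double-width characters.

```python
import unicodedata


def display_width(text: str) -> int:
    """Count display columns, treating combining marks as zero-width.

    Hypothetical helper for illustration only; it ignores wide (CJK)
    characters, which a real wcwidth() would count as two columns.
    """
    return sum(0 if unicodedata.combining(ch) else 1 for ch in text)


precomposed = "\u00f6"                                  # "ö" as a single code point
decomposed = unicodedata.normalize("NFD", precomposed)  # "o" followed by U+0308

# Number of code points vs. number of display columns for each form
print(len(precomposed), display_width(precomposed))
print(len(decomposed), display_width(decomposed))
```

Both forms occupy one column, although the decomposed form consists of two code points.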
Update April 2012: Unicode support is released in version 1.7.10. See this page for notes and the settings you should apply.
Namely:
git config [--global] core.quotepath off
git config [--global] i18n.logoutputencoding utf8
git config [--global] i18n.commitencoding utf8
git config [--global] --unset svn.pathnameencoding
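To make the first setting less abstract: as I understand the git-config documentation, with core.quotepath enabled (the default) git prints path bytes above 0x80 as octal escapes inside double quotes, and turning it off prints them verbatim. The Python sketch below only mimics that one aspect of the output; quote_path is a hypothetical helper, it assumes UTF-8 file names, and it skips the escaping git also applies to control characters, quotes and backslashes.

```python
def quote_path(name: str, quotepath: bool = True) -> str:
    """Roughly mimic how a path is displayed with core.quotepath on or off.

    Simplified illustration (assumption): only bytes >= 0x80 are escaped.
    """
    raw = name.encode("utf-8")
    if not quotepath or all(b < 0x80 for b in raw):
        return name
    # Keep ASCII bytes as-is, render every other byte as a 3-digit octal escape
    escaped = "".join(chr(b) if b < 0x80 else f"\\{b:03o}" for b in raw)
    return f'"{escaped}"'


print(quote_path("\u00e9.txt"))                   # escaped form
print(quote_path("\u00e9.txt", quotepath=False))  # verbatim form
```

With quoting on, "é.txt" is shown as "\303\251.txt", which is why accented file names look garbled in git status until the setting is turned off.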
The recodetree check command scans the entire history of a git repository and prints all non-ASCII file names. If the output is empty, no migration is necessary.
Update February 2012: patches for UTF-8 support are coming in the 'devel' branch of the msysgit repo on GitHub, including "Update less settings for UTF-8".
The Git for Windows Google+ page mentions:
Karsten Blees' UTF-8 patches for Git for Windows have now been merged to 'devel'.
This means the upcoming release will support Unicode filenames!
May 2011
I believe msysgit issue 80 has the latest on that bug.
It is also described in issue 376.
For example:
This is what happens:
git on Windows operates on file names and treats them essentially as byte streams. In your case, the streams happen to be UTF8 encoded text.
git on Windows asks the runtime to create a file, and passes it the byte stream.
Since internally on Windows everything is Unicode, the runtime converts the byte stream to UTF16 using the currently set locale (aka "codepage").
That is, it effectively interprets the byte stream as CP949 (Korean) encoded text.
Apparently, some of the UTF8 byte sequences are invalid CP949 sequences, and the conversion fails ("Invalid argument"); or if the UTF8 sequences happen to be correct CP949 sequences, the result is (most likely) a different character.
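The mismatch described above can be reproduced outside Windows with a rough Python sketch: encode a name as UTF-8, then decode the same bytes as CP949. The helper name as_seen_by_codepage is hypothetical, and Python's cp949 codec is only a stand-in for the conversion the Windows runtime performs, so the exact results may differ from what msysgit users saw.

```python
from typing import Optional


def as_seen_by_codepage(name: str, codepage: str = "cp949") -> Optional[str]:
    """Reinterpret the UTF-8 bytes of `name` as text in `codepage`.

    Returns None when the bytes are not a valid sequence in that code page,
    which corresponds to the "Invalid argument" failure in the description.
    """
    raw = name.encode("utf-8")
    try:
        return raw.decode(codepage)
    except UnicodeDecodeError:
        return None


for name in ["readme.txt", "caf\u00e9.txt", "\u20ac.txt"]:
    print(repr(name), "->", repr(as_seen_by_codepage(name)))
```

Pure ASCII names survive unchanged, which is why the problem only shows up with accented or other non-ASCII file names.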
The true fix should be on MingW though:
It occurs to me that one solution would be this: solve it at the GCC C run-time library level.
That is, for the mingw GCC run-time library on Windows, make it possible via build-time options to be in a mode where the command-line parameters (passed to main()) and file I/O functions use the underlying Windows Unicode API calls, and translate to/from UTF-8 encoding in C's standard function APIs that use byte-strings.
That would "just work" for git perhaps, and could be useful for other Linux-originated open source projects running in the Windows environment.
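At its core, the translation layer proposed here would convert between UTF-8 byte strings and the UTF-16 strings that the Windows wide-character API expects. Below is a minimal Python sketch of just that conversion step, with hypothetical helper names (utf8_to_wide, wide_to_utf8); it makes no Windows calls and does not represent how MinGW or msysgit implement anything.

```python
def utf8_to_wide(path_bytes: bytes) -> bytes:
    """Convert a UTF-8 byte string to NUL-terminated UTF-16LE.

    Raises UnicodeDecodeError if the input is not valid UTF-8.
    """
    text = path_bytes.decode("utf-8")
    return text.encode("utf-16-le") + b"\x00\x00"


def wide_to_utf8(wide: bytes) -> bytes:
    """Convert UTF-16LE (optionally NUL-terminated) back to UTF-8 bytes."""
    return wide.decode("utf-16-le").rstrip("\x00").encode("utf-8")


original = "caf\u00e9.txt".encode("utf-8")
wide = utf8_to_wide(original)
print(wide)
print(wide_to_utf8(wide) == original)
```

The point of the design is that the program keeps handling byte strings everywhere, and only the thin layer next to the operating system knows about UTF-16.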
ak2 comments that MinGW isn't the right place for this fix:
"MinGW compilers provide access to the functionality of the Microsoft C runtime and some language-specific runtimes.
MinGW, being Minimalist, does not, and never will, attempt to provide a POSIX runtime environment for POSIX application deployment on MS-Windows.
If you want POSIX application deployment on this platform, please consider Cygwin instead."
There is some work in progress on a msysgit variant to support unicode.