Unicode (UTF-8) with git-bash on Windows

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/10651975/

Time: 2020-09-09 09:31:41  Source: igfitidea

Tags: windows, bash, unicode, utf-8, git-bash

Asked by Hannes

I'm having some trouble getting Unicode to work in git-bash (on Windows 7). I have tried many things without success, although I'm not quite sure what is responsible for this, so I might be working in the wrong direction.

It really seems this should be possible, as the encoding for cmd.exe can be changed to Unicode with 'chcp 65001'.

Here are some things I've tried (besides the obvious step of looking through the configuration options in the GUI).

  1. Setting environment variables in '.bashrc'. I guess it makes sense that this doesn't work, since I think it's a Linux thing. The 'locale' command does not exist.

    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8
    export LANGUAGE=en_US.UTF-8
    
  2. Starting out in cmd.exe, changing the encoding to Unicode with 'chcp 65001' and then starting up git-bash. This gives me a "Permission denied" error when I try to cat my Unicode test file, whereas catting a file without Unicode works just fine. As demonstrated, after dropping back out to cmd.exe I can still display the file with 'type'. Using my default encoding (437) I can cat the file in bash (no permission denied, but the output is garbled).

    S:\>chcp 65001
    Active code page: 65001
    S:\>"C:\Program Files (x86)\Git\bin\sh.exe" --login -i
    zarac@TOWELIE /z
    $ cat /s/unicode.txt
    cat: write error: Permission denied
    zarac@TOWELIE /z
    $ cat /s/nounicode.txt
    abc
    zarac@TOWELIE /z
    $ ls -l /s/unicode.txt
    -rw-r--r--    1 zarac    Administ        7 May 18 10:30 /s/unicode.txt
    zarac@TOWELIE /z
    $ whoami
    towelie\zarac
    zarac@TOWELIE /z
    $ exit
    Z:\>type S:\unicode.txt
    abc£
    
  3. Using the /U flag when starting the shell (it makes sense that this doesn't work, because it's not quite what the flag is for, if I understand correctly, but it has to do with Unicode so I tried it).

    C:\Windows\SysWOW64\cmd.exe /U /C "C:\Program Files (x86)\Git\bin\sh.exe" --login -i
    
  4. As I prefer to use Console2, I've tried adding a DWORD value named CodePage with the value 65001 (decimal) to the Windows registry under [HKEY_CURRENT_USER\Console] as well as [HKEY_CURRENT_USER\Console\Git Bash]. This seems to have the same effect as running 'chcp 65001', except that it's "automatic". (http://stackoverflow.com/questions/379240/is-there-a-windows-command-shell-that-will-display-unicode-characters)

  5. JPSoft's TCC/LE

  6. PowerCMD

  7. stackoverflow

  8. duckduckgo

  9. ixquick / google
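
For anyone reproducing the test in attempt 2, here is a minimal sketch of how such a test file could be created and inspected from the shell. It assumes the file held `abc£` followed by a Windows CRLF line ending, which would account for the 7 bytes in the listing above; the function name `make_unicode_test_file` and the temporary path are hypothetical.

```shell
#!/bin/sh
# Create a small UTF-8 test file: "abc", the pound sign (bytes C2 A3), then CRLF.
# Assumption: this mirrors the 7-byte unicode.txt from the question.
make_unicode_test_file() {
    printf 'abc\302\243\r\n' > "$1"
}

testfile="${TMPDIR:-/tmp}/unicode_test_$$.txt"
make_unicode_test_file "$testfile"

# Byte count: 3 (abc) + 2 (pound sign in UTF-8) + 2 (CRLF).
wc -c < "$testfile"

# Hex dump, so the bytes can be checked independently of the console encoding.
od -An -tx1 "$testfile"
```

Looking at the hex dump rather than the rendered text separates "the file is wrong" from "the console shows it wrong", which is the distinction the attempts above keep running into.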

So, method 2 seems viable if that permission issue can be fixed. However, I'm open to pretty much any solution, although I'd prefer one that lets me use Console2 (mostly because of its nifty tab feature). Perhaps one solution would be to set up an SSH server and then use Putty/Kitty to connect to it, but that's just wrong! ; )

PS. Is there any official documentation for git-bash?

Accepted answer by Hannes

As CharlesB said in a comment, msysgit 1.7.10 handles unicode correctly. There are still a few issues but I can confirm that updating did solve the issue I was having.

See: https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support

Answer by nkatsar

I faced the same issue in MSYS Git 2.8.0, and as it turned out it only needed a configuration change.

    $ git --version
    git version 2.8.0.windows.1

The default configuration of Git Bash console in my system did not show Greek filenames.

    $ cd ~
    $ ls
    AppData/
    'Application Data'@
    Contacts/
    Cookies@
    Desktop/
    Documents/
    Downloads/
    Favorites/
    Links/
    'Local Settings'@
    NTUSER.DAT
    .
    .
    .
    ''$'\316\244\316\261'' '$'\316\255\316\263\316\263\317\201\316\261\317\206\316\254'' '$'\316\274\316\277\317\205'@

The last line should display "Τα έγγραφά μου", the Greek translation of "My Documents". To fix it, I followed the steps below:

  1. Check your existing locale configuration

    $ locale
    
    LANG=en
    LC_CTYPE="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_COLLATE="C"
    LC_MONETARY="C"
    LC_MESSAGES="C"
    LC_ALL=
    

    As shown above, in my case it was not UTF-8.

  2. Change the locale to a UTF-8 encoding. Click the icon on the left side of the MINGW title bar, select "Options", and in the "Text" category choose the "UTF-8" character set. You should also choose a Unicode font, such as the default "Lucida Console". (Screenshot: MinGW locale configuration.)

  3. Change the language for the current window (no need to do this on future windows, as they will be created with the settings of step 2)

     $ LANG='C.UTF-8'
    
  4. The ls command should now display properly

    AppData/
    'Application Data'@
    Contacts/
    Cookies@
    Desktop/
    Documents/
    Downloads/
    Favorites/
    Links/
    'Local Settings'@
    NTUSER.DAT
    .
    .
    .
    'Τα έγγραφά μου'@
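
The escaped name in the first listing can be decoded by hand. Assuming it was the shell-style `$'...'` quoting that newer versions of `ls` apply to bytes they consider unprintable in the current locale, the octal escapes are simply the raw UTF-8 bytes of the name, and `printf` can turn them back into text. The byte values below are the UTF-8 encoding of the Greek name as I reconstruct it, not something copied from the original console; `greek_my_documents` is a hypothetical name.

```shell
#!/bin/sh
# Print the Greek folder name from its UTF-8 bytes, written as octal escapes.
# Each Greek letter takes two bytes; the two spaces are plain ASCII.
greek_my_documents() {
    # First word: capital tau, alpha
    printf '\316\244\316\261 '
    # Second word: seven letters, fourteen bytes
    printf '\316\255\316\263\316\263\317\201\316\261\317\206\316\254 '
    # Third word: mu, omicron, upsilon
    printf '\316\274\316\277\317\205\n'
}

greek_my_documents
```

On a terminal that is set up for UTF-8 this should render as readable Greek; on one that is not, it gives a quick way to see the same breakage without touching the file system.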
    
Answer by TravisChambers

Found this answer elsewhere:

    chcp.com 65001

(Source: "Git bash chcp windows7 encoding issue".)

That's what actually solved it for me.
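
If this works for you, one way to make it stick might be to run it from `~/.bashrc`. The sketch below is an assumption on my part rather than something from the linked answer: it only calls `chcp.com` when that program is actually found on the `PATH`, so the same startup file stays harmless on non-Windows machines. The function name `set_utf8_codepage` and the `CHCP_CMD` override variable are hypothetical.

```shell
#!/bin/sh
# Switch the console to code page 65001 (UTF-8) when chcp.com is available.
# CHCP_CMD is a hypothetical override, mainly useful for trying this out.
set_utf8_codepage() {
    chcp_cmd="${CHCP_CMD:-chcp.com}"
    if command -v "$chcp_cmd" >/dev/null 2>&1; then
        "$chcp_cmd" 65001
    else
        # Not on Windows (or chcp.com is not on PATH): do nothing.
        return 0
    fi
}

set_utf8_codepage
```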

Answer by VonC

Check if the issue persists with Git 2.1 (August 2014).
See commit 617ce96 or commit 1c950a5 by Karsten Blees (kblees).

Win32: support Unicode console output

WriteConsoleW seems to be the only way to reliably print unicode to the console (without weird code page conversions).

Also redirects vfprintf to the winansi.c version.

Win32: add Unicode conversion functions

Add Unicode conversion functions to convert between Windows native UTF-16LE encoding to UTF-8 and back.

To support repositories with legacy-encoded file names, the UTF-8 to UTF-16 conversion function tries to create valid, unique file names even for invalid UTF-8 byte sequences, so that these repositories can be checked out without error.

It is likely to be a port of something already integrated in msysgit, but at least that means the Windows version of Git won't have to diverge/patch from the main Git repo source code in order to include those improvements.
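
To get a feel for the conversion those functions perform, the round trip can be imitated on the command line. This is only an illustration, with `iconv` standing in for Git's C code (which is not reproduced here), and it assumes an `iconv` that knows the `UTF-16LE` encoding name. The helper names are hypothetical.

```shell
#!/bin/sh
# Show the bytes of stdin as a compact lowercase hex string.
hex() {
    od -An -tx1 | tr -d ' \n'
}

# UTF-8 -> UTF-16LE (the encoding the Windows wide-character APIs expect).
to_utf16le() {
    iconv -f UTF-8 -t UTF-16LE
}

# UTF-16LE -> UTF-8.
to_utf8() {
    iconv -f UTF-16LE -t UTF-8
}

# a-umlaut is the two bytes C3 A4 in UTF-8 and the single code unit E4 00
# in UTF-16LE.
printf '\303\244' | to_utf16le | hex; echo
printf '\303\244' | to_utf16le | to_utf8 | hex; echo
```

Unlike the conversion described in the commit message, `iconv` simply fails on invalid UTF-8 input; it makes no attempt to produce a usable file name from it.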

Answer by hakre

I can see that there are some problems with character encoding in git bash for Windows, though less so when working with git itself and the tools it ships with (curl, cat, grep etc.). Over the years I didn't run into character-encoding problems with these.

Normally, problems get better resolved with each new version. E.g. with the version from a year ago, I couldn't enter characters like "ä" into the shell, so it was not possible to write

    echo "ä"

to quickly test if UTF-8 is supported and at which level. A workaround is to write the byte sequence in octal:

    $ echo -e "\0303\0244"
    ä

I still have issues when I execute my Windows php.exe binary to output text:

    $ php -r 'echo "\xC3\xA4";'
    ä

The expected output is "ä", as shown, but in the terminal it actually comes out as "├ñ" instead. My workaround is to wrap the php command in a bash script that passes the output through cat:

    #!/bin/bash

    { php.exe "$@" 2>&1 1>&3 | cat 1>&2; } 3>&1 | cat

(Reference: piping both stdout and stderr through cat.)

This then magically makes php work again:

    $ php -r 'echo "\xC3\xA4";'
    ä
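
The one-liner is dense, so here is the same file-descriptor juggling unrolled into a function with comments. It is a sketch of the technique, not the exact script from above: `through_cat` is a hypothetical name, and it wraps whatever command it is given rather than `php.exe` specifically.

```shell
#!/bin/sh
# Run a command with stdout and stderr each piped through their own `cat`,
# keeping the two streams separate.
through_cat() {
    {
        # Inside the braces: stderr goes into the inner pipe (2>&1), while
        # stdout is parked on fd 3 (1>&3) so that it bypasses the inner cat.
        # The inner cat then writes the command's stderr back to stderr (1>&2).
        "$@" 2>&1 1>&3 | cat 1>&2
    } 3>&1 | cat    # fd 3 feeds the outer pipe, whose cat handles stdout
}

# Example: "out" ends up on stdout, "err" on stderr.
through_cat sh -c 'echo out; echo err >&2'
```

Note that the function's exit status is that of the final `cat`, not of the wrapped command, which is also true of the original one-liner.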

Applies to:

    $ git --version
    git version 1.9.4.msysgit.1

I must admit I lack a deeper understanding of why things are the way they are. But I'm happy that I finally found a workaround to use php in git bash with UTF-8 support.
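
One plausible explanation for the garbled two-character output, offered as an assumption rather than a diagnosis: the bytes C3 A4 are a single character in UTF-8, but a console still set to code page 437 (the default mentioned in the question) would draw them as two separate characters. The sketch reproduces that misreading with `iconv`, assuming your `iconv` knows the `CP437` name; `misread_as` is a hypothetical helper.

```shell
#!/bin/sh
# Reinterpret bytes from stdin as if they were in the given legacy code page,
# and print the result as UTF-8 so that a UTF-8 terminal can display it.
misread_as() {
    iconv -f "$1" -t UTF-8
}

# The UTF-8 bytes of a-umlaut, read as code page 437: C3 maps to a
# box-drawing character and A4 maps to n-with-tilde.
printf '\303\244' | misread_as CP437; echo
```

This would not explain why piping through `cat` helps; it only shows where those two particular characters can come from.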

Answer by VonC

The problem with chcp 65001 is that there are bugs in the C runtime (MSVCRT) that make stdio calls return inconsistent results when run under code page 65001.

That should be better with Git 2.23 (Q3 2019).

See commit 090d1e8 (03 Jul 2019) by Karsten Blees (kblees).
(Merged by Junio C Hamano -- gitster -- in commit 0328db0, 11 Jul 2019)

gettext: always use UTF-8 on native Windows

On native Windows, Git exclusively uses UTF-8 for console output (both with MinTTY and native Win32 Console).

Gettext uses setlocale() to determine the output encoding for translated text, however, MSVCRT's setlocale() does not support UTF-8. As a result, translated text is encoded in system encoding (as per GetACP()), and non-ASCII chars are mangled in console output.

Side note: There is actually a code page for UTF-8: 65001. In practice, it does not work as expected at least on Windows 7, though, so we cannot use it in Git. Besides, if we overrode the code page, any process spawned from Git would inherit that code page (as opposed to the code page configured for the current user), which would quite possibly break e.g. diff or merge helpers. So we really cannot override the code page.

In init_gettext_charset(), Git calls gettext's bind_textdomain_codeset() with the character set obtained via locale_charset(); let's override that latter function to force the encoding to UTF-8 on native Windows.

In Git for Windows' SDK, there is a libcharset.h and therefore we define HAVE_LIBCHARSET_H in the MINGW-specific section in config.mak.uname, therefore we need to add the override before that conditionally-compiled code block.

Rather than simply defining locale_charset() to return the string "UTF-8", though, we are careful not to break LC_ALL=C: the ab/no-kwset patch series, for example, needs to have a way to prevent Git from expecting UTF-8-encoded input.
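
Read as shell rather than C, the decision described in that commit message could look roughly like the sketch below. It is my paraphrase of the idea (honor an explicit locale setting such as `LC_ALL=C`, otherwise fall back to UTF-8) and not Git's actual `locale_charset()`, whose exact rules may differ; the function name `forced_charset` is hypothetical.

```shell
#!/bin/sh
# Pick an output charset: honor an explicit locale setting such as LC_ALL=C,
# otherwise default to UTF-8.
forced_charset() {
    # First non-empty of LC_ALL, LC_CTYPE, LANG (the usual precedence).
    loc="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
    case "$loc" in
        "")  echo "UTF-8" ;;          # nothing set: force UTF-8
        *.*) echo "${loc#*.}" ;;      # "en_US.UTF-8" -> "UTF-8"
        *)   echo "$loc" ;;           # "C" stays "C"
    esac
}

forced_charset
```

The point of the middle ground is visible in the last branch: a user who sets `LC_ALL=C` still gets "C" back, so the override does not silently force UTF-8 on them.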

And:

See commit 697bdd2 (04 Jul 2019), and commit 9423885, commit 39a98e9 (27 Jun 2019) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit 0a2ff7c, 11 Jul 2019)

mingw: use Unicode functions explicitly

Many Win32 API functions actually exist in two variants: one with the A suffix that takes ANSI parameters (char * or const char *) and one with the W suffix that takes Unicode parameters (wchar_t * or const wchar_t *).

The ANSI variant assumes that the strings are encoded according to whatever is the current locale.
This is not what Git wants to use on Windows: we assume that char * variables point to strings encoded in UTF-8.

There is a pseudo UTF-8 locale on Windows, but it does not work as one might expect. In addition, if we overrode the user's locale, that would modify the behavior of programs spawned by Git (such as editors, difftools, etc), therefore we cannot use that pseudo locale.

Further, it is actually highly encouraged to use the Unicode versions instead of the ANSI versions, so let's do precisely that.

Note: when calling the Win32 API functions without any suffix, it depends whether the UNICODE constant is defined before the relevant headers are #include'd.
Without that constant, the ANSI variants are used.
Let's be explicit and avoid that ambiguity.
