Notice: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/777949/

Time: 2020-09-10 06:21:38. Source: igfitidea

Can I make git recognize a UTF-16 file as text?

Tags: git, unicode, character-encoding, diff, utf-16

Asked by skiphoppy

I'm tracking a Virtual PC virtual machine file (*.vmc) in git, and after making a change git identified the file as binary and wouldn't diff it for me. I discovered that the file was encoded in UTF-16.

Can git be taught to recognize that this file is text and handle it appropriately?

I'm using git under Cygwin, with core.autocrlf set to false. I could use mSysGit or git under UNIX, if necessary.

Accepted answer by Sam Stokes

I've been struggling with this problem for a while, and just discovered (for me) a perfect solution:

$ git config --global diff.tool vimdiff      # or merge.tool to get merging too!
$ git difftool commit1 commit2

git difftool takes the same arguments as git diff would, but runs a diff program of your choice instead of the built-in GNU diff. So pick a multibyte-aware diff (in my case, vim in diff mode) and just use git difftool instead of git diff.

Find "difftool" too long to type? No problem:

$ git config --global alias.dt difftool
$ git dt commit1 commit2

Git rocks.

Answer by IlDan

There is a very simple solution that works out of the box on Unices.

For example, with Apple's .strings files, just:

  1. Create a .gitattributes file in the root of your repository with:

    *.strings diff=localizablestrings
    
  2. Add the following to your ~/.gitconfig file:

    [diff "localizablestrings"]
    textconv = "iconv -f utf-16 -t utf-8"
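The two steps above can be tried end to end in a throwaway repository. This is only a sketch under stated assumptions: the temporary directory, the file name Localizable.strings, its contents, and the driver name "utf16" are invented for the demo, and the driver is stored in the repository's own config rather than in ~/.gitconfig so that nothing global is changed.

```shell
#!/bin/sh
# Sketch: diff a UTF-16 file through an iconv textconv driver.
# Repository location, file name, and driver name "utf16" are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Route *.strings through the "utf16" diff driver (repo-local config).
echo '*.strings diff=utf16' > .gitattributes
git config diff.utf16.textconv "iconv -f utf-16 -t utf-8"

# Commit a UTF-16 file, then change it in the working tree.
printf '"greeting" = "hello";\n' | iconv -f utf-8 -t utf-16 > Localizable.strings
git add .gitattributes Localizable.strings
git commit -q -m "first version"
printf '"greeting" = "goodbye";\n' | iconv -f utf-8 -t utf-16 > Localizable.strings

# With the driver in place, git diff shows readable text lines
# instead of reporting that binary files differ.
diff_output=$(git diff -- Localizable.strings)
printf '%s\n' "$diff_output"
```

As the gitattributes man page quoted further down points out, a textconv diff is meant for human viewing and cannot be applied as a patch.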
    

Source: Diff .strings files in Git (and an older post from 2010).

Answer by Chealion

Have you tried setting your .gitattributes to treat it as a text file?

e.g.:

*.vmc diff

More details at http://www.git-scm.com/docs/gitattributes.html.
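One way to see what the diff attribute changes is to compare git diff --numstat before and after adding it: for a file git considers binary, numstat prints "-" in place of line counts. The sketch below assumes a throwaway repository and a made-up machine.vmc with invented contents.

```shell
#!/bin/sh
# Sketch: observe the effect of "*.vmc diff" on a UTF-16 file.
# Repository location, file name, and contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'memory=512\n' | iconv -f utf-8 -t utf-16 > machine.vmc
git add machine.vmc
git commit -q -m "first version"
printf 'memory=1024\n' | iconv -f utf-8 -t utf-16 > machine.vmc

# Before: the NUL bytes in UTF-16 make git classify the file as binary,
# so numstat reports "-" instead of added/deleted line counts.
numstat_binary=$(git diff --numstat -- machine.vmc)
printf 'without attribute: %s\n' "$numstat_binary"

# After: the diff attribute tells git to treat the file as text.
echo '*.vmc diff' > .gitattributes
numstat_text=$(git diff --numstat -- machine.vmc)
printf 'with attribute:    %s\n' "$numstat_text"
```

The attribute only changes how git classifies the file. The diff itself is still made of raw UTF-16 bytes, so it may not display cleanly in a terminal.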

Answer by Jared Oberhaus

By default, it looks like git won't work well with UTF-16; for such a file you have to make sure that no CRLF processing is done on it, but you want diff and merge to work as for a normal text file (this is ignoring whether or not your terminal/editor can handle UTF-16).

But looking at the .gitattributes manpage, here is the custom attribute that is binary:

[attr]binary -diff -crlf

So it seems to me that you could define a custom attribute in your top level .gitattributes for utf16 (note that I add merge here to be sure it is treated as text):

[attr]utf16 diff merge -crlf

From there you would be able to specify in any .gitattributes file something like:

*.vmc utf16
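To confirm that the macro expands as intended, git check-attr can report the attributes git resolves for a path. A small sketch, assuming a throwaway repository; machine.vmc is a hypothetical path and does not need to exist for check-attr to answer.

```shell
#!/bin/sh
# Sketch: check how the utf16 macro attribute expands for a *.vmc path.
# Repository location and the path machine.vmc are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Macro definition and its use, both in the top-level .gitattributes.
printf '%s\n' '[attr]utf16 diff merge -crlf' '*.vmc utf16' > .gitattributes

# One line per attribute: "<path>: <attribute>: <set|unset|unspecified|value>".
attr_report=$(git check-attr utf16 diff merge crlf -- machine.vmc)
printf '%s\n' "$attr_report"
```

The report should list utf16, diff, and merge as set, and crlf as unset.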

Also note that you should still be able to diff a file, even if git thinks it's binary, with:

git diff --text

Edit

This answer basically says that GNU diff with UTF-16 or even UTF-8 doesn't work very well. If you want to have git use a different tool to see differences (via --ext-diff), that answer suggests Guiffy.

But what you likely need is just to diff a UTF-16 file that contains only ASCII characters. A way to get that to work is to use --ext-diff and the following shell script:

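#!/bin/bash
# git calls an external diff with: path old-file old-hex old-mode new-file new-hex new-mode
diff <(iconv -f utf-16 -t utf-8 "$2") <(iconv -f utf-16 -t utf-8 "$5")
exit 0  # git aborts if the external diff command exits non-zero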
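One way such a script might be wired in is through the GIT_EXTERNAL_DIFF environment variable, which git diff honours by default. The sketch below rests on two assumptions: git hands an external diff program seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode), and git treats a non-zero exit status from that program as a failure. The script name utf16diff.sh, the repository, and the file contents are invented for the demo.

```shell
#!/bin/sh
# Sketch: run git diff through an external UTF-16-aware diff script.
# utf16diff.sh, the repository location, and machine.vmc are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Helper that converts both sides to UTF-8 before calling plain diff.
cat > utf16diff.sh <<'EOF'
#!/bin/bash
# Arguments from git: path old-file old-hex old-mode new-file new-hex new-mode
diff <(iconv -f utf-16 -t utf-8 "$2") <(iconv -f utf-16 -t utf-8 "$5")
exit 0  # diff exits 1 when files differ; git would treat that as an error
EOF
chmod +x utf16diff.sh

printf 'memory=512\n' | iconv -f utf-8 -t utf-16 > machine.vmc
git add machine.vmc
git commit -q -m "first version"
printf 'memory=1024\n' | iconv -f utf-8 -t utf-16 > machine.vmc

# The environment variable applies to this single invocation only.
ext_output=$(GIT_EXTERNAL_DIFF="$repo/utf16diff.sh" git diff -- machine.vmc)
printf '%s\n' "$ext_output"
```

Setting the variable per command keeps the external tool out of every other git diff run. A diff driver configured in .gitattributes would be the more permanent alternative.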

Note that converting to UTF-8 might work for merging as well; you just have to make sure the conversion is done in both directions.

As for the output to the terminal when looking at a diff of a UTF-16 file:

Trying to diff like that results in binary garbage spewed to the screen. If git is using GNU diff, it would seem that GNU diff is not unicode-aware.

GNU diff doesn't really care about unicode, so when you use diff --text it just diffs and outputs the text. The problem is that the terminal you're using can't handle the UTF-16 that's emitted (combined with the diff marks that are ASCII characters).

Answer by Jared Oberhaus

The solution is to filter through cmd.exe /c "type %1". cmd's type builtin will do the conversion, so you can use it with the textconv ability of git diff to enable text diffing of UTF-16 files (it should work with UTF-8 as well, although that is untested).

Quoting from gitattributes man page:



Performing text diffs of binary files

Sometimes it is desirable to see the diff of a text-converted version of some binary files. For example, a word processor document can be converted to an ASCII text representation, and the diff of the text shown. Even though this conversion loses some information, the resulting diff is useful for human viewing (but cannot be applied directly).

The textconv config option is used to define a program for performing such a conversion. The program should take a single argument, the name of a file to convert, and produce the resulting text on stdout.

For example, to show the diff of the exif information of a file instead of the binary information (assuming you have the exif tool installed), add the following section to your $GIT_DIR/config file (or $HOME/.gitconfig file):

[diff "jpg"]
        textconv = exif


This is a solution for mingw32; cygwin fans may have to alter the approach. The issue is with passing the filename to convert to cmd.exe: it will be using forward slashes, and cmd assumes backslash directory separators.

Step 1:

Create the single argument script that will do the conversion to stdout. c:\path\to\some\script.sh:

#!/bin/bash
SED='s/\//\\\\/g'
FILE=$(echo "$1" | sed -e "$SED")
cmd.exe /c "type $FILE"

Step 2:

Set up git to be able to use the script file. Inside your git config (~/.gitconfig or .git/config, or see man git-config), put this:

[diff "cmdtype"]
textconv = c:/path/to/some/script.sh

Step 3:

Point out the files to apply this workaround to by using .gitattributes files (see man gitattributes(5)):

*vmc diff=cmdtype

Then use git diff on your files.

Answer by Chaitanya Gupta

I have written a small git-diff driver, to-utf8, which should make it easy to diff any non-ASCII/UTF-8 encoded files. You can install it using the instructions here: https://github.com/chaitanyagupta/gitutils#to-utf8 (the to-utf8 script is available in the same repo).

Note that this script requires both the file and iconv commands to be available on the system.

Answer by Rusi

Git has recently begun to understand encodings such as UTF-16. See the gitattributes docs and search for working-tree-encoding.

[Make sure your man page matches since this is quite new!]

If (say) the file is UTF-16 without BOM on a Windows machine, then add to your .gitattributes file:

*.vmc text working-tree-encoding=UTF-16LE eol=CRLF

If it is UTF-16 (with BOM) on *nix, make it:

*.vmc text working-tree-encoding=UTF-16 eol=LF

(Replace *.vmc with *.whatever for whatever type of files you need to handle.)

See: Support working-tree-encoding "UTF-16LE-BOM".
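The following sketch tries the working-tree-encoding attribute in a throwaway repository. It assumes a git new enough to know the attribute and an iconv whose "utf-16" output starts with a BOM; the eol setting is left out to keep the demo minimal, and the file name and contents are made up.

```shell
#!/bin/sh
# Sketch: let git re-encode a UTF-16 working-tree file to UTF-8 internally.
# Repository location, file name, and contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo '*.vmc text working-tree-encoding=UTF-16' > .gitattributes
printf 'memory=512\n' | iconv -f utf-8 -t utf-16 > machine.vmc
git add .gitattributes machine.vmc
git commit -q -m "first version"
printf 'memory=1024\n' | iconv -f utf-8 -t utf-16 > machine.vmc

# The committed blob is UTF-8, so it prints as ordinary text ...
stored=$(git show HEAD:machine.vmc)
printf 'stored blob: %s\n' "$stored"

# ... and git diff works without any textconv or external tool.
wte_diff=$(git diff -- machine.vmc)
printf '%s\n' "$wte_diff"
```

Unlike the textconv approaches, this changes what is stored in the repository: the working-tree file stays UTF-16 while the committed content is UTF-8.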

Answer by Matt Messersmith

I had this problem on Windows recently, and the dos2unix and unix2dos binaries that ship with Git for Windows did the trick. By default they're located in C:\Program Files\Git\usr\bin\. Note that this will only work if your file doesn't need to be UTF-16. For example, someone accidentally encoded a Python file as UTF-16 when it didn't need to be (in my case).

PS C:\Users\xxx> dos2unix my_file.py
dos2unix: converting UTF-16LE file my_file.py to ANSI_X3.4-1968 Unix format...

and

PS C:\Users\xxx> unix2dos my_file.py
unix2dos: converting UTF-16LE file my_file.py to ANSI_X3.4-1968 DOS format...
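Where dos2unix is not at hand, iconv (already used in other answers on this page) could perform a comparable one-off re-encoding. A minimal sketch, assuming the file really has no need to stay UTF-16; the temporary directory and the contents of my_file.py are stand-ins. Unlike dos2unix, iconv leaves line endings untouched.

```shell
#!/bin/sh
# Sketch: re-encode an accidentally UTF-16 file to UTF-8 with iconv.
# The working directory and file contents are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the file that was saved as UTF-16 by mistake.
printf 'print("hello")\n' | iconv -f utf-8 -t utf-16 > my_file.py

# Convert into a temporary file first, then replace the original.
iconv -f utf-16 -t utf-8 my_file.py > my_file.py.utf8
mv my_file.py.utf8 my_file.py

converted=$(cat my_file.py)
printf '%s\n' "$converted"
```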