Can I make git recognize a UTF-16 file as text?
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/777949/
Question by skiphoppy
I'm tracking a Virtual PC virtual machine file (*.vmc) in git, and after making a change git identified the file as binary and wouldn't diff it for me. I discovered that the file was encoded in UTF-16.
Can git be taught to recognize that this file is text and handle it appropriately?
I'm using git under Cygwin, with core.autocrlf set to false. I could use msysGit or git under UNIX, if necessary.
Accepted answer by Sam Stokes
I've been struggling with this problem for a while, and just discovered (for me) a perfect solution:
$ git config --global diff.tool vimdiff # or merge.tool to get merging too!
$ git difftool commit1 commit2
git difftool takes the same arguments as git diff would, but runs a diff program of your choice instead of git's built-in diff. So pick a multibyte-aware diff (in my case, vim in diff mode) and just use git difftool instead of git diff.
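If an interactive tool like vimdiff is not what you want, the same difftool mechanism can call a non-interactive script. A minimal sketch, assuming bash, diff and iconv are installed and both versions are UTF-16 with a BOM; the script name utf16-difftool.sh and the tool name utf16 are hypothetical. git difftool hands the two versions to difftool.<tool>.cmd as $LOCAL and $REMOTE.

```shell
#!/bin/bash
# utf16-difftool.sh (hypothetical name): unified diff of two UTF-16 files,
# decoding both sides to UTF-8 first so the output is readable text.
# iconv's "UTF-16" decoder picks the byte order from each file's BOM.
utf16_difftool() {
  diff -u <(iconv -f UTF-16 -t UTF-8 "$1") <(iconv -f UTF-16 -t UTF-8 "$2")
}

# When called by git difftool, the old and new versions arrive as two arguments
if [[ $# -eq 2 ]]; then
  utf16_difftool "$1" "$2"
fi

# Hook it up (assumes the script is executable and on your PATH):
#   git config --global difftool.utf16.cmd 'utf16-difftool.sh "$LOCAL" "$REMOTE"'
#   git difftool --tool=utf16 --no-prompt commit1 commit2
```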
Find "difftool" too long to type? No problem:
$ git config --global alias.dt difftool
$ git dt commit1 commit2
Git rocks.
Answer by IlDan
There is a very simple solution that works out of the box on Unices.
For example, with Apple's .strings files, just:
Create a .gitattributes file in the root of your repository with:
*.strings diff=localizablestrings
Add the following to your ~/.gitconfig file:
[diff "localizablestrings"]
    textconv = "iconv -f utf-16 -t utf-8"
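To see this textconv setup working end to end, here is a minimal sketch in a throwaway repository. It assumes git and iconv are on the PATH, and it defines the driver in the repository's own .git/config (via git config) rather than ~/.gitconfig so nothing global changes; Localizable.strings and its contents are made-up examples.

```shell
#!/bin/bash
set -euo pipefail

# Throwaway repository so nothing outside it is touched
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

# Route *.strings files through the "localizablestrings" diff driver
echo '*.strings diff=localizablestrings' > .gitattributes

# Define the driver: git runs the textconv command with a file name
# and diffs whatever it prints on stdout
git config diff.localizablestrings.textconv 'iconv -f utf-16 -t utf-8'

# Commit a UTF-16 file (iconv writes a BOM for -t UTF-16), then edit it
printf '"greeting" = "color";\n' | iconv -f UTF-8 -t UTF-16 > Localizable.strings
git add .gitattributes Localizable.strings
git commit -q -m 'add strings'
printf '"greeting" = "colour";\n' | iconv -f UTF-8 -t UTF-16 > Localizable.strings

# Without the driver git would report the file as binary; now it is a text diff
git diff
```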
Source: Diff .strings files in Git (and an older post from 2010).
Answer by Chealion
Have you tried setting your .gitattributes to treat it as a text file?
e.g.:
*.vmc diff
More details at http://www.git-scm.com/docs/gitattributes.html.
Answer by Jared Oberhaus
By default, it looks like git won't work well with UTF-16. For such a file you have to make sure that no CRLF processing is done on it, but you want diff and merge to work as they do for a normal text file (this is ignoring whether or not your terminal/editor can handle UTF-16).
But looking at the .gitattributes manpage, here is how the built-in binary macro attribute is defined:
[attr]binary -diff -crlf
So it seems to me that you could define a custom utf16 attribute in your top-level .gitattributes (note that I add merge here to be sure it is treated as text):
[attr]utf16 diff merge -crlf
From there you would be able to specify, in any .gitattributes file, something like:
*.vmc utf16
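git check-attr can confirm that the macro expands the way this answer intends. A minimal sketch in a throwaway repository, assuming git is installed; machine.vmc is a hypothetical path, and check-attr does not need the file to exist.

```shell
#!/bin/bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q

# Macro attributes may only be defined in the top-level .gitattributes
cat > .gitattributes <<'EOF'
[attr]utf16 diff merge -crlf
*.vmc utf16
EOF

# Ask git which attributes apply to a .vmc path after macro expansion
git check-attr utf16 diff merge crlf -- machine.vmc
```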
Also note that you should still be able to diff a file, even if git thinks it's binary, with:
git diff --text
Edit
This answer basically says that GNU diff with UTF-16 or even UTF-8 doesn't work very well. If you want git to use a different tool to see differences (via --ext-diff), that answer suggests Guiffy.
But what you likely need is just to diff a UTF-16 file that contains only ASCII characters. One way to get that to work is to use --ext-diff and the following shell script:
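#!/bin/bash
# git calls an external diff with: path old-file old-hex old-mode new-file new-hex new-mode
diff <(iconv -f utf-16 -t utf-8 "$2") <(iconv -f utf-16 -t utf-8 "$5")
# diff exits 1 when the files differ; git treats any non-zero status as a failure
exit 0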
Note that converting to UTF-8 might work for merging as well, you just have to make sure it's done in both directions.
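git runs an external diff program with seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode), so the two versions to compare are the second and fifth. The sketch below writes a hypothetical utf16diff.sh following that convention and shows how it would be wired up; it assumes bash, GNU diff and iconv, and that the files carry a BOM. The script always exits 0 because git treats a non-zero status from an external diff as a fatal error.

```shell
#!/bin/bash
# Write a hypothetical external diff script that git can call via GIT_EXTERNAL_DIFF
dir=$(mktemp -d)
cat > "$dir/utf16diff.sh" <<'EOF'
#!/bin/bash
# $1 = path in the repo, $2 = old version, $5 = new version
diff -u --label "a/$1" --label "b/$1" \
  <(iconv -f utf-16 -t utf-8 "$2") <(iconv -f utf-16 -t utf-8 "$5")
# Report success even when the files differ, or git aborts the diff
exit 0
EOF
chmod +x "$dir/utf16diff.sh"

# In a repository you would then run something like:
#   GIT_EXTERNAL_DIFF="$dir/utf16diff.sh" git diff
```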
As for the output to the terminal when looking at a diff of a UTF-16 file:
Trying to diff like that results in binary garbage spewed to the screen. If git is using GNU diff, it would seem that GNU diff is not unicode-aware.
GNU diff doesn't really care about unicode, so when you use diff --text it just diffs and outputs the text. The problem is that the terminal you're using can't handle the UTF-16 that's emitted (combined with the diff marks that are ASCII characters).
Answer by Jared Oberhaus
The solution is to filter through cmd.exe /c "type %1". cmd's type builtin will do the conversion, so you can use it with the textconv ability of git diff to enable text diffing of UTF-16 files (it should work with UTF-8 as well, although this is untested).
Quoting from gitattributes man page:
Performing text diffs of binary files
Sometimes it is desirable to see the diff of a text-converted version of some binary files. For example, a word processor document can be converted to an ASCII text representation, and the diff of the text shown. Even though this conversion loses some information, the resulting diff is useful for human viewing (but cannot be applied directly).
The textconv config option is used to define a program for performing such a conversion. The program should take a single argument, the name of a file to convert, and produce the resulting text on stdout.
For example, to show the diff of the exif information of a file instead of the binary information (assuming you have the exif tool installed), add the following section to your $GIT_DIR/config file (or $HOME/.gitconfig file):
[diff "jpg"]
textconv = exif
This is a solution for mingw32; Cygwin fans may have to alter the approach. The issue is with passing the filename to convert to cmd.exe: it will use forward slashes, and cmd assumes backslash directory separators.
Step 1:
Create the single argument script that will do the conversion to stdout. c:\path\to\some\script.sh:
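#!/bin/bash
# git passes the file to convert as $1, with forward slashes;
# cmd.exe expects backslashes, so convert them first
FILE=$(echo "$1" | sed -e 's/\//\\/g')
# cmd's type builtin prints the decoded text on stdout
cmd.exe /c "type $FILE"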
Step 2:
Set up git to be able to use the script file. Inside your git config (~/.gitconfig or .git/config, or see man git-config), put this:
[diff "cmdtype"]
textconv = c:/path/to/some/script.sh
Step 3:
Point out the files to apply this workaround to by using .gitattributes files (see man gitattributes(5)):
*vmc diff=cmdtype
Then use git diff on your files.
Answer by Chaitanya Gupta
I have written a small git-diff driver, to-utf8, which should make it easy to diff any non-ASCII/UTF-8 encoded files. You can install it using the instructions here: https://github.com/chaitanyagupta/gitutils#to-utf8 (the to-utf8 script is available in the same repo).
Note that this script requires both the file and iconv commands to be available on the system.
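to-utf8 itself uses file to guess the encoding. As an illustration of the general idea only (this is not the to-utf8 script), here is a minimal textconv-style sketch that swaps in a simpler technique: it sniffs a UTF-16 byte-order mark and otherwise prints the file unchanged. The function name to_utf8_sketch is hypothetical, and files without a BOM are assumed to be ASCII/UTF-8 already.

```shell
#!/bin/bash
# Hypothetical textconv helper: print FILE as UTF-8 on stdout.
# Only UTF-16 files that start with a byte-order mark are converted.
to_utf8_sketch() {
  local file=$1 bom
  # First two bytes as hex, e.g. "fffe" (UTF-16LE) or "feff" (UTF-16BE)
  bom=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
  case $bom in
    fffe|feff) iconv -f UTF-16 -t UTF-8 "$file" ;;
    *) cat "$file" ;;
  esac
}

# As a textconv program, git passes the file name as the only argument
if [[ $# -eq 1 ]]; then
  to_utf8_sketch "$1"
fi
```

It could then be registered like the other drivers in this thread, e.g. with textconv pointing at the saved script.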
Answer by Rusi
Git has recently begun to understand encodings such as UTF-16. See the gitattributes docs and search for working-tree-encoding (the attribute was added in git 2.18).
[Make sure your man page matches since this is quite new!]
If (say) the file is UTF-16 without a BOM on a Windows machine, then add this to your .gitattributes file:
*.vmc text working-tree-encoding=UTF-16LE eol=CRLF
If it is UTF-16 (with a BOM) on *nix, make it:
*.vmc text working-tree-encoding=UTF-16 eol=LF
(Replace *.vmc with *.whatever for whatever type of files you need to handle.)
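The Windows rule above can be checked in a scratch repository: git stores the content as UTF-8 internally and re-encodes it only in the working tree, which is why diff and merge then work on it as text. A minimal sketch, assuming git 2.18 or later and iconv; machine.vmc is a made-up file name. It writes a BOM-less UTF-16LE file with CRLF endings, stages it, and prints the blob git actually stored.

```shell
#!/bin/bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q

# Same rule as above: UTF-16LE without BOM, CRLF line endings in the working tree
echo '*.vmc text working-tree-encoding=UTF-16LE eol=CRLF' > .gitattributes

# iconv -t UTF-16LE writes no BOM, matching what this rule expects
printf 'line one\r\nline two\r\n' | iconv -f UTF-8 -t UTF-16LE > machine.vmc
git add .gitattributes machine.vmc

# The staged blob is plain UTF-8 with LF endings
git cat-file blob :machine.vmc
```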
Answer by Matt Messersmith
I had this problem on Windows recently, and the dos2unix and unix2dos binaries that ship with Git for Windows did the trick. By default they're located in C:\Program Files\Git\usr\bin\. Note that this only works if your file doesn't need to be UTF-16: in my case, someone had accidentally encoded a Python file as UTF-16 when it didn't need to be.
PS C:\Users\xxx> dos2unix my_file.py
dos2unix: converting UTF-16LE file my_file.py to ANSI_X3.4-1968 Unix format...
and
PS C:\Users\xxx> unix2dos my_file.py
unix2dos: converting UTF-16LE file my_file.py to ANSI_X3.4-1968 DOS format...
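Where dos2unix is not available, iconv can do a comparable one-off re-encoding. This is a swapped-in technique, not what dos2unix does internally. A minimal sketch, assuming the file starts with a BOM (so iconv can detect the byte order) and really does not need to stay UTF-16; the function name reencode_utf16_to_utf8 is hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# Re-encode a UTF-16 file (assumed to start with a BOM) as UTF-8 in place.
# CR characters are deleted too, roughly what dos2unix does to CRLF endings
# (tr removes every CR, not only those before LF).
reencode_utf16_to_utf8() {
  local file=$1 tmp
  tmp=$(mktemp)
  iconv -f UTF-16 -t UTF-8 "$file" | tr -d '\r' > "$tmp"
  # Copy back with cat so the original file keeps its permissions
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage: ./reencode.sh my_file.py
if [[ $# -eq 1 ]]; then
  reencode_utf16_to_utf8 "$1"
fi
```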