How to determine if Git handles a file as binary or as text?
Source: http://stackoverflow.com/questions/6119956/
License: this page reproduces a popular Stack Overflow question and its answers under CC BY-SA 4.0. You are free to use and share it, but you must do so under the same license, link to the original question, and attribute the content to the original authors on Stack Overflow.
Asked by kayahr
I know that Git somehow automatically detects whether a file is binary or text, and that .gitattributes can be used to set this manually if needed. But is there also a way to ask Git how it treats a file?
So let's say I have a Git repository with two files in it: an ascii.dat file containing plain text and a binary.dat file containing random binary data. Git handles the first .dat file as text and the second as binary. Now I want to write a Git web front end that has a viewer for text files and a special viewer for binary files (displaying a hex dump, for example). Sure, I could implement my own text/binary check, but it would be more useful if the viewer relied on how Git itself classifies these files.
So how can I ask Git if it treats a file as text or binary?
Accepted answer by Chris Johnsen
builtin_diff() [1] calls diff_filespec_is_binary(), which calls buffer_is_binary(), which checks for any occurrence of a zero byte (NUL "character") in the first 8000 bytes (or the entire length if shorter).
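The check described above can be approximated outside Git. The following is a minimal sketch, not Git's own code: the function name has_nul_in_first_8000 is made up for illustration, and it assumes a head that supports -c (GNU and BSD both do). It only mirrors the NUL-byte heuristic and knows nothing about .gitattributes overrides.

```shell
# Hypothetical helper: succeeds if a NUL byte occurs in the first 8000 bytes.
has_nul_in_first_8000() {
    # Number of bytes actually examined (at most 8000).
    total=$(( $(head -c 8000 -- "$1" | wc -c) ))
    # Same bytes with every NUL removed; a smaller count means a NUL was present.
    without_nul=$(( $(head -c 8000 -- "$1" | tr -d '\000' | wc -c) ))
    [ "$total" -ne "$without_nul" ]
}
```

An empty file has no NUL byte to find, so this sketch reports it as not binary.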
I do not see that this “is it binary?” test is explicitly exposed in any command though.
git merge-file directly uses buffer_is_binary(), so you may be able to make use of it:
git merge-file /dev/null /dev/null file-to-test
When given a binary file, it seems to produce an error message like "error: Cannot merge binary files: file-to-test" and yield an exit status of 255. I am not sure I would want to rely on this behavior, though.
Maybe git diff --numstat would be more reliable:
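# Succeeds (returns 0) if Git's diff machinery treats the file as binary.
isBinary() {
    # Prefix that --numstat prints for binary files: "-<TAB>-<TAB>".
    p=$(printf '%s\t-\t' -)
    # Diff the file named by the first argument against /dev/null.
    t=$(git diff --no-index --numstat /dev/null "$1")
    case "$t" in "$p"*) return 0 ;; esac
    return 1
}
isBinary file-to-test && echo binary || echo not binary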
For binary files, the --numstat output should start with - TAB - TAB, so we just test for that.
[1] builtin_diff() has strings like "Binary files %s and %s differ" that should be familiar.
Answer by cstork
git grep -I --name-only --untracked -e . -- ascii.dat binary.dat ...
will return the names of files that Git interprets as text files.
You can use wildcards, e.g.
git grep -I --name-only --untracked -e . -- *.ps1
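For a front end that asks about one file at a time, the same git grep call can be turned into a yes/no test. This is a rough sketch under a few assumptions: the function name git_sees_text is invented here, the command is run inside a Git working tree, and the -q option is used so that only the exit status matters. Because the pattern . needs at least one character to match, an empty file (or one containing only blank lines) is reported as not text.

```shell
# Hypothetical helper: succeeds if git grep finds a match in the file,
# which with -I only happens when Git does not consider the file binary.
git_sees_text() {
    git grep -I -q --untracked -e . -- "$1"
}
```

The argument is passed as a pathspec, so a name containing glob characters may match more than one file.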
Answer by Seth Robertson
I don't like this answer, but you can parse the output of git diff-tree to see whether a file is binary. For example:
git diff-tree -p 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- MegaCli
diff --git a/megaraid/MegaCli b/megaraid/MegaCli
new file mode 100755
index 0000000..7f0e997
Binary files /dev/null and b/megaraid/MegaCli differ
as opposed to:
git diff-tree -p 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- megamgr
diff --git a/megaraid/megamgr b/megaraid/megamgr
new file mode 100755
index 0000000..50fd8a1
--- /dev/null
+++ b/megaraid/megamgr
@@ -0,0 +1,78 @@
+#!/bin/sh
[…]
Oh, and BTW, 4b825d… is a magic SHA that represents the empty tree (it is the SHA for an empty tree, but Git is also specially aware of it).
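Comparing against the empty tree also combines with the --numstat trick from the accepted answer, which avoids parsing patch text. The sketch below is only an illustration: list_binary_in_head is a made-up name, it assumes HEAD exists, and it derives the empty-tree ID with git hash-object rather than hard-coding it. Paths containing unusual characters are quoted by Git in this output, so they would need extra handling.

```shell
# Hypothetical helper: print the paths in HEAD that Git's diff treats as binary.
# Optional arguments restrict the check to the given pathspecs.
list_binary_in_head() {
    # ID of the empty tree in this repository's hash format.
    empty_tree=$(git hash-object -t tree /dev/null)
    # Binary files show up in --numstat as "-<TAB>-<TAB>path".
    git diff --numstat "$empty_tree" HEAD -- "$@" |
        awk -F '\t' '$1 == "-" && $2 == "-" { print $3 }'
}
```

Calling it without arguments checks every file in HEAD; calling it with a path such as MegaCli limits the check to that path.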
Answer by yoder2000
At the risk of getting slapped for poor code quality, I'm listing a C utility, is_binary, built around the original buffer_is_binary() routine in the Git source. Please see the internal comments for how to build and run it. It is easily modifiable:
/***********************************************************
 * is_binary.c
 *
 * Usage: is_binary <pathname>
 *   Exits with status 1 if the file is binary, 0 if it is not
 *   (3 if the file cannot be opened, 4 if it cannot be read).
 *
 * Thanks to Git and Stackoverflow developers for helping with these routines:
 *   - the buffer_is_binary() routine from the xdiff-interface.c module
 *     in git source code.
 *   - the read-a-filename-from-stdin routine
 *   - the read-a-file-into-memory (fill_buffer()) routine
 *
 * To build:
 *   % gcc is_binary.c -o is_binary
 *
 * To build debuggable (to push a few messages to stdout):
 *   % gcc -DDEBUG=1 ./is_binary.c -o is_binary
 *
 * BUGS:
 *   Doesn't work with piped input, like
 *     % cat foo.tar | is_binary
 *   because the size is found by seeking and a pipe is not seekable.
 *   Empty input is reported as non-binary (there is no NUL byte to find).
 *
 * Revision 1.4
 *
 * Tue Sep 12 09:01:33 EDT 2017
 ***********************************************************/
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#define FIRST_FEW_BYTES 8000

/* global, unfortunately */
char *source_blob_buffer;

/* From: https://stackoverflow.com/questions/14002954/c-programming-how-to-read-the-whole-file-contents-into-a-buffer */
/* From: https://stackoverflow.com/questions/1563882/reading-a-file-name-from-piped-command */
/* From: https://stackoverflow.com/questions/6119956/how-to-determine-if-git-handles-a-file-as-binary-or-as-text */

/* The key routine in this function is from libc: void *memchr(const void *s, int c, size_t n); */
/* Checks for any occurrence of a zero byte (NUL character) in the first 8000 bytes (or the entire length if shorter). */
int buffer_is_binary(const char *ptr, unsigned long size)
{
    if (FIRST_FEW_BYTES < size)
        size = FIRST_FEW_BYTES;
    /* printf("buff = %s.\n", ptr); */
    return !!memchr(ptr, 0, size);
}

/* Reads the whole file into source_blob_buffer and closes it.
 * Returns the number of bytes read, or -1 on failure. */
long fill_buffer(FILE *file_object_pointer)
{
    long fsize;

    if (fseek(file_object_pointer, 0, SEEK_END) != 0)
        return -1;
    fsize = ftell(file_object_pointer);
    if (fsize < 0)
        return -1;
    fseek(file_object_pointer, 0, SEEK_SET); /* same as rewind(f); */

    source_blob_buffer = malloc((size_t)fsize + 1);
    if (source_blob_buffer == NULL)
        return -1;
    if (fread(source_blob_buffer, 1, (size_t)fsize, file_object_pointer) != (size_t)fsize)
        return -1;
    fclose(file_object_pointer);
    source_blob_buffer[fsize] = 0;
    return fsize;
}

int main(int argc, char *argv[])
{
    const char *pathname = "(stdin)";
    FILE *file_object_pointer;
    long fsize;
    int result;

    if (argc == 1) {
        file_object_pointer = stdin;
    } else {
        pathname = argv[1];
#ifdef DEBUG
        printf("pathname=%s.\n", pathname);
#endif
        file_object_pointer = fopen(pathname, "rb");
        if (file_object_pointer == NULL) {
            printf("I'm sorry, Dave, I can't do that--");
            printf("open the file '%s', that is.\n", pathname);
            exit(3);
        }
    }

    fsize = fill_buffer(file_object_pointer);
    if (fsize < 0) {
        printf("Could not read '%s' (not a seekable file?)--sorry.\n", pathname);
        exit(4);
    }

    /* Pass the real byte count; the extra terminating NUL is not part of the data. */
    result = buffer_is_binary(source_blob_buffer, (unsigned long)fsize);
#ifdef DEBUG
    if (result == 1) {
        printf("%s %ld\n", pathname, fsize);
    } else {
        printf("File '%s' is NON-BINARY; size is %ld bytes.\n", pathname, fsize);
    }
#endif
    exit(result);
    /* easy check -- 'echo $?' after running */
}
Answer by Michael Freidgeim
You can use the command-line utility file. On Windows it is included in the Git installation and is normally located in the C:\Program Files\git\usr\bin folder.
file --mime-encoding *
See more in "Get encoding of a file in Windows".
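To use this from a script, the output can be compared against the word binary. The sketch below is only one way to do it: file_says_binary is a hypothetical name, and the --brief option is added so that the file name is left out of the output. Keep in mind that this reflects the file utility's own detection, not Git's, so the two can disagree on some files.

```shell
# Hypothetical helper: succeeds if the file utility reports the encoding as "binary".
file_says_binary() {
    [ "$(file --brief --mime-encoding -- "$1")" = "binary" ]
}
```

Text files produce an encoding name such as us-ascii or utf-8 instead, so the function fails for them.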