How to determine if Git handles a file as binary or as text?

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6119956/

Date: 2020-09-19 05:31:26  Source: igfitidea

git

Asked by kayahr

I know that Git somehow automatically detects whether a file is binary or text, and that .gitattributes can be used to set this manually if needed. But is there also a way to ask Git how it treats a file?

So let's say I have a Git repository with two files in it: an ascii.dat file containing plain text and a binary.dat file containing random binary stuff. Git handles the first .dat file as text and the second file as binary. Now I want to write a Git web front end which has a viewer for text files and a special viewer for binary files (displaying a hex dump, for example). Sure, I could implement my own text/binary check, but it would be more useful if the viewer relied on how Git itself handles these files.

So how can I ask Git if it treats a file as text or binary?

Accepted answer by Chris Johnsen

builtin_diff() [1] calls diff_filespec_is_binary(), which calls buffer_is_binary(), which checks for any occurrence of a zero byte (NUL “character”) in the first 8000 bytes (or the entire length if shorter).
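
As a rough illustration of that check (not a call into Git itself), the same test can be approximated in the shell. The helper name has_nul_in_head is made up for this sketch, it assumes a head that supports -c, and it ignores any .gitattributes setting that would override Git's auto-detection.

```shell
# Sketch of the NUL-byte heuristic only; has_nul_in_head is a hypothetical
# helper, and 8000 is the byte count quoted in the answer.
has_nul_in_head() {
    # Count the first 8000 bytes of the file as they are ...
    total=$(head -c 8000 -- "$1" | wc -c | tr -d ' ')
    # ... and again with every NUL byte deleted.
    kept=$(head -c 8000 -- "$1" | tr -d '\000' | wc -c | tr -d ' ')
    # A difference means at least one NUL was present (exit status 0).
    [ "$total" -ne "$kept" ]
}
```

Usage would look like: has_nul_in_head file-to-test && echo binary || echo not binary. An empty or unreadable file comes out as "not binary" here, because no NUL byte is found.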

However, I do not see this “is it binary?” test explicitly exposed by any command.

git merge-file directly uses buffer_is_binary(), so you may be able to make use of it:

git merge-file /dev/null /dev/null file-to-test

When given a binary file, it seems to produce an error message like "error: Cannot merge binary files: file-to-test" and yields an exit status of 255. I am not sure I would want to rely on this behavior though.

Maybe git diff --numstat would be more reliable:

isBinary() {
    p=$(printf '%s\t-\t' -)
    t=$(git diff --no-index --numstat /dev/null "$1")
    case "$t" in "$p"*) return 0 ;; esac
    return 1
}
isBinary file-to-test && echo binary || echo not binary

For binary files, the --numstat output should start with - TAB - TAB, so we just test for that.

[1] builtin_diff() has strings like "Binary files %s and %s differ" that should be familiar.

Answer by cstork

git grep -I --name-only --untracked -e . -- ascii.dat binary.dat ...

will return the names of files that git interprets as text files.

You can use wildcards, e.g.:

git grep -I --name-only --untracked -e . -- *.ps1
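
For a single path, the same command can be wrapped as a yes/no test. This is only a sketch: is_text_per_git_grep is a hypothetical helper name, and it has to be run inside a work tree. Because the pattern . needs at least one character to match, an empty file is not reported as text either.

```shell
# is_text_per_git_grep is a hypothetical helper built on the command above.
is_text_per_git_grep() {
    # -I skips files that Git considers binary;
    # -q prints nothing and reports a match through the exit status.
    git grep -I -q --untracked -e . -- "$1"
}
```

Usage would look like: is_text_per_git_grep ascii.dat && echo text || echo "binary (or no match)".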

Answer by Seth Robertson

I don't like this answer, but you can parse the output of git-diff-tree to see if it is binary. For example:

git diff-tree -p 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- MegaCli 
diff --git a/megaraid/MegaCli b/megaraid/MegaCli
new file mode 100755
index 0000000..7f0e997
Binary files /dev/null and b/megaraid/MegaCli differ

as opposed to:

git diff-tree -p 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- megamgr
diff --git a/megaraid/megamgr b/megaraid/megamgr
new file mode 100755
index 0000000..50fd8a1
--- /dev/null
+++ b/megaraid/megamgr
@@ -0,0 +1,78 @@
+#!/bin/sh
[…]

Oh, and BTW, 4b825d… is a magic SHA which represents the empty tree (it is the SHA for an empty tree, but git is specially aware of this magic).
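
Instead of parsing the patch text, the empty tree can also be combined with the --numstat check from the accepted answer to test a file as it is stored in a commit. The following is a sketch under assumptions: blob_is_binary_in_commit is a made-up helper name, it must be run inside a repository, and it recomputes the empty-tree ID with git hash-object rather than hard-coding it.

```shell
# blob_is_binary_in_commit is a hypothetical helper name.
# Usage: blob_is_binary_in_commit <path> [<commit>]   (commit defaults to HEAD)
blob_is_binary_in_commit() {
    # Hashing empty input as a tree yields the empty-tree ID.
    empty_tree=$(git hash-object -t tree /dev/null) || return 2
    # Binary entries in --numstat output start with "-<TAB>-<TAB>".
    prefix=$(printf '%s\t-\t' -)
    out=$(git diff --numstat "$empty_tree" "${2:-HEAD}" -- "$1") || return 2
    case "$out" in "$prefix"*) return 0 ;; esac
    return 1
}
```

Usage would look like: blob_is_binary_in_commit MegaCli && echo binary || echo not binary. A path that does not exist in the commit produces no output and is therefore reported as "not binary".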

Answer by yoder2000

At the risk of getting slapped for poor code quality, I'm listing a C utility, is_binary, built around the original buffer_is_binary() routine in the Git source. Please see the internal comments for how to build and run it. Easily modifiable:

/***********************************************************
 * is_binary.c 
 *
 * Usage: is_binary <pathname>
 *   Returns a 1 if a binary; return a 0 if non-binary
 * 
 * Thanks to Git and Stackoverflow developers for helping with these routines:
 * - the buffer_is_binary() routine from the xdiff-interface.c module 
 *   in git source code.
 * - the read-a-filename-from-stdin routine
 * - the read-a-file-into-memory (fill_buffer()) routine
 *
 * To build:
 *    % gcc is_binary.c -o is_binary
 *
 * To build debuggable (to push a few messages to stdout):
 *    % gcc -DDEBUG=1 ./is_binary.c -o is_binary
 *
 * BUGS:
 *  Doesn't work with piped input, like
 *    % cat foo.tar | is_binary
 *  because fill_buffer() needs a seekable file to find its size.
 *  An empty file is reported as non-binary.
 *
 * Revision 1.4
 *
 * Tue Sep 12 09:01:33 EDT 2017
***********************************************************/
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_PATH_LENGTH 200
#define FIRST_FEW_BYTES 8000

/* global, unfortunately */
char *source_blob_buffer;

/* From: https://stackoverflow.com/questions/14002954/c-programming-how-to-read-the-whole-file-contents-into-a-buffer */

/* From: https://stackoverflow.com/questions/1563882/reading-a-file-name-from-piped-command */

/* From: https://stackoverflow.com/questions/6119956/how-to-determine-if-git-handles-a-file-as-binary-or-as-text
*/

/* The key routine in this function is from libc: void *memchr(const void *s, int c, size_t n); */
/* Checks for any occurrence of a zero byte (NUL character) in the first 8000 bytes (or the entire length if shorter). */

int buffer_is_binary(const char *ptr, unsigned long size)
{
  if (FIRST_FEW_BYTES < size)
    size = FIRST_FEW_BYTES;
  /* printf("buff = %s.\n", ptr); */
  return !!memchr(ptr, 0, size);
}

/* Reads the whole file into source_blob_buffer, appends a terminating 0,
   and returns the file length + 1. */
int fill_buffer(FILE * file_object_pointer) {
  fseek(file_object_pointer, 0, SEEK_END);
  long fsize = ftell(file_object_pointer);
  fseek(file_object_pointer, 0, SEEK_SET);  //same as rewind(f);
  source_blob_buffer = malloc(fsize + 1);
  fread(source_blob_buffer, fsize, 1, file_object_pointer);
  fclose(file_object_pointer);
  source_blob_buffer[fsize] = 0;
  return (fsize + 1);
}

int main(int argc, char *argv[]) {

  /* Default label, so the DEBUG output is defined when reading stdin. */
  char pathname[MAX_PATH_LENGTH] = "(stdin)";
  FILE *file_object_pointer;

  if (argc == 1) {
    file_object_pointer = stdin;
  } else {
    /* Bounded copy: an over-long path is truncated instead of overflowing. */
    snprintf(pathname, sizeof pathname, "%s", argv[1]);
#ifdef DEBUG
    printf("pathname=%s.\n", pathname);
#endif
    file_object_pointer = fopen (pathname, "rb");
    if (file_object_pointer == NULL) {
      printf ("I'm sorry, Dave, I can't do that--");
      printf ("open the file '%s', that is.\n", pathname);
      exit(3);
    }
  }
  if (!file_object_pointer) {
    printf("Not a file nor a pipe--sorry.\n");
    exit (4);
  }
  int fsize = fill_buffer(file_object_pointer);
  /* fill_buffer() returns length + 1, so fsize - 1 is the real length:
     every byte is scanned and an empty file gives size 0 (non-binary). */
  int result = buffer_is_binary(source_blob_buffer, fsize - 1);

#ifdef DEBUG
  if (result == 1) {
    printf ("%s %d\n", pathname, fsize - 1);
  }
  else {
    printf ("File '%s' is NON-BINARY; size is %d bytes.\n", pathname, fsize - 1); 
  }
#endif
  exit(result);
  /* easy check -- 'echo $?' after running */
}

Answer by Michael Freidgeim

You can use the command-line utility 'file'. On Windows it is included in the Git installation and is normally located in the C:\Program Files\git\usr\bin folder.

file --mime-encoding *
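
To turn that output into a yes/no answer for one file, something like the following sketch could be used. The helper name file_says_binary is made up here, and the result reflects the heuristics of the file utility, which are not the same as Git's own check.

```shell
# file_says_binary is a hypothetical helper; it reports what file(1) thinks,
# not what Git thinks.
file_says_binary() {
    # -b omits the filename, so only the encoding name is printed
    # (for example "us-ascii" or "binary").
    [ "$(file -b --mime-encoding -- "$1")" = binary ]
}
```

Usage would look like: file_says_binary file-to-test && echo binary || echo not binary.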

See more in Get encoding of a file in Windows
