Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/7147699/

Time: 2020-09-19 05:52:47  Source: igfitidea

Limiting file size in git repository

git, version-control

Asked by dubbaluga

I'm currently thinking of changing my VCS from Subversion to git. Is it possible to limit the file size within a commit in a git repository? For Subversion, for example, there is a hook: http://www.davidgrant.ca/limit_size_of_subversion_commits_with_this_hook

In my experience, people, especially inexperienced ones, sometimes commit files that should not go into a VCS (e.g. big file system images).

Accepted answer by eis

I struggled with this for a while, even with the description, and since I think it is relevant for others too, I thought I'd post an implementation of what J-16 SDiZ described.

So, here is my take on a server-side update hook that prevents overly large files from being pushed:

#!/bin/bash

# Script to limit the size of a push to git repository.
# Git repo has issues with big pushes, and we shouldn't have a real need for those
#
# eis/02.02.2012

# --- Safety check, should not be run from command line
if [ -z "$GIT_DIR" ]; then
        echo "Don't run this script from the command line." >&2
        echo " (if you want, you could supply GIT_DIR then run" >&2
        echo "  $0 <ref> <oldrev> <newrev>)" >&2
        exit 1
fi

# Test that tab replacement works, issue in some Solaris envs at least
testvariable=`echo -e "\t" | sed 's/\s//'`
if [ "$testvariable" != "" ]; then
        echo "Environment check failed - please contact git hosting." >&2
        exit 1
fi

# File size limit is meant to be configured through 'hooks.filesizelimit' setting
filesizelimit=$(git config hooks.filesizelimit)

# If we haven't configured a file size limit, use default value of about 100M
if [ -z "$filesizelimit" ]; then
        filesizelimit=100000000
fi

# Reference to incoming checkin can be found at $3
refname=$3

# With this command, we can find information about the file coming in that has biggest size
# We also normalize the line for excess whitespace
biggest_checkin_normalized=$(git ls-tree --full-tree -r -l "$refname" | sort -k 4 -n -r | head -1 | sed 's/^ *//;s/ *$//;s/\s\{1,\}/ /g' )

# Based on that, we can find what we are interested about
filesize=`echo $biggest_checkin_normalized | cut -d ' ' -f4,4`

# Actual comparison
# To cancel a push, we exit with status code 1
# It is also a good idea to print out some info about the cause of rejection
if [ $filesize -gt $filesizelimit ]; then

        # To be more user-friendly, we also look up the name of the offending file
        filename=`echo $biggest_checkin_normalized | cut -d ' ' -f5,5`

        echo "Error: Too large push attempted." >&2
        echo >&2
        echo "File size limit is $filesizelimit, and you tried to push file named $filename of size $filesize." >&2
        echo "Contact configuration team if you really need to do this." >&2
        exit 1
fi

exit 0

Note that it's been commented that this code only checks the latest commit, so it would need to be tweaked to iterate over the commits between $2 and $3 and check all of them.

Answer by Galt Barber

The answers by eis and J-16 SDiZ suffer from a severe problem: they only check the state of the final commit, $3 or $newrev. They also need to check what is being submitted in the other commits between $2 (or $oldrev) and $3 (or $newrev) in the update hook.

J-16 SDiZ is closer to the right answer.

The big flaw is that someone whose departmental server has this update hook installed to protect it will find out the hard way that:

After git rm is used to remove a big file that was accidentally checked in, only the current tree or last commit will be fine; the push will still pull in the entire chain of commits, including the big file that was deleted, creating a swollen, unhappy, fat history that nobody wants.

The solution is either to check each and every commit from $oldrev to $newrev, or to specify the entire range $oldrev..$newrev. Be darn sure you are not checking $newrev alone, or the hook will fail to keep massive junk out of your git history, where it gets pushed out and shared with others and is then difficult or impossible to remove.

Answer by Gillespie

This one is pretty good:

#!/bin/bash -u
#
# git-max-filesize
#
# git pre-receive hook to reject large files that should be committed
# via git-lfs (large file support) instead.
#
# Author: Christoph Hack <[email protected]>
# Copyright (c) 2017 mgIT GmbH. All rights reserved.
# Distributed under the Apache License. See LICENSE for details.
#
set -o pipefail

readonly DEFAULT_MAXSIZE="5242880" # 5MB
readonly CONFIG_NAME="hooks.maxfilesize"
readonly NULLSHA="0000000000000000000000000000000000000000"
readonly EXIT_SUCCESS="0"
readonly EXIT_FAILURE="1"

# main entry point
function main() {
  local status="$EXIT_SUCCESS"

  # get maximum filesize (from repository-specific config)
  local maxsize
  maxsize="$(get_maxsize)"
  if [[ "$?" != 0 ]]; then
    echo "failed to get ${CONFIG_NAME} from config"
    exit "$EXIT_FAILURE"
  fi

  # skip this hook entirely if maxsize is 0.
  if [[ "$maxsize" == 0 ]]; then
    cat > /dev/null
    exit "$EXIT_SUCCESS"
  fi

  # read lines from stdin (format: "<oldref> <newref> <refname>\n")
  local oldref
  local newref
  local refname
  while read oldref newref refname; do
    # skip branch deletions
    if [[ "$newref" == "$NULLSHA" ]]; then
      continue
    fi

    # find large objects
    # check all objects from $oldref (possible $NULLSHA) to $newref, but
    # skip all objects that have already been accepted (i.e. are referenced by
    # another branch or tag).
    local target
    if [[ "$oldref" == "$NULLSHA" ]]; then
      target="$newref"
    else
      target="${oldref}..${newref}"
    fi
    local large_files
    large_files="$(git rev-list --objects "$target" --not --branches=\* --tags=\* | \
      git cat-file $'--batch-check=%(objectname)\t%(objecttype)\t%(objectsize)\t%(rest)' | \
      awk -F '\t' -v maxbytes="$maxsize" '$3 > maxbytes' | cut -f 4-)"
    if [[ "$?" != 0 ]]; then
      echo "failed to check for large files in ref ${refname}"
      continue
    fi

    IFS=$'\n'
    for file in $large_files; do
      if [[ "$status" == 0 ]]; then
        echo ""
        echo "-------------------------------------------------------------------------"
        echo "Your push was rejected because it contains files larger than $(numfmt --to=iec "$maxsize")."
        echo "Please use https://git-lfs.github.com/ to store larger files."
        echo "-------------------------------------------------------------------------"
        echo ""
        echo "Offending files:"
        status="$EXIT_FAILURE"
      fi
      echo " - ${file} (ref: ${refname})"
    done
    unset IFS
  done

  exit "$status"
}

# get the maximum filesize configured for this repository or the default
# value if no specific option has been set. Suffixes like 5k, 5m, 5g, etc.
# can be used (see git config --int).
function get_maxsize() {
  local value;
  value="$(git config --int "$CONFIG_NAME")"
  if [[ "$?" != 0 ]] || [[ -z "$value" ]]; then
    echo "$DEFAULT_MAXSIZE"
    return "$EXIT_SUCCESS"
  fi
  echo "$value"
  return "$EXIT_SUCCESS"
}

main

You can configure the size in the server-side config file by adding:

[hooks]
        maxfilesize = 1048576 # 1 MiB

Answer by luchetto

If you are using gitolite, you can also try VREF. One VREF is already provided by default (the code is in gitolite/src/VREF/MAX_NEWBIN_SIZE). It is called MAX_NEWBIN_SIZE. It works like this:

repo name
RW+     =   username
-   VREF/MAX_NEWBIN_SIZE/1000   =   usernames

Where 1000 is an example threshold in bytes.

This VREF works like an update hook, and it will reject your push if any file you are pushing is larger than the threshold.

Answer by Šimon Tóth

Yes, git has hooks as well (git hooks). But it kind of depends on the actual workflow you will be using.

If you have inexperienced users, it is much safer to pull from them than to let them push. That way, you can make sure they won't screw up the main repository.

Answer by VonC

Another way is to version a .gitignore, which will prevent any file with a certain extension from showing up in the status.
You can still have hooks as well (downstream or upstream, as suggested by the other answers), but at least every downstream repo can include that .gitignore to avoid adding .exe, .dll, .iso, ...

Answer by manojlds

From what I have seen, it is going to be a very rare case that someone checks in, say, a 200 MB or even larger file.

While you can prevent this from happening by using server-side hooks (not sure about client-side hooks, since you have to rely on each person having the hooks installed), much like you would in SVN, you also have to take into account that in Git it is much, much easier to remove such a file or commit from the repository. You did not have such a luxury in SVN, at least not in an easy way.

Answer by Nerrve

You need a solution that caters to the following scenarios.

  1. If someone pushes multiple commits together, the hook should check ALL the commits in that push (between oldref and newref) for files larger than a certain limit.
  2. The hook should run for all users. If you write a client-side hook, it will not be available to all users, since such hooks are not pushed when you do a git push. So what is needed is a server-side hook such as a pre-receive hook.

This hook (https://github.com/mgit-at/git-max-filesize) deals with the above two cases and also seems to correctly handle edge cases such as new branch pushes and branch deletions.

Answer by jdavidbakr

I am using gitolite and the update hook was already being used, so instead of the update hook I used the pre-receive hook. The script posted by Chriki worked fabulously, except that the data is passed via stdin, so I made a one-line change:

- refname=$3
+ read a b refname

(there may be a more elegant way to do that but it works)

Answer by J-16 SDiZ

You can use a hook, either a pre-commit hook (on the client) or an update hook (on the server). Do a git ls-files --cached (for pre-commit) or git ls-tree --full-tree -r -l $3 (for update) and act accordingly.

git ls-tree -l would give something like this:

100644 blob 97293e358a9870ac4ddf1daf44b10e10e8273d57    3301    file1
100644 blob 02937b0e158ff8d3895c6e93ebf0cbc37d81cac1     507    file2

Grab the fourth column, and it is the size. Use git ls-tree --full-tree -r -l HEAD | sort -k 4 -n -r | head -1 to get the largest file, cut to extract the fields, if [ a -lt b ] to check the size, etc.
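The recipe above could be sketched roughly as follows. This is only a minimal sketch, not any answerer's actual hook: the function name oversized_entries and the limit passed to it are hypothetical, and it assumes input in the git ls-tree -r -l layout shown above (mode, type, hash, size, then a tab and the path). Instead of sort and head -1, it uses awk to list every blob above the limit rather than only the largest one.

```shell
#!/bin/sh
# Hypothetical helper (the name and the limit are illustrative assumptions).
# Reads `git ls-tree -r -l` style lines on stdin and prints "<size> <path>"
# for every blob whose size in bytes exceeds the limit given as $1.
oversized_entries() {
    awk -F '\t' -v limit="$1" '
        {
            # The part before the tab holds: mode, type, hash, size.
            split($1, meta, " ")
            # Non-blob entries report "-" as their size, so skip them.
            if (meta[2] == "blob" && meta[4] + 0 > limit + 0) {
                # Everything after the first tab is the path.
                path = substr($0, index($0, "\t") + 1)
                print meta[4], path
            }
        }'
}

# Possible use inside an update hook (arguments: <ref> <oldrev> <newrev>),
# looking at every commit in the push instead of only the last one:
#
#   for commit in $(git rev-list "$2..$3"); do
#       git ls-tree --full-tree -r -l "$commit" | oversized_entries 100000000
#   done
```

If the function prints anything, the hook would exit with status 1 to reject the push. For a newly created ref the old revision is all zeros, so the range in the commented loop would need separate handling.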

Sorry, I think if you are a programmer, you should be able to do this yourself.
