Original source: http://stackoverflow.com/questions/7147699/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Limiting file size in git repository
Asked by dubbaluga
I'm currently thinking of changing my VCS from Subversion to git. Is it possible to limit the file size within a commit in a git repository? For Subversion, for example, there is a hook: http://www.davidgrant.ca/limit_size_of_subversion_commits_with_this_hook
In my experience, people, especially inexperienced ones, sometimes commit files that should not go into a VCS (e.g. big file system images).
Accepted answer by eis
As I was struggling with this for a while, even with the description, and I think it is relevant for others too, I thought I'd post an implementation of what J-16 SDiZ described.
So, here is my take on a server-side update hook that prevents overly large files from being pushed:
#!/bin/bash

# Script to limit the size of a push to git repository.
# Git repo has issues with big pushes, and we shouldn't have a real need for those
#
# eis/02.02.2012

# --- Safety check, should not be run from command line
if [ -z "$GIT_DIR" ]; then
    echo "Don't run this script from the command line." >&2
    echo " (if you want, you could supply GIT_DIR then run" >&2
    echo "  $0 <ref> <oldrev> <newrev>)" >&2
    exit 1
fi

# Test that tab replacement works, issue in some Solaris envs at least
testvariable=`echo -e "\t" | sed 's/\s//'`
if [ "$testvariable" != "" ]; then
    echo "Environment check failed - please contact git hosting." >&2
    exit 1
fi

# File size limit is meant to be configured through 'hooks.filesizelimit' setting
filesizelimit=$(git config hooks.filesizelimit)

# If we haven't configured a file size limit, use default value of about 100M
if [ -z "$filesizelimit" ]; then
    filesizelimit=100000000
fi

# Reference to incoming checkin can be found at $3
refname=$3

# With this command, we can find information about the file coming in that has biggest size
# We also normalize the line for excess whitespace
biggest_checkin_normalized=$(git ls-tree --full-tree -r -l "$refname" | sort -k 4 -n -r | head -1 | sed 's/^ *//;s/ *$//;s/\s\{1,\}/ /g' )

# Based on that, we can find what we are interested about
filesize=`echo $biggest_checkin_normalized | cut -d ' ' -f4,4`

# Actual comparison
# To cancel a push, we exit with status code 1
# It is also a good idea to print out some info about the cause of rejection
if [ $filesize -gt $filesizelimit ]; then

    # To be more user-friendly, we also look up the name of the offending file
    filename=`echo $biggest_checkin_normalized | cut -d ' ' -f5,5`

    echo "Error: Too large push attempted." >&2
    echo >&2
    echo "File size limit is $filesizelimit, and you tried to push file named $filename of size $filesize." >&2
    echo "Contact configuration team if you really need to do this." >&2
    exit 1
fi

exit 0

Note that it has been commented that this code only checks the latest commit, so it would need to be tweaked to iterate over the commits between $2 and $3 and run the check on each of them.
Answer by Galt Barber
The answers by eis and J-16 SDiZ suffer from a severe problem. They only check the state of the final commit, $3 or $newrev. In the update hook they also need to check what is being submitted in the other commits between $2 (or $oldrev) and $3 (or $newrev).
J-16 SDiZ is closer to the right answer.
The big flaw is that someone whose departmental server has this update hook installed to protect it will find out the hard way that:
After someone uses git rm to remove a big file that was accidentally checked in, the current tree (the last commit) looks fine and passes the check, yet the push still brings in the entire chain of commits, including the commit containing the big file that was later deleted, creating a swollen, unhappy, fat history that nobody wants.
The solution is either to check each and every commit from $oldrev to $newrev, or to specify the entire range $oldrev..$newrev. Be darn sure you are not checking $newrev alone, or the hook will fail to do its job: massive junk ends up in your git history, gets pushed out and shared with others, and is then difficult or impossible to remove.
Answer by Gillespie

#!/bin/bash -u
#
# git-max-filesize
#
# git pre-receive hook to reject large files that should be committed
# via git-lfs (large file support) instead.
#
# Author: Christoph Hack <[email protected]>
# Copyright (c) 2017 mgIT GmbH. All rights reserved.
# Distributed under the Apache License. See LICENSE for details.
#
set -o pipefail

readonly DEFAULT_MAXSIZE="5242880" # 5MB
readonly CONFIG_NAME="hooks.maxfilesize"
readonly NULLSHA="0000000000000000000000000000000000000000"
readonly EXIT_SUCCESS="0"
readonly EXIT_FAILURE="1"

# main entry point
function main() {
  local status="$EXIT_SUCCESS"

  # get maximum filesize (from repository-specific config)
  local maxsize
  maxsize="$(get_maxsize)"
  if [[ "$?" != 0 ]]; then
    echo "failed to get ${CONFIG_NAME} from config"
    exit "$EXIT_FAILURE"
  fi

  # skip this hook entirely if maxsize is 0.
  if [[ "$maxsize" == 0 ]]; then
    cat > /dev/null
    exit "$EXIT_SUCCESS"
  fi

  # read lines from stdin (format: "<oldref> <newref> <refname>\n")
  local oldref
  local newref
  local refname
  while read oldref newref refname; do
    # skip branch deletions
    if [[ "$newref" == "$NULLSHA" ]]; then
      continue
    fi

    # find large objects
    # check all objects from $oldref (possible $NULLSHA) to $newref, but
    # skip all objects that have already been accepted (i.e. are referenced by
    # another branch or tag).
    local target
    if [[ "$oldref" == "$NULLSHA" ]]; then
      target="$newref"
    else
      target="${oldref}..${newref}"
    fi
    local large_files
    large_files="$(git rev-list --objects "$target" --not --branches=\* --tags=\* | \
      git cat-file $'--batch-check=%(objectname)\t%(objecttype)\t%(objectsize)\t%(rest)' | \
      awk -F '\t' -v maxbytes="$maxsize" '$3 > maxbytes' | cut -f 4-)"
    if [[ "$?" != 0 ]]; then
      echo "failed to check for large files in ref ${refname}"
      continue
    fi

    IFS=$'\n'
    for file in $large_files; do
      if [[ "$status" == 0 ]]; then
        echo ""
        echo "-------------------------------------------------------------------------"
        echo "Your push was rejected because it contains files larger than $(numfmt --to=iec "$maxsize")."
        echo "Please use https://git-lfs.github.com/ to store larger files."
        echo "-------------------------------------------------------------------------"
        echo ""
        echo "Offending files:"
        status="$EXIT_FAILURE"
      fi
      echo " - ${file} (ref: ${refname})"
    done
    unset IFS
  done

  exit "$status"
}

# get the maximum filesize configured for this repository or the default
# value if no specific option has been set. Suffixes like 5k, 5m, 5g, etc.
# can be used (see git config --int).
function get_maxsize() {
  local value;
  value="$(git config --int "$CONFIG_NAME")"
  if [[ "$?" != 0 ]] || [[ -z "$value" ]]; then
    echo "$DEFAULT_MAXSIZE"
    return "$EXIT_SUCCESS"
  fi
  echo "$value"
  return "$EXIT_SUCCESS"
}

main
Answer by luchetto
If you are using gitolite, you can also try VREF. One VREF is already provided by default (the code is in gitolite/src/VREF/MAX_NEWBIN_SIZE). It is called MAX_NEWBIN_SIZE. It works like this:
repo name
    RW+                             =   username
    -   VREF/MAX_NEWBIN_SIZE/1000   =   usernames
Where 1000 is an example threshold in bytes.
This VREF works like an update hook, and it will reject your push if any file you are pushing is larger than the threshold.
Answer by Šimon Tóth
Yes, git has hooks as well (git hooks). But it depends on the actual workflow you will be using.
If you have inexperienced users, it is much safer to pull from them than to let them push. That way, you can make sure they won't screw up the main repository.
Answer by VonC
Another way is to version a .gitignore, which will prevent any file with a certain extension from showing up in the status.
You can still have hooks as well (downstream or upstream, as suggested by the other answers), but at least every downstream repo can include that .gitignore to avoid adding .exe, .dll, .iso, ...
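As a rough illustration of the .gitignore approach, the sketch below appends extension patterns to an ignore file without duplicating entries that are already there. The helper name add_ignore_patterns is hypothetical and not part of git. Keep in mind that ignore rules match file names rather than sizes, so this complements a size-checking hook instead of replacing it.

```shell
#!/bin/bash
# Sketch only: add_ignore_patterns is a hypothetical helper, not a git command.

# Append each pattern to the given ignore file unless it is already listed.
add_ignore_patterns() {
    local file="$1" pattern
    shift
    touch "$file"
    for pattern in "$@"; do
        # -x matches whole lines, -F treats the pattern as a fixed string
        grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
    done
}

# Typical use from the top of a working tree, followed by committing the file:
#   add_ignore_patterns .gitignore '*.exe' '*.dll' '*.iso'
#   git add .gitignore && git commit -m "Ignore large binary artifacts"
```

Committing the file is what makes the rules travel with the repository, so every clone picks them up.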
Answer by manojlds
From what I have seen, it is very rare for someone to check in, say, a 200 MB or larger file.
While you can prevent this from happening by using server-side hooks (I'm not sure about client-side hooks, since you have to rely on each person having the hooks installed), much like you would in SVN, you also have to take into account that in Git it is much, much easier to remove such a file or commit from the repository. You did not have such a luxury in SVN, at least not in an easy way.
Answer by Nerrve
You need a solution that caters to the following scenarios.
- If someone is pushing multiple commits together, the hook should check ALL the commits (between oldref and newref) in that push for files greater than a certain limit.
- The hook should run for all users. If you write a client-side hook, it will not be available to all users, since such hooks are not pushed when you do a git push. So what is needed is a server-side hook, such as a pre-receive hook.
This hook (https://github.com/mgit-at/git-max-filesize) deals with the above two cases and also seems to correctly handle edge cases such as new branch pushes and branch deletes.
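The first requirement, checking every commit in the push rather than only the newest tree, could be sketched roughly as follows. The function names filter_large_blobs and large_blobs_in_push are made up for this illustration, and the approach is a simplified variant of the rev-list / cat-file pipeline, not the linked hook itself.

```shell
#!/bin/bash
# Sketch only: both function names are hypothetical.

# Print "<size><TAB><path>" for every blob larger than $1 bytes.
# Reads "<sha>\t<type>\t<size>\t<path>" lines on stdin, i.e. the output of
#   git cat-file --batch-check=$'%(objectname)\t%(objecttype)\t%(objectsize)\t%(rest)'
filter_large_blobs() {
    awk -F '\t' -v max="$1" '$2 == "blob" && $3 + 0 > max + 0 { print $3 "\t" $4 }'
}

# List oversized blobs introduced by the commits in oldrev..newrev,
# or by everything reachable from newrev when the ref is brand new.
large_blobs_in_push() {
    local oldrev="$1" newrev="$2" max="$3" range
    local zero="0000000000000000000000000000000000000000"
    if [ "$oldrev" = "$zero" ]; then
        range="$newrev"
    else
        range="$oldrev..$newrev"
    fi
    git rev-list --objects "$range" |
        git cat-file --batch-check=$'%(objectname)\t%(objecttype)\t%(objectsize)\t%(rest)' |
        filter_large_blobs "$max"
}
```

Because git rev-list --objects walks every commit in the range, a big file that was added in one commit and deleted in a later one is still reported.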
Answer by jdavidbakr
I am using gitolite, and the update hook was already in use, so instead of the update hook I used the pre-receive hook. The script posted by Chriki worked fabulously, with the exception that the data is passed via stdin, so I made a one-line change:
- refname=$3
+ read a b refname
(there may be a more elegant way to do that, but it works)
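A pre-receive hook gets one "<oldrev> <newrev> <refname>" line on stdin for every ref in the push, so a single read only sees the first of them. A slightly more general variant, sketched here with a hypothetical helper named describe_updates, loops over all lines and works out what a size check would need to scan for each ref.

```shell
#!/bin/bash
# Sketch only: describe_updates is a hypothetical helper name.
ZERO="0000000000000000000000000000000000000000"

# Read "<oldrev> <newrev> <refname>" lines on stdin (the pre-receive input
# format) and print, for each ref, the revisions a size check should scan.
describe_updates() {
    local oldrev newrev refname
    while read -r oldrev newrev refname; do
        if [ "$newrev" = "$ZERO" ]; then
            echo "${refname}: deleted, nothing to check"
        elif [ "$oldrev" = "$ZERO" ]; then
            echo "${refname}: new ref, check ${newrev}"
        else
            echo "${refname}: check ${oldrev}..${newrev}"
        fi
    done
}
```

In a real hook, the echo lines would be replaced by the actual size check for that ref.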
Answer by J-16 SDiZ
You can use a hook, either a pre-commit hook (on the client) or an update hook (on the server). Do a git ls-files --cached (for pre-commit) or git ls-tree --full-tree -r -l $3 (for update) and act accordingly.
git ls-tree -l would give something like this:
100644 blob 97293e358a9870ac4ddf1daf44b10e10e8273d57 3301 file1
100644 blob 02937b0e158ff8d3895c6e93ebf0cbc37d81cac1 507 file2
Grab the fourth column; it is the size. Use git ls-tree --full-tree -r -l HEAD | sort -k 4 -n -r | head -1 to get the largest file. Use cut to extract the fields, if [ a -lt b ] to check the size, etc.
Sorry, I think if you are a programmer, you should be able to do this yourself.
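For readers who want a starting point anyway, here is a minimal sketch of the "grab the fourth column" step. It uses awk instead of the sort | head | cut chain so that every oversized file is reported, not only the largest one. The helper name entries_over_limit is hypothetical, and the input is assumed to be in the git ls-tree -r -l layout (mode, type, object, size, then the path).

```shell
#!/bin/bash
# Sketch only: entries_over_limit is a hypothetical helper name.

# Read `git ls-tree -r -l` style lines on stdin and print "<size> <path>"
# for every blob whose size exceeds the limit given as $1 (in bytes).
entries_over_limit() {
    awk -v limit="$1" '$2 == "blob" && $4 + 0 > limit + 0 {
        size = $4
        # Strip the first four columns so that paths containing spaces survive.
        sub(/^[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")
        print size, $0
    }'
}

# Possible use inside an update hook, where $3 is the new revision:
#   git ls-tree --full-tree -r -l "$3" | entries_over_limit 100000000
```

A hook could reject the push whenever this prints anything. As noted in other answers, looking only at the tree of $3 misses large files that were added and removed in earlier commits of the same push.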