Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/8001663/

Time: 2020-09-19 06:08:17  Source: igfitidea

Can git treat zip files as directories and files inside the zip as blobs?

Tags: git, zip, msysgit

Asked by Jonas Heidelberg

The scenario

Imagine I am forced to work with some of my files always stored inside .zip files. Some of the files inside the zip are small text files and change often, while others are larger but luckily rather static (e.g. images).

If I want to place these zip files inside a git repository, each zip is treated as a blob, so whenever I commit, the repository grows by the size of the zip file... even if only one small text file inside changed!

Why this is realistic

MS Word 2007/2010 .docx and Excel .xlsx files are ZIP files...

What I want

Is there, by any chance, a way to tell git to not treat zips as files, but rather as directories, and to treat their contents as files?

The advantages

But it couldn't work, you say?

I realize that without extra metadata this would lead to some amount of ambiguity: on a git checkout, git would have to decide whether to create foo.zip/bar.txt as a file in a regular directory or inside a zip file. However, this could be solved through config options, I would think.

Two ideas for how it could be done (if it doesn't exist yet)

  • using a library such as minizip or IO::Compress::Zip inside git
  • somehow adding a filesystem layer such that git actually sees zip files as directories to start with

Accepted answer by Jeff Ferland

This doesn't exist, but it could easily exist in the current framework. Just as git acts differently with displaying binary or ascii files when performing a diff, it could be told to offer special treatment to certain file types through the configuration interface.

If you don't want to change the code base (although this is kind of a cool idea you've got), you could also script it for yourself by using pre-commit and post-checkout hooks to unzip and store the files, then return them to their .zip state on checkout. You would have to restrict actions to only those file blobs / index entries that are specified by git add.
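
The two halves of that hook idea could look roughly like the following. This is only a minimal sketch in Python and not something taken from the answer: the helper names explode_zip and implode_dir are hypothetical, and the wiring into .git/hooks/pre-commit and .git/hooks/post-checkout (finding the staged zips, running git add on the unpacked directory) is deliberately left out. It assumes the zip is only a container, because a rebuilt archive will generally not be byte-identical to the original.

```python
import zipfile
from pathlib import Path


def explode_zip(zip_path, dest_dir):
    """Unpack zip_path into dest_dir and return the sorted member names.

    A pre-commit hook could call this for each staged zip (hypothetical usage).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return sorted(zf.namelist())


def implode_dir(src_dir, zip_path):
    """Rebuild zip_path from the files under src_dir.

    A post-checkout hook could call this to recreate the zip (hypothetical usage).
    """
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Sort the paths so the member order does not depend on the filesystem.
        for path in sorted(src.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(src).as_posix())
```

With helpers like these, git would only ever see the unpacked files, so a change to one small text file adds one small blob instead of a whole new zip.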

Either way is a bit of work -- it's just a question of whether the other git commands are aware of what's going on and play nicely.

Answer by Sippey

Not sure if anyone is still interested in this question. I am facing the same problem, and here is my solution, which uses a git file filter.

Edit: First, I may not have stated it clearly, but this IS an answer to the OP's question! Read the entire sentence before you comment. Moreover, thanks to @Toon Krijthe for the advice to clarify the solution in place.

My solution is to use a filter to "flatten" the zip file into a monolithic expanded (possibly huge) text file. During git add/commit, the zip file is automatically expanded to this text format for normal text diffing, and during checkout it is automatically zipped up again.

The text file is composed of records, each of which represents a file in the zip. So you can think of this text file as a text-based image of the original zip. If a file in the zip is indeed text, it is copied into the text file; otherwise, it is base64 encoded before being copied into the text file. This keeps the text file always a text file.

Although this filter does not make each file in the zip a blob, text files are mapped line to line, which is the unit of the diff, while changes to binary files are represented by updates of their corresponding base64 text. I think this is equivalent to what the OP imagines.
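
To make the record idea concrete, here is a minimal Python sketch of such a flattening step and its inverse. It is not Zippey's actual code or file format: the JSON header line, the field names and the function names flatten/unflatten are all assumptions chosen for illustration. A clean filter would run something like flatten, and a smudge filter something like unflatten.

```python
import base64
import io
import json
import zipfile


def flatten(zip_bytes):
    """Turn a zip archive into one text document, one record per member."""
    out = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            data = zf.read(info)
            try:
                payload, kind = data.decode("utf-8"), "text"
            except UnicodeDecodeError:
                # Not valid UTF-8: keep the record textual by base64 encoding it.
                payload, kind = base64.b64encode(data).decode("ascii"), "base64"
            header = json.dumps(
                {"name": info.filename, "kind": kind, "length": len(payload)}
            )
            out.append(header + "\n" + payload + "\n")
    return "".join(out)


def unflatten(text):
    """Rebuild a zip archive from the text produced by flatten()."""
    buf = io.BytesIO()
    pos = 0
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        while pos < len(text):
            eol = text.index("\n", pos)
            header = json.loads(text[pos:eol])
            start = eol + 1
            payload = text[start:start + header["length"]]
            pos = start + header["length"] + 1  # skip the newline after the payload
            if header["kind"] == "text":
                data = payload.encode("utf-8")
            else:
                data = base64.b64decode(payload)
            zf.writestr(header["name"], data)
    return buf.getvalue()
```

Because text members are copied line by line, an edit to one line of one member shows up as a one-line change in the flattened file, which is what lets git store and diff it efficiently.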

For details and prototype code, see the following link:

Zippey Git file filter

Also, credit to the place that inspired me about this solution: Description of how file filter works

Answer by VonC

Use bup (presented in detail in GitMinutes #24).

It is the only git-like system designed to deal with large (even very, very large) files, which means every version of a zip file will only increase the repo by its delta (instead of a full additional copy).

The result is an actual git repo, that a regular Git command can read.

I detail how bup differs from Git in "git with large files".



Any other workaround (like git-annex) isn't entirely satisfactory, as detailed in "git-annex with large files".

Answer by VonC

http://tante.cc/2010/06/23/managing-zip-based-file-formats-in-git/

(Note: per comment from Ruben, this is only about getting a proper diff though, not about committing unzipped files.)

Open your ~/.gitconfig file (create if not existing already) and add the following stanza:

[diff "zip"]
    textconv = unzip -c -a

What it does is use “unzip -c -a FILENAME” to convert your zipfile into ASCII text (unzip -c unzips to STDOUT). The next thing is to create/modify the file REPOSITORY/.gitattributes and add the following:

*.pptx diff=zip

which tells git to use the zip-diffing description from the config for files matching the given mask (in this case everything ending with .pptx). Now git diff automatically unzips the files and diffs the ASCII output, which is a little better than just “binary files differ”. On the other hand, due to the convoluted mess that the corresponding XML of pptx files is, it doesn't help a lot, but for ZIP files including text (like for example source code archives) this is actually quite handy.
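
Where unzip is not available, or where binary members should be kept out of the diff, a small script can play the same textconv role. The following is a minimal Python sketch under that assumption, not part of the quoted article: the function name zip_to_text and the "===" header format are made up for illustration. It relies only on the documented textconv contract, namely that git passes the file name as the single argument and reads the text from standard output.

```python
import sys
import zipfile


def zip_to_text(path):
    """Render a zip archive as text: a header line per member, then its content."""
    parts = []
    with zipfile.ZipFile(path) as zf:
        # Sort by name so that a changed member order does not show up as a diff.
        for info in sorted(zf.infolist(), key=lambda i: i.filename):
            parts.append(f"=== {info.filename} ({info.file_size} bytes) ===\n")
            try:
                parts.append(zf.read(info).decode("utf-8"))
            except UnicodeDecodeError:
                parts.append("(binary content omitted)")
            parts.append("\n")
    return "".join(parts)


if __name__ == "__main__" and len(sys.argv) == 2:
    # git invokes a textconv program with one argument: the file to convert.
    sys.stdout.write(zip_to_text(sys.argv[1]))
```

Saved under a path of your choosing, the script would replace unzip -c -a in the textconv setting, for example textconv = python3 /path/to/zip_to_text.py (the path is a placeholder).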

Answer by hoijui

Rezip, similar to Zippey by Sippey, lets git handle ZIP files in a nicer way.

How it works

When adding/committing a ZIP based file, Rezip unpacks it and repacks it without compression before adding it to the index/commit. In an uncompressed ZIP file, the archived files appear as-is in its content (together with some binary meta-info before each file). If those archived files are plain-text files, this method will play nicely with git.
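
The effect of repacking without compression can be illustrated with a few lines of Python. This is a minimal sketch of the idea only, not Rezip's actual (Java) implementation, and the function name repack_stored is hypothetical.

```python
import io
import zipfile


def repack_stored(zip_bytes):
    """Rewrite an archive so that every member is stored without compression."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            # Keep each member's name and timestamp; only the compression changes.
            member = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            member.compress_type = zipfile.ZIP_STORED
            dst.writestr(member, src.read(info))
    return out.getvalue()
```

After repacking, the bytes of a text member appear verbatim inside the archive, so git's own delta compression can match the unchanged parts between two versions of the file.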

Benefits

The main benefit of Rezip over Zippey is that the actual file stored in the repository is still a ZIP file. Thus, in many cases, it will still work as-is with the respective application (for example Open Office), even if it is obtained without going through a re-packing-with-compression filter.

How to use

Install the filter(s) on your system:

mkdir -p ~/bin
cd ~/bin

# Download the filter executable (raw URL; the /blob/ page is HTML, not the class file)
wget https://github.com/costerwi/rezip/raw/master/Rezip.class

# Install the add/commit filter
git config --global --replace-all filter.rezip.clean "java -cp ~/bin Rezip --store"

# (optionally) Install the checkout filter
git config --global --add filter.rezip.smudge "java -cp ~/bin Rezip"

Use the filter in your repository by adding lines like these to your <repo-root>/.gitattributes file:

[attr]textual     diff merge text
[attr]rezip       filter=rezip textual

# MS Office
*.docx  rezip
*.xlsx  rezip
*.pptx  rezip
# OpenOffice
*.odt   rezip
*.ods   rezip
*.odp   rezip
# Misc
*.mcdx  rezip
*.slx   rezip

The textual part is there so that these files are actually shown as text files in diffs.

Answer by Philip Oakley

Applications often have problems with pre-zipped files, because they expect the zip compression method and file order to be the ones they chose. I believe that OpenOffice .odf files have that problem.

That said, if you are simply using any-old-zip as a method for keeping stuff together, then you should be able to create a few simple aliases which will unzip and re-zip when required. The very latest Msysgit (aka Git for Windows) now has both zip and unzip on the shell code side, so you can use them in aliases.

The project I'm currently working on uses zips as the main local version control / archive, so I'm also trying to get a workable set of aliases for sucking these hundreds of zips into git (and getting them out again ;-) so that the co-workers are happy.

Answer by Brad

I think you're going to need to mount a zip file to the filesystem. I haven't used it, but consider FUSE:

http://code.google.com/p/fuse-zip/

There is also ZFS for Windows and Linux:

http://users.telenet.be/tfautre/softdev/zfs/
