Note: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/17083502/

Time: 2020-09-10 16:22:36  Source: igfitidea

How to perform better document version control on Excel files and SQL schema files

Tags: git, version-control, ms-office, versioning

Asked by Marcus Thornton

I am in charge of several Excel files and SQL schema files. How should I perform better document version control on these files?

I need to see which parts of these files changed between versions and keep every version for reference. Currently I append a timestamp to the file name, but that seems inefficient.

Is there a way or good practice to do better document version control?

By the way, editors send me the files via email.

Accepted answer by kirelagin

Since you've tagged your question with git, I assume you are asking about Git usage for this.

Well, SQL dumps are normal text files, so it makes perfect sense to track them with Git. Just create a repository and store them in it. When you get a new version of a file, simply overwrite the old one and commit. Git will figure out everything for you, and you'll be able to see modification dates, check out specific versions of the file and compare different versions.

The same is true for .xlsx files if you decompress them. .xlsx files are zipped-up directories of XML files (see "How to properly assemble a valid xlsx file from its internal sub-components?"). Git will view them as binary unless they are decompressed. It is possible to unzip the .xlsx file and track the changes to the individual XML files inside the archive.
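
As a rough illustration of the unzip step described above, here is a minimal Python sketch that uses only the standard library. The function name unpack_xlsx is an assumption made for this example; it is not part of Git or Excel.

```python
import zipfile
from pathlib import Path


def unpack_xlsx(xlsx_path, out_dir):
    """Extract every member of the .xlsx ZIP archive into out_dir.

    Returns the sorted list of member names, so the caller can see
    which XML parts are now available for tracking.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(xlsx_path) as archive:
        archive.extractall(out)
        return sorted(archive.namelist())
```

For example, unpack_xlsx("report.xlsx", "report.xlsx.extracted") (both names hypothetical) would leave the XML parts under report.xlsx.extracted/, where they can be added and committed like any other text files.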

You could also do this with .xls files, but the problem here is that the .xls format is binary, so you can't get meaningful diffs from it. You'll still be able to see the modification history and check out specific versions, though.

Answer by 1615903

The answer I have written here can be applied in this case. A tool called xls2txt can provide human-readable output from .xls files. So in short, you should put this in your .gitattributes file:

*.xls diff=xls

And in the .git/config:

[diff "xls"]
    binary = true
    textconv = /path/to/xls2txt

Of course, I'm sure you can find similar tools for other file types as well, making git diff a very useful tool for office documents. This is what I currently have in my global .gitconfig:

[diff "xls"]
    binary = true
    textconv = /usr/bin/py_xls2txt
[diff "pdf"]
    binary = true
    textconv = /usr/bin/pdf2txt
[diff "doc"]
    binary = true
    textconv = /usr/bin/catdoc
[diff "docx"]
    binary = true
    textconv = /usr/bin/docx2txt

The Pro Git book has a good chapter on the subject: 8.2 Customizing Git - Git Attributes
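
The textconv entries above all rely on external converters. To show what such a converter does, here is a hedged Python sketch for .xlsx workbooks that uses only the standard library. It is not xls2txt and does not read the binary .xls format. The name xlsx_to_text and the output layout are assumptions made for this example, and number formats, dates and styles are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the SpreadsheetML parts inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
T_TAG = "{%s}t" % NS["m"]


def _shared_strings(archive):
    """Return the workbook's shared-string table as a list (may be empty)."""
    if "xl/sharedStrings.xml" not in archive.namelist():
        return []
    root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # Each <si> holds one <t> or several rich-text runs; join all <t> texts.
    return ["".join(t.text or "" for t in si.iter(T_TAG))
            for si in root.findall("m:si", NS)]


def xlsx_to_text(path):
    """Render every worksheet part as 'part!cell<TAB>content' lines."""
    lines = []
    with zipfile.ZipFile(path) as archive:
        strings = _shared_strings(archive)
        sheets = sorted(n for n in archive.namelist()
                        if n.startswith("xl/worksheets/") and n.endswith(".xml"))
        for name in sheets:
            root = ET.fromstring(archive.read(name))
            for cell in root.iterfind(".//m:c", NS):
                formula = cell.find("m:f", NS)
                value = cell.find("m:v", NS)
                if formula is not None and formula.text:
                    content = "=" + formula.text      # show the formula, not its cached result
                elif cell.get("t") == "inlineStr":
                    content = "".join(t.text or "" for t in cell.iter(T_TAG))
                elif value is None:
                    continue                          # empty cell that only carries styling
                elif cell.get("t") == "s":
                    content = strings[int(value.text)]  # index into the shared-string table
                else:
                    content = value.text or ""
                lines.append("%s!%s\t%s" % (name, cell.get("r"), content))
    return "\n".join(lines) + "\n"
```

Git calls a textconv program with one argument, the path of the file to convert, and reads the text from its standard output. A small wrapper script that prints xlsx_to_text(sys.argv[1]) could therefore be wired in the same way as the converters above, with a *.xlsx diff=xlsx line in .gitattributes and a [diff "xlsx"] section whose textconv points at the script. Both of those names are illustrative.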

Answer by Jon G

I've been struggling with this exact problem for the last few days and have written a small .NET utility to extract and normalise Excel files in such a way that they're much easier to store in source control. I've published the executable here:

https://bitbucket.org/htilabs/ooxmlunpack/downloads/OoXmlUnpack.exe

..and the source here:

https://bitbucket.org/htilabs/ooxmlunpack

If there's any interest I'm happy to make this more configurable, but at the moment, you should put the executable in a folder (e.g. the root of your source repository) and when you run it, it will:

  • Scan the folder and its subfolders for any .xlsx and .xlsm files.
  • Take a copy of the file as *.orig.
  • Unzip each file and re-zip it with no compression.
  • Pretty-print any files in the archive which are valid XML.
  • Delete the calcchain.xml file from the archive (since it changes a lot and doesn't affect the content of the file).
  • Inline any unformatted text values (otherwise these are kept in a lookup table which causes big changes in the internal XML if even a single cell is modified).
  • Delete the values from any cells which contain formulas (since they can just be calculated when the sheet is next opened).
  • Create a subfolder *.extracted, containing the extracted zip archive contents.

Clearly not all of these things are necessary, but the end result is a spreadsheet file that will still open in Excel, but which is much more amenable to diffing and incremental compression. Also, storing the extracted files as well makes it much more obvious in the version history what changes have been applied in each version.
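
Two of the steps listed above, re-zipping without compression and writing out a pretty-printed extracted copy, can be sketched in Python with the standard library. This is only an illustration under stated assumptions and not the OoXmlUnpack implementation. The function names are hypothetical, and the calculation chain part (assumed here to live at xl/calcChain.xml) is skipped only in the extracted copy, not removed from the workbook itself.

```python
import zipfile
from pathlib import Path
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def rezip_stored(src_path, dst_path):
    """Copy every archive member into a new ZIP written without compression."""
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            # Member bytes are copied unchanged; only the storage method differs.
            dst.writestr(info.filename, src.read(info.filename))


def extract_pretty(src_path, out_dir, skip=("xl/calcChain.xml",)):
    """Write a diff-friendly copy of the archive contents into out_dir.

    Well-formed XML members are pretty-printed and everything else is
    written as-is. Assumes a trusted workbook: member names are used
    as paths without sanitising. Returns the names that were written.
    """
    out = Path(out_dir)
    written = []
    with zipfile.ZipFile(src_path) as src:
        for info in src.infolist():
            if info.is_dir() or info.filename in skip:
                continue
            data = src.read(info.filename)
            try:
                data = minidom.parseString(data).toprettyxml(indent="  ", encoding="utf-8")
            except ExpatError:
                pass  # not well-formed XML (e.g. images): keep the raw bytes
            target = out / info.filename
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            written.append(info.filename)
    return written
```

Storing the members uncompressed means a small edit changes only a small part of the file, which is what lets the version control system's own delta compression work well. The pretty-printed copy gives line-based diff tools something readable to compare.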

If there's any appetite out there, I'm happy to make the tool more configurable since I guess not everyone will want the contents extracted, or possibly the values removed from formula cells, but these are both very useful to me at the moment.

In tests, a 2 MB spreadsheet 'unpacks' to 21 MB, but I was then able to store five versions of it, with small changes between each, in a 1.9 MB Mercurial data file, and to visualise the differences between versions effectively using Beyond Compare in text mode.

NB: although I'm using Mercurial, I read this question while researching my solution. There's nothing Mercurial-specific about the solution, so it should work fine for Git or any other VCS.

Answer by Roberto Cabellon

Tante made it very simple in Managing ZIP-based file formats in Git:

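Open your ~/.gitconfig file (create it if it does not exist already) and add the following stanza:

[diff "zip"]
    textconv = unzip -c -a

For Git to use this driver, the ZIP-based file types also have to be mapped to it in a .gitattributes file, for example with a line such as *.xlsx diff=zip.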

Answer by thouliha

Use the open document extension .fods. It's a plain, uncompressed XML markup format that both Excel and LibreOffice can open, and the diffs will look good.

Answer by Bjoern Stiel

We've built an open-source Git command line extension for Excel workbooks: https://www.xltrail.com/git-xltrail.

In a nutshell, the main feature is that it makes git diff work on any workbook file format, so that it shows the diff on the workbook's VBA content (at some point, we'll make this work for the worksheets' content, too).

It's still early days but it might help.

Answer by eriklind

This Excel utility works very well for me:

Version Control for Excel

It is quite a straightforward versioning tool for workbooks and VBA macros. Once you commit a version, it is saved to a Git repository on your PC. I have never tried it with SQL schema files, but I'm sure there's a way to make that work.

Answer by g19fanatic

As mentioned in another answer's comment, .xlsx files are just XML.

To get to the XML directory (which is git-able), you have to "unzip" the .xlsx file to a directory. A quick way to see this on Windows is to rename the file <filename>.xlsx to <filename>.zip, and you'll see the inner contents. I'd store this along with the binary, so that when you check out, you do not have to take any other steps to open the document in Excel.

Answer by stenci

My approach with Excel files is similar to Jon's, but instead of working with the raw Excel text data, I export to more friendly formats.

Here is the tool that I use: https://github.com/stenci/ExcelToGit/tree/master

All you need to do is download the .xlsm file (click the View Raw link on this page). Don't forget to check the Excel setting as described in the readme. You can also add code to export SQL data to text files.

The workbook is both a converter from binary Excel to text files and a launcher for the Windows Git tools, and it can also be used with non-Excel projects.

My working version is configured with dozens of Excel workbooks. I also use the file to open Git-gui for non-Excel projects, just adding the git folder by hand.