diff --git a/Git_Commit_Modification_Study/Git_Commit_Modification_Study.md b/Git_Commit_Modification_Study/Git_Commit_Modification_Study.md
new file mode 100644
index 0000000000000000000000000000000000000000..2f3168379f94d13ddd0917e019b0c34dcf2558a3
--- /dev/null
+++ b/Git_Commit_Modification_Study/Git_Commit_Modification_Study.md
@@ -0,0 +1,93 @@
+# A Study of Modifying Git Commit Content
+
+Mentor: Ma Quanyi (马全一)
+
+Author: He Ruoqing (何若轻)
+
+Email: timeprinciple@gmail.com
+
+## Background
+
+To support the development of protocols such as `P2P`, we need to modify or encrypt the content of specific files on one end and detect or decrypt it on the other. This study therefore tries to generalize the mechanism by which Git LFS replaces an original file with a pointer file, so that the replacement can instead be, for example, an encrypted file. The Git LFS extension replaces the content of files tracked with `git lfs track` by a pointer file (a small file holding the `oid`, `size`, and similar information of the replaced file) in the corresponding `commit`. This separates large-file storage from the Git server while still guaranteeing file consistency (by requesting the file with the matching `oid` and `size` from the LFS server). Understanding Git hooks also makes it possible to run custom processing before or after specific Git commands, which opens Git up to extension in several directions.
+
+## Method
+
+Reading the Git source code shows that Git hooks are a set of independently runnable executables stored in the `.git/hooks` directory of each repository, as shown below:
+
+![](images/2023-05-28-17-28-23-image.png)
+
+Even when the LFS extension is not installed on the system, this directory already contains a set of hooks:
+
+![](images/2023-05-28-17-29-56-image.png)
+
+According to the official Git documentation, Git runs a hook with a given name only in the specific situation that name corresponds to.
+
+Before the LFS extension can be used, we need to run `git lfs install` in the repository, which prints the following message:
+
+![](images/2023-05-28-17-34-33-image.png)
+
+Checking the hook directory again shows that several new hooks have been added:
+
+![](images/2023-05-28-17-42-22-image.png)
+
+The new hooks have no `.sample` suffix, which means Git treats them as enabled. Take `post-commit` as an example:
+
+![](images/2023-05-28-17-47-06-image.png)
+
+Here, `"$@"` expands to all arguments passed to the script. This suggests that the hook script runs `git lfs post-commit`, which would seem to be where the large files managed by LFS are replaced with pointer files in the commit.
+
+Reading the source of `git lfs post-commit`, however, shows that the command only changes file read/write permissions and does not modify the content of any `blob` in the `commit`. The call stack of the whole process therefore has to be analyzed again:
+
+Set `GIT_TRACE=1` to observe the call stack:
+
+1. `git lfs install`: ![](images/2023-05-29-10-05-35-image.png)
+
+2. `git lfs track `:
+
+   ![](images/2023-05-29-10-08-52-image.png)
+
+3. `git add `:
+
+   ![](images/2023-05-29-10-11-19-image.png)
+
+   The call stack of this command shows that the `git lfs filter-process` subcommand is invoked. Further analysis shows that the behavior of `git add` is affected by `.gitattributes` (the file introduced by `git lfs track`). If `.gitattributes` is deleted before running `git add`, the resulting `commit` contains the original file rather than a pointer file:
+
+   ![](images/2023-05-29-11-07-11-image.png)
+
+   The number of insertions reported shows that the committed file is the original file, not a pointer file.
+
+This suggests that the replacement of the original file by the pointer file happens during `git add`. The `.gitattributes` documentation confirms that the `filter` attribute does affect Git's behavior:
+
+![](images//2023-05-29-11-34-49-image.png)
+
+Analyzing the source of `git lfs filter-process` shows that the implementation uses a Go library named `git-lfs/gitobj`, which enables it to operate on and modify Git objects:
+
+![](images/2023-05-29-13-01-05-image.png)
+
+## Results
+
+For a Git client, `hook`s are executable scripts with specific names, each name corresponding to a specific Git action (scripting languages other than shell, such as Python or Perl, can also be used). Each script is executed at the point its name describes; its core job is to specify the commands or files to run. Note that a `hook` whose name ends in `.sample` is not enabled; to enable it, remove that suffix.
+
+Taking `post-commit` as an example, having our own program invoked from that script file is enough to run custom logic after each commit, in the same way Git LFS wires in its own `git lfs post-commit` command:
+
+![](images/2023-05-28-18-21-13-image.png)
+
+> The hooks available for direct use, together with their parameters, purposes, and invocation timing, are listed at: https://manpages.ubuntu.com/manpages/jammy/man5/githooks.5.html
+
+The study yields the following workflow by which LFS replaces an original file with a pointer file:
+
+![](images/git-lfs-filter-workflow.png)
+
+After a file is tracked with `git lfs track`, it is marked in `.gitattributes` with the `lfs` filter, which is then applied by commands such as `git add` and `git checkout`. In the case shown above, `filter-process` invokes the `clean` filter, which replaces the original file content with a pointer file (`smudge` is the reverse process). From Git's point of view the file in the staging area is therefore just a pointer file, which reduces the storage required on the source repository server and achieves the content replacement on that server.
+
+To modify, or even encrypt, the content committed to a repo, we need to implement a corresponding `filter` (which modifies the Git objects) together with its `clean` and `smudge` commands, and then configure the implemented filter in `~/.gitconfig`:
+
+![](images/2023-05-29-13-23-16-image.png)
+
+Then, in the repo's `.gitattributes` file, assign that `filter` to the files that need to be modified or encrypted:
+
+![](images//2023-05-29-13-24-02-image.png)
+
+This produces the following result:
+
+![](images/2023-05-29-13-24-41-image.png)
diff --git a/Git_Commit_Modification_Study/images/2023-05-28-17-28-23-image.png b/Git_Commit_Modification_Study/images/2023-05-28-17-28-23-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..98fadebe56d98715941dde81061162d972ca73f5
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-28-17-28-23-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-28-17-29-56-image.png b/Git_Commit_Modification_Study/images/2023-05-28-17-29-56-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..1136b4c92a24950ec701053e90ff283d5dc5eceb
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-28-17-29-56-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-28-17-34-33-image.png b/Git_Commit_Modification_Study/images/2023-05-28-17-34-33-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..79e57ae625225841417a00cd15639e1e71e79007
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-28-17-34-33-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-28-17-42-22-image.png b/Git_Commit_Modification_Study/images/2023-05-28-17-42-22-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..7cdae8d76bfb1a0fce1036ae81534dde34865b64
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-28-17-42-22-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-28-17-46-13-image.png b/Git_Commit_Modification_Study/images/2023-05-28-17-46-13-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..89e3e057164f2e0c8ba9ad1e1d6690bb44530aa4
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-28-17-46-13-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-28-17-47-06-image.png b/Git_Commit_Modification_Study/images/2023-05-28-17-47-06-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..07e837c39744e87620b46a181e29932aa6c09c5b
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-28-17-47-06-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-28-18-21-13-image.png b/Git_Commit_Modification_Study/images/2023-05-28-18-21-13-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..b1f8a8d174c011868952a7c4c899dba23a14fec8
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-28-18-21-13-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-29-10-05-35-image.png b/Git_Commit_Modification_Study/images/2023-05-29-10-05-35-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..468a1f85ebe052f2aa200edcf38944a2a115a84c
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-29-10-05-35-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-29-10-08-52-image.png b/Git_Commit_Modification_Study/images/2023-05-29-10-08-52-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..493a3eeb0ff4549cb2d53ee799866fd855fc74c1
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-29-10-08-52-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-29-10-11-19-image.png b/Git_Commit_Modification_Study/images/2023-05-29-10-11-19-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..047e5d9674f2357a3a7959815c2811ffaf2fda27
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-29-10-11-19-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-29-11-07-11-image.png b/Git_Commit_Modification_Study/images/2023-05-29-11-07-11-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..2df4d65bb3f72ff5306a2ef0d34c954e7e019500
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-29-11-07-11-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-29-11-34-49-image.png b/Git_Commit_Modification_Study/images/2023-05-29-11-34-49-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..2aeeb74dbe9222b32f928b90ee8a80463cf2adf4
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-29-11-34-49-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-29-13-01-05-image.png b/Git_Commit_Modification_Study/images/2023-05-29-13-01-05-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..ba576ac9c6a8335b29e92ef91ad3d389ea50ec7e
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-29-13-01-05-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-29-13-23-11-image.png b/Git_Commit_Modification_Study/images/2023-05-29-13-23-11-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..49400675cced6130fe0804299e6285e155ac7935
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-29-13-23-11-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-29-13-23-16-image.png b/Git_Commit_Modification_Study/images/2023-05-29-13-23-16-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..49400675cced6130fe0804299e6285e155ac7935
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-29-13-23-16-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-29-13-24-02-image.png b/Git_Commit_Modification_Study/images/2023-05-29-13-24-02-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..0390bb82470a9ebcc7912ec61bde355dd5a6e105
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-29-13-24-02-image.png differ
diff --git a/Git_Commit_Modification_Study/images/2023-05-29-13-24-41-image.png b/Git_Commit_Modification_Study/images/2023-05-29-13-24-41-image.png
new file mode 100644
index 0000000000000000000000000000000000000000..56224bd57643ec232e3d56a32270a731e2620c78
Binary files /dev/null and b/Git_Commit_Modification_Study/images/2023-05-29-13-24-41-image.png differ
diff --git a/Git_Commit_Modification_Study/images/git-lfs-filter-workflow.png b/Git_Commit_Modification_Study/images/git-lfs-filter-workflow.png
new file mode 100644
index 0000000000000000000000000000000000000000..bc3ad11aa584db9525eed706d5aa780a330dd0d4
Binary files /dev/null and b/Git_Commit_Modification_Study/images/git-lfs-filter-workflow.png differ