About the author

Method 1

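Wang Zhenwei is a member of the CODING founding team with many years of experience in systems software development. He specializes in
Linux, Golang, Java, Ruby, Docker, and related technologies, and for the past two years has worked on system architecture and operations at CODING.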

http://stackoverflow.com/questions/25815202/git-fetch-a-single-commit

Preface

The git fetch command delivers references (names, not raw
commit-IDs) to the remote, more or less.

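Google recently published an article describing an update to one of Git's transfer protocols, which caused quite a stir in the Chinese tech community (for related articles, search Baidu for "Git v2 性能提升", that is, "Git v2 performance improvement").
Many people in tech circles have been reposting the news, but how large is the performance improvement, and what are the details? In fact, the change improves performance only in extreme cases; in the vast majority of cases users will not notice any difference. Much of the uncritical reposting is probably down to
Google's brand effect :)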

 

What is Git?

(More specifically, use git ls-remote remotename to see what the
remote is willing to give you in terms of names.

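To explain why, let's first briefly introduce the protocols Git uses. If you are not yet familiar with Git and want to learn more, see its official website: . You can also come
here to learn how to use a fast, high-quality
Git hosting service within China.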

 

Git transfer protocols

This produces a list of SHA-1s on the left with names on the right, and
the only thing your fetch can ask for is the names-on-the-right.

Git commonly uses three protocols: SSH, HTTP(S), and Git. The first two are the most widely used.

 

Let's look at usage examples of the HTTP(S) and SSH protocols.

At which point you’ll get the ID-on-the-left if the name on the remote
still points to that ID, so it depends on how actively that remote gets
updated.)

git clone

 

Cloning into 'coding-demo'...

 

remote: Counting objects: 3, done.

 

remote: Total 3 (delta 0), reused 0 (delta 0)

It is possible, in various ways, to deliver raw commit-IDs to a remote
and ask that remote what is visible starting from that point,

Unpacking objects: 100% (3/3), done.

 

git clone git@git.coding.net:wzw/coding-demo.git

and sometimes working backwards through history as well, but not
via git fetch.

Cloning into 'coding-demo'...

 

remote: Counting objects: 3, done.

(You can use git archive but the remote can decide whether or not to
allow you to access via raw commit-IDs;

remote: Total 3 (delta 0), reused 0 (delta 0)

 

Receiving objects: 100% (3/3), done.

or with remotes that have web server access, including to specific
commits, you can often just view the top-level contents of a commit,

As you can see, for a fresh clone the two go through essentially the same process.

 

In fact, Git's underlying handling is the same across the application-layer protocols, whether
HTTP(S), SSH, or the Git protocol.

and use that to “drill down”, as they say, to the various pieces. But
that is a very slow way to do it.)

Let's take a closer look at what Git does during the transfer.

 

GIT_TRACE=1 GIT_TRACE_PACKET=1 git clone

 

17:48:21.767799 git.c:344               trace: built-in: git 'clone'

 

Cloning into 'coding-demo'...

If you’d like to use git fetch to get some particular commit, probably
the easiest way to do that is to have someone with access to the remote
attach a name—most likely a tag—to that commit ID.

17:48:21.797959 run-command.c:626       trace: run_command:
'git-remote-https' 'origin' ''

 

17:48:22.278880 pkt-line.c:80           packet:          git< #
service=git-upload-pack

Then you can have your git fetch bring over that refspec, and put it
under any other refspec you like.

17:48:22.279390 pkt-line.c:80           packet:          git< 0000

 

17:48:22.279405 pkt-line.c:80           packet:          git<
fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\0multi_ack thin-pack
side-band side-band-64k ofs-delta shallow deepen-since deepen-not
deepen-relative no-progress include-tag multi_ack_detailed no-done
symref=HEAD:refs/heads/master agent=git/2.15.0

For instance, suppose you can ssh directly to whatever hosts origin:

17:48:22.279419 pkt-line.c:80           packet:          git<
fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master

 

17:48:22.279431 pkt-line.c:80           packet:  

$ ssh our.origin.host 'cd /repos/repo.git; git tag temporary f1e32e1'
[enter password, etc; observe tag created]
$ git fetch origin refs/tags/temporary:refs/heads/newbranch
[observe fetch happen; now you have local branch 'newbranch']
$ ssh our.origin.host 'cd /repos/repo.git; git tag -d temporary'


 

Good, that covers the background. Have you noticed that the much-hyped blockchain resembles
Git's storage at the technical level? :)

Note that the name need not be a branch, it need only be a reference you
can pull over with git fetch and see with git ls-remote.

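During a clone, the server first advertises a list of
refs to the client. This is also where the Git v2 protocol claims its performance improvement, as explained later.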

 

Like this:

You then use a name that will match that on the left-hand-side of your
refspec when fetching.

17:49:19.772436 pkt-line.c:80           packet:        clone<
fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master

 

17:49:19.772527 pkt-line.c:80           packet:        clone<
1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc

The name created in your repo is controlled by the right-hand-side of
the refspec (refs/heads/newbranch in the example above).

17:49:19.772549 pkt-line.c:80           packet:        clone<
1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd

 

17:49:19.772566 pkt-line.c:80           packet:        clone<
30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0

 

17:49:19.772863 pkt-line.c:80           packet:        clone<
1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}

 

Clearly, each 40-digit hexadecimal number above is the ID of the object that the ref after it points to.
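To make the mapping concrete, here is a minimal Python sketch that reads advertisement lines like the ones in the trace and splits them into refs and "peeled" entries. The function name `parse_advertisement` and the regular expression are illustrative assumptions, not part of Git. An entry ending in `^{}` is the object that an annotated tag ultimately points to, which is why `refs/tags/v1.0` and `refs/tags/v1.0^{}` carry different IDs.

```python
import re

# Sample lines copied from the trace above (timestamps and file names trimmed).
TRACE = """\
packet:        clone< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
packet:        clone< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
"""

# Hypothetical pattern: a "<" marker, a 40-hex object ID, then the ref name.
REF_LINE = re.compile(r"<\s+([0-9a-f]{40})\s+(\S+)")


def parse_advertisement(trace_text):
    """Return (refs, peeled): ref name -> object ID, and tag name -> peeled ID."""
    refs = {}
    peeled = {}
    for line in trace_text.splitlines():
        match = REF_LINE.search(line)
        if match is None:
            continue  # not a ref advertisement line
        object_id, name = match.groups()
        if name.endswith("^{}"):
            # Peeled entry: the object the annotated tag points to.
            peeled[name[:-3]] = object_id
        else:
            refs[name] = object_id
    return refs, peeled


refs, peeled = parse_advertisement(TRACE)
for name, object_id in refs.items():
    print(name, "->", object_id)
```

In this sample, the tag object and the commit it peels to are different objects, and the peeled commit is the same one that `refs/heads/test-abc` points to.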

This is also the answer to your last paragraph question: you can only
name things that have names on the remote (this is partly intended to
avoid “leaking” unnamed commits that remain in a repository before
garbage-collection, so it’s considered a feature rather than a bug).
These names go on the LHS of the refspec. Your own names go on the
right.
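The left-hand/right-hand split described above can be sketched in a few lines of Python. This is a simplified illustration only: `Refspec` and `parse_refspec` are hypothetical names, and the sketch ignores wildcards and other refspec features. The optional leading `+` is Git's marker for a forced update.

```python
from typing import NamedTuple


class Refspec(NamedTuple):
    force: bool  # True when the refspec starts with "+"
    src: str     # name on the remote (left-hand side)
    dst: str     # name created locally (right-hand side)


def parse_refspec(spec: str) -> Refspec:
    """Split '<src>:<dst>' into its two sides; a missing side becomes ''."""
    force = spec.startswith("+")
    if force:
        spec = spec[1:]
    src, _, dst = spec.partition(":")
    return Refspec(force, src, dst)


# The refspec used in the ssh example above.
example = parse_refspec("refs/tags/temporary:refs/heads/newbranch")
print(example.src, "->", example.dst)
```

With this split, `refs/tags/temporary` must be a name the remote advertises, while `refs/heads/newbranch` is entirely your own choice.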

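The client, in turn, only needs to work out which objects it is missing, based on the
refs it is interested in and the objects already in its local object store. (For
pull and fetch, a local object store exists; for clone there is none yet, so the client needs every object reachable from the refs it is interested in.)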

 

Once the client has computed the list of objects it is interested in, it tells the remote server with
want lines.
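As a rough illustration of this step, the Python sketch below builds want lines for a fresh clone from an advertised ref list. It is a simplification under stated assumptions: `build_want_lines` is a hypothetical helper, the client is assumed to want every advertised ref, peeled `^{}` entries are skipped, and the capability list is attached to the first want line only. A real client performing fetch or pull would also skip objects it already has.

```python
def build_want_lines(advertised, capabilities):
    """Build 'want' lines for a fresh clone.

    advertised   -- list of (object_id, ref_name) in the order the server sent them
    capabilities -- capability strings to attach to the first want line
    """
    lines = []
    for object_id, ref_name in advertised:
        if ref_name.endswith("^{}"):
            continue  # peeled tag entries are informational, not requested
        line = "want " + object_id
        if not lines and capabilities:
            # Only the first want line carries the capability list.
            line += " " + " ".join(capabilities)
        lines.append(line)
    return lines


# Assumed input: HEAD plus the refs shown in the advertisement trace.
ADVERTISED = [
    ("fdacba1d541c75bd48f2cd742ee18f77ea3517a1", "HEAD"),
    ("fdacba1d541c75bd48f2cd742ee18f77ea3517a1", "refs/heads/master"),
    ("1536ad10fc0a188c50680932ca191c8da46938c4", "refs/heads/test-abc"),
    ("1536ad10fc0a188c50680932ca191c8da46938c4", "refs/heads/test-bcd"),
    ("30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e", "refs/tags/v1.0"),
    ("1536ad10fc0a188c50680932ca191c8da46938c4", "refs/tags/v1.0^{}"),
]

for want in build_want_lines(ADVERTISED, ["multi_ack_detailed", "side-band-64k"]):
    print(want)
```

The sketch deliberately does not deduplicate object IDs, so an ID shared by several refs appears once per ref.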

 

17:49:19.776185 pkt-line.c:80           packet:        clone> want
fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed
side-band-64k thin-pack ofs-delta deepen-since deepen-not
agent=git/2.15.1.(Apple.Git-101)

 

17:49:19.776215 pkt-line.c:80           packet:        clone> want
fdacba1d541c75bd48f2cd742ee18f77ea3517a1

Your name on the right is assumed to be a branch or tag name (based on
what the name on the left matches, though you can explicitly spell
out refs/heads/ or refs/tags/ to override it), so even
though f1e32e1... is a valid SHA-1, it's treated as a branch name
here (the missing name on the left translates to HEAD, as missing names
almost always do), and git fetch creates a branch whose name is
disturbingly SHA-1-ish. (Incidentally I once created a branch name that
looked like an SHA-1, and later confused myself. I forget exactly what
the name was, something like de-bead without the hyphen. I renamed it
to the hyphenated version just to make it clear I didn't mean a raw
commit ID! 🙂 )

17:49:19.776224 pkt-line.c:80           packet:        clone> want
1536ad10fc0a188c50680932ca191c8da46938c4

 

17:49:19.776232 pkt-line.c:80           packet:        clone> want
1536ad10fc0a188c50680932ca191c8da46938c4

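17:49:19.776239 pkt-line.c:80           packet:        clone> want
30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e

There seems to be no way around this: the remote must have given the first commit a name, by creating either a branch or a tag, before it can be retrieved with fetch.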

If the client is running pull or fetch, it also tells the remote which objects it already has (a later part of this article covers this point specifically).

 

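Based on the objects the client wants and the objects it already has, the remote server consults its own object store and object dependency graph, gathers the objects the client needs, and sends them to the client as a compressed pack.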
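The server-side selection can be pictured as a set difference over the object graph: everything reachable from the wanted objects, minus everything reachable from what the client already has. The Python sketch below shows that idea on a toy graph. The names `reachable` and `objects_to_send` and the graph itself are made up for illustration, and real Git adds negotiation rounds, delta compression, and other optimizations that are not modeled here.

```python
def reachable(graph, starts):
    """Return every object ID reachable from the given starting IDs."""
    seen = set()
    stack = list(starts)
    while stack:
        object_id = stack.pop()
        if object_id in seen:
            continue
        seen.add(object_id)
        # Follow dependencies: parent commits, trees, blobs.
        stack.extend(graph.get(object_id, ()))
    return seen


def objects_to_send(graph, wants, haves):
    """Objects needed for the wants that the client does not already have."""
    return reachable(graph, wants) - reachable(graph, haves)


# Toy history c1 <- c2 <- c3, each commit with its own tree object.
GRAPH = {
    "c3": ["c2", "t3"],
    "c2": ["c1", "t2"],
    "c1": ["t1"],
    "t1": [],
    "t2": [],
    "t3": [],
}

print(sorted(objects_to_send(GRAPH, wants=["c3"], haves=[])))      # fresh clone
print(sorted(objects_to_send(GRAPH, wants=["c3"], haves=["c2"])))  # fetch
```

With no haves, as in a clone, the whole history is selected; when the client already has `c2`, only the new commit and its tree remain.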

 1. Initialize a local repository

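After receiving the pack, the client unpacks and verifies the objects, then updates its refs to point at the corresponding objects.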

git init

What Google did in protocol version 2

2. Add the remote

The full specification of protocol version 2 is here:

git remote add origin

Here we describe its main changes, of which there are three:

3. Proceed as described in the link above

Server-side ref filtering

 git ls-remote origin >origin.txt

Easier extensibility for new features (for example, the client can declare which
refs it wants)

Write the output of git ls-remote origin to a file named origin.txt

Simplified client-side HTTP protocol handling

 

What many clickbait headlines exaggerated is mainly the first point: server-side ref filtering.
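To see why ref filtering only matters in extreme cases, compare how many refs the server sends with and without a filter. The Python sketch below models the version 2 `ls-refs` command with `ref-prefix` arguments, both of which are names from the protocol version 2 specification. Everything else is a hypothetical toy: the function names, the repository contents, and the 100,000 extra refs are invented for the comparison.

```python
def advertise_all(all_refs):
    """Original protocol: the server advertises every ref it has."""
    return dict(all_refs)


def ls_refs(all_refs, ref_prefixes):
    """Version 2 style: return only refs matching one of the given prefixes.

    An empty prefix list means no filtering, so all refs are returned.
    """
    if not ref_prefixes:
        return dict(all_refs)
    return {
        name: object_id
        for name, object_id in all_refs.items()
        if any(name.startswith(prefix) for prefix in ref_prefixes)
    }


# Toy repository: the refs from the trace plus many invented review refs.
REFS = {
    "refs/heads/master": "fdacba1d541c75bd48f2cd742ee18f77ea3517a1",
    "refs/heads/test-abc": "1536ad10fc0a188c50680932ca191c8da46938c4",
    "refs/tags/v1.0": "30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e",
}
for number in range(100000):
    REFS["refs/changes/%d" % number] = "1536ad10fc0a188c50680932ca191c8da46938c4"

print(len(advertise_all(REFS)))                 # every ref is sent
print(len(ls_refs(REFS, ["refs/heads/master"])))  # only the requested branch
```

For a small repository like the coding-demo example, both calls return a handful of refs and the difference is negligible; the saving only becomes visible when a repository holds a very large number of refs.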

d6602ec5194c87b0fc87103ca4d67251c76f233a   refs/tags/v0.99

Google's official blog describes this part as follows:

$ git pull origin v0.99
remote: Counting objects: 4508, done.
remote: Compressing objects: 100% (234/234), done.
remote: Total 4508 (delta 1498), reused 1409 (delta 1409), pack-reused 2865
Receiving objects: 100% (4508/4508), 1.08 MiB | 334.00 KiB/s, done.
Resolving deltas: 100% (3056/3056), done.
From
* tag v0.99 -> FETCH_HEAD
error: Trying to write non-commit object d6602ec5194c87b0fc87103ca4d67251c76f233a to branch refs/heads/master
fatal: Cannot update the ref 'HEAD'.

The main motivation for the new protocol was to enable server side filtering of references (branches and tags).

Or

git fetch origin v0.99

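This command does not fail, but there is no code in the working tree, and I could not find a way to get the code out. [Because there is not a single local commit, there is no HEAD.]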

 

$ git rev-list FETCH_HEAD --count
1076

// At version 0.99 there were 1076 commits in total

 

$ git reset --hard d6602ec5194c87b0fc87103ca4d67251c76f233a
[this does not help much either]
HEAD is now at a3eb250 [PATCH] alternate object store and fsck

 

// HEAD points to the same commit as master

8d530c4d64ffcc853889f7b385f554d53db375ed HEAD    

// 5 branches
ee6ad5f4d56e697c972af86cbefdf269b386e470 refs/heads/maint
8d530c4d64ffcc853889f7b385f554d53db375ed refs/heads/master
c07a1e8782dadcedeffd389aa9bce4fda5b0983c refs/heads/next
9b9e9adbccf975e4ffc7af213fbf55f187e752bf refs/heads/pu
49a1e5ee48904bfe562388041bdcdb3d8ad21d10 refs/heads/todo
