Git sparse-checkout and partial clones for Mega-Repos
Recently I was planning to make a small contribution to the DefinitelyTyped/DefinitelyTyped project. It is a huge repository that hosts the type declaration files for thousands of npm packages, with a structure like the following:
types/
├── 11ty__eleventy-img
├── 1line-aa
├── 3box
├── ...and more of them...
Each sub-folder under the types/ directory corresponds to a specific npm package.
The package I am interested in is marked, whose type declaration files reside in the types/marked/ directory. For this purpose, cloning the whole repository and doing a full checkout would be wasteful, since disk space and network traffic would be eaten up by a bunch of irrelevant files.
Then I read the instructions in the README file, which suggest pairing partial clone with sparse-checkout for a more efficient workflow. Those instructions target git 2.27 or newer, so they don’t work in my environment with git 2.25. After some research I found steps that fit my case in "Bring your monorepo down to size with sparse-checkout" on the GitHub Blog. If you want to apply the steps in this post, make sure to check which version of git you have.
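Since the right set of commands depends on the installed git, a quick check along these lines may help. This is only a minimal sketch: the helper name version_ge is hypothetical, it compares just the first three numeric fields, and the messages simply restate the version boundaries mentioned above (git sparse-checkout itself first shipped in git 2.25).

```shell
# version_ge A B: succeed when dotted version A >= B.
# Hypothetical helper; only the first three numeric fields are compared,
# and missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# "git --version" prints something like "git version 2.25.1";
# the third word is the version number.
if command -v git >/dev/null 2>&1; then
    installed=$(git --version | awk '{ print $3 }')
    if version_ge "$installed" 2.27; then
        echo "git $installed: the README instructions should apply"
    elif version_ge "$installed" 2.25; then
        echo "git $installed: follow the steps in this post"
    else
        echo "git $installed: too old for git sparse-checkout"
    fi
fi
```

The comparison is delegated to awk so that the snippet runs in a plain POSIX shell as well as in bash.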
To start with, we clone the forked repository with some additional arguments:
$ git clone --filter=blob:none --depth=1 --no-checkout git@github.com:hsfzxjy/DefinitelyTyped
With the --filter=blob:none option, file contents (blobs) won’t be fetched until they are actually needed. This helps when the repo contains a large number of files but those of interest make up only a small proportion.
The --depth=1 option creates a shallow clone with the commit history truncated, which speeds up the cloning process. If you run git log afterwards, only a single commit is displayed.
And finally, the --no-checkout option tells git not to check out any files, leaving only a .git directory in the working area.
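To watch the three options at work without touching the network, one could try them against a throwaway local repository. The sketch below is an illustration under assumptions: the stand-in "upstream" repository, its file contents, and the variable names are all made up, and the two uploadpack settings are there only so that a local file:// clone honors --filter the way a hosting server would.

```shell
set -eu

work=$(mktemp -d)

# Build a tiny stand-in "upstream" with two commits (all names are hypothetical).
git init -q "$work/upstream"
cd "$work/upstream"
git config user.name "demo"
git config user.email "demo@example.invalid"
mkdir -p types/marked types/3box
echo "declare const marked: unknown;" > types/marked/index.d.ts
echo "declare const box: unknown;" > types/3box/index.d.ts
echo "first" > README.md
git add .
git commit -q -m "first commit"
echo "second" >> README.md
git commit -q -a -m "second commit"

# Let this local repository honor --filter requests, as a hosting server would.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Same three options as in the post, but against the local stand-in.
cd "$work"
git clone -q --filter=blob:none --depth=1 --no-checkout "file://$work/upstream" clone
cd clone

commits=$(git rev-list --count HEAD)              # history length after --depth=1
shallow=$(git rev-parse --is-shallow-repository)  # "true" for a shallow clone
entries=$(ls -A)                                  # what --no-checkout left behind
# Objects that are referenced but not downloaded are printed with a leading "?".
missing=$(git rev-list --objects --missing=print HEAD | grep -c '^?' || true)

echo "commits visible:   $commits"
echo "shallow clone:     $shallow"
echo "working area:      $entries"
echo "blobs not fetched: $missing"
```

If the filter is honored, the clone should report a single visible commit, a working area holding nothing but .git, and a non-zero number of blobs that have not been downloaded yet.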
Once the clone has finished, we run a special command to initialize the configuration for git sparse-checkout:
$ cd DefinitelyTyped/
$ git sparse-checkout init --cone
The --cone option enables “Cone Mode”, which brings improved performance during checkout. After that, we use git sparse-checkout set <pattern> to bring a specific part of the repository into the working area:
$ git sparse-checkout set types/marked
Now the needed files should be ready under the root directory and the types/marked/ directory. You can run the tree command to check:
$ tree
.
├── azure-pipelines.yml
├── dangerfile.ts
├── LICENSE
├── notNeededPackages.json
├── package.json
├── README.es.md
├── README.it.md
├── README.ja.md
├── README.ko.md
├── README.md
├── README.pt.md
├── README.ru.md
├── README.zh-Hans.md
└── types
    └── marked
        ├── index.d.mts
        ├── index.d.ts
        ├── marked-tests.ts
        ├── OTHER_FILES.txt
        ├── package.json
        ├── tsconfig.json
        ├── tslint.json
        └── v3
            ├── index.d.ts
            ├── marked-tests.ts
            ├── tsconfig.json
            └── tslint.json
3 directories, 24 files
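The effect of cone mode can also be reproduced on a small scale. The following is a sketch under assumptions, not the DefinitelyTyped workflow itself: it builds a made-up miniature repository locally (the layout, file contents, and variable names are hypothetical) and then runs the same init and set commands, followed by git sparse-checkout list and git sparse-checkout disable to inspect and undo the selection.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "demo"
git config user.email "demo@example.invalid"

# A miniature layout in the spirit of DefinitelyTyped (file contents are made up).
mkdir -p types/marked types/3box
echo "declare const marked: unknown;" > types/marked/index.d.ts
echo "declare const box: unknown;" > types/3box/index.d.ts
echo "readme" > README.md
git add .
git commit -q -m "initial commit"

# Same two commands as in the post.
git sparse-checkout init --cone
git sparse-checkout set types/marked

after_set=$(ls types)               # sub-folders of types/ still on disk
listed=$(git sparse-checkout list)  # directories currently selected
echo "types/ contains: $after_set"
echo "selected:        $listed"

# "set" replaces the whole selection, so list every directory you want to keep.
git sparse-checkout set types/marked types/3box
after_both=$(ls types | tr '\n' ' ')
echo "types/ contains: $after_both"

# Go back to a full working tree.
git sparse-checkout disable
```

Files at the repository root, such as README.md here, stay in place throughout, which matches the tree output above. On git 2.26 and later, git sparse-checkout add appends directories to the selection instead of replacing it.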
From here on, everyday git operations such as commits and pushes proceed as usual.
References
- https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/
- https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/
Author: hsfzxjy.
License: CC BY-NC-ND 4.0.
All rights reserved by the author.
Commercial use of this post in any form is NOT permitted.
Non-commercial use of this post should be attributed with this block of text.