Sharing code between projects with git subtree
Thursday, 4 February 2010, 20:18
I came across a problem recently. I have a project called xBlip which I've described before – it's an iPhone client for a Polish Twitter-like service blip.pl. This project has a backend part which I keep in a subdirectory “ObjectiveBlip” and which I've tried to keep as separate from the rest as possible, with the intention that it might be one day extracted as a separate project.
Now I got the idea that I could write a desktop application for Mac that does the same – and of course I could reuse that backend part for that. I would also like to create a separate project on Github with just the backend, so that theoretically someone might use it in future for some purpose.
But this means that I would be maintaining three separate copies of the same code, which I'd have to keep in sync somehow. So the question is, how to do this best?
There are a few ways in Git to share code between projects (for example, git submodules) – but most of them are intended only for one-way communication, i.e. downloading updates to a library maintained by someone else into your project. Here, I want to have two-way communication: I could extend the backend code while working either on the iPhone or the Mac application (working directly on the backend-only project wouldn't usually make sense), and then broadcast the changes into the other two projects. I also don't want the solution to be inconvenient to people who download the project, as is the case with git submodules – you have to update them manually once you download the main code, because initially their directories are empty.
So I started looking for a way to pull this off – and I found a script called “git subtree” which seems to do exactly what I need (confusingly, there's another plugin also called git subtree which is completely unrelated to the first one…). It took me some time (and a few emails to the author, Avery Pennarun) to figure out how to use it, so I thought I'd post a tutorial here in case anyone has a similar situation.
So, here's what we need to do… (grab a coffee, it's going to be long):
Starting point
We have one project – xBlip – with the backend code in ObjectiveBlip/ and the UI code in other subdirectories. We want to make a second project with just the backend code, and a third one with a Mac application which reuses it, and set up a way to sync the changes between these three.
Extracting ObjectiveBlip
There are (at least) two ways to extract the backend project from xBlip: I can either use git subtree to extract ObjectiveBlip's whole history, or I can copy the files manually. If I chose the first option, I'd do:
git subtree split -P ObjectiveBlip -b export
This would create a new 'export' branch in my repo, containing only the commits and changes that had anything to do with the ObjectiveBlip directory, and ignoring anything that happened outside it. That way, the new project would have some kind of history from the beginning. Then, I would create a new repository out of that specific branch (I learned that trick from Avery):
cd ~/Projects
mkdir ObjectiveBlip
cd ObjectiveBlip
git init
git fetch ../xblip export
git checkout -b master FETCH_HEAD
This looks weird because normally when you create a new repo (git init), the first thing you do is make the initial commit. Here, we instead fetch existing commits from an existing repo, and only commits from a specific branch (“export”), and then we manually create a master branch out of the fetched commits.
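The fetch-into-an-empty-repo trick can be tried in isolation, without git subtree. Here is a minimal sketch, assuming a throwaway source repository whose 'export' branch stands in for the one created by the split; the scratch directory, the file name Blip.m and the committer identity are made up for the demo:

```shell
set -e
work=$(mktemp -d)          # scratch area, safe to delete afterwards

# Stand-in for xblip: one commit, with an 'export' branch pointing at it.
mkdir "$work/xblip"
cd "$work/xblip"
git init -q
echo "backend code" > Blip.m
git add Blip.m
git -c user.name=demo -c user.email=demo@example.com commit -q -m "backend commit"
git branch export

# The new project: no initial commit, just fetch the branch and name it master.
mkdir "$work/ObjectiveBlip"
cd "$work/ObjectiveBlip"
git init -q
git fetch -q ../xblip export
git checkout -q -b master FETCH_HEAD

# List the commit subjects that arrived with the fetch.
git log --format=%s
```

The new repository never gets an "initial commit" of its own; its master branch simply starts at whatever the fetched branch pointed to.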
I've decided not to do that; the reason is that the commits that would form ObjectiveBlip's history weren't created with this separate project in mind – they were done as a part of coding on xBlip. And while it's possible to extract only the relevant information with git subtree, the commits just wouldn't always make sense. It would all be a bit artificial.
So instead I extracted the files manually and created a fresh project with no history:
cd ~/Projects
cp -R xblip/ObjectiveBlip .
cd ObjectiveBlip
git init
git add .
git commit -m "extracted ObjectiveBlip from iPhone xBlip"
git remote add origin git@github.com:psionides/ObjectiveBlip.git
git push origin master
Adding ObjectiveBlip back to xBlip as a subproject
In order to move commits around between projects, I need to have ObjectiveBlip repo added as a remote in both application projects. I will then see all ObjectiveBlip commits in a separate branch (objblip/master), and I will decide how to copy commits between that branch and the master branch.
cd ~/Projects/xblip
git remote add objblip git@github.com:psionides/ObjectiveBlip.git
git fetch objblip
The graph in GitX looks like this at this point:

(I cheated a bit and did some tricks involving commit --amend in order to force GitX to draw the graph this way – the author didn't really foresee such a configuration and GitX kind of freaks out sometimes when you work with git subtree, and shows long and messy lines, or even lines that break and continue somewhere else…)
To add ObjectiveBlip into the master branch as a subproject, I need to delete the existing files first:
git rm -r ObjectiveBlip
git commit -m "removed ObjectiveBlip files"
Now, to join the subproject I need to use 'git subtree add' with the option '--prefix ObjectiveBlip' (or -P ObjectiveBlip). There are actually two ways to do that; I can do it either with an additional option --squash, or without it. Squash means that the subproject commits that you add into your main project are merged into one.
Let's try the version without squash first:
git subtree add -P ObjectiveBlip -m "readded ObjectiveBlip as a subproject" objblip/master
If you don't use squash, the commits will be kept intact, so both application projects will contain a complete history, commit by commit, of the changes in ObjectiveBlip code. They will form a separate timeline parallel to your main one, but it will be connected to your main timeline at the points of merges, so if you look at a one-dimensional commit list (e.g. Github “commits” page), it will show the backend commits mixed with frontend commits. What's worse, any commit you make to the subproject while working on the application's master and backport to the other timeline (and I certainly will be making commits this way, because it's easier to develop the backend if I can constantly test it in the actual app), will appear two times on the “commits” page – once in the main timeline, and once in the backend timeline.
I know, you probably didn't understand any of this. Maybe this graph will clear things up:

This is the state of the xBlip repository after a few commits made in the xBlip repo and in the ObjectiveBlip repo.
The left vertical line is the main (master) timeline, which contains normal code of my project, with ObjectiveBlip in a subdirectory. The right vertical line is the ObjectiveBlip's timeline which contains its files at the root of the project, and none of the UI code. Note that this isn't really a direct git merge, and you can't use plain git merge command to make the joins, or bad things will happen. You have to use git subtree to “translate” the commits for you.
Note also that the commit named “added foo to ObjectiveBlip” appears twice, once in the original version, and once in the “translated” version, and both versions will be visible on the “commits” page on Github. I could prevent that if I used 'git rebase' to delete the commit in the left timeline after I copied it to the right timeline, but that's one extra thing I'd have to remember…
Merging and splitting
After the initial merge with 'git subtree add', for subsequent merges you use 'git subtree merge' (for any command, you need to remember to use the prefix option to tell it the location of the subdirectory). If you make any commits to the subproject inside master, you can use 'git subtree split' to backport the commits to the right timeline; pass it a --branch option with the name of a branch to be created or updated to point to the newest commit, and then push it to the external repository. Note that it's usually better to keep the changes to files inside the subproject's directory and to files outside it in separate commits, e.g. make a commit “added foo to ObjectiveBlip” and then separately “added FooController in the UI”, even if you worked on both parts simultaneously.
Here's a list of commands that were used to create the graph above:
# add the subproject - creates the first merge point,
# adds an ObjectiveBlip/ subdirectory
git subtree add -P ObjectiveBlip -m "readded ObjectiveBlip as a subproject" objblip/master
# after we create the commit "added readme"
# in external ObjectiveBlip repo:
git fetch objblip
git subtree merge -P ObjectiveBlip -m "merged changes in ObjectiveBlip" objblip/master
# after we make the changes to both UI and the backend while working
# in xblip master, we backport the relevant commit to the timeline on
# the right, and push it to objblip repo; note that the second commit
# is ignored, as it contains only changes unrelated to ObjectiveBlip
git subtree split -P ObjectiveBlip -b backport
git push objblip backport:master
# after we update readme in ObjectiveBlip repo:
git fetch objblip
git subtree merge -P ObjectiveBlip -m "merged changes in ObjectiveBlip" objblip/master
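Because every git subtree call needs the same prefix and remote, the commands above lend themselves to two small shell functions. This is only a sketch: the function names subtree_pull and subtree_push, the run helper and the DRY_RUN switch are made up for illustration and are not part of git subtree; the prefix, remote and branch names are the ones used in this post.

```shell
PREFIX=ObjectiveBlip      # subdirectory holding the subproject
REMOTE=objblip            # remote pointing at the backend-only repo

# Hypothetical helper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring new commits from the external repo into the subdirectory.
subtree_pull() {
    run git fetch "$REMOTE" &&
    run git subtree merge -P "$PREFIX" -m "merged changes in $PREFIX" "$REMOTE/master"
}

# Backport subdirectory commits made in master to the external repo.
subtree_push() {
    run git subtree split -P "$PREFIX" -b backport &&
    run git push "$REMOTE" backport:master
}

# Preview what a push would do without touching the repository.
DRY_RUN=1 subtree_push
```

The dry-run output drops the quoting around the commit message, so treat it as a preview rather than something to paste back into the terminal.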
Using squash
I've decided to use the version with --squash instead. If you use squash, you will actually have 3 timelines (!) in your application repo… First will be the master, second – the subproject one, and the third one will be the squashed one. What's important is that the squashed timeline will be merged with the master timeline, but the original subproject timeline will be kept completely separate, and you don't even have to push it to Github with your application project.
Again, a graph will (hopefully) clear this up a bit:

The left vertical line is the master, the right one contains the squashed commits. The subproject timeline – the one that is used to make pushes and fetches from the external repository – will appear separately from the rest, either on top, or at the bottom. You only need it locally and you don't need to push it to the 'origin' repo.
Here are the commands used this time:
git subtree add -P ObjectiveBlip --squash -m "readded ObjectiveBlip as a subproject" objblip/master
...
git fetch objblip
git subtree merge -P ObjectiveBlip --squash -m "merged changes from ObjectiveBlip" objblip/master
...
# note: for split, you don't pass --squash (there's currently no way to squash the backported commits)
git subtree split -P ObjectiveBlip -b backport
git push objblip backport:master
...
git fetch objblip
git subtree merge -P ObjectiveBlip --squash -m "merged changes from ObjectiveBlip" objblip/master
There are two practical differences in how your commit timelines will look between the two strategies:
- with squash, there will always be only one commit per merge in the right (squashed) timeline; this may be good or bad, depending on what you expect, but I think most of the time you probably won't need every single commit from the subproject to appear in your timeline
- with squash, commits backported from master to subproject will not appear a second time in the right timeline, because they will be a part of one of the squashed commits
I believe you can use either of the approaches depending on how you want your commit graph to look. But please pick one at the beginning and stick to it, or Bad Things (tm) will happen…
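If you come back to a repository after a while and can't remember which strategy it uses, the history can probably tell you. A rough sketch, assuming that git subtree gives its squashed commits a subject starting with "Squashed '<prefix>/'" (this seems to be the case, but check it against your own history before relying on it); the function name uses_squash is made up:

```shell
# Hypothetical check: succeed if the current branch's history contains a
# commit whose subject looks like a squashed subtree commit for the prefix.
# Assumes the "Squashed '<prefix>/' ..." subject format described above.
uses_squash() {
    prefix=$1
    git log --format=%s | grep -q "^Squashed '$prefix/'"
}

# Usage, from inside the application repository:
if uses_squash ObjectiveBlip 2>/dev/null; then
    echo "squashed commits found, keep passing --squash"
else
    echo "no squashed commits found for this prefix"
fi
```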
What about git submodule?
Friday, 5 February 2010, 13:51
The question before should be: "What's wrong with git submodule?"
Anyway, found it.
I still prefer to use git submodule :)
a/ It's better known.
b/ It's built into the git suite.
Friday, 5 February 2010, 14:07
Inconvenience for end users is one thing, but the bigger problem is that - as the git submodule manual says:
And that's the whole point, I wanted to have the backend extracted as a separate project, but still be able to make updates to it while working on one of the frontend apps (because it's much easier to test the changes in the backend if I can use the frontend to check if the new feature works).
Friday, 5 February 2010, 17:47
Great writeup! I actually started writing my git subtree before I found out about apenwarr's. It uses the same technique (subtree merge) to solve the same problem, it just goes about it differently.
You can see that I hit a wall -- I ended up spending all my time working around Bash limitations rather than adding features, and I wasn't willing to take the time to rewrite it in Ruby or Perl, so... I guess my git-subtree is dead. Wish github had a place to put abandoned projects without deleting them.
Why not git submodules? Because once a submodule is in your project, you can't merge, bisect, etc, without a lot of hassle. They basically prevent you from using the coolest features of git. They do work great if you have one developer who's moving forward on one branch. Once you add another programmer and start merging, it doesn't take long to discover first-hand why submodules are so horrible. :)
Tuesday, 16 February 2010, 00:28
Hi, I am fairly new to version control in general, but git subtree seems to be what I need. I have asked a fairly basic usage question here though:
http://stackoverflow.com/questions/2503816
and would be grateful for any help.
Tuesday, 23 March 2010, 22:58
I was searching for this kind of feature, and while I was reading this post and comments, an idea came to my mind. What if I put the subtree as a normal git-managed directory and just ignore it from the main tree? For example:
/ mainProject folder
|-- .git
|-- .gitignore file (contents: libProject/)
|-- ...
|-- [a lot of files and folders specific to mainProject]
|-- ...
|-- libProject
|- .git
|- [a lot of files and folders common to other projects]
Would that work?
Wednesday, 5 May 2010, 21:13
Fixing the tree:
/ mainProject folder
|-- .git
|-- .gitignore file (contents: libProject/)
|-- ...
|-- [a lot of files and folders specific to mainProject]
|-- ...
|-- libProject
....|- .git
....|- [a lot of files and folders common to other projects]
Wednesday, 5 May 2010, 21:15
Luciano: Yes, it would work in that you would have a subdirectory within the main project directory, you could update it separately and even push updates back to the sub-repo; but there's one problem: the directory wouldn't be included in the main project's repository - if you or someone else clones your main project's repo on a new machine, they wouldn't have the libProject directory at all. So this would only work if you agreed that anytime someone clones your project, they have to manually clone the subproject too - otherwise they won't be able to compile or run your project because the subproject's files will be missing.
Wednesday, 5 May 2010, 22:22
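The limitation described in the reply above is easy to reproduce. A small sketch, using throwaway directories and the folder names from Luciano's example; the file names and committer identity are invented for the demo:

```shell
set -e
work=$(mktemp -d)

# Main project that ignores the nested libProject directory.
mkdir "$work/mainProject"
cd "$work/mainProject"
git init -q
echo "libProject/" > .gitignore
echo "app code" > main.txt

# libProject is its own repository inside the ignored directory.
mkdir libProject
git -C libProject init -q
echo "shared code" > libProject/lib.txt

git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "main project"

# Clone the main project and list what actually arrived.
git clone -q "$work/mainProject" "$work/clone"
ls -A "$work/clone"
```

The clone contains the main project's own files but no libProject directory, so anyone cloning the main project would have to fetch the subproject separately.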