The One and the Many

Subversion to Git

I've recently started using Git professionally. At the company I worked at previously I used Subversion (SVN). Switching to Git has been a big change and a big improvement for me. I enjoy working with version control now whereas before I often felt frustrated. I thought it would be good to write about the contrast while the change is fresh in my mind.

I've worked with Git for years now (I've been on GitHub since 2010 and have many projects on there) but it's only recently that I've been working with it heavily day to day and in a workflow more complex than commit and push.

I'm also learning about Git concepts and best practices. I'm reading Pro Git and starting to feel like I know what is going on.

A note on workflows

There are many workflows possible with Git and SVN. I'm going to talk about Git and SVN from the perspective of workflows I've used. With both Git and SVN I've practiced a workflow involving topic branches that get merged to a central trunk/master.

Rewriting history

One thing I've found useful and enjoyable is the ability to edit commit history with Git. This is something you can't do with SVN.

I'll give you an example of where this is handy: I often make a series of commits and then realize that a subsequent change would fit in with an earlier commit. I could make another commit with this change, but then there are two commits where there could have been only one.

With Git I can make a new commit, start an interactive rebase (git rebase -i <parent of the earlier commit>), and squash the new commit into the earlier one. I don't even need a new branch. I do this frequently.
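Here is a minimal sketch of that squash in a throwaway repository. The file names, commit messages, and the dev@example.com identity are made up for illustration. It uses git commit --fixup together with git rebase -i --autosquash, which is one way to do it; GIT_SEQUENCE_EDITOR=true simply accepts the todo list Git proposes, so the script runs without opening an editor.

```shell
set -e
# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

echo "parser v1" > parser.txt
git add parser.txt
git commit -q -m "Add parser"
target=$(git rev-parse HEAD)   # the earlier commit we will later amend

echo "docs" > docs.txt
git add docs.txt
git commit -q -m "Add docs"

# A late change that really belongs in "Add parser".
echo "parser v2" > parser.txt
git commit -q -a --fixup "$target"

# Rebase from the parent of the target; autosquash moves the fixup
# commit next to its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target~1"

git log --format=%s
```

Afterwards the log shows three commits instead of four, and "Add parser" contains the later change.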

With SVN this is not possible. Approximating it is difficult enough that I wouldn't bother. It would be something like this:

  1. Create a new branch (off of wherever I branched for the current branch)
  2. Check out the new branch (requiring a new download of the repository)
  3. Figure out what commits I want to include first and get their revision numbers
  4. Merge commits up to that point (one by one if I wanted to retain separate commits)
  5. Merge the commit I want to include in an earlier commit
  6. Merge subsequent commits
  7. Delete the old branch

This is time-consuming and error-prone. In practice I never edited history like this when I used SVN. Editing history is not part of the usual SVN workflow, so comparing Git and SVN on this point is a little unfair.

A well crafted commit history is something that Git enables and that is not reasonably possible in SVN in my experience.

Preserving history

With SVN my workflow was to have a topic branch where I would have a series of commits addressing the issue/feature. When it came time to merge the topic branch into trunk they would all get squashed together into a single merge commit. Individual commits from the topic branches were lost. The only way to see commits from topic branches was to track down the name of the branch and look at the deleted branch's history.

Given that SVN's workflow is not focussed on creating a useful commit history, losing commits from topic branches is not entirely a negative. If you have a large number of low quality commits and no way to edit them to improve them, then there's little reason to keep them around. Squashing commits that have no value independently is a good thing.

There are times you want to retain individual commits from branches, however. For example, if you find and fix a bug while working on your topic branch and put the fix in a separate commit, that commit is no longer separate once you squash and merge the topic branch. It won't be clear from looking at trunk's history that there was a fix for a bug. With Git it is easy to retain the separate commit.
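As a sketch of what retaining the commit looks like, the script below merges a topic branch with git merge --no-ff in a throwaway repository. The branch name, file names, and commit messages are hypothetical. A rebase followed by a fast-forward merge would also keep the commits; --no-ff is just one option.

```shell
set -e
# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch -M master

# Topic branch with a feature commit and a separate bug fix.
git checkout -q -b topic
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
echo "fixed" > app.txt
git commit -q -a -m "Fix bug in app"

# Merge without squashing: both topic commits stay in master's history.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'topic'" topic

git log --format=%s
```

The bug fix shows up in master's log as its own commit, next to the merge commit and the feature commit.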

Noodling with history

The ability to edit history has its downsides though. I've found myself editing commits trying to get them just right when I could be doing something else.

Before pushing a branch out for review I usually split up, squash, or edit commits, for example to give them clearer commit messages. Being able to do this makes me want to have beautiful and focussed commits with well written messages. With SVN I rarely bothered doing this since it would get lost and would at best be useful during code review.
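The simplest case of this kind of polish is rewording the most recent commit. A small sketch, assuming a throwaway repository and made-up messages:

```shell
set -e
# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "retry logic" > client.txt
git add client.txt
git commit -q -m "wip"
before=$(git rev-parse HEAD)

# Replace the placeholder message with a descriptive one.
git commit -q --amend -m "Retry failed requests up to three times"
after=$(git rev-parse HEAD)

git log --format=%s
echo "commit ID changed from $before to $after"
```

Note that the amended commit gets a new ID. That is why this kind of edit is best done before the branch is shared.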

This means I end up spending more time crafting commits when I use Git. I'm not convinced this is a negative though. Of course there is a balance to strike in how much effort goes into crafting history, but I've found it pleasant and helpful. It is a tool for communicating with other developers, it makes me think about how to describe a change and where a change belongs, and it helps make the history in a repository useful.

If I need to look for where a bug was introduced, having to look through a large number of small but clear commits is easier than looking through a small number of large commits with lots of unrelated changes in them.

Immutability

While the ability to change history is great, the resulting lack of immutability is concerning.

With SVN once I make a commit it is there forever. There is no undo. I can only continue and make a new commit that corrects prior commits, or make a new branch and start over.

You get a certain confidence from this that you don't get from Git. Trunk shows you exactly what happened.

However, this immutability is less valuable when you squash topic branches when you merge them. You have nice immutable history in the branch but it does not get included in trunk's history.

It's also less of a point in SVN's favour when you consider that it is a best practice in Git not to alter the history of shared repositories, since doing so makes collaboration difficult. Policies and restrictions can't completely rule it out, though, so you're on less firm ground with Git.

Commit IDs

One aspect of SVN I prefer to Git is commit IDs.

SVN's commit IDs are monotonically increasing revision numbers that are easy to read and type. For example, my topic branch might have commits 23142 through 23145. With Git, commit IDs are SHA-1 hashes, though you can refer to them by a unique prefix.

If I want to refer to a specific commit I find it more natural to use numbers than hashes. Given two revision numbers, it is also obvious which commit happened first.

However, to make up for this Git has a natural syntax for referring to relative commits. If I want to refer to the parent of a commit, I can say <commit>~1. If I want to refer to its grandparent, <commit>~2, and so on.
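A quick sketch of the relative syntax, using HEAD as the starting commit in a throwaway repository (the file name and commit messages are made up):

```shell
set -e
# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

for n in 1 2 3; do
  echo "change $n" > file.txt
  git add file.txt
  git commit -q -m "Commit $n"
done

git log -1 --format=%s HEAD     # prints "Commit 3"
git log -1 --format=%s HEAD~1   # parent: prints "Commit 2"
git log -1 --format=%s HEAD~2   # grandparent: prints "Commit 1"

# An abbreviated hash works anywhere a full commit ID does.
short=$(git rev-parse --short HEAD)
git log -1 --format=%s "$short" # prints "Commit 3"
```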

Switching branches

Git makes switching branches easy and cheap. If I'm in the middle of working on something I can stash my changes or commit them to a branch (and later edit the commit to pick up where I left off), and switch to a new branch. Switching branches is nearly instant.
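A sketch of the stash variant, assuming a throwaway repository with hypothetical branch and file names:

```shell
set -e
# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
git branch -M master
git branch other-topic

# Uncommitted work in progress on master.
echo "half-finished idea" >> notes.txt

git stash -q                   # shelve the change; working tree is clean again
git checkout -q other-topic    # look at the other branch
git checkout -q master         # come back
git stash pop -q               # restore the shelved change

cat notes.txt
```

The unfinished line is back in notes.txt at the end, and the stash is empty again.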

In SVN I have to check out a branch from the repository in a different path. This is time-consuming since I have to navigate elsewhere and download the repository again. Looking at another branch isn't a no-brainer, since it can easily take minutes depending on the size of your repository.

Freshening up branches

A crucial part of workflows involving topic branches is keeping branches up to date as trunk/master changes.

After branching off trunk/master, there are usually new commits made to trunk/master before the branch is merged. Before merging a branch it is a good practice to include those new commits in the branch. This helps avoid/resolve conflicts and also lets you check that the branch continues to work when taking other changes into account.

With Git this is simple. I pull down master and rebase my branch on it, replaying my commits. As Git replays the commits, I get dropped out to resolve conflicts if they occur. It's a quick and painless process. If there are no conflicts it is instant.
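Here is a sketch of the rebase step in a throwaway repository. There is no remote in this example, so a local commit on master stands in for what a pull would normally bring down. Branch names, file names, and messages are hypothetical.

```shell
set -e
# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git branch -M master

# Work on a topic branch.
git checkout -q -b topic
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile master moves on (stand-in for pulling new commits).
git checkout -q master
echo "hotfix" > hotfix.txt
git add hotfix.txt
git commit -q -m "Apply hotfix"

# Replay the topic commits on top of the updated master.
git checkout -q topic
git rebase -q master

git log --format=%s
```

After the rebase, "Add feature" sits directly on top of "Apply hotfix". Had the two commits touched the same lines, Git would have stopped for conflict resolution at that commit.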

With SVN this is tedious. I make a new branch off of trunk to a branch named something like <topic>-rebranch, check out that branch (and wait for the repository to download), and then merge my original branch into the new branch.

I either have to merge each commit from the original branch one by one (which I never did), or merge the branch all at once and squash its commits down to a single commit, losing history. Doing the latter means I'm forced to deal with all conflicts at once from a potentially large commit.

Renamed files

Git makes comparing changes to renamed files easy. With SVN you have a delete and then an add and SVN is not able to show you the differences.

To see the differences during code review I manually retrieved copies of the file before and after and ran diff myself.

Git tracks content rather than filenames, and detects renames by comparing how similar the files are, so its comparisons are more intelligent. It's a little thing but a time saver.
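A small sketch of rename detection in a throwaway repository, with made-up file names. The -M flag asks git diff to detect renames explicitly; recent Git versions do this by default.

```shell
set -e
# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# A file with enough content for similarity detection to work on.
for n in 1 2 3 4 5 6 7 8 9 10; do echo "line $n"; done > old_name.txt
git add old_name.txt
git commit -q -m "Add file"

# Rename the file and change it slightly in the same commit.
git mv old_name.txt new_name.txt
echo "one extra line" >> new_name.txt
git commit -q -a -m "Rename and tweak file"

# Reported as a rename (status R plus a similarity score), not delete + add.
git diff -M --name-status HEAD~1 HEAD

# The diff itself shows only the added line.
git diff -M HEAD~1 HEAD
```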

Local branches

Creating a branch in Git doesn't require pushing it to a shared repository.

I often find myself creating little branches, sometimes just to save something that isn't quite finished or to have a backup before I edit history or perform a larger merge. This too is a lightweight operation with Git.
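A sketch of the backup-branch habit, assuming a throwaway repository. The branch name is hypothetical, and a hard reset stands in for a history edit that went wrong.

```shell
set -e
# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" > file.txt
git commit -q -a -m "Second"

# Cheap local safety net; nothing is pushed anywhere.
git branch backup-before-edit

# Simulate a history edit gone wrong: the "Second" commit is dropped.
git reset -q --hard HEAD~1

# Recover by pointing the current branch back at the backup.
git reset -q --hard backup-before-edit

# Clean up; the branch never left this machine.
git branch -q -d backup-before-edit

git log --format=%s
```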

With SVN this is impossible. Every branch made will be in the repository. If I want a temporary branch for some reason, I have to pollute this shared resource with a branch that is visible to everyone.

Speed

Most operations with Git are quick compared to SVN. Simply looking at the commit log can take time with SVN as it has to contact the server. Even if your server is on your LAN there will be a noticeable delay. With Git, since the entire repository is local, most operations are near instantaneous.

GitHub

One of the things I've enjoyed about using Git is GitHub.

With SVN I used Trac. I appreciate certain features of Trac, such as the ability to easily view a diff of several selected changes at once (which is strangely difficult with GitHub), but GitHub has many little things to improve productivity. Here are two examples:

Wrapping up

I am a fan of Git. I don't hate SVN, but I think Git is the better tool. There are a lot of aspects that help me as a developer. Beyond helping me be productive, it's fun to use.

Tags: programming, version control, svn, git, subversion, github, productivity, trac, workflow