Saturday, September 25, 2010

My Distributed Version Control System comparison

If you want to cut to the chase: I've been using Bazaar for 6 months now and I really like it. Read on for background and details on why I chose Bazaar over other VCSes.

A while back I decided to switch the version control system that we use for the main project I'm on at my job. The company has used a proprietary, licensed version control system called Accurev for years now. (As a side note, most of the work at the company is not Ruby on Rails, but C, C++, and C#.) Accurev is pretty easy to pick up and start using, but it has lots of shortcomings and can be pretty inflexible, not to mention the yearly per-developer license cost.

So 7-8 months ago I finally decided to evaluate other version control systems for our project, and possibly for the rest of the company. Here are my requirements for a new VCS:
  1. Free and preferably open source
  2. Easy branching so each task people are working on can be in its own branch
  3. Easy merging
  4. Can use a central server for the main repository with secure, encrypted transport to server
  5. Developers still need to be able to use VCS if server is down or if they have no Internet
  6. Command line support for all main operations
  7. GUI available for performing basic commands, viewing the commit log, doing diffs, and performing merges. Preferably the GUI is built in to the VCS (I don't want to have to do a whole other study on which 3rd party GUIs to use)
  8. Works the same in Windows, Mac, and Linux (I know what you're saying, why Windows - this isn't for our project but for other non Rails projects at the company, the majority of which are Windows)
  9. Has a large community of users (not some new VCS that could go away in a year)

Because of #5, traditional free VCSes like Subversion fall short. So, I evaluated the 3 main distributed version control systems. A problem with all 3 that will hopefully get better with time (although Git has had plenty of time, so I'm not sure if it will) is a lack of documentation. Most of the examples for all 3 seem to be geared either towards a single developer, or a large, widely distributed open source project. The work on my project is neither of those - it's a group of 3 people working on a company proprietary project.

Git is by far the most widely used distributed version control system. I started using git before this evaluation for personal projects. If you are working on an open source project, this is definitely the way to go: git gives you a ton of functionality, and using it is really easy. It's also extremely fast and efficient; it definitely has the others beat in that department. Setting up a central server repository is very easy: it can run over SSH, so you don't need a dedicated git server process, just git and an SSH login on the server. But I eliminated git pretty quickly. Here's why:
  1. Doesn't work too well in Windows. You can either go through the hassle of using it in Cygwin, which is not fun, or use the msysGit project (hosted on Google Code). During the Pragmatic Studios Advanced Ruby on Rails class that I took earlier in the year, I saw about half of the class, the ones with Windows laptops, struggling the entire class to get git to install and work correctly. Since then msysGit has come a long way, but I still believe that the other VCSes work better in Windows.
  2. Terrible built in GUI. The gitk program that comes with Git looks like a UNIX GUI from 1995. Even forgetting how bad it looks, it isn't very functional; it won't do much. I did a brief investigation to find a 3rd party GUI but couldn't find much. While I use the command line for most of my work, it's no substitute for a GUI when doing merges and diffs or viewing branches, forks, and merges in the log.
  3. Revisions are identified by a long hash. For convenience you can refer to a revision by just the first few characters of the hash (6 or 7 is usually enough to be unique). Not everyone writing source code is so hardcore that they remember all of their work by a 6 hex digit identifier. Which would you rather remember, revision 121.5, or revision 6f88ca?
  4. User unfriendliness/complication - Using git just seems to require having to learn too much. This is more of an intangible thing. While geeks who eat, breathe, and sleep coding (still not sure if I'd put myself in that category) can easily pick up git, seeing how a large group of regular developers struggled with Git at the Advanced Ruby on Rails class did not give me a good feeling about using Git with other developers.
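
To make the abbreviated-identifier idea from point 3 concrete, here is a minimal Python sketch. It is not how git is implemented; it only models the rule that a short ID is matched against the start of the full hash and is accepted when exactly one revision matches. The function name `resolve_prefix` and the revision IDs are made up for illustration.

```python
def resolve_prefix(prefix, full_ids):
    """Return the one full ID that starts with `prefix`, or raise ValueError."""
    matches = [rid for rid in full_ids if rid.startswith(prefix)]
    if not matches:
        raise ValueError(f"unknown revision: {prefix}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous prefix: {prefix}")
    return matches[0]


# Made-up 40-character IDs, for illustration only.
REVISIONS = [
    "6f88ca" + "0" * 34,
    "6f88cb" + "1" * 34,
    "a1b2c3" + "2" * 34,
]

# "6f88ca" is unique here, so it resolves to the first revision.
full_id = resolve_prefix("6f88ca", REVISIONS)
```

In this toy repository `6f88c` would be rejected as ambiguous because two IDs share it, which is why the number of characters you need grows with the size of the project.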

Next up is Mercurial. It's very similar to Git. A main difference is that it's built with Python and runs well on any OS that Python runs on. The user experience is pretty much the same on Windows, Mac, and Linux. But I decided not to use Mercurial for some of the same reasons as Git:
  1. Bad built in GUI. Not quite as ugly as the git GUI, but pretty much just as useless.
  2. User unfriendliness/complication. Same as git, it just seemed a bit too complicated for regular developers to be comfortable with. I'll admit I didn't look into Mercurial too extensively, but it just seemed too much like git for my project.
  3. Slow and inefficient - Mercurial definitely seemed to run a little slower than git, and according to the benchmarks on Bazaar's site, a Mercurial repository takes up much more space than a git or Bazaar repository.

Bazaar isn't quite as widely used as Mercurial, but it does have the support of the Ubuntu group and MySQL, and a code hosting site which is like github (but also has bug tracking, mailing lists, etc.). I decided to go with Bazaar. My team has been using it for about 5 months now, and things have gone pretty well. Everyone seems to like it (although it did take a little getting used to, coming from Accurev). For a good "why should I use Bazaar" page, look at [link missing]. Let me explain my reasons why we're using it, and why I think it's better than the other VCSes:
  1. User friendliness is a core goal and not an afterthought. Somewhere on their web page they have a quote that a main goal of Bazaar is "version control for human beings". I don't think you'd ever see this quote anywhere about git; the attitude there seems to be more "it works perfectly for the Linux kernel development team so of course it will work for everything else". Most commands are very easy to use, and the workflow seems to be a little easier. Example - you don't have to add modified files. A commit will automatically commit modified files. Having to "add" an already tracked file in the same way that you add an untracked file to the repository, like you have to do in git, doesn't seem intuitive.
  2. Cross platform support - like Mercurial, it's built on Python and runs great in Windows, Mac, and Linux.
  3. Nice built in GUI - The Bazaar Explorer GUI that comes with Bazaar is actually pretty nice. It looks good, it lets you do almost all operations from the GUI, and the log viewer is excellent. About the only downside is that there is no built in merge tool; you have to use a 3rd party tool (I've found diffmerge to work the best across all OSes). But the hooks are in Explorer to launch any 3rd party merge tool.
  4. Revision numbers are not hash tags - Each revision has a globally unique identifier that never changes (called the revision ID), but there is also a "revision number", which is sequential. It's much easier to refer to revisions by a single number than by a hash tag. A complication is that after a merge, revisions from the branch that you're merging in get renumbered. Say that revision 6 is the last common revision. New revisions 7 and 8 are created on branch A, and a new revision 7 is created on branch B. If B is merged into branch A, what was 7 on branch B becomes 6.1.1 as seen from branch A, and committing the merge creates a new revision 9 on branch A. At first this seems a little unsettling but you get used to it quickly. It also makes viewing the revision log after merges a little clearer: you immediately see the last common revision from the revision number instead of having to trace back. I believe Mercurial works this way too but I didn't look into it far enough.
  5. Both checkouts and branches. Bazaar has a checkout command which is the same as a Subversion checkout - a working directory is created from the server with all current files, and any operations performed (commit, log view, etc.) happen on the server; there is no local repository. Having the ability to do both checkouts and fully distributed branches gives you a ton of flexibility for your project management. Say that you want work done from your office desktops to always be preserved on the server for security in case of a disk failure, but you also want people to do distributed work - contractors at other facilities, developers on laptops, etc. Bazaar is the only VCS that I know of that can do both. This is also handy for test servers - simply do a checkout so that a full copy of the repository isn't on the test machine.
  6. Support for various transport mechanisms - the other two do this as well. You can do all operations over a variety of transport protocols - http, https, sftp (for servers where you don't have Bazaar installed), ssh (where Bazaar is installed on the server), plus I think a few others.
  7. Recovery from user error - we've made just about every mistake that there is to make (merging with the wrong branch, pushing code that isn't ready up to the main branch, specifying the wrong files to commit, reverting everything instead of just one change, etc.). While recovery from these problems hasn't always been straightforward, and it is sometimes hard to figure out how to do, in the end we've always been able to recover. I don't have much experience with the other two VCSes for this; I imagine they would work pretty much the same way.
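
To make the renumbering described in point 4 concrete, here is a minimal Python sketch of a simplified model. It is not Bazaar's actual algorithm. It assumes the merged branch is the first one merged from that common revision, so its revisions get the dotted form `<common>.1.<n>`, and that the merge commit takes the next number on our own branch. The function name `revnos_after_merge` and the revision names are made up for illustration.

```python
def revnos_after_merge(common_count, ours, theirs):
    """Number revisions as seen from our branch after merging theirs.

    common_count: how many revisions both branches share
    ours:   names of revisions made only on our branch, oldest first
    theirs: names of revisions made only on the other branch, oldest first
    Simplified model: assumes this is the first merge from that common revision.
    """
    numbers = {}
    # Our own revisions keep their plain sequential numbers.
    for i, name in enumerate(ours, start=1):
        numbers[name] = str(common_count + i)
    # Merged-in revisions get dotted numbers rooted at the last common revision.
    for i, name in enumerate(theirs, start=1):
        numbers[name] = f"{common_count}.1.{i}"
    # Committing the merge adds one more revision on our branch.
    numbers["merge commit"] = str(common_count + len(ours) + 1)
    return numbers


result = revnos_after_merge(6, ["A-first", "A-second"], ["B-first"])
for name, revno in result.items():
    print(f"{name}: {revno}")
# Prints:
# A-first: 7
# A-second: 8
# B-first: 6.1.1
# merge commit: 9
```

The dotted number tells you at a glance that the merged-in work started from revision 6, which is the point about the log being easier to read.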

Bazaar is certainly not perfect. I've written a subsequent blog post, "Bazaar problems and lessons learned", explaining some of the problems we've run into and how to fix them. But while git may work best for large open source projects, in my opinion Bazaar is by far the best VCS to use for the vast majority of software development work being done - businesses doing company proprietary development work with a team of developers of varying skill levels.


Jakub Narebski said...

Note: I am Git user and developer, and I know Mercurial and Bazaar only from hearsay.

A few comments

Git on Windows. The msysGit project (or rather its "Git for Windows" download) is a native Windows implementation of Git. It does not use or require Cygwin. But I agree that if you need a version control system that works well on Windows, Git might not be the best choice.

GUI. gitk and git gui use Tcl/Tk to be portable: you can use them on Linux, on MS Windows, and on MacOS X. The "Interfaces, frontends, and tools" page on the Git Wiki provides a list of various Git GUIs, including some cross-platform ones. I don't know if any of them is as good as you imply the Bazaar Explorer GUI is.

Revision identifiers. First, to have version numbers you need either a central authority / central repository (this is what Bazaar does, with its rewriting of version numbers by the central repository, and slightly different operations for merging / updating in the central repository and in leaf repositories), or version numbers that are local to a single repository (that is what Mercurial does, which means that your version 5643 is not my version 5643).

Second, you don't need to use version identifiers much. You usually use expressions that count version from tip or from some tag (e.g. v1.6.5~10, i.e. 10 commits before v1.6.5).
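
As a rough illustration of such relative expressions, here is a minimal Python sketch of how a `tag~N` expression can be resolved: start at the tagged commit and step back N times along first parents. This is a toy model, not git's implementation, and the history, tag name, and function name `resolve` are made up for illustration.

```python
def resolve(expr, tags, first_parent):
    """Resolve 'name' or 'name~N' by walking N first-parent links back."""
    name, sep, count = expr.partition("~")
    # A bare '~' with no number means one step back; no '~' means zero steps.
    steps = int(count) if count else (1 if sep else 0)
    commit = tags[name]
    for _ in range(steps):
        commit = first_parent[commit]
    return commit


# Made-up linear history: c1 <- c2 <- c3 <- c4, with tag v1.0 on c4.
FIRST_PARENT = {"c4": "c3", "c3": "c2", "c2": "c1"}
TAGS = {"v1.0": "c4"}

two_back = resolve("v1.0~2", TAGS, FIRST_PARENT)  # two commits before the tag
```

Here `two_back` is `c2`, found without ever typing a revision identifier.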

Also the ease of version numbers is overrated, especially with heavily nonlinear (branchy) history. Is revision number 9471 or 12963 that much easier to use than revision identifier bb9de5a? With nonlinear history you can't say whether revision 7634 is before 7643, or are they on separate branches (the dotted notation in Bazaar could help there, but it has its own disadvantages).

Performance. You complain (a bit) about Mercurial performance and its repository size... while both Git and Mercurial are much, much better on both accounts than Bazaar: see e.g. DVCS Comparison: Git, Mercurial, Bazaar Repository Size Benchmark (from 2008), or DVCS Round-up: One System to Rule Them All? -- Part 3 (from 2009), or its follow-up.

Jakub Narebski said...

One further comment about the Git GUIs distributed with Git: you should take into account that there are two programs, gitk, which is a graphical history viewer, and git gui, which is a commit tool, and that together they are the equivalent of Bazaar Explorer.

Brent said...

I've made a new post explaining some of the problems we've run into with Bazaar and how to get around them, or how to correctly use Bazaar:
Bazaar problems and lessons learned.

Brent said...

So apparently when I migrated comments from the old system to the new one, it didn't carry over one of my comments:

Originally made September 26, 2010:
Thank you for the comments Jakub. Since I haven't really used git on a collaborative project (I've just used it for personal projects), I appreciate someone with more experience with it sharing your thoughts on it.

Git on Windows - Based on your comments, I re-evaluated msysGit. It had been about a year since I tried it. I don't know if I was just using it wrong a year ago (maybe I was just using the "Git Bash"), or if there wasn't support for running it natively as a regular Windows executable back then, but I was pleasantly surprised to be able to do a git clone directly from the Windows command prompt. The same goes for the other people in the class I took: this was in March, and I'm not sure whether the native Windows support was there yet, but most of the people with Windows laptops still struggled to get it working correctly.

GUI - There may be 3rd party GUIs for git that are better than the bazaar explorer GUI. Bazaar explorer isn't particularly great, but it does just about everything you need to do and it looks decent. It comes with Bazaar so you don't have to do your own separate investigation for which GUI to use, as you would have to if you want to use the 3rd party git GUIs.

Revision identifiers - I would definitely argue that 9471 or 12963 is easier to remember than bb9de5a. This is a matter of opinion though. And having numbers shows an immediate order to the revisions: you know that 9471 was before 12963, at least on the current branch. Pretty much every developer at my company that I've talked with about changing VCSes says the same thing about git - they don't want to have to use hash tags to refer to revisions.

Performance - Before version 2.0 of Bazaar (which came out in 2009, not sure when in the year) and the "2a" repository format, Bazaar's performance did suck pretty bad, from what I've read. The two comparisons that you linked to are no longer valid for Bazaar: the 2008 comparison was before version 2.0 came out, and the 2009 comparison specifically says that it uses Bazaar 1.10. For performance statistics, I was going by the benchmarks on Bazaar's site.
Granted, since this is on Bazaar's site, I'm sure the test conditions are more favorable for Bazaar.