Sunday, October 3, 2010

Bazaar problems and lessons learned

As promised in my blog post My Distributed Version Control System comparison, I'm going to share some of the problems that I've encountered with Bazaar and how I've gotten around them. To summarize that post: I decided to use Bazaar on my current project at work over Git and Mercurial. We've been using it for almost 6 months now, and overall it's worked very well. But we've run into a few problems and made a few mistakes, so I figured I should share these experiences.

Data transferred for new branch is huge
We had been using sftp as the transport mechanism to push and pull from our central repository server. I noticed that the amount of data transferred to create a new branch kept growing, eventually topping out at 400 MB even though our repository was only 22 MB on the server! Two steps fixed this. First, we switched to bzr+ssh, which is more efficient than sftp; only use sftp if you can't install Bazaar on the server. Second, I ran "bzr pack" on the repository on the server. I had assumed that data would get packed automatically, but it didn't appear to be. After these two changes, a new branch transfers only about 12 MB. I think both problems may have been tied to using sftp, so I would not recommend ever using it. Always install Bazaar on the server and use bzr+ssh:// instead of sftp://.
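For reference, the two fixes come down to a couple of commands. This is only a sketch: the function names, the host, and the paths are made up for illustration, so substitute your own.

```shell
# Hypothetical helpers; the host and paths in the examples are placeholders.

# Run on the server: repack the repository so that later transfers are smaller.
# $1: path of the repository on the server
pack_server_repo() {
    bzr pack "$1"
}

# Run on a workstation: create a new branch over bzr+ssh instead of sftp.
# $1: user@host, $2: absolute path of the branch on the server, $3: local directory
branch_over_ssh() {
    bzr branch "bzr+ssh://$1$2" "$3"
}

# Example usage:
#   pack_server_repo /srv/bzr/project
#   branch_over_ssh dev@code.example.com /srv/bzr/project/trunk trunk
```

Note that bzr+ssh only works if Bazaar is installed on the server, which is the whole point of the recommendation above.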

Revision numbers keep getting renamed in the top level branch
We had been getting new changes into the top level branch the wrong way. Here is our basic setup: a top level "integration" branch on the server (called trunk), plus branches on the server for the different tasks that we're working on. Say two developers are working on two tasks, A and B, and there is a branch for each on the server. As of a week ago, A and B had the same revisions as trunk, up to revision 600. John works on A and pushes changes from his local development branch to branch A on the server every day. I work on B and push my local changes up to B every day.

John completes task A and pushes his changes up to trunk (revisions 601-610). Now I'm done with B, and I've made my own revisions 601-620. I can't push directly to trunk because trunk has new revisions that I don't have (John's 601-610). When we first used Bazaar, I would have merged trunk into my local branch and then pushed the merged result up to trunk. This is not optimal, because John's revision numbers get renamed (from 601-610 to 600.1.1 through 600.1.10). You don't want revision numbers that were already in trunk to change; it confuses everyone. It's my local revision numbers that should change when my work lands in trunk.

To solve this, we changed our workflow. We now NEVER push a task branch into trunk. Instead, starting from trunk, we always pull or merge in the changes from the other branch. The easy way to do this is to create a local copy of trunk, pull or merge your changes into it, and then push that copy up to trunk on the server.
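Spelled out as commands, the workflow looks roughly like the sketch below. The function name, the URLs, and the "trunk-local" directory are placeholders that I picked for illustration, and it assumes "bzr merge" exits with a non-zero status when there are conflicts, so that the chain stops before the commit.

```shell
# Hypothetical helper: land a finished task branch on trunk without
# renumbering the revisions that are already in trunk.
# $1: trunk URL, $2: task branch URL, $3: commit message,
# $4: local directory for the trunk copy (optional, defaults to trunk-local)
land_on_trunk() (
    trunk_url="$1"
    task_url="$2"
    message="$3"
    workdir="${4:-trunk-local}"

    # Start from trunk, so that trunk's history stays the mainline.
    bzr branch "$trunk_url" "$workdir" &&
    cd "$workdir" &&
    # Merge the task branch INTO the trunk copy, not the other way around.
    # If the task branch already contains every trunk revision, a plain
    # "bzr pull" here is enough and no commit is needed.
    bzr merge "$task_url" &&
    bzr commit -m "$message" &&
    # The local copy is now trunk plus one merge revision, so this push
    # does not rename anything that was already in trunk.
    bzr push "$trunk_url"
)

# Example usage:
#   land_on_trunk bzr+ssh://dev@code.example.com/srv/bzr/project/trunk \
#                 bzr+ssh://dev@code.example.com/srv/bzr/project/B \
#                 "Merge task B into trunk"
```

If the merge reports conflicts, resolve them in the trunk copy, then run the commit and the push by hand.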

Fixing accidental pushes, pulls, and merges
What if you accidentally push a revision to another branch too early, or pull revisions that you didn't mean to pull yet? My first thought was to just do a revert. WRONG. A revert only changes the local file system, putting the files back in the state they were in at the revision you specify, so you must commit again to record that state as a new revision. Your original changes are STILL there as revisions. If you later want those changes back, it's too late: Bazaar thinks that it already has them. If you push the reverted revisions again, nothing happens, because Bazaar already has those revisions, and you'd have to reintroduce the changes manually. The proper way to undo the mistake is to run "bzr uncommit" on the upstream branch on the server. This removes the most recent revision from the branch (run "bzr help uncommit" to see how to step back more than one revision). When you're actually ready to push or pull the changes that you pushed or pulled too early, you can just push or pull again. The upstream branch no longer has these revisions, since they were uncommitted, so everything should work out.
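Here is a small sketch of the uncommit step. The function name and the path are hypothetical. It uses the -r option, which leaves the branch at the revision you name, and does a --dry-run first so you can see what would be removed.

```shell
# Hypothetical helper: drop the newest revisions from a branch so that its
# tip is the given revision number again.
# $1: location of the upstream branch, $2: revision number to keep as the tip
uncommit_back_to() {
    # Preview which revisions would be removed...
    bzr uncommit --dry-run -r "$2" "$1" &&
    # ...then actually remove them (bzr asks for confirmation first).
    bzr uncommit -r "$2" "$1"
}

# Example usage, run on the server: make revision 600 the tip of trunk again.
#   uncommit_back_to /srv/bzr/project/trunk 600
```

The removed revisions still exist in whichever branch they came from, so pushing or pulling them again later works normally.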

Making changes after a merge
After you do a merge, the merged changes are sitting on your local file system, and Bazaar records that the merge is still pending. When you do a merge, do things one at a time: completely resolve the merge, and then immediately commit ALL of the changes from the merge. Committing automatically marks the merge as resolved. Do NOT attempt to make ANY other changes after you run the merge command without first completely resolving the merge and committing the merge changes.

We ran into a big problem here by mixing regular changes with a merge. A merge was done by accident in a local branch. At the same time, a file unrelated to the merge was changed in that local branch. If a normal commit had been done, all of the merge changes would have been committed, but the developer didn't want to keep the merge. Rather than backing the merge out, the developer changed the one unrelated file and made a commit that specified just that file. This commit was pushed up.

This caused problems. Because a commit was done after the merge, Bazaar marked the merge as resolved. But the actual changes pulled in during the merge were stuck in the local file system, since they weren't included in that commit. The local branch was then deleted by accident, so all of these changes essentially disappeared. Looking at the log, all of the revisions from the merge were still listed, but their actual contents were gone because the commit after the merge didn't include them. We ended up going back to the revision right before the commit that followed the merge. We lost all changes made after the merge and had to reintroduce them manually (this was easier than manually reintroducing the changes from the branch that we had merged with). Lesson learned: when committing after a merge, include ALL changes from the merge, and don't make changes unrelated to the merge.
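To make the lesson concrete, here is a sketch of the two safe ways out of a pending merge: commit all of it, or throw all of it away. The function names are made up. It assumes that "bzr conflicts" prints nothing when there are no unresolved conflicts, and it relies on "bzr revert" with no file arguments clearing the pending merge along with the file changes.

```shell
# Hypothetical helpers for a working tree that has a pending merge.

# Commit the whole merge. No file arguments are passed to "bzr commit",
# so every change from the merge ends up in the new revision.
# $1: working tree directory, $2: commit message
commit_merge() (
    cd "$1" || exit 1
    # Assumption: empty output from "bzr conflicts" means nothing is unresolved.
    if [ -n "$(bzr conflicts)" ]; then
        echo "unresolved conflicts: fix them and run 'bzr resolve' first" >&2
        exit 1
    fi
    bzr commit -m "$2"
)

# Back out a merge that was done by accident and never committed.
# WARNING: this also discards any other uncommitted changes in the tree.
# $1: working tree directory
abandon_merge() (
    cd "$1" && bzr revert
)

# Example usage:
#   commit_merge  ~/work/B "Merge trunk into B"
#   abandon_merge ~/work/B
```

Either way, the tree is clean again before any unrelated change is made, which is exactly what we failed to do.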