Tuesday, May 27, 2008

Meet the candidates

As a first real post, I’d like to introduce you to the candidate repositories that will be used in the performance tests. This list is not final yet, as the importers can have problems with anything remotely weird. I tried to pick a range of projects from each system.

Bazaar repositories

I had the most trouble finding suitable projects for Bazaar. While there are a lot of small projects, almost no large project has chosen Bazaar for its version control. The WhoUsesBzr wiki page lists some large projects, for example Drupal. However, their official development still takes place in SVN or CVS, which means the Bazaar clones contain no branching or merging.

  • Emacs. Finally, I settled on Emacs, which recently switched to Bazaar. The choice was mostly motivated by political reasons, and there have been some complaints, but most projects seem to have complainers when they switch version control systems. This is a big repository: it has almost 90000 commits and the repository is 300MB. It has a working tree of 104MB, which makes it one of the biggest repositories in the test.

  • Pkg-config. This is the smallest repository in the test. The repository is 1.8MB in size, with a working tree of less than a megabyte. It has just 187 commits.

  • Mailman. This is a reasonably large repository. It has 6700 commits, a repository of 73MB and a working tree of 20MB.

Mercurial repositories

I found a couple of nice Mercurial repositories to use in the tests:

  • Mozilla-central. This is one of the repositories found on Mozilla’s site. It has a repository of 205MB, more than 15000 commits and a working tree of 284MB, which makes it the largest working tree in the test.

  • Dovecot. Dovecot’s repo has 7500 commits and is 14MB, with a working tree of 6MB.

  • Octave. Octave is an open-source clone of MATLAB, without all the cool packages. It has just 8000 commits, but a repo of 60MB and a working tree of 29MB.

Git repositories

This was somewhat challenging too: while there are a lot of projects using Git, almost all the importers have trouble importing them. I will discuss the importers in another post, so I’ll just list the projects here.

  • Cairo. This is the smallest Git repository I’ve used. Cairo has a mostly linear history, with some merges happening. The repository has only about 5000 commits, but is still 16MB in size. This is probably because the project is quite large: the working directory is 10MB.

  • coreutils. The coreutils repository is about 30MB. It has around 25000 commits and a working dir of 9MB. It has some merges, but is mostly linear, like Cairo’s.
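To put the sizes above in perspective, here is a rough sketch that works out the average repository size per commit. The figures are the approximate ones quoted in this post (rounded, so treat the results as ballpark numbers), and the function names are just made up for this illustration.

```python
# Approximate figures quoted in this post: (commits, repository size in MB).
REPOS = {
    "emacs": (90000, 300),
    "pkg-config": (187, 1.8),
    "mailman": (6700, 73),
    "mozilla-central": (15000, 205),
    "dovecot": (7500, 14),
    "octave": (8000, 60),
    "cairo": (5000, 16),
    "coreutils": (25000, 30),
}


def kb_per_commit(commits, size_mb):
    """Average repository size per commit, in KB (1 MB = 1024 KB)."""
    return size_mb * 1024 / commits


def ranked(repos):
    """Return (name, KB per commit) pairs, largest first."""
    rows = [(name, kb_per_commit(c, mb)) for name, (c, mb) in repos.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, kb in ranked(REPOS):
        print(f"{name:16s} {kb:6.1f} KB/commit")
```

With these numbers Cairo comes out at roughly 3.3KB per commit against about 1.2KB for coreutils, which fits the impression that Cairo’s repository is large for its commit count.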

And the final candidate…

I had a lot of trouble finding the last repository. At first I wanted to use Git’s repository itself. However, it uses some octopus merges (merges with more than 2 parents) which cannot be imported correctly by “hg convert”, which ignores them. Furthermore, there was a bug in git-fast-export which made “bzr fast-import” crash on them.
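Checking a repository for octopus merges before trying to import it is straightforward. The sketch below assumes you have saved the output of “git rev-list --parents” (one commit per line, the commit hash followed by its parent hashes) and simply looks for lines with more than two parents. The function name and the short hashes in the sample are made up for illustration.

```python
def find_octopus_merges(rev_list_output):
    """Return the hashes of commits that have more than two parents.

    Expects text in the format printed by `git rev-list --parents`:
    one commit per line, the commit hash followed by its parent hashes.
    """
    octopus = []
    for line in rev_list_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        commit, parents = fields[0], fields[1:]
        if len(parents) > 2:
            octopus.append(commit)
    return octopus


# Hypothetical sample: d4 merges three parents, c3 is a normal two-parent merge.
SAMPLE = """\
d4 c3 b2 a1
c3 b2 a1
b2 a1
a1
"""

if __name__ == "__main__":
    print(find_octopus_merges(SAMPLE))
```

An empty result means the history only contains ordinary commits and two-parent merges, which is what the importers seem to cope with best.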

Similarly, I had trouble importing both the Linux-2.6 and Wine repositories, on which the “hg convert” tool crashes because of an invalid byte encoding. Mercurial also had trouble importing the VLC repository, while bzr-fast-import couldn’t handle the Rubinius repository. Therefore, I’m still looking for a third repository to use with Git.
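I haven’t looked at what exactly “hg convert” does internally, so the following is only an illustration of the general kind of failure, not the actual code path. It assumes some commit metadata (say, an author name) was stored in Latin-1 and is then decoded strictly as UTF-8. The helper names are hypothetical.

```python
def decode_strict(raw):
    """Decode bytes as UTF-8, raising UnicodeDecodeError on invalid input."""
    return raw.decode("utf-8")


def decode_lenient(raw):
    """Decode bytes as UTF-8, replacing invalid bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# A name written in Latin-1; the byte 0xE9 on its own is not valid UTF-8.
sample = "Jos\xe9".encode("latin-1")

if __name__ == "__main__":
    try:
        decode_strict(sample)
    except UnicodeDecodeError as exc:
        print("strict decoding failed:", exc)
    print("lenient decoding gives:", decode_lenient(sample))
```

A tool that decodes strictly will stop at the first such byte, while a lenient one keeps going at the cost of a mangled character.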

2 comments:

gebi said...

Using the Emacs git repo would be nice, to get comparable results with bzr :).
git://git.sv.gnu.org/emacs.git

or maybe the gcc repository?
git://git.infradead.org/gcc.git

Pieter said...

@gebi: As Emacs has chosen Bazaar for its VCS, I already used it as a Bazaar repository. I converted the Bazaar repo to Git myself to ensure both had the same number of commits; the Emacs git repository has more commits than the Bazaar one does.

Gcc might have been a good choice. I finally settled on Samba though.