Tuesday, June 3, 2008

On mainline merges and fast forwards

When your branch has no new commits of its own, Bazaar handles merging somewhat differently than Git and Mercurial do. The reason for this is the notion of a “mainline” in Bazaar. This supposed mainline is meant as a silver line through your history. All commits should be merged into this mainline, giving you a nice overview of development. Bazaar has even integrated this into its “log” tool: commits that have been merged “into the mainline” are indented to show this.

Git and Mercurial use another approach, based on the fast-forward method: if there are no new commits on your branch, but there are new ones on the remote, Git and Mercurial just fast-forward you to the remote’s latest commit. No merge commit is created; your new HEAD will simply be the same revision as the remote’s.
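The fast-forward rule can be sketched in a few lines of Python. This is only a toy model under my own assumptions: commits are a dictionary from an id to its list of parents, branches are plain pointers, and the names `parents`, `branches` and `is_ancestor` are made up for illustration. It is not how Git or Mercurial actually store history.

```python
# Toy model: each commit id maps to its parent ids, branches are pointers.
parents = {"base": [], "r1": ["base"], "r2": ["r1"]}
branches = {"local": "base", "remote": "r2"}


def is_ancestor(older, newer):
    """Return True if `older` is reachable from `newer` via parent links."""
    todo, seen = [newer], set()
    while todo:
        current = todo.pop()
        if current == older:
            return True
        if current not in seen:
            seen.add(current)
            todo.extend(parents[current])
    return False


# The local branch has no commits of its own, so its tip is an ancestor
# of the remote tip. Fast-forwarding only moves the pointer; no new
# commit is recorded.
if is_ancestor(branches["local"], branches["remote"]):
    branches["local"] = branches["remote"]

print(branches["local"])
```

After the update the local branch points at exactly the same commit as the remote, and the commit graph itself is untouched.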

They do this because Bazaar’s approach comes with some problems. The first and most obvious of these is performance. The “bzr log” command is really slow, as it reconstructs history every time, figuring out what the mainline actually is and then showing the history in a neat way. This scales badly: for Cairo, with 4000 commits, “bzr log -l10” takes just over a second to display the first 10 log messages. Mozilla-central, with 15000 commits, already takes 5 seconds. Samba, with 24000 commits, takes 10 seconds to display the first few log messages, which I would call unacceptable. It gets even worse when you try the Emacs repository.

However, that is not the biggest problem. The problem with explicit merges is the pollution of your branches. The indented outline Bazaar shows is only useful if it is “correct” and does not show pollution. This can be real trouble when two developers work together on a single feature and merge with each other.

An example

Let’s say that we have a single upstream with some commits. There are two people working on something, one in the branch1 branch and the other in the branch2 branch. Both do a commit on their own branch, then branch1 decides to merge with branch2. After this, branch2 merges with branch1, to get their features.

This means that after both have merged, they should have the same tree (given that the merge was conflict-free). One might therefore also assume that they have the same branch history. That is not the case, however.

Show branch1 log

Show branch2 log

Who is right here? Should the “Add b” commit be indented or the “Add c” commit? As you can see, the notion of “mainline” is then local to a developer, defeating the whole point of having a “global line through history”.
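To make the disagreement concrete, here is a small Python sketch of the two cross merges. It assumes that every merge records a new commit whose first parent is the merging branch's own tip, and that the mainline is the chain of first parents. The commit labels are taken from the example; the dictionary model and the `mainline` helper are my own illustration, not Bazaar's data structures.

```python
# Toy commit graph for the example: id -> list of parents, first parent first.
parents = {
    "Base commit": [],
    "Add b": ["Base commit"],  # committed on branch1
    "Add c": ["Base commit"],  # committed on branch2
}

# branch1 merges branch2: a merge commit is always recorded, and its
# first parent is branch1's own tip.
parents["Merge with branch2"] = ["Add b", "Add c"]

# branch2 then merges branch1: again a new merge commit, first parent
# is branch2's own tip.
parents["Merge with branch1"] = ["Add c", "Merge with branch2"]


def mainline(tip):
    """Follow first parents from `tip` back to the root commit."""
    line = []
    while tip is not None:
        line.append(tip)
        tip = parents[tip][0] if parents[tip] else None
    return line


branch1_mainline = mainline("Merge with branch2")
branch2_mainline = mainline("Merge with branch1")

print(branch1_mainline)
print(branch2_mainline)
```

In this model branch1's mainline runs through “Add b” while branch2's runs through “Add c”, even though both branches contain the same changes. Whatever is not on a branch's own first-parent chain is what ends up indented in its log.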

It gets even worse. Suppose branch1 merges the changes from branch2 again (just to make sure he has everything) and pushes it to upstream. Then branch2 updates from upstream in order to continue working. The final output of his log looks like this:

--------------------------------------------
revno: 4
committer: Pieter de Bie <pdebie@ai.rug.nl>
branch nick: branch2
timestamp: Tue 2008-06-03 16:25:55 +0200
message:
  Merge with upstream
    --------------------------------------------
    revno: 1.1.3
    committer: Pieter de Bie <pdebie@ai.rug.nl>
    branch nick: branch1
    timestamp: Tue 2008-06-03 16:25:52 +0200
    message:
      Merge with branch2
--------------------------------------------
revno: 3
committer: Pieter de Bie <pdebie@ai.rug.nl>
branch nick: branch2
timestamp: Tue 2008-06-03 16:25:51 +0200
message:
  Merge with branch1
    --------------------------------------------
    revno: 1.1.2
    committer: Pieter de Bie <pdebie@ai.rug.nl>
    branch nick: branch1
    timestamp: Tue 2008-06-03 16:25:49 +0200
    message:
      Merge with branch2
    --------------------------------------------
    revno: 1.1.1
    committer: Pieter de Bie <pdebie@ai.rug.nl>
    branch nick: branch1
    timestamp: Tue 2008-06-03 16:25:45 +0200
    message:
      Add b
--------------------------------------------
revno: 2
committer: Pieter de Bie <pdebie@ai.rug.nl>
branch nick: branch2
timestamp: Tue 2008-06-03 16:25:47 +0200
message:
  Add c
--------------------------------------------
revno: 1
committer: Pieter de Bie <pdebie@ai.rug.nl>
branch nick: upstream
timestamp: Tue 2008-06-03 16:25:42 +0200
message:
  Base commit

Does that look readable to you? This does not show a nice overview of what has happened during the development of a feature. Instead, it is littered with merges. This kind of noise may make developers wary of merging, which is not good.

Compare this to how Git and Mercurial handle this. Even after all the back-and-forth merging, the log still shows only four commits:

commit 78cd0c44cd93800a169617a72832bcb6a984f9a3
Merge: 598b3f8... ecc0685...
Author: Pieter de Bie <pdebie@ai.rug.nl>
Date:   Tue Jun 3 17:04:27 2008 +0200

    Merge comparison/temp-dir-2/branch2

    * comparison/temp-dir-2/branch2:
      Add c

commit 598b3f859ff9f69733a2ddd1406229cd3c203591
Author: Pieter de Bie <pdebie@ai.rug.nl>
Date:   Tue Jun 3 17:04:26 2008 +0200

    Add b

commit ecc06852b9f76cdcf9285f6eac7c39c002e566e7
Author: Pieter de Bie <pdebie@ai.rug.nl>
Date:   Tue Jun 3 17:04:26 2008 +0200

    Add c

commit 5828b28ff1fc41c674cabe9b839621a72f3effa5
Author: Pieter de Bie <pdebie@ai.rug.nl>
Date:   Tue Jun 3 17:04:26 2008 +0200

    Base commit
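The same sequence of merges can be replayed in a toy Python model that fast-forwards whenever it can. As before, this is a sketch under my own assumptions (a dictionary of parents, a dictionary of branch tips, a push modelled as a merge into upstream, and made-up helper names), not Git's or Mercurial's real implementation.

```python
# Toy model of the example under fast-forward rules.
parents = {
    "Base commit": [],
    "Add b": ["Base commit"],
    "Add c": ["Base commit"],
}
tips = {"upstream": "Base commit", "branch1": "Add b", "branch2": "Add c"}


def is_ancestor(older, newer):
    """Return True if `older` is reachable from `newer` via parent links."""
    todo, seen = [newer], set()
    while todo:
        current = todo.pop()
        if current == older:
            return True
        if current not in seen:
            seen.add(current)
            todo.extend(parents[current])
    return False


def merge(into, other):
    """Merge branch `other` into branch `into`, fast-forwarding if possible."""
    ours, theirs = tips[into], tips[other]
    if is_ancestor(theirs, ours):
        return  # already up to date, nothing to record
    if is_ancestor(ours, theirs):
        tips[into] = theirs  # fast-forward: only the pointer moves
        return
    name = "Merge " + other  # histories diverged: record a real merge
    parents[name] = [ours, theirs]
    tips[into] = name


merge("branch1", "branch2")   # diverged, so one real merge commit
merge("branch2", "branch1")   # fast-forward to that merge commit
merge("branch1", "branch2")   # already up to date
merge("upstream", "branch1")  # the push, modelled as a fast-forward
merge("branch2", "upstream")  # already up to date

print(len(parents), tips)
```

Only the first merge creates a commit. Every later step either moves a pointer or does nothing, so all three branches end on the same tip and the graph holds four commits in total.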

Implications

The important point is that, in the way Git and Mercurial work, it doesn’t matter who does the merge. Their workflow encourages the use of forking, giving more freedom to the developers to do what they want and merge with whom they want. The Bazaar approach, in contrast, discourages merging by anyone other than those in control of the mainline, as otherwise the history will look ugly and unreadable: the first branch is somehow “special” and must be maintained that way. In a distributed VCS, it should not matter who does the merge. When the maintainers have to be the ones who merge from you, you get less freedom in deciding how to do your development.

While the outline may seem nice, sometimes having it is just plain wrong. If I want to update my branch to the latest upstream, I want to be equal to the latest upstream, not to turn the update into a merge. Also, as we just saw with cross-developer merging, there is sometimes no way to determine who is “the mainline”. Making a simple branch into something that it isn’t is bound to spread confusion among users.

See also this reply by Linus Torvalds that basically says the same thing. Also take a look at the rest of that thread if you’re interested.

4 comments:

Anonymous said...

I don't like Bazaar's log approach either. One thing is that it feels conceptually wrong in a distributed setting. I admit that Bazaar's log is actually nice in certain situations. But in the bigger picture the point is not to have just one way of viewing the history from a single forced point of view. What I prefer is having different ways. Bazaar-like mainline thinking is just one option, and a good VCS tool supports it but doesn't force it.

It looks like the next git release will add a new "git log --graph" option which prints branch lines as ASCII graphics on the left and log messages on the right. With that kind of thing the user gets a much richer view of a project's history, and it's completely without any forced mainline business.

Anonymous said...

As John Meinel wrote (see the link he posted), Bazaar also supports fast-forwards: the command "bzr merge --pull" performs a fast-forward merge.

All these popular DVCS tools support many different work models. The original blogger's example about Bazaar was probably the worst-case scenario. The example itself is good: it certainly is possible to create really unreadable logs by doing lots of empty merge commits with Bazaar. However, with some discipline one can work better with Bazaar.

I tend to think that fast-forwards should be the default operation because I believe it encourages free merges between anybody without worrying about mainlines and merge noise. A project's policy or the primary maintainer (maybe just a merging robot) may choose not to do fast-forward merges to the mainline branch. This way the mainline will consist of only merges from feature branches and the history is always easy to read.

Jakub Narebski said...

For those interacting with other version control systems: there is a patch for Git, IIRC waiting for inclusion (after several rounds of refinements), adding a fast-forward option '--ff=never' to 'git-merge' (and 'git-pull') to force a merge even in the fast-forward case. So you would be able to use the "Bazaar model" (IIRC required to interact with some proprietary centralized VCS) in Git, too.

BTW you can skip merges in git-log output by using '--no-merges' option; you can see (if you use Git in distributed fashion) what is in mainline by checking committer (--committer=name); '--first-parent', due to using fast-forward if possible, isn't usually good representation of "silver mainline".

P.S. BTW, Git allows for an easy rebase-based workflow via "git pull --rebase".
