CVu-Mercurial changeset 5:2ec53c0ed5d8
Musings on Merging and Mergeability.
author | Jim Hague <jim.hague@icc-atcsolutions.com> |
---|---|
date | Fri, 06 Mar 2009 14:07:34 +0000 |
parents | 561edf852797 |
children | a942bf7bc2ab |
files | Hg.txt |
diffstat | 1 files changed, 91 insertions(+), 2 deletions(-) |
--- a/Hg.txt Sun Dec 21 21:42:58 2008 +0000
+++ b/Hg.txt Fri Mar 06 14:07:34 2009 +0000
@@ -495,14 +495,22 @@
 revlog. A revlog is a thing that holds a file and all the changes in
 the file history. (Footnote: For any non-trivial file, this will
 actually be two files on the disc, a data file and an index). The
-revlog stores the (compressed) differences between successive versions
+revlog stores the differences between successive versions
 of the file, though it will periodically store a complete version of
 the file instead of a difference, so that the content of any particular
 file version can always be reconstructed without excessive effort.
 
 Under the secret-squirrel Mercurial .hg directory at the top of your
-project is a store which holds a revlog for each file in your project.
+project is a store which holds a revlog for each file in your
+project. So you have the complete history of the project locally, with
+no more round trips to the server.
+
+Both the differences between successive versions and the periodic
+complete versions of a file are compressed before being stored. This is
+surprisingly effective at minimising the storage needed for the
+entire history of your project. <!!!Comparison of .svn space
+requirements for Waldo>.
 
 Any point in the evolution of a revlog can be uniquely identified with
 a nodeid. This is simply the SHA1 hash of the current file contents
@@ -597,3 +605,84 @@
 reading the old format comprises 28 lines of Python. Compared with,
 say, the early tribulations of Subversion and the switch from bdfs to
 fsfs, this is an impressive record.
+
+Reflections on going distributed
+--------------------------------
+
+It's nearly traditional at this stage in an introduction to DVCS to
+demonstrate several different workflow scenarios that you can build
+with a DVCS. This makes the important point that a DVCS can be
+adapted to your workflow in a way that is at best unwieldy with a
+CVCS. I intend, though, to break with tradition here.
+
+By this stage, I hope you can see that distributing version control
+works by introducing branches where development takes place in
+parallel. Mercurial treats these branches as arising naturally from
+the commits made and transferred between repositories. Both Git and
+Bazaar take a slightly different viewpoint, and explicitly create a
+fresh branch for work in a particular repository. But in both cases
+the underlying principle of identifying changes by a globally unique
+identifier and resolving parallel development by merges between
+overlapping changes is the same. And all three can be used in a truly
+distributed manner, with full history and the ability to commit
+available locally.
+
+I want now to reflect on the consequences all this has for that
+all-important question of whether a DVCS is a suitable vehicle for
+your data.
+
+The first consequence is minor and rather obvious. If you want to
+keep very large, frequently changing files in your DVCS, then all
+the compression in the world is unlikely to stop the storage
+requirements for the full project history from becoming
+uncomfortably large.
+
+The second, and main, point is that there is an important question you
+need to ask about your data. We've seen that a DVCS relies on
+branching and merging to weave its magic. So take a close look at your
+data, and ask:
+
+Will It Merge?
+
+The subset of plain old text that makes up program source
+code needs some human oversight, but merges automatically
+well enough for the process to be comfortably within the bounds of
+the possible.
+
+Unfortunately, as we move further afield, mergeability becomes a rarer
+commodity. I nearly began the previous paragraph by stating that
+plain old text will merge well enough. Then Doubt set in - what about
+XML? Or BASE64-encoded content?
+
+Of course, merging doesn't have to be textual. I am
+told that Word can be used to diff and merge two Word .doc files, a
+data format notorious for its binary impenetrability. As long as some
+suitable merge agent is available, and the DVCS can be configured to
+use it for data of a particular type (Footnote: Mercurial can have the
+merge and diff tools specified with reference to the file extension on
+which they operate - I assume Bazaar and Git are similar), then there
+is no bar to successful DVCS use.
+
+Before this reliance on mergeability causes you to dismiss DVCS out of
+hand, reflect. A CVCS can only handle non-mergeable data by acting as
+a versioned file store; in other words, its only available merge
+option is to take one or other of the merge candidates in its
+entirety. Useful though a versioned file store can be, it cannot be
+considered a full-featured version control system. By treating the
+offending unmergeable files as external to the DVCS, or with careful
+workflow - forgoing its distributed and merging capabilities - a DVCS
+can deal with these files, but only at the cost of its distributedness
+or its version control system-ness. In this it differs little from a
+CVCS.
+
+So, for all data you want to version control, let your battle cry be:
+
+Will It Merge?
+
+At this point, I have an urge to don a lab coat and safety goggles and
+be videoed attempting to mechanically merge data in a variety of
+different formats. Frankly, this is unlikely to be as exciting as
+blending iPhones (Ref: www.willitblend.com), but from a system
+development point of view it's rather more important. And, I think, it
+gives us a large clue as to one of the reasons for the continuing
+popularity of Plain Old Text as a source code representation mechanism.
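The first hunk describes how a revlog keeps a file's history: differences between successive versions, a complete version stored every so often, and everything compressed. The Python sketch below illustrates that idea under deliberately simplified assumptions; it is not Mercurial's revlog format. The delta encoding (a single replaced byte range), the fixed snapshot interval `snapshot_every`, and the names `make_delta`, `apply_delta` and `ToyRevlog` are all hypothetical. Mercurial has its own binary diff format and its own rules for deciding when to store a complete version.

```python
import struct
import zlib
from typing import List, Tuple

# Toy sketch only: NOT Mercurial's on-disk revlog format.


def make_delta(old: bytes, new: bytes) -> bytes:
    """Hypothetical delta: encode `new` as old[:start] + replacement + old[end:]."""
    limit = min(len(old), len(new))
    start = 0
    while start < limit and old[start] == new[start]:
        start += 1
    suffix = 0  # length of the common tail, never overlapping the common head
    while (suffix < limit - start
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    end = len(old) - suffix
    replacement = new[start:len(new) - suffix]
    return struct.pack(">II", start, end) + replacement


def apply_delta(old: bytes, delta: bytes) -> bytes:
    """Rebuild the newer version from the older one and a toy delta."""
    start, end = struct.unpack(">II", delta[:8])
    return old[:start] + delta[8:] + old[end:]


class ToyRevlog:
    """In-memory history of one file: mostly deltas against the previous
    version, with a complete version every `snapshot_every` revisions
    (ASSUMPTION: a fixed interval). Every stored entry is zlib-compressed."""

    def __init__(self, snapshot_every: int = 4) -> None:
        self.snapshot_every = snapshot_every
        self._entries: List[Tuple[bool, bytes]] = []  # (is_full, compressed)
        self._tip = b""

    def add(self, text: bytes) -> int:
        rev = len(self._entries)
        is_full = rev % self.snapshot_every == 0
        payload = text if is_full else make_delta(self._tip, text)
        self._entries.append((is_full, zlib.compress(payload)))
        self._tip = text
        return rev

    def get(self, rev: int) -> bytes:
        # Start from the nearest complete version at or before `rev`...
        base = rev - rev % self.snapshot_every
        text = zlib.decompress(self._entries[base][1])
        # ...then replay at most snapshot_every - 1 deltas on top of it.
        for r in range(base + 1, rev + 1):
            text = apply_delta(text, zlib.decompress(self._entries[r][1]))
        return text


log = ToyRevlog(snapshot_every=3)
history = [b"hello\n", b"hello world\n", b"hello brave world\n",
           b"goodbye\n", b"goodbye, world\n"]
for version in history:
    log.add(version)
print(log.get(4))  # b'goodbye, world\n'
```

Raising `snapshot_every` saves space, since fewer complete versions are stored, at the cost of longer delta chains for `get` to replay. That is the trade-off the periodic complete versions in the text are balancing.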
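To make the "Will It Merge?" question concrete, here is a toy line-based three-way merge in Python. To stay short it assumes that all three versions have the same number of lines (edits replace lines but never insert or delete them), so it can skip the alignment step a real merge tool performs; `merge3_same_length` and `pack` are hypothetical names. It applies the same two edits twice: once to a record stored one field per line, and once to that record packed onto a single line, standing in for an opaque or encoded format.

```python
from typing import List, Tuple


def merge3_same_length(base: List[str], ours: List[str],
                       theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Toy line-based three-way merge.

    ASSUMPTION: all three versions have the same number of lines, so each
    line is compared position by position with no diff alignment.
    Returns the merged lines and the indices of conflicting lines; at a
    conflict the 'ours' line is kept as a placeholder.
    """
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this toy merge needs same-length versions")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only 'theirs' changed this line
            merged.append(t)
        elif t == b:      # only 'ours' changed this line
            merged.append(o)
        else:             # both changed it differently: a human must decide
            merged.append(o)
            conflicts.append(i)
    return merged, conflicts


def pack(lines: List[str]) -> List[str]:
    """Squash a record onto one line, like an opaque or encoded blob."""
    return [";".join(lines)]


base = ["name = widget", "colour = red", "size = 10"]
ours = ["name = widget", "colour = blue", "size = 10"]    # we change colour
theirs = ["name = widget", "colour = red", "size = 12"]   # they change size

print(merge3_same_length(base, ours, theirs))
# (['name = widget', 'colour = blue', 'size = 12'], [])
print(merge3_same_length(pack(base), pack(ours), pack(theirs)))
# (['name = widget;colour = blue;size = 10'], [0])
```

The one-field-per-line record merges cleanly because the two edits touch different lines. Packed onto one line, the same edits collide, and the merge can only report a conflict for a human to resolve.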