changeset 5:2ec53c0ed5d8

Musings on Merging and Mergeability.
author Jim Hague <jim.hague@icc-atcsolutions.com>
date Fri, 06 Mar 2009 14:07:34 +0000
parents 561edf852797
children a942bf7bc2ab
files Hg.txt
diffstat 1 files changed, 91 insertions(+), 2 deletions(-)
--- a/Hg.txt	Sun Dec 21 21:42:58 2008 +0000
+++ b/Hg.txt	Fri Mar 06 14:07:34 2009 +0000
@@ -495,14 +495,22 @@
 revlog. A revlog is a thing that holds a file and all the changes in
 the file history. (Footnote: For any non-trivial file, this will
 actually be two files on the disc, a data file and an index). The
-revlog stores the (compressed) differences between successive versions
+revlog stores the differences between successive versions
 of the file, though it will periodically store a complete version of
 the file instead of a difference, so that the content of any
 particular file version can always be reconstructed without excessive
 effort.
 
 Under the secret-squirrel Mercurial .hg directory at the top of your
-project is a store which holds a revlog for each file in your project.
+project is a store which holds a revlog for each file in your
+project. So you have the complete history of the project locally. No
+more round trips to the server.
+
+Both the differences between successive versions and the periodic
+complete versions of a file are compressed before storing. This is
+surprisingly effective at minimising the storage requirements of this
+entire history of your project. <!!!Comparison of .svn space
+requirements for Waldo>.
 
 Any point in the evolution of a revlog can be uniquely identified with
 a nodeid. This is simply the SHA1 hash of the current file contents
@@ -597,3 +605,84 @@
 reading the old format comprises 28 lines of Python. Compared with,
 say, the early tribulations of Subversion and the switch from bdb to
 fsfs, this is an impressive record.
+
+Reflections on going distributed
+--------------------------------
+
+It's nearly traditional at this stage in an introduction to DVCS to
+demonstrate several different workflow scenarios that you can build
+with a DVCS. Which makes the important point that a DVCS can be
+adapted to your workflow in a way that is at best unwieldy with a
+CVCS. I intend, though, to break with tradition here.
+
+By this stage, I hope you can see that distributing version control
+works by introducing branches where development takes place in
+parallel. Mercurial treats these branches as arising naturally from
+the commits made and transferred between repositories. Both Git and
+Bazaar take a slightly different viewpoint, and explicitly generate a
+fresh branch for work in a particular repository. But in both cases
+the underlying principle of identifying changes by a globally unique
+identifier and resolving parallel development by merges between
+overlapping changes is the same. And all three can be used in a truly
+distributed manner, with full history and the ability to commit being
+available locally.
+
+I want now to reflect on the consequences all this has for that
+all-important question of whether a DVCS is a suitable vehicle for
+your data.
+
+The first is a minor and rather obvious point. If you want to store
+files that are both very large and which change often in your DVCS,
+then all the compression in the world is unlikely to stop the storage
+requirements for the full project history from becoming
+uncomfortably large.
+
+The second, and main, point is that there is an important question you
+need to ask about your data. We've seen that a DVCS relies on
+branching and merging to weave its magic. So take a close look at your
+data, and ask:
+
+Will It Merge?
+
+The subset of plain old text which comprises program source
+code requires some human oversight, but will merge automatically
+well enough for the process to be well within the bounds of the
+possible.
+
+Unfortunately when we move further afield mergeability becomes a rarer
+commodity. I nearly began the previous paragraph by stating that
+plain old text will merge well enough. Then Doubt set in - what about
+XML? Or BASE64 encoded content?
+
+Of course, merge doesn't necessarily have to be textual merge. I am
+told that Word can be used to diff and merge two Word .doc files, a
+data format notorious for its binary impenetrability. As long as some
+suitable merge agent is available, and the DVCS can be configured to
+use it for data of a particular type (Footnote: Mercurial can have the
+merge and diff tools specified with reference to the file extension on
+which they operate - I assume Bazaar and Git are similar.), then there
+is no bar to successful DVCS use.
+
+Before this reliance on mergeability causes you to dismiss DVCS out of
+hand, reflect. A CVCS can only handle non-mergeable data by acting as
+a versioned file store; in other words, having as the only available
+merge option the use of one or other of the merge candidates in its
+entirety. Useful though a versioned file store can be, it cannot be
+considered a full-featured version control system. By treating the
+offending unmergeable files as external to the DVCS, or with careful
+workflow - disabling the distributed and mergeable potentials - a DVCS
+can deal with these files, but only at a cost of its distributedness
+or its version control system-ness. In this it differs little from a
+CVCS.
+
+So, for all data you want to version control, let your battle cry be
+
+Will It Merge?
+
+At this point, I have an urge to don lab coat and safety goggles and
+be videoed attempting to mechanically merge data in a variety of
+different formats. Frankly, this is unlikely to be as exciting as
+blending iPhones (Ref: www.willitblend.com), but from a system
+development point of view it's rather more important. And, I think,
+it gives us a large clue as to one of the reasons for the continuing
+popularity of Plain Old Text as a source code representation mechanism.