changeset 0:48d338d29ce9

First committed version.
author Jim Hague <jim.hague@acm.org>
date Thu, 11 Dec 2008 10:15:27 +0000
parents
children 608947872f72
files Hg.txt
diffstat 1 files changed, 568 insertions(+), 0 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Hg.txt	Thu Dec 11 10:15:27 2008 +0000
@@ -0,0 +1,568 @@
+Inside a distributed version control system
+===========================================
+
+Grinton Lodge is a Youth Hostel that sits on an exposed hillside just
+above the small hamlet of Grinton in Swaledale, in the Yorkshire Dales
+National Park. A former Victorian shooting lodge, it now welcomes
+walkers and other travellers from around the world.
+
+Tonight, a Wednesday in mid-November, is not one of its busiest
+nights. Kat, the duty staff member, tells me that there is a small
+corporate team-building group in the annex. There's no sign of them at
+present. Otherwise, that portion of the world that has beaten a path
+to the door of this grand building today consists of just me. And Kat
+goes home soon.
+
+The November CVu, removed from its wrappers and read yesterday, lies
+in my bag. Taunting me. Go on, it says, if you're ever going to put
+finger to keyboard in the name of CVu, well, tonight you are out of
+excuses.
+
+Bugger.
+
+Let's look into Mercurial
+-------------------------
+
+Mercurial is a Distributed Version Control System (DVCS). It's one of a
+number of DVCSs that have gained significant popularity in the
+last few years. I switched a significant work project over to Mercurial
+(from Subversion) over a year ago, because a customer site required
+on-site work but could not allow access back to the company VPN. I
+chose Mercurial for a variety of reasons which I won't bore you with
+here. If you must know, see the box.
+
+What I want to do in this article is give you an insight into how a
+DVCS works. OK, so specifically I'm going to be talking about
+Mercurial, but Git and Bazaar attack the problem in a similar way. But
+first I'd better give you some idea of how you use Mercurial.
+
+::::
+Box: OK, if you must know:
+
+o Implementability. I needed the system to work on Windows, Linux and
+AIX. The latter was not one of the directly supported platforms for
+any of the candidates. Git's implementation uses a horde of
+tools. Bazaar requires only Python, but required Python 2.4 while IBM
+stubbornly still supplies only Python 2.3. Mercurial requires Python
+2.3 or greater, and uses some C for speed.
+
+o Simplicity. From the command line, Mercurial's core operations will
+be familiar to a Subversion user. This is also true of Bazaar, but was
+less true of Git. Git has improved in this matter since then, but a Mr
+Winder of this parish tells me that it's still possible to seriously
+embarrass yourself. There was also a lack of Windows support for Git at
+the time.
+
+o Speed. Mercurial is fast. In the same ballpark as Git. Bazaar
+wasn't, and although it has improved significantly, has, in my
+estimation, added user complexity in the process, and is still off the
+pace for some operations.
+
+o Documentation. At the time, Bryan O'Sullivan's excellent Mercurial
+book (http://hgbook.red-bean.com) was a clear winner for best
+documentation.
+::::
+
+The 5 minute Mercurial overview
+-------------------------------
+
+I think it unlikely that someone possessing the taste and discernment
+to be reading CVu would not be familiar with at least one version
+control system. So, while I want to give you a flavour of what it's
+like to use, I'm not going to hang about. If you'd like a proper
+introduction, or you don't follow something, I thoroughly recommend
+you consult the Mercurial book.
+
+To start using Mercurial to keep track of a project:
+
+$ hg init
+$
+
+This creates the repository root in the current directory.
+
+Like CVS with its CVS directory and Subversion with its .svn
+directory, Mercurial keeps its private data in a directory. Mercifully
+there is only one of these, in the top level of your project. And
+rather than holding details of where the actual repository is to be
+found, the .hg directory holds the entire repository.
+
+Next you need to specify the files you want Mercurial to track.
+
+$ echo "There was a gibbon one morning" > pome.txt
+$ hg add pome.txt
+$
+
+As you might expect, this marks the files as to be added. And as you
+might also expect, you need to commit to record the added files in the
+repository. The commit comment can be supplied on the command line; if
+you don't supply a comment, you'll be dropped into an editor to
+provide one.
+
+There is a suggested format for these messages - a one line summary
+followed by any more required detail on following lines. By default
+Mercurial will only display the first line of commit messages when
+listing changes. In these examples I'll stick to terse messages, and
+I'll enter them from the command line.
+
+$ hg commit -m "My Pome" -u "Jim Hague <jim.hague@acm.org>"
+$
+
+Mercurial records the user making the change as part of the change
+information. It is usual to give your name and email address as I've
+done here. You can imagine, though, that constantly having to repeat
+this is a bit tedious, so you can set a default user name in a
+configuration file. Mercurial keeps global, user and repository
+configurations, and it can go in any of those.
+
+As with Subversion, after further edits you can see how your working
+copy differs from the repository.
+
+$ hg status
+M pome.txt
+$ hg diff
+diff -r 33596ef855c1 pome.txt
+--- a/pome.txt  Wed Apr 23 22:36:33 2008 +0100
++++ b/pome.txt  Wed Apr 23 22:48:01 2008 +0100
+@@ -1,1 +1,2 @@ There was a gibbon one morning
+ There was a gibbon one morning
++said "I think I will fly to the moon".
+$ hg commit -m "A great second line"
+$
+
+And look through a log of changes.
+
+$ hg log
+changeset:   1:3d65e7a57890
+tag:         tip
+user:        Jim Hague <jim.hague@acm.org>
+date:        Wed Apr 23 22:49:10 2008 +0100
+summary:     A great second line
+
+changeset:   0:33596ef855c1
+user:        Jim Hague <jim.hague@acm.org>
+date:        Wed Apr 23 22:36:33 2008 +0100
+summary:     My Pome
+
+$
+
+There are some items here that need an explanation.
+
+The changeset identifier is in fact two identifiers separated by a
+colon. The first is the sequence number of the changeset in the
+repository, and is directly comparable to the change number in a
+Subversion repository. The second is a globally unique identifier for
+that change. As the change is copied from one repository to another
+(this is a distributed system, remember, even if we haven't come to
+that bit yet), its sequence number in any particular repository will
+change, but the global identifier will always remain the same.
+
+'tip' is a Mercurial term. It means simply the most recent change.
+
+Want to rename a file?
+
+$ hg mv pome.txt poem.txt
+$ hg status
+A poem.txt
+R pome.txt
+$ hg commit -m "Rename my file"
+$
+
+(The command to rename a file is actually 'hg rename', but Mercurial
+saves Unix-trained fingers from typing embarrassment.)
+
+At this point you may be wondering about directories. 'hg mkdir'
+perhaps? Well, no. Mercurial only tracks files. To be sure, the
+directory a file occupies is tracked, but effectively only as a
+component of the file name.  This has the slightly unexpected result
+that you can't record an empty directory in your repository.
+(Footnote: I tripped over this converting a work Subversion
+repository. One possibility is to create a placeholder file in the
+directory. In the event I created the directory (which receives build
+products) as part of the build instead.)
+
+Given this, and the status output above that suggests strongly that
+Mercurial treats a rename as a copy followed by a delete, you may be
+worried that Mercurial won't cope at all well with rearranging your
+repository. Relax. Mercurial does store the details of the rename as
+part of the changeset, and copes very well with rearrangements.
+
+(Footnote: The Mercurial designers justify not dealing with
+directories as first class objects by pointing out that provided you
+can correctly move files about in the tree, the other reasons for
+tracking directories are uncommon and do not in their opinion justify
+the considerable added complexity. So far I've found no reason to
+doubt that judgement.)
+
+Want to rewind the working copy to a previous revision?
+
+$ hg update -r 1
+1 files updated, 0 files merged, 1 files removed, 0 files unresolved
+$
+
+'hg update' updates the working files. In this case I'm specifying
+that I want to go back to local changeset 1. I could also have typed
+'-r 3d65e7a57890', or even '-r 3d'; when specifying the global change
+identifier you only need to type enough digits to make it unique.
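+
+The abbreviation rule is easy to picture. Here is a minimal Python
+sketch of it, not Mercurial's own lookup code; the function name
+resolve_prefix is made up, and the two identifiers are the ones from
+the log output above.
+
```python
def resolve_prefix(prefix, node_ids):
    """Return the one full identifier that starts with prefix."""
    matches = [node for node in node_ids if node.startswith(prefix)]
    if not matches:
        raise ValueError("unknown revision %r" % prefix)
    if len(matches) > 1:
        # Too few digits were typed: more than one changeset matches.
        raise ValueError("ambiguous identifier %r" % prefix)
    return matches[0]


# The two changeset identifiers shown by 'hg log' earlier.
known = ["33596ef855c1", "3d65e7a57890"]

print(resolve_prefix("3d", known))  # 3d65e7a57890
```
+
+With these two changesets '3d' is enough, but a bare '3' would be
+rejected as ambiguous, since both identifiers start with it.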
+
+This is all very well, but it's not exactly distributed, is it?
+
+Copy an existing repository:
+
+elsewhere$ hg clone ssh://jim.home.net/Poem Jim-Poem
+updating working directory
+1 files updated, 0 files merged, 0 files removed, 0 files unresolved
+
+(You can access other repositories via the file system, over http or
+over ssh).
+
+elsewhere$ cd Jim-Poem
+elsewhere$ hg log
+changeset:   3:a065eb26e6b9
+tag:         tip
+user:        Jim Hague <jim.hague@acm.org>
+date:        Thu Apr 24 18:52:31 2008 +0100
+summary:     Rename my file
+
+changeset:   2:ff97668b7422
+user:        Jim Hague <jim.hague@acm.org>
+date:        Thu Apr 24 18:50:22 2008 +0100
+summary:     Finished first verse
+
+changeset:   1:3d65e7a57890
+user:        Jim Hague <jim.hague@acm.org>
+date:        Wed Apr 23 22:49:10 2008 +0100
+summary:     A great second line
+
+changeset:   0:33596ef855c1
+user:        Jim Hague <jim.hague@acm.org>
+date:        Wed Apr 23 22:36:33 2008 +0100
+summary:     My Pome
+
+'hg clone' is aptly named. It creates a new repository that contains
+exactly the same changes as the source repository. You can make a
+clone just by copying your project directory, if you're confident
+nothing else will access it during the copy. 'hg clone' saves you this
+worry, and sets the default push/pull location in the new repo to the
+cloned repo.
+
+From that point, you use 'hg pull' to collect changes from other
+places into your repo (though note it does not by default update your
+working copy), and, as you might guess, 'hg push' shoves your changes
+into a foreign repository. By default these will act on the repository
+you cloned from, but you can specify any other repository.
+
+More on those in a moment. First, though, I want to show you something
+you can't do in Subversion. Start with the repository with 4 changes
+we just cloned. Let's focus on the first couple of lines.
+
+$ hg update -r 1
+1 files updated, 0 files merged, 1 files removed, 0 files unresolved
+
+And make a change.
+
+$ hg diff
+diff -r 3d65e7a57890 pome.txt
+--- a/pome.txt  Wed Apr 23 22:49:10 2008 +0100
++++ b/pome.txt  Thu Apr 24 19:13:14 2008 +0100
+@@ -1,2 +1,2 @@ There was a gibbon one morning
+-There was a gibbon one morning
+-said "I think I will fly to the moon".
++There was a baboon who one afternoon
++said "I think I will fly to the sun".
+$ hg commit -m "Better first two lines"
+$
+
+The alert among you will have sat up at that. Well done! Yes, there's
+something very worrying. How can I commit a change at an old point?
+If you try this in Subversion, it will complain mightily about your
+file being out of date. But Mercurial just went ahead and did
+something.  The Bazaar experts among you will know that in Bazaar, if
+you use 'bzr revert -r' to bring the working copy to a past revision,
+make a change and commit, then your latest version will be the past
+revision plus your change. Perhaps that's what Mercurial did?
+
+No. What Mercurial did is central to Mercurial's view of the
+world. You took your working copy back to an old changeset, and then
+committed a fresh change based at that changeset. Mercurial actually
+did just what you asked it to do, no more and no less. Let's see the
+initial evidence.
+
+$ hg heads
+changeset:   4:267d32f158b3
+tag:         tip
+parent:      1:3d65e7a57890
+user:        Jim Hague <jim.hague@acm.org>
+date:        Thu Apr 24 19:13:59 2008 +0100
+summary:     Better first two lines
+
+changeset:   3:a065eb26e6b9
+user:        Jim Hague <jim.hague@acm.org>
+date:        Thu Apr 24 18:52:31 2008 +0100
+summary:     Rename my file
+
+$
+
+Time for some more Mercurial terminology. You can think of a 'head' in
+Mercurial as the most recent change on a branch. In Mercurial, a
+branch is simply what happens when you commit a change that has as its
+parent a change that already has a child. Mercurial has a standard
+extension 'hg glog' which uses some ASCII art to show the current
+state:
+
+$ hg glog
+@  changeset:   4:267d32f158b3
+|  tag:         tip
+|  parent:      1:3d65e7a57890
+|  user:        Jim Hague <jim.hague@acm.org>
+|  date:        Thu Apr 24 19:13:59 2008 +0100
+|  summary:     Better first two lines
+|
+| o  changeset:   3:a065eb26e6b9
+| |  user:        Jim Hague <jim.hague@acm.org>
+| |  date:        Thu Apr 24 18:52:31 2008 +0100
+| |  summary:     Rename my file
+| |
+| o  changeset:   2:ff97668b7422
+|/   user:        Jim Hague <jim.hague@acm.org>
+|    date:        Thu Apr 24 18:50:22 2008 +0100
+|    summary:     Finished first verse
+|
+o  changeset:   1:3d65e7a57890
+|  user:        Jim Hague <jim.hague@acm.org>
+|  date:        Wed Apr 23 22:49:10 2008 +0100
+|  summary:     A great second line
+|
+o  changeset:   0:33596ef855c1
+   user:        Jim Hague <jim.hague@acm.org>
+   date:        Wed Apr 23 22:36:33 2008 +0100
+   summary:     My Pome
+
+$
+
+'hg view' shows a nicer graphical view. (Footnote: Though, being
+Tcl/Tk based, not that much nicer.)
+
+So the change is in there. It's the latest change, and is simply on a
+different branch to the other changes.
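+
+Put another way, a head is a changeset that no other changeset names
+as its parent. A minimal Python sketch of that definition, using the
+local sequence numbers from the graph above; the name find_heads and
+the dictionary layout are invented for illustration and are not how
+Mercurial stores anything.
+
```python
def find_heads(parents):
    """parents maps each changeset number to a tuple of its parents."""
    has_child = set()
    for parent_list in parents.values():
        has_child.update(parent_list)
    # A head is any changeset that never appears as somebody's parent.
    return sorted(node for node in parents if node not in has_child)


# The graph drawn by 'hg glog': 4 was committed on top of 1,
# while 2 and 3 continue the original line.
parents = {0: (), 1: (0,), 2: (1,), 3: (2,), 4: (1,)}

print(find_heads(parents))  # [3, 4]
```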
+
+Almost invariably, you will want to bring everything back together and
+merge the branches. A merge is a change that combines two heads back
+into one. It prepares an updated working directory with the merged
+contents of the two heads for you to review and, if satisfactory, commit.
+
+$ hg merge
+merging pome.txt and poem.txt
+0 files updated, 1 files merged, 0 files removed, 0 files unresolved
+(branch merge, don't forget to commit)
+$ cat poem.txt
+There was a baboon who one afternoon
+said "I think I will fly to the sun".
+So with two great palms strapped to his arms,
+he started his takeoff run.
+$ hg commit -m "Merge first line branch"
+$
+
+(Footnote: I'm no poet. The poem is, of course, 'Silly Old Baboon' by
+the late, great, Spike Milligan.)
+
+Here's the ASCII art again showing what just happened. Oh, and notice
+that Mercurial has done the right thing with regard to the rename.
+
+$ hg glog
+@    changeset:   5:792ab970fc80
+|\   tag:         tip
+| |  parent:      4:267d32f158b3
+| |  parent:      3:a065eb26e6b9
+| |  user:        Jim Hague <jim.hague@acm.org>
+| |  date:        Thu Apr 24 19:29:53 2008 +0100
+| |  summary:     Merge first line branch
+| |
+| o  changeset:   4:267d32f158b3
+| |  parent:      1:3d65e7a57890
+| |  user:        Jim Hague <jim.hague@acm.org>
+| |  date:        Thu Apr 24 19:13:59 2008 +0100
+| |  summary:     Better first two lines
+| |
+o |  changeset:   3:a065eb26e6b9
+| |  user:        Jim Hague <jim.hague@acm.org>
+| |  date:        Thu Apr 24 18:52:31 2008 +0100
+| |  summary:     Rename my file
+| |
+o |  changeset:   2:ff97668b7422
+|/   user:        Jim Hague <jim.hague@acm.org>
+|    date:        Thu Apr 24 18:50:22 2008 +0100
+|    summary:     Finished first verse
+|
+o  changeset:   1:3d65e7a57890
+|  user:        Jim Hague <jim.hague@acm.org>
+|  date:        Wed Apr 23 22:49:10 2008 +0100
+|  summary:     A great second line
+|
+o  changeset:   0:33596ef855c1
+   user:        Jim Hague <jim.hague@acm.org>
+   date:        Wed Apr 23 22:36:33 2008 +0100
+   summary:     My Pome
+
+$
+
+So, our little branch change has now been merged back, and we have a
+single line of development again. Notice that unlike the other
+changesets, changeset 5 has two parent changesets, indicating it is a
+merge changeset. You can only merge two branches in one operation; or
+putting it another way, a changeset can have a maximum of two parents.
+
+This behaviour is absolutely central to Mercurial's philosophy. If a
+change is committed that takes as its starting point a change that
+already has a child, then a branch gets created. Working with
+Mercurial, branches get created frequently, and equally frequently
+merged back. As befits any frequent operation, both are easy to do.
+
+You're probably thinking at this point that making a commit onto
+an old version is a slightly strange thing to do, and you'd be right.
+But that's exactly what's going to happen the moment you go
+distributed. Two people working independently with their own
+repositories are going to make commits based, typically, on the latest
+changes they happen to have incorporated into their tree. To be
+Distributed, a DVCS has to deal with this. Mercurial faces it head-on.
+When you pull changes into your repo (or someone else pushes them), if
+any of the changes overlap - are both based on the same base change -
+you get extra heads, and it's up to you to let these extra heads live
+or merge, as you please.
+
+In practice this is more manageable than you might think. Consider a
+typical Mercurial usage, where the 'master' repo sits on a known
+server, and everyone pulls changes from the master and pushes their
+own efforts to the master. By default Mercurial won't let you push if
+the receiving repo will gain an extra head as a result, so you
+typically pull (and do any required merging) just before
+pushing. Subversion users will recognise this pattern. Subversion
+won't let you commit a change if your working copy is not at the very
+latest revision, so the Subversion user will update, and merge if
+necessary, just before committing.
+
+What, then, about a branch in the conventional sense of '1.0
+maintenance branch'? Typically in Mercurial you'd handle this by
+keeping a separate cloned repository for those changes. Cloning is
+fast and, if local, uses hard links where possible on filesystems that
+support them, so it isn't necessarily extravagant on disc space. You can,
+if you prefer, handle them all in a single repo with 'named
+branches', but cloning is definitely simpler.
+
+OK, so now you know the basics of using Mercurial. We can proceed to
+looking at how this magic is achieved. In particular, where does this
+magic globally unique identifier for a change come from?
+
+Inside the Mercurial repo
+-------------------------
+
+The way Mercurial handles its repo is really quite simple.
+
+That's simple, as in 'most things are simple once you know the
+answer'.  I found the explanation helpful, so this section attempts
+the 10,000ft (FL100 if you prefer) view of Mercurial.
+
+(Footnote: Bryan O'Sullivan's excellent Mercurial book has a chapter
+on the subject, and the Mercurial website has a fair amount of detail
+too. This is 'research', OK?)
+
+First remember that any file or component can only have one or two
+parents. You can't merge more than one other branch at once.
+
+We start with the basic building block, which Mercurial calls a
+revlog. A revlog is a thing that holds a file and all the changes in
+the file history. (Footnote: For any non-trivial file, this will
+actually be two files on the disc, a data file and an index). The
+revlog stores the (compressed) differences between successive versions
+of the file, though it will periodically store a complete version of
+the file instead of a difference, so that the content of any
+particular file version can always be reconstructed without excessive
+effort.
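+
+To make the idea concrete, here is a toy revlog in Python. It is only
+a sketch under stated assumptions: the delta format (shared prefix,
+shared suffix, replaced middle) and the fixed snapshot interval are
+invented for brevity, and Mercurial's real delta encoding and snapshot
+policy are different.
+
```python
import zlib

SNAPSHOT_EVERY = 4  # assumption: store a full copy every fourth revision


def make_delta(old, new):
    """Describe new as (prefix length, suffix length, replaced middle)."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(old, prefix, suffix, middle):
    """Rebuild the new text from the old text and a delta."""
    return old[:prefix] + middle + old[len(old) - suffix:]


class ToyRevlog:
    def __init__(self):
        # Each entry is (kind, prefix, suffix, compressed payload).
        self.entries = []

    def add(self, text):
        rev = len(self.entries)
        if rev % SNAPSHOT_EVERY == 0:
            self.entries.append(("full", 0, 0, zlib.compress(text)))
        else:
            prefix, suffix, middle = make_delta(self.read(rev - 1), text)
            self.entries.append(("delta", prefix, suffix, zlib.compress(middle)))
        return rev

    def read(self, rev):
        # Walk back to the nearest full copy, then replay the deltas.
        base = rev
        while self.entries[base][0] != "full":
            base -= 1
        text = zlib.decompress(self.entries[base][3])
        for r in range(base + 1, rev + 1):
            _, prefix, suffix, payload = self.entries[r]
            text = apply_delta(text, prefix, suffix, zlib.decompress(payload))
        return text


log = ToyRevlog()
first = b"There was a gibbon one morning\n"
second = first + b'said "I think I will fly to the moon".\n'
log.add(first)
log.add(second)
assert log.read(0) == first
assert log.read(1) == second
```
+
+The periodic full copy is what bounds the cost of read(): however long
+the history, it never replays more than SNAPSHOT_EVERY - 1 deltas.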
+
+Under the secret-squirrel Mercurial .hg directory at the top of your
+project is a store which holds a revlog for each file in your project.
+
+Any point in the evolution of a revlog can be uniquely identified with
+a nodeid. This is simply the SHA1 hash of the current file contents
+concatenated with the nodeids of one or both parents of the current
+revision. Note that this way, two file states are identical if and
+only if the file contents are the same *and* the file has the
+same history.
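+
+A minimal sketch of such a hash in Python. Treat the details as
+assumptions rather than Mercurial's exact recipe: the function name
+node_id, the use of 20 zero bytes for a missing parent, and the
+sorting of the two parent ids before hashing are choices made here to
+keep the example self-contained.
+
```python
import hashlib

NULL_ID = b"\0" * 20  # assumption: stands in for "no parent"


def node_id(text, p1=NULL_ID, p2=NULL_ID):
    """Hash the file contents together with both parent ids."""
    first, second = sorted([p1, p2])  # make the parent order irrelevant
    digest = hashlib.sha1(first)
    digest.update(second)
    digest.update(text)
    return digest.digest()


line = b"There was a gibbon one morning\n"
root = node_id(line)                 # first revision: no parents
child = node_id(line, p1=root)       # same contents, different history

# Same contents but a different history gives a different id.
assert root != child
# Same contents and same history always gives the same id.
assert child == node_id(line, p1=root)
```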
+
+Here's a dump of a revlog index:
+
+$ hg debugindex .hg/store/data/pome.txt.i
+   rev    offset  length   base linkrev nodeid       p1           p2
+     0         0      32      0       0 6bbbd5d6cc53 000000000000 000000000000
+     1        32      51      0       1 83d266583303 6bbbd5d6cc53 000000000000
+     2        83      84      0       2 14a54ec34bb6 83d266583303 000000000000
+     3       167      76      3       4 dc4df776b38b 83d266583303 000000000000
+$
+
+Note here that a file state can have two parents. If both the parent
+nodeids are non-null, the file state has two parents, and the state is
+therefore the result of a merge.
+
+Let's dump out a revlog at a particular revision:
+
+$ hg debugdata .hg/store/data/pome.txt.i 2
+There was a gibbon one morning
+said "I think I will fly to the moon".
+So with two great palms strapped to his arms,
+he started his takeoff run.
+$
+
+The next component is the manifest. This is simply a list of all the
+files in the project, together with their current nodeids. The
+manifest is a file, held in a revlog. The nodeid of the manifest,
+therefore, identifies the project filesystem at a particular point.
+
+$ hg debugdata .hg/store/00manifest.i 5
+poem.txt5168b1a5e2f44aa4e0f164e170820845183f50c8
+$
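+
+In that dump the file name and the nodeid appear to run together. The
+Python sketch below assumes they are separated by a NUL byte that the
+terminal does not show, and that each entry sits on a line of its own.
+The function name parse_manifest is made up for illustration.
+
```python
def parse_manifest(data):
    """Turn raw manifest bytes into a {file name: hex nodeid} dictionary."""
    files = {}
    for line in data.split(b"\n"):
        if not line:
            continue
        name, node = line.split(b"\0", 1)
        # Keep only the 40 hex digits of the nodeid.
        files[name.decode()] = node[:40].decode()
    return files


# The entry from the dump above, with the assumed NUL separator restored.
raw = b"poem.txt" + b"\x00" + b"5168b1a5e2f44aa4e0f164e170820845183f50c8\n"

manifest = parse_manifest(raw)
print(manifest["poem.txt"])  # 5168b1a5e2f44aa4e0f164e170820845183f50c8
```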
+
+Finally we have the changeset. This is the atomic collection of
+changes to a repository that leads to a new revision. The changeset
+info includes the nodeid of the corresponding manifest, the timestamp
+and committer ID, a list of changed files and a comment. The changeset
+also includes the nodeid of the parent changeset, or the two parents
+if the change is a merge. The changeset description is held in a
+revlog, the changelog.
+
+$ hg debugdata .hg/store/00changelog.i 5
+1ccc11b6f7308cc8fa1573c2f3811a4710c91e3e
+Jim Hague <jim.hague@acm.org>
+1209061793 -3600
+poem.txt
+pome.txt
+
+Merge first line branch
+$
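+
+Reading that dump by eye: manifest nodeid, committer, timestamp with
+timezone offset, the changed files, a blank line, then the comment.
+Here is a small Python sketch that picks it apart on that assumption.
+The function name and field names are invented. Note that the parent
+nodeids do not appear in this text; presumably they live in the p1 and
+p2 columns of the changelog's revlog index, as they do for a file.
+
```python
def parse_changeset(text):
    """Split a changelog entry into its fields, assuming the layout above."""
    header, _, description = text.partition("\n\n")
    lines = header.split("\n")
    seconds, offset = lines[2].split()[:2]
    return {
        "manifest": lines[0],
        "user": lines[1],
        "time": int(seconds),        # seconds since the Unix epoch
        "tz_offset": int(offset),    # offset in seconds, as dumped
        "files": lines[3:],
        "description": description.rstrip("\n"),
    }


entry = (
    "1ccc11b6f7308cc8fa1573c2f3811a4710c91e3e\n"
    "Jim Hague <jim.hague@acm.org>\n"
    "1209061793 -3600\n"
    "poem.txt\n"
    "pome.txt\n"
    "\n"
    "Merge first line branch\n"
)

change = parse_changeset(entry)
print(change["files"])  # ['poem.txt', 'pome.txt']
```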
+
+The nodeid of the changeset, therefore, gives us a globally unique
+identifier for any particular change.  Changesets have a
+Subversion-like incrementing change number, but it is peculiar to that
+repository. The nodeid, however, is global.
+
+One more detail remains to complete the picture. How do we get back
+from a particular file change to find the responsible changeset? Each
+revlog change has a linkrev entry that does just this.
+
+So, now we have a repository with a history of the changes applied to
+that repository. Each change has a unique identifier. If we find that
+change in another repository, it means that at the point in the other
+repository we have exactly the same state; the file contents and
+history are identical.
+
+At this point we can see how pulling changes from another repository
+works. Mercurial has to determine which changesets in the source
+repository are missing in the target repository. To do this, for each
+head in the source repo it has to find the most recent change in that
+head that is already present in the target repo, and get any remaining
+changes after that point. These changes are then copied over and
+applied.
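+
+The search can be sketched in a few lines of Python. This is a toy
+that holds both repositories in memory, which the real thing cannot do
+since the other repository may be across a network; the function name
+and data layout are made up. The identifiers are the ones from the
+example repository, with the target being a clone taken before the
+branch and merge were committed.
+
```python
def missing_changesets(source_parents, source_heads, target_nodes):
    """List source changesets the target lacks, parents before children."""
    missing = []
    seen = set()

    def visit(node):
        # Stop as soon as we reach something the target already has.
        if node in seen or node in target_nodes:
            return
        seen.add(node)
        for parent in source_parents[node]:
            visit(parent)
        missing.append(node)

    for head in source_heads:
        visit(head)
    return missing


source_parents = {
    "33596ef855c1": [],
    "3d65e7a57890": ["33596ef855c1"],
    "ff97668b7422": ["3d65e7a57890"],
    "a065eb26e6b9": ["ff97668b7422"],
    "267d32f158b3": ["3d65e7a57890"],
    "792ab970fc80": ["267d32f158b3", "a065eb26e6b9"],
}
target_nodes = {"33596ef855c1", "3d65e7a57890", "ff97668b7422", "a065eb26e6b9"}

print(missing_changesets(source_parents, ["792ab970fc80"], target_nodes))
# ['267d32f158b3', '792ab970fc80']
```
+
+Because the identifier covers contents and history, finding a known
+identifier is enough to stop: everything behind it must match too.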
+
+The Mercurial revlog format has proved remarkably durable. Over the
+lifetime of Mercurial, there have been just two changes to the file
+format. And one of those (a very recent change at the time of
+writing, yet to appear in a release version) is a very small change to
+filename storage required to deal with Windows-specific issues.