comparison Hg.txt @ 0:48d338d29ce9

First comitted version.
author Jim Hague <jim.hague@acm.org>
date Thu, 11 Dec 2008 10:15:27 +0000
parents
children 608947872f72
1 Inside a distributed version control system
2 ===========================================
3
4 Grinton Lodge is a Youth Hostel that sits on an exposed hillside just
5 above the small hamlet of Grinton in Swaledale, in the Yorkshire Dales
6 National Park. A former Victorian shooting lodge, it now welcomes
7 walkers and other travellers from around the world.
8
9 Tonight, a Wednesday in mid-November, is not one of its busiest
10 nights. Kat, the duty staff member, tells me that there is a small
11 corporate team-building group in the annex. There's no sign of them at
12 present. Otherwise, that portion of the world that has beaten a path
13 to the door of this grand building today consists of just me. And Kat
14 goes home soon.
15
16 The November CVu, removed from its wrappers and read yesterday, lies
17 in my bag. Taunting me. Go on, it says, if you're ever going to put
18 finger to keyboard in the name of CVu, well, tonight you are out of
19 excuses.
20
21 Bugger.
22
23 Let's look into Mercurial
24 -------------------------
25
26 Mercurial is a Distributed Version Control System (DVCS). It's one of a
27 number of DVCSs that have gained significant popularity in the
28 last few years. I switched a significant work project over to Mercurial
29 (from Subversion) over a year ago, because a customer site required
30 on-site work but could not allow access back to the company VPN. I
31 chose Mercurial for a variety of reasons which I won't bore you with
32 here. If you must know, see the box.
33
34 What I want to do in this article is give you an insight into how a
35 DVCS works. OK, so specifically I'm going to be talking about
36 Mercurial, but Git and Bazaar attack the problem in a similar way. But
37 first I'd better give you some idea of how you use Mercurial.
38
39 ::::
40 Box: OK, if you must know:
41
42 o Implementability. I needed the system to work on Windows, Linux and
43 AIX. The latter was not one of the directly supported platforms for
44 any of the candidates. Git's implementation uses a horde of
45 tools. Bazaar requires only Python, but required Python 2.4 while IBM
46 stubbornly still supplies only Python 2.3. Mercurial requires Python
47 2.3 or greater, and uses some C for speed.
48
49 o Simplicity. From the command line, Mercurial's core operations will
50 be familiar to a Subversion user. This is also true of Bazaar, but was
51 less true of Git. Git has improved in this matter since then, but a Mr
52 Winder of this parish tells me that it's still possible to seriously
53 embarrass yourself. There was also a lack of Windows support for Git at
54 the time.
55
56 o Speed. Mercurial is fast. In the same ballpark as Git. Bazaar
57 wasn't, and although it has improved significantly, has, in my
58 estimation, added user complexity in the process, and is still off the
59 pace for some operations.
60
61 o Documentation. At the time, Bryan O'Sullivan's excellent Mercurial
62 book (http://hgbook.red-bean.com) was a clear winner for best
63 documentation.
64 ::::
65
66 The 5 minute Mercurial overview
67 -------------------------------
68
69 I think it unlikely that someone possessing the taste and discernment
70 to be reading CVu would not be familiar with at least one version
71 control system. So, while I want to give you a flavour of what it's
72 like to use, I'm not going to hang about. If you'd like a proper
73 introduction, or you don't follow something, I thoroughly recommend
74 you consult the Mercurial book.
75
76 To start using Mercurial to keep track of a project:
77
78 $ hg init
79 $
80
81 This creates the repository root in the current directory.
82
83 Like CVS with its CVS directory and Subversion with its .svn
84 directory, Mercurial keeps its private data in a directory. Mercifully
85 there is only one of these, in the top level of your project. And
86 rather than holding details of where the actual repository is to be
87 found, the .hg directory holds the entire repository.
88
89 Next you need to specify the files you want Mercurial to track.
90
91 $ echo "There was a gibbon one morning" > pome.txt
92 $ hg add pome.txt
93 $
94
95 As you might expect, this marks the files as to be added. And as you
96 might also expect, you need to commit to record the added files in the
97 repository. The commit comment can be supplied on the command line; if
98 you don't supply a comment, you'll be dropped into an editor to
99 provide one.
100
101 There is a suggested format for these messages - a one line summary
102 followed by any more required detail on following lines. By default
103 Mercurial will only display the first line of commit messages when
104 listing changes. In these examples I'll stick to terse messages, and
105 I'll enter them from the command line.
106
107 $ hg commit -m "My Pome" -u "Jim Hague <jim.hague@acm.org>"
108 $
109
110 Mercurial records the user making the change as part of the change
111 information. It is usual to give your name and email address as I've
112 done here. You can imagine, though, that constantly having to repeat
113 this is a bit tedious, so you can set a default user name in a
114 configuration file. Mercurial keeps global, user and repository
115 configurations, and it can go in any of those.
116
117 As with Subversion, after further edits you can see how your working copy
118 differs from the repository.
119
120 $ hg status
121 M pome.txt
122 $ hg diff
123 diff -r 33596ef855c1 pome.txt
124 --- a/pome.txt Wed Apr 23 22:36:33 2008 +0100
125 +++ b/pome.txt Wed Apr 23 22:48:01 2008 +0100
126 @@ -1,1 +1,2 @@ There was a gibbon one morning
127 There was a gibbon one morning
128 +said "I think I will fly to the moon".
129 $ hg commit -m "A great second line"
130 $
131
132 And look through a log of changes.
133
134 $ hg log
135 changeset: 1:3d65e7a57890
136 tag: tip
137 user: Jim Hague <jim.hague@acm.org>
138 date: Wed Apr 23 22:49:10 2008 +0100
139 summary: A great second line
140
141 changeset: 0:33596ef855c1
142 user: Jim Hague <jim.hague@acm.org>
143 date: Wed Apr 23 22:36:33 2008 +0100
144 summary: My Pome
145
146 $
147
148 There are some items here that need an explanation.
149
150 The changeset identifier is in fact two identifiers separated by a
151 colon. The first is the sequence number of the changeset in the
152 repository, and is directly comparable to the change number in a
153 Subversion repository. The second is a globally unique identifier for
154 that change. As the change is copied from one repository to another
155 (this is a distributed system, remember, even if we haven't come to
156 that bit yet), its sequence number in any particular repository will
157 change, but the global identifier will always remain the same.
158
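A minimal Python sketch of the two-part identifier, assuming a repository simply numbers changesets in the order it receives them. The function name and the second repository's arrival order are invented for illustration; the hashes are ones that appear in this article's logs.

```python
def local_numbers(arrival_order):
    """Map each global changeset id to its local sequence number.

    Assumption: a repository numbers changesets in the order it received them.
    """
    return {node: rev for rev, node in enumerate(arrival_order)}


# Repository A: five changes in the order they were committed there.
repo_a = local_numbers([
    "33596ef855c1", "3d65e7a57890", "ff97668b7422",
    "a065eb26e6b9", "267d32f158b3",
])

# Hypothetical repository B: the same five changes, received in another order.
repo_b = local_numbers([
    "33596ef855c1", "3d65e7a57890", "267d32f158b3",
    "ff97668b7422", "a065eb26e6b9",
])

# Same global id, different local sequence numbers.
print(repo_a["267d32f158b3"], repo_b["267d32f158b3"])   # 4 2
```
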
159 'tip' is a Mercurial term. It means simply the most recent change.
160
161 Want to rename a file?
162
163 $ hg mv pome.txt poem.txt
164 $ hg status
165 A poem.txt
166 R pome.txt
167 $ hg commit -m "Rename my file"
168 $
169
170 (The command to rename a file is actually 'hg rename', but Mercurial
171 saves Unix-trained fingers from typing embarrassment.)
172
173 At this point you may be wondering about directories. 'hg mkdir'
174 perhaps? Well, no. Mercurial only tracks files. To be sure, the
175 directory a file occupies is tracked, but effectively only as a
176 component of the file name. This has the slightly unexpected result
177 that you can't record an empty directory in your repository.
178 (Footnote: I tripped over this converting a work Subversion
179 repository. One possibility is to create a placeholder file in the
180 directory. In the event I created the directory (which receives build
181 products) as part of the build instead.)
182
183 Given this, and the status output above that suggests strongly that
184 Mercurial treats a rename as a copy followed by a delete, you may be
185 worried that Mercurial won't cope at all well with rearranging your
186 repository. Relax. Mercurial does store the details of the rename as
187 part of the changeset, and copes very well with rearrangements.
188
189 (Footnote: The Mercurial designers justify not dealing with
190 directories as first class objects by pointing out that provided you
191 can correctly move files about in the tree, the other reasons for
192 tracking directories are uncommon and do not in their opinion justify
193 the considerable added complexity. So far I've found no reason to
194 doubt that judgement.)
195
196 Want to rewind the working copy to a previous revision?
197
198 $ hg update -r 1
199 1 files updated, 0 files merged, 1 files removed, 0 files unresolved
200 $
201
202 'hg update' updates the working files. In this case I'm specifying
203 that I want to go back to local changeset 1. I could also have typed
204 '-r 3d65e7a57890', or even '-r 3d'; when specifying the global change
205 identifier you only need to type enough digits to make it unique.
206
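The prefix lookup can be sketched in a few lines of Python. This is an illustration of the idea, not Mercurial's own code: resolve_prefix and KNOWN_IDS are hypothetical names, and the sketch ignores sequence numbers and tags, which the real -r option also accepts.

```python
def resolve_prefix(prefix, known_ids):
    """Return the single id in known_ids that starts with prefix."""
    matches = [node for node in known_ids if node.startswith(prefix)]
    if not matches:
        raise LookupError(f"unknown revision {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"ambiguous identifier {prefix!r}")
    return matches[0]


# Sample ids taken from the logs in this article.
KNOWN_IDS = ["33596ef855c1", "3d65e7a57890", "ff97668b7422", "a065eb26e6b9"]

print(resolve_prefix("3d", KNOWN_IDS))   # 3d65e7a57890
```

With these sample ids a bare "3" would be rejected as ambiguous, because two ids start with it.
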
207 This is all very well, but it's not exactly distributed, is it?
208
209 Copy an existing repository:
210
211 elsewhere$ hg clone ssh://jim.home.net/Poem Jim-Poem
212 updating working directory
213 1 files updated, 0 files merged, 0 files removed, 0 files unresolved
214
215 (You can access other repositories via the file system, over http or
216 over ssh).
217
218 elsewhere$ cd Jim-Poem
219 elsewhere$ hg log
220 changeset: 3:a065eb26e6b9
221 tag: tip
222 user: Jim Hague <jim.hague@acm.org>
223 date: Thu Apr 24 18:52:31 2008 +0100
224 summary: Rename my file
225
226 changeset: 2:ff97668b7422
227 user: Jim Hague <jim.hague@acm.org>
228 date: Thu Apr 24 18:50:22 2008 +0100
229 summary: Finished first verse
230
231 changeset: 1:3d65e7a57890
232 user: Jim Hague <jim.hague@acm.org>
233 date: Wed Apr 23 22:49:10 2008 +0100
234 summary: A great second line
235
236 changeset: 0:33596ef855c1
237 user: Jim Hague <jim.hague@acm.org>
238 date: Wed Apr 23 22:36:33 2008 +0100
239 summary: My Pome
240
241 'hg clone' is aptly named. It creates a new repository that contains
242 exactly the same changes as the source repository. You can make a
243 clone just by copying your project directory, if you're confident
244 nothing else will access it during the copy. 'hg clone' saves you this
245 worry, and sets the default push/pull location in the new repo to the
246 cloned repo.
247
248 From that point, you use 'hg pull' to collect changes from other
249 places into your repo (though note it does not by default update your
250 working copy), and, as you might guess, 'hg push' shoves your changes
251 into a foreign repository. By default these will act on the repository
252 you cloned from, but you can specify any other repository.
253
254 More on those in a moment. First, though, I want to show you something
255 you can't do in Subversion. Start with the repository with 4 changes
256 we just cloned. Let's focus on the first couple of lines.
257
258 $ hg update -r 1
259 1 files updated, 0 files merged, 1 files removed, 0 files unresolved
260
261 And make a change.
262
263 $ hg diff
264 diff -r 3d65e7a57890 pome.txt
265 --- a/pome.txt Wed Apr 23 22:49:10 2008 +0100
266 +++ b/pome.txt Thu Apr 24 19:13:14 2008 +0100
267 @@ -1,2 +1,2 @@ There was a gibbon one morning
268 -There was a gibbon one morning
269 -said "I think I will fly to the moon".
270 +There was a baboon who one afternoon
271 +said "I think I will fly to the sun".
272 $ hg commit -m "Better first two lines"
273 $
274
275 The alert among you will have sat up at that. Well done! Yes, there's
276 something very worrying. How can I commit a change at an old point?
277 If you try this in Subversion, it will complain mightily about your
278 file being out of date. But Mercurial just went ahead and did
279 something. The Bazaar experts among you will know that in Bazaar, if
280 you use 'bzr revert -r' to bring the working copy to a past revision,
281 make a change and commit, then your latest version will be the past
282 revision plus your change. Perhaps that's what Mercurial did?
283
284 No. What Mercurial did is central to Mercurial's view of the
285 world. You took your working copy back to an old changeset, and then
286 committed a fresh change based at that changeset. Mercurial actually
287 did just what you asked it to do, no more and no less. Let's see the
288 initial evidence.
289
290 $ hg heads
291 changeset: 4:267d32f158b3
292 tag: tip
293 parent: 1:3d65e7a57890
294 user: Jim Hague <jim.hague@acm.org>
295 date: Thu Apr 24 19:13:59 2008 +0100
296 summary: Better first two lines
297
298 changeset: 3:a065eb26e6b9
299 user: Jim Hague <jim.hague@acm.org>
300 date: Thu Apr 24 18:52:31 2008 +0100
301 summary: Rename my file
302
303 $
304
305 Time for some more Mercurial terminology. You can think of a 'head' in
306 Mercurial as the most recent change on a branch. In Mercurial, a
307 branch is simply what happens when you commit a change that has as its
308 parent a change that already has a child. Mercurial has a standard
309 extension 'hg glog' which uses some ASCII art to show the current
310 state:
311
312 $ hg glog
313 @ changeset: 4:267d32f158b3
314 | tag: tip
315 | parent: 1:3d65e7a57890
316 | user: Jim Hague <jim.hague@acm.org>
317 | date: Thu Apr 24 19:13:59 2008 +0100
318 | summary: Better first two lines
319 |
320 | o changeset: 3:a065eb26e6b9
321 | | user: Jim Hague <jim.hague@acm.org>
322 | | date: Thu Apr 24 18:52:31 2008 +0100
323 | | summary: Rename my file
324 | |
325 | o changeset: 2:ff97668b7422
326 |/ user: Jim Hague <jim.hague@acm.org>
327 | date: Thu Apr 24 18:50:22 2008 +0100
328 | summary: Finished first verse
329 |
330 o changeset: 1:3d65e7a57890
331 | user: Jim Hague <jim.hague@acm.org>
332 | date: Wed Apr 23 22:49:10 2008 +0100
333 | summary: A great second line
334 |
335 o changeset: 0:33596ef855c1
336 user: Jim Hague <jim.hague@acm.org>
337 date: Wed Apr 23 22:36:33 2008 +0100
338 summary: My Pome
339
340 $
341
342 'hg view' shows a nicer graphical view. (Footnote: Though, being
343 Tcl/Tk based, not that much nicer.)
344
345 So the change is in there. It's the latest change, and is simply on a
346 different branch to the other changes.
347
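A head can be found mechanically from the parent links alone. The Python sketch below assumes the graph is held as a plain dictionary from sequence number to parent list (PARENTS and heads are hypothetical names, not Mercurial internals) and uses the five changesets from the graph above.

```python
# Parent links for the five changesets in the graph above,
# keyed by local sequence number.
PARENTS = {0: [], 1: [0], 2: [1], 3: [2], 4: [1]}


def heads(parents):
    """Return the changesets that no other changeset names as a parent."""
    has_child = {p for plist in parents.values() for p in plist}
    return sorted((rev for rev in parents if rev not in has_child), reverse=True)


print(heads(PARENTS))                   # [4, 3]

# Adding a changeset whose parents are both heads leaves a single head.
print(heads({**PARENTS, 5: [4, 3]}))    # [5]
```
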
348 Almost invariably, you will want to bring everything back together and
349 merge the branches. A merge is a change that combines two heads back
350 into one. It prepares an updated working directory with the merged
351 contents of the two heads for you to review and, if satisfactory, commit.
352
353 $ hg merge
354 merging pome.txt and poem.txt
355 0 files updated, 1 files merged, 0 files removed, 0 files unresolved
356 (branch merge, don't forget to commit)
357 $ cat poem.txt
358 There was a baboon who one afternoon
359 said "I think I will fly to the sun".
360 So with two great palms strapped to his arms,
361 he started his takeoff run.
362 $ hg commit -m "Merge first line branch"
363 $
364
365 (Footnote: I'm no poet. The poem is, of course, 'Silly Old Baboon' by
366 the late, great, Spike Milligan.)
367
368 Here's the ASCII art again showing what just happened. Oh, and notice
369 that Mercurial has done the right thing with regard to the rename.
370
371 $ hg glog
372 @ changeset: 5:792ab970fc80
373 |\ tag: tip
374 | | parent: 4:267d32f158b3
375 | | parent: 3:a065eb26e6b9
376 | | user: Jim Hague <jim.hague@acm.org>
377 | | date: Thu Apr 24 19:29:53 2008 +0100
378 | | summary: Merge first line branch
379 | |
380 | o changeset: 4:267d32f158b3
381 | | parent: 1:3d65e7a57890
382 | | user: Jim Hague <jim.hague@acm.org>
383 | | date: Thu Apr 24 19:13:59 2008 +0100
384 | | summary: Better first two lines
385 | |
386 o | changeset: 3:a065eb26e6b9
387 | | user: Jim Hague <jim.hague@acm.org>
388 | | date: Thu Apr 24 18:52:31 2008 +0100
389 | | summary: Rename my file
390 | |
391 o | changeset: 2:ff97668b7422
392 |/ user: Jim Hague <jim.hague@acm.org>
393 | date: Thu Apr 24 18:50:22 2008 +0100
394 | summary: Finished first verse
395 |
396 o changeset: 1:3d65e7a57890
397 | user: Jim Hague <jim.hague@acm.org>
398 | date: Wed Apr 23 22:49:10 2008 +0100
399 | summary: A great second line
400 |
401 o changeset: 0:33596ef855c1
402 user: Jim Hague <jim.hague@acm.org>
403 date: Wed Apr 23 22:36:33 2008 +0100
404 summary: My Pome
405
406 $
407
408 So, our little branch change has now been merged back, and we have a
409 single line of development again. Notice that unlike the other
410 changesets, changeset 5 has two parent changesets, indicating it is a
411 merge changeset. You can only merge two branches in one operation; or
412 putting it another way, a changeset can have a maximum of two parents.
413
414 This behaviour is absolutely central to Mercurial's philosophy. If a
415 change is committed that takes as its starting point a change that
416 already has a child, then a branch gets created. Working with
417 Mercurial, branches get created frequently, and equally frequently
418 merged back. As befits any frequent operation, both are easy to do.
419
420 You're probably thinking at this point that making a commit onto
421 an old version is a slightly strange thing to do, and you'd be right.
422 But that's exactly what's going to happen the moment you go
423 distributed. Two people working independently with their own
424 repositories are going to make commits based, typically, on the latest
425 changes they happen to have incorporated into their tree. To be
426 Distributed, a DVCS has to deal with this. Mercurial faces it head-on.
427 When you pull changes into your repo (or someone else pushes them), if
428 any of the changes overlap - are both based on the same base change -
429 you get extra heads, and it's up to you to let these extra heads live
430 or merge, as you please.
431
432 In practice this is more manageable than you might think. Consider a
433 typical Mercurial usage, where the 'master' repo sits on a known
434 server, and everyone pulls changes from the master and pushes their
435 own efforts to the master. By default Mercurial won't let you push if
436 the receiving repo will gain an extra head as a result, so you
437 typically pull (and do any required merging) just before
438 pushing. Subversion users will recognise this pattern. Subversion
439 won't let you commit a change if your working copy is not at the very
440 latest revision, so the Subversion user will update, and merge if
441 necessary, just before committing.
442
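The push rule can be pictured as a head count before and after. The Python sketch below is a deliberate simplification under that assumption; the names (count_heads, push_adds_head, c0 to c5) are invented, and the real check has more cases to consider.

```python
def count_heads(parents):
    """Count changesets that are nobody's parent."""
    has_child = {p for plist in parents.values() for p in plist}
    return sum(1 for node in parents if node not in has_child)


def push_adds_head(remote, outgoing):
    """True if adding the outgoing changesets leaves remote with more heads."""
    combined = {**remote, **outgoing}
    return count_heads(combined) > count_heads(remote)


# The master repo holds a straight line of four changes.
master = {"c0": [], "c1": ["c0"], "c2": ["c1"], "c3": ["c2"]}

unmerged = {"c4": ["c1"]}                      # local work based on the old c1
merged = {"c4": ["c1"], "c5": ["c4", "c3"]}    # the same work plus a merge with c3

print(push_adds_head(master, unmerged))   # True: this push would be refused
print(push_adds_head(master, merged))     # False: master keeps a single head
```
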
443 What, then, about a branch in the conventional sense of '1.0
444 maintenance branch'? Typically in Mercurial you'd handle this by
445 keeping a separate cloned repository for those changes. Cloning is
446 fast and, if local, uses hard links where possible on filesystems that
447 support them, so it isn't necessarily extravagant on disc space. You can,
448 if you prefer, handle them all in a single repo with 'named
449 branches', but cloning is definitely simpler.
450
451 OK, so now you know the basics of using Mercurial. We can proceed to
452 looking at how this magic is achieved. In particular, where does this
453 magic globally unique identifier for a change come from?
454
455 Inside the Mercurial repo
456 -------------------------
457
458 The way Mercurial handles its repo is really quite simple.
459
460 That's simple, as in 'most things are simple once you know the
461 answer'. I found the explanation helpful, so this section attempts
462 the 10,000ft (FL100 if you prefer) view of Mercurial.
463
464 (Footnote: Bryan O'Sullivan's excellent Mercurial book has a chapter
465 on the subject, and the Mercurial website has a fair amount of detail
466 too. This is 'research', OK?)
467
468 First remember that any file or component can have at most two
469 parents. You can't merge more than one other branch at once.
470
471 We start with the basic building block, which Mercurial calls a
472 revlog. A revlog is a thing that holds a file and all the changes in
473 the file history. (Footnote: For any non-trivial file, this will
474 actually be two files on the disc, a data file and an index). The
475 revlog stores the (compressed) differences between successive versions
476 of the file, though it will periodically store a complete version of
477 the file instead of a difference, so that the content of any
478 particular file version can always be reconstructed without excessive
479 effort.
480
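To make the snapshot-plus-differences idea concrete, here is a toy Python version. It is a sketch under loud assumptions: TinyRevlog is a hypothetical class, it diffs whole lines with the standard difflib module, it does no compression, and it takes a full snapshot at a fixed interval, whereas how Mercurial itself decides when to snapshot is not covered here.

```python
import difflib


class TinyRevlog:
    """Toy store: full snapshots now and then, line-based deltas in between."""

    def __init__(self, snapshot_every=3):
        self.snapshot_every = snapshot_every   # assumption: fixed interval
        self.entries = []                      # ("full", lines) or ("delta", ops)

    def add(self, text):
        """Append a new version and return its revision number."""
        new = text.splitlines(keepends=True)
        rev = len(self.entries)
        if rev % self.snapshot_every == 0:
            self.entries.append(("full", new))
        else:
            old = self._lines(rev - 1)
            matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
            # Keep only the changed regions: (start, end, replacement lines).
            ops = [(i1, i2, new[j1:j2])
                   for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                   if tag != "equal"]
            self.entries.append(("delta", ops))
        return rev

    def _lines(self, rev):
        # Walk back to the nearest full snapshot, then replay the deltas.
        base = rev
        while self.entries[base][0] != "full":
            base -= 1
        lines = list(self.entries[base][1])
        for r in range(base + 1, rev + 1):
            # Apply from the end so that earlier indices stay valid.
            for i1, i2, replacement in reversed(self.entries[r][1]):
                lines[i1:i2] = replacement
        return lines

    def read(self, rev):
        """Reconstruct the full text of one revision."""
        return "".join(self._lines(rev))


first = "There was a gibbon one morning\n"
second = first + 'said "I think I will fly to the moon".\n'
third = second + "So with two great palms strapped to his arms,\n"

log = TinyRevlog(snapshot_every=3)
for text in (first, second, third, first):
    log.add(text)

print([kind for kind, _ in log.entries])   # ['full', 'delta', 'delta', 'full']
print(log.read(2) == third)                # True
```

Reading revision 2 starts from the snapshot at revision 0 and replays two deltas; the cost of a read is bounded by the distance to the nearest snapshot.
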
481 Under the secret-squirrel Mercurial .hg directory at the top of your
482 project is a store which holds a revlog for each file in your project.
483
484 Any point in the evolution of a revlog can be uniquely identified with
485 a nodeid. This is simply the SHA1 hash of the current file contents
486 concatenated with the nodeids of one or both parents of the current
487 revision. Note that this way, two file states are identical if and
488 only if the file contents are the same *and* the file has the
489 same history.
490
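As a rough Python illustration of how such an identifier can be derived: the sketch below hashes the two parent ids, in sorted order and with twenty zero bytes standing for a missing parent, followed by the file contents. That byte layout is an assumption as far as this article goes, and node_id and NULL_ID are names chosen for the example.

```python
import hashlib

NULL_ID = b"\0" * 20   # stands in for "no parent"


def node_id(contents, p1=NULL_ID, p2=NULL_ID):
    """SHA-1 over the two parent ids (in sorted order) followed by the contents."""
    first, second = sorted([p1, p2])
    return hashlib.sha1(first + second + contents).digest()


line = b"There was a gibbon one morning\n"
root = node_id(line)                  # first revision: no parents
same_text_later = node_id(line, p1=root)

# Identical contents, different history, therefore a different id.
print(root != same_text_later)        # True
```

Because the parents feed into the hash, an id vouches for the whole chain of revisions behind it, not just for the bytes of one version.
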
491 Here's a dump of a revlog index:
492
493 $ hg debugindex .hg/store/data/pome.txt.i
494 rev offset length base linkrev nodeid p1 p2
495 0 0 32 0 0 6bbbd5d6cc53 000000000000 000000000000
496 1 32 51 0 1 83d266583303 6bbbd5d6cc53 000000000000
497 2 83 84 0 2 14a54ec34bb6 83d266583303 000000000000
498 3 167 76 3 4 dc4df776b38b 83d266583303 000000000000
499 $
500
501 Note here that a file state can have two parents. If both the parent
502 nodeids are non-null, the file state has two parents, and the state is
503 therefore the result of a merge.
504
505 Let's dump out a revlog at a particular revision:
506
507 $ hg debugdata .hg/store/data/pome.txt.i 2
508 There was a gibbon one morning
509 said "I think I will fly to the moon".
510 So with two great palms strapped to his arms,
511 he started his takeoff run.
512 $
513
514 The next component is the manifest. This is simply a list of all the
515 files in the project, together with their current nodeids. The
516 manifest is a file, held in a revlog. The nodeid of the manifest,
517 therefore, identifies the project filesystem at a particular point.
518
519 $ hg debugdata .hg/store/00manifest.i 5
520 poem.txt5168b1a5e2f44aa4e0f164e170820845183f50c8
521 $
522
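The dump above runs the file name and the nodeid together. The Python sketch below assumes that is because the two are separated by a NUL byte, which does not print; parse_manifest is a hypothetical helper, and the raw bytes are reconstructed from the dump on that assumption.

```python
def parse_manifest(data):
    """Split raw manifest bytes into a {filename: hex nodeid} dictionary."""
    files = {}
    for line in data.split(b"\n"):
        if not line:
            continue
        # Assumption: a NUL byte separates the name from the 40-digit hex id.
        name, _, node = line.partition(b"\0")
        files[name.decode("utf-8")] = node[:40].decode("ascii")
    return files


raw = b"poem.txt\x005168b1a5e2f44aa4e0f164e170820845183f50c8\n"
print(parse_manifest(raw))
# {'poem.txt': '5168b1a5e2f44aa4e0f164e170820845183f50c8'}
```
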
523 Finally we have the changeset. This is the atomic collection of
524 changes to a repository that leads to a new revision. The changeset
525 info includes the nodeid of the corresponding manifest, the timestamp
526 and committer ID, a list of changed files and a comment. The changeset
527 also includes the nodeid of the parent changeset, or the two parents
528 if the change is a merge. The changeset description is held in a
529 revlog, the changelog.
530
531 $ hg debugdata .hg/store/00changelog.i 5
532 1ccc11b6f7308cc8fa1573c2f3811a4710c91e3e
533 Jim Hague <jim.hague@acm.org>
534 1209061793 -3600
535 poem.txt
536 pome.txt
537
538 Merge first line branch
539 $
540
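The layout in that dump is regular enough to pick apart by hand. A minimal Python sketch follows, assuming the fields always appear in the order shown (manifest nodeid, committer, timestamp and offset, changed files, blank line, comment) and that the offset is in seconds west of UTC; parse_changeset is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def parse_changeset(text):
    """Parse a changelog entry laid out as in the dump above."""
    header, _, description = text.partition("\n\n")
    manifest, user, date, *files = header.split("\n")
    seconds, offset = (int(field) for field in date.split()[:2])
    # Assumption: the offset counts seconds west of UTC, so negate it.
    when = datetime.fromtimestamp(seconds, timezone(timedelta(seconds=-offset)))
    return {
        "manifest": manifest,
        "user": user,
        "date": when,
        "files": files,
        "description": description.strip(),
    }


entry = parse_changeset(
    "1ccc11b6f7308cc8fa1573c2f3811a4710c91e3e\n"
    "Jim Hague <jim.hague@acm.org>\n"
    "1209061793 -3600\n"
    "poem.txt\n"
    "pome.txt\n"
    "\n"
    "Merge first line branch\n"
)
print(entry["date"].isoformat())   # 2008-04-24T19:29:53+01:00
print(entry["files"])              # ['poem.txt', 'pome.txt']
```

The decoded date agrees with the one 'hg glog' printed for changeset 5, which is some comfort that the offset has been read the right way round.
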
541 The nodeid of the changeset, therefore, gives us a globally unique
542 identifier for any particular change. Changesets have a
543 Subversion-like incrementing change number, but it is peculiar to that
544 repository. The nodeid, however, is global.
545
546 One more detail remains to complete the picture. How do we get back
547 from a particular file change to find the responsible changeset? Each
548 revlog change has a linkrev entry that does just this.
549
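Using the index dump shown earlier, the lookup amounts to reading one column. In this Python sketch POME_INDEX and changeset_for are illustrative names, and the rows are copied by hand from that dump.

```python
# (file revision, linkrev, nodeid) rows copied from the debugindex dump.
POME_INDEX = [
    (0, 0, "6bbbd5d6cc53"),
    (1, 1, "83d266583303"),
    (2, 2, "14a54ec34bb6"),
    (3, 4, "dc4df776b38b"),
]


def changeset_for(index, file_nodeid):
    """Return the changelog revision number that introduced a file revision."""
    for _rev, linkrev, nodeid in index:
        if nodeid == file_nodeid:
            return linkrev
    raise LookupError(f"no file revision {file_nodeid!r}")


# File revision 3 was introduced by changeset 4; changeset 3 did not touch
# this revlog's contents.
print(changeset_for(POME_INDEX, "dc4df776b38b"))   # 4
```
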
550 So, now we have a repository with a history of the changes applied to
551 that repository. Each change has a unique identifier. If we find that
552 change in another repository, it means that at that point in the other
553 repository we have exactly the same state; the file contents and
554 history are identical.
555
556 At this point we can see how pulling changes from another repository
557 works. Mercurial has to determine which changesets in the source
558 repository are missing in the target repository. To do this, for each
559 head in the source repo it has to find the most recent change in that
560 head that is already present in the target repo, and get any remaining
561 changes after that point. These changes are then copied over and
562 applied.
563
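The search just described can be sketched in Python as a walk back from each source head that stops as soon as it meets a changeset the target already holds. This is an in-memory illustration only: missing_changesets is a hypothetical function, both repositories are reduced to dictionaries and sets, and nothing here reflects how the exchange is carried out between real repositories.

```python
def missing_changesets(source, target_ids):
    """List changesets in source that target lacks, parents before children.

    source maps each changeset id to its parent ids, in the order the
    source repository recorded them (so parents always come first).
    """
    has_child = {p for parents in source.values() for p in parents}
    heads = [node for node in source if node not in has_child]

    missing = set()
    stack = list(heads)
    while stack:
        node = stack.pop()
        if node in missing or node in target_ids:
            continue   # already visited, or the target has it and its history
        missing.add(node)
        stack.extend(source[node])
    # Report in source order so each change arrives after its parents.
    return [node for node in source if node in missing]


source = {
    "33596ef855c1": [],
    "3d65e7a57890": ["33596ef855c1"],
    "ff97668b7422": ["3d65e7a57890"],
    "a065eb26e6b9": ["ff97668b7422"],
    "267d32f158b3": ["3d65e7a57890"],
    "792ab970fc80": ["267d32f158b3", "a065eb26e6b9"],
}
target_ids = {"33596ef855c1", "3d65e7a57890", "ff97668b7422", "a065eb26e6b9"}

print(missing_changesets(source, target_ids))
# ['267d32f158b3', '792ab970fc80']
```

Stopping at the first known changeset is safe only because a shared identifier implies a shared history behind it.
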
564 The Mercurial revlog format has proved remarkably durable. Over the
565 lifetime of Mercurial, there have been just two changes to the file
566 format. And one of those (a very recent change at the time of
567 writing, yet to appear in a release version) is a very small change to
568 filename storage required to deal with Windows-specific issues.