Inside a distributed version control system
===========================================

Grinton Lodge is a Youth Hostel that sits on an exposed hillside just
above the small hamlet of Grinton in Swaledale, in the Yorkshire Dales
National Park. A former Victorian shooting lodge, it now welcomes
walkers and other travellers from around the world.

Tonight, a Wednesday in mid-November, is not one of its busiest
nights. Kat, the duty staff member, tells me that there is a small
corporate team-building group in the annex. There's no sign of them at
present. Otherwise, that portion of the world that has beaten a path
to the door of this grand building today consists of just me. And Kat
goes home soon.

The November CVu, removed from its wrappers and read yesterday, lies
in my bag. Taunting me. Go on, it says, if you're ever going to put
finger to keyboard in the name of CVu, well, tonight you are out of
excuses.

Bugger.

Let's look into Mercurial
-------------------------

Mercurial is a Distributed Version Control System (DVCS). It's one of a
number of DVCSs that have gained considerable popularity in the
last few years. I switched a significant work project over to Mercurial
(from Subversion) over a year ago, because a customer site required
on-site work but could not allow access back to the company VPN. I
chose Mercurial for a variety of reasons which I won't bore you with
here. If you must know, see the box.

What I want to do in this article is give you an insight into how a
DVCS works. OK, so specifically I'm going to be talking about
Mercurial, but Git and Bazaar attack the problem in a similar way. But
first I'd better give you some idea of how you use Mercurial.

::::
Box: OK, if you must know:

o Implementability. I needed the system to work on Windows, Linux and
AIX. AIX was not one of the directly supported platforms for any of
the candidates. Git's implementation uses a horde of tools. Bazaar
requires only Python, but needed Python 2.4, while IBM stubbornly
still supplies only Python 2.3. Mercurial requires Python 2.3 or
greater, and uses some C for speed.

o Simplicity. From the command line, Mercurial's core operations will
be familiar to a Subversion user. This is also true of Bazaar, but was
less true of Git. Git has improved in this matter since then, but a Mr
Winder of this parish tells me that it's still possible to seriously
embarrass yourself. Git also lacked Windows support at the time.

o Speed. Mercurial is fast, in the same ballpark as Git. Bazaar
wasn't, and although it has improved significantly, it has, in my
estimation, added user complexity in the process, and is still off the
pace for some operations.

o Documentation. At the time, Bryan O'Sullivan's excellent Mercurial
book (http://hgbook.red-bean.com) was a clear winner for best
documentation.
::::

The 5-minute Mercurial overview
-------------------------------

I think it unlikely that someone possessing the taste and discernment
to be reading CVu would not be familiar with at least one version
control system. So, while I want to give you a flavour of what it's
like to use, I'm not going to hang about. If you'd like a proper
introduction, or you don't follow something, I thoroughly recommend
you consult the Mercurial book.

To start using Mercurial to keep track of a project:

    $ hg init
    $

This creates a new repository, rooted in the current directory.

Like CVS with its CVS directory and Subversion with its .svn
directory, Mercurial keeps its private data in a directory. Mercifully
there is only one of these, in the top level of your project. And
rather than holding details of where the actual repository is to be
found, the .hg directory holds the entire repository.

Next you need to specify the files you want Mercurial to track.

    $ echo "There was a gibbon one morning" > pome.txt
    $ hg add pome.txt
    $

As you might expect, this marks the file as to be added. And as you
might also expect, you need to commit to record the added file in the
repository. The commit comment can be supplied on the command line; if
you don't supply a comment, you'll be dropped into an editor to
provide one.

There is a suggested format for these messages - a one-line summary
followed by any further detail on the following lines. By default
Mercurial will only display the first line of commit messages when
listing changes. In these examples I'll stick to terse messages, and
I'll enter them from the command line.

    $ hg commit -m "My Pome" -u "Jim Hague <jim.hague@acm.org>"
    $

Mercurial records the user making the change as part of the change
information. It is usual to give your name and email address as I've
done here. You can imagine, though, that constantly having to repeat
this is a bit tedious, so you can set a default user name in a
configuration file. Mercurial keeps global, user and repository
configuration files, and the setting can go in any of those.

As with Subversion, after further edits you can see how your working
copy differs from the repository.

    $ hg status
    M pome.txt
    $ hg diff
    diff -r 33596ef855c1 pome.txt
    --- a/pome.txt  Wed Apr 23 22:36:33 2008 +0100
    +++ b/pome.txt  Wed Apr 23 22:48:01 2008 +0100
    @@ -1,1 +1,2 @@ There was a gibbon one morning
     There was a gibbon one morning
    +said "I think I will fly to the moon".
    $ hg commit -m "A great second line"
    $

And look through a log of changes.

    $ hg log
    changeset:   1:3d65e7a57890
    tag:         tip
    user:        Jim Hague <jim.hague@acm.org>
    date:        Wed Apr 23 22:49:10 2008 +0100
    summary:     A great second line

    changeset:   0:33596ef855c1
    user:        Jim Hague <jim.hague@acm.org>
    date:        Wed Apr 23 22:36:33 2008 +0100
    summary:     My Pome

    $

There are some items here that need an explanation.

The changeset identifier is in fact two identifiers separated by a
colon. The first is the sequence number of the changeset in the
repository, and is directly comparable to the change number in a
Subversion repository. The second is a globally unique identifier for
that change. As the change is copied from one repository to another
(this is a distributed system, remember, even if we haven't come to
that bit yet), its sequence number may differ from one repository to
the next, but the global identifier will always remain the same.

'tip' is a Mercurial term. It simply means the change most recently
added to the repository.

Want to rename a file?

    $ hg mv pome.txt poem.txt
    $ hg status
    A poem.txt
    R pome.txt
    $ hg commit -m "Rename my file"
    $

(The command to rename a file is actually 'hg rename', but Mercurial
saves Unix-trained fingers from typing embarrassment.)

At this point you may be wondering about directories. 'hg mkdir'
perhaps? Well, no. Mercurial only tracks files. To be sure, the
directory a file occupies is tracked, but effectively only as a
component of the file name. This has the slightly unexpected result
that you can't record an empty directory in your repository.
(Footnote: I tripped over this converting a work Subversion
repository. One possibility is to create a placeholder file in the
directory. In the event I created the directory (which receives build
products) as part of the build instead.)

Given this, and the status output above, which strongly suggests that
Mercurial treats a rename as a copy followed by a delete, you may be
worried that Mercurial won't cope at all well with rearranging your
repository. Relax. Mercurial does store the details of the rename as
part of the changeset, and copes very well with rearrangements.

(Footnote: The Mercurial designers justify not dealing with
directories as first-class objects by pointing out that provided you
can correctly move files about in the tree, the other reasons for
tracking directories are uncommon and do not in their opinion justify
the considerable added complexity. So far I've found no reason to
doubt that judgement.)

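Where do the rename details actually live? As I understand Mercurial's file storage, the first revision of the new file carries a small metadata header. The header sits between two lines holding just the byte \x01, and records the source file name ('copy') and the nodeid of the source revision ('copyrev'). Treat that layout as an assumption. The Python sketch below packs and unpacks such a header; the function names are hypothetical.

```python
# Sketch of copy metadata as I understand Mercurial's file storage: an
# optional header between two b"\x01\n" marker lines holding "key: value"
# pairs, followed by the file text. Function names are hypothetical.

MARKER = b"\x01\n"


def pack_copy_meta(text: bytes, source: str, source_node_hex: str) -> bytes:
    """Prefix file text with metadata recording where it was copied from."""
    meta = {"copy": source, "copyrev": source_node_hex}
    lines = "".join(f"{key}: {value}\n" for key, value in sorted(meta.items()))
    return MARKER + lines.encode() + MARKER + text


def unpack_copy_meta(stored: bytes) -> tuple[dict, bytes]:
    """Split stored revision data back into (metadata, file text)."""
    if not stored.startswith(MARKER):
        return {}, stored  # no header: plain file contents
    end = stored.index(MARKER, len(MARKER))  # find the closing marker
    meta = {}
    for line in stored[len(MARKER):end].decode().splitlines():
        key, value = line.split(": ", 1)
        meta[key] = value
    return meta, stored[end + len(MARKER):]


# Hypothetical source nodeid, for illustration only.
stored = pack_copy_meta(b"There was a baboon\n", "pome.txt", "ab" * 20)
meta, text = unpack_copy_meta(stored)
print(meta["copy"], text)
```

Because the header travels with the file revision, history commands can follow a renamed file back through its old name.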
Want to rewind the working copy to a previous revision?

    $ hg update -r 1
    1 files updated, 0 files merged, 1 files removed, 0 files unresolved
    $

'hg update' updates the working files. In this case I'm specifying
that I want to go back to local changeset 1. I could also have typed
'-r 3d65e7a57890', or even '-r 3d'; when specifying the global change
identifier you only need to type enough digits to make it unique.

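Resolving an abbreviated identifier amounts to a unique-prefix search over the known changeset hashes. The Python sketch below shows the idea. 'resolve_prefix' is a hypothetical helper, and it ignores the other things Mercurial accepts after '-r', such as local revision numbers and tags like 'tip'. The ids are the 12-digit short forms from the log above, not the full 40-digit hashes.

```python
def resolve_prefix(prefix: str, nodeids: list[str]) -> str:
    """Return the single known nodeid that starts with prefix.

    Hypothetical helper: raises LookupError if nothing matches, or if
    the prefix matches more than one nodeid.
    """
    matches = [n for n in nodeids if n.startswith(prefix.lower())]
    if not matches:
        raise LookupError(f"unknown revision {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"ambiguous identifier {prefix!r}")
    return matches[0]


# The two changesets from the log above.
history = ["33596ef855c1", "3d65e7a57890"]
print(resolve_prefix("3d", history))  # 3d65e7a57890
```

With these two changesets, '3d' is unique but '3' is not, so the sketch raises LookupError for the latter, much as Mercurial refuses an ambiguous identifier.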
This is all very well, but it's not exactly distributed, is it?

Copy an existing repository:

    elsewhere$ hg clone ssh://jim.home.net/Poem Jim-Poem
    updating working directory
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved

(You can access other repositories via the file system, over HTTP or
over SSH.)

    elsewhere$ cd Jim-Poem
    elsewhere$ hg log
    changeset:   3:a065eb26e6b9
    tag:         tip
    user:        Jim Hague <jim.hague@acm.org>
    date:        Thu Apr 24 18:52:31 2008 +0100
    summary:     Rename my file

    changeset:   2:ff97668b7422
    user:        Jim Hague <jim.hague@acm.org>
    date:        Thu Apr 24 18:50:22 2008 +0100
    summary:     Finished first verse

    changeset:   1:3d65e7a57890
    user:        Jim Hague <jim.hague@acm.org>
    date:        Wed Apr 23 22:49:10 2008 +0100
    summary:     A great second line

    changeset:   0:33596ef855c1
    user:        Jim Hague <jim.hague@acm.org>
    date:        Wed Apr 23 22:36:33 2008 +0100
    summary:     My Pome

'hg clone' is aptly named. It creates a new repository that contains
exactly the same changes as the source repository. You can make a
clone just by copying your project directory, if you're confident
nothing else will access it during the copy. 'hg clone' saves you this
worry, and sets the default push/pull location in the new repo to the
repo it was cloned from.

From that point, you use 'hg pull' to collect changes from other
places into your repo (though note it does not by default update your
working copy), and, as you might guess, 'hg push' shoves your changes
into a foreign repository. By default these will act on the repository
you cloned from, but you can specify any other repository.

More on those in a moment. First, though, I want to show you something
you can't do in Subversion. Start with the four-change repository we
just cloned. Let's focus on the first couple of lines of the poem.

    $ hg update -r 1
    1 files updated, 0 files merged, 1 files removed, 0 files unresolved

And make a change.

    $ hg diff
    diff -r 3d65e7a57890 pome.txt
    --- a/pome.txt  Wed Apr 23 22:49:10 2008 +0100
    +++ b/pome.txt  Thu Apr 24 19:13:14 2008 +0100
    @@ -1,2 +1,2 @@ There was a gibbon one morning
    -There was a gibbon one morning
    -said "I think I will fly to the moon".
    +There was a baboon who one afternoon
    +said "I think I will fly to the sun".
    $ hg commit -m "Better first two lines"
    $

The alert among you will have sat up at that. Well done! Yes, there's
something very worrying. How can I commit a change at an old point?
If you try this in Subversion, it will complain mightily about your
file being out of date. But Mercurial just went ahead and did
something. The Bazaar experts among you will know that in Bazaar, if
you use 'bzr revert -r' to bring the working copy to a past revision,
make a change and commit, then your latest version will be the past
revision plus your change. Perhaps that's what Mercurial did?

No. What Mercurial did is central to Mercurial's view of the
world. You took your working copy back to an old changeset, and then
committed a fresh change based on that changeset. Mercurial did
just what you asked it to do, no more and no less. Let's see the
initial evidence.

    $ hg heads
    changeset:   4:267d32f158b3
    tag:         tip
    parent:      1:3d65e7a57890
    user:        Jim Hague <jim.hague@acm.org>
    date:        Thu Apr 24 19:13:59 2008 +0100
    summary:     Better first two lines

    changeset:   3:a065eb26e6b9
    user:        Jim Hague <jim.hague@acm.org>
    date:        Thu Apr 24 18:52:31 2008 +0100
    summary:     Rename my file

    $

Time for some more Mercurial terminology. You can think of a 'head' in
Mercurial as the most recent change on a branch. In Mercurial, a
branch is simply what happens when you commit a change whose parent
already has a child. Mercurial ships with a standard extension,
graphlog, whose 'hg glog' command uses some ASCII art to show the
current state:

    $ hg glog
    @  changeset:   4:267d32f158b3
    |  tag:         tip
    |  parent:      1:3d65e7a57890
    |  user:        Jim Hague <jim.hague@acm.org>
    |  date:        Thu Apr 24 19:13:59 2008 +0100
    |  summary:     Better first two lines
    |
    | o  changeset:   3:a065eb26e6b9
    | |  user:        Jim Hague <jim.hague@acm.org>
    | |  date:        Thu Apr 24 18:52:31 2008 +0100
    | |  summary:     Rename my file
    | |
    | o  changeset:   2:ff97668b7422
    |/   user:        Jim Hague <jim.hague@acm.org>
    |    date:        Thu Apr 24 18:50:22 2008 +0100
    |    summary:     Finished first verse
    |
    o  changeset:   1:3d65e7a57890
    |  user:        Jim Hague <jim.hague@acm.org>
    |  date:        Wed Apr 23 22:49:10 2008 +0100
    |  summary:     A great second line
    |
    o  changeset:   0:33596ef855c1
       user:        Jim Hague <jim.hague@acm.org>
       date:        Wed Apr 23 22:36:33 2008 +0100
       summary:     My Pome

    $

'hg view' shows a nicer graphical view. (Footnote: Though, being
Tcl/Tk based, not that much nicer.)

So the change is in there. It's the latest change, and is simply on a
different branch to the other changes.

Almost invariably, you will want to bring everything back together and
merge the branches. A merge is a change that combines two heads back
into one. 'hg merge' prepares an updated working directory with the
merged contents of the two heads for you to review and, if
satisfactory, commit.

    $ hg merge
    merging pome.txt and poem.txt
    0 files updated, 1 files merged, 0 files removed, 0 files unresolved
    (branch merge, don't forget to commit)
    $ cat poem.txt
    There was a baboon who one afternoon
    said "I think I will fly to the sun".
    So with two great palms strapped to his arms,
    he started his takeoff run.
    $ hg commit -m "Merge first line branch"
    $

(Footnote: I'm no poet. The poem is, of course, 'Silly Old Baboon' by
the late, great Spike Milligan.)

Here's the ASCII art again showing what just happened. Oh, and notice
that Mercurial has done the right thing with regard to the rename.

    $ hg glog
    @    changeset:   5:792ab970fc80
    |\   tag:         tip
    | |  parent:      4:267d32f158b3
    | |  parent:      3:a065eb26e6b9
    | |  user:        Jim Hague <jim.hague@acm.org>
    | |  date:        Thu Apr 24 19:29:53 2008 +0100
    | |  summary:     Merge first line branch
    | |
    | o  changeset:   4:267d32f158b3
    | |  parent:      1:3d65e7a57890
    | |  user:        Jim Hague <jim.hague@acm.org>
    | |  date:        Thu Apr 24 19:13:59 2008 +0100
    | |  summary:     Better first two lines
    | |
    o |  changeset:   3:a065eb26e6b9
    | |  user:        Jim Hague <jim.hague@acm.org>
    | |  date:        Thu Apr 24 18:52:31 2008 +0100
    | |  summary:     Rename my file
    | |
    o |  changeset:   2:ff97668b7422
    |/   user:        Jim Hague <jim.hague@acm.org>
    |    date:        Thu Apr 24 18:50:22 2008 +0100
    |    summary:     Finished first verse
    |
    o  changeset:   1:3d65e7a57890
    |  user:        Jim Hague <jim.hague@acm.org>
    |  date:        Wed Apr 23 22:49:10 2008 +0100
    |  summary:     A great second line
    |
    o  changeset:   0:33596ef855c1
       user:        Jim Hague <jim.hague@acm.org>
       date:        Wed Apr 23 22:36:33 2008 +0100
       summary:     My Pome

    $

So, our little branch change has now been merged back, and we have a
single line of development again. Notice that unlike the other
changesets, changeset 5 has two parent changesets, indicating it is a
merge changeset. You can only merge two branches in one operation; or,
putting it another way, a changeset can have at most two parents.

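Finding the heads needs nothing more than the parent links: a head is any changeset that no other changeset names as a parent. Here's a minimal Python sketch. For brevity it uses this example's local revision numbers as identifiers. That is an assumption made for illustration; a real repository would work with nodeids.

```python
def heads(parents: dict[int, tuple[int, ...]]) -> list[int]:
    """Return the changesets that have no children, in revision order."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(rev for rev in parents if rev not in has_child)


# Parent links from the glog output above; revision 0 has no parents.
graph = {0: (), 1: (0,), 2: (1,), 3: (2,), 4: (1,)}
print(heads(graph))  # [3, 4]: two heads before the merge

graph[5] = (4, 3)  # the merge changeset has two parents
print(heads(graph))  # [5]: a single line of development again
```

Notice that the branch itself needs no bookkeeping. It exists only because revision 1 ended up with two children.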
This behaviour is absolutely central to Mercurial's philosophy. If a
change is committed that takes as its starting point a change that
already has a child, then a branch gets created. When working with
Mercurial, branches are created frequently, and just as frequently
merged back. As befits any frequent operation, both are easy to do.

You're probably thinking at this point that making a commit onto an
old version is a slightly strange thing to do, and you'd be right.
But that's exactly what's going to happen the moment you go
distributed. Two people working independently with their own
repositories are going to make commits based, typically, on the latest
changes they happen to have incorporated into their tree. To be
distributed, a DVCS has to deal with this. Mercurial faces it head-on.
When you pull changes into your repo (or someone else pushes them), if
any of the incoming changes overlap with yours - that is, both are
based on the same change - you get extra heads, and it's up to you
whether to let these extra heads live or to merge them.

In practice this is more manageable than you might think. Consider a
typical Mercurial setup, where the 'master' repo sits on a known
server, and everyone pulls changes from the master and pushes their
own efforts to the master. By default Mercurial won't let you push if
the receiving repo would gain an extra head as a result, so you
typically pull (and do any required merging) just before pushing.
Subversion users will recognise this pattern. Subversion won't let you
commit a change to a file that is out of date, so the Subversion user
will update, and merge if necessary, just before committing.

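The push rule can be stated in terms of heads. Work out the heads the receiving repository would have after taking your changesets, and refuse if there would be more than it has now. Here's a minimal Python sketch of that check. The function names and the 'alice'/'bob' history are hypothetical, and Mercurial's real check (which 'hg push -f' overrides) has more cases to consider.

```python
def heads(parents: dict[str, tuple[str, ...]]) -> set[str]:
    """Changesets that no other changeset names as a parent."""
    has_child = {p for ps in parents.values() for p in ps}
    return set(parents) - has_child


def push_allowed(remote: dict, outgoing: dict) -> bool:
    """Refuse a push that would leave the remote with more heads."""
    after = {**remote, **outgoing}
    return len(heads(after)) <= len(heads(remote))


# Hypothetical history: the master repo already has Bob's change on r1.
master = {"r0": (), "r1": ("r0",), "bob": ("r1",)}

# Alice committed on r1 without pulling first: pushing would add a head.
print(push_allowed(master, {"alice": ("r1",)}))  # False

# After pulling and merging, she pushes her change plus the merge.
print(push_allowed(master, {"alice": ("r1",), "merge": ("alice", "bob")}))  # True
```

This is the same pull, merge, push rhythm described above, expressed as a head count.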
What, then, about a branch in the conventional sense of '1.0
maintenance branch'? Typically in Mercurial you'd handle this by
keeping a separate cloned repository for those changes. Cloning is
fast, and a local clone uses hard links where the filesystem supports
them, so it isn't necessarily extravagant on disc space. You can, if
you prefer, handle them all in a single repo with 'named branches',
but cloning is definitely simpler.

OK, so now you know the basics of using Mercurial. We can proceed to
looking at how this magic is achieved. In particular, where does the
globally unique identifier for a change come from?

Inside the Mercurial repo
-------------------------

The way Mercurial handles its repo is really quite simple.

That's simple, as in 'most things are simple once you know the
answer'. I found the explanation helpful, so this section attempts
the 10,000ft (FL100 if you prefer) view of Mercurial.

(Footnote: Bryan O'Sullivan's excellent Mercurial book has a chapter
on the subject, and the Mercurial website has a fair amount of detail
too. This is 'research', OK?)

First remember that any file revision or changeset can have at most
two parents. You can't merge more than one other branch at once.

We start with the basic building block, which Mercurial calls a
revlog. A revlog is a thing that holds a file and all the changes in
the file history. (Footnote: For any non-trivial file, this will
actually be two files on the disc, a data file and an index.) The
revlog stores the (compressed) differences between successive versions
of the file, though it will periodically store a complete version of
the file instead of a difference, so that the content of any
particular file version can always be reconstructed without excessive
effort.

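To make the delta-plus-snapshot idea concrete, here's a toy revlog in Python. It is a sketch under loudly stated assumptions. Deltas are line-based hunks computed with difflib, whereas real revlogs use Mercurial's own binary diff and compress the result. A full snapshot is stored whenever the chain of deltas since the last snapshot would grow to more than twice the size of the text; that rule is loosely modelled on Mercurial's, but don't rely on it matching exactly. The class and method names are hypothetical. The per-revision 'base' plays the role of the base column in Mercurial's index dumps.

```python
import difflib


class ToyRevlog:
    """Hypothetical toy revlog: deltas between versions, periodic snapshots."""

    def __init__(self) -> None:
        # Each entry is (base, payload): payload is the full list of lines
        # when base == rev, otherwise a delta against the previous revision.
        self.entries: list[tuple[int, object]] = []
        self.chain_size: list[int] = []  # bytes stored since the base snapshot

    @staticmethod
    def _delta(old: list[str], new: list[str]) -> list[tuple[int, int, list[str]]]:
        """Hunks (start, end, replacement) that turn old into new."""
        matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
        return [(i1, i2, new[j1:j2])
                for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

    @staticmethod
    def _apply(old: list[str], hunks) -> list[str]:
        """Apply hunks, which are in ascending order of position in old."""
        out, pos = [], 0
        for start, end, replacement in hunks:
            out.extend(old[pos:start])
            out.extend(replacement)
            pos = end
        out.extend(old[pos:])
        return out

    def add(self, text: str) -> int:
        """Append a new version of the file and return its revision number."""
        lines = text.splitlines(keepends=True)
        rev = len(self.entries)
        if rev > 0:
            previous = self.revision(rev - 1).splitlines(keepends=True)
            hunks = self._delta(previous, lines)
            # Rough delta size: 12 bytes of header per hunk plus its text.
            size = sum(12 + len("".join(r)) for _, _, r in hunks)
            chain = self.chain_size[rev - 1] + size
            if chain <= 2 * len(text):
                self.entries.append((self.entries[rev - 1][0], hunks))
                self.chain_size.append(chain)
                return rev
        # First revision, or the delta chain has grown too long: store in full.
        self.entries.append((rev, lines))
        self.chain_size.append(len(text))
        return rev

    def base(self, rev: int) -> int:
        return self.entries[rev][0]

    def revision(self, rev: int) -> str:
        """Rebuild a version from its base snapshot plus successive deltas."""
        base = self.base(rev)
        lines = list(self.entries[base][1])
        for r in range(base + 1, rev + 1):
            lines = self._apply(lines, self.entries[r][1])
        return "".join(lines)
```

Feeding it the first few versions of the poem stores revision 1 as a delta against revision 0. Revision 2 rewrites both lines, so its delta would push the chain past the limit and it becomes a fresh snapshot. Reading any revision then costs one snapshot plus a bounded run of deltas.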
Under the secret-squirrel Mercurial .hg directory at the top of your
project is a store which holds a revlog for each file in your project.

Any point in the evolution of a revlog can be uniquely identified with
a nodeid. This is simply the SHA1 hash of the current file contents
combined with the nodeids of the one or two parents of the current
revision. This means that two file states are identical if and only
if the file contents are the same *and* the file has the same history.

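In code, the hashing looks something like the Python sketch below. As I understand Mercurial's revlog, three details apply. The two parent nodeids are sorted before hashing, so their order doesn't matter. A missing parent is represented by a null nodeid of twenty zero bytes. And the parents are fed to SHA1 ahead of the text. Treat those details as assumptions; the function name is mine.

```python
import hashlib

NULLID = b"\0" * 20  # stands in for a missing parent


def nodeid(text: bytes, p1: bytes = NULLID, p2: bytes = NULLID) -> bytes:
    """SHA1 over the sorted parent nodeids followed by the revision text."""
    first, second = sorted([p1, p2])
    digest = hashlib.sha1(first)
    digest.update(second)
    digest.update(text)
    return digest.digest()


first_line = b"There was a gibbon one morning\n"
root = nodeid(first_line)
child = nodeid(first_line + b'said "I think I will fly to the moon".\n', root)

# Same contents, different history: the nodeids differ.
assert nodeid(first_line, child) != root
print(root.hex()[:12], child.hex()[:12])
```

Because the parents feed into the hash, identical contents reached by different histories get different nodeids, which is exactly the property described above.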
Here's a dump of the index of the revlog for pome.txt:

    $ hg debugindex .hg/store/data/pome.txt.i
       rev    offset  length   base linkrev nodeid       p1           p2
         0         0      32      0       0 6bbbd5d6cc53 000000000000 000000000000
         1        32      51      0       1 83d266583303 6bbbd5d6cc53 000000000000
         2        83      84      0       2 14a54ec34bb6 83d266583303 000000000000
         3       167      76      3       4 dc4df776b38b 83d266583303 000000000000
    $

Note the two parent columns, p1 and p2: a file state can have two
parents. If both parent nodeids are non-null, the state is the result
of a merge. Here every p2 is null, so none of these revisions of
pome.txt is a merge.

Let's dump out the contents of the revlog at a particular revision:

    $ hg debugdata .hg/store/data/pome.txt.i 2
    There was a gibbon one morning
    said "I think I will fly to the moon".
    So with two great palms strapped to his arms,
    he started his takeoff run.
    $

The next component is the manifest. This is simply a list of all the
files in the project, together with the nodeids of their current
revisions. The manifest is a file, held in a revlog. The nodeid of the
manifest, therefore, identifies the state of the whole project at a
particular point. In the dump below, the file name and its nodeid are
separated by a NUL character, which doesn't show up on the terminal.

    $ hg debugdata .hg/store/00manifest.i 5
    poem.txt5168b1a5e2f44aa4e0f164e170820845183f50c8
    $

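A manifest revision is easy to build. The Python sketch below assumes the layout seen in the dump above: one line per file, sorted by name, holding the name, a NUL byte and the 40-hex-digit file nodeid. Real manifests can also append a flag character for things like executable files, which this ignores. The function names are hypothetical.

```python
def manifest_text(files: dict[str, str]) -> bytes:
    """Serialise {filename: hex file nodeid} in manifest layout."""
    lines = (f"{name}\0{node}\n" for name, node in sorted(files.items()))
    return "".join(lines).encode()


def parse_manifest(data: bytes) -> dict[str, str]:
    """Inverse of manifest_text: recover {filename: hex nodeid}."""
    result = {}
    for line in data.split(b"\n"):
        if line:
            name, node = line.split(b"\0", 1)
            result[name.decode()] = node.decode()
    return result


text = manifest_text({"poem.txt": "5168b1a5e2f44aa4e0f164e170820845183f50c8"})
print(text)  # b'poem.txt\x005168b1a5e2f44aa4e0f164e170820845183f50c8\n'
```

Since the manifest is itself stored in a revlog, its nodeid is computed from this text and the previous manifest nodeid(s) in the same way as any other revlog entry.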
Finally we have the changeset. This is the atomic collection of
changes to a repository that leads to a new revision. The changeset
info includes the nodeid of the corresponding manifest, the timestamp
and committer ID, a list of changed files and a comment. The
changeset's parent, or both parents if the change is a merge, is
recorded too, in the revlog index just as for a file. The changeset
description is held in a revlog, the changelog. The timestamp is
stored as seconds since the Unix epoch followed by the timezone offset
in seconds; the -3600 below corresponds to +0100.

    $ hg debugdata .hg/store/00changelog.i 5
    1ccc11b6f7308cc8fa1573c2f3811a4710c91e3e
    Jim Hague <jim.hague@acm.org>
    1209061793 -3600
    poem.txt
    pome.txt

    Merge first line branch
    $

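The Python sketch below rebuilds that changelog entry from its parts. It assumes the layout shown in the dump: manifest nodeid, user, timestamp line, sorted file list, a blank line, then the description. It ignores the optional extra fields (such as a named branch) that can be appended to the timestamp line. The function name is hypothetical. The changeset's own nodeid would then be the SHA1 of this text together with its parent changeset nodeids, just as for a file.

```python
from datetime import datetime, timedelta, timezone


def changeset_text(manifest_hex: str, user: str, when: datetime,
                   files: list[str], description: str) -> bytes:
    """Serialise a changeset in the layout shown by debugdata."""
    utc_offset = when.utcoffset() or timedelta(0)
    stamp = int(when.timestamp())  # seconds since the Unix epoch
    # The offset is stored as seconds *west* of UTC, hence the sign flip.
    tz = -int(utc_offset.total_seconds())
    header = [manifest_hex, user, f"{stamp} {tz}", *sorted(files)]
    return ("\n".join(header) + "\n\n" + description).encode()


when = datetime(2008, 4, 24, 19, 29, 53, tzinfo=timezone(timedelta(hours=1)))
entry = changeset_text("1ccc11b6f7308cc8fa1573c2f3811a4710c91e3e",
                       "Jim Hague <jim.hague@acm.org>", when,
                       ["pome.txt", "poem.txt"], "Merge first line branch")
print(entry.decode())
```

The commit time of the merge, 19:29:53 on 24 April 2008 at +0100, comes out as the 1209061793 -3600 seen in the dump.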
The nodeid of the changeset, therefore, gives us a globally unique
identifier for any particular change. The changeset text includes the
manifest nodeid, which in turn depends on the nodeid of every file
revision, and the hash also covers the parent changesets, so a single
changeset nodeid pins down both the entire project state and its whole
history. Changesets also have a Subversion-like incrementing change
number, but it is local to that repository. The nodeid, however, is
global.

One more detail remains to complete the picture. How do we get back
from a particular file change to find the responsible changeset? Each
revlog entry has a linkrev field, holding the local number of the
changeset that introduced it, which does just this.

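As an illustration, here's a Python sketch that parses the debugindex output for pome.txt shown earlier and follows a file revision's linkrev back to its changeset. It also picks out merge revisions using the two-non-null-parents rule. The parsing is tied to that particular column layout, and the function names are mine.

```python
INDEX_DUMP = """\
   rev    offset  length   base linkrev nodeid       p1           p2
     0         0      32      0       0 6bbbd5d6cc53 000000000000 000000000000
     1        32      51      0       1 83d266583303 6bbbd5d6cc53 000000000000
     2        83      84      0       2 14a54ec34bb6 83d266583303 000000000000
     3       167      76      3       4 dc4df776b38b 83d266583303 000000000000
"""

NULL_SHORT = "000000000000"  # the null nodeid in 12-digit short form


def parse_index(dump: str) -> list[dict[str, str]]:
    """Turn each row of a debugindex dump into a column-name -> value dict."""
    header, *rows = dump.splitlines()
    names = header.split()
    return [dict(zip(names, row.split())) for row in rows if row.strip()]


def linkrev_of(entries: list[dict[str, str]], node_prefix: str) -> int:
    """Local changeset number that introduced the given file revision."""
    (entry,) = [e for e in entries if e["nodeid"].startswith(node_prefix)]
    return int(entry["linkrev"])  # unpacking above fails unless exactly one match


def merges(entries: list[dict[str, str]]) -> list[str]:
    """File revisions with two non-null parents, i.e. merge results."""
    return [e["nodeid"] for e in entries if NULL_SHORT not in (e["p1"], e["p2"])]


entries = parse_index(INDEX_DUMP)
print(linkrev_of(entries, "dc4d"))  # 4: changeset 4, "Better first two lines"
print(merges(entries))  # []: no merges in this file's history
```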
So, now we have a repository with a history of the changes applied to
that repository. Each change has a unique identifier. If we find that
change in another repository, it means that at that point the other
repository has exactly the same state; the file contents and history
are identical.

At this point we can see how pulling changes from another repository
works. Mercurial has to determine which changesets in the source
repository are missing from the target repository. To do this, for
each head in the source repo it has to find the most recent change in
that head's history that is already present in the target repo, and
get any changes after that point. These changes are then copied over
and applied.

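Here's a naive Python sketch of that search. Starting from each source head, it walks back through the parents until it reaches changesets the target already has; whatever it visits on the way is missing. It works on an in-memory parent map keyed by the short ids from the example history. Real Mercurial exchanges this information between repositories with a smarter discovery protocol, and the function name is hypothetical. The example pulls the branch-and-merge work (changesets 4 and 5) from the clone back into the original four-change repository.

```python
def missing_changesets(source_parents: dict[str, tuple[str, ...]],
                       source_heads: list[str],
                       target_has: set[str]) -> set[str]:
    """Changesets reachable from the source heads but absent from the target."""
    missing: set[str] = set()
    stack = [h for h in source_heads if h not in target_has]
    while stack:
        node = stack.pop()
        if node in missing:
            continue
        missing.add(node)
        # Stop walking back as soon as we reach something the target has.
        stack.extend(p for p in source_parents[node]
                     if p not in target_has and p not in missing)
    return missing


# Short ids from the example history (changesets 0 to 5 in the clone).
clone = {
    "33596ef855c1": (),
    "3d65e7a57890": ("33596ef855c1",),
    "ff97668b7422": ("3d65e7a57890",),
    "a065eb26e6b9": ("ff97668b7422",),
    "267d32f158b3": ("3d65e7a57890",),
    "792ab970fc80": ("267d32f158b3", "a065eb26e6b9"),
}
original = {"33596ef855c1", "3d65e7a57890", "ff97668b7422", "a065eb26e6b9"}
print(sorted(missing_changesets(clone, ["792ab970fc80"], original)))
# ['267d32f158b3', '792ab970fc80']
```

When the changesets are sent, parents must arrive before their children; ordering them by the source repository's local revision numbers would satisfy that.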
The Mercurial revlog format has proved remarkably durable. Over the
lifetime of Mercurial, there have been just two changes to the file
format. And one of those (a very recent change at the time of
writing, yet to appear in a release version) is a very small change to
filename storage required to deal with Windows-specific issues.