comparison Hg.txt @ 0:48d338d29ce9
First comitted version.
| author | Jim Hague <jim.hague@acm.org> |
|---|---|
| date | Thu, 11 Dec 2008 10:15:27 +0000 |
| parents | |
| children | 608947872f72 |
comparison
| -1:000000000000 | 0:48d338d29ce9 |
|---|---|
| 1 Inside a distributed version control system | |
| 2 =========================================== | |
| 3 | |
| 4 Grinton Lodge is a Youth Hostel that sits on an exposed hillside just | |
| 5 above the small hamlet of Grinton in Swaledale, in the Yorkshire Dales | |
| 6 National Park. A former Victorian shooting lodge, it now welcomes | |
| 7 walkers and other travellers from around the world. | |
| 8 | |
| 9 Tonight, a Wednesday in mid-November, is not one of its busiest | |
| 10 nights. Kat, the duty staff member, tells me that there is a small | |
| 11 corporate team-building group in the annex. There's no sign of them at | |
| 12 present. Otherwise, that portion of the world that has beaten a path | |
| 13 to the door of this grand building today consists of just me. And Kat | |
| 14 goes home soon. | |
| 15 | |
| 16 The November CVu, removed from its wrappers and read yesterday, lies | |
| 17 in my bag. Taunting me. Go on, it says, if you're ever going to put | |
| 18 finger to keyboard in the name of CVu, well, tonight you are out of | |
| 19 excuses. | |
| 20 | |
| 21 Bugger. | |
| 22 | |
| 23 Let's look into Mercurial | |
| 24 ------------------------- | |
| 25 | |
| 26 Mercurial is a Distributed Version Control System (DVCS). It's one of a | |
| 27 number of DVCSs that have gained significant popularity in the | |
| 28 last few years. I switched a significant work project over to Mercurial | |
| 29 (from Subversion) over a year ago, because a customer site required | |
| 30 on-site work but could not allow access back to the company VPN. I | |
| 31 chose Mercurial for a variety of reasons which I won't bore you with | |
| 32 here. If you must know, see the box. | |
| 33 | |
| 34 What I want to do in this article is give you an insight into how a | |
| 35 DVCS works. OK, so specifically I'm going to be talking about | |
| 36 Mercurial, but Git and Bazaar attack the problem in a similar way. But | |
| 37 first I'd better give you some idea of how you use Mercurial. | |
| 38 | |
| 39 :::: | |
| 40 Box: OK, if you must know: | |
| 41 | |
| 42 o Implementability. I needed the system to work on Windows, Linux and | |
| 43 AIX. The latter was not one of the directly supported platforms for | |
| 44 any of the candidates. Git's implementation uses a horde of | |
| 45 tools. Bazaar requires only Python, but required Python 2.4 while IBM | |
| 46 stubbornly still supplies only Python 2.3. Mercurial requires Python | |
| 47 2.3 or greater, and uses some C for speed. | |
| 48 | |
| 49 o Simplicity. From the command line, Mercurial's core operations will | |
| 50 be familiar to a Subversion user. This is also true of Bazaar, but was | |
| 51 less true of Git. Git has improved in this matter since then, but a Mr | |
| 52 Winder of this parish tells me that it's still possible to seriously | |
| 53 embarrass yourself. There was also a lack of Windows support for Git at | |
| 54 the time. | |
| 55 | |
| 56 o Speed. Mercurial is fast. In the same ballpark as Git. Bazaar | |
| 57 wasn't, and although it has improved significantly, has, in my | |
| 58 estimation, added user complexity in the process, and is still off the | |
| 59 pace for some operations. | |
| 60 | |
| 61 o Documentation. At the time, Bryan O'Sullivan's excellent Mercurial | |
| 62 book (http://hgbook.red-bean.com) was a clear winner for best | |
| 63 documentation. | |
| 64 :::: | |
| 65 | |
| 66 The 5 minute Mercurial overview | |
| 67 ------------------------------- | |
| 68 | |
| 69 I think it unlikely that someone possessing the taste and discernment | |
| 70 to be reading CVu would not be familiar with at least one version | |
| 71 control system. So, while I want to give you a flavour of what it's | |
| 72 like to use, I'm not going to hang about. If you'd like a proper | |
| 73 introduction, or you don't follow something, I thoroughly recommend | |
| 74 you consult the Mercurial book. | |
| 75 | |
| 76 To start using Mercurial to keep track of a project: | |
| 77 | |
| 78 $ hg init | |
| 79 $ | |
| 80 | |
| 81 This creates the repository root in the current directory. | |
| 82 | |
| 83 Like CVS with its CVS directory and Subversion with its .svn | |
| 84 directory, Mercurial keeps its private data in a directory. Mercifully | |
| 85 there is only one of these, in the top level of your project. And | |
| 86 rather than holding details of where the actual repository is to be | |
| 87 found, the .hg directory holds the entire repository. | |
| 88 | |
| 89 Next you need to specify the files you want Mercurial to track. | |
| 90 | |
| 91 $ echo "There was a gibbon one morning" > pome.txt | |
| 92 $ hg add pome.txt | |
| 93 $ | |
| 94 | |
| 95 As you might expect, this marks the files as to be added. And as you | |
| 96 might also expect, you need to commit to record the added files in the | |
| 97 repository. The commit comment can be supplied on the command line; if | |
| 98 you don't supply a comment, you'll be dropped into an editor to | |
| 99 provide one. | |
| 100 | |
| 101 There is a suggested format for these messages - a one line summary | |
| 102 followed by any more required detail on following lines. By default | |
| 103 Mercurial will only display the first line of commit messages when | |
| 104 listing changes. In these examples I'll stick to terse messages, and | |
| 105 I'll enter them from the command line. | |
| 106 | |
| 107 $ hg commit -m "My Pome" -u "Jim Hague <jim.hague@acm.org>" | |
| 108 $ | |
| 109 | |
| 110 Mercurial records the user making the change as part of the change | |
| 111 information. It is usual to give your name and email address as I've | |
| 112 done here. You can imagine, though, that constantly having to repeat | |
| 113 this is a bit tedious, so you can set a default user name in a | |
| 114 configuration file. Mercurial keeps global, user and repository | |
| 115 configurations, and it can go in any of those. | |
| 116 | |
| 117 As with Subversion, after further edits you can see how your working copy | |
| 118 differs from the repository. | |
| 119 | |
| 120 $ hg status | |
| 121 M pome.txt | |
| 122 $ hg diff | |
| 123 diff -r 33596ef855c1 pome.txt | |
| 124 --- a/pome.txt Wed Apr 23 22:36:33 2008 +0100 | |
| 125 +++ b/pome.txt Wed Apr 23 22:48:01 2008 +0100 | |
| 126 @@ -1,1 +1,2 @@ There was a gibbon one morning | |
| 127 There was a gibbon one morning | |
| 128 +said "I think I will fly to the moon". | |
| 129 $ hg commit -m "A great second line" | |
| 130 $ | |
| 131 | |
| 132 And look through a log of changes. | |
| 133 | |
| 134 $ hg log | |
| 135 changeset: 1:3d65e7a57890 | |
| 136 tag: tip | |
| 137 user: Jim Hague <jim.hague@acm.org> | |
| 138 date: Wed Apr 23 22:49:10 2008 +0100 | |
| 139 summary: A great second line | |
| 140 | |
| 141 changeset: 0:33596ef855c1 | |
| 142 user: Jim Hague <jim.hague@acm.org> | |
| 143 date: Wed Apr 23 22:36:33 2008 +0100 | |
| 144 summary: My Pome | |
| 145 | |
| 146 $ | |
| 147 | |
| 148 There are some items here that need an explanation. | |
| 149 | |
| 150 The changeset identifier is in fact two identifiers separated by a | |
| 151 colon. The first is the sequence number of the changeset in the | |
| 152 repository, and is directly comparable to the change number in a | |
| 153 Subversion repository. The second is a globally unique identifier for | |
| 154 that change. As the change is copied from one repository to another | |
| 155 (this is a distributed system, remember, even if we haven't come to | |
| 156 that bit yet), its sequence number in any particular repository will | |
| 157 change, but the global identifier will always remain the same. | |
| 158 | |
| 159 'tip' is a Mercurial term. It means simply the most recent change. | |
| 160 | |
| 161 Want to rename a file? | |
| 162 | |
| 163 $ hg mv pome.txt poem.txt | |
| 164 $ hg status | |
| 165 A poem.txt | |
| 166 R pome.txt | |
| 167 $ hg commit -m "Rename my file" | |
| 168 $ | |
| 169 | |
| 170 (The command to rename a file is actually 'hg rename', but Mercurial | |
| 171 saves Unix-trained fingers from typing embarrassment.) | |
| 172 | |
| 173 At this point you may be wondering about directories. 'hg mkdir' | |
| 174 perhaps? Well, no. Mercurial only tracks files. To be sure, the | |
| 175 directory a file occupies is tracked, but effectively only as a | |
| 176 component of the file name. This has the slightly unexpected result | |
| 177 that you can't record an empty directory in your repository. | |
| 178 (Footnote: I tripped over this converting a work Subversion | |
| 179 repository. One possibility is to create a placeholder file in the | |
| 180 directory. In the event I created the directory (which receives build | |
| 181 products) as part of the build instead.) | |
| 182 | |
| 183 Given this, and the status output above that suggests strongly that | |
| 184 Mercurial treats a rename as a copy followed by a delete, you may be | |
| 185 worried that Mercurial won't cope at all well with rearranging your | |
| 186 repository. Relax. Mercurial does store the details of the rename as | |
| 187 part of the changeset, and copes very well with rearrangements. | |
| 188 | |
| 189 (Footnote: The Mercurial designers justify not dealing with | |
| 190 directories as first class objects by pointing out that provided you | |
| 191 can correctly move files about in the tree, the other reasons for | |
| 192 tracking directories are uncommon and do not in their opinion justify | |
| 193 the considerable added complexity. So far I've found no reason to | |
| 194 doubt that judgement.) | |
| 195 | |
| 196 Want to rewind the working copy to a previous revision? | |
| 197 | |
| 198 $ hg update -r 1 | |
| 199 1 files updated, 0 files merged, 1 files removed, 0 files unresolved | |
| 200 $ | |
| 201 | |
| 202 'hg update' updates the working files. In this case I'm specifying | |
| 203 that I want to go back to local changeset 1. I could also have typed | |
| 204 '-r 3d65e7a57890', or even '-r 3d'; when specifying the global change | |
| 205 identifier you only need to type enough digits to make it unique. | |
| 206 | |
| 207 This is all very well, but it's not exactly distributed, is it? | |
| 208 | |
| 209 Copy an existing repository: | |
| 210 | |
| 211 elsewhere$ hg clone ssh://jim.home.net/Poem Jim-Poem | |
| 212 updating working directory | |
| 213 1 files updated, 0 files merged, 0 files removed, 0 files unresolved | |
| 214 | |
| 215 (You can access other repositories via the file system, over http or | |
| 216 over ssh). | |
| 217 | |
| 218 elsewhere$ cd Jim-Poem | |
| 219 elsewhere$ hg log | |
| 220 changeset: 3:a065eb26e6b9 | |
| 221 tag: tip | |
| 222 user: Jim Hague <jim.hague@acm.org> | |
| 223 date: Thu Apr 24 18:52:31 2008 +0100 | |
| 224 summary: Rename my file | |
| 225 | |
| 226 changeset: 2:ff97668b7422 | |
| 227 user: Jim Hague <jim.hague@acm.org> | |
| 228 date: Thu Apr 24 18:50:22 2008 +0100 | |
| 229 summary: Finished first verse | |
| 230 | |
| 231 changeset: 1:3d65e7a57890 | |
| 232 user: Jim Hague <jim.hague@acm.org> | |
| 233 date: Wed Apr 23 22:49:10 2008 +0100 | |
| 234 summary: A great second line | |
| 235 | |
| 236 changeset: 0:33596ef855c1 | |
| 237 user: Jim Hague <jim.hague@acm.org> | |
| 238 date: Wed Apr 23 22:36:33 2008 +0100 | |
| 239 summary: My Pome | |
| 240 | |
| 241 'hg clone' is aptly named. It creates a new repository that contains | |
| 242 exactly the same changes as the source repository. You can make a | |
| 243 clone just by copying your project directory, if you're confident | |
| 244 nothing else will access it during the copy. 'hg clone' saves you this | |
| 245 worry, and sets the default push/pull location in the new repo to the | |
| 246 cloned repo. | |
| 247 | |
| 248 From that point, you use 'hg pull' to collect changes from other | |
| 249 places into your repo (though note it does not by default update your | |
| 250 working copy), and, as you might guess, 'hg push' shoves your changes | |
| 251 into a foreign repository. By default these will act on the repository | |
| 252 you cloned from, but you can specify any other repository. | |
| 253 | |
| 254 More on those in a moment. First, though, I want to show you something | |
| 255 you can't do in Subversion. Start with the repository with 4 changes | |
| 256 we just cloned. Let's focus on the first couple of lines. | |
| 257 | |
| 258 $ hg update -r 1 | |
| 259 1 files updated, 0 files merged, 1 files removed, 0 files unresolved | |
| 260 | |
| 261 And make a change. | |
| 262 | |
| 263 $ hg diff | |
| 264 diff -r 3d65e7a57890 pome.txt | |
| 265 --- a/pome.txt Wed Apr 23 22:49:10 2008 +0100 | |
| 266 +++ b/pome.txt Thu Apr 24 19:13:14 2008 +0100 | |
| 267 @@ -1,2 +1,2 @@ There was a gibbon one morning | |
| 268 -There was a gibbon one morning | |
| 269 -said "I think I will fly to the moon". | |
| 270 +There was a baboon who one afternoon | |
| 271 +said "I think I will fly to the sun". | |
| 272 $ hg commit -m "Better first two lines" | |
| 273 $ | |
| 274 | |
| 275 The alert among you will have sat up at that. Well done! Yes, there's | |
| 276 something very worrying. How can I commit a change at an old point? | |
| 277 If you try this in Subversion, it will complain mightily about your | |
| 278 file being out of date. But Mercurial just went ahead and did | |
| 279 something. The Bazaar experts among you will know that in Bazaar, if | |
| 280 you use 'bzr revert -r' to bring the working copy to a past revision, | |
| 281 make a change and commit, then your latest version will be the past | |
| 282 revision plus your change. Perhaps that's what Mercurial did? | |
| 283 | |
| 284 No. What Mercurial did is central to Mercurial's view of the | |
| 285 world. You took your working copy back to an old changeset, and then | |
| 286 committed a fresh change based at that changeset. Mercurial actually | |
| 287 did just what you asked it to do, no more and no less. Let's see the | |
| 288 initial evidence. | |
| 289 | |
| 290 $ hg heads | |
| 291 changeset: 4:267d32f158b3 | |
| 292 tag: tip | |
| 293 parent: 1:3d65e7a57890 | |
| 294 user: Jim Hague <jim.hague@acm.org> | |
| 295 date: Thu Apr 24 19:13:59 2008 +0100 | |
| 296 summary: Better first two lines | |
| 297 | |
| 298 changeset: 3:a065eb26e6b9 | |
| 299 user: Jim Hague <jim.hague@acm.org> | |
| 300 date: Thu Apr 24 18:52:31 2008 +0100 | |
| 301 summary: Rename my file | |
| 302 | |
| 303 $ | |
| 304 | |
| 305 Time for some more Mercurial terminology. You can think of a 'head' in | |
| 306 Mercurial as the most recent change on a branch. In Mercurial, a | |
| 307 branch is simply what happens when you commit a change that has as its | |
| 308 parent a change that already has a child. Mercurial has a standard | |
| 309 extension, graphlog, whose 'hg glog' command uses ASCII art to show the current | |
| 310 state: | |
| 311 | |
| 312 $ hg glog | |
| 313 @ changeset: 4:267d32f158b3 | |
| 314 | tag: tip | |
| 315 | parent: 1:3d65e7a57890 | |
| 316 | user: Jim Hague <jim.hague@acm.org> | |
| 317 | date: Thu Apr 24 19:13:59 2008 +0100 | |
| 318 | summary: Better first two lines | |
| 319 | | |
| 320 | o changeset: 3:a065eb26e6b9 | |
| 321 | | user: Jim Hague <jim.hague@acm.org> | |
| 322 | | date: Thu Apr 24 18:52:31 2008 +0100 | |
| 323 | | summary: Rename my file | |
| 324 | | | |
| 325 | o changeset: 2:ff97668b7422 | |
| 326 |/ user: Jim Hague <jim.hague@acm.org> | |
| 327 | date: Thu Apr 24 18:50:22 2008 +0100 | |
| 328 | summary: Finished first verse | |
| 329 | | |
| 330 o changeset: 1:3d65e7a57890 | |
| 331 | user: Jim Hague <jim.hague@acm.org> | |
| 332 | date: Wed Apr 23 22:49:10 2008 +0100 | |
| 333 | summary: A great second line | |
| 334 | | |
| 335 o changeset: 0:33596ef855c1 | |
| 336 user: Jim Hague <jim.hague@acm.org> | |
| 337 date: Wed Apr 23 22:36:33 2008 +0100 | |
| 338 summary: My Pome | |
| 339 | |
| 340 $ | |
| 341 | |
| 342 'hg view' shows a nicer graphical view. (Footnote: Though, being | |
| 343 Tcl/Tk based, not that much nicer.) | |
| 344 | |
| 345 So the change is in there. It's the latest change, and is simply on a | |
| 346 different branch to the other changes. | |
| 347 | |
| 348 Almost invariably, you will want to bring everything back together and | |
| 349 merge the branches. A merge is a change that combines two heads back | |
| 350 into one. It prepares an updated working directory with the merged | |
| 351 contents of the two heads for you to review and, if satisfactory, commit. | |
| 352 | |
| 353 $ hg merge | |
| 354 merging pome.txt and poem.txt | |
| 355 0 files updated, 1 files merged, 0 files removed, 0 files unresolved | |
| 356 (branch merge, don't forget to commit) | |
| 357 $ cat poem.txt | |
| 358 There was a baboon who one afternoon | |
| 359 said "I think I will fly to the sun". | |
| 360 So with two great palms strapped to his arms, | |
| 361 he started his takeoff run. | |
| 362 $ hg commit -m "Merge first line branch" | |
| 363 $ | |
| 364 | |
| 365 (Footnote: I'm no poet. The poem is, of course, 'Silly Old Baboon' by | |
| 366 the late, great, Spike Milligan.) | |
| 367 | |
| 368 Here's the ASCII art again showing what just happened. Oh, and notice | |
| 369 that Mercurial has done the right thing with regard to the rename. | |
| 370 | |
| 371 $ hg glog | |
| 372 @ changeset: 5:792ab970fc80 | |
| 373 |\ tag: tip | |
| 374 | | parent: 4:267d32f158b3 | |
| 375 | | parent: 3:a065eb26e6b9 | |
| 376 | | user: Jim Hague <jim.hague@acm.org> | |
| 377 | | date: Thu Apr 24 19:29:53 2008 +0100 | |
| 378 | | summary: Merge first line branch | |
| 379 | | | |
| 380 | o changeset: 4:267d32f158b3 | |
| 381 | | parent: 1:3d65e7a57890 | |
| 382 | | user: Jim Hague <jim.hague@acm.org> | |
| 383 | | date: Thu Apr 24 19:13:59 2008 +0100 | |
| 384 | | summary: Better first two lines | |
| 385 | | | |
| 386 o | changeset: 3:a065eb26e6b9 | |
| 387 | | user: Jim Hague <jim.hague@acm.org> | |
| 388 | | date: Thu Apr 24 18:52:31 2008 +0100 | |
| 389 | | summary: Rename my file | |
| 390 | | | |
| 391 o | changeset: 2:ff97668b7422 | |
| 392 |/ user: Jim Hague <jim.hague@acm.org> | |
| 393 | date: Thu Apr 24 18:50:22 2008 +0100 | |
| 394 | summary: Finished first verse | |
| 395 | | |
| 396 o changeset: 1:3d65e7a57890 | |
| 397 | user: Jim Hague <jim.hague@acm.org> | |
| 398 | date: Wed Apr 23 22:49:10 2008 +0100 | |
| 399 | summary: A great second line | |
| 400 | | |
| 401 o changeset: 0:33596ef855c1 | |
| 402 user: Jim Hague <jim.hague@acm.org> | |
| 403 date: Wed Apr 23 22:36:33 2008 +0100 | |
| 404 summary: My Pome | |
| 405 | |
| 406 $ | |
| 407 | |
| 408 So, our little branch change has now been merged back, and we have a | |
| 409 single line of development again. Notice that unlike the other | |
| 410 changesets, changeset 5 has two parent changesets, indicating it is a | |
| 411 merge changeset. You can only merge two branches in one operation; or | |
| 412 putting it another way, a changeset can have a maximum of two parents. | |
| 413 | |
| 414 This behaviour is absolutely central to Mercurial's philosophy. If a | |
| 415 change is committed that takes as its starting point a change that | |
| 416 already has a child, then a branch gets created. Working with | |
| 417 Mercurial, branches get created frequently, and equally frequently | |
| 418 merged back. As befits any frequent operation, both are easy to do. | |
| 419 | |
| 420 You're probably thinking at this point that making a commit onto | |
| 421 an old version is a slightly strange thing to do, and you'd be right. | |
| 422 But that's exactly what's going to happen the moment you go | |
| 423 distributed. Two people working independently with their own | |
| 424 repositories are going to make commits based, typically, on the latest | |
| 425 changes they happen to have incorporated into their tree. To be | |
| 426 Distributed, a DVCS has to deal with this. Mercurial faces it head-on. | |
| 427 When you pull changes into your repo (or someone else pushes them), if | |
| 428 any of the changes overlap - are both based on the same base change - | |
| 429 you get extra heads, and it's up to you to let these extra heads live | |
| 430 or merge, as you please. | |
| 431 | |
| 432 In practice this is more manageable than you might think. Consider a | |
| 433 typical Mercurial usage, where the 'master' repo sits on a known | |
| 434 server, and everyone pulls changes from the master and pushes their | |
| 435 own efforts to the master. By default Mercurial won't let you push if | |
| 436 the receiving repo will gain an extra head as a result, so you | |
| 437 typically pull (and do any required merging) just before | |
| 438 pushing. Subversion users will recognise this pattern. Subversion | |
| 439 won't let you commit a change if your working copy is not at the very | |
| 440 latest revision, so the Subversion user will update, and merge if | |
| 441 necessary, just before committing. | |
| 442 | |
| 443 What, then, about a branch in the conventional sense of '1.0 | |
| 444 maintenance branch'? Typically in Mercurial you'd handle this by | |
| 445 keeping a separate cloned repository for those changes. Cloning is | |
| 446 fast and, if local, uses hard links where possible on filesystems that | |
| 447 support them, so isn't necessarily extravagant on disc space. You can, | |
| 448 if you prefer, handle them all in a single repo with 'named | |
| 449 branches', but cloning is definitely simpler. | |
| 450 | |
| 451 OK, so now you know the basics of using Mercurial. We can proceed to | |
| 452 looking at how this magic is achieved. In particular, where does this | |
| 453 magic globally unique identifier for a change come from? | |
| 454 | |
| 455 Inside the Mercurial repo | |
| 456 ------------------------- | |
| 457 | |
| 458 The way Mercurial handles its repo is really quite simple. | |
| 459 | |
| 460 That's simple, as in 'most things are simple once you know the | |
| 461 answer'. I found the explanation helpful, so this section attempts | |
| 462 the 10,000ft (FL100 if you prefer) view of Mercurial. | |
| 463 | |
| 464 (Footnote: Bryan O'Sullivan's excellent Mercurial book has a chapter | |
| 465 on the subject, and the Mercurial website has a fair amount of detail | |
| 466 too. This is 'research', OK?) | |
| 467 | |
| 468 First remember that any file or component can only have one or two | |
| 469 parents. You can't merge more than one other branch at once. | |
| 470 | |
| 471 We start with the basic building block, which Mercurial calls a | |
| 472 revlog. A revlog is a thing that holds a file and all the changes in | |
| 473 the file history. (Footnote: For any non-trivial file, this will | |
| 474 actually be two files on the disc, a data file and an index). The | |
| 475 revlog stores the (compressed) differences between successive versions | |
| 476 of the file, though it will periodically store a complete version of | |
| 477 the file instead of a difference, so that the content of any | |
| 478 particular file version can always be reconstructed without excessive | |
| 479 effort. | |
| 480 | |
| 481 Under the secret-squirrel Mercurial .hg directory at the top of your | |
| 482 project is a store which holds a revlog for each file in your project. | |
| 483 | |
| 484 Any point in the evolution of a revlog can be uniquely identified with | |
| 485 a nodeid. This is simply the SHA1 hash of the current file contents | |
| 486 concatenated with the nodeids of one or both parents of the current | |
| 487 revision. Note that this way, two file states are identical if and | |
| 488 only if the file contents are the same *and* the file has the | |
| 489 same history. | |
| 490 | |
| 491 Here's a dump of a revlog index: | |
| 492 | |
| 493 $ hg debugindex .hg/store/data/pome.txt.i | |
| 494 rev offset length base linkrev nodeid p1 p2 | |
| 495 0 0 32 0 0 6bbbd5d6cc53 000000000000 000000000000 | |
| 496 1 32 51 0 1 83d266583303 6bbbd5d6cc53 000000000000 | |
| 497 2 83 84 0 2 14a54ec34bb6 83d266583303 000000000000 | |
| 498 3 167 76 3 4 dc4df776b38b 83d266583303 000000000000 | |
| 499 $ | |
| 500 | |
| 501 Note here that a file state can have two parents. If both the parent | |
| 502 nodeids are non-null, the file state has two parents, and the state is | |
| 503 therefore the result of a merge. | |
| 504 | |
| 505 Let's dump out a revlog at a particular revision: | |
| 506 | |
| 507 $ hg debugdata .hg/store/data/pome.txt.i 2 | |
| 508 There was a gibbon one morning | |
| 509 said "I think I will fly to the moon". | |
| 510 So with two great palms strapped to his arms, | |
| 511 he started his takeoff run. | |
| 512 $ | |
| 513 | |
| 514 The next component is the manifest. This is simply a list of all the | |
| 515 files in the project, together with their current nodeids. The | |
| 516 manifest is a file, held in a revlog. The nodeid of the manifest, | |
| 517 therefore, identifies the project filesystem at a particular point. | |
| 518 | |
| 519 $ hg debugdata .hg/store/00manifest.i 5 | |
| 520 poem.txt5168b1a5e2f44aa4e0f164e170820845183f50c8 | |
| 521 $ | |
| 522 | |
| 523 Finally we have the changeset. This is the atomic collection of | |
| 524 changes to a repository that leads to a new revision. The changeset | |
| 525 info includes the nodeid of the corresponding manifest, the timestamp | |
| 526 and committer ID, a list of changed files and a comment. The changeset | |
| 527 also includes the nodeid of the parent changeset, or the two parents | |
| 528 if the change is a merge. The changeset description is held in a | |
| 529 revlog, the changelog. | |
| 530 | |
| 531 $ hg debugdata .hg/store/00changelog.i 5 | |
| 532 1ccc11b6f7308cc8fa1573c2f3811a4710c91e3e | |
| 533 Jim Hague <jim.hague@acm.org> | |
| 534 1209061793 -3600 | |
| 535 poem.txt | |
| 536 pome.txt | |
| 537 | |
| 538 Merge first line branch | |
| 539 $ | |
| 540 | |
| 541 The nodeid of the changeset, therefore, gives us a globally unique | |
| 542 identifier for any particular change. Changesets have a | |
| 543 Subversion-like incrementing change number, but it is peculiar to that | |
| 544 repository. The nodeid, however, is global. | |
| 545 | |
| 546 One more detail remains to complete the picture. How do we get back | |
| 547 from a particular file change to find the responsible changeset? Each | |
| 548 revlog change has a linkrev entry that does just this. | |
| 549 | |
| 550 So, now we have a repository with a history of the changes applied to | |
| 551 that repository. Each change has a unique identifier. If we find that | |
| 552 change in another repository, it means that at the point in the other | |
| 553 repository we have exactly the same state; the file contents and | |
| 554 history are identical. | |
| 555 | |
| 556 At this point we can see how pulling changes from another repository | |
| 557 works. Mercurial has to determine which changesets in the source | |
| 558 repository are missing in the target repository. To do this, for each | |
| 559 head in the source repo it has to find the most recent change in that | |
| 560 head that is already present in the target repo, and get any remaining | |
| 561 changes after that point. These changes are then copied over and | |
| 562 applied. | |
| 563 | |
| 564 The Mercurial revlog format has proved remarkably durable. Over the | |
| 565 lifetime of Mercurial, there have been just two changes to the file | |
| 566 format. And one of those (a very recent change at the time of | |
| 567 writing, yet to appear in a release version) is a very small change to | |
| 568 filename storage required to deal with Windows-specific issues. |
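
To make the nodeid idea from 'Inside the Mercurial repo' a little more concrete, here is a minimal Python sketch of hashing a revision's contents together with its parents' nodeids. The function name node_id and the constant NULL_ID are mine, not Mercurial's, and I'm assuming the two parent ids are put in sorted order before hashing; treat that ordering as an assumption rather than a statement about the real implementation. Real file revisions may also carry extra metadata (for renames, say) that this sketch ignores.

```python
import hashlib

# The all-zero parent id, shown as 000000000000 in the index dump.
NULL_ID = b"\0" * 20


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """SHA-1 over the two parent ids (assumed sorted) followed by the contents."""
    first, second = sorted((p1, p2))
    digest = hashlib.sha1(first + second)
    digest.update(text)
    return digest.digest()


one_line = b"There was a gibbon one morning\n"
two_lines = one_line + b'said "I think I will fly to the moon".\n'

# Two successive revisions: the second is hashed with the first as its parent.
rev0 = node_id(one_line)
rev1 = node_id(two_lines, p1=rev0)

# Identical contents but no history: a different nodeid.
orphan = node_id(two_lines)

print(rev0.hex()[:12], rev1.hex()[:12], orphan.hex()[:12])
```

The point to take away is the last comparison: rev1 and orphan hold the same text, yet their ids differ because their histories differ, which is the 'contents *and* history' property described above.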

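The description of pulling can be sketched in the same spirit. The toy below models a repository as a mapping from changeset id to parent ids, using the short ids from the poem example, and walks back from the source heads until it meets changesets the target already holds. The function missing_changesets and the dictionary layout are hypothetical, purely for illustration; the real exchange between repositories is considerably smarter, since neither side can see the other's whole graph up front.

```python
def missing_changesets(source, heads, target):
    """Return ids reachable from heads that target lacks, parents first.

    Assumes that if target holds a changeset it also holds all of that
    changeset's ancestors, so the walk can stop there.
    """
    missing = []
    seen = set()

    def visit(node):
        if node in seen or node in target:
            return
        seen.add(node)
        for parent in source[node]:
            visit(parent)
        missing.append(node)  # appended only after its parents

    for head in heads:
        visit(head)
    return missing


# The source repo after the merge: changeset id -> parent ids.
source = {
    "33596ef855c1": (),
    "3d65e7a57890": ("33596ef855c1",),
    "ff97668b7422": ("3d65e7a57890",),
    "a065eb26e6b9": ("ff97668b7422",),
    "267d32f158b3": ("3d65e7a57890",),
    "792ab970fc80": ("267d32f158b3", "a065eb26e6b9"),
}

# The target is the clone taken before the branch and merge were made.
target = {"33596ef855c1", "3d65e7a57890", "ff97668b7422", "a065eb26e6b9"}

print(missing_changesets(source, ["792ab970fc80"], target))
# prints ['267d32f158b3', '792ab970fc80']
```

Returning parents before children means the changes can be applied in the order given, each one landing on parents that are already present.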