comparison Hg.tex @ 9:2155510c62f3
A version formatted with Latex. And spellchecked.
| author | Jim Hague <jim.hague@acm.org> |
|---|---|
| date | Fri, 22 May 2009 10:23:40 +0100 |
| parents | |
| children | 2e4d690ffabb |
| 8:abca12aaa38d | 9:2155510c62f3 |
|---|---|
| 1 \documentclass[a4paper]{article} | |
| 2 \usepackage{pslatex} | |
| 3 \usepackage{url} | |
| 4 | |
| 5 \newcommand{\standout}[1]{ | |
| 6 {\begin{center} \large \textbf{#1} \end{center}} | |
| 7 } | |
| 8 | |
| 9 \setlength{\parskip}{2mm} | |
| 10 \setlength{\parindent}{0mm} | |
| 11 | |
| 12 \begin{document} | |
| 13 \title{Inside a distributed version control system} | |
| 14 \author{Jim Hague\\ | |
| 15 \texttt{jim.hague@acm.org}} | |
| 16 \date{May 2009} | |
| 17 \maketitle | |
| 18 | |
| 19 \section{Preamble} | |
| 20 Grinton Lodge is a Youth Hostel that sits on an exposed hillside just | |
| 21 above the small hamlet of Grinton in Swaledale, in the Yorkshire Dales | |
| 22 National Park. A former Victorian shooting lodge, it now welcomes | |
| 23 walkers and other travellers from around the world. | |
| 24 | |
| 25 Tonight, a Wednesday in mid-November, is not one of its busiest | |
| 26 nights. Kat, the duty staff member, tells me that there is a small | |
| 27 corporate team-building group in the annex. There's no sign of them at | |
| 28 present. Otherwise, that portion of the world that has beaten a path | |
| 29 to the door of this grand building today consists of just me. And Kat | |
| 30 goes home soon. | |
| 31 | |
| 32 The November CVu, removed from its wrappers and read yesterday, lies | |
| 33 in my bag. Taunting me. Go on, it says, if you're ever going to put | |
| 34 finger to keyboard in the name of CVu, well, tonight you are out of | |
| 35 excuses. | |
| 36 | |
| 37 Bugger. | |
| 38 | |
| 39 \section{Let's look into Mercurial} | |
| 40 If you're at all interested in version control systems~--- and any | |
| 41 software developer not using one daily is a strange beast indeed~--- | |
| 42 you'll at least have become vaguely aware in the last few years of the | |
| 43 growing maturity of the latest group of version control systems | |
| 44 offering funky new stuff. These are the distributed version control | |
| 45 systems (DVCS). There is more to them than just their headline | |
| 46 attributes, being able to check history and do checkins while | |
| 47 disconnected from a central server, but these are damn useful to start | |
| 48 with. | |
| 49 | |
| 50 When I first heard about DVCS, it wasn't immediately obvious to me (to | |
| 51 put it mildly) how they would work. After years of using a centralised | |
| 52 version control system, I had a rough mental model of what went on. But | |
| 53 how do you cope without the central server forcing ordering onto the | |
| 54 changes? | |
| 55 | |
| 56 Since then I've started using Mercurial\footnote{ | |
| 57 \url{http://www.selenic.com/mercurial}}. | |
| 58 Mercurial is a DVCS. It's one of | |
| 59 three DVCSs that have gained significant popularity in the last few | |
| 60 years, the other two being Git\footnote{\url{http://git-scm.com}} and | |
| 61 Bazaar\footnote{\url{http://bazaar-vcs.org/}}. | |
| 62 I switched a significant work project over | |
| 63 to Mercurial (from Subversion) in mid-2007, because a customer site | |
| 64 required on-site work but could not allow access back to the company | |
| 65 VPN. I chose Mercurial for a variety of reasons which I won't bore you | |
| 66 with here\footnote{ | |
| 67 OK, if you must know: | |
| 68 \begin{itemize} | |
| 69 \item Implementability. I needed the system to work on Windows, Linux and | |
| 70 AIX. The latter was not one of the directly supported platforms for | |
| 71 any of the candidates. Git's implementation uses a horde of | |
| 72 tools. Bazaar requires only Python, but required Python 2.4 while IBM | |
| 73 stubbornly still supplies only Python 2.3. Mercurial requires Python | |
| 74 2.3 or greater, and uses some C for speed. | |
| 75 \item Simplicity. My users used Subversion daily, but did not generally | |
| 76 have much experience with other VCS. From the command line, | |
| 77 Mercurial's core operations will be familiar to a Subversion | |
| 78 user. This is also true of Bazaar, but was less true of Git. Git has | |
| 79 improved in this matter since then, but a Mr Winder of this parish | |
| 80 tells me that it's still possible to seriously embarrass | |
| 81 yourself. There was also a lack of Windows support for Git at the | |
| 82 time. | |
| 83 \item Speed. Mercurial is fast. In the same ballpark as Git. Bazaar | |
| 84 wasn't, and although it has improved significantly, has, in my | |
| 85 estimation, added user complexity in the process, and at the time | |
| 86 of writing is still off the pace for some operations. | |
| 87 \item Documentation. At the time, Bryan O'Sullivan's excellent Mercurial | |
| 88 book (\url{http://hgbook.red-bean.com}) was a clear winner for best | |
| 89 documentation. | |
| 90 \end{itemize}}. | |
| 91 | |
| 92 What I want to do in this article is give you an insight into how a | |
| 93 DVCS works. OK, so specifically I'm going to be talking about | |
| 94 Mercurial, but Git and Bazaar attack the problem in a similar way. But | |
| 95 first I'd better give you some idea of how you use Mercurial. | |
| 96 | |
| 97 \subsection{The 5 minute Mercurial overview} | |
| 98 \subsubsection{The basics} | |
| 99 I think it unlikely that someone possessing the taste and discernment | |
| 100 to be reading CVu would not be familiar with at least one version | |
| 101 control system. So, while I want to give you a flavour of what it's | |
| 102 like to use, I'm not going to hang about. If you'd like a proper | |
| 103 introduction, or you don't follow something, I thoroughly recommend | |
| 104 you consult the Mercurial book. | |
| 105 | |
| 106 To start using Mercurial to keep track of a project: | |
| 107 | |
| 108 \begin{verbatim} | |
| 109 $ hg init | |
| 110 $ | |
| 111 \end{verbatim} | |
| 112 | |
| 113 This creates the repository root in the current directory. | |
| 114 | |
| 115 Like CVS\footnote{\url{http://www.nongnu.org/cvs/}} | |
| 116 with its \texttt{CVS} directory and | |
| 117 Subversion\footnote{\url{http://subversion.tigris.org/}} | |
| 118 with its \texttt{.svn} | |
| 119 directory, Mercurial keeps its private data in a directory. Mercifully there is | |
| 120 only one of these, in the top level of your project. And rather than | |
| 121 holding details of where the actual repository is to be found, the \texttt{.hg} | |
| 122 directory holds the entire repository. | |
| 123 | |
| 124 Next you need to specify the files you want Mercurial to track. | |
| 125 | |
| 126 \begin{verbatim} | |
| 127 $ echo "There was a gibbon one morning" > pome.txt | |
| 128 $ hg add pome.txt | |
| 129 $ | |
| 130 \end{verbatim} | |
| 131 | |
| 132 As you might expect, this marks the files as to be added. And as you | |
| 133 might also expect, you need to commit to record the added files in the | |
| 134 repository. The commit comment can be supplied on the command line; if | |
| 135 you don't supply a comment, you'll be dropped into an editor to | |
| 136 provide one. | |
| 137 | |
| 138 There is a suggested format for these messages~--- a one-line summary | |
| 139 followed by any more required detail on following lines. By default | |
| 140 Mercurial will only display the first line of commit messages when | |
| 141 listing changes. In these examples I'll stick to terse messages, and | |
| 142 I'll enter them from the command line. | |
| 143 | |
| 144 \begin{verbatim} | |
| 145 $ hg commit -m "My Pome" -u "Jim Hague <jim.hague@acm.org>" | |
| 146 $ | |
| 147 \end{verbatim} | |
| 148 | |
| 149 Mercurial records the user making the change as part of the change | |
| 150 information. It is usual to give your name and email address as I've | |
| 151 done here. You can imagine, though, that constantly having to repeat | |
| 152 this is a bit tedious, so you can set a default user name in a | |
| 153 configuration file. Mercurial keeps global, user and repository | |
| 154 configurations, and it can go in any of those. | |
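| | |
| As a minimal sketch, a per-user default would look something like the | |
| following in the \texttt{.hgrc} file in your home directory (the file name | |
| and location vary by platform, so treat the path as an assumption): | |
| | |
| \begin{verbatim} | |
| [ui] | |
| username = Jim Hague <jim.hague@acm.org> | |
| \end{verbatim} | |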
| 155 | |
| 156 As with Subversion, after further edits you can see how your working copy | |
| 157 differs from the repository. | |
| 158 | |
| 159 \begin{verbatim} | |
| 160 $ hg status | |
| 161 M pome.txt | |
| 162 $ hg diff | |
| 163 diff -r 33596ef855c1 pome.txt | |
| 164 --- a/pome.txt Wed Apr 23 22:36:33 2008 +0100 | |
| 165 +++ b/pome.txt Wed Apr 23 22:48:01 2008 +0100 | |
| 166 @@ -1,1 +1,2 @@ There was a gibbon one morning | |
| 167 There was a gibbon one morning | |
| 168 +said "I think I will fly to the moon". | |
| 169 $ hg commit -m "A great second line" | |
| 170 $ | |
| 171 \end{verbatim} | |
| 172 | |
| 173 And look through a log of changes. | |
| 174 | |
| 175 \begin{verbatim} | |
| 176 $ hg log | |
| 177 changeset: 1:3d65e7a57890 | |
| 178 tag: tip | |
| 179 user: Jim Hague <jim.hague@acm.org> | |
| 180 date: Wed Apr 23 22:49:10 2008 +0100 | |
| 181 summary: A great second line | |
| 182 | |
| 183 changeset: 0:33596ef855c1 | |
| 184 user: Jim Hague <jim.hague@acm.org> | |
| 185 date: Wed Apr 23 22:36:33 2008 +0100 | |
| 186 summary: My Pome | |
| 187 | |
| 188 $ | |
| 189 \end{verbatim} | |
| 190 | |
| 191 There are some items here that need an explanation. | |
| 192 | |
| 193 The changeset identifier is in fact two identifiers separated by a | |
| 194 colon. The first is the sequence number of the changeset in the | |
| 195 repository, and is directly comparable to the change number in a | |
| 196 Subversion repository. The second is a globally unique identifier for | |
| 197 that change. As the change is copied from one repository to another | |
| 198 (this is a distributed system, remember, even if we haven't come to | |
| 199 that bit yet), its sequence number in any particular repository will | |
| 200 change, but the global identifier will always remain the same. | |
| 201 | |
| 202 \texttt{tip} is a Mercurial term. It means simply the most recent change. | |
| 203 | |
| 204 Want to rename a file? | |
| 205 | |
| 206 \begin{verbatim} | |
| 207 $ hg mv pome.txt poem.txt | |
| 208 $ hg status | |
| 209 A poem.txt | |
| 210 R pome.txt | |
| 211 $ hg commit -m "Rename my file" | |
| 212 $ | |
| 213 \end{verbatim} | |
| 214 (The command to rename a file is actually \texttt{hg rename}, | |
| 215 but Mercurial saves Unix-trained fingers from | |
| 216 typing embarrassment.) | |
| 217 | |
| 218 At this point you may be wondering about directories. \texttt{hg mkdir} | |
| 219 perhaps? Well, no. Mercurial only tracks files. To be sure, the | |
| 220 directory a file occupies is tracked, but effectively only as a | |
| 221 component of the file name. This has the slightly unexpected result | |
| 222 that you can't record an empty directory in your repository.\footnote{ | |
| 223 I tripped over this converting a work Subversion | |
| 224 repository. One possibility is to create a placeholder file in the | |
| 225 directory. In the event I created the directory (which receives build | |
| 226 products) as part of the build instead.} | |
| 227 | |
| 228 Given this, and the status output above that suggests strongly that | |
| 229 Mercurial treats a rename as a copy followed by a delete, you may be | |
| 230 worried that Mercurial won't cope at all well with rearranging your | |
| 231 repository. Relax. Mercurial does store the details of the rename as | |
| 232 part of the changeset, and copes very well with rearrangements\footnote{ | |
| 233 The Mercurial designers justify not dealing with | |
| 234 directories as first class objects by pointing out that provided you | |
| 235 can correctly move files about in the tree, the other reasons for | |
| 236 tracking directories are uncommon and do not in their opinion justify | |
| 237 the considerable added complexity. So far I've found no reason to | |
| 238 doubt that judgement.}. | |
| 239 | |
| 240 Want to rewind the working copy to a previous revision? | |
| 241 | |
| 242 \begin{verbatim} | |
| 243 $ hg update -r 1 | |
| 244 1 files updated, 0 files merged, 1 files removed, 0 files unresolved | |
| 245 $ | |
| 246 \end{verbatim} | |
| 247 | |
| 248 \texttt{hg update} updates the working files. In this case I'm specifying | |
| 249 that I want to go back to local changeset 1. I could also have typed | |
| 250 \texttt{-r 3d65e7a57890}, or even \texttt{-r 3d}; | |
| 251 when specifying the global change | |
| 252 identifier you only need to type enough digits to make it unique. | |
| 253 | |
| 254 This is all very well, but it's not exactly distributed, is it? | |
| 255 | |
| 256 \subsubsection{Going distributed} | |
| 257 A version control system goes Distributed by allowing multiple copies | |
| 258 of the repository to exist, and work to be done in all those | |
| 259 repositories in parallel. So when you start work on an existing | |
| 260 project, the first thing to do is to get your own copy of the | |
| 261 repository. | |
| 262 | |
| 263 \begin{verbatim} | |
| 264 elsewhere$ hg clone ssh://jim.home.net/Poem Jim-Poem | |
| 265 updating working directory | |
| 266 1 files updated, 0 files merged, 0 files removed, 0 files unresolved | |
| 267 \end{verbatim} | |
| 268 | |
| 269 Mercurial lets you access other repositories via the file system, over http or | |
| 270 over ssh. | |
| 271 | |
| 272 \begin{verbatim} | |
| 273 elsewhere$ cd Jim-Poem | |
| 274 elsewhere$ hg log | |
| 275 changeset: 3:a065eb26e6b9 | |
| 276 tag: tip | |
| 277 user: Jim Hague <jim.hague@acm.org> | |
| 278 date: Thu Apr 24 18:52:31 2008 +0100 | |
| 279 summary: Rename my file | |
| 280 | |
| 281 changeset: 2:ff97668b7422 | |
| 282 user: Jim Hague <jim.hague@acm.org> | |
| 283 date: Thu Apr 24 18:50:22 2008 +0100 | |
| 284 summary: Finished first verse | |
| 285 | |
| 286 changeset: 1:3d65e7a57890 | |
| 287 user: Jim Hague <jim.hague@acm.org> | |
| 288 date: Wed Apr 23 22:49:10 2008 +0100 | |
| 289 summary: A great second line | |
| 290 | |
| 291 changeset: 0:33596ef855c1 | |
| 292 user: Jim Hague <jim.hague@acm.org> | |
| 293 date: Wed Apr 23 22:36:33 2008 +0100 | |
| 294 summary: My Pome | |
| 295 | |
| 296 $ | |
| 297 \end{verbatim} | |
| 298 | |
| 299 \texttt{hg clone} is aptly named. It creates a new repository that contains | |
| 300 exactly the same changes as the source repository. You can make a | |
| 301 clone just by copying your project directory, if you're confident | |
| 302 nothing else will access it during the copy. \texttt{hg clone} saves you this | |
| 303 worry, and sets the default push/pull location in the new repo to the | |
| 304 cloned repo. | |
| 305 | |
| 306 From that point, you use \texttt{hg pull} to collect changes from other | |
| 307 places into your repo (though note it does not by default update your | |
| 308 working copy), and, as you might guess, \texttt{hg push} shoves your changes | |
| 309 into a foreign repository. By default these will act on the repository | |
| 310 you cloned from, but you can specify any other repository. | |
| 311 | |
| 312 More on those in a moment. First, though, I want to show you something | |
| 313 you can't do in Subversion. Start with the repository with 4 changes | |
| 314 we just cloned. I want to focus on the first couple of lines, so I'll | |
| 315 wind the working copy back to the point where only those lines exist. | |
| 316 | |
| 317 \begin{verbatim} | |
| 318 $ hg update -r 1 | |
| 319 1 files updated, 0 files merged, 1 files removed, 0 files unresolved | |
| 320 $ | |
| 321 \end{verbatim} | |
| 322 | |
| 323 And make a change. | |
| 324 | |
| 325 \begin{verbatim} | |
| 326 $ hg diff | |
| 327 diff -r 3d65e7a57890 pome.txt | |
| 328 --- a/pome.txt Wed Apr 23 22:49:10 2008 +0100 | |
| 329 +++ b/pome.txt Thu Apr 24 19:13:14 2008 +0100 | |
| 330 @@ -1,2 +1,2 @@ There was a gibbon one morning | |
| 331 -There was a gibbon one morning | |
| 332 -said "I think I will fly to the moon". | |
| 333 +There was a baboon who one afternoon | |
| 334 +said "I think I will fly to the sun". | |
| 335 $ hg commit -m "Better first two lines" | |
| 336 $ | |
| 337 \end{verbatim} | |
| 338 | |
| 339 The alert among you will have sat up at that. Well done! Yes, there's | |
| 340 something very worrying. How can I commit a change at an old point? | |
| 341 If you try this in Subversion, it will complain mightily about your | |
| 342 file being out of date. But Mercurial just went ahead and did | |
| 343 something. The Bazaar experts among you will know that in Bazaar, if | |
| 344 you use \texttt{bzr revert -r} to bring the working copy to a past revision, | |
| 345 make a change and commit, then your latest version will be the past | |
| 346 revision plus your change. Perhaps that's what Mercurial did? | |
| 347 | |
| 348 No. What Mercurial did is central to Mercurial's view of the | |
| 349 world. You took your working copy back to an old changeset, and then | |
| 350 committed a fresh change based at that changeset. Mercurial actually | |
| 351 did just what you asked it to do, no more and no less. Let's see the | |
| 352 initial evidence. | |
| 353 | |
| 354 \begin{verbatim} | |
| 355 $ hg heads | |
| 356 changeset: 4:267d32f158b3 | |
| 357 tag: tip | |
| 358 parent: 1:3d65e7a57890 | |
| 359 user: Jim Hague <jim.hague@acm.org> | |
| 360 date: Thu Apr 24 19:13:59 2008 +0100 | |
| 361 summary: Better first two lines | |
| 362 | |
| 363 changeset: 3:a065eb26e6b9 | |
| 364 user: Jim Hague <jim.hague@acm.org> | |
| 365 date: Thu Apr 24 18:52:31 2008 +0100 | |
| 366 summary: Rename my file | |
| 367 | |
| 368 $ | |
| 369 \end{verbatim} | |
| 370 | |
| 371 Time for some more Mercurial terminology. You can think of a \texttt{head} in | |
| 372 Mercurial as the most recent change on a branch. In Mercurial, a | |
| 373 branch is simply what happens when you commit a change that has as its | |
| 374 parent a change that already has a child. Mercurial has a standard | |
| 375 extension, graphlog, whose \texttt{hg glog} command uses some ASCII art to show the current | |
| 376 state: | |
| 377 | |
| 378 \begin{verbatim} | |
| 379 $ hg glog | |
| 380 @ changeset: 4:267d32f158b3 | |
| 381 | tag: tip | |
| 382 | parent: 1:3d65e7a57890 | |
| 383 | user: Jim Hague <jim.hague@acm.org> | |
| 384 | date: Thu Apr 24 19:13:59 2008 +0100 | |
| 385 | summary: Better first two lines | |
| 386 | | |
| 387 | o changeset: 3:a065eb26e6b9 | |
| 388 | | user: Jim Hague <jim.hague@acm.org> | |
| 389 | | date: Thu Apr 24 18:52:31 2008 +0100 | |
| 390 | | summary: Rename my file | |
| 391 | | | |
| 392 | o changeset: 2:ff97668b7422 | |
| 393 |/ user: Jim Hague <jim.hague@acm.org> | |
| 394 | date: Thu Apr 24 18:50:22 2008 +0100 | |
| 395 | summary: Finished first verse | |
| 396 | | |
| 397 o changeset: 1:3d65e7a57890 | |
| 398 | user: Jim Hague <jim.hague@acm.org> | |
| 399 | date: Wed Apr 23 22:49:10 2008 +0100 | |
| 400 | summary: A great second line | |
| 401 | | |
| 402 o changeset: 0:33596ef855c1 | |
| 403 user: Jim Hague <jim.hague@acm.org> | |
| 404 date: Wed Apr 23 22:36:33 2008 +0100 | |
| 405 summary: My Pome | |
| 406 | |
| 407 $ | |
| 408 \end{verbatim} | |
| 409 | |
| 410 \texttt{hg view} shows a nicer graphical view\footnote{Though, being | |
| 411 Tcl/Tk based, not that much nicer.}. | |
| 412 | |
| 413 So the change is in there. It's the latest change, and is simply on a | |
| 414 different branch to the other changes. | |
| 415 | |
| 416 Almost invariably, you will want to bring everything back together and | |
| 417 merge the branches. A merge is a change that combines two heads back | |
| 418 into one. It prepares an updated working directory with the merged | |
| 419 contents of the two heads for you to review and, if satisfactory, | |
| 420 commit. | |
| 421 | |
| 422 \begin{verbatim} | |
| 423 $ hg merge | |
| 424 merging pome.txt and poem.txt | |
| 425 0 files updated, 1 files merged, 0 files removed, 0 files unresolved | |
| 426 (branch merge, don't forget to commit) | |
| 427 $ cat poem.txt | |
| 428 There was a baboon who one afternoon | |
| 429 said "I think I will fly to the sun". | |
| 430 So with two great palms strapped to his arms, | |
| 431 he started his takeoff run. | |
| 432 $ hg commit -m "Merge first line branch" | |
| 433 $ | |
| 434 \end{verbatim} | |
| 435 | |
| 436 (I'm no poet. The poem is, of | |
| 437 course, \textit{Silly Old Baboon} by the late, great, Spike | |
| 438 Milligan. From \textit{A Book of Milliganimals}, Puffin, 1971.) | |
| 439 | |
| 440 Here's the ASCII art again showing what just happened. | |
| 441 Oh, and notice in the above that Mercurial has done the | |
| 442 right thing with regard to the rename. | |
| 443 | |
| 444 \begin{verbatim} | |
| 445 $ hg glog | |
| 446 @ changeset: 5:792ab970fc80 | |
| 447 |\ tag: tip | |
| 448 | | parent: 4:267d32f158b3 | |
| 449 | | parent: 3:a065eb26e6b9 | |
| 450 | | user: Jim Hague <jim.hague@acm.org> | |
| 451 | | date: Thu Apr 24 19:29:53 2008 +0100 | |
| 452 | | summary: Merge first line branch | |
| 453 | | | |
| 454 | o changeset: 4:267d32f158b3 | |
| 455 | | parent: 1:3d65e7a57890 | |
| 456 | | user: Jim Hague <jim.hague@acm.org> | |
| 457 | | date: Thu Apr 24 19:13:59 2008 +0100 | |
| 458 | | summary: Better first two lines | |
| 459 | | | |
| 460 o | changeset: 3:a065eb26e6b9 | |
| 461 | | user: Jim Hague <jim.hague@acm.org> | |
| 462 | | date: Thu Apr 24 18:52:31 2008 +0100 | |
| 463 | | summary: Rename my file | |
| 464 | | | |
| 465 o | changeset: 2:ff97668b7422 | |
| 466 |/ user: Jim Hague <jim.hague@acm.org> | |
| 467 | date: Thu Apr 24 18:50:22 2008 +0100 | |
| 468 | summary: Finished first verse | |
| 469 | | |
| 470 o changeset: 1:3d65e7a57890 | |
| 471 | user: Jim Hague <jim.hague@acm.org> | |
| 472 | date: Wed Apr 23 22:49:10 2008 +0100 | |
| 473 | summary: A great second line | |
| 474 | | |
| 475 o changeset: 0:33596ef855c1 | |
| 476 user: Jim Hague <jim.hague@acm.org> | |
| 477 date: Wed Apr 23 22:36:33 2008 +0100 | |
| 478 summary: My Pome | |
| 479 | |
| 480 $ | |
| 481 \end{verbatim} | |
| 482 | |
| 483 So, our little branch change has now been merged back, and we have a | |
| 484 single line of development again. Notice that unlike the other | |
| 485 changesets, changeset 5 has two parent changesets, indicating it is a | |
| 486 merge changeset. You can only merge two branches in one operation; or | |
| 487 putting it another way, a changeset can have a maximum of two parents. | |
| 488 | |
| 489 This behaviour is absolutely central to Mercurial's philosophy. If a | |
| 490 change is committed that takes as its starting point a change that | |
| 491 already has a child, then a branch gets created. Working with | |
| 492 Mercurial, branches get created frequently, and equally frequently | |
| 493 merged back. As befits any frequent operation, both are easy to do. | |
| 494 | |
| 495 You're probably thinking at this point that this making a commit onto | |
| 496 an old version is a slightly strange thing to do, and you'd be right. | |
| 497 But that's exactly what's going to happen the moment you go | |
| 498 distributed. Two people working independently with their own | |
| 499 repositories are going to make commits based, typically, on the latest | |
| 500 changes they happen to have incorporated into their tree. To be | |
| 501 Distributed, a DVCS has to deal with this. Mercurial faces it head-on. | |
| 502 When you pull changes into your repo (or someone else pushes them), if | |
| 503 any of the changes overlap~--- are both based on the same base change~--- | |
| 504 you get extra heads, and it's up to you to let these extra heads live | |
| 505 or merge, as you please. | |
| 506 | |
| 507 In practice this is more manageable than you might think. Consider a | |
| 508 typical Mercurial usage, where the 'master' repo sits on a known | |
| 509 server, and everyone pulls changes from the master and pushes their | |
| 510 own efforts to the master. By default Mercurial won't let you push if | |
| 511 the receiving repo will gain an extra head as a result, so you | |
| 512 typically pull (and do any required merging) just before | |
| 513 pushing. Subversion users will recognise this pattern. Subversion | |
| 514 won't let you commit a change if your working copy is not at the very | |
| 515 latest revision, so the Subversion user will update, and merge if | |
| 516 necessary, just before committing. | |
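| | |
| A sketch of that cycle, using the default push/pull location and with an | |
| illustrative commit message: | |
| | |
| \begin{verbatim} | |
| $ hg pull | |
| $ hg merge | |
| $ hg commit -m "Merge with master" | |
| $ hg push | |
| \end{verbatim} | |
| | |
| The merge and its commit are only needed when the pull has brought in | |
| changes that leave you with an extra head. | |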
| 517 | |
| 518 What, then, about a branch in the conventional sense of '1.0 | |
| 519 maintenance branch'? Typically in Mercurial you'd handle this by | |
| 520 keeping a separate cloned repository for those changes. Cloning is | |
| 521 fast and, if local, uses hard links where possible on filesystems that | |
| 522 support them, so isn't necessarily extravagant on disc space. You can, | |
| 523 if you prefer, handle them all in a single repo with 'named | |
| 524 branches', but cloning is definitely simpler. | |
| 525 | |
| 526 OK, so now you know the basics of using Mercurial. We can proceed to | |
| 527 looking at how this magic is achieved. In particular, where does this | |
| 528 magic globally unique identifier for a change come from? | |
| 529 | |
| 530 \subsection{Inside the Mercurial repo} | |
| 531 The way Mercurial handles its repo is really quite simple. | |
| 532 | |
| 533 That's simple, as in 'most things are simple once you know the | |
| 534 answer'. I found the explanation helpful\footnote{For the curious, | |
| 535 Bryan O'Sullivan's excellent Mercurial book | |
| 536 has a chapter on the subject, and the Mercurial website has a fair amount | |
| 537 of detail too.}, so this section attempts | |
| 538 the 10,000ft (FL100 if you prefer) view of Mercurial. | |
| 539 | |
| 540 First remember that any file or component can only have one or two | |
| 541 parents. You can't merge more than one other branch at once. | |
| 542 | |
| 543 We start with the basic building block, which Mercurial calls a | |
| 544 revlog. A revlog is a thing that holds a file and all the changes in | |
| 545 the file history\footnote{For any non-trivial file, this will | |
| 546 actually be two files on the disc, a data file and an index.}. The | |
| 547 revlog stores the differences between successive versions | |
| 548 of the file, though it will periodically store a complete version of | |
| 549 the file instead of a difference, so that the content of any | |
| 550 particular file version can always be reconstructed without excessive | |
| 551 effort. | |
| 552 | |
| 553 Under the secret-squirrel Mercurial \texttt{.hg} directory at the top of your | |
| 554 project is a store which holds a revlog for each file in your | |
| 555 project. So you have the complete history of the project locally. No | |
| 556 more round trips to the server. | |
| 557 | |
| 558 Both the differences between successive versions and the periodic | |
| 559 complete versions of a file are compressed before storing. This is | |
| 560 surprisingly effective at minimising the storage requirements of this | |
| 561 entire history of your project. I have a small Java project handy, | |
| 562 comprising a little over 300 source modules. There are 5 branches plus | |
| 563 the mainline, and some 1920 commits in all. A Subversion checkout of | |
| 564 the current mainline takes 51MB. Converting the project to Mercurial | |
| 565 yields a Mercurial repository that takes 60MB, so a little | |
| 566 bigger. Remember, though, that the Mercurial repository includes not | |
| 567 just the working copy, but also the entire history of the project. | |
| 568 | |
| 569 Any point in the evolution of a revlog can be uniquely identified with | |
| 570 a nodeid. This is simply the SHA1 hash of the current file contents | |
| 571 concatenated with the nodeids of one or both parents of the current | |
| 572 revision. Note that this way, two file states are identical if and | |
| 573 only if the file contents are the same \emph{and} the file has the | |
| 574 same history. | |
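| | |
| As a rough sketch of that calculation, writing $p_1$ and $p_2$ for the | |
| parent nodeids (with an all-zero null nodeid standing in for a missing | |
| parent) and $\Vert$ for concatenation: | |
| \[ | |
| \mathrm{nodeid} = \mathrm{SHA1}(p_1 \mathbin{\Vert} p_2 \mathbin{\Vert} \mathrm{contents}) | |
| \] | |
| The notation is mine rather than Mercurial's. As far as I can tell the | |
| two parent nodeids are put in sorted order and hashed ahead of the | |
| contents, but treat that ordering as an assumption and consult the | |
| Mercurial source for the byte-level details. | |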
| 575 | |
| 576 Here's a dump of a revlog index: | |
| 577 | |
| 578 \begin{verbatim} | |
| 579 $ hg debugindex .hg/store/data/pome.txt.i | |
| 580 rev offset length base linkrev nodeid p1 p2 | |
| 581 0 0 32 0 0 6bbbd5d6cc53 000000000000 000000000000 | |
| 582 1 32 51 0 1 83d266583303 6bbbd5d6cc53 000000000000 | |
| 583 2 83 84 0 2 14a54ec34bb6 83d266583303 000000000000 | |
| 584 3 167 76 3 4 dc4df776b38b 83d266583303 000000000000 | |
| 585 $ | |
| 586 \end{verbatim} | |
| 587 | |
| 588 Note here that a file state can have two parents. If both the parent | |
| 589 nodeids are non-null, the file state has two parents, and the state is | |
| 590 therefore the result of a merge. | |
| 591 | |
| 592 Let's dump out a revlog at a particular revision: | |
| 593 | |
| 594 \begin{verbatim} | |
| 595 $ hg debugdata .hg/store/data/pome.txt.i 2 | |
| 596 There was a gibbon one morning | |
| 597 said "I think I will fly to the moon". | |
| 598 So with two great palms strapped to his arms, | |
| 599 he started his takeoff run. | |
| 600 $ | |
| 601 \end{verbatim} | |
| 602 | |
| 603 The next component is the manifest. This is simply a list of all the | |
| 604 files in the project, together with their current nodeids. The | |
| 605 manifest is a file, held in a revlog. The nodeid of the manifest, | |
| 606 therefore, identifies the project filesystem at a particular point. | |
| 607 | |
| 608 \begin{verbatim} | |
| 609 $ hg debugdata .hg/store/00manifest.i 5 | |
| 610 poem.txt5168b1a5e2f44aa4e0f164e170820845183f50c8 | |
| 611 $ | |
| 612 \end{verbatim} | |
| 613 | |
| 614 Finally we have the changeset. This is the atomic collection of | |
| 615 changes to a repository that leads to a new revision. The changeset | |
| 616 info includes the nodeid of the corresponding manifest, the timestamp | |
| 617 and committer ID, a list of changed files and a comment. The changeset | |
| 618 also includes the nodeid of the parent changeset, or the two parents | |
| 619 if the change is a merge. The changeset description is held in a | |
| 620 revlog, the changelog. | |
| 621 | |
| 622 \begin{verbatim} | |
| 623 $ hg debugdata .hg/store/00changelog.i 5 | |
| 624 1ccc11b6f7308cc8fa1573c2f3811a4710c91e3e | |
| 625 Jim Hague <jim.hague@acm.org> | |
| 626 1209061793 -3600 | |
| 627 poem.txt | |
| 628 pome.txt | |
| 629 | |
| 630 Merge first line branch | |
| 631 $ | |
| 632 \end{verbatim} | |
| 633 | |
| 634 The nodeid of the changeset, therefore, gives us a globally unique | |
| 635 identifier for any particular change. Changesets have a | |
| 636 Subversion-like incrementing change number, but it is peculiar to that | |
| 637 repository. The nodeid, however, is global. | |
| 638 | |
| 639 One more detail remains to complete the picture. How do we get back | |
| 640 from a particular file change to find the responsible changeset? Each | |
| 641 revlog change has a linkrev entry that does just this. | |
| 642 | |
| 643 So, now we have a repository with a history of the changes applied to | |
| 644 that repository. Each change has a unique identifier. If we find that | |
| 645 change in another repository, it means that at the point in the other | |
| 646 repository we have exactly the same state; the file contents and | |
| 647 history are identical. | |
| 648 | |
| 649 At this point we can see how pulling changes from another repository | |
| 650 works. Mercurial has to determine which changesets in the source | |
| 651 repository are missing in the target repository. To do this, for each | |
| 652 head in the source repo it has to find the most recent change in that | |
| 653 head that is already present in the target repo, and get any remaining | |
| 654 changes after that point. These changes are then copied over and | |
| 655 applied. | |
| 656 | |
| 657 The Mercurial revlog format has proved remarkably durable. Since the | |
| 658 first release of Mercurial in April 2005, there have been a total of 5 | |
| 659 changes to the file format. However, of those, all but one have been | |
| 660 changes to the handling of file names. The most recent change, in | |
| 661 October 2008, and its predecessor in December 2006, were both | |
| 662 introduced purely to cope with Windows-specific issues. The one change | |
| 663 that touched the data structures described above was in April 2006. The | |
| 664 format introduced, RevLogNG, changed only the details of index data | |
| 665 held, not the overall design. The chief Mercurial developer, Matt | |
| 666 Mackall, notes that the code in present-day Mercurial devoted to | |
| 667 reading the old format comprises 28 lines of Python. Compared with, | |
| 668 say, the early tribulations of Subversion and the switch from \texttt{bdb} to | |
| 669 \texttt{fsfs}, this is an impressive record. | |
| 670 | |
| 671 \section{Reflections on going distributed} | |
| 672 It's nearly traditional at this stage in an introduction to DVCS to | |
| 673 demonstrate several different workflow scenarios that you can build | |
| 674 with a DVCS. This makes the important point that a DVCS can be | |
| 675 adapted to your workflow in a way that is at best unwieldy with a | |
| 676 CVCS. I intend, though, to break with tradition here. | |
| 677 | |
| 678 By this stage, I hope you can see that distributing version control | |
| 679 works by introducing branches where development takes place in | |
| 680 parallel. Mercurial treats these branches as arising naturally from | |
| 681 the commits made and transferred between repositories. Both Git and | |
| 682 Bazaar take a slightly different viewpoint, and explicitly generate a | |
| 683 fresh branch for work in a particular repository. But in both cases | |
| 684 the underlying principle of identifying changes by a globally unique | |
| 685 identifier and resolving parallel development by merges between | |
| 686 overlapping changes is the same. And all three can be used in a truly | |
| 687 distributed manner, with full history and the ability to commit being | |
| 688 available locally. | |
| 689 | |
| 690 So rather than chatter on about workflows, I want instead to reflect on | |
| 691 the consequences all this has for that all-important question of | |
| 692 whether a DVCS is a suitable vehicle for your data. | |
| 693 | |
| 694 The first is a minor and rather obvious point. If you want to store | |
| 695 files that are very large and which change often in your DVCS, then | |
| 696 all the compression in the world is unlikely to stop the storage | |
| 697 requirements for the full project history from becoming uncomfortably | |
| 698 large, particularly if the files are not very compressible to start | |
| 699 with. | |
| 700 | |
| 701 The second, and main, point is that there is an important question you | |
| 702 need to ask about your data. We've seen that a DVCS relies on | |
| 703 branching and merging to weave its magic. So take a close look at your | |
| 704 data, and ask: | |
| 705 | |
| 706 \standout{Will It Merge?} | |
| 707 | |
| 708 The subset of plain old text which comprises program source | |
| 709 code requires some human oversight, but will merge automatically | |
| 710 well enough for the process to be well within the bounds of the | |
| 711 possible. | |
| 712 | |
| 713 Unfortunately when we move further afield mergeability becomes a rarer | |
| 714 commodity. I nearly began the previous paragraph by stating that | |
| 715 plain old text will merge well enough. Then Doubt set in~--- what about | |
| 716 XML? Or BASE64 encoded content? | |
| 717 | |
| 718 Of course, merge doesn't necessarily have to be textual merge. I am | |
| 719 told that Word can be used to diff and merge two Word \texttt{.doc} files, a | |
| 720 data format notorious for its binary impenetrability. As long as some | |
| 721 suitable merge agent is available, and the DVCS can be configured to | |
| 722 use it for data of a particular type\footnote{Mercurial can have the | |
| 723 merge and diff tools specified with reference to the file extension on | |
| 724 which they operate~--- I assume Bazaar and Git are similar.}, then there | |
| 725 is no bar to successful DVCS use. | |
| 726 | |
| 727 Before this reliance on mergeability causes you to dismiss DVCS out of | |
| 728 hand, reflect. A CVCS can only handle non-mergeable data by acting as | |
| 729 a versioned file store; in other words, having as the only available | |
| 730 merge option the use of one or other of the merge candidates in its | |
| 731 entirety. Useful though a versioned file store can be, it cannot be | |
| 732 considered a full-featured version control system. By treating the | |
| 733 offending unmergeable files as external to the DVCS, or with careful | |
| 734 workflow~--- disabling the distributed and mergeable potentials~--- a DVCS | |
| 735 can deal with these files, but only at a cost of its distributedness | |
| 736 or its version control system-ness. In this it differs little from a | |
| 737 CVCS. | |
| 738 | |
| 739 So, for all data you want to version control, let your battle cry be: | |
| 740 | |
| 741 \standout{Will It Merge?} | |
| 742 | |
| 743 At this point, I have an urge to don lab coat and safety goggles and | |
| 744 be videoed attempting to mechanically merge data in a variety of | |
| 745 different formats. Frankly, this is unlikely to be as exciting as | |
| 746 blending iPhones\footnote{\url{http://www.willitblend.com}}, | |
| 747 but from a system development point of view it's rather more | |
| 748 important. And, I think, it gives us a large clue as to one of the | |
| 749 reasons for the continuing | |
| 750 popularity of Plain Old Text as a source code representation mechanism. | |
| 751 | |
| 752 \end{document} |
