comparison Hg.txt @ 5:2ec53c0ed5d8
Musings on Merging and Mergeability.
author | Jim Hague <jim.hague@icc-atcsolutions.com> |
---|---|
date | Fri, 06 Mar 2009 14:07:34 +0000 |
parents | 561edf852797 |
children | a942bf7bc2ab |
4:561edf852797 | 5:2ec53c0ed5d8 |
---|---|
493 | 493 |
494 We start with the basic building block, which Mercurial calls a | 494 We start with the basic building block, which Mercurial calls a |
495 revlog. A revlog is a thing that holds a file and all the changes in | 495 revlog. A revlog is a thing that holds a file and all the changes in |
496 the file history. (Footnote: For any non-trivial file, this will | 496 the file history. (Footnote: For any non-trivial file, this will |
497 actually be two files on the disc, a data file and an index). The | 497 actually be two files on the disc, a data file and an index). The |
498 revlog stores the (compressed) differences between successive versions | 498 revlog stores the differences between successive versions |
499 of the file, though it will periodically store a complete version of | 499 of the file, though it will periodically store a complete version of |
500 the file instead of a difference, so that the content of any | 500 the file instead of a difference, so that the content of any |
501 particular file version can always be reconstructed without excessive | 501 particular file version can always be reconstructed without excessive |
502 effort. | 502 effort. |
503 | 503 |
504 Under the secret-squirrel Mercurial .hg directory at the top of your | 504 Under the secret-squirrel Mercurial .hg directory at the top of your |
505 project is a store which holds a revlog for each file in your project. | 505 project is a store which holds a revlog for each file in your |
506 project. So you have the complete history of the project locally. No | |
507 more round trips to the server. | |
508 | |
509 Both the differences between successive versions and the periodic | |
510 complete versions of a file are compressed before storing. This is | |
511 surprisingly effective at minimising the storage requirements of this | |
512 entire history of your project. <!!!Comparison of .svn space | |
513 requirements for Waldo>. | |
506 | 514 |
507 Any point in the evolution of a revlog can be uniquely identified with | 515 Any point in the evolution of a revlog can be uniquely identified with |
508 a nodeid. This is simply the SHA1 hash of the current file contents | 516 a nodeid. This is simply the SHA1 hash of the current file contents |
509 concatenated with the nodeids of one or both parents of the current | 517 concatenated with the nodeids of one or both parents of the current |
510 revision. Note that this way, two file states are identical if and | 518 revision. Note that this way, two file states are identical if and |
595 held, not the overall design. The chief Mercurial developer, Matt | 603 held, not the overall design. The chief Mercurial developer, Matt |
596 Mackall, notes that the code in present-day Mercurial devoted to | 604 Mackall, notes that the code in present-day Mercurial devoted to |
597 reading the old format comprises 28 lines of Python. Compared with, | 605 reading the old format comprises 28 lines of Python. Compared with, |
598 say, the early tribulations of Subversion and the switch from bdb to | 606 say, the early tribulations of Subversion and the switch from bdb to |
599 fsfs, this is an impressive record. | 607 fsfs, this is an impressive record. |
608 | |
609 Reflections on going distributed | |
610 -------------------------------- | |
611 | |
612 It's nearly traditional at this stage in an introduction to DVCS to | |
613 demonstrate several different workflow scenarios that you can build | |
614 with a DVCS. Which makes the important point that a DVCS can be | |
615 adapted to your workflow in a way that is at best unwieldy with a | |
616 CVCS. I intend, though, to break with tradition here. | |
617 | |
618 By this stage, I hope you can see that distributing version control | |
619 works by introducing branches where development takes place in | |
620 parallel. Mercurial treats these branches as arising naturally from | |
621 the commits made and transferred between repositories. Both Git and | |
622 Bazaar take a slightly different viewpoint, and explicitly generate a | |
623 fresh branch for work in a particular repository. But in both cases | |
624 the underlying principle of identifying changes by a globally unique | |
625 identifier and resolving parallel development by merges between | |
626 overlapping changes is the same. And all three can be used in a truly | |
627 distributed manner, with full history and the ability to commit being | |
628 available locally. | |
629 | |
630 I want now to reflect on the consequences all this has for that | |
631 all-important question of whether a DVCS is a suitable vehicle for | |
632 your data. | |
633 | |
634 The first is a minor and rather obvious point. If you want to store | |
635 files that are both very large and which change often in your DVCS, | |
636 then all the compression in the world is unlikely to stop the storage | |
637 requirements for the full project history from becoming | |
638 uncomfortably large. | |
639 | |
640 The second, and main, point is that there is an important question you | |
641 need to ask about your data. We've seen that a DVCS relies on | |
642 branching and merging to weave its magic. So take a close look at your | |
643 data, and ask: | |
644 | |
645 Will It Merge? | |
646 | |
647 The subset of plain old text which comprises program source | |
648 code requires some human oversight, but will merge automatically | |
649 well enough for the process to be well within the bounds of the | |
650 possible. | |
651 | |
652 Unfortunately when we move further afield mergeability becomes a rarer | |
653 commodity. I nearly began the previous paragraph by stating that | |
654 plain old text will merge well enough. Then Doubt set in - what about | |
655 XML? Or BASE64 encoded content? | |
656 | |
657 Of course, merge doesn't necessarily have to be textual merge. I am | |
658 told that Word can be used to diff and merge two Word .doc files, a | |
659 data format notorious for its binary impenetrability. As long as some | |
660 suitable merge agent is available, and the DVCS can be configured to | |
661 use it for data of a particular type (Footnote: Mercurial can have the | |
662 merge and diff tools specified with reference to the file extension on | |
663 which they operate - I assume Bazaar and Git are similar.), then there | |
664 is no bar to successful DVCS use. | |
665 | |
666 Before this reliance on mergeability causes you to dismiss DVCS out of | |
667 hand, reflect. A CVCS can only handle non-mergeable data by acting as | |
668 a versioned file store; in other words, having as the only available | |
669 merge option the use of one or other of the merge candidates in its | |
670 entirety. Useful though a versioned file store can be, it cannot be | |
671 considered a full-featured version control system. By treating the | |
672 offending unmergeable files as external to the DVCS, or with careful | |
673 workflow - disabling the distributed and mergeable potentials - a DVCS | |
674 can deal with these files, but only at a cost of its distributedness | |
675 or its version control system-ness. In this it differs little from a | |
676 CVCS. | |
677 | |
678 So, for all data you want to version control, let your battle cry be | |
679 | |
680 Will It Merge? | |
681 | |
682 At this point, I have an urge to don lab coat and safety goggles and | |
683 be videoed attempting to mechanically merge data in a variety of | |
684 different formats. Frankly, this is unlikely to be as exciting as | |
685 blending iPhones (Ref: www.willitblend.com), but from a system | |
686 development point of view it's rather more important. And, I think, | |
687 gives us a large clue as to one of the reasons for the continuing | |
688 popularity of Plain Old Text as a source code representation mechanism. |
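
The revlog description near the top of this comparison (compressed differences, a periodic complete version, and a nodeid built from a SHA1 hash over the contents and parent nodeids) can be illustrated with a toy model. This is a minimal sketch, not Mercurial's code or on-disk format. The names ToyRevlog, node_id, make_delta, apply_delta and snapshot_every are hypothetical, as are the choices to hash the sorted parent ids before the contents, to use an all-zero id for a missing parent, and to use a crude prefix/suffix delta.

```python
import hashlib
import zlib

# Assumption: an all-zero 20-byte id stands in for "no parent".
NULL_ID = b"\0" * 20


def node_id(text, p1=NULL_ID, p2=NULL_ID):
    """SHA1 over both parent ids (sorted) followed by the file contents.

    The ordering is an assumption made for this sketch.
    """
    lo, hi = sorted((p1, p2))
    return hashlib.sha1(lo + hi + text).digest()


def make_delta(old, new):
    """Crude delta: keep the common prefix and suffix, store the middle."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(old, delta):
    """Rebuild the new text from the old text and a delta."""
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


class ToyRevlog:
    """Linear history of one file: compressed deltas plus periodic snapshots."""

    def __init__(self, snapshot_every=4):
        self.snapshot_every = snapshot_every
        self.entries = []  # (nodeid, is_snapshot, payload)

    def add(self, text):
        rev = len(self.entries)
        parent = self.entries[-1][0] if self.entries else NULL_ID
        node = node_id(text, parent)
        if rev % self.snapshot_every == 0:
            # Periodic complete version, compressed before storing.
            self.entries.append((node, True, zlib.compress(text)))
        else:
            # Difference against the previous version, also compressed.
            prefix, suffix, middle = make_delta(self.read(rev - 1), text)
            payload = (prefix, suffix, zlib.compress(middle))
            self.entries.append((node, False, payload))
        return node

    def read(self, rev):
        # Walk back to the nearest snapshot, then replay deltas forward.
        base = rev
        while not self.entries[base][1]:
            base -= 1
        text = zlib.decompress(self.entries[base][2])
        for r in range(base + 1, rev + 1):
            prefix, suffix, packed = self.entries[r][2]
            text = apply_delta(text, (prefix, suffix, zlib.decompress(packed)))
        return text


if __name__ == "__main__":
    log = ToyRevlog(snapshot_every=3)
    for body in (b"one\n", b"one\ntwo\n", b"one\n2\n", b"1\n2\n"):
        print(log.add(body).hex())
    print(log.read(2))
```

Because a snapshot is stored every snapshot_every revisions, reading any revision replays at most snapshot_every - 1 deltas, which is the "without excessive effort" property the text describes.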