Version Control: Comments on (OSS/FS) Software Configuration Management (SCM) Systems

laofo · 2008-09-22 · 9 reads

Comments on Open Source Software / Free Software (OSS/FS) Software Configuration Management (SCM) Systems

Also available in [url=http://subversion.bluegate.org/wheeler.html]Japanese[/url].

by David A. Wheeler

April 10, 2004; lightly revised May 18, 2005

[i]This paper is getting increasingly obsolete, but I'm leaving it here because there are some broader principles noted here. Enjoy. More recent articles include [url=http://blogs.gnome.org/newren/2008/03/01/happenings-in-the-vcs-world/]Elijah's 2008-03-01 "Happenings in the VCS World"[/url]. Today, Subversion (svn) is widely used by those who need a simple centralized SCM; git is powerful but has a hideous user interface; Mercurial (hg), Bazaar (bzr), and Monotone (among others) have their supporters and major-project users.[/i]

With the release of Subversion 1.0, lots of people are discussing the pros and cons of the various software configuration management (SCM) / version control systems available as open source software / Free Software (OSS/FS). Indeed, the problem is now an embarrassment of reasonable choices: there are several OSS/FS SCM systems available today. Here's some information about SCM systems that I've learned that you may find helpful; I'll discuss four options (CVS, Subversion, GNU arch, and Monotone), the differences between centralized and decentralized SCM, a discussion about using GNU arch to support centralized development, and a few links to other reviews. I think future SCM systems will need to counter more threats than today's SCM systems are designed to do; feel free to also look at [url=http://www.dwheeler.com/essays/scm-security.html]my paper on SCM security[/url].

CVS, Subversion, GNU Arch, and Monotone

In my opinion three OSS/FS SCM systems got the most discussion in April 2004: CVS, Subversion, and GNU Arch. Two other SCM systems that are getting more than a little attention are Monotone and Bazaar-NG, so I have a few comments about them. As of April 2005, git/Cogito have entered the arena with a bang, since this pair of tools is being developed specifically for Linux kernel development (this is a large number of smart, motivated developers who have the most experience of anyone with distributed SCMs). There are certainly other SCM tools (such as Aegis and Codeville), and I don't mean to intentionally exclude them; I just haven't had the time to examine the others in as much depth. Besides, knowing about these four will help you understand the rest. So, here's a brief discussion of each.

CVS

[url=http://www.cvshome.org/]CVS[/url] is extremely popular, and it does the job. In fact, when CVS was released, [url=http://www.dwheeler.com/innovation/innovation.html]CVS was a major new innovation in software configuration management[/url]. However, CVS is now showing its age through a number of awkward limitations: changes are tracked per-file instead of per-change, commits aren't atomic, renaming files and directories is awkward, and its branching limitations mean that you'd better faithfully tag things or there'll be trouble later. Some of the maintainers of the original CVS have declared that the CVS code has become too crusty to effectively maintain. These problems led the main CVS developers to start over and create Subversion.

Subversion

[url=http://subversion.tigris.org/]Subversion[/url] (SVN) is a new system, intended to be a simple replacement for CVS. I looked at Subversion 1.0, released February 24, 2004. Subversion is basically a re-implementation of CVS with its warts fixed, and it still works the same basic way (supporting a centralized repository). Like CVS, Subversion by itself is intended to support a centralized repository for developers and doesn't handle decentralized development well; the [url=http://svk.elixus.org/]svk project[/url] extends Subversion to support decentralized development. From a technology point of view you can definitely argue with some of Subversion's decisions. For example, it doesn't handle changesets as directly as you'd expect given their centrality to the problem. But technical advancement is not the same as utility; for many people who currently use CVS and just want an incremental improvement, Subversion is probably more or less what they were expecting and looking for. But there are weaknesses: for example, Subversion doesn't keep track of "which patches have already been applied" on a given branch,
and trying to reapply a patch more than once causes problems. Thus, Subversion has trouble with history-sensitive merging of branches where the branches share parts (GNU arch doesn't have this problem, because it [i]does[/i] track what merges have been applied). In 2004 there were concerns by some about Subversion's use of db to store data (rather than the safer flat files), since in a few cases this can let things get "stuck". In practice this doesn't seem to be so bad (in part because the data can be extracted), but certainly some are concerned. In newer versions, there is a database backend called fsfs which uses flat files. The fsfs backend was created because Subversion had had some problems with the DB backend in debian-installer (a fairly large repository); fsfs works without any problems in that case.

Subversion uses a BSD-old-like license that, while OSS/FS, is GPL-incompatible, and that's unfortunate ([url=http://www.dwheeler.com/essays/gpl-compatible.html]GPL incompatibility can be a problem[/url]). Subversion [i]can[/i] be used to maintain GPL software or any other kind, without restrictions. Subversion depends on a large number of libraries and programs (and can be perceived as rather "heavyweight"), so it can take some effort to install currently; distributions will probably be quick to include it, so that problem should go away relatively soon. [url=http://svnbook.red-bean.com/]This book on Subversion[/url] gives more information about it.

By the way, there's a general problem with Subversion that is shared by many other SCM tools: Subversion tracks file contents, but it doesn't track the modification date/timestamp of individual files (i.e., it fails to record important metainformation). Generated files can store the date/timestamp of the retrieval, or maybe of the changeset, but the latter is not the default. This can produce extra build work, or inaccurate builds (a small sketch at the end of this section illustrates the idea). See the email [url=http://mail.python.org/pipermail/python-dev/2005-December/058641.html]"Should I really have to install Python before I can build it?"[/url] of December 13, 2005, for a more detailed explanation. SCM tools that record modification times, as well as the file names and contents, don't have this problem, though they can have a different problem: if a user's clock is severely off, they can cause serious build problems. This can be partly but not completely alleviated by performing extra checks when the files are transferred, but some designs make this hard. Of course, this presumes that all times are in a common standard (e.g., UTC); if clock times are recorded in local time you have even more trouble.

If you're using CVS and want a simple upgrade path to something better, Subversion appears to be the simplest approach. It works in a very similar way to CVS (in particular through a centralized repository), allowing any of the authorized developers to immediately modify a shared repository (with a record that it was done so and rollback capability). Subversion is what it intends to be: an improved CVS.
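As promised above, here is what "recording important metainformation" could look like in practice. This is only a minimal Python sketch under my own assumptions, not part of Subversion or any other tool mentioned here: it records each file's modification time in a small manifest and restores those times later, so a fresh checkout doesn't make every file look newly modified to a build system.

```python
import json
import os
import time

def record_mtimes(root, manifest_path):
    """Walk a working copy and record each file's modification time in a JSON manifest."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[os.path.relpath(path, root)] = os.stat(path).st_mtime
    with open(manifest_path, "w") as f:
        json.dump(mtimes, f, indent=2)

def restore_mtimes(root, manifest_path):
    """Re-apply recorded modification times after the files are re-created (e.g., on checkout)."""
    with open(manifest_path) as f:
        mtimes = json.load(f)
    for rel, mtime in mtimes.items():
        path = os.path.join(root, rel)
        if os.path.exists(path):
            # Keep the access time current, restore only the modification time.
            os.utime(path, (time.time(), mtime))

# Example usage (hypothetical paths):
#   record_mtimes("my-working-copy", "mtimes.json")
#   ... a fresh checkout re-creates the files ...
#   restore_mtimes("my-working-copy", "mtimes.json")
```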
GNU Arch

[url=http://www.gnuarch.org/]GNU arch[/url] is a very interesting competitor, and works in a completely different way from CVS and Subversion. GNU Arch is released under the GNU GPL. I looked at GNU Arch version 1.2, released February 26, 2004. GNU arch is fully decentralized, which makes it work very well for decentralized development (like the Linux kernel's development process). It has a very clever and remarkably simple approach to handling data, so it works very easily with many other tools. The "smarts" are in the client tools, not the server, so a simple secure ftp site or shared directory can serve as the repository, an intriguing capability for such a powerful SCM system. It has simple dependencies, so it's easy to set up too. Decentralized development has its strengths, particularly in allowing different people to try different approaches (e.g., independent branches and forks) independently and then bringing them together later. This ability to scale and support "survival of the fittest" is what makes decentralized development so important for Linux kernel maintenance. Arch can also be used for centralized development, but see my discussion below about that.

There are also a number of people who have built support tools for arch. For example, [url=http://wiki.gnuarch.org/moin.cgi/graphing_20relations]tla-graph[/url] can create a graph of the patchlogs in archives.

Indeed, I really like arch, yet I'm also frustrated by it. It has so many positive strengths that it might be confusing why I think it has some problems. So, here's a discussion of its problems, which basically show that GNU arch is a tool that's already very usable but needs some maturing.

A serious weakness of arch is that it doesn't work well on Windows-based systems, and it's not clear if that will ever change. There are ports of arch, both non-native (Cygwin and Services for Unix) and a native port too. However, the current win32 port is only in its early stages, and the [url=http://wiki.gnuarch.org/moin.cgi/Native_20WIN32_20Support]Win32 page on the Arch wiki[/url] says "Arch was never intended to run on a non-POSIX system. Don't expect to have a full blown arch on your Microsoft computer." At least part of the problem is the long filenames used internally by arch; arch could certainly be modified to help, though there doesn't seem to be much movement in that direction. Other problematic areas include [url=http://mail.gnu.org/archive/html/gnu-arch-users/2004-02/msg00211.html]symbolic links, proper file permissions, and newline problems[/url], as well as the general immaturity of the port as of March 2004. Some people don't think that poor Windows support is a problem; to me (and others!), that's a serious problem. Even if [i]you[/i] don't use any Microsoft Windows systems, people don't want to use many different SCM systems, so if one can handle many environments and the other can't, people will use the one that can handle more environments. I think GNU Arch's use will be hampered by this lack of support as long as this is true, even for people who never use Windows; good [i]native[/i] Windows support is very important for an SCM tool.
Arch has some awkward weaknesses involving filenames. Arch uses [url=http://wiki.gnuarch.org/moin.cgi/FunkyFileNames]extremely odd filenaming conventions[/url] that cause trouble for scripts, command-line use, and many common tools. Its "+" prefixes cause problems with extremely common tools like vi, vim, and the pager more (this is especially a problem when trying to enter change log information; why choose a convention that's inconvenient for one of the world's most popular text editors?). Its "=" prefixes expose a bug in bash filename completion (this bug will eventually be fixed in bash, but buggy implementations will be around for a long time to come because this is such a rare need and bash is the default shell for many systems). And although this is less of a problem, it stores data in an "{arch}" directory, but the "{}" characters cause problems for many shells (particularly C shells) because they have a special meaning (they're filename globbing characters like "*"). For example, in C shells you can't "cd {arch}" or "vi {arch}/whatever"; you must quote the directory name. The problem isn't that filename conventions are a bad idea; most CM systems have them! The problem is that some of the conventions chosen by arch seem to be [i]designed[/i] to interfere with commonly-used tools, and thus require many work-arounds when using common tools (such as prefixing the filename with "./" or using the "--" option). That's unfortunate since GNU Arch's underlying concepts work [i]well[/i] with other tools; if the developers had chosen better conventions these problems would never have occurred. I suspect these poorly-chosen conventions are too ingrained to be easily changed now, but there's always hope. There are ways to override the defaults in some cases, but not in many, and tools should choose [i]good[/i] defaults. It's too bad, because nothing in arch's fundamental design [i]requires[/i] these particular filename conventions. In February 2004 arch couldn't handle spaces in filenames, but this significant defect has been fixed; version 1.2.1 and later support spaces in filenames.
GNU arch gives you a lot of control using lower-level commands, but it doesn't (yet) automate a number of tasks that it really should be automating. Many common operations require multiple commands, when instead a single command and reasonable options should be enough for most people. If you use a single archive for a long time in GNU arch, it eventually accumulates a very large amount of data and becomes inconvenient to work with. [url=http://www.gnu.org/software/gnu-arch/tutorial/new-archive.html#Creating_a_New_Archive]arch's developer suggests dividing archives by time and including a date in the archive name[/url]. I think handling this accumulation is a nuisance; this kind of manual work is exactly what an SCM should handle automatically (e.g., perhaps arch could hide branches that have been unused for more than a year, by default). Arch has nice caching facilities (both in archives and on individual workstations) which can speed access to specific versions. However, these caches often have to be created by hand (by default the tool should automatically create caches, and remove old automatically-created caches, as well). Arch works slowly if the {arch} directory is on NFS; the tool should be able to detect slow execution and automatically try to find an efficient alternative, instead of requiring user workarounds. Many arch developers seem to create a similar set of higher-level specialized scripts to automate common tasks, but that's missing the point: you shouldn't have to write scripts to make a tool automate common tasks. An SCM tool should include commands that, through automation and good defaults, "do the right thing" for common tasks. The good news is that the arch developers are realizing that this is a problem and correcting it. The "rm" (delete) command deletes both the id and the corresponding file automatically (instead of requiring two steps); that capability was only added on February 23, 2004, though, so clearly automating steps has only begun. The documentation notes that automatic cache management is desirable; it just hasn't been done. The mirroring capability is clever, but if you download a mirror and make a change, you can't commit the change, and the tool isn't smart enough to automatically help (even though the tool [i]does[/i] have information on the mirror's source). The website described a [url=http://wiki.gnuarch.org/moin.cgi/Arch_20Recipes#head-2c5332402d44ef87a1bb8d4fdf2ccaf758a57334]complicated workaround using undo and redo[/url], and Jan Huldec described a simpler approach (using tag, sync-tree, and set-tree-version), but the tool should be able to help commit changes even if you downloaded from a mirror.

Arch will sometimes allow dangerous or problematic operations that just shouldn't be allowed. For example, branches should be either commit-based branches (all revisions after base-0 are created by commit) or tag-based branches (all revisions are created by tag); [url=http://www.gnu.org/software/gnu-arch/tutorial/symbolic-tags.html#Symbolic_Tags]merging commands will not work otherwise[/url], yet the tool doesn't enforce this limitation. The tla tool doesn't check if there are still pending merge rejections (.rej reject files), so operations such as commit, update, replay, or star-merge produce a scrambled work area; users make mistakes, and an SCM system should work to [i]protect[/i] data.
The user interface also has some problems. Under the user nightmare clause, the "mv" and "move" commands do different things: "mv" moves both the id and the file, while "move" only moves the id. This user interface seems designed for confusion; why not make "move" and "mv" the same, and make "mv-id" the only command that only manipulates ids? Many commands are aliases, which simply makes the documentation unnecessarily complicated.

The arch documentation is weak and needs more work; that's especially unfortunate, because the documentation issues can hamper early adopters who want to start using it today. A careful reading of what's available on-line should be enough for at least basic use of arch, though. Much of the documentation emphasizes lower-level implementation details (e.g., exactly how a command is implemented in the local filesystem) instead of emphasizing the higher-level constructs. Some of the documentation emphasizes aliases, which is extremely distracting; if "add" and "add-id" mean the same thing, just document "add" (and later on, in an ignorable note, list the aliases). In some cases the documentation needs to be updated for what the software actually does. The on-line tutorial at the [url=http://www.gnu.org/software/gnu-arch]FSF GNU arch website[/url] is a good place to start, and the [url=http://wiki.gnuarch.org/]Arch Wiki[/url] is an especially good place to find some more detailed reference material.

In general, GNU arch isn't currently as mature as Subversion. Its implementation needs more shaking down, its weird filename limitations should be fixed, and it sometimes requires users to do optimizations "by hand" when the tool should be handling them automatically. As noted above, its commands are sometimes on the low-level side; it can take several simple commands to set up values that should be defaults or built-in recipes/commands. And the documentation needs work.

But don't count out GNU arch for the long term based on these problems, most of which are short-term. Many of these problems simply reflect the fact that GNU arch hasn't had as much time to mature as other tools like Subversion. I'm documenting these problems because, in fact, GNU arch has a lot going for it. In my opinion, the GNU arch developers have emphasized simplicity, openness of design, and power (the ability to handle complex situations), and have paid less attention so far to ease of use (especially for simple situations). Thus, although it has problems as noted above, GNU arch is extremely powerful and its basic concepts are very flexible. More time and tools that build on top of GNU arch can resolve these issues. Arch is also endorsed by the Free Software Foundation (FSF) and directly supported by their Savannah system; that's certainly no guarantee of success, but endorsements like that often bring users and developers to a project, increasing its likelihood of success. GNU arch is frankly a more interesting approach to the problem, and it has a [i]lot[/i] of promise.
[url=http://lists.seyza.com/pipermail/gnu-arch-dev/2005-April/001001.html]This open letter from Tom Lord (GNU Arch's developer) to Linus Torvalds[/url] explains the basic concepts behind GNU Arch in more detail.

Unfortunately, events in 2004 and 2005 make it a little less clear how well GNU Arch will move forward. Many developers seem to like many of the [i]ideas[/i] in GNU Arch, but not the implementation. As a result, several other projects have been started which take some of the [i]ideas[/i] of GNU Arch, but are separate projects which aim to be much more user-friendly, portable to Microsoft Windows as well as Unix-like systems, and so on. SCM projects that are conceptual descendants of GNU arch include [url=http://www.nongnu.org/arx/]Arx[/url] (which has poor Windows support), [url=http://bazaar.canonical.com/]Bazaar[/url] (also named baz), which is essentially a friendly fork of GNU Arch to improve it (primarily its UI), and especially [url=http://bazaar-ng.org/]Bazaar-NG[/url] (also named bzr). The Bazaar folks are working to ensure a smooth transition to Bazaar-NG once that becomes ready.

Bazaar-NG

Thus [url=http://bazaar-ng.org/]Bazaar-NG[/url] (also named bzr) is a new distributed SCM system that builds on the ideas of Bazaar (which extended GNU Arch), but it's essentially a new project. [url=http://bazaar-ng.org/todo-from-arch.html]Here's how the Bazaar-NG developers compare their work with GNU arch[/url]. Bazaar-NG is trying to exploit some of the major innovations in arch, but by providing an interface that's easier to use (e.g., "doing the right thing" and easily supporting common operations); it's trying to make it easier to transition to, and it borrows many ideas from elsewhere.

I like much of what I see in Bazaar-NG. The main developer is developing the user documentation and code simultaneously (an approach I heartily recommend), and emphasizing common use cases. As a result, it appears that the most common use cases will be especially easy to do -- something very important in SCM systems. I like it when people write user documentation simultaneously, because if a common operation is hard to explain, that's a good signal that the tool isn't user-friendly enough. GNU Arch is an unfortunate example -- it needs good documentation because some of its operations are more complicated or awkward than necessary (some would say Arch has "unnecessary user-hostile complexity"). The Bazaar-NG developers plan to cryptographically sign changes to counter the dangers of repository subversion (see my [url=http://www.dwheeler.com/essays/scm-security.html]companion paper on software configuration management (SCM) security[/url] for more information).

It's developed in Python, which means it should easily port to any system. Some may be concerned that the resulting system will be too slow; I suspect that concern isn't well-founded, and portions could be rewritten for speed if that becomes a problem, but that remains to be seen. Other SCM systems, such as Codeville, are written in Python, so this isn't a strange choice.

Bazaar-NG is far less mature than many other projects. So keep that in mind; as of April 2005 I wouldn't commit a large, pre-existing project to Bazaar-NG! But since Bazaar-NG has financial backing from the company Canonical, which commercially supports Ubuntu, it may catch up very rapidly. Its emphasis on ease-of-use is quite heartening.
Monotone

[url=http://www.venge.net/monotone]Monotone[/url] is another decentralized SCM. It's released under the GPL; it uses the programming language Lua (e.g., for hooks), whose implementation has been released under the MIT license (historically it was released under a zlib-like license). I looked at version 0.10, released March 1, 2004. Monotone is interesting because it takes a different approach to a distributed SCM. As Shlomi Fish describes it:

[indent]"changesets are posted to a depot (that can be a CGI script, an NNTP newsgroup or a mailing list), which collects changesets from various sources. Afterwards, each developer commits the desirable changesets into his own private repository.... Monotone identifies the versions of files and directories using their SHA1 checksum. Thus, it can identify when a file was copied or moved, if the signature is identical, and merge the two copies. It also has a command set that tries to emulate CVS as much as possible."[/indent]

Monotone basically has a three-layer structure (working copy, local database, and net server). This is different from GNU Arch, which basically has only two layers (working copy and archive), though GNU Arch has a few tools that make archives work together in special cases (e.g., for mirroring). In a few cases this is more convenient than GNU Arch; GNU Arch sometimes makes you enter hand-wringingly long commands to copy data between archives (say from "my local archive" to a "master shared archive"). If in contrast you're simply posting data from a local database to a net server in Monotone, it works well. Monotone is based on using SHA-1 hashes for everything; specific file versions are identified with hashes, and sets of files are identified through the hash of their manifest. That means that SHA-1 hashes are even used as a global namespace for version ids. This has some nice technical properties, but it also means that the normal version numbers used in Monotone aren't meaningful to humans. Thankfully, you don't have to type in long SHA-1 hashes everywhere, only enough to be unique.

In Monotone, each person manages their own local database, and never automatically trusts anything sent by the net server. That can be a little disconcerting, and it doesn't appear to be as strong a support if you want to implement centralized development. Internally Monotone uses a simple underlying SQL database (SQLite). It's hard to say if that's good or bad.

One very nice property of Monotone is that it has good support for recording status about approvals and disapprovals, as well as for test results (this is something GNU Arch doesn't do well). Monotone can generate ancestry graphs in xvcg graph visualization format (a separate tool for GNU Arch can create graphs too).
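To make the content-addressing idea concrete, here is a small Python sketch. It is my own illustration, not Monotone's actual manifest format or command set: each file version is named by the SHA-1 of its contents, and a whole tree is named by the SHA-1 of a manifest listing those per-file hashes. Two paths with identical contents get the same file id, which is what lets a tool notice copies and moves without extra bookkeeping.

```python
import hashlib

def file_id(data: bytes) -> str:
    """A file version is identified purely by the SHA-1 of its contents."""
    return hashlib.sha1(data).hexdigest()

def manifest_id(tree: dict) -> str:
    """A set of files is identified by hashing a sorted manifest of (hash, path) lines."""
    lines = ["%s  %s" % (file_id(data), path) for path, data in sorted(tree.items())]
    manifest_text = "\n".join(lines) + "\n"
    return hashlib.sha1(manifest_text.encode("utf-8")).hexdigest()

if __name__ == "__main__":
    tree = {
        "src/main.c": b"int main(void) { return 0; }\n",
        "COPYING": b"license text here\n",
    }
    renamed = {"src/start.c": tree["src/main.c"], "COPYING": tree["COPYING"]}
    print("file id before rename:", file_id(tree["src/main.c"]))
    print("file id after rename: ", file_id(renamed["src/start.c"]))  # identical: contents unchanged
    print("tree id before:", manifest_id(tree))
    print("tree id after: ", manifest_id(renamed))  # differs: a path in the manifest changed
```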
Monotone supports handling file metadata like file permissions (which ones can be executed) and symbolic links by creating and editing a special file (.mt-attrs). This works, but it's nowhere near as convenient as other tools like GNU Arch (which handles this automatically). Monotone requires you to "add" and "drop" each file to state which files in a working copy must be managed. GNU Arch has this mode, but can also be used in a mode where the simple filenames are enough to determine this. I prefer explicit add and drop commands, so I think this is fine, but some may not like this choice. Monotone can only commit entire sets of files; GNU Arch can also commit specific named files. This is an advantage for GNU Arch; if you found a minor unrelated problem while working on something else, in GNU Arch (and BitKeeper) you can make that small fix and commit just that one file.

There's current work to port Monotone to Windows (using MinGW and Cygwin), but this work in 2004 was very preliminary. This lack of a Windows port is a problem, as I noted earlier with GNU Arch. As of 2005 this appears to have gotten better, but I haven't checked in detail. Monotone has recently fixed some of its problems in handling unusual filenames (this seems to be a common problem in SCM systems). Monotone's emphasis on security, and its clear concepts, make it another SCM worth considering.

Monotone's approach to merging is based on three-way merging and SHA-1 hashes. The Monotone folks argue that the Arch approach is somewhat weaker than Monotone's approach, but note that Monotone isn't nearly as good as Arch in supporting some kinds of "cherry-picking" (see the [url=http://www.venge.net/monotone/faq.html]Monotone FAQ[/url] for more information), so it's hard for me to declare either one a "winner" in terms of merge capabilities.

The Monotone command set is intentionally similar to CVS, and that can help old CVS users somewhat. But only to a point! The underlying concepts of Monotone are so different that the "same" commands aren't really the same. Monotone's documentation needs work too, but I can say that it was easy to get the current "depot" of Monotone -- while GNU Arch didn't have clear instructions for the equivalent action.

One unfortunate thing: if you forget to commit before merging, and there's a conflict, you could be in for a lot of problems. Here's what their documentation says:

[indent]Monotone makes very little distinction between a "pre-commit" merge (an update) and a "post-commit" merge. Both sorts of merge use the exact same algorithm. The major difference concerns the recoverability of the pre-merge state: if you commit your work first, and merge after committing, the merge can fail (due to difficulty in a manual merge step) and your committed state is still safe. It is therefore recommended that you commit your work first, before merging.[/indent]

Shame, shame!
SCM systems should work very hard to [i]prevent[/i] data loss or scrambling. Please, SCM authors, build in protection mechanisms or do an automatic commit-before-merge or something else to keep developers out of trouble. They're only human, and commands that can cause data loss or scrambling should require explicit requests, not happen through the use of normal (and commonly-used) commands.

In 2004 Monotone was experimenting with a "netsync" protocol for synchronizing two databases, which was clever but needed shaking out. As of April 2005, Monotone has switched to using netsync exclusively. However, Monotone can't use a simple repository (like sftp) as a centralized repository, which is a minor negative compared to GNU Arch. In 2004 Monotone had nice email support, which I thought was a nice plus (GNU Arch, for example, doesn't do a very good job supporting email automatically). Monotone still supports some email work (e.g., using its Packet I/O capabilities) but it's not clear that it's as good as it was. Not everyone can run a server, and it's nice to allow the use of email as a transport (because [i]everyone[/i] can get email).

Monotone does appear to be less popular than GNU Arch (as determined by Google link counts), for what that's worth. Since Monotone seems to be less popular than GNU Arch, and has a version number less than one (suggesting that it's "not as ready"), I'm going to concentrate more on GNU Arch as an example of a decentralized SCM for the rest of the paper. But Monotone can't be counted out for the future.

Centralized vs. Decentralized SCM

As you can tell, there seem to be two different schools of thought on how SCM systems should work. Some people believe SCM systems should primarily aid in controlling a centralized repository, and so they design their tool to support a centralized repository (such as CVS and Subversion). Others believe SCM systems should primarily aid in allowing independent developers to work asynchronously, and then synchronize and pull in changes from each other, so they develop tools to support a decentralized approach (like GNU arch, Monotone, darcs, Bazaar-NG, and BitKeeper). Tools built to support one approach [i]can[/i] be used to support the other approach, but it's still important to understand the difference.
Tools built to support one camp can sometimes support the other approach, to at least some extent. However, it's not clear to me that this support for the "other approach" is always as good as in a tool made to do the same thing natively. That's particularly true when centralized systems try to support decentralized development (in theory a distributed system should be able to support centralization easily, though a particular tool may not do a good job). Subversion has svk, which builds a distributed SCM system on top of Subversion. However, implementing svk on top of Subversion is a very heavyweight way to create a distributed SCM system, far exceeding what it takes to implement a natively distributed SCM system. GNU arch can easily support a centralized repository by having developers share read/write privileges to a directory that implements the repository, but see the discussion below about security concerns I have (due to the direct control over the repository by users). There's also the extra tool [url=http://web.verbum.org/arch-pqm/]arch-pqm[/url] which can help mitigate some of my security concerns, though it's not currently integrated into GNU arch. The various projects' supporters all seem to feel that "their side" does adequately support the other approach, though. I [i]do[/i] expect that the different projects will continue working to get better at supporting the "other" approach, so in a few years this distinction may get [i]really[/i] fuzzy.

A collection of [url=http://www.kerneltraffic.org/kernel-traffic/kt20030323_210.txt]messages in Kernel Traffic[/url] illuminates some of the advantages of distributed SCM, and some of the challenges in implementing such systems. In particular, Larry McVoy identifies some of the challenges he faced implementing BitKeeper: rename handling in a distributed system, security semantics (since each user controls their own area), and time semantics (time moves all around). He also claims that merging branches when things are truly distributed, in a way that eliminates unnecessary manual repairs and re-repairs, is not easy.

[url=http://lwn.net/Articles/72498/]A posting by Bastiaan Veelo at Linux Weekly News[/url] has a nice summary:

[indent]"The most important thing to be aware of though is that Arch and Subversion differ in fundamental ways. Arch works in a decentralized way, while Subversion is designed on a client/server model. Indeed with Arch you can start coding and using version control without first applying for access to the server. However, [merging] your code with the main branch has to be done by the one project maintainer.... Development with Subversion (and CVS for that matter) is centralized in the sense that there is just one repository, but it is actually more decentralized in a social sense since there are as many code integrators as there are developers with write access to the repository. In short, one could say that Arch is centralized around a code integrator, and that Subversion (like CVS) is centralized around a repository. You decide what fits best. If you are a heavy user of CVS... chances are that Subversion actually fits your needs best."[/indent]

[url=http://lwn.net/Articles/246381/]Linus Torvalds has an interesting post about the advantages of distributed development[/url].
The Subversion developers have a very enlightened post about this titled [url=http://subversion.tigris.org/subversion-linus.html]Please Stop Bugging Linus Torvalds About Subversion[/url]. In it, they say: "We, the Subversion development team, would like to explain why we agree that Subversion would not be the right choice for the Linux kernel. Subversion was primarily designed as a replacement for CVS. It is a centralized version control system. It does not support distributed repositories, nor foreign branching, nor tracking of dependencies between changesets. Given the way Linus and the kernel team work, using patch swapping and decentralized development, Subversion would simply not be much help. While Subversion has been well-received by many open source projects, that doesn't mean it's right for every project." In short, tools are typically developed to support certain approaches, and if you want to work in a certain way you need to choose tools that help (not hurt) the process, create those tools, or change your process to better fit the tools available.

Using Arch to Support Centralized Development

As I noted above, conceptually a distributed approach should be able to fully implement the centralized approach. I do have some concerns about the recommended method for using GNU arch to support a centralized repository of multiple developers. It appears that some support tools will deal with my concerns, though using them takes much more effort. [url=http://wiki.gnuarch.org/moin.cgi/Centralized_20Development]The GNU Arch wiki site provides basic information on how to use arch in a centralized way.[/url]

It's easy to use GNU arch to implement a centralized repository: a particularly simple way is to grant all developers read/write access to a shared filesystem (say secure ftp) used to create the centralized repository. The "repository" is in some sense a pseudo-user that everyone can write to. Systems hosting many project repositories that need to be protected from each other will need to define users or groups (say one per project) to provide that separation. This can be viewed as a minor problem (now the system administrator or a special group management tool needs to get involved whenever a new project or new developer joins a project) or a big plus (operating system controls are heavily tested and far more reliable than application-level access controls). Once set up, there are certainly many advantages to this scheme. For example, it's often easier to set up a shared directory than a more complex server.
However, I think there are problems when using arch this way. This approach presumes that all the clients "work perfectly"; if there are many developers, the odds increase that some developer is using an older client with a bug or subtle semantic difference that could screw up the whole repository. More importantly, it presumes that developers, and attackers who temporarily gain developer privileges, are never malicious. Since a developer has complete, unfettered read/write access to a shared repository, a malicious developer (or an attacker taking the developer's credentials) could stomp over a shared arch repository, changing supposedly unchanging data to make the repository quite different than expected. Unless there's something to counteract it, a malicious developer or attacker with their privileges could insert malicious code without making it clear that they inserted it, make it appear that some other developer inserted malicious code, or erase data in a way that makes it unrecoverable. Obviously, malicious developers are a bad thing, but an SCM system should always be able to identify exactly who inserted any malicious code (in a nonrepudiable way), and protect the integrity of the SCM history so that changes can be easily undone (and re-checked, once you've found a culprit). In today's unfriendly world, where you're often working with people you don't [i]really[/i] know, protection against malicious attack is important.

The recommended GNU arch setup for a central repository has all users sharing a single account, so the operating system and arch have no way to even distinguish between the users when they log in! It's possible to set up a shared directory repository so that users authenticate individually, and then set up a shared directory (using groups), but users can then accidentally (or intentionally) set their access control bits so that later developers won't be able to read or modify the files. So, the recommended approach has a lot of drawbacks if a client misbehaves, or you don't fully trust your developers, or an attacker might gain developer privileges.

You can make backups and compare them with the original, which would at least detect malicious changes to the repository history if they happen after the backup. Backups would also allow people to replace the malicious change with the correct version. Note, however, that arch doesn't currently include tools to do this checking automatically (I don't think you can use arch's mirroring capability, since the arch data itself is suspect). So, you'll have to know a lot about arch's internals to do this currently, until arch adds such tools. This approach would [i]not[/i] identify exactly who made the malicious change, even when the culprit could have been required to log in as a specific developer. But possibly more importantly, a malicious developer could trivially create a malicious change and forge it as though someone else made the change. A backup could only tell you that an addition had been made, but it can't say if the data in the addition is correct. So backups definitely help, but attackers can get around them.
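The backup-comparison idea doesn't require knowledge of arch's internals to illustrate. Here is a minimal, tool-agnostic Python sketch (my own illustration, not part of arch or any tool named here) that hashes every file under a repository directory and compares the result against a trusted backup, reporting anything added, removed, or modified in what should be append-only history.

```python
import hashlib
import os

def tree_hashes(root):
    """Map each relative file path under root to the SHA-1 of its contents."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                hashes[os.path.relpath(path, root)] = hashlib.sha1(f.read()).hexdigest()
    return hashes

def compare(backup_root, live_root):
    """Report differences between a trusted backup and the live repository tree."""
    old, new = tree_hashes(backup_root), tree_hashes(live_root)
    for path in sorted(set(old) | set(new)):
        if path not in new:
            print("REMOVED :", path)   # history should never disappear
        elif path not in old:
            print("ADDED   :", path)   # new work since the backup; expected
        elif old[path] != new[path]:
            print("MODIFIED:", path)   # frozen history files should never show up here

# Example usage (hypothetical paths):
#   compare("/backups/repo-2004-03-01", "/srv/arch/repo")
```

As the surrounding text notes, this only detects that something changed; it cannot say who changed it or whether a newly added change is legitimate.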
Another partial (but significant) counter to these problems is the new [url=http://wiki.gnuarch.org/moin.cgi/Signing_20Archives]signing archives[/url] capability added to arch 1.2. You can optionally make an archive a "signed" archive, in which the changes are cryptographically signed. I've looked into this (my thanks to Colin Walters, who helped me understand details of the signature process). When enabled, arch can sign MD5 hashes, which are cryptographically much weaker than SHA-1 hashes, but that's certainly a step forward from having no cryptographic signatures. Some effort is definitely required to set up signed archives (e.g., now you need public keys of all developers), though it's a good idea for security-minded systems. The signatures sign the revision number as well as the change itself (they're both encoded in the signed tarball), so an attacker can't just change the patch order and can't silently remove a patch and renumber the later patches without detection (a small sketch of this idea appears below, after the next paragraph). However, it appears to me that such signatures (at least as currently implemented) cannot detect the malicious substitution of whole signed patches (such as the silent replacement of a previous security fix with a non-fix), or removal of the "latest" fix before anyone else uses it. Unlike backups, signatures can detect many problems [i]without[/i] comparing against an external source (so it'll likely be faster to detect problems), and they're built into the tool already, which increases the likelihood they'll be used. For many developers, backups and signing archives may be enough. However, this mechanism still doesn't expose who made certain kinds of malicious changes (such as silent removal and replacement), in the case where the developer could have been identified.

Arch-pqm (patch queue manager) is an arch extension that creates a central repository out of a decentralized tool. It allows developers to send their requests (such as changes) to a central location; arch-pqm then queues up those requests and has them automatically performed. Arch-pqm first checks the GNUPG signatures of the requests to determine if the requester is an authorized developer for that repository, and rejects changes by anyone else. This is closer in approach to how centralized tools like CVS and Subversion work. I've had several email conversations with arch-pqm's developer, Colin Walters, and found that arch-pqm only permits operations that protect the history of the repository. In particular, arch-pqm supports the star-merge operation to merge in new changes, caching, uncaching, making new categories / branches / versions, and tagging -- none of which erase the history in the repository.
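Why signing the revision number together with the change matters can be shown in a few lines of Python. This is only a conceptual sketch under my own assumptions, not arch's actual signature format: it uses an HMAC with a shared key purely as a stand-in for a real public-key (GnuPG) signature. The point is that a signature computed over (archive, revision number, change contents) fails to verify if the same change later appears under a different revision number, so patches cannot be silently reordered or renumbered.

```python
import hashlib
import hmac

SIGNING_KEY = b"developer-key-stand-in"  # stand-in for a real GnuPG private key

def sign_revision(archive: str, revision: str, change: bytes) -> str:
    """Sign the revision identity together with the change contents."""
    message = archive.encode() + b"\0" + revision.encode() + b"\0" + change
    return hmac.new(SIGNING_KEY, message, hashlib.sha1).hexdigest()

def verify_revision(archive: str, revision: str, change: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign_revision(archive, revision, change), signature)

if __name__ == "__main__":
    patch = b"--- old\n+++ new\n-bug\n+fix\n"
    sig = sign_revision("project--main--1.0", "patch-7", patch)
    print(verify_revision("project--main--1.0", "patch-7", patch, sig))    # True: unchanged
    print(verify_revision("project--main--1.0", "patch-6", patch, sig))    # False: renumbered/reordered
    print(verify_revision("project--main--1.0", "patch-7", b"evil", sig))  # False: contents substituted
```

Note that, exactly as the text above says, this scheme by itself cannot tell you whether a whole, validly signed patch was later swapped for a different validly signed one.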
Thus, it currently appears to me that combining signed archives, backups, and arch-pqm will probably address my concerns. Arch-pqm prevents arbitrary developers, who have rights to the repository, from arbitrarily changing the frozen repository values. Signed archives and comparisons with backups allow the detection and repair of malicious changes to the repository if the attackers work around or subvert arch-pqm. If a malicious developer's changes can always be recorded correctly as theirs and undone later (by forcing them to sign their changes), and at least detected when the infrastructure can't do otherwise, then my concerns disappear. One caveat: I haven't done a detailed security analysis, and arch-pqm wasn't originally designed specifically to provide this security. For example, perhaps creating odd filenames or trying to change settings might subvert this protection. There may be ways to exploit a buffer overflow or some other technique to subvert these checks. Still, the basic concepts seem sound, and some security analysis at least has a chance with this setup. Unfortunately, arch-pqm isn't yet built into arch, and the backup checking isn't built into arch either, so there's more than a little "rolling your own" effort to implement and use this approach. Also, the documentation doesn't lay out a simple step-by-step method for setting it up.

I should note that currently I don't think Arch supports signing of signatures. In other words, if B accepts A's work, and C accepts B's work (which included A's work), then I should see signatures by A of A's work, and signatures by B indicating that they accepted A's work. To be fair, few SCM systems support that. But centralized systems have an easier time providing equivalent functionality; distributed systems should record more of this kind of information, because there's no central place to get it or trust it.

Note that Colin Walters is also creating a [url=http://mail.gnu.org/archive/html/gnu-arch-users/2004-01/msg01069.html]"smart server" for arch named "archd", and a protocol to support the server[/url]. In some ways this appears to be similar in concept to arch-pqm; it would be a program that would automatically execute SCM commands from authorized users. However, archd would use a specialized protocol designed for the purpose to transfer the data, rather than using email. It appears that it will have similar protections (it will limit the commands that can be executed), and if that's true, the same comments would probably apply. But this would be for the future; it's not ready for use at this time.
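The arch-pqm idea described above (accept requests only from authorized, signature-verified developers, and only for operations that never erase history) is easy to sketch. The Python below is purely illustrative and makes assumptions: verify_signature is a stand-in for a real GnuPG check, and the operation names in the allowlist are just labels taken from the prose above, not arch-pqm's actual command syntax.

```python
# Operations permitted per the description above: they add to history but never erase it.
ALLOWED_OPERATIONS = {"star-merge", "cache", "uncache",
                      "make-category", "make-branch", "make-version", "tag"}

AUTHORIZED_KEYS = {"alice": "alice-public-key", "bob": "bob-public-key"}  # hypothetical keys

def verify_signature(request: dict, public_key: str) -> bool:
    """Stand-in for a real GnuPG signature check of the queued request."""
    return request.get("signature") == "signed-by:" + public_key  # illustrative only

def process(request: dict) -> str:
    """Decide whether a queued request may be run against the central repository."""
    key = AUTHORIZED_KEYS.get(request.get("developer", ""))
    if key is None or not verify_signature(request, key):
        return "rejected: not an authorized, verified developer"
    if request.get("operation") not in ALLOWED_OPERATIONS:
        return "rejected: operation could rewrite or erase repository history"
    return "accepted: run %s for %s" % (request["operation"], request["developer"])

if __name__ == "__main__":
    print(process({"developer": "alice", "operation": "star-merge",
                   "signature": "signed-by:alice-public-key"}))
    print(process({"developer": "mallory", "operation": "star-merge",
                   "signature": "forged"}))
    print(process({"developer": "bob", "operation": "delete-revision",
                   "signature": "signed-by:bob-public-key"}))
```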
In all SCMs, if you're worried about malicious developers, you have to be careful about who can define "hooks" and the permissions they have when they run. Whenever GNU arch runs a command, [url=http://regexps.srparish.net/tutorial-tla/using-hooks.html]GNU arch runs the program ~/.arch-params/hook (if it exists)[/url] to run additional actions ("hooks"). In other words, the hooks are defined on a per-user basis, not a per-project basis. That design has some advantages from a security point of view; since the hook is [i]not[/i] inside the maintained development area (normally), editing files shouldn't trick the CM system into running new commands. However, that has disadvantages if there's a shared repository, because that means that the shared repository can't run commands to enforce some requirements (e.g., to require that there be no compiler warnings, run regression tests, announce a change via email, or require two-person authorization before checking in). This can also be solved by arch-pqm or a smart server, since the server can run the hooks on its own in its own environment.

Other OSS/FS SCM systems

Besides [url=http://www.cvshome.org/]CVS[/url], [url=http://subversion.tigris.org/]Subversion[/url] (SVN), [url=http://svk.elixus.org/]svk[/url], [url=http://www.gnuarch.org/]GNU arch[/url], and [url=http://www.venge.net/monotone]Monotone[/url], there are many other OSS/FS SCM systems, such as [url=http://aegis.sourceforge.net/]Aegis[/url], [url=http://www.cvsnt.org/]CVSNT[/url], [url=http://abridgegame.org/darcs/]Darcs[/url], [url=http://www.zedshaw.com/projects/fastcst/index.html]FastCST[/url], [url=http://www.opencm.org/]OpenCM[/url], [url=http://www.vestasys.org/]Vesta[/url], [url=http://www.superversion.org/]Superversion[/url], [url=http://codeville.org/]Codeville[/url], git/[url=http://kernel.org/pub/software/scm/cogito/]Cogito[/url], and [url=http://selenic.com/mercurial/]Mercurial[/url]. I've already mentioned [url=http://bazaar.canonical.com/]Bazaar[/url], [url=http://www.nongnu.org/arx/]Arx[/url], and [url=http://bazaar-ng.org/]Bazaar-NG[/url]. That's not even a complete list! I'm not trying to completely exclude these others from consideration; I just don't have enough time to analyze them too, though for several of them I gathered enough information to decide that I wasn't as interested in learning more. You should certainly investigate the various alternatives before picking an SCM system, since your desires might be different than mine. For use right now, Aegis is reported to be quite mature and would be worth a look; Codeville looks like it will be ready soon and has some interesting merging capabilities; Bazaar-NG (as I mentioned earlier) emphasizes both ease-of-use and good technology, and its corporate backing may speed its development; Darcs is really interesting for its technology. Here's some information I gathered on some of them:

[list=1]
[*][url=http://aegis.sourceforge.net/]Aegis[/url]. The [url=http://better-scm.berlios.de/]better SCM initiative[/url]'s initial information about Aegis made me decide to skip it, but perhaps that was too hurried. The better SCM initiative claimed that Aegis requires running as root, which in my mind is an unfortunate security weakness that immediately turned me off. It also reported that it was very hard to install, which again made me not very interested in examining it further. On the other hand, some Aegis users have since told me that Aegis is better than that review claims, so this may have been too harsh. Aegis has been around a long time (first released in 1991), and it's been widely reported as being mature (with lots of functionality) and very reliable; obviously those are important attributes in an SCM system! Aegis can validate received transactions before accepting them, which is an excellent capability; on bigger systems you often don't want to accept changes unless they pass a battery of tests in many environments. Aegis is released under the GNU GPL, the most common OSS/FS license (an advantage over some OSS/FS SCM systems such as CVS, which use odd one-off licenses that make merging functionality from elsewhere more complicated). Aegis supports "both push and pull" models; it's not clear to me that it supports fully distributed development, but it appears to be more flexible than the strictly centralized models supported by, say, CVS. Aegis' direct support of Windows is very poor, unfortunately; they say that "Most sites using Aegis and Windows together do so by running Aegis on the Unix systems, but building and testing on the NT systems.
The work areas and repository are accessed via Samba or NFS" (that works, but it's awkward). Aegis supports many security capabilities (see their documentation for more). I hope to take a further look at Aegis in the future; I've received some emails from happy Aegis users, and its strengths are certainly worth considering.
[*][url=http://www.cvsnt.org/]CVSNT[/url]. CVSNT is an active fork of CVS. It began life as a port of CVS to Windows NT; it now works on both Windows and Unix-like systems. And it has since added several features beyond the original CVS, such as better handling of merges without tagging requirements, per-branch access control, support for Unicode, more efficient binary diff storage, additional server triggers, and additional protocols. But it appears that CVSNT currently has some of the same limitations as the original CVS, such as not handling renaming well. If you look at this, be sure to check out other alternatives such as Subversion.
[*][url=http://www.zedshaw.com/projects/fastcst/index.html]FastCST[/url]. As of June 2004, FastCST is an interesting project in its early stages; only time will tell if it becomes a major project or not. The author's goal is to create a "completely distributed, fast, and secure revision control tool", but as of release 0.4 only its non-distributed parts are functional. It uses a novel delta algorithm (to minimize the size of a change), it focuses on security at every point, and it tries to balance security, collaboration, and control. License: GPL.
[*][url=http://www.opencm.org/]OpenCM[/url]. OpenCM looks very interesting; it's paid special attention to security, which I appreciate. But there is very little evidence that OpenCM is being maintained or will be maintained for the future. As of April 2004, it was only at version "0.1.2alpha7pl1" (a version number that doesn't inspire confidence!). Worse, that version was released 10 months earlier (on June 20, 2003). The mailing list archives show very little activity. I made a phone call to Jonathan S. Shapiro
and learned that there was a small effort to "finish" a few things in OpenCM and call it a "version 1.0" release. But frankly, that doesn't bode well for future maintenance. This is too bad, because there's actually a lot of technical promise in OpenCM. OpenCM may get more support if they produce a "1.0" release. Indeed, it may just take one person to try it out and decide to run with it; there's a lot of technical merit in it. But OpenCM is hard to recommend right now unless you're willing to take the project on.
[*]RCS and SCCS. RCS is a much older SCM system, as is SCCS which came before it. There is a GNU implementation of SCCS, named CSSC, but GNU only recommends it when interoperating with old SCCS data. The lock-based approach used by RCS and SCCS just doesn't work well with today's fast development cycles and large development groups. Some SCM systems (like BitKeeper) use one of these as an infrastructure component to build their SCM system, but at that point they're just lower-level libraries.
[*][url=http://www.vestasys.org/]Vesta[/url]. The [url=http://better-scm.berlios.de/]better SCM initiative[/url] review reported that "Vesta is reported to be mature", and Vesta has been used in many large projects. Vesta is a centralized SCM system with a built-in build system as well, and it uses the older "locking style" for editing files. Vesta only supports Unix-like systems; there's no evidence at all that it [i]could[/i] run on Windows. A major difference between Vesta and other tools is that Vesta is both an SCM and a build tool (like make plus related dependency-computing tools). There are many advantages to this approach; "make" has many known weaknesses, and Vesta automates more of the build process than make does. In particular, Vesta does automatic dependency detection, so you don't have to use a combination of other tools (like makedepend along with make) to build results. However, "make" is extremely popular and common, and that is a turnoff to some potential users. In 2004 I noted that because only Vesta can be used to build Vesta, I expected it would be hard for it to attract new users and developers. As of April 2005 I've been told that "bowing to popular demand" they've developed a "Make-based source distribution of Vesta", which eliminates one concern that I had.
Vesta uses the older, traditional method of handling SCM. It controls a central repository (so it's a centralized system like CVS, Subversion, and Aegis), and you must lock files while they're being edited. Even more oddly, locking is at the granularity of "packages" (not individual files), which in some ways appears even more constricting. Unlike some older systems, that doesn't mean you [i]can't[/i] edit files simultaneously. Instead, when two developers need to change files in the same package concurrently, at least one must create a branch in the version number sequence. Locking files for editing is an old, traditional (pre-CVS) way of handling multiple edits to the same file, and if people are essentially assigned to given files this can often work out okay. Old, traditional approaches aren't necessarily bad; many large systems have been created that way, and they work fine if you're used to them. However, having to handle locks can slow down development, especially if there are a large number of people who might need to edit a particular file. CVS's approach of eliminating the need for locks was CVS's major achievement. Vesta's alternative solution -- creating new branches -- appears to me to be a little more cumbersome than CVS's if you have to do it a lot, [i]especially[/i] since Vesta doesn't seem to have built-in support for merging branches later. Vesta [i]does[/i] include several features to support groups at geographically distributed sites sharing development; in particular, there's a tool for replicating sources between repositories.

Vesta is probably a reasonable choice for those who wish to use the locking style of SCM, and its build system appears to be much easier to use than make. If groups of files tend to be "owned" by particular individuals who are typically the only ones who make changes to the files, Vesta may work quite well. In fact, if that's how you work, Vesta may support your approach well. However, I suspect many developers (who are used to the freedom of making arbitrary changes and merging later with help from their SCM tool) may find Vesta a little constricting. For some projects, Vesta may be a great choice; for others, it won't be.
[*][url=http://codeville.org/]Codeville[/url]. Codeville is a decentralized system. It has some very interesting technical ideas for merging changes much more effectively. In particular, it has a clever way to eliminate unnecessary merge conflicts. Codeville creates an identifier for each change, and remembers the list of all changes which have been applied to each file and the last change which modified each line in each file. When there's a conflict, it checks to see if one of the two sides has already been applied to the other one, and if so makes the other side win automatically. If that doesn't work, it backs off to a CVS-like patch strategy. It also versions "spaces between the lines", for reasons they describe. Codeville is implemented in Python, which should speed development, and Python is a relatively well-known language, so it shouldn't have some of the challenges of Darcs (as I'll explain below). Currently it's immature, but it's growing.
[*][url=http://www.superversion.org/]Superversion[/url] (GPL). Superversion 1.2 is a single-machine, single-developer SCM system. That can be useful, for example, to allow a developer to easily back out of an approach, or to see what changed when. One nifty thing is that it has built-in support for nifty graphs showing the relationship between versions. However, I'm primarily interested in SCM systems that handle many developers, so I didn't find this one so interesting. As of April 2005, they have an upcoming version 2 that will support multiple users, and thus is more interesting from my point of view. Version 2 is designed to work as a centralized server with clients, so it appears to be designed to support centralized development; peer-to-peer development might be added later. It runs on at least Unix-like systems and Windows. It depends on Java; that may mean that it requires the use of the proprietary Sun JVM, which is an issue for many (for this perspective, see [url=http://www.gnu.org/philosophy/java-trap.html]Free But Shackled - The Java Trap[/url]). As [url=http://www.dwheeler.com/java-imp.html]OSS/FS Java implementations become more capable[/url] this concern may go away.
[*]git and [url=http://kernel.org/pub/software/scm/cogito/]Cogito[/url]. Linus Torvalds and other Linux kernel developers abandoned BitKeeper, and decided to write their own distributed SCM system. Linus created a low-level system called "git", with the intention of having higher-level SCM services built on top of it. The most popular higher-level service built specifically to run on top of git is Petr Baudis' "Cogito" (formerly known as git-pasky). The development of Cogito and git has moved very rapidly; as of the time of this writing it's still fast-changing and not very mature. git is specifically designed to support Linux kernel development (see [url=http://marc.theaimsgroup.com/?l=git&m=111464974422367&w=2]this email by Linus Torvalds about git's design[/url]), but it's clear it could be used by at least some others as well. The primary focus of git is performing distributed development with [i]extremely[/i] fast merging (about 1 "patch" per second) for large programs (e.g., the Linux kernel). The lower-level "git" is designed to simply store a large number of different static views of each version of a tree. It does this through the concepts of a "blob" (a versioned file), a "tree" (a set of all files for a given version), and a "commit" (a description of what changed between two trees). Each of these is referenced using its SHA-1 hash (a small illustrative sketch of this kind of object store follows after this list). It's presumed that disk space is not critical; each versioned file is stored as a separate
It's presumed that disk space is not critical; each versioned file is stored as a separate compressed file, and [i]not[/i] as a delta. This approach simplifies many tasks at the cost of some storage space, but this is viewed as a reasonable trade-off (there's ongoing work to add "deltification" as a localized option). It is presumed that some operations (such as identifying exactly who last modified each line in a file) are [i]not[/i] important; these are not implemented in the current implementation, and implementing them given the current approach may be quite resource-intensive.

Cogito does not work on Windows natively (there are reports it works on top of Cygwin), primarily because much of it is implemented using bash shell scripts. I strongly suspect git won't work on Windows natively either. However, the underlying file structure should work just fine on Windows. Making it work on Windows might simply require moving the shell code to something more portable (say Python or Perl), and since there's relatively little code, that might not take too long. It's also conceivable that a port of bash and many other Unix tools might work too (short of Cygwin), though I know of no one who's tried that approach.

Currently git-based tools handle renamed files and directories very poorly. Changes do not get applied correctly when a file is renamed but is edited by another branch (this is in comparison to GNU Arch, Darcs, and many other systems). Torvalds has been very adamant that the git format not directly store information about file/directory renames, because he believes it should be possible to determine such information without it. This is technically true, and it is especially true if in practice people carefully commit before and after any rename without changing the contents (and never move files with identical contents between commits). But the current tools don't try to handle this case, and so the results are very poor after renames.

The git data format stores whether or not a file is executable, and of course the filenames and their data (there's actually an entire "mode", so you could store more information if it was important to you). It does [i]not[/i] store the date/time stamp of individual files, only the date/time stamp of a commit (of an entire tree of files). Thus, the date/time stamps of individual files are quickly lost; this may not matter to you.

Merges are currently implemented using the traditional 3-way merge algorithm. For Linux kernel development (and many others) this is actually quite sufficient. But this algorithm is known to have problems handling certain kinds of "criss-crossing" branches, so for some it will produce a lot of unnecessary conflicts (requiring hand correction) as compared to some other merging implementations. git actually stores complete copies of all past versions and how they relate, so it should be possible to implement alternative merge algorithms in the future.

Lots of functionality is missing from git and Cogito, though there's enough now for them to be used. One area of particular concern to me is that while tags can be signed, ordinary commits (even if exchanged between people) are not cryptographically signed. You want cryptographic signatures of commits, stored in the database, so that they can be checked later on. In particular, this sort of precaution helps counter many kinds of attacks if (when) attackers take over a repository.
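To make the storage model described above concrete, here is a minimal sketch in Python (git itself is C and shell, so this is purely illustrative): every object is written whole and zlib-compressed rather than delta-encoded, and it is named by the SHA-1 hash of a small type header plus its contents. The "blob <size>\0" header follows git's documented object layout as I understand it; the helper names and the objects_dir layout are my own simplification, not git's actual code.

[code]
import hashlib
import os
import zlib

def hash_blob(content: bytes) -> str:
    """Return a git-style object id: SHA-1 over 'blob <size>\\0' plus the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def store_blob(objects_dir: str, content: bytes) -> str:
    """Store the whole file, zlib-compressed (no deltas), named by its hash."""
    oid = hash_blob(content)
    path = os.path.join(objects_dir, oid[:2], oid[2:])  # e.g. objects/ab/cdef...
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):  # identical content is only ever stored once
        with open(path, "wb") as f:
            f.write(zlib.compress(b"blob %d\x00" % len(content) + content))
    return oid

# The id depends only on the content, so two files with the same bytes
# collapse to one stored object.
print(hash_blob(b"hello\n"))
[/code]

Because the name is derived purely from the content, a "tree" or "commit" only needs to record the 40-character ids of the things it points to, which is what makes storing many complete snapshots of a large tree cheap in everything except raw disk space.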
Other SCM prototypes have been built on git, and various interfaces have been developed to other SCMs (in particular, there's a prototype git-to-Darcs interface, and GNU Arch's Tom Lord announced he was planning to switch to the git format, though it's not clear that will really occur). Since git is low-level, it's probably best to start by using Cogito rather than the low-level git commands at first.

A web interface to git repositories has been created, so you can see examples of git results by examining the [url=http://www.kernel.org/git/]kernel.org git repository[/url]. The mailing list is helpful, but there's a [i]vast[/i] amount of traffic on it; [url=http://kerneltraffic.org/git/]Zack Brown's "git traffic"[/url] has lots of info on git and Cogito.

[*][url=http://selenic.com/mercurial/]Mercurial[/url]. Mercurial (whose commands begin with "hg") is a small SCM that's an offshoot from git and Cogito. git's low-level functions store whole files (compressed); Mercurial, instead, is designed to store files as changes. This makes tasks like identifying who did what, and when a given file was changed, simpler to do. It's a small Python program, and it lacks some functions compared to the others at this time, but it's an interesting development.[/list]

[url=http://abridgegame.org/darcs/]Darcs[/url], in particular, is very interesting for its technology. From what I've seen, darcs is currently more of a prototype of some very innovative ideas for SCM, and maybe a tool for smaller projects, rather than a useful tool for large projects, though it can be used. Darcs is written in Haskell, which is both a strength and a weakness. Haskell is a high-level functional programming language, which probably helped the developer concentrate on abstract concepts. However, while Haskell is intriguing, in my experience programs written in it are generally slow, and possibly worse, their performance is unpredictable ([url=http://supybot.com/Members/jemfinch/haskell-sucks/document_view]jemfinch expresses somewhat similar concerns[/url]). Some have argued to me that Haskell isn't necessarily slow today, and maybe that's true, but darcs' developer admits that darcs has poor performance (which would cause trouble as a project gets large). In March 2004 the darcs developer said performance had gotten much better, so perhaps that's no longer a serious problem. However, since few developers truly grok functional programming, darcs is less likely to get other developers to help extend it. It [i]does[/i] get contributions -- a few minor contributions by others have been reported to me -- but they're nothing compared to the scale of work by others on Subversion or GNU Arch. In March 2004 Darcs' website stated that it does not have an "abundance of features" and that its "core may still be buggy" -- not exactly the words you want to hear when you let a program control your source code! The main developer does say that the website is out of date, that the program is no longer buggy, and that it supports more than the basics (though it is still missing some features).
Darcs does have some innovative approaches, though, and perhaps darcs will leap past everyone else, or at least perhaps some of its ideas may slip into other SCM systems. For example, [url=http://www.abridgegame.org/pipermail/darcs-users/2003/000146.html]darcs can keep track of inter-patch dependencies[/url], so that bringing in just one patch can bring in "just the others needed", a clever capability not supported by other tools like GNU Arch. It is completely patch-oriented, and requires user input to help characterize exactly what changed. For example, it understands a "token replace patch", which makes it possible to create a patch that changes every instance of the variable ``stupidly_named_var'' to ``better_var_name'', while leaving ``other_stupidly_named_var'' untouched. As the author says, "When this patch is merged with any other patch involving the ``stupidly_named_var'', that instance will also be modified to ``better_var_name''. This is in contrast to a more conventional merging method which would not only fail to change new instances of the variable, but would also involve conflicts when merging with any patch that modifies lines containing the variable. By using additional information about the programmer's intent, darcs is thus able to make the process of changing a variable name the trivial task that it really is..." The advantage is that merge conflicts can suddenly disappear, or at least become far less likely, because the system has more information to work with. The disadvantage is that this requires more interaction with the developer, who already has a complicated problem. Whether or not this approach will catch on remains to be seen; I doubt it, myself, since systems which don't have it seem to be acceptable to most developers. But I can definitely see how that additional information could make an SCM system more powerful.
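As a rough illustration of the token-replace idea (this is not darcs' Haskell implementation, just a Python sketch of the matching rule the description implies), a recorded patch only needs to carry the old and new tokens; replaying it renames whole tokens and leaves longer identifiers alone:

[code]
import re

def apply_token_replace(text: str, old_token: str, new_token: str) -> str:
    """Replace old_token with new_token only where it stands alone as a token."""
    # Treat letters, digits, and '_' as token characters, so the token embedded
    # inside a longer identifier such as other_stupidly_named_var is not touched.
    pattern = r"(?<![A-Za-z0-9_])%s(?![A-Za-z0-9_])" % re.escape(old_token)
    return re.sub(pattern, new_token, text)

source = "x = stupidly_named_var + other_stupidly_named_var\n"
print(apply_token_replace(source, "stupidly_named_var", "better_var_name"))
# prints: x = better_var_name + other_stupidly_named_var
[/code]

Because the patch records the programmer's intent (a pair of tokens) rather than the literal line contents, it can be replayed against lines the original author never saw, which is exactly why the merge conflicts described above can disappear.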
Other Reviews of SCM or OSS/FS SCM Systems

There are many other SCM comparisons available. The [url=http://better-scm.berlios.de/]better SCM initiative[/url] was established to encourage improved OSS/FS SCM systems by discussing and comparing them. Among other things, see their [url=http://better-scm.berlios.de/comparison/comparison.html]comparison file[/url]. The website [url=http://revctrl.org/]revctrl.org[/url] is a nice starting point for comparing alternatives. [url=http://zooko.com/revision_control_quick_ref.html]Zooko has written a short review of OSS/FS SCM tools[/url]. [url=http://www.onlamp.com/pub/a/onlamp/2004/01/29/scm_overview.html]Shlomi Fish's OnLamp.com article compares various CM systems[/url], as does his [url=http://better-scm.berlios.de/docs/shlomif-evolution.html]Evolution of a Revision Control User[/url]. The arch folks have developed [url=http://wiki.gnuarch.org/moin.cgi/SubVersionAndCvsComparison]a comparison of arch with Subversion and CVS[/url]. Another pro-arch discussion is [url=http://web.verbum.org/blog/freesoftware/distributed-future]Why the Future is Distributed[/url]. A pro-subversion discussion is available at [url=http://www.red-bean.com/sussman/svn-anti-fud.html]Dispelling Subversion FUD[/url]. [url=http://developers.slashdot.org/article.pl?sid=04/02/22/2344228&mode=thread]Slashdot had a discussion when Subversion 1.0 was announced[/url]. [url=http://www.kerneltraffic.org/kernel-traffic/kt20030323_210.txt]Kernel Traffic posted a summary of a technical discussion about BitKeeper[/url]. [url=http://www.cmcrossroads.com/bradapp/links/scm-links.html]Brad Appleton has collected lots of interesting SCM links[/url]. [url=http://supybot.com/Members/jemfinch/vcs/]jemfinch has some interesting essays about SCMs[/url], including why he thinks the approach to branches used by Darcs, Arch, and Bazaar-NG is a poor one. [url=http://linuxmafia.com/faq/Apps/scm.html]A brief overview of SCM systems that can run on Linux is available[/url]. [url=http://changelog.complete.org/posts/528-Whose-Distributed-VCS-Is-The-Most-Distributed.html]Whose Distributed VCS Is The Most Distributed?[/url] discusses distributed VCSs. [url=http://weblogs.mozillazine.org/preed/2006/11/version_control_system_shootou.html]Version Control System Shootout Redux (Mozilla - Mortal Kombat)[/url] describes Mozilla's decision process, which is really amusing because he uses Mortal Kombat images to describe the "shootout".

I've not discussed highly related issues like bug tracking (such as Bugzilla); that's outside the scope of this paper.

BitMover's BitKeeper

There are many proprietary SCM systems, such as BitKeeper, Perforce, and Rational ClearCase, but since they aren't OSS/FS they're really outside the scope of this paper. However, I can't omit discussing BitKeeper entirely, because the Linux kernel developers' use of BitKeeper demonstrated how distributed SCM can work, and BitKeeper's association with this well-known OSS/FS project makes it hard to ignore. Besides, the case of BitMover's BitKeeper is especially interesting, in part because it's very controversial.

BitKeeper is a proprietary SCM system that supports distributed SCM. Even though BitKeeper is proprietary, Linus Torvalds decided to use it to maintain the OSS/FS Linux kernel. The bargain was that the OSS/FS kernel developers got to use (for free) a good SCM tool; the proprietary vendor got a great deal of free publicity and many helpful insights from highly intelligent users. The no-cost BitKeeper required that the source code being maintained be copied to the vendor; since few commercial developers wanted to do that, they were generally willing to buy the commercial license without that condition. The no-cost BitKeeper also forbade users to work on competing projects; indeed, there are reports that [url=http://lwn.net/Articles/103727/]even purchasers of the for-pay product were forbidden to work on competing projects[/url].

Some, such as Torvalds, found these conditions acceptable. Others did not believe using a proprietary SCM system was acceptable for working on an OSS/FS system (e.g., [url=http://kerneltrap.org/node/204]Richard Stallman believed this was fundamentally unacceptable[/url]). Others were concerned about the risks of depending on a single vendor with a proprietary format (what if the vendor changed their policies later?), or did not find the "cannot develop competing products" condition acceptable (this condition is very unusual and is clearly an attempt to prevent competition). BitMover released [url=http://www.bitkeeper.com/press/2005-03-17.html]a no-cost source-available client for BitKeeper[/url] that allows people to extract current versions of data (programs) from BitKeeper repositories; it's not clear that this client is OSS/FS, and it has limited functionality, but it may be sufficient for some purposes.

In April 2005 things came to a head. Torvalds' employer (OSDL) also paid money to someone else, who in their own free time (not paid for by OSDL) was working on a competing product. BitMover's Larry McVoy complained that even this was unacceptable. After examining the difficulty of trying to keep the competing interests compatible, [url=http://lwn.net/Articles/130681/]Torvalds decided he would have to switch to a different SCM program[/url]. The article [url=http://kerneltrap.org/node/4966]No More Free BitKeeper[/url] gives the vendor's (BitMover's) side of the story. There's reason to hope that this decision will greatly increase the speed of development of OSS/FS distributed SCM tools; the licensing constraints of BitKeeper made it very difficult for some excellent developers to work with competing OSS/FS SCM systems, and with that constraint gone it's likely that development of some of them will accelerate.
Conclusions

The world of OSS/FS SCM systems is a better place than it was a few years ago; there are now several viable options. CVS, while it has its weaknesses, is still a workhorse able to do the basic job. Subversion is ready today for those who just want a better CVS for a centralized SCM system, and it's probably the most common SCM choice today for those who want a centralized OSS/FS SCM system that's a little better than the aging CVS. There are other reasonable choices, too; Aegis seems to have a lot going for it, and I've had several reports that it's mature, so for large projects it would be a system worth examining. But there are lots of other options, and it's going to be interesting to watch what happens in the future.

A [i]lot[/i] of people want a distributed SCM system; the Linux kernel developers have shown that distributed SCM can be [i]extremely[/i] effective through their use of BitKeeper. Among distributed SCM systems the field is currently crowded, with many people having developed early-stage tools that take significantly different approaches to the problem. GNU Arch is extremely capable if you're willing to work with the issues listed above (and I think it [i]will[/i] get better), though it hasn't made as much progress in 2004 and 2005 as it should have, and thus it may lose its early momentum to other OSS/FS competitors. Monotone, Codeville, and Bazaar-NG in particular look like potentially strong contenders to me at the moment. I really like a lot of things about Bazaar-NG, though it's less mature, and it remains to be seen if its promising start will result in a winning product.

In the end, the best approach is to look at your options, winnow them down to a short list, and then try each of those top contenders. I hope you've found this brief tour helpful. [url=http://www.dwheeler.com/essays/scm-security.html]Feel free to also look at my paper on SCM security[/url], or see my home page at [url=http://www.dwheeler.com/]http://www.dwheeler.com[/url].

http://www.dwheeler.com/essays/scm.html
