Software Configuration Management (SCM) Security
David A. Wheeler
March 13, 2004; Revised May 6, 2005

Introduction

Software development is often supported by specialized programs called "Software Configuration Management" (SCM) tools. SCM tools typically control who can read and modify the source code of a program, keep history information (so that people can find out what changed between versions, and who changed it), and generally help developers work together to improve a program under development.

The problem is that the people who develop SCM tools often don't think about what kinds of security requirements they need to support. This mini-paper briefly describes the kinds of security requirements an SCM tool should support. Not every project may need everything, but it's easy to miss important requirements if you don't think about them. There are two basic types of SCM tools, "centralized" and "distributed"; the basic security needs are the same, but how those needs can be handled differs between the two types.

I'm primarily concentrating on basic SCM tools (like CVS, Subversion, GNU Arch, BitKeeper, Perforce, and so on). Clearly related tools include build tools, automated (regression) test tools, bug tracking tools, static analysis tools, process automation tools, software development tools (such as editors, compilers, and IDEs), and so on.

The Security Basics

Fundamentally, there are some basic (potential) security requirements that any system needs to consider. These are:

[list=1]
[*] confidentiality: are only those who should be able to read information able to do so?
[*] integrity: are only those who should be able to write/change information able to do so? This includes not only limiting access rights for writing, but also protecting against repository corruption (unintentional or malicious). Changesets must be made atomically; if 3 files change in a changeset, either all or none should be committed.
[*] availability: is the system available to those who need it? (I.e., is it resistant to denial-of-service attacks?)
[*] identification/authentication: does the system safely authenticate its users? If it uses tokens (like passwords), are they protected when stored and while being sent over a network, or are they exposed as cleartext?
[*] audit: are actions recorded?
[*] non-repudiation: can the system later "prove" that a certain user/key performed an action? In particular, given an arbitrary line of code, can it prove which individual made that change and when? Can it show all those who approved/accepted it, as a path?
[*] self-protection: does the system protect itself, and can its own data (like timestamps, changesets, and other data) be trusted?
[*] trusted paths: can the system make sure that its communication with users is protected?
[*] resilience to security algorithm failures: if a given security algorithm fails (such as the hash function or encryption), can the algorithm be easily replaced to protect past and future data? (Added 2005-03-02, after the revelation of serious problems in SHA-1.)
[*] privacy: is the system designed so it's not possible to retrieve information that users want to protect? For example, spamming is a serious problem; it may be desirable to NOT record real email addresses, at least in some circumstances. If there is a "secret branch" where security patches are located, try not to store its location in the dataset. This is similar to confidentiality, but you might not even trust an administrator...
the notion is to NOT store or depend on data you don't want spread.
[/list]

An SCM has several assets to protect. It needs to protect "current" versions of software, but it must do much more. It needs to make sure that it can recall any previous version of the software, correctly, as well as the audit trail of exactly who made which change and when. In particular, an SCM has to keep the history [i]immutable[/i] - once a change is made, it needs to stay recorded. You can undo the change, but the undoing needs to be recorded separately. Very old history may need to be removed and archived, but that's different from simply allowing history to be deleted.

The Threats

Okay, so what are the potential threats? These vary, and not all projects will worry about all threats. Nevertheless, it's easiest to provide a list of threats and the counter-measures an SCM should support. Individual projects may choose not to employ a given counter-measure, since they may decide that's not a threat for them. For example, open source software (OSS) projects may decide that there's no "threat" of unauthorized reading of software, since the code is open to reading by all. However, that may not always be true - many OSS projects hide changes that reveal security vulnerabilities until the new version is ready for deployment. Thus, it's difficult to make simple statements like "projects of type X never need to worry about threat Y". Instead, it's simpler to list some potential threats, and then projects can decide which ones apply to them (and configure their SCM system to counter them).
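Before turning to the specific threats, here is a toy sketch (not the format of any real SCM; the class and field names are invented) that makes the "immutable history" and atomic-changeset requirements above concrete. Each changeset is validated in full before anything is recorded, each entry chains to the previous one by hash, and an undo is just another appended entry:

```python
import hashlib
import json

class History:
    """Toy append-only changeset log: entries are chained by hash,
    undo is recorded as a new entry, and a changeset is all-or-nothing."""

    def __init__(self):
        self.entries = []  # never mutated in place, only appended to

    def commit(self, author, changes):
        # Atomicity: validate every file change before recording any of them.
        for path, content in changes.items():
            if content is None:
                raise ValueError(f"invalid change for {path}")
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = json.dumps({"author": author, "changes": changes, "prev": prev},
                          sort_keys=True)
        entry = {"author": author, "changes": changes, "prev": prev,
                 "hash": hashlib.sha256(body.encode()).hexdigest()}
        self.entries.append(entry)
        return entry["hash"]

    def undo(self, author, bad_hash):
        # Undoing is itself a recorded change; history is never rewritten.
        return self.commit(author, {"reverts": bad_hash})

    def verify(self):
        # Recompute the whole chain; tampering with any past entry breaks it.
        prev = "0" * 64
        for e in self.entries:
            body = json.dumps({"author": e["author"], "changes": e["changes"],
                               "prev": prev}, sort_keys=True)
            if e["prev"] != prev or e["hash"] != hashlib.sha256(body.encode()).hexdigest():
                return False
            prev = e["hash"]
        return True
```

The same hash chaining reappears later as a counter-measure against repository tampering: recomputing the chain exposes any after-the-fact edit to history.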
Outsiders without privileges

An outsider (not a developer or administrator) may try to read or modify assets (software source code or history information) when they're not authorized to do so. SCM systems should support authorization (like login systems), and support a definition of what unauthorized users can do. An SCM system should support configurations that allow anonymous reading of a project and/or its history, since there are many cases where that's useful. However, SCMs should also support forbidding anonymous read access. That's even true for OSS projects, since as I noted above, sometimes OSS projects want to hide security fixes until they're ready for deployment.

Normally unauthorized users shouldn't be allowed to modify a source repository, so an SCM should support that (and should make it the default). In rare cases, it's possible to imagine that even this constraint isn't true, especially if the SCM tool is designed to be used for resources other than source code. Most Wiki systems such as [url=http://www.wikipedia.org/]Wikipedia[/url] allow anonymous changes; they work instead by protecting the [i]history[/i] of changes so that everyone will know exactly what's changed, instead of preventing writing of the primary data. Such approaches are rare for software code; for example, the Wikipedia software itself (as stored in its trusted repository) can only be changed by a few privileged developers. However, it is conceivable that software documentation and code would be maintained by the same SCM software, and perhaps a few projects would allow anyone to update the documentation as long as all changes were tracked and could be easily reversed.
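A minimal sketch of the authorization posture just described (a hypothetical policy object, not any particular SCM's API): anonymous reading is a configurable option, while writing is denied unless explicitly granted:

```python
class RepoPolicy:
    """Toy SCM access policy: configurable anonymous read, default-deny write."""

    def __init__(self, anonymous_read=False):
        self.anonymous_read = anonymous_read   # e.g., False while a security fix is private
        self.readers = set()                   # authenticated users allowed to read
        self.writers = set()                   # authenticated users allowed to commit

    def can_read(self, user=None):
        if user is None:                       # anonymous access
            return self.anonymous_read
        return user in self.readers or user in self.writers

    def can_write(self, user=None):
        # Default deny: never allow anonymous writes, require an explicit grant.
        return user is not None and user in self.writers

policy = RepoPolicy(anonymous_read=True)       # a typical OSS configuration
policy.writers.add("alice")
```

Flipping `anonymous_read` off is how a project would temporarily hide a branch containing an unreleased security fix, per the discussion above.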
The underlying identification and authentication system (the login system) can use intrusion detection systems to detect likely attempts to forge privileges (e.g., by detecting password-guessing attacks, or detecting improbable locations for a login). The underlying login system could also support enabling limits (e.g., delays after X login attempts, or only permitting logins from certain Internet Protocol address ranges for certain developers). However, these mechanisms need to not create a denial-of-service attack; otherwise, an attacker might try to forge logins not to actually log in, but to prevent legitimate users from doing so.

Non-malicious developers with privileges

An SCM system should support protected logins (e.g., if it uses passwords, it should protect passwords during transit and while they're stored). Once users are authenticated, an SCM system should be able to limit what users can do based on the authorization that's implied.

SCM systems could usefully limit reading to particular projects, say. Limiting reading of specific files inside a project can be useful, but it often isn't as useful inside a branch that developers must access, because developers often need the entire set of files to develop (e.g., to recompile something). But limiting who can read changes in certain branches could be vital for some projects. For example, it is common for security vulnerabilities to be reported to a smaller group of people than the entire development staff, and for the patch to be developed by specially trusted developers without full knowledge of all developers. This is particularly true for open source software projects, but it's also sometimes true for other projects. This kind of functionality can also be important for projects such as military projects with varying degrees of confidentiality; most of the program may be "unclassified", but with a poor or stubbed algorithm; there may be a better classified algorithm, but it will need to be maintained separately. Ideally, the SCM should be
trustworthy enough to protect that data, though in practice such trust is rarely granted; an SCM should instead gracefully handle importing the "unclassified" version and automatically merging the "classified" data on equipment trusted to do so.

Limiting writing of specific files inside a project can be much more useful, since in some projects some users "own" certain files. In many situations it doesn't make sense, but an SCM system should still support limiting which developers can make which changes.

Malicious developers with privileges (and attackers with their credentials)

An area often forgotten by SCM systems is handling [i]malicious[/i] developers. You know, the ones who intentionally insert Trojan horses into programs. Denying they exist doesn't help; they [i]do[/i] exist. And even if they didn't, there's no easy way for an SCM to tell the difference between an authorized malicious developer and an attacker who's acquired an authorized developer's credentials.

A malicious developer might even try to make it appear that some [i]other[/i] developer has done a malicious deed (or at least make it untraceable). They can use their existing privileges to try to gain more privileges. A malicious developer might try to modify the data used by an SCM system so that it looks like someone else made the change (e.g., provide someone else's name in a ChangeLog entry). A malicious developer might try to modify an SCM "hook" to make it appear that some other developer has inserted malicious code (perhaps to avoid blame, or to frame the other developer). A malicious developer might modify the build process, e.g., so that when another developer builds the software, the build system attempts to steal credentials or harm the developer.
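To see how developer-side authentication of changes counters this kind of identity forgery, here is a minimal sketch. It uses an HMAC with a per-developer secret purely as a stand-in for a real digital signature (a real system would use public-key signatures, e.g., OpenPGP, so verifiers don't need the signer's secret); all names here are invented:

```python
import hashlib
import hmac
import json

def sign_changeset(author, changes, secret):
    """Bind the author's identity to the change content.
    HMAC is only a stand-in here for a real public-key signature."""
    msg = json.dumps({"author": author, "changes": changes}, sort_keys=True).encode()
    return {"author": author, "changes": changes,
            "sig": hmac.new(secret, msg, hashlib.sha256).hexdigest()}

def verify_changeset(record, secret):
    # Recompute the tag over the claimed author and content.
    msg = json.dumps({"author": record["author"], "changes": record["changes"]},
                     sort_keys=True).encode()
    expected = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(record["sig"], expected)

key = b"alice-private-secret"  # held only by the developer "alice"
cs = sign_changeset("alice", {"ChangeLog": "fix buffer overflow"}, key)
```

Because the tag covers both the author field and the content, rewriting the recorded author (the ChangeLog trick above) without the author's key makes verification fail.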
Since developers have the privileges to read and change data, malicious developers (and attackers with their credentials) are harder to counter. But there [i]are[/i] counter-measures that can be used against them. Here are some reasonable measures:

[list=1]
[*] Make sure that developers can't corrupt the repository. As a counter-example, GNU Arch allows developers to share a writeable directory as a repository. That's very convenient, but if you're worried about malicious developers, that's not enough; a malicious developer could easily remove data or corrupt it in such a way that it'd be hard to tell who caused the problem (there's a current effort to create an "archd" server that would probably counter this problem).
[*] Make sure that all developer actions are logged in a non-repudiable, immutable way. That way, even if someone makes a change, it's easy to see who made what changes, at any time in the future. That "someone" may be a malicious developer, or an attacker with the credentials (e.g., cryptographic keys) of a developer - but in either case, once you find out who did a malicious act, the SCM should make it easy to identify all of their actions. In short, if you make it easy to catch someone, you increase the attacker's risk... and that means the attacker is less likely to act. In practice, this can be done by requiring that all changes be cryptographically signed by each developer (at the [i]developer's[/i] side). You can't just make people log into a central server and trust its results, or have a central server sign everything; if the server is subverted you won't be able to trust any of that data. Immutable backups also help; that way, if history is changed, it can be easily detected. Implied here is that there is a relatively easy way to undo changes in a later version; after all, if it's easy to identify exactly what a developer did, those changes can be undone.
[*] Make sure all developer actions can be easily reviewed later. A simple action to show exactly what's been changed recently will make it easy for new changes to be reviewed - and possibly set off alarms.
[*] Have tools to record and/or require others' review. If you really want to make sure that malicious code doesn't get through, the best method known is to make sure that some other person (who is unlikely to be colluding) reviews the code. Thus, ways to cryptographically sign that a person reviewed another's changes can be helpful, as long as the reviewer's signature can't be forged, and as long as the signature clearly indicates what was reviewed. A review could range from a brief "I briefly scanned for malicious code" all the way to "I deeply analyzed every line for correctness", so the SCM tool should also support recording the level at which the review occurred. Note that the Linux kernel development process now has people adding a "Signed-off-by:" tag to each changeset; this is primarily for licensing reasons, but it's still helpful in identifying all the other parties who looked at the change, showing how it got there (though note that someone adding a Signed-off-by may or may not have changed it further).
[*] Support automated checking before acceptance, including detection of suspicious/malicious changes. An SCM system should make it possible to enforce certain rules before accepting a change (at some level), such as enforcing formatting rules, requiring a clean compile, and/or requiring a clean run of a regression test suite (in a suitably protected environment). It should be possible to watch changes to find "suspicious" ones: the first time a developer has modified a given file, code that looks like a Trojan horse, formatting/naming style that's significantly different from that developer's normal material, attempts to send email or other network traffic during a code build, and so on. This is basically intrusion detection at the code-change level. It should also be possible for an automated process to quickly check for hints of "stolen" code before accepting anything (e.g., to detect copyright-encumbered code), by calling out to programs such as Eric S. Raymond's [url=http://www.catb.org/%7Eesr/comparator]comparator[/url].
[*] Support authentication/cryptographic signature key changes and re-signing. No matter what protection is put in place, a developer's secrets (e.g., their login passwords or private keys) may be acquired by an attacker. Thus, an SCM (along with its support environment) needs to support changing such secrets. In particular, it may be useful to "cycle" developer private keys: having developers switch to new private keys, ensuring that the old keys will not be accepted for newer changes, and possibly destroying all copies of the older private keys (so that they cannot be stolen by anyone). Since private keys may be compromised, once such a compromise has been detected, it should be possible to invalidate the compromised keys and re-sign data (once it's checked) with new cryptographic keys. This is yet another reason to support multiple signature keys (in addition to supporting multi-person review).
[*] On login, acquisition, and commit, report the "last time" and source location (e.g., IP address) where reading and writing (committing) were performed. Although this doesn't deal with a malicious developer, it does increase the likelihood that an attack using stolen credentials will be detected. After all, the developer is most likely to know the last time that they read from and wrote to some repository, so they'll be able to detect when someone else forges their identity. Ideally, this would be resistant to repository attacks.
[/list]

On April 11, 2004, Dr. Carsten Bormann from the University of Bremen sent me an email about a specialized attack that he terms the "encumbrance pollution attack". In an encumbrance pollution attack, the attacker inserts material that cannot be legally included. To understand it, first imagine an SCM with perfectly indestructible history. The attacker steals developer credentials, or is himself a malicious developer, and checks in a change that contains some encumbered material. "Encumbered" material is simply material which cannot be legally included. Examples include child pornography, slanderous/libelous statements, or code with copyright or patent encumbrances. This could be very advantageous to an attacker; for example, a company might hire a malicious developer to insert that company's code into a competing product, and then sue the competitor for copyright infringement, knowing that their SCM system "can't" undo the problem. Or a lazy programmer might copy code that they have no right to copy (this is rare in open source software projects, because every line of code and who provided it is a matter of public record, but proprietary projects do have this risk). Any SCM can record a change that essentially undoes a previous change, but if the history is indestructible and viewable by all, then you can't get rid of the history. This makes your SCM archive irrevocably encumbered. This can especially be a problem if the SCM is indestructibly recording proposals by
outsiders!

An SCM system could be designed so that a special privilege allowed someone to completely delete the history data of illegal changes, of course. However, if there are special privileges to delete history data, it might be possible to misuse those privileges to cause other problems.

One mechanism for dealing with an encumbrance pollution attack is to allow specially-privileged accounts to "mask" history elements; i.e., preventing access to certain material by normal developers so that it's no longer available, and so that the material isn't included in later versions (essentially it works like an "undo" against that change). However, a "mask" would still record the event in some way, so that it would be possible to prove at a later time that the event occurred. Perhaps the system could record a hash of the encumbered change, allowing the encumbered material to be removed from the normal repository while still proving that, at one time, the material was included. A "masking" should include a cryptographic signature of whoever did the masking. This mechanism in particular requires careful design, because it should be designed so that it doesn't permit other attacks.

Most SCM systems have multiple components, say, a client and a server. Even GNU Arch, which can use a simple secure ftp server as a shared repository, has a possible server (the ftp server). Clients and servers should resist attack from other potentially subverted components, including loss of SCM data.
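The hash-commitment variant of "masking" described above might be sketched like this (a toy illustration with invented names, not a complete design). The encumbered content is replaced by its SHA-256 digest, so the material is gone from the repository, yet anyone who kept a copy can re-hash it later to prove it was once present:

```python
import hashlib

def mask_change(history, index, masker_signature):
    """Replace an encumbered change's content with a hash commitment.
    The material is no longer retrievable from the repository, but anyone
    holding a copy can later prove it was present by re-hashing it."""
    entry = history[index]
    digest = hashlib.sha256(entry["content"].encode()).hexdigest()
    history[index] = {
        "masked": True,
        "content_sha256": digest,        # commitment: proof the material existed
        "masked_by": masker_signature,   # who performed the masking (should be a real signature)
    }
    return digest

# Example: one history entry containing material that must be removed.
history = [{"content": "encumbered code that must be removed"}]
proof = mask_change(history, 0, "admin-signature-placeholder")
```

Note the event itself stays on the record: the masked entry, its commitment digest, and the masker's identity all remain visible, which is exactly the "undo that is itself recorded" property the text calls for.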
Repository attacks

Many repositories have themselves undergone attack, including the Linux CVS mirror, Savannah, Debian, and Microsoft (attackers have acquired, at least twice, significant portions of Windows' code). Thus, a good SCM should be able to resist attack even when the repository it's running on is subverted (through malicious administrators of a repository, an attacker gaining root control over a repository, and so on). This isn't just limited to centralized SCM systems; distributed SCM systems still have the problem that an attacker may take over the system used to distribute someone's changes.

An SCM should be able to prevent read access even if the repository is attacked. The obvious way to do this is by using encrypted archives. But there are many variations on this theme, primarily in where the key(s) for decryption are stored. If the real problem is just to make sure that backup media or transfer disks aren't easily read, the key could simply be stored on separate (more protected) media. The archive keys might only be stored in RAM, and required on bootup; this is more annoying at bootup, and an attacker is likely to be able to acquire the data anyway. The repository might not normally have the keys necessary to decrypt the archive contents at all; it could require the developer to provide those keys, which it uses and then destroys. This is harder to attack, but a determined adversary could subvert the repository program (or memory) and get the key. Another alternative is to arrange for the repository to [i]not[/i] have the keys necessary to decrypt the archive contents at any time. In this case, developers must somehow be provided with the keys necessary to do the decryption, and essentially the repository doesn't really "know" the contents of the files it's managing!
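A sketch of that last arrangement, where the repository only ever sees ciphertext. This is illustration only: the hash-derived XOR keystream below is a placeholder, not production cryptography - a real system would use an authenticated cipher such as AES-GCM with a fresh nonce per file - and all names are invented:

```python
import hashlib

def keystream(key, length):
    """Derive a deterministic keystream from the key.
    Placeholder only; do NOT use this in place of a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key, plaintext):
    # XOR with the keystream; applying it twice recovers the plaintext.
    return bytes(a ^ b for a, b in zip(plaintext, keystream(key, len(plaintext))))

decrypt = encrypt  # XOR is its own inverse

# The developer encrypts locally; the repository stores only the opaque blob
# and never holds the key, so even a subverted server can't read the source.
dev_key = b"key held only by developers"
stored_blob = encrypt(dev_key, b"secret_patch() { ... }")
```

The design cost, as the text notes, is key distribution: every authorized developer must obtain the key out of band, and the repository can no longer do content-aware operations (diffs, merges) on its own.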
Preventing write access when an attacker controls the repository is a difficult challenge, especially since you still want to permit legitimate changes by normal developers. Since the attacker can modify arbitrary files in this case, the goal is to be able to quickly detect any such changes:

[list=1]
[*] Cryptographic signing of changes can help significantly here, since this makes it possible to detect changes by anyone other than the authorized developers. Clearly, the list of public keys needs to be protected; this can be done in part by ensuring that the list is visible to all developers, and by having tools automatically check that the publicly listed key is correct (each developer's tool checks that the key listed is really that developer's key).
[*] Changeset chaining can help detect problems (including unintentional ones). Basically, as changes are made, a chain recording those changes can be recorded and later checked. This is typically done using cryptographic hashes, possibly signed so you know who verified the chain. Note that this is also useful for detecting accidental corruption.
[*] Automated tools can detect if "my" change has been altered. Any given developer will know what changes [i]they[/i] checked in. So, record that information locally/separately, and check it later. That way, someone can modify the repository to remove the latest security fix, but the developer of the change can quickly tell that it's been removed.
[*] Immutable backups, and tools to check them, can help as well. If a repository's history is changed, that change can be compared with backups. Be careful that a corrupted tool won't create misleading backups, and make sure that the repository can't give one view to backup tools and another view to whoever actually takes and uses the program.
[*] Simple, transparent formats can help make it harder to hide attacks. Data stored in simple, well-understood formats that can be analyzed independently (e.g., a signed tarfile of patches) tends to be more resistant to attack than data structures that presume that no other process will manipulate the data contents (e.g., typical databases).
[/list]

Related Work

There seems to be only a little related work available on this topic. Lynzi Ziegenhagen wrote a Master's thesis for the Naval Postgraduate School about revision control tools for "high assurance" (a.k.a. secure) software development projects: [url=http://library.nps.navy.mil/uhtbin/hyperion-image/03Jun_Ziegenhagen.pdf]Evaluating Configuration Management Tools for High Assurance Software Development Projects[/url]. A [url=http://www.opencm.org/ziegenhagen-response.html]commentary on that paper[/url] is also available. The [url=http://www.opencm.org/docs.html]OpenCM project[/url] has published some papers, including Jonathan S. Shapiro and John Vanderburgh's [url=http://www.opencm.org/papers/usenix-sec2002.pdf]Access and Integrity Control in a Public-Access, High-Assurance Configuration Management System[/url]. Another related paper is [url=http://theses.nps.navy.mil/04Mar_Gross.pdf]Configuration Management Evaluation Guidance for High Robustness Systems[/url] by Michael E. Gross (Lieutenant, United States Navy), March 2004.

The Trusted Software Methodology included a number of configuration management requirements; in particular, its upper-level requirements were specifically designed to counter malicious developers. See "Trusted Software Methodology Report" (TSM), CDRL A075, July 2, 1993, and in particular its appendix A (which defines the trust principles). The Common Criteria includes a number of configuration management requirements (see in particular part 3, in the ACM section).
[url=http://www.cs.ucdavis.edu/%7Edevanbu/files/tcm.pdf]Security for Automated, Distributed Configuration Management[/url] by Devanbu, Gertz, and Stubblebine examines a completely different problem (one which is important, but not the one in view here). There is a vast amount of literature about SCM systems, as well as papers discussing or evaluating particular systems. That includes my own [url=http://www.dwheeler.com/essays/scm.html]Comments on OSS/FS Software Configuration Management (SCM) Systems[/url].

Conclusions

All of this can't prevent all attacks. But such an SCM system can make attacks much harder to perform, make them more likely to be detected, and make detection much more rapid. Here are some examples:

[list=1]
[*] A malicious developer could insert a few lines into a build process that said "when you compile, email me your private key data" - then, once they had the private key, remove that line, and then forge other changes as that unsuspecting developer. But an SCM system with all of the capabilities above would make it much harder to hide this. The change with these malicious instructions would be clearly labelled as coming from that developer, later changes would be labelled as being from that developer or one of the compromised systems - and removing the change later would record yet [i]another[/i] change that might be detected.
[*] A malicious attacker might take over the repository and repeatedly remove a critical patch for a security vulnerability. Still, the removal could be detected by the creator of the patch, and actions such as changing to a different repository could be taken. Trying to change older copies would likely be detected by chaining and by comparisons with backups.
[/list]

It's my hope that SCM systems will have more of these capabilities in the future. I'm happy to note that some SCM developers have considered these issues. [url=http://aegis.sourceforge.net/propaganda/security.html]Aegis has a nice side-by-side comparison[/url] of a version of this paper with Aegis' capabilities. [url=http://bazaar-ng.org/security.html]Bazaar-NG has considered these security ideas[/url] as well. Hopefully others will consider these issues too.

Feel free to see my home page at [url=http://www.dwheeler.com/]http://www.dwheeler.com[/url].