kmote asks: I am newly employed as the sole "SW Engineer" in a fairly small shop of scientists who have spent the last 10-20 years cobbling together a vast code base. (It was written in a virtually obsolete language: G2—think Pascal with graphics). The program itself is a physical model of a complex chemical processing plant; the team that wrote it has incredibly deep domain knowledge but little or no formal training in programming fundamentals. They've recently learned some hard lessons about the consequences of nonexistent configuration management. Their maintenance efforts are also greatly hampered by the vast accumulation of undocumented "sludge" in the code itself. I will spare you the "politics" of the situation (there's always politics!), but suffice it to say, there is not a consensus of opinion about what is needed for the path ahead.
They have asked me to begin presenting to the team some of the principles of modern software development. They want me to introduce some of the industry-standard practices and strategies regarding coding conventions, lifecycle management, high-level design patterns, and source control. Frankly, it's a fairly daunting task and I'm not sure where to begin.
Initially, I'm inclined to tutor them in some of the central concepts of The Pragmatic Programmer, or Fowler's Refactoring ("Code Smells", etc). I also hope to introduce a number of Agile methodologies. But ultimately, to be effective, I think I'm going to need to hone in on 5-7 core fundamentals; in other words, what are the most important principles or practices that they can realistically start implementing that will give them the most "bang for the buck."
So that's my question: What would you include in your list of the most effective strategies to help straighten out the spaghetti (and prevent it in the future)?
Answer: The (very) long answer (237 Votes)
haylem replies:
Foreword
This is a daunting task indeed, and there's a lot of ground to cover. So I'm humbly suggesting this as a somewhat comprehensive guide for your team, with pointers to appropriate tools and educational material.
Remember: These are guidelines, and as such are meant to be adopted, adapted, or dropped based on circumstances.
Beware: Dumping all this on a team at once would most likely fail. You should try to cherry-pick elements that would give you the best bang-for-sweat, and introduce them slowly, one at a time.
Note: Not all of this applies directly to Visual Programming Systems like G2. For more specific details on how to deal with these, see the Addendum section at the end.
Executive summary for the impatient
- Define a rigid project structure, with:
  - project templates
  - coding conventions
  - familiar build systems
  - sets of usage guidelines for your infrastructure and tools
- Install a good SCM and make sure they know how to use it.
- Point them to good IDEs for their technology, and make sure they know how to use them.
- Implement code quality checkers and automatic reporting in the build system.
- Couple the build system to continuous integration and continuous inspection systems.
- With the help of the above, identify code quality "hotspots" and refactor.
Now for the long version... Caution, brace yourselves!
Rigidity is (often) good
This is a controversial opinion, as rigidity is often seen as a force working against you. It's true for some phases of some projects. But once you see it as a structural support, a framework that takes away the guesswork, it greatly reduces the amount of wasted time and effort. Make it work for you, not against you. Rigidity = Process / Procedure. Software development needs good processes and procedures for exactly the same reasons that chemical plants or factories have manuals, procedures, drills, and emergency guidelines: preventing bad outcomes, increasing predictability, maximizing productivity... Rigidity comes in moderation, though!!
Rigidity of the project structure
If each project comes with its own structure, you (and newcomers) are lost and need to pick up from scratch every time you open them. You don't want this in a professional software shop, and you don't want this in a lab either.
Rigidity of the build systems
If each project looks different, there's a good chance they also build differently. A build shouldn't require too much research or too much guesswork. You want to be able to do the canonical thing and not need to worry about specifics: configure; make install, ant, mvn install, etc. Re-using the same build system and making it evolve over time also ensures a consistent level of quality. You do need a quick README to point to the project's specifics, and gracefully guide any user/developer/researcher. This also greatly facilitates other parts of your build infrastructure, namely:
- Continuous integration
- Continuous inspection
So keep your build (like your projects) up to date, but make it stricter over time, and more efficient at reporting violations and bad practices. Do not reinvent the wheel, and reuse what you have already done.
Recommended Reading:
- Continuous Integration: Improving Software Quality and Reducing Risk (Duvall, Matyas, Glover, 2007)
- Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Humble, Farley, 2010)
Rigidity in the choice of programming languages
You can't expect, especially in a research environment, to have all teams (and even less all developers) use the same language and technology stack. However, you can identify a set of "officially supported" tools and encourage their use. The rest, without a good rationale, shouldn't be permitted (beyond prototyping). Keep your tech stack simple, and the maintenance and breadth of required skills to a bare minimum: a strong core.
Rigidity of the coding conventions and guidelines
Coding conventions and guidelines are what allow you to develop both an identity as a team, and a shared lingo. You don't want to err into terra incognita every time you open a source file. Nonsensical rules that make life harder, or that forbid actions so strictly that commits are refused over a single simple violation, are a burden. However:
- a well thought-out ground ruleset takes away a lot of the whining and thinking: nobody should break it under any circumstances
- a set of recommended rules provides additional guidance
Personal Approach: I am aggressive when it comes to coding conventions, because I do believe in having a lingua franca for my team. When crap code gets checked in, it stands out like a cold sore on the face of a Hollywood star: it triggers a review and an action automatically. In fact, I've sometimes gone as far as to advocate the use of pre-commit hooks to reject non-conforming commits. As mentioned, it shouldn't be overly crazy and get in the way of productivity: it should drive it. Introduce these slowly, especially at the beginning. But it's way preferable over spending so much time fixing faulty code that you can't work on real issues. Some languages even enforce this by design:
- Java was meant to reduce the amount of dull crap you can write with it (though no doubt many manage to do it).
- Python's block structure by indentation is another idea in this sense.
- Go, with its gofmt tool, completely takes away any debate and effort (and ego!!) inherent to style: run gofmt before you commit.
Make sure that code rot cannot slip through. Code conventions, continuous integration and continuous inspection, pair programming, and code reviews are your arsenal against this demon. Plus, as you'll see below, code is documentation, and that's another area where conventions encourage readability and clarity.
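To make the pre-commit hook idea a bit more concrete, here is a minimal sketch in Python of the kind of check such a hook could run. The three rules (no trailing whitespace, no tabs, a 100-character line limit) and the names find_violations and main are assumptions chosen for illustration, not rules recommended above; a real team would plug in its own conventions.

```python
"""Sketch of a convention check that a pre-commit hook could run."""

MAX_LINE_LENGTH = 100  # assumed team limit, purely illustrative


def find_violations(text, max_len=MAX_LINE_LENGTH):
    """Return (line_number, message) pairs for every rule broken in text."""
    violations = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            violations.append((number, "trailing whitespace"))
        if "\t" in line:
            violations.append((number, "tab character"))
        if len(line) > max_len:
            violations.append((number, f"line longer than {max_len} characters"))
    return violations


def main(paths):
    """Check each file in paths; return 1 if any rule is broken, else 0."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for number, message in find_violations(handle.read()):
                print(f"{path}:{number}: {message}")
                failed = True
    return 1 if failed else 0
```

A hook script would pass the list of changed files to main and exit with its return value, so that a non-zero status rejects the commit. Starting with very few rules and adding more over time fits the advice to introduce these checks slowly.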
Rigidity of the documentation
Documentation goes hand in hand with code. Code itself is documentation. But there must be clear-cut instructions on how to build, use, and maintain things. Using a single point of control for documentation (like a WikiWiki or DMS) is a good thing. Create spaces for projects, spaces for more random banter and experimentation. Have all spaces reuse common rules and conventions. Try to make it part of the team spirit. Most of the advice applying to code and tooling also applies to documentation.
Rigidity in code comments
Code comments, as mentioned above, are also documentation. Developers like to express their feelings about their code (mostly pride and frustration, if you ask me). So it's not unusual for them to express these in no uncertain terms in comments (or even code), when a more formal piece of text could have conveyed the same meaning with fewer expletives or less drama. It's OK to let a few slip through for fun and historical reasons: it's also part of developing a team culture. But it's very important that everybody knows what is acceptable and what isn't, and that comment noise is just that: noise.
Rigidity in commit logs
Commit logs are not an annoying and useless "step" of your SCM's lifecycle: you DON'T skip it to get home on time or get on with the next task, or to catch up with the buddies who left for lunch. They matter, and, like (most) good wine, the more time passes, the more valuable they become. So DO them right. I'm flabbergasted when I see co-workers writing one-liners for giant commits, or for non-obvious hacks. Commits are done for a reason, and that reason ISN'T always clearly expressed by your code and the one line of commit log you entered. There's more to it than that. Each line of code has a story and a history. The diffs can tell its history, but you have to write its story.
Why did I update this line? -> Because the interface changed.
Why did the interface change? -> Because the library L1 defining it was updated.
Why was the library updated? -> Because library L2, that we need for feature F, depended on library L1.
And what's feature F? -> See task 3456 in issue tracker.
It's not my SCM choice, and may not be the best one for your lab either; but Git gets this right, and tries to force you to write good logs more than most other SCM systems, by using short logs and long logs. Link the task ID (yes, you need one) and leave a generic summary for the shortlog, and expand in the long log: write the changeset's story. It is a log: it's here to keep track and record updates.
Rule of thumb: If you were searching for something about this change later, is your log likely to answer your question? Projects, documentation, and code are alive. Keep them in sync, otherwise they do not form that symbiotic entity anymore. It works wonders when you have:
- Clear commit logs in your SCM, with links to task IDs in your issue tracker
- Tickets in that tracker that link back to the changesets in your SCM (and possibly to the builds in your CI system)
- A documentation system that links to all of these
Code and documentation need to be cohesive.
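The shortlog and long log convention described above can be checked mechanically. The following Python sketch shows one way to do it, under loudly stated assumptions: the TASK-1234 style of task ID, the 72-character shortlog limit, and the function name check_commit_message are all hypothetical and should be replaced by whatever your tracker and team actually use.

```python
"""Sketch of a commit message check (ID format and limits are assumptions)."""
import re

TASK_ID = re.compile(r"\bTASK-\d+\b")  # hypothetical tracker ID format
MAX_SHORTLOG = 72  # assumed shortlog length limit


def check_commit_message(message):
    """Return a list of problems found in message; empty means it passes."""
    lines = message.strip().splitlines()
    if not lines:
        return ["empty commit message"]

    problems = []
    shortlog = lines[0]
    if not TASK_ID.search(shortlog):
        problems.append("shortlog does not reference a task ID")
    if len(shortlog) > MAX_SHORTLOG:
        problems.append(f"shortlog is longer than {MAX_SHORTLOG} characters")
    # A blank line conventionally separates the shortlog from the long log.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    # The long log is where the changeset's story goes.
    if not any(line.strip() for line in lines[2:]):
        problems.append("long log is missing: explain why the change was made")
    return problems
```

With Git, a check like this could be called from a commit-msg hook, which receives the path of the file holding the proposed message. It can only verify the form of a log, of course; whether the story it tells is useful still takes a human reviewer.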
Rigidity in testing
Rules of thumb:
Any new code shall come with (at least) unit tests. Any refactored legacy code shall come with unit tests. Of course, these need:
- to actually test something valuable (or they are a waste of time and energy)
- to be well written and commented (just like any other code you check in)
They are documentation as well, and they help to outline the contract of your code. Especially if you use TDD. Even if you don't, you need them for your peace of mind. They are your safety net when you incorporate new code (maintenance or feature) and your watchtower to guard against code rot and environmental failures. Of course, you should go further and have integration tests, and regression tests for each reproducible bug you fix.
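As a small illustration of tests that double as documentation, here is a sketch using Python's standard unittest module. The function celsius_to_kelvin and the bug behind the regression test are invented for the example; they do not come from the project discussed here.

```python
"""Sketch: unit tests plus a regression test for a hypothetical helper."""
import unittest

ABSOLUTE_ZERO_C = -273.15


def celsius_to_kelvin(temp_c):
    """Convert degrees Celsius to kelvins, rejecting impossible inputs."""
    if temp_c < ABSOLUTE_ZERO_C:
        raise ValueError(f"{temp_c} degrees C is below absolute zero")
    return temp_c - ABSOLUTE_ZERO_C


class CelsiusToKelvinTest(unittest.TestCase):
    """Each test name states one clause of the function's contract."""

    def test_freezing_point_of_water(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)

    def test_absolute_zero_maps_to_zero_kelvin(self):
        self.assertAlmostEqual(celsius_to_kelvin(-273.15), 0.0)

    def test_regression_rejects_impossible_temperature(self):
        # Hypothetical past bug: a bad sensor value once produced a
        # negative kelvin reading instead of an error.
        with self.assertRaises(ValueError):
            celsius_to_kelvin(-300.0)
```

Saved in a file, these tests can be run with python -m unittest. The third test is the regression test: it pins down a bug that was fixed once so that it cannot quietly come back.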
Rigidity in the use of the tools
It's OK for the occasional developer/scientist to want to try some new static checker on the source, generate a graph or model using another, or implement a new module using a DSL. But it's best if there's a canonical set of tools that all team members are expected to know and use. Beyond that, let members use what they want, as long as they are ALL:
- Productive
- NOT regularly requiring assistance
- NOT regularly adjusting to your general infrastructure, in areas like code, build system, or documentation
- NOT affecting others' work
- ABLE to perform any requested task in a timely manner
If that's not the case, then enforce that they fall back to defaults.
Rigidity vs. versatility, adaptability, prototyping, and emergencies
Flexibility can be good. Letting someone occasionally use a hack, a quick-n-dirty approach, or a favorite pet tool to get the job done is fine. Never let it become a habit, and never let this code become the actual codebase to support.
Team spirit matters
Develop a sense of pride in your codebase
- Develop a sense of Pride in Code
  - Use wallboards
    - Leaderboard for a continuous integration game
    - Wallboards for issue management and defect counting
  - Use an issue tracker / bug tracker
Avoid blame games
- DO use Continuous Integration / Continuous Inspection games: it fosters good-mannered and productive competition.
- DO keep track of defects: it's just good housekeeping.
- DO identify root causes: it's just future-proofing processes.
- BUT DO NOT assign blame: it's counterproductive.
It's about the code, not about the developers
Make developers conscious of the quality of their code, but make them see the code as a detached entity and not an extension of themselves, which cannot be criticized. It's a paradox: you need to encourage ego-less programming for a healthy workplace but rely on ego for motivational purposes.
From scientist to programmer
People who do not value and take pride in code do not produce good code. For this property to emerge, they need to discover how valuable and fun it can be. Sheer professionalism and desire to do good is not enough: it needs passion. So you need to turn your scientists into programmers (in the large sense). Someone argued in comments that after 10 to 20 years on a project and its code, anyone would feel attachment. Maybe I'm wrong, but I assume they're proud of the code's outcomes and of the work and its legacy, not of the code itself or of the act of writing it. From experience, most researchers regard coding as a necessity, or at best as a fun distraction. They just want it to work. The ones who are already pretty versed in it and who have an interest in programming are a lot easier to persuade to adopt best practices and switch technologies. You need to get them halfway there.
Code maintenance is part of research work
Nobody reads crappy research papers. That's why they are peer-reviewed, proofread, refined, rewritten, and approved time and time again until deemed ready for publication. The same applies to a thesis and a codebase! Make it clear that constant refactoring and refreshing of a codebase prevents code rot and reduces technical debt, and facilitates future re-use and adaptation of the work for other projects.
Why all this??!
Why do we bother with all of the above? For code quality. Or is it quality code...? These guidelines aim at driving your team toward this goal. Some of these points help by simply showing your team the way and letting them do it (which is much better) and others take them by the hand (but that's how you educate people and develop habits). How do you know when the goal is within reach?
Quality is measurable
Not always quantitatively, but it is measurable. As mentioned, you need to develop a sense of pride in your team, and showing progress and good results is key. Measure code quality regularly and show progress between intervals, and how it matters. Do retrospectives to reflect on what has been done, and how it made things better or worse. There are great tools for continuous inspection. Sonar is a popular one in the Java world, but it can adapt to any technology, and there are many others. Keep your code under the microscope and look for those pesky, annoying bugs and microbes.
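To give a feel for what "measuring" can mean at the smallest scale, here is a sketch of one crude indicator: the number of overly long functions in a piece of Python source, found with the standard ast module. The 50-line threshold and the name long_functions are assumptions for illustration, and this is a stand-in for the far richer metrics a continuous inspection tool reports, not a substitute for one.

```python
"""Sketch: count overly long functions as one crude quality indicator."""
import ast


def long_functions(source, max_lines=50):
    """Return sorted (name, length) pairs for functions over max_lines."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8 and later.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                found.append((node.name, length))
    return sorted(found)
```

Recording len(long_functions(source)) for each file at every build, and plotting it over time, gives the kind of between-intervals progress report described above. The trend matters more than any single number.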
But what if my code is already crap?
All of the above is fun and cute like a trip to Never Land, but it's not that easy to do when you already have (a pile of steamy and smelly) crap code, and a team reluctant to change.