Nick Galbreath is director of engineering at Etsy, overseeing groups handling fraud, security, authentication and internal tools. Over the last 18 years, Nick has held leadership positions at a number of social and e-commerce companies, including Right Media, UPromise, Friendster, and Open Market, and has consulted for many more. He is the author of "Cryptography for Internet and Database Applications" (Wiley), and was awarded a number of patents in the area of social networking. Today he shares his insights on Continuous Deployment and security.
1. In Continuous Deployment, developers push software to production several times a day. Please explain how this reduces risk in development and operations.
Before we talk about risk, let's talk about what Continuous Deployment requires. It needs both a mechanism and a policy for success. The mechanism is being able to deploy quickly without some manual process (i.e. that guy who magically deploys code for you). Every organization needs this now. The policy is how, what and when you choose to deploy. This is highly company- and risk-dependent, and you'll need to adjust it to your needs. You are unlikely to want to deploy, say, database schema changes as often as CSS changes, due to the risk involved.
However, one important part of the policy is emphasizing "small changes." For complex Web-based, data-driven systems, there is no amount of QA in your development environment that can guarantee that the code won't cause an operational or security issue. None! So given that, would you rather have a big bang change with dozens of authors and hundreds of files being modified, each with multiple change sets? Good luck rolling that back and debugging when things go wrong! Or would you rather have a few very small changes that can be code-reviewed "by inspection" and are easy to undo, each with an audit trail? To me the latter is much less risky for both site availability and security.
Also note that while there is risk in making changes, there is also risk in not making changes. Not being able to push out timely security fixes and patches leaves your organization exposed during the (long) release cycle. With a Continuous Deployment process, exercised multiple times per day, you know you can push a security fix when the need arises.
2. Many Appsec professionals are concerned that Agile development teams build software too fast to be secure. Continuous Deployment seems to accelerate this even more. How do security controls and checks fit into Continuous Deployment, and what controls and checks need to be done differently to keep up with the pace?
I've seen no study saying that agile methods are any better or worse for security than any other methodology -- security can be ignored equally well in all methodologies! Waterfall with full SDLC has been producing security flaws for decades, so I'm not sure that is the gold standard. I suspect that all methodologies, properly implemented, have about the same change-to-bug ratio in a release (i.e. number of lines of code changed to number of new bugs created in a single code deployment).
Continuous Deployment is about the movement of code from development to production. It doesn't mean you skip reviews for security, architecture and operations, stop writing tests, or drop peer code reviews. You still need to do all that. Continuous Deployment also means holding developers more responsible for their code quality, because developers can immediately see the results of their work. There is little to no buffer between them and their running code.
As mentioned, to make this work, developers need to learn an "incremental programming" style that involves pushing lots of small changes. This is not natural to most developers and takes a while to learn. A lot of the changes in a proper Continuous Deployment process do nothing. This dark code isn't executed in production. It just sits there. Why? Because if something does go wrong on deployment, we can rule out these dark changes and focus our efforts on code that actually does something. In the meantime, that dark code is now in source control for all to see and review, and we know it has compiled and passed basic automated QA tests. For code that is active, there are a number of strategies to manage risk. Lighting up the dark code is normally done under a configuration flag, so it's easy to identify the change and to turn it off if something goes wrong. Features are "ramped up" by first testing internally, then exposing them to 1% of the site's users, then 10% and so on, to make sure everything is working right before going to 100%. If a problem is found, a feature can just as easily be ramped down.
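As a rough illustration of the configuration flag and percentage ramp-up described above, here is a minimal Python sketch. The flag table `RAMP`, the feature name `new_checkout` and the functions `bucket` and `is_enabled` are all hypothetical names chosen for this example; they are not taken from any particular system.

```python
import hashlib

# Hypothetical flag table: each feature has an "internal" switch for staff
# testing and a "percent" of ordinary users who see it. A feature that is
# missing from the table is dark code: deployed, but never executed.
RAMP = {"new_checkout": {"internal": True, "percent": 1}}


def bucket(feature: str, user_id: str) -> int:
    """Map a (feature, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature: str, user_id: str, internal_user: bool = False) -> bool:
    """Return True if the feature should be lit up for this user."""
    config = RAMP.get(feature)
    if config is None:
        return False  # no flag entry: the code stays dark
    if internal_user and config["internal"]:
        return True  # staff see the feature before ordinary users do
    # Users whose bucket falls below the percentage get the feature.
    return bucket(feature, user_id) < config["percent"]
```

Because each user hashes to a fixed bucket, raising `percent` from 1 to 10 only adds users, and setting it back to 0 ramps the feature down without a code change.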
So far I've talked a lot about risk and development. As for security and Continuous Deployment, the summary is "secure by default" and "alert when it's not." It's easier said than done. The following aren't really specific to Continuous Deployment, but perhaps they are more important:
- Isolate, insulate or ban functions that are easy to misuse. Cryptographic functions are a prime example of this. Most of these APIs are too low-level, so they'll need wrapping into something that mere mortals can use. Then alert on the raw function. If new crypto starts showing up in the code base, it probably needs review (and ideally this code was already reviewed in your software development life cycle).
- Segment out sensitive code and alert on changes to it. Your password storage mechanism shouldn't be changing that often. If it is, find out why.
- Use static analysis, even for dynamic languages. Prevent those silly bugs before they go out into production. In C, those silly bugs are frequently security vulnerabilities. Static analysis for dynamic languages such as PHP is becoming more popular as well (e.g. see http://slidesha.re/KzTfLy).
- Make testing easy for developers and automatic for deployment. Set up a continuous integration stack such as Jenkins and have it run the tests every time. This also means spending time on making your tests run fast. If they are slow, no one will want to run them.
- Make untrusted user inputs secure by default. Right now, every web platform gives you the raw user input. Unless you explicitly escape it, it's insecure. This is impossible to manage. If possible, flip it around so inputs are escaped, and if you need the raw data, you have to explicitly un-escape it. In other words, the inputs are secure by default. This is complicated and probably deserves its own paper.
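To make the last point concrete, here is a minimal Python sketch of an "escaped by default" wrapper. The class name `UntrustedInput` and the method name `unsafe_raw` are hypothetical, and the sketch assumes HTML output only; other contexts, such as SQL, need their own handling (for example, parameterized queries).

```python
import html


class UntrustedInput:
    """Wrap a raw request value so that its default rendering is HTML-escaped."""

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def __str__(self) -> str:
        # Default path: printing or formatting the value yields the escaped form.
        return html.escape(self._raw, quote=True)

    def unsafe_raw(self) -> str:
        # Explicit opt-out. The distinctive name is easy to search for,
        # so new uses can be flagged for review.
        return self._raw


comment = UntrustedInput('<b>"hi"</b>')
page = f"<p>{comment}</p>"  # escaped without the developer doing anything
```

The design choice is that the safe behavior takes no effort, while getting at the raw data requires a call that stands out in code review and can be alerted on, in the same spirit as alerting on raw cryptographic functions.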
To get started with Continuous Deployment, the very first thing to do is to "put a button on it." Get your release engineer (or whoever deploys code) to start automating the release process so it's a "one button" process or simple shell script that anyone can run. The release engineer should not view this as job-threatening, as it actually makes him more valuable. Then work on making the time it takes to deploy shorter and shorter. If it hasn't been done already, this process is likely to cause a whole cascade of site operations improvements: automation, standardization and monitoring of your stack, all of which help improve security. Then work on how fast you can upgrade key servers (e.g. Apache HTTPD) or even the operating system. You want all of this to be painless, since if it's not, you'll postpone it, or worse, not do it at all and miss a key security patch.
A basic requirement is to log all changes: what the change is, who made it, and who pushed it out to production. You'll probably want some conventions around when people can push and around peer review. You'll also want to make site operations visible, including when site changes are happening, so that you can visibly correlate changes to problems.
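The change log described above could be sketched in Python as follows. The record fields, the function names `record_deploy` and `changes_before`, and the 30-minute window are assumptions made for illustration; a real setup would pick its own fields and storage.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DeployRecord:
    change_id: str    # what changed, e.g. a commit identifier
    summary: str      # short human-readable description of the change
    author: str       # who made the change
    deployer: str     # who pushed it out to production
    deployed_at: str  # UTC timestamp in ISO 8601 format


def record_deploy(change_id, summary, author, deployer, now=None) -> str:
    """Build one JSON log line for a deployment."""
    if now is None:
        now = datetime.now(timezone.utc)
    record = DeployRecord(change_id, summary, author, deployer, now.isoformat())
    return json.dumps(asdict(record), sort_keys=True)


def changes_before(log_lines, incident_at, window=timedelta(minutes=30)):
    """Return logged deploys that landed within `window` before an incident."""
    suspects = []
    for line in log_lines:
        record = json.loads(line)
        deployed = datetime.fromisoformat(record["deployed_at"])
        if timedelta(0) <= incident_at - deployed <= window:
            suspects.append(record)
    return suspects
```

With small, frequent changes, a lookup like `changes_before` usually returns only a handful of candidates, which is what makes correlating a problem with a change practical.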
The previous section answered what is needed to make Continuous Deployment work securely before code goes out. Now it's time to see what happens after code is deployed. Again, none of these are unique to Continuous Deployment:
- Make security visible. Graph potential SQLi and XSS attacks. These are normally quite common, and graphing them turns security into a visible event for all to see. It's a fantastic tool for security education, as well as for knowing when someone new is probing you.
- Use attacker-driven testing. Use previous results to guide how you do manual or semi-automated testing. You'll find attackers don't scan everything but focus on particular regions of your website. Maybe they found something.
- Monitor core dumps. Your server shouldn't be core dumping very often, if ever. If it is, maybe you need to patch or upgrade. Or maybe someone found a buffer overflow.
- Monitor "server 500" errors. Can you reproduce them? It's probably just a QA problem, but maybe someone is scanning your system and found something.
- Monitor database SQL syntax errors. SQLi breaches don't happen in a vacuum. An attacker needs to spend a lot of time probing your application for the appropriate entry point. These often end up generating SQL syntax errors.
- Look for other anomalies. New really long URLs coming in? New parameters showing up in a query string? Maybe a code review needs to happen.
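A few of the checks in this list (server errors, unusually long URLs, and new query-string parameters) can be sketched in a few lines of Python. The threshold `MAX_URL_LENGTH`, the baseline table `KNOWN_PARAMS` and the function `find_anomalies` are hypothetical, and the sketch assumes requests have already been parsed out of your logs into (status code, URL) pairs.

```python
from urllib.parse import parse_qsl, urlsplit

MAX_URL_LENGTH = 512  # hypothetical threshold; tune it to your own traffic

# Hypothetical baseline: the query parameters each path normally receives.
KNOWN_PARAMS = {"/search": {"q", "page"}}


def find_anomalies(requests):
    """Scan (status_code, url) pairs and return a list of (reason, url) findings."""
    findings = []
    for status, url in requests:
        if status >= 500:
            findings.append(("server error", url))
        if len(url) > MAX_URL_LENGTH:
            findings.append(("long url", url))
        parts = urlsplit(url)
        known = KNOWN_PARAMS.get(parts.path)
        if known is not None:
            # Only paths with a recorded baseline are checked for new parameters.
            for name, _value in parse_qsl(parts.query, keep_blank_values=True):
                if name not in known:
                    findings.append(("new parameter: " + name, url))
    return findings
```

Each finding is only a prompt for a human to look: a repeated server error may be a plain bug, and a new parameter may be a legitimate feature that the baseline has not caught up with.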