linux.conf.au 2020 | Presentation: Configuration Is (riskier than?) Code

Presented by

Jamie Wilkinson
@jaqx0r
https://spacepants.org

Jamie Wilkinson is a site reliability engineer at Google. He’s a contributing author to the SRE Book and has presented on contemporary topics at prominent conferences such as Linux.conf.au, Monitorama, PuppetConf, Velocity, and SRECon. His interests began in monitoring and the automation of small installations and have continued with human factors in automation and systems maintenance on large systems. Despite his more than 15 years in the industry, he’s still trying to automate himself out of a job.

Abstract

TL;DR: Configuration is code, and config changes should be treated with at least as much care, skepticism, and rigour as code changes are. Config presents special challenges, though: it's usually not a fully operational Turing-equivalent language, yet it has a high "force multiplier" per character relative to code itself. Let's explore those challenges and how we can address them to reduce the risk of configuration-change-related outages.

Over ten years ago Puppet Labs and others espoused the idea of "configuration as code," setting a course that crossed DevOps, the APIfication of systems, the Cloud, and Serverless. Today, you can write a few lines of config and invoke thousands of CPUs, doing hundreds of operations, deploying entire clusters of systems: a huge force multiplier for IT operations.

This force multiplier comes at a cost, and that cost is risk and impact. Never before has it been so easy to destroy an entire CDN in a single command. While numbers vary, studies show that a significant number of incidents in IT operations are caused by configuration changes.

Configuration *is* code (and I'll prove it), but it sure lacks the same rigour that code receives. Configuration formats like YAML and JSON do not have the same quality of syntax checkers and debuggers that languages like C++, Go, and Ruby have. Often the first time you know that a configuration is semantically correct is when it is running in production.

So what can we do about it? Why does this presenter think that a comparison between a configuration format and a debugger is even possible? In this presentation we'll start by looking at this problem from a theoretical point of view, which will let us look to other areas that solve a similar problem, and then see how we can apply that perspective back to configuration to make future production changes safer than today.
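
As a rough illustration of the gap between syntactic and semantic correctness described in the abstract, here is a minimal Python sketch (not taken from the talk). The config keys `replicas` and `rollout_percent`, their allowed ranges, and the `validate` helper are all hypothetical, chosen only to show a config that parses cleanly yet would be dangerous to deploy.

```python
import json
from typing import List

# Hypothetical config: syntactically valid JSON, but semantically wrong.
RAW_CONFIG = '{"service": "cdn-edge", "replicas": 0, "rollout_percent": 1000}'


def validate(config: dict) -> List[str]:
    """Return a list of semantic problems; an empty list means none were found.

    The rules here are assumptions made for this example, not a real schema.
    """
    errors = []

    replicas = config.get("replicas")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 1:
        errors.append("replicas must be an integer >= 1")

    percent = config.get("rollout_percent")
    if (
        isinstance(percent, bool)
        or not isinstance(percent, (int, float))
        or not 0 <= percent <= 100
    ):
        errors.append("rollout_percent must be a number between 0 and 100")

    return errors


config = json.loads(RAW_CONFIG)  # the parser accepts this without complaint
problems = validate(config)      # the semantic check is where the errors surface
for problem in problems:
    print("config error:", problem)
```

Running a check like this before a rollout is one way to learn that a config is semantically wrong somewhere other than production; a real deployment would more likely use a schema language or the target system's own validation tooling.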

Linux Australia: http://mirror.linux.org.au/pub/linux.conf.au/2020/arena/Monday/Configuration_Is_riskier_than_Code.webm
YouTube: https://www.youtube.com/watch?v=NcT8-IoImXE