When a task becomes tedious or repetitive, most developers have a natural instinct to automate it. Some developers (myself among them) have so little tolerance for unnecessary repetition that we'll automate boring tasks even when doing them by hand would have been faster. We automate stuff because it's more interesting, and if we're lucky, it might save some time too.
But as I do more operations on real-life production systems, it's sinking in that the most important benefits of automation are completely different from what I originally thought. Saving time and having fun are really just the tip of the iceberg. The other important benefits of automation are relevant even if the automation sucks to do and never saves any time at all.
The biggest benefit is that automated operations are dramatically safer than manual ones. This is something that everyone knows in principle, but it's still easy to underestimate the possibility of manual error -- that is, until you're the one doing that midnight operation on the production database. Everybody makes mistakes. Tired or stressed-out people make way more mistakes than average -- and you will have to do operations while tired, or stressed, or often both. At those times, nothing is more comforting than knowing that you don't have to do things manually.
Probably the next-biggest benefit is that automated tasks are largely self-documenting. In other words, automation is an alternative to documentation. It doesn't completely remove the documentation burden, but it can dramatically reduce it. Not only that, but a Bash script, for example, is more usable and more likely to be correct than a list of commands in a wiki page or in your memory. And although it might be tempting to think that some ad-hoc operation doesn't need documenting because it won't ever need to happen again, just remember: in the fullness of time, lightning always strikes twice. Don't throw away your lightning rod.
Automation is like butter -- it's surprisingly hard to use too much of it. I've started automating as early as possible, even if it doesn't seem like it'll save time. I do it to make the operations as safe and repeatable as possible. The idea is to automate proactively: to avoid, from the very beginning, the potential problems intrinsic to manual operations. Of course, this is hardly a new idea. But I've found that doing DevOpsy work really drives home the point on a visceral level.
Here are some examples of what I think a "proactive automation" mindset looks like in practice:
- Are you doing anything non-trivial on the command line -- even once? Wrap the commands in a Bash script instead.
- After rebooting your computer, you should be able to go from a blank slate to performing any operation within just one or two commands. Don't let your emergency response time be slowed down by having to remember how to set up your work environment.
- Avoid doing important configuration or complex setup using Web-based dashboards. Instead, write a script that does the operations via an API.
- As a corollary of the above, run -- don't walk -- away from using anything that doesn't have an API.
- Use infrastructure-as-code tools (CloudFormation, Terraform, et al.) as much as possible. If one doesn't exist for something you're using, you might actually want to write your own. It's not that hard, as long as there's an API (see the previous point) and you don't do anything too fancy.
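
To make the first point concrete, here is a rough sketch of what wrapping a one-off operation in a script might look like. The `run` helper, the `DRY_RUN` variable, the `archive_dir` function, and the default paths are all hypothetical names chosen for illustration, and the operation itself (archiving a directory) is just a stand-in for whatever you were about to type by hand.

```bash
#!/usr/bin/env bash
# Sketch: a one-off operation wrapped in a script rather than typed by hand.
# The directory and archive names below are hypothetical placeholders.
set -eu   # abort on any failed command or unset variable

# Default to a dry run; set DRY_RUN=0 to actually execute the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Echo the command so the script doubles as a record of what was done,
  # then execute it only when DRY_RUN=0.
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

archive_dir() {
  # Pack the contents of directory $1 into the gzipped tarball $2.
  local src_dir="$1"
  local archive="$2"
  run tar -czf "$archive" -C "$src_dir" .
}

# Positional arguments override the (hypothetical) defaults.
archive_dir "${1:-./logs}" "${2:-logs-backup.tar.gz}"
```

Saved under some name like `archive-logs.sh` (again, just an example), running it with no arguments only prints the `tar` command it would run; `DRY_RUN=0 ./archive-logs.sh /some/dir backup.tar.gz` actually performs it. Defaulting to a dry run is one possible design choice, not a rule: it lets a tired operator see exactly what will happen before committing to it.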