No, this isn’t a hit piece

Kubernetes is a fantastic tool for running a wide range of applications. Whether it’s batch jobs, websites, databases, or even video game servers, Kubernetes can help keep your applications running reliably with minimal downtime.

Kubernetes is polarizing, and it’s easy to find negative opinions: it’s too arcane, it demands too many changes to existing applications, it makes failing applications too hard to debug. These opinions had to come from somewhere; I’m inclined to believe many negative experiences with Kubernetes stem from organizations migrating before they’re ready.

What does ready look like? When is the right time for an organization to migrate to Kubernetes? To start, moving from bare-metal servers or virtual machines straight into Kubernetes is not a good idea. Learning and implementing the core paradigms Kubernetes is built on isn’t trivial, and taking one massive leap rather than many small steps will create more problems than it solves. Before moving to Kubernetes, an organization should meet a few criteria:

  1. Changing infrastructure should be fast and easy.
  2. Changing configuration should be fast and easy.
  3. Testing and deploying new code should be fast and easy.
  4. Centralized observability, monitoring, and alerting should be in place.
  5. Applications should fit well into containers, and operations teams should be able to manage containers efficiently.

These five key points come together to create robust infrastructure, regardless of runtime or environment. Some people talk about the idea of “Pets vs. Cattle”, where you should treat your applications and infrastructure as “cattle” rather than “pets”. The analogy falls apart when you start to dissect it, but from a very high level, it makes sense: an organization’s software development and infrastructure should be robust, well specified, and easy to replace. By the time all five points are implemented, you may find that Kubernetes isn’t needed. If it is needed, migration will be a breeze.

Before you bring out the torches and pitchforks, let me be clear: the first four qualities are soft requirements (containerization is a hard requirement with Kubernetes), and many organizations successfully use Kubernetes without CI/CD or Infrastructure as Code. This article aims to describe patterns that we’ve seen in smooth migrations to Kubernetes, but they should be considered on a case-by-case basis.

A Note on Documentation

In each of the operational goals described below, it’s impossible to overstate the importance of documentation. For example, in the context of servers, this documentation should be able to answer the following questions (and many more):

  • How many servers are there?
  • What hardware resources does each server have?
  • What operating system does each server have?
  • What applications run on each server?
  • What are the network/configuration dependencies of each application on each server?

Having well-located, well-defined, and well-maintained documentation will make the next steps much easier, and it helps managers and developers alike understand the systems making up an organization. As long as people know where to find the documentation, they’ll be better able to understand the software infrastructure, and better equipped to contribute to and improve it.

Where Do We Start?

At the risk of using a buzzword, there are two main changes that can supercharge your infrastructure: Infrastructure as Code and Configuration as Code. Both of these concepts come together to lay the groundwork for quickly changing software infrastructure in order to meet the organization’s needs. We’ve seen great success with Ansible and Terraform (the “Terrible” stack), but both tools have suitable alternatives. It’s hard to go wrong with any of the mainstream tools.

Infrastructure as Code combined with Configuration as Code can get an organization very far on their own. With IaC creating and managing the servers, and CaC provisioning them, adding or removing servers becomes trivial. Any change to the servers or software is just a part of an automated provisioning process.

Declarative State

A common paradigm in Infrastructure as Code and Configuration as Code is declarative state. Declarative in this context refers to creating and managing resources by describing their desired end state, rather than building them up through the steps of a process.

For example, consider creating a file. An imperative (the opposite of declarative) way to do so may be:

# Create file at $FILE
$ FILE=/path/to/file
$ touch "$FILE"
# Set file permissions
$ chmod 0755 "$FILE"
# Write contents to file
$ echo "foo" > "$FILE"

Alternatively, creating a file in a declarative manner would be as follows:

# declarative-create-file (a hypothetical tool) takes a path,
#   permission specification, and contents for the file
$ declarative-create-file <<'EOF'
{
  "path":        "/path/to/file",
  "permissions": "0755",
  "content":     "foo"
}
EOF

The self-documenting nature of declarative code is easier to read and reason about - the reader can assume “this is exactly how things are right now,” rather than trying to figure out what a file would look like after it gets changed over and over again.
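
Real Configuration as Code tools work exactly this way. As a concrete illustration, here is the same file managed with Ansible, which the article mentions later (a minimal sketch using the ansible.builtin.copy module; the path and contents mirror the example above):

# Declaratively manage the same file with Ansible.
# Running this task repeatedly converges to the same state.
- name: Ensure /path/to/file exists with the right contents
  ansible.builtin.copy:
    dest: /path/to/file
    content: "foo"
    mode: "0755"

Note that there’s no “create” or “update” distinction: the task describes the end state, and the tool figures out what (if anything) needs to change.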

With declarative state, you can combine your server provisioning automation, your software deployment automation, and your server patching automation all into one process! Your deployment spins up new servers, patches them, and installs your software. After the load balancer reports them as healthy, the old servers can be (automatically) destroyed. How’s that for cattle?

Infrastructure as Code

All applications need to run somewhere, even if they’re “serverless.” Infrastructure as Code enables the creation and management of hardware resources programmatically, and it doesn’t stop at servers, either. It can create databases, firewall rules, network routes, or anything else your cloud provider or converged infrastructure vendor (such as VMware) exposes through an API. Many companies even have a policy of deploying cloud resources only through IaC, except for testing.
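
Terraform is the most common choice here, but to keep the examples in one format, here’s the idea sketched with Ansible’s AWS collection instead (this assumes the amazon.aws collection is installed and AWS credentials are configured; the AMI ID and region are placeholders):

# Provision a server in the cloud, declaratively.
- name: Provision a web server
  hosts: localhost
  tasks:
    - name: Ensure the web server instance exists
      amazon.aws.ec2_instance:
        name: web-1
        region: us-east-1                 # placeholder region
        image_id: ami-0123456789abcdef0   # placeholder AMI
        instance_type: t3.small
        state: running

Run it once and the instance is created; run it again and nothing happens, because the described state already exists.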

Configuration as Code

After the servers are provisioned, how do you get them ready to run your application? Configuration as Code tooling aims to bootstrap a server for its intended purpose: deploying and starting applications. It’s normal for Configuration as Code to update the server’s software as needed, ensure the server is configured to specification (such as creating users or setting up scheduled jobs), and install and configure the desired applications.
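
A minimal Ansible playbook for this kind of bootstrapping might look like the following (a sketch; the host group, user, package, and cleanup script are illustrative assumptions):

# Bootstrap application servers: users, packages, scheduled jobs.
- name: Configure application servers
  hosts: app_servers
  become: true
  tasks:
    - name: Create the application user
      ansible.builtin.user:
        name: appuser
        state: present

    - name: Install the web server package
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Schedule a nightly cleanup job
      ansible.builtin.cron:
        name: nightly-cleanup
        minute: "0"
        hour: "3"
        job: /usr/local/bin/cleanup.sh   # hypothetical script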

IaC and CaC: Become Unstoppable

The self-documenting nature of Infrastructure as Code and Configuration as Code is invaluable. In minutes, a new team member can know:

  • What servers are deployed and needed
  • How applications are deployed
  • How applications are configured

The benefits go further than onboarding or self-documentation. Any time you need to deploy new infrastructure, make changes to a server, or change a setting in an application’s runtime, IaC and CaC have your back:

  • What if a new QA environment is desired between development and production? Since it’s code, it can be copied, pasted, and tweaked with minimal effort.
  • The QA environment requires slightly different settings than development. Not a problem, since changing variables between environments is a one-line change (see the sketch after this list).
  • What if a bad configuration gets deployed, or worse, all the servers get deleted? Since it’s code, and therefore in version control, the commit can be rolled back and a safe version re-deployed.
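
For example, with Ansible, per-environment settings typically live in group_vars files, so a QA override really is a one-line difference (the file names and variables here are illustrative assumptions):

# group_vars/development.yml
app_log_level: debug
app_replicas: 1

# group_vars/qa.yml
app_log_level: info
app_replicas: 2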

The crowning jewel of Infrastructure as Code and Configuration as Code is disaster recovery. All of the systems that should exist are described in code, and the code is run every time something needs to change. Disaster recovery is a zero-cost benefit of defining infrastructure in code, since the “disaster recovery” runbook is a copy-paste of the “deploy” runbook.

What’s next?

With Infrastructure as Code and Configuration as Code, an organization’s IT operations are a force to be reckoned with, and many organizations can stop there. That said, there are often a few missing puzzle pieces if the long-term goal is to move to Kubernetes.

It’s important to note that it’s difficult to gauge success on practices like CI/CD and observability. Generally speaking, a successful implementation of the next few points should result in minimal production rollbacks and outages, with the majority of bugs caught during development and quality assurance.

A Note about DevOps

At this point, we’re firmly in the “DevOps” space. DevOps is still emerging as a methodology, and many of the terms used throughout the space are weakly defined. Every company practices DevOps differently, but DevOps foundations focus on integrating code changes without friction, reducing the possibility of bugs in production code, mitigating service downtime, and improving reliability of an organization’s hosted software.

Continuous Integration and Continuous Delivery

Continuous Integration and Continuous Delivery, commonly known as CI/CD, is one of the most important parts of DevOps. CI/CD deals with what happens after the code is committed.

CI covers things like running tests and checks to ensure the code is safe to merge into the primary branch, and CD covers deploying those changes to non-production and production environments. The main idea is that each production deploy should be strictly tested and vetted to ensure that risks are mitigated. This usually means compilation and unit tests, and sometimes extends into linting and type checking (such as with mypy on Python code).

CD concerns promotion between environments. Development and testing happen faster when new code changes are automatically deployed to a staging environment that is load tested and monitored for issues. These automated sanity checks give the development team peace of mind by ensuring a bad build doesn’t slip through the testing cracks.
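
As a concrete sketch, here’s what a small CI/CD pipeline might look like as a GitHub Actions workflow (this assumes a Python project using pytest and mypy, matching the example above; the deploy script is a placeholder for whatever promotion mechanism the organization uses):

# .github/workflows/ci.yml
name: CI/CD
on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      # CI: linting, type checking, and unit tests gate every change
      - run: mypy .
      - run: pytest

  deploy-staging:
    # CD: only promote to staging after CI passes on main
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh staging   # hypothetical deploy script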

With CI/CD, it’s best to start with CI and expand to CD as the codebase matures. Automated production deployments aren’t a requirement for every organization, but automated testing and linting are essential for every codebase.

Monitoring, Observability, and Alerting

After code is deployed (to any environment), it’s good to have monitoring in place. Most organizations start with shipping service logs to a centralized location such as Elasticsearch or Datadog. After that, tracing is typically added to the services. Tracing is the ability to track each request through your services, and it helps pinpoint performance issues and the root causes of errors.

This data is aggregated into metrics and pretty dashboards so engineers and management alike can understand when there’s an issue occurring, or when there’s room to improve. With these metrics, it’s easy to create alerts. When there’s an issue, the right team should know and be able to act on it as soon as possible.
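
As an example of what such an alert can look like, here’s a Prometheus alerting rule (a sketch; it assumes Prometheus is scraping services that expose an http_requests_total counter with a status label):

# alert-rules.yml: page the on-call team when the error rate is elevated
groups:
  - name: service-availability
    rules:
      - alert: HighErrorRate
        # More than 5% of requests returning 5xx, sustained for 10 minutes
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "High HTTP 5xx error rate on {{ $labels.job }}"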

Containerization

Let’s address the elephant in the room: in the tech world, containers are a household name, and packaging an application together with its runtime environment is a fantastic idea. Configuration as Code can obviate part of the need for containers: if servers are automatically deployed, configured, and replaced, one of the main benefits of containers is already being met. Still, containers boast benefits over traditionally deployed applications: they’re easier to co-locate on servers, they can be built as soon as new code is committed instead of waiting for Configuration as Code to run, and they remove environmental differences between developer workstations and servers.

An organization’s opinion of containers often hinges on how well its applications adapt to them. The truth is, not every application can or should be put into a container. It’s easy to be upset when an application doesn’t fit well into a container, and in turn to put the blame on something other than the application itself. A few things that can cause problems for a containerized application:

  • In-memory state that prevents a service from shutting down and starting back up immediately
  • Reading or writing to the filesystem
  • Requiring multiple large configuration files
  • Requiring specific OS parameters
  • A lengthy startup step before the application is ready for connections
  • A lack of runtime health checks
  • Depending on a system daemon (such as sshd or ftpd)

There are ways to work around some of these design decisions to get an application running in a container. That said, the most reliable solution is to refactor the code. As a general guideline to get started, applications going into containers should abide by the “12-factor app” methodology, described on 12factor.net. From a high level, the “12-factor app” is a pattern that reduces difficulties in configuration, deployment, and scaling.

Kubernetes only runs containerized applications, so you need to containerize your applications before you can migrate. It’s important to note that Kubernetes is not containers. Kubernetes orchestrates containers - it schedules and places them on hardware. You can get many of the benefits of containers without the complexity of Kubernetes.

When applications are in containers, server provisioning is usually simple. Most of the time, the application deployment section of the Configuration as Code templates a Docker Compose file, which then starts and supervises the applications. All said and done, migrating from Docker Compose to Kubernetes is a small step - concepts in Docker Compose files map neatly to Kubernetes manifests.
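
For instance, a minimal Docker Compose service might look like this (the image name, port, and health endpoint are illustrative assumptions):

# docker-compose.yml
services:
  web:
    image: registry.example.com/web:1.2.3
    ports:
      - "8080:8080"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
      interval: 30s
      timeout: 3s
      retries: 3

The same concepts translate almost line-for-line into a Kubernetes Deployment: the image becomes a container spec, the health check becomes a liveness probe, and restart behavior is handled by the Deployment’s controller:

# web-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.2.3
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30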

OK, now can we migrate to Kubernetes?

No! But you’re ready to decide if you should. Kubernetes has both soft and hard requirements that an organization’s applications must meet before Kubernetes should be considered:

  1. Infrastructure should be well defined and easy to change.
  2. Server configuration should be well defined and easy to change.
  3. Code tests and deployments should be safe and easy.
  4. Applications should have centralized monitoring and, during an issue, the correct parties should be alerted.
  5. Applications should fit into containers and be refactored as needed to meet best practices.

Once these are achieved, it’s then important to ask if Kubernetes really brings value to the organization. In many cases, it’s just fine to stick with the way things are currently working.

If Kubernetes still seems like the right choice, don’t worry - all the work that went into improving operations has made a migration to Kubernetes a breeze:

  1. A managed Kubernetes cluster is trivial to stand up with Infrastructure as Code (see the sketch after this list).
  2. Configuration as Code applies to Kubernetes through Helm.
  3. All of your applications are tested and ready to deploy in a new environment thanks to CI/CD.
  4. Any failures are logged and the right people are notified.
  5. All of the applications are in containers that Kubernetes can stop, start, and move around without any problems.
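
For point 1, standing up a managed cluster really can be a single declarative file. Here’s a sketch using eksctl’s ClusterConfig format for AWS EKS (the cluster name, region, and node sizing are illustrative assumptions; other clouds and Terraform offer equivalents):

# cluster.yml: create with `eksctl create cluster -f cluster.yml`
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster
  region: us-east-1
managedNodeGroups:
  - name: workers
    instanceType: t3.medium
    desiredCapacity: 2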

Apartment 304 has helped companies through all the steps in this article, including moving into Kubernetes! We have experienced, passionate engineers who can assist with DevOps projects every step of the way. If you’d like to talk about your DevOps needs, shoot us an email at [email protected].