Are you technically ready for Cloud Native?

I'm going to focus on the technical requirements in this post. That said, I believe organizational change is also a prerequisite for making such a transition effectively. "Cloud Native is About Culture, Not Containers" is a good starting place for looking at the transition from that point of view.

A successful transition to a new deployment technology requires that everyone involved is ready for the changes necessary:

The development team
The quality assurance team
The deployment team
The platform team
The operations team
The facilities team
The management team

I'm not going to give advice on your organization's preparation; you know your organization better than I do, and any advice would be speculative at best. But there are a few structural boundaries you can analyze to determine whether you have done the technical work. I will use Kubernetes as the example deployment platform because I know it, but the concepts should apply universally.

A_i → A_{i+1}

Simply put, you must have a plan for deploying the next version of your application. Most teams will be able to manually deploy a single version of their application, at worst with a sequence of kubectl commands and manual configuration for other items like load balancers, DNS entries and TLS certificates.

The real challenge is to get the next version deployed.

Why is this critical? Consider a standard web application calling a single service that is being upgraded. If the application is stateless, you could have either of the following interactions:

The complexity is obvious: a call to service version i might not give the application the data it needs from version i+1. Likewise, calling the older version after the newer version may have its own failure modes.

And that's for a single service. A microservice-based application might have many, many services, with multiple deployments happening at the same time. It is easy to get into a combinatorial explosion of call permutations that you can never be confident will work together.
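To put a rough number on that explosion, here is a minimal sketch in Python. The service names and version labels are made up for illustration, and it assumes each service has exactly two versions live while its rollout is in progress.

```python
from itertools import product


def version_combinations(services):
    """Return every mix of live versions a request could encounter.

    `services` maps a service name to the list of versions currently
    serving traffic (two entries while a rollout is in progress).
    """
    names = sorted(services)
    return [
        dict(zip(names, combo))
        for combo in product(*(services[name] for name in names))
    ]


# Hypothetical services, each in the middle of a rollout from i to i+1.
in_flight = {
    "auth": ["v1", "v2"],
    "cart": ["v3", "v4"],
    "search": ["v7", "v8"],
}

combos = version_combinations(in_flight)
print(len(combos))  # 2 * 2 * 2 = 8 combinations for three services
print(2 ** 10)      # ten services mid-rollout at once: 1024 combinations
```

Every added service that is mid-rollout doubles the count, which is why testing every combination stops being realistic very quickly.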

Given that complexity, let's look at some options, in somewhat increasing levels of difficulty:

Stop the world - In this case, you are allowed to take the whole system down, install the new version and then bring it back up. Admittedly, whatever you did to install your first version should work here, but why are you even taking on the complexity of a distributed deployment architecture if you don't need it?
Switch environments - In this case, you can switch to a different region, availability zone, datacenter or any other high availability deployment, so you can upgrade one copy of the application, while the other(s) continue to provide service. Things to consider:

Is the application design active-active or active-passive?

Does your replication or consistency design allow this (and have you tested it)?

Can version i+1 synchronize with version i? Do you have to let the other site(s) catch up and then do a flash cut? If you use any quorum algorithms, what happens when the new version is the majority?

Single cluster options. Deployment strategies are documented well elsewhere, but there are impacts you should be aware of:

If you have stateful or session components, do you have a way to migrate active sessions to the new version? Can you afford to wait?
If you have schema based storage, at what point can you apply any changes?

Multi-Strategy. You may want to use different strategies for different deployments.

There are business advantages to supporting A-B testing that are not related to just getting the next version deployed. Testing out different versions on subsets of real traffic will give you insights into your development that you generally can't replicate in non-production environments.
There are management advantages to supporting canary deployments, especially if portions of the environment are not under your control.
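Sending a subset of real traffic to a new version can be sketched as below. This is an illustration only: the function name, the bucket scheme and the user IDs are my assumptions, and in a real cluster this decision is usually made by a load balancer or service mesh rather than by application code.

```python
import hashlib


def pick_version(user_id: str, canary_percent: int) -> str:
    """Deterministically assign a user to the 'canary' or 'stable' version.

    Hashing the user ID keeps each user on the same version across
    requests, which matters when comparing behavior between versions.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return "canary" if bucket < canary_percent else "stable"


# Hypothetical user IDs; roughly 10% should land on the canary.
users = [f"user-{i}" for i in range(10_000)]
canary_count = sum(1 for u in users if pick_version(u, 10) == "canary")
print(f"{canary_count} of {len(users)} users routed to the canary")
```

Raising `canary_percent` in small steps gives you a gradual rollout, and setting it back to 0 is the rollback.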

Applying the YAGNI principle here is simply short-sighted. You are always going to need a deployment plan for the next version because software is never done. If you don't have a plan, then you are not ready.

This is also why feature flags are very popular. You can convert a single upgrade that would struggle with both versions running at once into a two-step process: first deploy a newer version that has been coded to run both ways, then apply a flag update that switches the feature on instantly.
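That two-step process might look something like the following sketch. The `FeatureFlags` class, the flag name and the price formatting are all hypothetical; a real system would read flags from a shared configuration service instead of an in-memory dictionary.

```python
class FeatureFlags:
    """In-memory flag store, standing in for a shared config service."""

    def __init__(self, **flags):
        self._flags = dict(flags)

    def is_enabled(self, name):
        # Unknown flags default to off, so new code paths stay dormant.
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = enabled


def render_price(amount_cents, flags):
    # Step 1: the deployed code contains both the old and the new path.
    if flags.is_enabled("new_price_format"):
        return f"${amount_cents / 100:,.2f}"
    return f"{amount_cents} cents"


flags = FeatureFlags(new_price_format=False)
print(render_price(123456, flags))   # old path: 123456 cents

# Step 2: flip the flag; no new deployment is needed.
flags.set("new_price_format", True)
print(render_price(123456, flags))   # new path: $1,234.56
```

Because every running copy already contains both paths, the switch does not depend on how far the rollout has progressed.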

P_i → P_{i+1}

You need to have a plan to upgrade your distributed environment. At the very least you will need to apply security patches. How are you going to do that and still provide application uptime?

Stop the world - In this case, just like with application upgrades, you could shut the cluster down, do the upgrade and then restart. While this is a basic solution, if it's a consideration for your environment, it's quite possible you don't need a distributed deployment architecture.
Switch clusters - This is a simple three-step process that is MUCH easier if you have a virtualized environment:

Create a new cluster
Move your application to the new cluster
Shut down the old cluster

Perform an in-flight upgrade. This requires you to upgrade one part (usually a single host) at a time, bringing it back into service before moving on to the next. Note that not all upgrade paths may be supported. For example, OpenShift 3 was not upgradable to OpenShift 4.
Outsource the problem. If you are in a managed public cloud environment, many of the vendors support upgrades.
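The in-flight option above can be sketched as a loop that takes one host out of service at a time. This is a simulation under my own assumptions: the host records, the version labels and the `min_in_service` threshold are hypothetical, and the "upgrade" is just a field assignment standing in for the real work.

```python
def rolling_upgrade(hosts, target_version, min_in_service):
    """Upgrade hosts one at a time, keeping enough of them in service.

    `hosts` maps a host name to {"version": str, "in_service": bool}.
    Returns the ordered list of actions that were taken.
    """
    steps = []
    for name in sorted(hosts):
        host = hosts[name]
        if host["version"] == target_version:
            continue  # already upgraded, nothing to do

        # Count the hosts that would keep serving while this one is out.
        remaining = sum(
            1 for other, h in hosts.items()
            if other != name and h["in_service"]
        )
        if remaining < min_in_service:
            raise RuntimeError(
                f"draining {name} would leave only {remaining} hosts in service"
            )

        host["in_service"] = False          # take the host out of service
        steps.append(f"drain {name}")
        host["version"] = target_version    # stand-in for the real upgrade
        steps.append(f"upgrade {name} to {target_version}")
        host["in_service"] = True           # bring it back before moving on
        steps.append(f"restore {name}")
    return steps


cluster = {
    "host-a": {"version": "v1", "in_service": True},
    "host-b": {"version": "v1", "in_service": True},
    "host-c": {"version": "v1", "in_service": True},
}
for step in rolling_upgrade(cluster, "v2", min_in_service=2):
    print(step)
```

The check before each drain is the important part: if you cannot lose one host without dropping below what the application needs, an in-flight upgrade means downtime.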

If you don't have a plan to upgrade your environment, you don't have everything you need to move to a distributed environment.

S_i + D_j → S_?

“A distributed system is one that prevents you from working
because of the failure of a machine that you had never heard of.”
Leslie Lamport

You also need to consider how your system behaves during a disruption. What types of disruptions will your system tolerate?

Pod failure - Do your pods restart in a manner that meets your requirements? If your pod has an init container that prevents traffic from being serviced for a period of time, your application could be in a state where it's up, but not available.
Host failure - What happens when a whole host disappears?
Rack failure - If you are in an on-premises environment, what happens if you lose a top of rack switch (TOR)?
Network partition - Router failures, BGP routing mistakes and backhoes can turn a perfectly good set of computers into a mass of isolated silicon.
What if any of these failures happen while also doing a platform or application upgrade?
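One way to reason about the host and rack questions above is to check whether your current pod placement survives the loss of any single failure domain. The sketch below is an assumption-laden illustration: the placement records, the pod, host and rack names, and the `min_available` threshold are all made up.

```python
from collections import Counter


def survives_single_failure(placements, domain, min_available):
    """Return True if losing any one `domain` leaves enough pods running.

    `placements` is a list of dicts such as
    {"pod": "web-0", "host": "h1", "rack": "r1"}; `domain` is the key
    to group by, for example "host" or "rack".
    """
    per_domain = Counter(p[domain] for p in placements)
    # The worst case is losing the domain that holds the most pods.
    worst_loss = max(per_domain.values(), default=0)
    return len(placements) - worst_loss >= min_available


# Hypothetical placement: four pods on four hosts, three of them in one rack.
placements = [
    {"pod": "web-0", "host": "h1", "rack": "r1"},
    {"pod": "web-1", "host": "h2", "rack": "r1"},
    {"pod": "web-2", "host": "h3", "rack": "r1"},
    {"pod": "web-3", "host": "h4", "rack": "r2"},
]

print(survives_single_failure(placements, "host", min_available=2))  # True
print(survives_single_failure(placements, "rack", min_available=2))  # False
```

Here the application tolerates any single host failure, but losing rack r1 leaves one pod where two are required.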

Ready?

Finally, you need to understand that having a plan now doesn't remove your responsibility to have a plan in the future. It might be reasonable at the moment to containerize your monolith. And then changes happen:

You start using the strangler pattern to extract user authentication
You add Istio so that you can do A/B testing
etc.

At that point, it's quite possible that you already have a cloud native application and are no longer ready for it.
