Today we are happy to announce the 1.2 release of Backyards (now Cisco Service Mesh Manager), Banzai Cloud's automated and operationalized service mesh product built on Istio.
This is an announcement post describing the new features of Backyards 1.2. If you're not familiar with Backyards (now Cisco Service Mesh Manager) yet, and want to know why we decided to build this product, we suggest reading the blog post about the first major release.
Want to know more? Get in touch with us, or delve into the details of the latest release.
Or just take a look at some of the Istio features that Backyards automates and simplifies for you, and which we've already blogged about.
tl;dr
In Backyards 1.2 we've added the following major items:
- Istio 1.5 support simplifies mesh management and enhances usability
- Mixerless telemetry greatly improves Istio control plane resource utilization
- A lightweight Istio distribution enables multi-cluster support for Istio 1.5
- Drill-down view of services and workloads helps find the root cause of failures in the mesh
- Stability and performance fixes
What's new
While the previous Backyards release introduced a large number of new features, the aim of this release was mainly to improve stability and usability. This is in line with Istio's own focus, which is shifting in the same direction.
With this in mind, let's see what's new in Backyards 1.2!
Istio 1.5 support
The 1.5 release brought the biggest changes to Istio's architecture in a very long time. While everyone is talking about the move from microservices to a more monolithic approach (namely istiod), some other significant changes were also introduced. Telemetry V2 is now the default mode for capturing metrics, and a new WebAssembly-based model for Envoy proxy extensibility is also available. These changes meant that we had to do major refactors in the Backyards codebase as well.
We don't want to talk too much about istiod now. We published a blog post about it, and Christian Posta from the Istio community also wrote a nice post that explains why it was a reasonable and good decision. The point is that installing, running, and upgrading Istio became much easier with fewer moving parts.
Telemetry V2 (also known as Mixerless telemetry) is Istio's new model of collecting telemetry data from Envoy proxies. The new system improves latency by a significant margin (~50% per Istio's documentation), and also reduces total CPU consumption. It was available in the previous release as well, but had some serious deficiencies, like the complete lack of TCP metrics. The most important features have since been completed, and Telemetry V2 is now the default, but there's still a feature gap between V1 and V2.
The last major addition is a new way of extending the proxy: WebAssembly plugins for Envoy. Developers can write their custom code, compile it to WebAssembly plugins, and configure Envoy to execute it. Wasm also helps with the safe and unified distribution of these plugins. The plugins can hold arbitrary logic (it's simple code!), so they can be useful for all kinds of integrations or mutations of messages.
Backyards (now Cisco Service Mesh Manager) 1.2 comes with Istio 1.5 as the default installation option. To learn more about the new Istio release, read our own blog post or the official announcement.
Telemetry V2
The above section briefly mentioned Telemetry V2. Because Backyards is also an observability tool built on Istio telemetry, migrating to a completely new system was not straightforward. On the surface, not too many things have changed. You'll still see the topology of service-to-service communication, telemetry dashboards, and live request rates and latencies.
But under the hood a lot has changed. Mixer as a control plane component is no longer running in Backyards-managed meshes. All metrics are provided by Envoy proxies that are directly scraped by Prometheus. But what's different from a Backyards point of view if, in the end, all metrics remained the same? First of all, not all metrics have remained the same. Latency histograms changed significantly, and we had to review other minor labeling changes as well to keep the feature set the same. We also had to stay compatible with Telemetry V1.
Second, Mixerless telemetry completely changed how single-mesh multi-cluster setups work. Without a central telemetry component, it's now up to the end user to federate all the metrics in one place. Luckily, Backyards (now Cisco Service Mesh Manager) solves that for you, and sets up Prometheus federation automatically between clusters in the same mesh. Even more importantly, Telemetry V2 completely lacks cluster information, so normally you wouldn't be able to differentiate metrics across clusters. This leads to our next topic, a lightweight Istio distribution.
If you're interested in the internals of mixerless telemetry, stay tuned: we'll write a detailed post about it soon.
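To make the missing-cluster-information problem more concrete, here is a toy model in Python. It is only a sketch under stated assumptions: it is not how Prometheus federation or Backyards is implemented, the helper name merge_clusters, the cluster names and the sample values are made up for illustration, and the label name "cluster" is an assumption. It shows that identical series scraped from two clusters collapse into one when merged, unless each sample carries a cluster label.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = Dict[str, str]
SeriesKey = FrozenSet[Tuple[str, str]]


def merge_clusters(
    per_cluster: Dict[str, List[Tuple[Labels, float]]],
    add_cluster_label: bool,
) -> Dict[SeriesKey, float]:
    """Merge samples from several clusters into one series map (toy model).

    A series is identified by its full label set. When two clusters report
    the same label set, the later sample overwrites the earlier one.
    """
    merged: Dict[SeriesKey, float] = {}
    for cluster, samples in per_cluster.items():
        for labels, value in samples:
            tagged = dict(labels)
            if add_cluster_label:
                # Hypothetical label name; it makes the series unique per cluster.
                tagged["cluster"] = cluster
            merged[frozenset(tagged.items())] = value
    return merged


# Made-up samples: the same service reports requests from two clusters.
scraped = {
    "cluster-a": [({"__name__": "istio_requests_total", "destination_service": "frontend"}, 120.0)],
    "cluster-b": [({"__name__": "istio_requests_total", "destination_service": "frontend"}, 45.0)],
}

without_label = merge_clusters(scraped, add_cluster_label=False)
with_label = merge_clusters(scraped, add_cluster_label=True)

print(len(without_label))  # series count when cluster info is missing
print(len(with_label))     # series count when every sample is tagged
```

Without the label the two clusters are indistinguishable in the merged view; with it, each cluster keeps its own series and per-cluster breakdowns remain possible.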
A lightweight Istio distribution
Up until now we didn't want to say that we have our own Istio distribution. It's a vague term, a bit overused and often used for marketing purposes only. But Backyards has always had strong multi-cluster support, and Istio 1.5 basically broke both shared and replicated control plane multi-cluster setups. It wasn't an option for us not to support multi-cluster topologies, but we wanted to have Istio 1.5 as well.
In previous Backyards releases we've already forked some of the Istio components and added some minor tweaks. These changes were so small that we didn't even announce them. But we became familiar with parts of the Istio codebase and figured that we could fix some of these 1.5 multi-cluster problems for our use cases.
These events led us to the conclusion that we have to introduce our very own Istio distribution. It's lightweight because it's 99% upstream Istio with only a few improvements for some of our special use cases. While we'd really like to contribute back to the Istio community, we haven't had the resources to work with Istio maintainers on generic solutions for these problems. Our enhancements are therefore opinionated, and couldn't be contributed back to the Istio project directly, but they work perfectly well in a Backyards environment.
Some of the opinionated enhancements that we've added in Backyards 1.2:
- Backyards pre-configures proxies to hold cluster information in their node metadata
- We've changed the Wasm stats plugin to populate cluster info as Prometheus metric labels
- istiod is now able to validate pods on remote clusters by issuing token reviews to remote API servers
- Root CA certificates are properly distributed to remote clusters as well
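For readers unfamiliar with token reviews, the following Python sketch shows the general shape of a Kubernetes TokenReview exchange. It is not istiod's code and makes no claim about how Backyards issues the request: the helper names, the placeholder token and the sample response are hypothetical. Only the API group, kind, and the spec.token and status.authenticated fields come from the Kubernetes authentication API.

```python
import json
from typing import Any, Dict

# Kubernetes API path that accepts TokenReview objects via POST.
TOKEN_REVIEW_PATH = "/apis/authentication.k8s.io/v1/tokenreviews"


def token_review_body(token: str) -> Dict[str, Any]:
    """Build the TokenReview object that asks an API server to validate a token."""
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenReview",
        "spec": {"token": token},
    }


def is_authenticated(review: Dict[str, Any]) -> bool:
    """Read the verdict from a TokenReview response; missing status means no."""
    return bool(review.get("status", {}).get("authenticated", False))


# Hypothetical usage: the token is a placeholder and the response is made up.
body = token_review_body("<service-account-token>")
print(json.dumps(body, indent=2))

sample_response = {
    "apiVersion": "authentication.k8s.io/v1",
    "kind": "TokenReview",
    "status": {
        "authenticated": True,
        "user": {"username": "system:serviceaccount:default:example"},
    },
}
print(is_authenticated(sample_response))
```

In a multi-cluster setup the point is which API server receives the request: a pod's service account token is only meaningful to the API server of the cluster that issued it, so the review has to be sent to that remote cluster.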
Drill-down view of services and workloads
Backyards already has a few built-in features for discovering the root cause of specific failures (validations, tap, topology view, metrics, traces, etc.). The new drill-down view is another addition to this toolbox.
The topology and list views of Backyards are built on telemetry information provided by Istio. The dashboard is often a starting point for investigating issues within your cluster. When something goes wrong, the first thing you'll probably notice is that your services will start to misbehave: error rate or latency is increasing. But the root cause can be a whole bunch of different things, from application bugs to node failures.
Backyards 1.2 provides a drill-down view of services and workloads in the mesh. You can trace back the original issue by navigating deeper in the stack from the top-level service mesh layer, and see the status and most important metrics of your Kubernetes controllers, pods, and even nodes. In previous versions, Backyards configured Prometheus to scrape targets for mesh telemetry only; now scraping is extended to node exporters and kube-state-metrics as well.
With the drill-down feature, Backyards (now Cisco Service Mesh Manager) has become a bit more than just a service mesh product. It's now a more complete observability tool as well, one that not only provides information based on the network metrics of the service mesh, but also includes other valuable telemetry, like the CPU and memory usage of pods or nodes. Drill-down was designed to be extensible with third-party metadata and telemetry providers as well. An important note is that all the collected information is actionable and is used by several of the available features.
Stability and performance fixes
Here are some of the most noteworthy bugfixes and enhancements:
- Stability improvements on the tap view
- Main metrics are available in the Service List view as well
- Throughput metrics are shown for TCP connections in the list views
- Multiple connections on different protocols are shown between services on the topology view
- Fixed edge "flickering" issue on topology view when error rate is nonzero
- Fixed OOMKilled errors of Istio operator
- Improved and stabilized upgrade flows from previous versions
- Fixed possible port collisions of the Prometheus service
- Better handling of custom pod controllers
- Improved error handling on the topology view when some parts of the graph cannot be displayed
- Port matchers were added to the traffic management tab
Wrap-up
The Istio project is heading in the right direction. The maintainers are listening to the community and working towards simplifying the management and usability of Istio. The vast majority of the new features in the last two releases are either architectural changes or improvements to the user experience. We think that these changes will greatly help with the adoption of Istio and the service mesh in general.
With Backyards (now Cisco Service Mesh Manager), our goal was very similar from day one: making Istio just work. This recent change of focus in the Istio project, along with the increased maturity of Backyards, makes it easier than ever to get on the service mesh ship, so give Backyards (now Cisco Service Mesh Manager) a try now!