Open Source Kubernetes for the next level of network virtualization

By Daniel Park Principal Engineer, Samsung Research
By Moonki Hong Principal Engineer, Samsung Research
By Krzysztof Opasiak Staff Engineer, SRPOL
By Taewan Kim Staff Engineer, Samsung Research
By Lukasz Wojciechowski Staff Engineer, SRPOL

At IEEE INFOCOM 2021, Samsung Research presented a research result, achieved as a global team effort with Samsung R&D Institute Poland and Samsung Research America, that takes the Kubernetes open source project to the next level of capability.

What is the background of this research?

Container technology has revolutionized the way software is being packaged and run. The telecommunications industry, now challenged with the 5G transformation, views containers as the best way to achieve agile infrastructure that offers high throughput and low latency for 5G edge applications. These challenges make optimal scheduling of performance-sensitive workflows a matter of emerging importance.

How do we solve this? Do not worry, we have the Kubernetes open source project!

Kubernetes is the de facto standard open source project for container orchestration in the IT industry and has been deployed in cloud solutions from providers including AWS, Google, Microsoft, IBM, and VMware. It is now hosted by the Cloud Native Computing Foundation (CNCF), part of the Linux Foundation, and is a very active project, with over 400 commits contributed every month. Samsung also contributes various ideas to this project in collaboration with community members.

Okay, we have open source software that is free, but is that enough? It is not so simple.

It is only logical that the scheduling area has also gained many new capabilities. Grabowski and Tyczynski described the Kubernetes scheduler and the features available as of Kubernetes v1.16 [1]. With the proper configuration, it can execute the same decision tree policy as Google’s Borg [2], but this is not the default choice in Kubernetes, so an operator must intervene to activate it. Several improvements to the Kubernetes scheduler have been proposed to make it more suitable for the public cloud. However, they either ignore network metrics, do not examine inter-application relations and inter-node communication delays, or require information about the service function chains to be provided manually. All of these aspects are crucial in most telecommunication use cases, which call for high-throughput, low-latency orchestration operated in a fully automated manner.

In addition, the most critical drawback of the existing scheduler enhancement proposals is their incompatibility with the default Kubernetes scheduler. No matter how strong the guarantees of a proposed scheduler are, it requires custom configuration to selectively activate it for a specific use case, which is a huge blocker to wide deployment in the telecommunication industry. When you set out to realize a great idea on top of an open source project like Kubernetes, the most important criterion is that the solution can be deployed generally across use cases, which in the end also means guaranteeing compatibility with the existing open source mainstream.

So what have we been researching and developing for general deployment across industries?

We designed and implemented NetMARKS, a Kubernetes scheduler extender that uses information collected by a service mesh to schedule pods based on current network metrics. Our solution does not conflict with current Kubernetes scheduler plugins or other extenders, and it can be used together with them, which ensures backward compatibility. NetMARKS extends the default Kubernetes scheduler with network metrics awareness: it automatically discovers inter-application relations and places applications so as to save inter-node network bandwidth and reduce application response delay, ensuring efficient placement of Service Function Chains (SFCs). To this end, we propose a novel approach that collects network metrics for the scheduler using the Istio service mesh.

Kubernetes provides the baseline scheduler, but what is Istio and how does it work in NetMARKS?

Istio has a sidecar-based service mesh design, as shown in Fig. 1. In this architecture, every application has an adjacent sidecar container that acts as a proxy and intercepts all of its network traffic. Istio uses a powerful proxy named Envoy [3], which provides multiple functionalities including mutual TLS, L3/L4 traffic filtering, HTTP L7 filtering and routing, telemetry, and advanced load balancing with circuit breaking and automatic retries. To benefit from those features, each Envoy proxy has to be configured properly and provided with certificates. This is done by istiod, an essential part of the service mesh control plane that governs all Envoy proxies. Istio itself is configured through Kubernetes Custom Resource Definitions (CRDs), which allow a user to interact with it using Kubernetes objects.

Fig. 1. An architecture of Istio Service Mesh.

Because Envoy participates in the communication between services, it adds some delay. According to [4], Istio generally adds around 3 ms of latency for a mesh with 1,000 requests per second across 16 connections. This result can be improved [5] by using a Container Network Interface (CNI) plugin that supports SOCKMAP from the Linux kernel, such as Calico [6] or Cilium [7].

As the communication is always intercepted by at least one Envoy proxy, the traffic can be easily traced and monitored. Every Envoy instance can be configured to collect metrics on the traffic that passes through it. These metrics are then collected and stored by Prometheus [8], which can be deployed as part of the Istio control plane. Users can use the data in multiple ways depending on their needs, e.g. for observability and diagnostics using Kiali [9], which visualizes the structure of the mesh and presents the flows between components.

NetMARKS is intended to be fully backward compatible with kube-scheduler. The most effective way to achieve this is to use the Kubernetes scheduler extender mechanism. Extenders, including NetMARKS, are implemented as separate HTTP services that the scheduler contacts whenever a new pod needs to be scheduled. We assume that Istio is enabled for all cluster namespaces and that the Prometheus service is deployed as part of the Istio control plane.
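To make the extender interaction more concrete, here is a minimal sketch of the scoring ("prioritize") step of an extender, written as a plain function over the decoded request body. It is not the NetMARKS implementation. The field names `Pod`, `NodeNames`, `Nodes`, `Host` and `Score` follow our reading of the kube-scheduler extender v1 types and should be verified against your Kubernetes version. The `raw_score` callback and the scaling into the range 0 to 10 are assumptions made for illustration.

```python
MAX_EXTENDER_PRIORITY = 10  # assumed upper bound for scores returned by an extender


def candidate_node_names(extender_args):
    """Extract candidate node names from the decoded request body."""
    names = extender_args.get("NodeNames")
    if names:
        return list(names)
    # Fall back to the full node list if only node objects were sent.
    nodes = extender_args.get("Nodes") or {}
    return [item["metadata"]["name"] for item in nodes.get("items", [])]


def prioritize(extender_args, raw_score):
    """Return one {"Host", "Score"} entry per candidate node.

    raw_score is a hypothetical callable (pod, node_name) -> non-negative
    number, for example a network-flow based score.
    """
    pod = extender_args.get("Pod")
    names = candidate_node_names(extender_args)
    raw = {name: raw_score(pod, name) for name in names}
    top = max(raw.values(), default=0)
    result = []
    for name in names:
        # Scale into 0..MAX_EXTENDER_PRIORITY; if every raw score is 0, keep 0.
        scaled = int(MAX_EXTENDER_PRIORITY * raw[name] / top) if top > 0 else 0
        result.append({"Host": name, "Score": scaled})
    return result
```

In a real deployment this function would sit behind an HTTP endpoint that the scheduler is configured to call. Keeping it as a pure function makes the scoring logic easy to test in isolation.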

NetMARKS uses data collected by Prometheus to make scheduling decisions. It retrieves two metrics: istio_request_bytes_sum and istio_response_bytes_sum, which contain the number of bytes transferred in requests and responses, respectively. Kubernetes has no application entity, so NetMARKS uses the definition from Istio: an application is the set of pods that reside in the same namespace and have the same app label.
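As an illustration of how these two metrics could be read, the sketch below builds a Prometheus instant-query URL that sums a metric per application pair and parses the standard JSON response. The Prometheus address, the function names and the choice of grouping labels are assumptions for this example. The labels are standard Istio metric labels, but the exact query NetMARKS issues is not given here.

```python
import json
from urllib.parse import urlencode

# Assumed in-cluster address of Prometheus; adjust for your deployment.
PROMETHEUS_URL = "http://prometheus.istio-system:9090"

# Istio metric labels that identify the source and destination applications.
GROUP_BY = ("source_app", "source_workload_namespace",
            "destination_app", "destination_workload_namespace")


def build_query_url(metric, base_url=PROMETHEUS_URL):
    """Return an instant-query URL that sums `metric` per application pair."""
    promql = f"sum by ({', '.join(GROUP_BY)}) ({metric})"
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_bytes(response_text):
    """Map ((src_ns, src_app), (dst_ns, dst_app)) -> cumulative bytes."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed")
    totals = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        key = (
            (labels.get("source_workload_namespace"), labels.get("source_app")),
            (labels.get("destination_workload_namespace"), labels.get("destination_app")),
        )
        # Each instant-vector sample carries [timestamp, "value as string"].
        totals[key] = float(sample["value"][1])
    return totals
```

Calling `build_query_url("istio_request_bytes_sum")` and `build_query_url("istio_response_bytes_sum")` yields the two URLs to fetch. The response bodies can then be passed to `parse_bytes`.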

Both request and response metrics are summed in (1) to calculate $F_{t_1,t_2}^{A,B}$, the average data flow between applications A and B over the time period $[t_1, t_2]$, in bytes per second. $req_t^{A,B}$ is the number of bytes sent in requests from application A to application B up to time $t$, and $resp_t^{B,A}$ is the same for the responses from B to A that correspond to $req_t^{A,B}$.

(1)
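As a minimal sketch of the computation that (1) describes, the function below assumes that the cumulative request and response counters are differenced over the interval and that the sum is divided by the interval length. This is our reading of the definitions above rather than a transcription of the equation. The function and parameter names are hypothetical.

```python
def average_flow(req_t1, req_t2, resp_t1, resp_t2, t1, t2):
    """Average data flow between A and B over [t1, t2] in bytes per second.

    req_* are cumulative request bytes from A to B and resp_* are cumulative
    response bytes from B to A, sampled at times t1 and t2 (in seconds).
    """
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    transferred = (req_t2 - req_t1) + (resp_t2 - resp_t1)
    return transferred / (t2 - t1)
```

For example, if requests grow from 1,000 to 4,000 bytes and responses from 500 to 2,500 bytes within 10 seconds, the average flow is (3,000 + 2,000) / 10 = 500 bytes per second.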

Metrics are collected periodically, and the average flow since the start of data collection is maintained using (2).

(2)

Let $F(t_i)$ be the current flow at a given time $t_i$, after expanding $F_{t_{i-1},t_i}^{A,B}$ in (2) with (1). Then, its final numerical model can be derived as follows.

(3)
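One simple way to keep an average flow since the start of data collection, without storing every sample, is a time-weighted incremental update. The sketch below illustrates that general technique only. It is an assumption on our part and is not necessarily the form used in (2) and (3). All names are hypothetical.

```python
def update_average(prev_avg, t0, t_prev, t_now, interval_flow):
    """Fold the flow of the latest interval into the average since t0.

    prev_avg      -- average flow over [t0, t_prev] in bytes per second
    interval_flow -- average flow over [t_prev, t_now] in bytes per second
    """
    if not (t0 <= t_prev < t_now):
        raise ValueError("expected t0 <= t_prev < t_now")
    # Weight each average by the length of the period it covers.
    weighted = prev_avg * (t_prev - t0) + interval_flow * (t_now - t_prev)
    return weighted / (t_now - t0)
```

Starting at `t0 = 0`, a first interval [0, 10] at 500 B/s followed by an interval [10, 30] at 200 B/s gives (500 * 10 + 200 * 20) / 30 = 300 B/s. This matches the total of 9,000 bytes transferred divided by 30 seconds.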

This data flow metric is used to score each feasible node, that is, each node that has not been filtered out. The algorithm takes two input parameters: a Pod to be scheduled and a Node. The current assignment of pods to nodes is needed to determine where to bind the new Pod, so NetMARKS retrieves this information through the Kubernetes API.

The operation "Get all traffic neighbors of Pod" selects the data flow metrics in which the namespace and application label of the Pod match either the source or the destination of the traffic.

The result of the algorithm can be understood as the sum of the traffic bandwidths between the Pod and those of its communication peers that are assigned to the Node. To calculate the Score, the algorithm iterates over the intersection of the set of pods assigned to the Node and the set of the Pod's communication partners, and adds the flow data to the final result if the namespaces match.
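The scoring steps described above can be sketched as follows. This is an illustrative reading of the algorithm, not the NetMARKS source code. The `App` and `Flow` record types and the function names are hypothetical. An application is identified by its namespace and app label, so comparing `App` values also covers the namespace check.

```python
from collections import namedtuple

# Hypothetical record types for this sketch.
App = namedtuple("App", "namespace name")              # Istio application identity
Flow = namedtuple("Flow", "source destination rate")   # rate in bytes per second


def traffic_neighbors(pod_app, flows):
    """Return {peer_app: total rate} for flows where pod_app is source or destination."""
    peers = {}
    for flow in flows:
        if flow.source == pod_app and flow.destination != pod_app:
            peer = flow.destination
        elif flow.destination == pod_app and flow.source != pod_app:
            peer = flow.source
        else:
            continue  # unrelated flow, or traffic within the same application
        peers[peer] = peers.get(peer, 0.0) + flow.rate
    return peers


def score_node(pod_app, node_apps, flows):
    """Sum the flow rates between pod_app and its peers already placed on the node."""
    peers = traffic_neighbors(pod_app, flows)
    on_node = set(node_apps)
    return sum(rate for peer, rate in peers.items() if peer in on_node)
```

A node that hosts none of the Pod's communication peers scores 0, so the extender gives it no network-related preference.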

Scheduling example: Consider 4 applications:

- F: farm, producing grain

- W: well, producing water

- P: pig farm, producing pigs, requiring grain and water

- S: slaughterhouse, producing meat, requiring pigs

Fig. 2 shows the relations and traffic flows between the applications. Assuming that F, W, and S are already running on nodes, the algorithm calculates the scores for placing a pod of P as follows:

- Node1: State = {F}, Score = 30 + 50 = 80

- Node2: State = {W}, Score = 40 + 70 = 110

- Node3: State = {S}, Score = 60 + 100 = 160

- Node4: State = {}, Score = 0

Fig. 2. Example of the data flow graph.

Node3 receives the highest score: placing the pig farm together with the slaughterhouse keeps their traffic on intra-node communication, and the expected volume of that traffic is larger than for any other pairing.
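The arithmetic of this example can be checked with a few lines of Python. The dictionary below simply restates the node states and the two flow summands listed for each node. The variable names are chosen for this sketch only.

```python
# Node states and the flow summands from the scheduling example.
example = {
    "Node1": {"state": {"F"}, "flows": (30, 50)},
    "Node2": {"state": {"W"}, "flows": (40, 70)},
    "Node3": {"state": {"S"}, "flows": (60, 100)},
    "Node4": {"state": set(), "flows": ()},
}

# The score of a node is the sum of the flows towards applications it hosts.
scores = {node: sum(info["flows"]) for node, info in example.items()}
best_node = max(scores, key=scores.get)

print(scores)     # {'Node1': 80, 'Node2': 110, 'Node3': 160, 'Node4': 0}
print(best_node)  # Node3
```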

Has NetMARKS been validated through rigorous experiments? Yes!

We validated our solution using different workloads and processing layouts. Based on our analysis, NetMARKS can reduce application response time by up to 37 percent and save up to 50 percent of inter-node bandwidth in a fully automated manner. This significant improvement is crucial to Kubernetes adoption in 5G use cases, especially for multi-access edge computing and machine-to-machine communication.

Publications: https://infocom.info/day/3
(Day3: Containers and Data Centers session “NetMARKS: Network Metrics-AwaRe Kubernetes Scheduler Powered by Service Mesh”)

What is the role of Samsung Open Source Group?

We have been contributing our ideas to various open source projects in technical domains such as telecommunication, IoT, cloud, robotics, and artificial intelligence, working with developers across Samsung Electronics and from other companies and R&D institutes, in order to foster better technical innovation and a bigger ecosystem through open collaboration. If you are interested in our recent activities, please visit our official website https://opensource.samsung.com/.

References

[1] M. Grabowski and W. Tyczynski, “Kubernetes scheduling features or how can I make the system do what I want,” in KubeCon + CloudNativeCon Europe, 2017. [Online]. Available: https://www.youtube.com/watch?v=bbPcb2JuJPw

[2] A. Verma, L. Pedrosa, M. R. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes, “Large-scale cluster management at Google with Borg,” in Proceedings of the European Conference on Computer Systems (EuroSys), Bordeaux, France, 2015.

[3] “The envoy proxy.” [Online]. Available: https://www.envoyproxy.io/

[4] “Istio performance best practices.” [Online]. Available: https://istio.io/latest/blog/2019/performance-best-practices/

[5] N. Kapocius, “Performance studies of kubernetes network solutions,” in 2020 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream), 2020, pp. 1–6.

[6] “Project calico.” [Online]. Available: https://www.projectcalico.org/

[7] “Project cilium.” [Online]. Available: https://cilium.io/

[8] “The prometheus project.” [Online]. Available: https://prometheus.io/

[9] “Kiali - service mesh management for istio.” [Online]. Available: https://kiali.io/