Kubernetes for data scientists - Hosting predictions

October 7, 2022
Share this post

In the last issue we went through the workflow of a data scientist and where exactly kubernetes can prove to be a useful base on which to build a platform for it.

In this issue let’s go through a simple example to gain hands on experience for the same.


Setting up a playground

Before starting we need a playground to perform the demo. We will set up a kubernetes cluster on the local machine to do this. Although a cluster should contain multiple nodes for fault tolerance and high availability, we will mimic that behaviour using an awesome tool kind (kubernetes-in-docker).

At the end of this section we will have multiple containers running with each container acting as a separate cluster node.


Follow the directions provided here

Test the installation by running

$ kind --version
kind version 0.14.0

Launching the cluster

Now we start a local cluster using kind. We will create one control plane and two worker nodes. It is possible to have multiple of both.

<aside> 💡 Kubernetes can have multiple control plane and worker nodes. All the centralised cluster management components live on the control plane nodes while user workload run on worker nodes. Read more here


First, create a kind configuration in a file called kind-config.yaml. You can find it here. This will define the structure of our cluster -

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
- role: control-plane
- role: worker
- role: worker

Here, we have defined three nodes with one in the role of control plane and other two as worker nodes.

Launch a cluster using this config. This might take some time. Make sure docker daemon is up on your system before executing this -

$ kind create cluster --config kind-config.yaml
Thanks for using kind! 😊

Lets run kubectl to make sure our cluster is up -

$ kubectl cluster-info
Kubernetes control plane is running at <>
CoreDNS is running at <>

This tells us that the cluster is indeed up. We can also see the individual containers acting as nodes by executing docker ps.



With our cluster up, let’s look at a broad architecture of what we are about to provision.

Broadly speaking, we will host multiple replicas of our application inside the cluster and try to access them from outside with requests being load balanced across the different instances.

To achieve this, there are a few kubernetes specific terminology that we need to be aware of -

  • Pod - Pods are the smallest deployable units of computing that you can create and manage in Kubernetes. In our case, one instance of the application will be running inside one independent pod. These are ephemeral resources and control plane can move them across nodes if needed.
  • Deployment - A deployment is useful where we want to have more than one replicas for an application. Kubernetes tries to always maintain the number of replicas to be equal to what is provided in a deployment. We will create three identical replicas for our application.
  • Service - A service is useful to load balance across a set of pods running on the cluster. Since pods are essentially ephemeral and can be replaced at any time, service provides a stable interface to access the pods running behind it. We will use a service to test our application.

These three resources will enable us to host a scalable endpoint for serving our application.

Preparing the image

With the cluster up, we can now deploy an application and test it out. We will create an application using the popular iris classifier dataset.

Clone the repo

The repo is available at https://github.com/shubham-rai-tf/iris-classifier-kubernetes. It already contains the code for building and serving predictions at /iris/classify_iris endpoint using fastapi.

Building the docker image

We need to package this code in a docker image to prepare it for kubernetes. A Dockerfile is provided in the repo for doing that - here.

This Dockerfile specifies the image we will need to create a container hosting the prediction endpoints in a uvicorn server on port 5000. More details about the syntax is available here.

Execute this command to build a local image -

$ docker build . -t iris-classifier:poc
$ docker image ls
iris-classifier  poc  549913d5b1f9  12 seconds ago  737MB

We can see that the image has been successfully created with name iris-classifier and tag poc. We will now load this image into the cluster to use it inside the cluster

Loading the image

<aside> 💡 This step is only needed because we  don’t have an image registry to pull the newly built image from. In production, the image should be hosted in a private registry like Dockerhub or AWS ECR and then pulled into the cluster directly


Execute this command to load the locally built image to the cluster -

$ kind load docker-image iris-classifier:poc
Image: "iris-classifier:poc" with ID "sha256:549913d5b1f9456a4beedc73e04c3c0ad70da8691a8745a6b56a4f483c4f0862" not yet present on node "kind-worker2", loading...
Image: "iris-classifier:poc" with ID "sha256:549913d5b1f9456a4beedc73e04c3c0ad70da8691a8745a6b56a4f483c4f0862" not yet present on node "kind-control-plane", loading...

You can verify that the images have been loaded by listing images inside any of the three containers -

$ docker exec -it kind-worker crictl images
IMAGE                                TAG    IMAGE ID         SIZE
docker.io/library/iris-classifier    poc    549913d5b1f94    753MB

Deploying to kubernetes

Kubernetes is essentially a declarative system. That means, we describe the contours of what we want to do and control plane components constantly drive the system towards reaching that state.

To implement the architecture we discussed earlier, we will describe our intent in the form of a yaml file which acts as a record of intent. In kubernetes parlance these are called manifests.

All kubernetes manifests have the following fields -

  • apiVersion - Multiple resources are grouped together in same api versions. This provides a standardised way of deprecating or promoting a resource across kubernetes versions.
  • kind - Identifies the exact object type that is to be created
  • metadata - Contains fields that act as metadata for the created object. The apiVersion, kind and metadata.name fields together identify a unique resource inside a namespace
  • spec - This field contains the specification for the object to be created. Every kind defines its own structure for this field with its own implementation.

We will use the manifests present in the repo in files within the manifests directory here.

It defines two kubernetes resources, Deployment and Service in deployment.yaml and service.yaml respectively. Let’s go through both the sections.


apiVersion: apps/v1
kind: Deployment
# number of replicas
 replicas: 3
# name of the image
     - image: iris-classifier:poc
       name: iris-classifier

The deployment manifest in deployment.yaml majorly defines the pod spec we want to deploy in terms of the image name and the number of replicas. Once we apply this, kubernetes will constantly take steps to maintain the number of replicas to what we specify here.


apiVersion: v1
kind: Service
# Type of the service
 type: ClusterIP
# Port where the service will be accessible
 - port: 8080
# Port on the container where the traffic is to be forwarded
   targetPort: 5000
   protocol: TCP
   app: iris-classifier

The service manifest in service.yaml defines how to load balance across the replicas created by the deployment. Here we have defined how the port on service is to be mapped to port on the containers. Since our application runs at port 5000, the targetPort is set to 5000. The service is exposed on port 8080. TCP traffic sent to 8080 will be load balanced across port 5000 on the containers.

Applying the manifests

Run the following command to apply the manifests to kubernetes -

$ kubectl apply -f manifests/
deployment.apps/iris-classifier created
service/iris-classifier created

Both resources are successfully created on the cluster. We can verify running the following commands -

$ kubectl get service iris-classifier
NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
iris-classifier   ClusterIP   <none>        8080/TCP   37m
$ kubectl get deployment iris-classifier
iris-classifier   3/3     3            3           38m
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
iris-classifier-5d97498ff9-77wqw   1/1     Running   0          39m
iris-classifier-5d97498ff9-8twjm   1/1     Running   0          39m
iris-classifier-5d97498ff9-znrz8   1/1     Running   0          39m

As we can see, the service is exposed at port 8080 and three pods have been created as we specified.

Modify deployment.yaml to have 2 replicas instead of 3 and re-apply. Kubernetes will delete one of the replicas to match the spec.

Making the prediction

Now that the resources have been created in the cluster, we can verify our deployment by calling the model using the service endpoint. Since we are using a local setup, we will have to port-forward the service to a port on the local machine.

<aside> 💡 In a cloud provider setup, this service will be bound to an external load balancer which can be accessed from the internet if needed.


Run the following command to perform port forwarding for the service -

$ kubectl port-forward services/iris-classifier 8080
Forwarding from -> 5000
Forwarding from [::1]:8080 -> 5000

We can verify by calling the /healthcheck endpoint on the model -

$ curl '<http://localhost:8080/healthcheck>'
"Iris classifier is ready!"

To perform a test prediction, we will send a sample input to get a prediction -

$ curl '<http://localhost:8080/iris/classify_iris>' -X POST \\
-H 'Content-Type: application/json' \\
-d '{"sepal_length": 2, "sepal_width": 4, "petal_length": 2, "petal_width": 4}'

We get a prediction of class setosa with a 99% probability. By running multiple of these predictions, we can verify that the requests are indeed being routed to different pods in a round robin fashion.


Let’s remove all the kubernetes resources we had installed first -

$ kubectl delete -f manifests/

This will cleanup all the kubernetes resources we had created in the previous sections. Now we can take down the cluster as well -

$ kind delete cluster
Deleting cluster "kind" ...


In this issue we went through how to host a model as a callable service in kubernetes. Although this was a toy example where we built a docker image locally and ran it on a cluster running on the same machine, a typical production setup works on similar principles. A lot can be achieved with just these two resources.

In the subsequent issues we will explore other more advanced features like multi tenancy and access control which become essential as we move towards day 2 operations.

Build, Train, and Deploy LLM/ML Faster
Start Your Free 7-Day Trial Now!

Discover More

May 16, 2024

Volumes on Kubernetes

February 15, 2024

A Guide to Cloud Node Auto-Provisioning

December 7, 2023

Fractional GPUs in Kubernetes

LLMs & GenAI
May 1, 2023

Authenticated gRPC service on Kubernetes

Engineering and Product

Related Blogs

June 23, 2024
5 min read

A Guide to Cloud Node Auto-Provisioning

June 13, 2024
5 min read

TrueML Talks #29 - GenAI and LLMs for Location Intelligence @ Beans.AI

Blazingly fast way to build, track and deploy your models!