GPU-cluster - Thomas More

Image by Thierry Eeman

Recently, I had the opportunity to work on a project called "GPU-cluster", in which I set up 18 computers to work together as a single cluster using Flatcar Linux and Kubernetes. The cluster consisted of 9 dual-GPU computers, each fitted with two NVIDIA Quadro K620 cards, and 9 CPU-only computers for more general tasks.

Provisioning Flatcar Linux with Typhoon and Ignition Files

To begin, I used Typhoon to provision Flatcar Linux on the worker nodes. Typhoon is an open-source, Terraform-based Kubernetes distribution that, among other things, automates the installation and configuration of Flatcar Linux.

It relies on Ignition configs (a format that originated with CoreOS), which are JSON documents describing the desired state of a machine. Ignition applies them once, during the machine's first boot. They cover users, groups, and SSH keys, as well as the partition layout, filesystems, and the configuration of system services.

This made provisioning simple and repeatable: every worker node received the same operating system setup from the same declarative configuration.
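In a Typhoon setup these configs are rendered from Typhoon's own templates, but a small hand-built one makes their structure easier to picture. The sketch below assembles a minimal Ignition config as a Python dict and prints it as JSON. It assumes a Flatcar release whose Ignition accepts spec version 3.3.0; the user name `ops`, the disk `/dev/sdb`, the hostname, the SSH key and `hello.service` are all hypothetical placeholders, not values from this project.

```python
import json


def build_ignition(hostname: str, ssh_key: str) -> dict:
    """Return a minimal Ignition config (spec 3.3.0) as a Python dict.

    All values are illustrative assumptions, not the project's real settings.
    """
    return {
        "ignition": {"version": "3.3.0"},
        # Users, groups and SSH keys: a hypothetical "ops" user with sudo and
        # docker group membership and one authorized SSH key.
        "passwd": {
            "users": [
                {
                    "name": "ops",
                    "groups": ["sudo", "docker"],
                    "sshAuthorizedKeys": [ssh_key],
                }
            ]
        },
        "storage": {
            # Partition layout: one partition spanning a hypothetical second
            # disk (sizeMiB 0 means "use all remaining space").
            "disks": [
                {
                    "device": "/dev/sdb",
                    "wipeTable": True,
                    "partitions": [{"label": "data", "number": 1, "sizeMiB": 0}],
                }
            ],
            # Filesystem: format that partition as ext4. Mounting it on every
            # boot would still need a systemd .mount unit.
            "filesystems": [
                {
                    "device": "/dev/disk/by-partlabel/data",
                    "format": "ext4",
                    "wipeFilesystem": True,
                }
            ],
            # A plain file that sets the hostname. Ignition expects the mode
            # as a decimal integer, so 0o644 is serialized as 420.
            "files": [
                {
                    "path": "/etc/hostname",
                    "mode": 0o644,
                    "contents": {"source": "data:," + hostname},
                }
            ],
        },
        # Service configuration: a systemd unit written and enabled at first boot.
        "systemd": {
            "units": [
                {
                    "name": "hello.service",
                    "enabled": True,
                    "contents": (
                        "[Unit]\nDescription=Example first-boot service\n\n"
                        "[Service]\nType=oneshot\nExecStart=/usr/bin/echo hello\n\n"
                        "[Install]\nWantedBy=multi-user.target\n"
                    ),
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical hostname and key, for illustration only.
    config = build_ignition("worker-01", "ssh-ed25519 AAAAC3Nza... admin@example")
    print(json.dumps(config, indent=2))
```

Generating one config per node from a single function like this is what keeps the nodes identical apart from deliberate differences such as the hostname.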

Installing Kubernetes and Bootstrapping the Cluster

Once the operating system was set up, I installed Kubernetes onto the worker nodes and bootstrapped the entire cluster. Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of containerized applications. It works by abstracting the underlying infrastructure and providing a unified API for managing containerized workloads.

I configured the master node, which is responsible for the overall management of the cluster. It runs the Kubernetes control plane components, such as the API server, etcd, the scheduler, and the controller manager. The configuration node, on the other hand, is responsible for maintaining the cluster's configuration, for example adding or removing nodes.

The worker nodes are the machines that actually run the containerized workloads. They communicate with the master node to receive instructions and report their status. In my cluster, the worker nodes were the 9 dual-GPU computers and 7 of the 9 CPU computers; the remaining two CPU computers served as the master and configuration nodes.
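To make the node counts concrete, here is a small inventory model of the 18 machines. It is only a sketch: the host names are hypothetical, and placing the master and configuration roles on two of the CPU computers is an assumption inferred from the numbers above (9 + 9 machines, 9 + 7 workers).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str  # hypothetical host name
    gpus: int  # number of NVIDIA Quadro K620 cards in the machine
    role: str  # "master", "config" or "worker"


def build_inventory() -> list[Machine]:
    """Model the 18 machines; host names and role assignment are assumptions."""
    # 9 dual-GPU computers, all used as workers.
    machines = [Machine(f"gpu-{i:02d}", 2, "worker") for i in range(1, 10)]
    # Assumption: two of the 9 CPU computers host the master and configuration roles...
    machines.append(Machine("cpu-01", 0, "master"))
    machines.append(Machine("cpu-02", 0, "config"))
    # ...and the remaining 7 CPU computers are workers.
    machines += [Machine(f"cpu-{i:02d}", 0, "worker") for i in range(3, 10)]
    return machines


if __name__ == "__main__":
    inventory = build_inventory()
    roles = Counter(m.role for m in inventory)
    gpu_workers = [m.name for m in inventory if m.role == "worker" and m.gpus > 0]
    print(f"{len(inventory)} machines, roles: {dict(roles)}")
    print(f"{len(gpu_workers)} GPU workers with {sum(m.gpus for m in inventory)} GPUs in total")
```

Running it reports 18 machines, 16 of them workers, and 18 GPUs spread over the 9 GPU workers.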

Networking and Firewall

As part of the project, I also had to set up a layer 3 switch and connect all the computers to it. This allowed me to create a network for the cluster that was separate from the rest of the organization's network. It also enabled me to create VLANs for different types of traffic such as management and data traffic.
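As an illustration of how such a separated network with per-traffic VLANs might be laid out, the sketch below uses Python's standard `ipaddress` module to carve one /24 per VLAN out of a dedicated cluster range and checks that the range does not overlap the organization's network. Every value here is hypothetical (the 10.20.0.0/16 block, the 192.168.0.0/16 organization range, VLAN IDs 10 and 20), and so is the assumption that the Layer 3 switch holds the first address in each VLAN as its gateway interface.

```python
import ipaddress

# Hypothetical addressing: none of these values come from the actual project.
CLUSTER_BLOCK = ipaddress.ip_network("10.20.0.0/16")
ORG_NETWORK = ipaddress.ip_network("192.168.0.0/16")  # rest of the organization
VLANS = {"management": 10, "data": 20}


def plan_vlans(block: ipaddress.IPv4Network, vlans: dict) -> dict:
    """Give each VLAN its own /24 inside the cluster block.

    The third octet mirrors the VLAN ID (VLAN 10 -> 10.20.10.0/24), so this
    scheme only works for VLAN IDs below 256. The first usable address is
    reserved for the Layer 3 switch's VLAN interface (the default gateway).
    """
    subnets = list(block.subnets(new_prefix=24))
    plan = {}
    for name, vlan_id in vlans.items():
        net = subnets[vlan_id]
        plan[name] = {"vlan": vlan_id, "subnet": net, "gateway": net.network_address + 1}
    return plan


if __name__ == "__main__":
    # The cluster range must not overlap the organization's own network.
    assert not CLUSTER_BLOCK.overlaps(ORG_NETWORK)
    for name, entry in plan_vlans(CLUSTER_BLOCK, VLANS).items():
        print(f"VLAN {entry['vlan']:>3} {name:<11} {entry['subnet']} gw {entry['gateway']}")
```

Tying the subnet to the VLAN ID is just one convention; its advantage is that an address alone tells you which VLAN a packet belongs to.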

I also had to set up a firewall to protect the cluster from external threats. I configured the firewall to allow only necessary traffic to pass through, such as traffic to the Kubernetes API server and traffic between the nodes in the cluster.
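The firewall itself was a network device, not a program, but its default-deny allowlist logic can be modeled in a few lines. The sketch below assumes hypothetical address ranges (an admin subnet and the cluster range) and uses TCP 6443, the usual default port for the Kubernetes API server; it only looks at source address, protocol and destination port, which is a simplification of what a real firewall matches on.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical address ranges, for illustration only.
CLUSTER = ipaddress.ip_network("10.20.0.0/16")   # all cluster nodes
ADMIN = ipaddress.ip_network("192.168.50.0/24")  # workstations allowed to use kubectl


@dataclass(frozen=True)
class Rule:
    source: ipaddress.IPv4Network
    proto: str           # "tcp" or "udp"
    port: Optional[int]  # None matches any destination port


# Allowlist: anything that matches no rule is dropped.
ALLOW = [
    Rule(ADMIN, "tcp", 6443),    # Kubernetes API server (6443 is the usual default port)
    Rule(CLUSTER, "tcp", None),  # node-to-node traffic inside the cluster
    Rule(CLUSTER, "udp", None),  # e.g. overlay network traffic between nodes
]


def is_allowed(src: str, proto: str, port: int) -> bool:
    """Default-deny check: True only if some allow rule matches the packet."""
    addr = ipaddress.ip_address(src)
    return any(
        addr in rule.source and rule.proto == proto and rule.port in (None, port)
        for rule in ALLOW
    )


if __name__ == "__main__":
    for packet in [("192.168.50.7", "tcp", 6443), ("192.168.50.7", "tcp", 22),
                   ("10.20.20.15", "tcp", 10250), ("203.0.113.9", "tcp", 6443)]:
        print(packet, "ALLOW" if is_allowed(*packet) else "DROP")
```

The important design choice is the default: traffic is dropped unless a rule explicitly permits it, rather than allowed unless a rule blocks it.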

Containerization and Hash Cracking

My colleague, Thierry Eeman, worked on creating the containers needed to crack a hash. Containers are a lightweight, OS-level form of virtualization that lets you package an application and its dependencies together in a single image. This makes it easy to deploy and run the application on any machine with a compatible container runtime. In our case, the containers were meant to run the hash-cracking workloads on the GPU-enabled worker nodes. Unfortunately, Thierry's work has not yet been completed.
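Since Thierry's containers are not finished, the following is a purely hypothetical sketch of how such a workload could be handed to Kubernetes, not his actual setup. It builds a Pod manifest that requests GPUs through the `nvidia.com/gpu` extended resource, which assumes the NVIDIA drivers and the NVIDIA device plugin are installed on the GPU nodes. The image name, the ConfigMap `target-hash`, the hashcat entrypoint and the MD5 mask attack are all assumptions chosen for illustration.

```python
import json


def hash_crack_pod(name: str, image: str, gpus: int = 2) -> dict:
    """Build a Pod manifest for a GPU hash-cracking job (hypothetical example).

    The image is assumed to use hashcat as its entrypoint, and the target hash
    is assumed to live in a ConfigMap named "target-hash" under the key "target.txt".
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "hash-cracking"}},
        "spec": {
            "restartPolicy": "Never",  # run once; do not restart when the job ends
            "containers": [
                {
                    "name": "cracker",
                    "image": image,
                    # hashcat: MD5 (-m 0), mask attack (-a 3), 6 printable characters
                    "args": ["-m", "0", "-a", "3", "/hashes/target.txt", "?a?a?a?a?a?a"],
                    # Extended resource exposed by the NVIDIA device plugin; asking
                    # for 2 restricts scheduling to nodes with two free GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                    "volumeMounts": [{"name": "hashes", "mountPath": "/hashes", "readOnly": True}],
                }
            ],
            "volumes": [{"name": "hashes", "configMap": {"name": "target-hash"}}],
        },
    }


if __name__ == "__main__":
    pod = hash_crack_pod("crack-md5", "registry.example/hashcat:latest")
    # kubectl also accepts JSON manifests, e.g. `python pod.py | kubectl apply -f -`
    print(json.dumps(pod, indent=2))
```

Because the GPUs are requested as a resource limit, the scheduler only places the Pod on one of the dual-GPU workers; the CPU-only workers never qualify.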

Conclusion

Throughout this project, I found that Flatcar Linux and Kubernetes provided a stable and reliable foundation for the GPU-cluster. The use of Typhoon made the provisioning process easy and efficient, allowing me to focus on the more important tasks such as configuring the cluster and installing Kubernetes. Kubernetes enabled me to manage the cluster in a unified and automated way, which made it easy to scale and deploy workloads. The network and firewall setup allowed me to secure the cluster and separate it from the rest of the organization's network.