In this post we'll explore how to run MongoDB on GKE using StatefulSets.
A StatefulSet manages Pods that are based on an identical container specification. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods.
First we create a new cluster; you can of course use an existing one instead.
gcloud container clusters create mongodb
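If kubectl isn't already pointed at the new cluster, fetch the credentials and confirm the nodes are ready (add --zone or --region if they aren't set in your gcloud config):
gcloud container clusters get-credentials mongodb
kubectl get nodes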
We will be using a replica set so that our data is highly available and
redundant. I'm using the MongoDB replica set sidecar from https://github.com/thesandlord/mongo-k8s-sidecar
A "sidecar" is a helper container that assists the main container with its tasks; in this case, it configures the MongoDB replica set.
We now create a new StorageClass for our MongoDB instances.
cat googlecloud_ssd.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd

kubectl apply -f googlecloud_ssd.yaml
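A quick check that the StorageClass was created:
kubectl get storageclass fast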
Next up is creating a headless service for MongoDB. Basically, a headless service is one without a cluster IP and without load balancing. In combination with the StatefulSet, this gives us an individual DNS name for each pod, so we can connect to every MongoDB node directly.
cat mongo-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  ports:
  - port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    role: mongo
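Assuming the file is saved as mongo-service.yaml as shown above, apply it and check that the service reports no cluster IP, which is what makes it headless:
kubectl apply -f mongo-service.yaml
kubectl get svc mongo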
We now deploy the StatefulSet, which runs the MongoDB workload and orchestrates our resources. Looking at the YAML, the first section describes the StatefulSet object.
As part of the spec, terminationGracePeriodSeconds gives each pod time to shut down gracefully when you scale down the number of replicas.
We then have the configuration for the two containers. The first one runs MongoDB with command-line flags that set the replica set name and bind MongoDB to all interfaces, so the other replica set members can reach it. It also mounts the persistent storage volume at /data/db, the location where MongoDB stores its data.
The second container runs the sidecar. This sidecar container will configure the MongoDB replica set automatically.
Finally, there are the volumeClaimTemplates. These reference the StorageClass we created earlier to provision the volumes: one 100 GiB SSD disk per MongoDB replica.
cat mongo-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: "mongo"
  replicas: 3
  selector:
    matchLabels:
      role: mongo
  template:
    metadata:
      labels:
        role: mongo
        environment: test
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mongo
        image: mongo
        command:
        - mongod
        - "--replSet"
        - rs0
        - "--bind_ip_all"   # listen on all interfaces so the other replica set members can connect
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongo-persistent-storage
          mountPath: /data/db
      - name: mongo-sidecar
        image: cvallance/mongo-k8s-sidecar
        env:
        - name: MONGO_SIDECAR_POD_LABELS
          value: "role=mongo,environment=test"
  volumeClaimTemplates:
  - metadata:
      name: mongo-persistent-storage
    spec:
      storageClassName: "fast"
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 100Gi
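Assuming the manifest is saved as mongo-statefulset.yaml, apply it and watch the three pods come up, one at a time (StatefulSets create pods in order):
kubectl apply -f mongo-statefulset.yaml
kubectl get pods -l role=mongo -w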
We can now review the configuration from the GCP console.
Once the pods are running, we connect to the first replica set member and initiate the replica set configuration.
kubectl exec -ti mongo-0 -- mongosh
rs.initiate()
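With no arguments, rs.initiate() starts a single-member replica set on mongo-0, and the sidecar should then add mongo-1 and mongo-2 automatically. If you'd rather configure the members by hand, a sketch using the stable DNS names provided by the headless service (explained just below) looks like this, run instead of the bare rs.initiate() above; rs.status() then shows the state of each member:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-0.mongo:27017" },
    { _id: 1, host: "mongo-1.mongo:27017" },
    { _id: 2, host: "mongo-2.mongo:27017" }
  ]
})
rs.status()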
Each pod in a StatefulSet backed by a headless service gets a stable DNS name, following the naming convention <pod-name>.<service-name> (within the same namespace).
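To sanity-check these names, you can resolve one of them from a throwaway pod; the pod name dns-test is arbitrary, and this assumes everything runs in the default namespace:
kubectl run --rm -it dns-test --image=busybox --restart=Never -- nslookup mongo-0.mongo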
We can now connect to the database from our application. Below is an example connection string, including the replica set name.
"mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/?replicaSet=rs0"

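To verify the connection string works from inside the cluster, one option is a throwaway mongo pod that pings the replica set; the pod name mongo-client is arbitrary, and again the default namespace is assumed:
kubectl run --rm -it mongo-client --image=mongo --restart=Never -- mongosh "mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/?replicaSet=rs0" --eval "db.runCommand({ ping: 1 })"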


