mongodb

https://www.mongodb.com/en-us

https://www.mongodb.com/docs/

https://github.com/mongodb/mongo

backup and restore

https://www.mongodb.com/docs/manual/tutorial/backup-and-restore-tools/

I confirmed these steps with MongoDB version 6 running as a Docker container.

# the container name is "mongodb6" in this example

# backup one database
docker exec mongodb6 mkdir -p /opt/backup
docker exec mongodb6 /usr/bin/mongodump -u username_here --password="password_here" --authenticationDatabase=admin --db="db_name_here" --gzip --out=/opt/backup
# copy the backup file out of the container to the local machine
# ("work" is the collection name in this example; mongodump writes one .bson.gz file per collection)
docker cp mongodb6:/opt/backup/{db_name_here}/work.bson.gz ~/db_bak/.


# to restore, assuming the same version of database service is running
# prepare the backup directory and store the backup file there
docker exec mongodb6 mkdir -p /opt/backup/{db_name_here}
cd ~/db_bak
docker cp work.bson.gz mongodb6:/opt/backup/{db_name_here}/.

# and restore using the backup file
docker exec mongodb6 /usr/bin/mongorestore -u username_here --password="password_here" --authenticationDatabase=admin --gzip /opt/backup/{db_name_here}/work.bson.gz

# note that the parent directory name of the backup file is important
# because that becomes the name of the database
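
To make the note above concrete, here is a minimal Python sketch of how the target names follow from the file path. It assumes the behavior described in the note (database name taken from the parent directory) and that the collection name is the file name without its .bson or .bson.gz extension. The function name derive_target is hypothetical and is not part of any MongoDB tool.

```python
from pathlib import PurePosixPath
from typing import Tuple


def derive_target(backup_file: str) -> Tuple[str, str]:
    """Return the (database, collection) names implied by a backup file path.

    Assumption: this mirrors the naming rule described in the note above.
    It is an illustration, not mongorestore's actual code.
    """
    p = PurePosixPath(backup_file)
    name = p.name
    # strip the compression suffix first, then the BSON suffix, if present
    for suffix in (".gz", ".bson"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return p.parent.name, name


print(derive_target("/opt/backup/db_name_here/work.bson.gz"))
```

So renaming the directory before running mongorestore is a simple way to restore into a differently named database.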

MongoDB Community Kubernetes Operator

https://github.com/mongodb/mongodb-kubernetes-operator/blob/master/README.md

note

The community edition does not support changing the volume size.

installation

https://github.com/mongodb/mongodb-kubernetes-operator/blob/master/docs/install-upgrade.md

# add the repository to the helm
helm repo add mongodb https://mongodb.github.io/helm-charts

# refresh the local chart index
helm repo update

# see the available items in the mongodb repo
helm search repo mongodb

# see the list of available versions of mongodb community operator
helm search repo -l mongodb/community-operator

# get the values file of the interesting version
helm show values mongodb/community-operator --version=0.11.0 > mongodb-community-operator-0.11.0-values.yaml

# keep this one and generate a copy to edit and use
cp mongodb-community-operator-0.11.0-values.yaml mongodb-community-operator-values.yaml

# edit mongodb-community-operator-values.yaml file

# generate helm source and helm release flux manifest to let flux gitops process it
# and, create the namespace before passing the manifests to flux

# the mongodb community operator should be deployed

values file

These are the changes I made to the values file.

  • operator
    • watch namespace: all
    • extraenv
      • set my custom cluster domain name
  • database
    • namespace: mongo

script to generate flux source and helm release

#!/bin/bash

# add flux helmrepo to the manifest
flux create source helm mongodb \
    --url=https://mongodb.github.io/helm-charts \
    --interval=1h0m0s \
    --export >../mongodb-community-operator.yaml

# add flux helm release to the manifest including the customized values.yaml file
flux create helmrelease mongodb-community-operator \
    --interval=10m \
    --target-namespace=mongo \
    --source=HelmRepository/mongodb \
    --chart=community-operator \
    --chart-version=0.11.0 \
    --values=../values/mongodb-community-operator-values.yaml \
    --export >>../mongodb-community-operator.yaml

namespace

I prefer to create the namespace independently of the helm release.

---
apiVersion: v1
kind: Namespace
metadata:
  name: mongo
  labels:
    service: mongo
    type: infrastructure

creating a database

This is my test mdbc (MongoDBCommunity) manifest. The secret holding the user's password must be created separately; in this case, I created a secret named "mdbadmin-secret" in the same namespace.

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: testmongo
  namespace: mongo
spec:
  members: 3
  type: ReplicaSet
  version: "4.4.29"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: mdbadmin
      db: admin
      passwordSecretRef: # a reference to the secret that will be used to generate the user's password
        name: mdbadmin-secret
      roles:
        - name: clusterAdmin
          db: admin
        - name: userAdminAnyDatabase
          db: admin
        - name: dbAdminAnyDatabase
          db: admin
        - name: readWriteAnyDatabase
          db: admin
      scramCredentialsSecretName: my-scram
  additionalMongodConfig:
    storage.wiredTiger.engineConfig.journalCompressor: zlib

  statefulSet:
    spec:
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "directpv-min-io"
            resources:
              requests:
                storage: 20Gi
        - metadata:
            name: logs-volume
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "directpv-min-io"
            resources:
              requests:
                storage: 2Gi

      template:
        spec:
          nodeSelector:
            role: storage-node
          containers:
            - name: mongod
              resources:
                limits:
                  cpu: "0.2"
                  memory: 250M
                requests:
                  cpu: "0.2"
                  memory: 200M
            - name: mongodb-agent
              resources:
                limits:
                  cpu: "0.2"
                  memory: 250M
                requests:
                  cpu: "0.2"
                  memory: 200M
          initContainers:
            - name: mongodb-agent-readinessprobe
              resources:
                limits:
                  cpu: "2"
                  memory: 200M
                requests:
                  cpu: "1"
                  memory: 100M

user secret

https://github.com/mongodb/mongodb-kubernetes-operator/blob/master/docs/users.md#create-a-user-secret
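
As a rough sketch of what that secret can look like, the Python below prints a Secret manifest as JSON, which kubectl apply -f - accepts. The secret name and namespace come from the mdbc manifest above. The key name "password" is my assumption based on the operator's user documentation linked above, and the function name user_secret_manifest is hypothetical.

```python
import json


def user_secret_manifest(name: str, namespace: str, password: str) -> dict:
    """Build a Kubernetes Secret holding the user's password.

    Assumption: the operator reads the password from the key "password"
    unless passwordSecretRef.key says otherwise.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # stringData lets Kubernetes do the base64 encoding
        "stringData": {"password": password},
    }


if __name__ == "__main__":
    manifest = user_secret_manifest("mdbadmin-secret", "mongo", "password_here")
    print(json.dumps(manifest, indent=2))
```

Piping the output into kubectl apply -f - creates the secret without leaving the password in a file under version control.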

deploying mdbc in a namespace other than the one the operator runs in

The required service account (sa), role, and rolebinding are not created automatically when you deploy an mdbc manifest in another namespace.

Create this set in whichever namespace you are deploying the mdbc database to, and the necessary statefulset is then created successfully.

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: mongodb-database
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: mongodb-database
subjects:
  - kind: ServiceAccount
    name: mongodb-database
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: mongodb-database
rules:
  - apiGroups:
      - ""
    resources:
      - secrets
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - patch
      - delete
      - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mongodb-database

accessing the database

Various secrets are created by the operator.

kubectl -n mongo get secret
NAME                         TYPE     DATA   AGE
mdbadmin-secret              Opaque   1      125m
my-scram-scram-credentials   Opaque   6      23m
testmongo-admin-mdbadmin     Opaque   4      17m
testmongo-agent-password     Opaque   1      23m
testmongo-config             Opaque   1      23m
testmongo-keyfile            Opaque   1      23m

The mdbc (MongoDBCommunity) resource is named "testmongo", the database for the credentials is "admin", and the user is "mdbadmin", which is why the secret is named testmongo-admin-mdbadmin. It contains the username, the password, and the connection strings. For example, run kubectl -n mongo get secret testmongo-admin-mdbadmin -o jsonpath='{.data.connectionString\.standard}' | base64 -d to get the decoded mongodb connection string.

kubectl -n mongo get secret testmongo-admin-mdbadmin -o json
{
  "apiVersion": "v1",
  "data": {
    "connectionString.standard": "base64 string here",
    "connectionString.standardSrv": "base64 string here",
    "password": "base64 string here",
    "username": "base64 string here"
  },
  "kind": "Secret",
  "metadata": {
    "creationTimestamp": "2024-08-29T00:58:24Z",
    "name": "testmongo-admin-mdbadmin",
    "namespace": "mongo",
    "ownerReferences": [
      {
        "apiVersion": "mongodbcommunity.mongodb.com/v1",
        "blockOwnerDeletion": true,
        "controller": true,
        "kind": "MongoDBCommunity",
        "name": "testmongo",
        "uid": "2134b34c-edd2-4117-8220-cfcf959d8199"
      }
    ],
    "resourceVersion": "54265198",
    "uid": "032abf28-7537-4bee-a14e-f9114d5dd8b7"
  },
  "type": "Opaque"
}
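
If you want all the values decoded at once, something like the following Python sketch works on the JSON that kubectl prints. The function name decode_secret_data is hypothetical, and the sample input is made up so the example runs without a cluster.

```python
import base64
import json


def decode_secret_data(secret_json: str) -> dict:
    """Decode every base64 value under "data" in a Secret's JSON."""
    secret = json.loads(secret_json)
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }


# made-up sample standing in for the kubectl output
sample = json.dumps({
    "data": {
        "username": base64.b64encode(b"mdbadmin").decode("ascii"),
        "password": base64.b64encode(b"password_here").decode("ascii"),
    }
})
print(decode_secret_data(sample)["username"])
```

With a real cluster, the input would be the output of kubectl -n mongo get secret testmongo-admin-mdbadmin -o json, read from standard input for example.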

using pymongo

Run pip install "pymongo[srv]" and try the lines below in an interactive Python session.

from urllib.parse import quote_plus

from pymongo import MongoClient

username = "username_here"
password = "password_here"
db_server = "server_part_of_connection_string_here"
# example in case of my testmongo mdbc in mongo namespace
# db_server = "testmongo-svc.mongo.svc.cluster.local/admin?replicaSet=testmongo&ssl=false"

# percent-escape the credentials so characters such as "@" or ":" do not break the URI
client = MongoClient(
    "mongodb+srv://%s:%s@%s" % (quote_plus(username), quote_plus(password), db_server),
    serverSelectionTimeoutMS=4000,
)

client.admin.command("ping")
client.server_info().get("version")

# create database & collection, and add a record
db = client["testdb"]
col = db["testcol"]

dct = {"work_id": "123", "title": "testtitle"}

col.find_one_and_update({"work_id": "123"}, {"$set": dct}, upsert=True)

# confirm that the new database is created
for c in client.list_databases():
    print(c)

client.close()

deleting the mdbc

Since I use flux, I just removed the manifest from the kustomization, and the database-related workloads were deleted. The PVCs were left behind, so I had to delete them manually.

$ kubectl -n mongo get pvc
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
data-volume-testmongo-0   Bound    pvc-4650f7d7-2b00-41b2-927c-5b5efb8d80c0   20Gi       RWO            directpv-min-io   <unset>                 22m
data-volume-testmongo-1   Bound    pvc-0ebecb90-01a5-4e2a-a582-776c54a7bbdf   20Gi       RWO            directpv-min-io   <unset>                 21m
data-volume-testmongo-2   Bound    pvc-af616099-b7c9-4c73-a01e-f711305478ae   20Gi       RWO            directpv-min-io   <unset>                 21m
logs-volume-testmongo-0   Bound    pvc-2d6fd006-28ac-460b-9a65-d2dc4a997f72   2Gi        RWO            directpv-min-io   <unset>                 22m
logs-volume-testmongo-1   Bound    pvc-2118b5d5-a6b0-4229-ba13-95a07a70cde8   2Gi        RWO            directpv-min-io   <unset>                 21m
logs-volume-testmongo-2   Bound    pvc-f183b82e-22f5-4278-a7fd-19cb7d65d1c5   2Gi        RWO            directpv-min-io   <unset>                 21m
$ kubectl -n mongo delete pvc logs-volume-testmongo-0
persistentvolumeclaim "logs-volume-testmongo-0" deleted
$ kubectl -n mongo delete pvc logs-volume-testmongo-1
persistentvolumeclaim "logs-volume-testmongo-1" deleted
$ kubectl -n mongo delete pvc logs-volume-testmongo-2
persistentvolumeclaim "logs-volume-testmongo-2" deleted
$ kubectl -n mongo delete pvc data-volume-testmongo-2
persistentvolumeclaim "data-volume-testmongo-2" deleted
$ kubectl -n mongo delete pvc data-volume-testmongo-1
persistentvolumeclaim "data-volume-testmongo-1" deleted
$ kubectl -n mongo delete pvc data-volume-testmongo-0
persistentvolumeclaim "data-volume-testmongo-0" deleted
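
The PVC names follow the StatefulSet pattern of volume claim template name, statefulset name, and member ordinal, so the delete commands can be generated instead of typed one by one. A minimal Python sketch, assuming the two volume names from my manifest above; the function name pvc_names is hypothetical.

```python
def pvc_names(mdbc_name: str, members: int,
              volumes=("data-volume", "logs-volume")) -> list:
    """List the PVC names a replica set of the given size leaves behind."""
    return [f"{volume}-{mdbc_name}-{i}" for volume in volumes for i in range(members)]


# print one delete command per PVC for the testmongo example
for name in pvc_names("testmongo", 3):
    print(f"kubectl -n mongo delete pvc {name}")
```

kubectl delete also accepts several names in one call, so the list could be joined into a single command instead.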

backup and restore of mongodb data running on mdbc

  • create a pod with pvc running python image
    • PV might not be needed if the data is small enough
      • specific numbers here...?
  • get inside the container and run python
    • kubectl -n namespace_for_the_application exec -it pod/backup -- bash
    • pip install -U pip setuptools
    • pip install "pymongo[srv]"
    • OR, just have the pod spin up the container with pymongo installed
  • run bson dump script
  • kubectl cp to copy the backup data to somewhere else

import os
from urllib.parse import quote_plus

import bson
from pymongo import MongoClient

# os.environ[...] fails fast with a KeyError if a variable is not set
db_username = os.environ["ENV_FOR_DB_USERNAME"]
db_password = os.environ["ENV_FOR_DB_PASSWORD"]
db_server = os.environ["ENV_FOR_DB_SERVER"]

db_name = "database_name_here"
path = "/opt/backup"  # or any directory with a pvc volume mounted

# build the URI as a plain string, percent-escaping the credentials
connection_string = "mongodb+srv://%s:%s@%s" % (quote_plus(db_username), quote_plus(db_password), db_server)

conn = MongoClient(connection_string, serverSelectionTimeoutMS=4000)

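def dump(conn, db_name, path):
    """Write every collection of db_name to <path>/<collection>.bson."""
    db = conn[db_name]
    # all available collections
    collections = db.list_collection_names()
    for coll in collections:
        with open(os.path.join(path, f'{coll}.bson'), 'wb') as f:
            for doc in db[coll].find():
                # documents are written back to back, which bson.decode_all() can read
                f.write(bson.encode(doc))

    return 0

def restore(conn, db_name, path):
    """Insert the documents from every <collection>.bson file in path into db_name."""
    db = conn[db_name]
    for filename in os.listdir(path):
        if filename.endswith('.bson'):
            with open(os.path.join(path, filename), 'rb') as f:
                docs = bson.decode_all(f.read())
            # insert_many() raises InvalidOperation on an empty list
            if docs:
                # strip only the ".bson" suffix so dotted collection names survive
                db[filename[:-len('.bson')]].insert_many(docs)

    return 0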

# run whichever you need, dump() or restore()
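
One way to pick between the two without editing the script is a small command-line switch. The sketch below is my own assumption, not part of the original script: parse_args and run are hypothetical names, and the stand-in lambdas exist only so the example runs without a database.

```python
import argparse


def parse_args(argv=None):
    """Parse the action (dump or restore) plus optional db name and path."""
    parser = argparse.ArgumentParser(description="dump or restore a MongoDB database")
    parser.add_argument("action", choices=["dump", "restore"])
    parser.add_argument("--db", default="database_name_here")
    parser.add_argument("--path", default="/opt/backup")
    return parser.parse_args(argv)


def run(args, actions):
    """Call the function registered for args.action with (db, path)."""
    return actions[args.action](args.db, args.path)


# stand-ins; in the real script these would wrap dump(conn, ...) and restore(conn, ...)
actions = {
    "dump": lambda db, path: f"dump {db} to {path}",
    "restore": lambda db, path: f"restore {db} from {path}",
}
print(run(parse_args(["dump", "--db", "testdb"]), actions))
```

Keeping the dispatch in a dictionary means adding another action later only needs one more entry and one more choice.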