Overview of Red Hat's Multi Cloud Gateway (Noobaa)

- Toni Schmidbauer ( Lastmod: 2024-05-05 ) - 5 min read

This is my personal summary of experimenting with Red Hat's Multi Cloud Gateway (MCG) based on the upstream Noobaa project. MCG is part of Red Hat's OpenShift Data Foundation (ODF). ODF bundles the upstream projects Ceph and Noobaa.

Overview

Noobaa, or the Multicloud Gateway (MCG), is an S3-based data federation tool. It allows you to use S3 backends from various sources and

  • sync
  • replicate
  • or simply use existing

S3 buckets. Currently, the following sources, or backing stores, are supported:

  • AWS S3
  • Azure Blob
  • Google Cloud Storage
  • Any other S3-compatible storage, for example

    • Ceph
    • Minio

Noobaa also supports using a local file system as a backing store for S3.

The main purpose is to provide a single API endpoint for applications using various S3 backends.

One of the main features of Noobaa is its storage pipeline. With a standard Noobaa S3 bucket, Noobaa executes the following steps when storing a new object:

  • Chunking of the object
  • De-duplication
  • Compression
  • Encryption

This means that data stored in public cloud S3 offerings is automatically encrypted. Noobaa also supports using HashiCorp Vault for storing and retrieving encryption keys.

If you need to skip the storage pipeline, Noobaa also supports namespace buckets. These types of buckets allow you, for example, to write directly to AWS S3 and retrieve objects via Noobaa, or they can be used to migrate buckets from one cloud provider to another.
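
To make this a bit more concrete, here is a rough sketch of the objects involved in a namespace bucket, as far as I understand the noobaa.io/v1alpha1 CRDs: a NamespaceStore points at an existing AWS S3 bucket, and a BucketClass uses a namespacePolicy instead of a placementPolicy. All names, the target bucket and the secret below are made up for illustration.

apiVersion: noobaa.io/v1alpha1
kind: NamespaceStore
metadata:
  name: aws-namespace-store              # hypothetical name
  namespace: openshift-storage
spec:
  type: aws-s3
  awsS3:
    targetBucket: my-existing-bucket     # hypothetical, an already existing AWS S3 bucket
    secret:
      name: aws-namespace-store-secret   # hypothetical secret with AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY
      namespace: openshift-storage
---
apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  name: aws-namespace-bucket-class       # hypothetical name
  namespace: openshift-storage
spec:
  namespacePolicy:
    type: Single
    single:
      resource: aws-namespace-store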

Noobaa also has support for triggering JavaScript-based functions when

  • creating new objects
  • reading existing objects
  • deleting objects

Setup

With OpenShift Plus or an OpenShift Data Foundation subscription you can use the OpenShift Data Foundation Operator.

For testing Noobaa we used the standalone installation method without setting up Ceph storage (see here). Our OpenShift cluster was running in AWS.

If you would like to use the upstream version you can use the Noobaa operator (https://github.com/noobaa/noobaa-operator). This is what OpenShift Data Foundation (ODF) uses under the hood as well.
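
As a minimal sketch, the noobaa CLI described in the next section can deploy the upstream operator and create a Noobaa system for you (the target namespace is just an example):

# deploy the upstream Noobaa operator and create a Noobaa system
noobaa install -n noobaa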

Command line interface

Noobaa also comes with a command line interface, noobaa. It's available via an ODF subscription or can be installed separately. See the noobaa-operator readme for more information.
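
To give a rough idea of how to check that the CLI and the system are working, the following commands print version information and the overall system status (the namespace below assumes an ODF installation):

# print version information
noobaa version

# show the status of the Noobaa system deployed by ODF
noobaa status -n openshift-storage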

Resources

Before using an S3 object store with Noobaa we need to create so-called Resources. This can be done via the Noobaa user interface or via the command line. For example, the following commands create two new Resources using AWS S3 buckets as backing stores:

# create an S3 bucket in eu-north-1
aws s3api create-bucket \
    --region eu-north-1 \
    --bucket tosmi-eu-north-1 \
    --create-bucket-configuration LocationConstraint=eu-north-1

# create an S3 bucket in eu-west-1
aws s3api create-bucket \
    --region eu-west-1 \
    --bucket tosmi-eu-west-1 \
    --create-bucket-configuration LocationConstraint=eu-west-1

# create Noobaa backing store using the tosmi-eu-north-1 bucket above
noobaa backingstore create aws-s3 \
       --region eu-north-1 \
       --target-bucket tosmi-eu-north-1 aws-eu-north

# create Noobaa backing store using the tosmi-eu-west-1 bucket above
noobaa backingstore create aws-s3 \
       --region eu-west-1 \
       --target-bucket tosmi-eu-west-1 aws-eu-west
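
Note that the commands above omit AWS credentials. Depending on your setup the noobaa CLI reads them from an existing secret or expects them as flags; the flags below are an assumption on my part, so double-check with noobaa backingstore create aws-s3 --help:

# same as above, but passing the AWS credentials explicitly (flag names may vary between versions)
noobaa backingstore create aws-s3 \
       --region eu-north-1 \
       --target-bucket tosmi-eu-north-1 \
       --access-key "<aws access key>" \
       --secret-key "<aws secret key>" \
       aws-eu-north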

Or, if we would like to use Azure Blob:

# create resource groups for storage (only the northeurope commands are shown;
# a second region, westeurope, was set up analogously)
az group create --location northeurope -g mcg-northeurope

# create a storage account
az storage account create --name mcgnortheurope -g mcg-northeurope --location northeurope --sku Standard_LRS --kind StorageV2

# create a container for storing blobs
az storage container create --account-name mcgnortheurope -n mcg-northeurope

# list storage account keys for noobaa
az storage account list
az storage account show -g mcg-northeurope -n mcgnortheurope
az storage account keys list -g mcg-westeurope -n mcgwesteurope
az storage account keys list -g mcg-northeurope -n mcgnortheurope

noobaa backingstore create \
       azure-blob azure-northeurope \
       --account-key="<the key>" \
       --account-name=mcgnortheurope \
       --target-blob-container=mcg-northeurope

Using

noobaa backingstore list

we are able to confirm that our stores were created successfully.
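
Since backing stores are plain Custom Resources, the same check also works via oc (the namespace assumes an ODF installation):

# backing stores live as Custom Resources in the openshift-storage namespace
oc get backingstores -n openshift-storage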

Buckets

After creating the backing stores we are able to create buckets and define the layout of their backends.

There are two ways to create buckets: either directly via the Noobaa UI, or using Kubernetes (K8s) objects.

We will focus on using K8s objects in this post.

Required K8s objects

The Noobaa operator provides the following Custom Resource Definitions:

  • BackingStore: we already created BackingStores in the Resources section via the noobaa CLI (a YAML sketch of such an object follows below)
  • BucketClass: a BucketClass defines the layout of our bucket (single, mirrored or tiered)
  • StorageClass: a standard K8s StorageClass referencing the BucketClass
  • ObjectBucketClaim: an OBC or ObjectBucketClaim creates the bucket for us in Noobaa. Additionally, the Noobaa operator creates a ConfigMap and a Secret with the same name as the claim, storing access details (ConfigMap) and credentials (Secret) for accessing the bucket.
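
For reference, here is roughly what the BackingStore we created earlier via the CLI looks like as a YAML object; the structure follows the noobaa.io/v1alpha1 CRD as far as I know, and the secret name is an assumption (the CLI generates one for you):

apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  name: aws-eu-north
  namespace: openshift-storage
spec:
  type: aws-s3
  awsS3:
    region: eu-north-1
    targetBucket: tosmi-eu-north-1
    secret:
      name: backing-store-secret-aws-eu-north   # assumed name, holds AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY
      namespace: openshift-storage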

BucketClass

Let's create an example BucketClass which mirrors objects between our AWS S3 buckets in eu-north-1 and eu-west-1.

apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  labels:
    app: noobaa
  name: aws-mirrored-bucket-class
  namespace: openshift-storage
spec:
  placementPolicy:
    tiers:
    - backingStores:
      - aws-eu-north
      - aws-eu-west
      placement: Mirror

So we are defining a BucketClass aws-mirrored-bucket-class that has the following placement policy:

  • A single tier with two backing stores

    • aws-eu-north
    • aws-eu-west
  • The placement policy is Mirror, so all objects uploaded to buckets using this BucketClass will be mirrored between aws-eu-north and aws-eu-west.

A BucketClass could have multiple tiers, moving cold data transparently to a lower tier, but let's keep this simple.
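
Just to illustrate the idea, a tiered BucketClass would simply list more than one entry under tiers; the backing store names below are hypothetical:

apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  name: tiered-bucket-class          # hypothetical example
  namespace: openshift-storage
spec:
  placementPolicy:
    tiers:
    - backingStores:                 # first (hot) tier
      - fast-backing-store
      placement: Spread
    - backingStores:                 # second (cold) tier
      - cheap-backing-store
      placement: Spread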

StorageClass

After creating our BucketClass we are now able to define a standard K8s StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    description: Provides Mirrored Object Bucket Claims (OBCs) in AWS
  name: aws-mirrored-openshift-storage.noobaa.io
parameters:
  bucketclass: aws-mirrored-bucket-class
provisioner: openshift-storage.noobaa.io/obc
reclaimPolicy: Delete
volumeBindingMode: Immediate

This StorageClass uses our BucketClass aws-mirrored-bucket-class as a backend. All buckets created leveraging this StorageClass will mirror data between aws-eu-north and aws-eu-west (see the previous section).
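
Applying the manifest and listing the StorageClass is a quick way to verify that it was registered (the file name is just what we called it locally):

# create the StorageClass and check that it shows up
oc apply -f aws-mirrored-storageclass.yaml
oc get storageclass aws-mirrored-openshift-storage.noobaa.io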

ObjectBucketClaim

Finally we are able to create ObjectBucketClaims for projects requiring object storage. An ObjectBucketClaim is similar to a PersistentVolumeClaim. Every time a claim is created, the Noobaa operator will create a corresponding S3 bucket for us.

Let's start testing this out by creating a new OpenShift project

oc new-project obc-test

Now we define an ObjectBucketClaim to create a new bucket for our application:

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  labels:
    app: noobaa
  name: aws-mirrored-claim
spec:
  generateBucketName: aws-mirrored
  storageClassName: aws-mirrored-openshift-storage.noobaa.io

We use the StorageClass created in the previous step. This will create

  • an S3 bucket in the requested StorageClass
  • a ConfigMap storing access information
  • a Secret storing credentials for accessing the S3 bucket
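
After applying the claim we can check that everything was provisioned; note that the ConfigMap and the Secret carry the same name as the claim, aws-mirrored-claim:

# apply the claim (we saved the manifest above as simple-aws-mirrored-obc.yaml)
oc apply -f simple-aws-mirrored-obc.yaml

# check the claim, the generated bucket name and the access details
oc get objectbucketclaim aws-mirrored-claim
oc get configmap aws-mirrored-claim -o yaml
oc get secret aws-mirrored-claim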

For testing we will upload some data via s3cmd and use a pod to monitor data within the bucket.

Let's do the upload with s3cmd. We need the following config file:

[default]
check_ssl_certificate = False
check_ssl_hostname = False
access_key = <access key>
secret_key = <secret key>
host_base = s3-openshift-storage.apps.ocp.aws.tntinfra.net
host_bucket = %(bucket).s3-openshift-storage.apps.ocp.aws.tntinfra.net

Of course you must change host_base (and host_bucket) according to your cluster name. It's a route in the openshift-storage namespace:

oc get route -n openshift-storage s3 -o jsonpath='{.spec.host}'

You can extract the access and secret key from the K8s secret via:

oc extract secret/aws-mirrored-claim --to=-

Copy the access key and the secret key into the s3cmd config file (we've called our config noobaa-s3.cfg). Now we can list all available buckets via:

$ s3cmd ls -c noobaa-s3.cfg
2022-04-22 13:56  s3://aws-mirrored-c1087a17-5c84-4c62-9f36-29081a6cf5a4

Now we are going to upload a sample file:

$ s3cmd -c noobaa-s3.cfg put simple-aws-mirrored-obc.yaml s3://aws-mirrored-c1087a17-5c84-4c62-9f36-29081a6cf5a4
upload: 'simple-aws-mirrored-obc.yaml' -> 's3://aws-mirrored-c1087a17-5c84-4c62-9f36-29081a6cf5a4/simple-aws-mirrored-obc.yaml'  [1 of 1]
 226 of 226   100% in    0s   638.18 B/s  done

We can also list available files via

s3cmd -c noobaa-s3.cfg ls s3://aws-mirrored-c1087a17-5c84-4c62-9f36-29081a6cf5a4
2022-04-22 13:57          226  s3://aws-mirrored-c1087a17-5c84-4c62-9f36-29081a6cf5a4/simple-aws-mirrored-obc.yaml

Or we could use a Pod to list available files from within OpenShift. Note how we use the ConfigMap and the Secret the Noobaa operator created for us when we created the ObjectBucketClaim:

apiVersion: batch/v1
kind: Job
metadata:
  name: s3-test-job
spec:
  template:
    metadata:
      name: s3-pod
    spec:
      containers:
      - image: d3fk/s3cmd:latest
        name: s3-pod
        env:
        - name: BUCKET_NAME
          valueFrom:
            configMapKeyRef:
              name: aws-mirrored-claim
              key: BUCKET_NAME
        - name: BUCKET_HOST
          valueFrom:
            configMapKeyRef:
              name: aws-mirrored-claim
              key: BUCKET_HOST
        - name: BUCKET_PORT
          valueFrom:
            configMapKeyRef:
              name: aws-mirrored-claim
              key: BUCKET_PORT
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-mirrored-claim
              key: AWS_ACCESS_KEY_ID
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-mirrored-claim
              key: AWS_SECRET_ACCESS_KEY
        command:
        - /bin/sh
        - -c
        - 's3cmd --host $BUCKET_HOST --host-bucket "%(bucket).$BUCKET_HOST" --no-check-certificate ls s3://$BUCKET_NAME'
      restartPolicy: Never
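
Creating the Job and checking its logs should then print the same object listing we saw locally with s3cmd (the file name is just what we called the manifest above):

# run the Job in our test project and inspect its output
oc create -f s3-test-job.yaml -n obc-test
oc logs -n obc-test job/s3-test-job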

That's all for now. If time allows we are going to write a follow-up blog post on

  • Replicating Buckets and
  • Functions