OpenShift Data Foundation - Noobaa Bucket Data Retention (Lifecycle)

- By: Thomas Jungbauer ( Lastmod: 2024-02-13 ) - 3 min read

Data retention, or lifecycle configuration, for S3 buckets is handled directly by the S3 provider. The provider keeps track of the age of each object and automatically rotates (deletes) files once the configured retention period has passed.

This article is a simple step-by-step guide to configuring such a lifecycle for OpenShift Data Foundation (ODF), where buckets are provided by Noobaa. Knowledge of ODF is assumed; however, similar steps can be reproduced for any S3-compliant storage operator.

Prerequisites

  1. Installed OpenShift 4.x cluster (the latest version at the time of writing was 4.14)

  2. Installed OpenShift Data Foundation Operator (4.14+)

  3. Configured Multi-Cloud Gateway (for example) to provide OpenShift with object storage.

  4. Installed aws command line tool. The installation for different operating systems is explained at: Installing AWS Client

  5. A configured bucket and a running OpenShift Logging deployment.

In the following steps, we use the bucket that was created for OpenShift Logging (using Lokistack) as a reference.

Working with aws client

The first step in working with the aws client is to retrieve the access key and secret key that are needed to authenticate against the S3 API. A bucket was created during the deployment of OpenShift Logging and the Lokistack. The secret for authenticating against this bucket can be found in the openshift-logging namespace. In my example, the name of this secret is logging-loki-s3, and it contains the following values:

Loki Secret
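
The same values can also be read on the command line. The following is a minimal sketch, assuming the secret name and namespace from this article and assuming that the secret stores the keys under access_key_id and access_key_secret (the names used in the credentials file in the next step). The helper name loki_secret_field is hypothetical and only serves as an illustration.

```shell
# Hypothetical helper: print one base64-decoded field of the Loki S3 secret.
# Secret name and namespace default to the values used in this article.
loki_secret_field() {
  field="$1"
  secret="${2:-logging-loki-s3}"
  namespace="${3:-openshift-logging}"
  # Secret data is stored base64-encoded, so decode it after reading.
  oc get secret "$secret" -n "$namespace" \
    -o jsonpath="{.data.${field}}" | base64 -d
}

# Example usage (requires a logged-in oc session):
#   loki_secret_field access_key_id
#   loki_secret_field access_key_secret
```

The same helper should work for the endpoint and bucket name fields. Their key names depend on how the secret was created, so check the secret first.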

The easiest way to authenticate against the S3 API is to create the following configuration file:

> vim ~/.aws/credentials
[default]
aws_access_key_id     = <value of access_key_id>
aws_secret_access_key = <value of access_key_secret>
Make sure that this file is not readable by other users.
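
One way to restrict access is to limit the file and its directory to the owner. This is a small sketch, assuming the default location ~/.aws/credentials. The helper name secure_aws_credentials is hypothetical.

```shell
# Hypothetical helper: restrict an AWS credentials file to its owner.
# Defaults to the standard location used in this article.
secure_aws_credentials() {
  cred_file="${1:-$HOME/.aws/credentials}"
  # Directory: owner only. File: read/write for the owner only.
  chmod 700 "$(dirname "$cred_file")" &&
    chmod 600 "$cred_file"
}

# Example usage:
#   secure_aws_credentials
#   ls -l ~/.aws/credentials
```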

Listing Objects

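As a first test, we can try to list objects. The following command uses the S3 endpoint and the name of the bucket. Again, these values can be found in the Loki secret mentioned above. The --no-verify-ssl option skips TLS certificate verification and is only needed if the endpoint presents a certificate that the client does not trust: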

$ aws --endpoint https://s3-openshift-storage.apps.ocp.local --no-verify-ssl s3 ls s3://logging-bucket-d39a258c-e971-4a0f-a1fa-302cb7e76a56
                           PRE application/
                           PRE audit/
                           PRE index/
                           PRE infrastructure/

This command simply lists the current content (some folders in our case) of this bucket.
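
Since every command in this article repeats the endpoint and the bucket name, a small wrapper can save some typing. This is only a sketch: the names S3_ENDPOINT, S3_BUCKET and odf_aws are hypothetical, and --endpoint-url is the full spelling of the --endpoint option used above.

```shell
# Values from this article; replace them with those of your own cluster.
S3_ENDPOINT="https://s3-openshift-storage.apps.ocp.local"
S3_BUCKET="logging-bucket-d39a258c-e971-4a0f-a1fa-302cb7e76a56"

# Hypothetical wrapper: run any aws subcommand against the ODF endpoint.
odf_aws() {
  aws --endpoint-url "$S3_ENDPOINT" --no-verify-ssl "$@"
}

# Example usage:
#   odf_aws s3 ls "s3://$S3_BUCKET"
#   odf_aws s3 cp testfile.txt "s3://$S3_BUCKET"
```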

Copy a file into the bucket

For further testing, we copy a test file into the bucket. This can be any file; I have created a small one called testfile.txt.

$ aws --endpoint https://s3-openshift-storage.apps.ocp.local --no-verify-ssl s3 cp testfile.txt s3://logging-bucket-d39a258c-e971-4a0f-a1fa-302cb7e76a56

upload: ./testfile.txt to s3://logging-bucket-d39a258c-e971-4a0f-a1fa-302cb7e76a56/testfile.txt

Listing the content again now shows the uploaded testfile.txt:

$ aws --endpoint https://s3-openshift-storage.apps.ocp.local --no-verify-ssl s3 ls s3://logging-bucket-d39a258c-e971-4a0f-a1fa-302cb7e76a56
                           PRE application/
                           PRE audit/
                           PRE index/
                           PRE infrastructure/
2024-02-10 04:23:14         32 testfile.txt

Verifying current Lifecycle configuration

To verify the current lifecycle configuration, execute the following command. It will show any configuration that is currently available, or an empty value if there is no such setting yet.

$ aws --endpoint https://s3-openshift-storage.apps.ocp.local --no-verify-ssl s3api get-bucket-lifecycle-configuration --bucket logging-bucket-d39a258c-e971-4a0f-a1fa-302cb7e76a56

Put a lifecycle configuration in place

To configure a data retention of 4 days for our bucket we first need to create the following JSON file:

$ cat logging-bucket-lifecycle.json
{
    "Rules": [
        {
            "Expiration": {
                "Days": 4 (1)
            },
            "ID": "123", (2)
            "Filter": {
                "Prefix": "" (3)
            },
            "Status": "Enabled"
        }
    ]
}
(1) Defines the retention period in days.
(2) A simple ID for this rule (string).
(3) A filter used to identify the objects that a lifecycle rule applies to. Here the filter is empty, so all objects are affected.

The markers (1) to (3) are annotations only and must not be part of the actual file. Many more settings are possible, as described at AWS S3API.
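
Because the listing above carries annotation markers, it cannot be copied into a file as it is. The following sketch writes the same rule as clean JSON. The helper name write_lifecycle_json and its parameters are hypothetical; the rule content is taken from the listing above.

```shell
# Hypothetical helper: write a lifecycle rule that expires all objects
# after the given number of days (default: 4, as in this article).
write_lifecycle_json() {
  out_file="${1:-logging-bucket-lifecycle.json}"
  days="${2:-4}"
  cat > "$out_file" <<EOF
{
    "Rules": [
        {
            "Expiration": {
                "Days": ${days}
            },
            "ID": "123",
            "Filter": {
                "Prefix": ""
            },
            "Status": "Enabled"
        }
    ]
}
EOF
}

# Example usage, followed by a syntax check with Python's built-in JSON tool:
#   write_lifecycle_json logging-bucket-lifecycle.json 4
#   python3 -m json.tool logging-bucket-lifecycle.json
```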

Note: Rules that are already defined will be overwritten by the JSON file. If there are previous configurations, put them into the JSON file as well.

The following command puts the defined rule in place:

$ aws --endpoint https://s3-openshift-storage.apps.ocp.local --no-verify-ssl s3api put-bucket-lifecycle-configuration --bucket logging-bucket-d39a258c-e971-4a0f-a1fa-302cb7e76a56 --lifecycle-configuration file://logging-bucket-lifecycle.json
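
Since the put operation replaces all existing rules, it can be worth saving the current configuration before running it. The following is a sketch under the assumptions of this article (same endpoint and bucket). The names S3_ENDPOINT, S3_BUCKET and backup_lifecycle are hypothetical.

```shell
# Values from this article; replace them with those of your own cluster.
S3_ENDPOINT="https://s3-openshift-storage.apps.ocp.local"
S3_BUCKET="logging-bucket-d39a258c-e971-4a0f-a1fa-302cb7e76a56"

# Hypothetical helper: save the current lifecycle rules to a file so that
# they can be merged into the new JSON file (or restored) later.
backup_lifecycle() {
  backup_file="${1:-lifecycle-backup.json}"
  tmp_file="${backup_file}.tmp"
  if aws --endpoint-url "$S3_ENDPOINT" --no-verify-ssl \
       s3api get-bucket-lifecycle-configuration \
       --bucket "$S3_BUCKET" > "$tmp_file"; then
    mv "$tmp_file" "$backup_file"
  else
    # The call failed (for example, no access or no configuration yet).
    rm -f "$tmp_file"
    return 1
  fi
}

# Example usage:
#   backup_lifecycle lifecycle-backup.json
```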

Checking again with the previous command, the rule should now be configured:

$ aws --endpoint https://s3-openshift-storage.apps.ocp.local --no-verify-ssl s3api get-bucket-lifecycle-configuration --bucket logging-bucket-d39a258c-e971-4a0f-a1fa-302cb7e76a56

What now?

Now you have to wait. With the configuration used above, it takes 4 days until the file is rotated. I tested this with a 1-day retention period and saw that the file was rotated after about 30 hours. So the rotation does not happen at exactly 24 hours, but somewhat later.
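
To find out whether the test file has already been rotated, it can be checked from time to time. This is a sketch, assuming the endpoint, bucket and testfile.txt from this article. The names S3_ENDPOINT, S3_BUCKET and object_exists are hypothetical.

```shell
# Values from this article; replace them with those of your own cluster.
S3_ENDPOINT="https://s3-openshift-storage.apps.ocp.local"
S3_BUCKET="logging-bucket-d39a258c-e971-4a0f-a1fa-302cb7e76a56"

# Hypothetical helper: report whether an object is still in the bucket.
# head-object exits non-zero when the object cannot be retrieved; note that
# this also covers other failures, such as wrong credentials.
object_exists() {
  aws --endpoint-url "$S3_ENDPOINT" --no-verify-ssl \
    s3api head-object --bucket "$S3_BUCKET" --key "$1" > /dev/null 2>&1
}

# Example usage:
#   if object_exists testfile.txt; then echo "still there"; else echo "gone"; fi
```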

Conclusion

This article describes very, and I mean very, briefly how to configure such data retention for OpenShift Data Foundation. Unfortunately, the public documentation can be confusing, so I have summarized the commands I used here.

There are some limitations with the Noobaa integration, though. For example, file transition (to a different storage class) is (currently) not supported.

Also, there are many more API calls that might be interesting. Please follow the AWS documentation: