Weaviate DB Restore Backup

In this tutorial, we will learn how to enable and run a backup operation in Weaviate using API endpoints.
Captain Salem 4 min read
For database administrators, regular backups are essential. Vector databases such as Weaviate are no exception: a backup lets you recover your data after a loss.

Enabling Weaviate Backup Modules

To take backups in Weaviate, we must first enable a backup provider module. Although you can enable several backup modules at once, this tutorial uses the filesystem module to create backups on the local filesystem.

The filesystem backup module lets us back up Weaviate to the local filesystem instead of a remote backend such as Amazon S3 or Google Cloud Storage. This is useful during development because it is quick and easy to set up.

In production, however, consider one of the cloud-based backup modules instead.

To allow backups to the local filesystem in the Weaviate cluster, we need to add the backup-filesystem module to the ENABLE_MODULES environment variable.

This environment variable is responsible for determining the enabled modules in Weaviate.

Ensure the environment variable is as shown:

ENABLE_MODULES=backup-filesystem,text2vec-transformers

Once enabled, we can configure the path in the filesystem where the backups will be stored.

BACKUP_FILESYSTEM_PATH=/opt/weaviate/backups

This required parameter defines the directory that all Weaviate backups are written to and read from during restoration.
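Before starting Weaviate, it can help to sanity-check these two settings. The following is a minimal sketch, not part of Weaviate itself: the helper name check_backup_env is hypothetical, and it only inspects the two environment variables described above.

```python
import os


def check_backup_env(env=None):
    """Return a list of problems found in the filesystem-backup settings.

    `env` is any mapping of environment variables; it defaults to os.environ.
    This is a hypothetical pre-flight helper, not a Weaviate API.
    """
    env = os.environ if env is None else env
    problems = []

    # ENABLE_MODULES is a comma-separated list of module names.
    modules = [m.strip() for m in env.get("ENABLE_MODULES", "").split(",") if m.strip()]
    if "backup-filesystem" not in modules:
        problems.append("backup-filesystem is missing from ENABLE_MODULES")

    # BACKUP_FILESYSTEM_PATH is required once the filesystem module is enabled.
    if not env.get("BACKUP_FILESYSTEM_PATH"):
        problems.append("BACKUP_FILESYSTEM_PATH is not set")

    return problems


settings = {
    "ENABLE_MODULES": "backup-filesystem,text2vec-transformers",
    "BACKUP_FILESYSTEM_PATH": "/opt/weaviate/backups",
}
print(check_backup_env(settings))
```

An empty list means both settings look as this tutorial expects.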

Weaviate Create Backup

Once you have configured the parameters for Weaviate backups on the filesystem, you can initiate a backup operation.

The most common way to start a new backup is through the REST API. The method and endpoint are shown below:

POST /v1/backups/{backend}

URL Parameter

The {backend} URL parameter specifies the target backup backend. Weaviate supports backup backends such as Amazon S3, Google Cloud Storage, Azure Storage, and the local filesystem.

Note: Provide the name of the backup provider without the backup- prefix. For example, s3, gcs, or filesystem.

Request Body Parameters

In the request body, the request supports the following parameters, which determine the backup operation:

  1. id – the ID of the backup as a string. You need this string for later requests such as restoring the backup or checking its status.
  2. include – a list of class names to be included in the backup. By default, Weaviate includes all the classes in the target schema.
  3. exclude – a list of class names to be excluded from the backup.
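To make the shape of the request concrete, here is a minimal sketch that assembles it with Python's standard library. The function name build_backup_request and the base URL are assumptions for illustration; the sketch only builds the request object and does not send it.

```python
import json
import urllib.request


def build_backup_request(base_url, backend, backup_id, include=None, exclude=None):
    """Build (but do not send) a POST /v1/backups/{backend} request.

    Hypothetical helper: only the endpoint and the body keys come from the
    parameters described above.
    """
    body = {"id": backup_id}
    if include:
        body["include"] = list(include)
    if exclude:
        body["exclude"] = list(exclude)

    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/backups/{backend}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_backup_request(
    "http://localhost:8080", "filesystem", "backup-1", include=["Books", "Person"]
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen(req) would send it to a running Weaviate instance.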

Weaviate Initiate Backup – cURL

The following example command shows how to use cURL and the Weaviate API endpoint to create a backup in the filesystem.

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1"
}' \
http://localhost:8080/v1/backups/filesystem

The command above should create a backup called backup-1 on the filesystem.

Including Specific Classes

We can also back up specific classes instead of the entire schema, as demonstrated in the example request below:

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1",
"include": ["Books", "Person"]
}' \
http://localhost:8080/v1/backups/filesystem

In this case, we create a backup in Weaviate that only includes the Books and Person classes in the Weaviate schema.

Weaviate Initiate Backup – Python

The second method we can use to create a backup is the Weaviate Python Client. We can run the code as shown below.

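import weaviate

# Connect to the local Weaviate instance.
client = weaviate.Client("http://localhost:8080")

# Back up only the Books and Person classes to the filesystem backend.
result = client.backup.create(
    backup_id="backup-1",
    backend="filesystem",
    include_classes=["Books", "Person"],
    wait_for_completion=True,  # block until the backup finishes
)
print(result)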

The code above tells Weaviate to back up the Books and the Person classes to the filesystem.

We also set wait_for_completion=True, so the call does not return until the backup is complete. Keep in mind that this blocks the calling script for the whole duration of the backup. Avoid this option for large or automated backups and check the backup status separately instead.

Weaviate Get Backup Status

To get the status of a backup creation, you can use the get_create_status() method as shown in the example code below:

result = client.backup.get_create_status(
    backup_id="backup-1",
    backend="filesystem",
)
print(result)

This should return the status of the backup creation.
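When a backup is started without wait_for_completion, the status call is usually repeated until the backup finishes. The following is a rough sketch of such a polling loop. The helper name wait_for_backup is hypothetical, and the sketch assumes the returned dictionary has a "status" key whose final value is SUCCESS or FAILED; verify this against your Weaviate version.

```python
import time

# Assumed terminal values of the "status" field (check your Weaviate version).
TERMINAL_STATES = {"SUCCESS", "FAILED"}


def wait_for_backup(get_status, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() until it reports a terminal state or the timeout passes.

    get_status is any zero-argument callable that returns a dict with a
    "status" key, for example:
        lambda: client.backup.get_create_status(
            backup_id="backup-1", backend="filesystem")
    """
    waited = 0.0
    while True:
        status = get_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"backup still {status.get('status')!r} after {timeout}s")
        sleep(interval)
        waited += interval
```

Taking the status function as an argument keeps the loop independent of the client, so the same helper can also poll a restore.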

Weaviate Restore Backup

Once you have created backups, you will come across instances where you need to restore a specific one.

Weaviate allows you to restore any backup to any machine, provided that the number of nodes and their names are identical on the source and target.
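The node requirement can be checked before attempting a restore. Below is a minimal sketch, assuming you have already collected the node names of both clusters as plain lists; the function name same_nodes and the sample node names are hypothetical.

```python
def same_nodes(source_names, target_names):
    """True when both clusters have the same number of nodes with the same names."""
    # Comparing the sorted lists checks names and count at once, in any order.
    return sorted(source_names) == sorted(target_names)


source = ["node1", "node2"]  # hypothetical names from the cluster that made the backup
target = ["node2", "node1"]  # hypothetical names from the cluster being restored to
print(same_nodes(source, target))
```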

Weaviate Restore – HTTP Request

As you can guess, the simplest method of restoring a given backup is sending an HTTP request to the restore API endpoint.

The method and endpoint are as shown:

POST /v1/backups/{backend}/{backup_id}/restore

URL Parameters

The following are the required URL parameters for a restoration over HTTP:

  1. backend – this specifies your target backup backend such as s3, gcs, or filesystem.
  2. backup_id – this specifies the ID of the backup you wish to restore.

Request Body Parameters

The request takes a JSON object with the following properties:

  1. include – a list of classes you wish to restore from the backup.
  2. exclude – a list of classes you wish to leave out of the restoration.

Note: You cannot use include and exclude simultaneously. Set neither or exactly one of them.
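The include/exclude rule is easy to enforce on the client side before any request is sent. Here is a small sketch; the function name build_restore_body is hypothetical and the check simply mirrors the note above.

```python
def build_restore_body(include=None, exclude=None):
    """Return the JSON body for a restore request as a dict.

    Raises ValueError when both include and exclude are given, mirroring
    the rule that only one of them may be set.
    """
    if include and exclude:
        raise ValueError("set either include or exclude, not both")

    body = {}
    if include:
        body["include"] = list(include)
    if exclude:
        body["exclude"] = list(exclude)
    return body


print(build_restore_body(exclude=["Person"]))
```

An empty dictionary corresponds to restoring every class in the backup.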

Weaviate Initiate Restore – cURL

The following command shows how to use cURL to invoke a backup restoration in Weaviate:

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1"
}' \
http://localhost:8080/v1/backups/filesystem/backup-1/restore

The command above should initiate a backup restoration for all the classes included in the backup.

Exclude Specific Classes

To exclude specific classes from the restoration, you can run the request as shown:

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1",
"exclude": ["Person"]
}' \
http://localhost:8080/v1/backups/filesystem/backup-1/restore

We tell Weaviate to exclude the Person class from the restoration in this case.

Weaviate Initiate Restore – Python

We can also use the Python client to invoke a restoration process as shown in the code below:

result = client.backup.restore(
    backup_id="backup-1",
    backend="filesystem",
    wait_for_completion=True,
)
print(result)

Similarly, this should restore the specified backup with all the classes it contains.

Weaviate Get Restore Status

To check the status of a restoration process, you can run the code as shown:

result = client.backup.get_restore_status(
    backup_id="backup-1",
    backend="filesystem",
)
print(result)

This should return the current status of the restoration process.

Conclusion

In this tutorial, we learned how to configure backups in Weaviate, how to start a backup and check its status, and how to restore a backup using both cURL and the Python client.