Regular backups are an essential part of database administration. This holds for vector databases such as Weaviate as well: a backup lets you recover your data after a data loss.
Enabling Weaviate Backup Modules
To take backups in Weaviate, we must enable the backup provider modules. Although you can enable multiple backup modules for your providers, we will use the filesystem module in this tutorial to create backups in the local filesystem.
Enabling the filesystem backup module allows us to back up Weaviate to the local filesystem instead of a remote backend such as S3 or Google Cloud Storage. This is useful during development, as it is quick and easy to set up for simple backups.
In production, however, consider one of the cloud-based backup modules instead.
To allow backups to the local filesystem in the Weaviate cluster, we need to add backup-filesystem to the ENABLE_MODULES environment variable.
This environment variable is responsible for determining the enabled modules in Weaviate.
Ensure the environment variable is as shown:
ENABLE_MODULES=backup-filesystem,text2vec-transformers
Once enabled, we can configure the path in the filesystem where the backups will be stored.
BACKUP_FILESYSTEM_PATH=/opt/weaviate/backups
This required parameter defines the directory where Weaviate writes all backups and from which it reads them during restoration.
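As a quick sanity check before starting Weaviate, the two settings above can be validated from Python. This is a minimal sketch, not part of Weaviate itself: the helper name check_backup_config is hypothetical, and it only inspects a mapping of environment variables such as os.environ.

```python
import os


def check_backup_config(env):
    """Return a list of problems with the filesystem backup settings.

    `env` is any mapping of environment variable names to values,
    for example os.environ. This helper is illustrative only.
    """
    problems = []

    # ENABLE_MODULES is a comma-separated list of module names.
    raw_modules = env.get("ENABLE_MODULES", "")
    modules = [m.strip() for m in raw_modules.split(",") if m.strip()]
    if "backup-filesystem" not in modules:
        problems.append("backup-filesystem is missing from ENABLE_MODULES")

    # The backup path must be set when the filesystem module is used.
    if not env.get("BACKUP_FILESYSTEM_PATH"):
        problems.append("BACKUP_FILESYSTEM_PATH is not set")

    return problems


if __name__ == "__main__":
    for problem in check_backup_config(os.environ):
        print("Config problem:", problem)
```

An empty list means both settings are present. The check does not verify that the path exists or is writable inside the Weaviate container.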
Weaviate Create Backup
Once you have configured the parameters for Weaviate backups on the filesystem, you can initiate a backup operation.
The most common method for initializing a new backup process is using API Endpoints. The method and API endpoint are shown below:
POST /v1/backups/{backend}
URL Parameter
This requires you to specify the target backup backend. Weaviate supports backup backends such as Amazon S3, Google Cloud Storage, Azure Storage, and the local filesystem.
Note: Provide the name of the backup provider without the backup- prefix. For example, s3, gcs, or filesystem.
Request Body Parameters
In the request body, the request supports the following parameters, which determine the backup operation:
- id – the ID of the backup, as a string. You need this ID for later requests such as status checks and restoration.
- include – an optional list of class names to include in the backup. By default, Weaviate includes all the classes in the schema.
- exclude – an optional list of class names to exclude from the backup.
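The parameters above can be assembled into a request body with a small helper. The following is a minimal sketch: the function name build_backup_body is hypothetical, and the ID pattern (lowercase letters, digits, underscores, and hyphens) is an assumption based on the Weaviate documentation that you should verify against your version. The sketch also assumes that include and exclude are mutually exclusive for backups, as they are for restores.

```python
import json
import re

# Assumption: backup IDs may contain only lowercase letters, digits,
# underscores, and hyphens. Verify this against your Weaviate version.
_ID_PATTERN = re.compile(r"[a-z0-9_-]+")


def build_backup_body(backup_id, include=None, exclude=None):
    """Build the JSON body for a backup request (illustrative helper)."""
    if not _ID_PATTERN.fullmatch(backup_id):
        raise ValueError(f"invalid backup id: {backup_id!r}")
    if include and exclude:
        raise ValueError("use either include or exclude, not both")

    body = {"id": backup_id}
    if include:
        body["include"] = list(include)
    if exclude:
        body["exclude"] = list(exclude)
    return body


if __name__ == "__main__":
    body = build_backup_body("backup-1", include=["Books", "Person"])
    print(json.dumps(body))
```

Validating the body on the client side catches mistakes such as a stray space in the ID before the request reaches the server.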
Weaviate Initiate Backup – cURL
The following example command shows how to use cURL and the Weaviate API endpoint to create a backup in the filesystem.
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1"
}' \
http://localhost:8080/v1/backups/filesystem
The command above should create a backup called backup-1 on the filesystem.
Including Specific Classes
We can also back up specific classes instead of the entire schema, as demonstrated in the example request below:
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1",
"include": ["Books", "Person"]
}' \
http://localhost:8080/v1/backups/filesystem
In this case, we create a backup in Weaviate that only includes the Books and Person classes in the Weaviate schema.
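The same request can be built without cURL using only the Python standard library. This is a sketch under stated assumptions: the helper names make_backup_request and send are hypothetical, and the base URL assumes a local Weaviate instance on port 8080, as in the cURL examples. The request is only constructed here; sending it requires a running instance.

```python
import json
import urllib.request


def make_backup_request(base_url, backend, body):
    """Build (but do not send) the POST request that starts a backup."""
    url = f"{base_url.rstrip('/')}/v1/backups/{backend}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response.

    Requires a reachable Weaviate instance.
    """
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = make_backup_request(
    "http://localhost:8080",
    "filesystem",
    {"id": "backup-1", "include": ["Books", "Person"]},
)
print(request.get_method(), request.full_url)

# To actually start the backup against a running instance:
# print(send(request))
```

Separating construction from sending makes it possible to inspect or log the request before it reaches the server.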
Weaviate Initiate Backup – Python
The second method we can use to create a backup is the Weaviate Python Client. We can run the code as shown below.
import weaviate
client = weaviate.Client('http://localhost:8080')
result = client.backup.create(
    backup_id="backup-1",
    backend="filesystem",
    include_classes=["Books", "Person"],
    wait_for_completion=True,
)
print(result)
The code above tells Weaviate to back up the Books and the Person classes to the filesystem.
Setting wait_for_completion=True makes the call block until the backup process is complete, so the calling script cannot do anything else in the meantime. Avoid this option for large or automated backups, and check the backup status separately instead.
Weaviate Get Backup Status
To get the status of a backup creation, you can use the get_create_status() method as shown in the example code below:
result = client.backup.get_create_status(
    backup_id="backup-1",
    backend="filesystem",
)
print(result)
This should return the status of the backup creation, for example STARTED, TRANSFERRING, TRANSFERRED, SUCCESS, or FAILED.
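For long-running backups, the status check can be wrapped in a polling loop instead of blocking with wait_for_completion. The following is a minimal sketch: the helper name wait_for_status is hypothetical, and it assumes that the status call returns a dictionary with a "status" key whose terminal values are SUCCESS and FAILED. The status function is passed in as a callable, so the loop itself does not depend on the Weaviate client.

```python
import time

# Assumption: these are the statuses after which nothing more changes.
TERMINAL_STATUSES = {"SUCCESS", "FAILED"}


def wait_for_status(get_status, timeout=600.0, interval=2.0, sleep=time.sleep):
    """Poll get_status() until it reports a terminal status.

    get_status is any zero-argument callable that returns a dict
    containing a "status" key. Raises TimeoutError if no terminal
    status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = get_status()
        status = response.get("status")
        if status in TERMINAL_STATUSES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"still in status {status!r} after {timeout} seconds"
            )
        sleep(interval)


# Usage with the client from the earlier example (requires a running Weaviate):
# result = wait_for_status(
#     lambda: client.backup.get_create_status(
#         backup_id="backup-1",
#         backend="filesystem",
#     )
# )
# print(result["status"])
```

The same loop works for restorations by passing a callable that wraps get_restore_status() instead.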
Weaviate Restore Backup
Once you have created a backup, you will come across situations where you need to restore it.
Weaviate allows you to restore any backup to any machine, provided that the node names and the number of nodes are identical on the source and the target.
Weaviate Restore – HTTP Request
As you can guess, the simplest method of restoring a given backup is sending an HTTP request to the restore API endpoint.
The method and endpoint are shown below:
POST /v1/backups/{backend}/{backup_id}/restore
URL Parameters
The following URL parameters are required for a restoration over HTTP:
- backend – this specifies your target backup backend, such as s3, gcs, or filesystem.
- backup_id – this specifies the ID of the backup you wish to restore.
Request Body Parameters
The request takes a JSON object with the following properties:
- include – specifies a list of classes you wish to include from the backup.
- exclude – specifies a list of classes you wish to exclude from the restoration.
Note: You cannot use include and exclude simultaneously. Set none or exactly one of those.
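The URL and body rules above can be captured in a short helper that refuses to combine include and exclude. This is a minimal sketch, and the function name build_restore_call is hypothetical. It only produces the request path and body as described in this section and does not contact Weaviate.

```python
from urllib.parse import quote


def build_restore_call(backend, backup_id, include=None, exclude=None):
    """Return the (path, body) pair for a restore request.

    Illustrative helper: enforces that at most one of include and
    exclude is set, as required by the restore endpoint.
    """
    if include and exclude:
        raise ValueError("set none or exactly one of include and exclude")

    # Percent-encode both URL parameters so they are safe in a path.
    path = (
        f"/v1/backups/{quote(backend, safe='')}"
        f"/{quote(backup_id, safe='')}/restore"
    )

    body = {}
    if include:
        body["include"] = list(include)
    if exclude:
        body["exclude"] = list(exclude)
    return path, body


if __name__ == "__main__":
    path, body = build_restore_call("filesystem", "backup-1", exclude=["Person"])
    print(path, body)
```

An empty body restores every class contained in the backup.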
Weaviate Initiate Restore – cURL
The following command shows how to use cURL to invoke a backup restoration in Weaviate:
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1"
}' \
http://localhost:8080/v1/backups/filesystem/backup-1/restore
The command above should initiate a backup restoration for all the classes included in the backup.
Exclude Specific Classes
To exclude specific classes from the restoration, you can run the request as shown:
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1",
"exclude": ["Person"]
}' \
http://localhost:8080/v1/backups/filesystem/backup-1/restore
We tell Weaviate to exclude the Person class from the restoration in this case.
Weaviate Initiate Restore – Python
We can also use the Python client to invoke a restoration process as shown in the code below:
result = client.backup.restore(
    backup_id="backup-1",
    backend="filesystem",
    wait_for_completion=True,
)
print(result)
Similarly, this should restore all the classes included in the specified backup.
Weaviate Get Restore Status
To check the status of a restoration process, you can run the code shown below:
result = client.backup.get_restore_status(
    backup_id="backup-1",
    backend="filesystem",
)
print(result)
This should return the current status of the restoration process without waiting for it to finish.
Conclusion
In this tutorial, we learned how to configure backups in Weaviate, how to initiate a backup and check its status, and how to restore a backup using both HTTP requests and the Python client.