
Weaviate Database Get Schema

Captain Salem 4 min read

Weaviate is a free, open-source, scalable vector database that leverages machine learning for semantic search and analysis. As a vector search engine, Weaviate organizes data objects by their semantic meaning, using machine learning models to capture the structure and meaning of each object. It then stores the objects together with their vector embeddings, which allows them to be searched and retrieved quickly.

One of Weaviate's most important features is the schema. The schema allows us to define how the data in the database is structured. A schema describes the classes (types of objects) and the properties that objects of each class can have.

Keep in mind that the schema is flexible: it can change over time as your application's needs evolve.

The goal of this tutorial is to learn how to use the Get Schema functionality, available through the Weaviate client libraries and the REST API, to retrieve the current schema of a given Weaviate instance.

Prerequisites

Before we get started, ensure that you have the following:

  1. A running Weaviate instance. You can run a local instance using Docker or Kubernetes, or use a hosted instance on Weaviate Cloud.
  2. Basic familiarity with RESTful APIs and JSON data structures, as we will interact with the Weaviate API endpoints.
  3. A tool to send HTTP requests, such as the curl command-line tool, Postman, or any HTTP client you prefer.
  4. Basic Python knowledge, as we will demonstrate how to use the Weaviate Python client to interact with the instance.
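Before running the examples, it can help to confirm that the instance is reachable. Weaviate exposes a readiness endpoint at /v1/.well-known/ready, which returns HTTP 200 once the instance is ready to serve requests. The following is a minimal standard-library sketch; is_ready is a hypothetical helper name, and http://localhost:8080 is assumed to be your instance's address.

```python
import urllib.request


def is_ready(base_url="http://localhost:8080", timeout=5.0):
    """Return True if the Weaviate instance at base_url reports that it is ready."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # A ready instance answers 200; urlopen raises HTTPError for 503
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


if __name__ == "__main__":
    print("Weaviate ready:", is_ready())
```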

Understanding the Weaviate Schema

You can think of a Weaviate schema as a blueprint that defines the structure of your data. A schema is composed of two main parts:

  1. Classes
  2. Properties

Classes

Classes in a Weaviate schema define the types of objects you want to store in a given Weaviate instance. For example, a schema that stores book information could have classes such as Book, Author, and Genre.

Properties

Next, we have properties. Properties in the schema refer to attributes associated with a given class. For example, a Book class can have properties such as the title, publication date, author, and more.
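To make the Book example concrete, here is a sketch of how such a class could be written as a class definition (a JSON object with "class" and "properties" keys), expressed as a Python dictionary. The description and property names (title, publicationDate, author) are hypothetical, chosen to mirror the example above. The author property is a cross-reference: its data type is the name of another class, so this sketch assumes an Author class already exists.

```python
# Hypothetical Book class definition matching the example above.
# "author" is a cross-reference: its dataType names another class (Author),
# so that class should already exist when this one is created.
book_class = {
    "class": "Book",
    "description": "A book in the collection",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "publicationDate", "dataType": ["date"]},
        {"name": "author", "dataType": ["Author"]},
    ],
}

if __name__ == "__main__":
    # List each property together with its data type
    for prop in book_class["properties"]:
        print(f"{prop['name']}: {prop['dataType'][0]}")
```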

Weaviate Create Schema

For demonstration purposes, let us start by creating a new class in the schema. We can do this by sending a POST request to Weaviate's /v1/schema API endpoint.

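The following cURL command creates an Author class with a single name property. Note that the string data type used here is deprecated; as the response shows, Weaviate stores the property as text with whitespace tokenization.

curl -X POST "http://localhost:8080/v1/schema" \
  -H "Content-Type: application/json" \
  -d '{"class": "Author", "properties": [{"name": "name", "dataType": ["string"]}]}'

Output
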
{
  "class": "Author",
  "invertedIndexConfig": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "cleanupIntervalSeconds": 60,
    "stopwords": {
      "additions": null,
      "preset": "en",
      "removals": null
    }
  },
  "multiTenancyConfig": {
    "autoTenantActivation": false,
    "autoTenantCreation": false,
    "enabled": false
  },
  "properties": [
    {
      "dataType": ["text"],
      "indexFilterable": true,
      "indexRangeFilters": false,
      "indexSearchable": true,
      "name": "name",
      "tokenization": "whitespace"
    }
  ],
  "replicationConfig": {
    "asyncEnabled": false,
    "factor": 1
  },
  "shardingConfig": {
    "virtualPerPhysical": 128,
    "desiredCount": 1,
    "actualCount": 1,
    "desiredVirtualCount": 128,
    "actualVirtualCount": 128,
    "key": "_id",
    "strategy": "hash",
    "function": "murmur3"
  },
  "vectorIndexConfig": {
    "skip": false,
    "cleanupIntervalSeconds": 300,
    "maxConnections": 32,
    "efConstruction": 128,
    "ef": -1,
    "dynamicEfMin": 100,
    "dynamicEfMax": 500,
    "dynamicEfFactor": 8,
    "vectorCacheMaxObjects": 1000000000000,
    "flatSearchCutoff": 40000,
    "distance": "cosine",
    "pq": {
      "enabled": false,
      "bitCompression": false,
      "segments": 0,
      "centroids": 256,
      "trainingLimit": 100000,
      "encoder": {
        "type": "kmeans",
        "distribution": "log-normal"
      }
    },
    "bq": {
      "enabled": false
    },
    "sq": {
      "enabled": false,
      "trainingLimit": 100000,
      "rescoreLimit": 20
    }
  },
  "vectorIndexType": "hnsw",
  "vectorizer": "none"
}

NOTE: If your Weaviate instance is not at http://localhost:8080, replace that address with your instance's address in the commands.
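For readers who prefer Python over curl, the create request above can be reproduced with only the standard library. This is a sketch, not part of any Weaviate SDK: build_create_request and create_class are hypothetical helper names. It uses text rather than the deprecated string type; since text defaults to word tokenization, the echoed definition may show "word" instead of "whitespace".

```python
import json
import urllib.request

# Same Author class as the cURL example, but using "text" instead of the
# deprecated "string" data type.
author_class = {
    "class": "Author",
    "properties": [{"name": "name", "dataType": ["text"]}],
}


def build_create_request(base_url, class_obj):
    """Build (but do not send) a POST /v1/schema request that creates one class."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/schema",
        data=json.dumps(class_obj).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_class(base_url, class_obj, timeout=10.0):
    """Send the request and return the class definition Weaviate sends back."""
    request = build_create_request(base_url, class_obj)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    try:
        created = create_class("http://localhost:8080", author_class)
    except OSError as err:
        # URLError and HTTPError (e.g. the class already exists) subclass OSError
        print("Request failed:", err)
    else:
        print(json.dumps(created, indent=2))
```

Separating request construction from sending keeps the payload easy to inspect before anything is written to the instance.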

Weaviate Get Schema

The Get Schema functionality allows us to view the current schema. We can retrieve it either by sending a GET request to the /v1/schema endpoint or by using a Weaviate client library for your preferred programming language.

Using API Endpoint

To query the REST API directly, send a GET request to the /v1/schema endpoint, as in the following curl command:

curl -X GET "http://localhost:8080/v1/schema"

This command sends a GET request to the Weaviate instance running on a local machine and retrieves the current schema.

Since the command returns JSON data, we can pipe the output to JSON processing tools such as jq as shown:

curl -X GET http://localhost:8080/v1/schema | jq

Output

{
  "classes": [
    {
      "class": "Author",
      "invertedIndexConfig": {
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        },
        "cleanupIntervalSeconds": 60,
        "stopwords": {
          "additions": null,
          "preset": "en",
          "removals": null
        }
      },
      "multiTenancyConfig": {
        "autoTenantActivation": false,
        "autoTenantCreation": false,
        "enabled": false
      },
      "properties": [
        {
          "dataType": [
            "text"
          ],
          "indexFilterable": true,
          "indexRangeFilters": false,
          "indexSearchable": true,
          "name": "name",
          "tokenization": "whitespace"
        }
      ],
      "replicationConfig": {
        "asyncEnabled": false,
        "factor": 1
      },
      ...
    }
  ]
}

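The same GET request can be made from Python with only the standard library, and the response can then be reduced to a compact overview of classes and their properties. This is a minimal sketch: schema_url, get_schema and summarize_schema are hypothetical helper names, and summarize_schema assumes the response shape shown above, a dict with a "classes" list whose entries carry "class" and "properties" keys.

```python
import json
import urllib.request


def schema_url(base_url):
    """Return the schema endpoint for a Weaviate base URL."""
    return base_url.rstrip("/") + "/v1/schema"


def get_schema(base_url="http://localhost:8080", timeout=10.0):
    """GET /v1/schema and return the parsed JSON response."""
    with urllib.request.urlopen(schema_url(base_url), timeout=timeout) as response:
        return json.loads(response.read())


def summarize_schema(schema):
    """Map each class name to a list of (property name, data types) pairs."""
    summary = {}
    for cls in schema.get("classes", []):
        summary[cls["class"]] = [
            (prop["name"], prop["dataType"]) for prop in cls.get("properties", [])
        ]
    return summary


if __name__ == "__main__":
    try:
        schema = get_schema()
    except OSError as err:  # URLError and HTTPError are OSError subclasses
        print("Could not fetch the schema:", err)
    else:
        # Print one line per class, then its properties and data types
        for class_name, props in summarize_schema(schema).items():
            print(class_name)
            for name, data_type in props:
                print(f"  {name}: {', '.join(data_type)}")
```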
Using the Python SDK

Another common way to retrieve the schema information is the Weaviate Python client.

Weaviate publishes its Python client on PyPI as the weaviate-client package. The example below uses the v3 client API (weaviate.Client), which newer 4.x releases of the package have deprecated and then removed, so pin the major version when installing:

pip3 install "weaviate-client<4"

Once installed, we can use the client to query the schema, as in the following example:

>>> import weaviate
>>> import json
>>> client = weaviate.Client("http://localhost:8080")
>>> schema = client.schema.get()
>>> print(json.dumps(schema))

Here is a breakdown of the code above:

  1. import weaviate – The first line imports the Weaviate client library, which allows us to connect to and interact with the Weaviate instance.
  2. import json – The second line imports Python's built-in json module, which we will use to serialize the returned schema.
  3. The third line creates a client object connected to a Weaviate instance running on localhost. You can replace this address with the address of your target Weaviate instance.
  4. The fourth line uses the client.schema.get() method to retrieve the current schema of the Weaviate instance as a Python dictionary.
  5. Finally, the fifth line uses json.dumps(schema) to convert the schema from a Python dictionary to a JSON-formatted string and prints it.

If you save these lines as a script (without the >>> prompts), you can pipe its output to a tool such as jq to pretty-print it, as shown in the screenshot below.
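The client call can also be wrapped in a small helper that pretty-prints the result without jq. This sketch assumes the v3 weaviate.Client used above, whose schema.get() returns the whole schema when called without arguments and a single class definition when given a class name; dump_schema is a hypothetical helper name.

```python
import json


def dump_schema(client, class_name=None):
    """Return the whole schema, or one class definition, as indented JSON.

    Assumes a v3 weaviate.Client: its schema.get() returns the full schema
    when called without arguments and a single class when given a class name.
    """
    if class_name is None:
        schema = client.schema.get()
    else:
        schema = client.schema.get(class_name)
    # indent=2 makes the output readable without piping it through jq
    return json.dumps(schema, indent=2)
```

For example, with the client object from the session above, print(dump_schema(client, "Author")) would print only the Author class, while print(dump_schema(client)) prints the full schema.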

Conclusion

In this tutorial, we learned how to use the Get Schema functionality of Weaviate to retrieve the current schema. Knowing the schema is essential for understanding how data is structured within your Weaviate instance.

Cloudenv

Developer Tips, Tricks and Tutorials.
