Configure Weaviate Multimodal: A Vector Hub for Generative AI

Generative AI has transformed how we handle data, creating text, code, and more with stunning flexibility. To make the most of these outputs, you need a way to store and search them efficiently, and that’s where Weaviate shines. As an open-source vector database, Weaviate supports multimodal data—think text snippets alongside Python functions—turning them into embeddings for fast, meaningful queries. Whether you’re a developer building AI-driven media, a researcher exploring machine learning art, or a tech enthusiast diving into generative systems, configuring Weaviate for multimodal use opens up new possibilities. In this guide, we’ll walk you through installing the Weaviate client and Docker, configuring a local instance with authentication, defining a multimodal schema for text and code, adding a vectorizer module like text2vec-openai, and testing with sample embeddings—each step explained as we go.

Aimed at coders and AI practitioners, this tutorial builds on Text-to-Vector Pipeline Setup and supports projects like Code Generation with Codex. By the end, you’ll have a configured Weaviate instance ready to manage multimodal data for your generative AI workflows as of April 10, 2025. Let’s get started with this vector hub setup, step by step.

Why Configure Weaviate for Multimodal Data?

Weaviate is a vector database designed to store embeddings—numerical representations of data like text or code—optimized for similarity searches using metrics like cosine distance. Unlike traditional databases with rows and columns, it handles multimodal data, letting you mix text, code, or even images in one system—see What Is Generative AI and Why Use It?. With a vectorizer like text2vec-openai, it can automatically turn your data into vectors, making it a powerhouse for AI projects.
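To make “similar” concrete: Weaviate compares a query’s embedding to stored embeddings by the angle between the vectors. Here is a minimal, self-contained sketch of cosine similarity in plain Python; the three-dimensional vectors are toy examples, while real embeddings (ada-002, for instance) have 1536 dimensions.

import math

def cosine_similarity(a, b):
    # Dot product of a and b divided by the product of their norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings
query_vec = [0.1, 0.9, 0.2]
doc_vec = [0.2, 0.8, 0.1]
print(cosine_similarity(query_vec, doc_vec))  # ~0.99; values near 1.0 mean "similar"

Cosine distance, which Weaviate reports in query results, is simply 1 minus this similarity.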

Why use it? It’s open-source and free to run locally, with managed cloud options when you need scale. It’s flexible, supporting diverse data types in one place, and fast, delivering query results in milliseconds. Configuring it for multimodal use lets you store and search generative AI outputs—like code from Codex—effortlessly. Let’s set it up.

Step 1: Install Weaviate Client and Docker

Kick off by installing the Weaviate client in Python and Docker to run the database locally.

Setting Up Your Environment

You’ll need Python 3.8 or higher and pip, plus Docker for hosting Weaviate. Open a terminal—VS Code is a great pick with its built-in terminal, editor, and debugging tools—and check your setup:

python --version

Expect output like “Python 3.11.7.” Any release from 3.8 up works; 3.11 is a solid, widely supported choice. If it’s missing, download it from python.org. During setup, check “Add Python to PATH” so python runs from any terminal spot, avoiding path issues.

Next, confirm pip:

pip --version

Look for “pip 23.3.1” or a similar version. If it’s not there, install it with:

python -m ensurepip --upgrade
python -m pip install --upgrade pip

Install the Weaviate client and OpenAI for vectorization:

pip install "weaviate-client>=3.26.2,<4.0.0" openai python-dotenv

Here’s what each does:

  • weaviate-client: The Weaviate library, pinned here to the 3.x line (3.26.2 or newer but below 4.0, since the 4.x client introduced a different API), about 2 MB, connects your script to the database for storing and querying vectors (see the version check after this list).
  • openai: The OpenAI Python library, under 1 MB. The scripts below don’t call it directly; Weaviate’s text2vec-openai module talks to OpenAI’s API server-side. It’s still handy if you want to generate embeddings yourself.
  • python-dotenv: A small utility, around 100 KB, loads API keys from a .env file, keeping them secure.
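Since the 3.x pin matters, it’s worth a quick sanity check that pip actually installed a 3.x client. A minimal check from Python (the exact number shown is an example, not a requirement):

import weaviate

# This guide's scripts use the v3 API (weaviate.Client);
# if this prints 4.x, rerun the pip command above with the version pin.
print(weaviate.__version__)  # Expect something like "3.26.2"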

Install Docker from docker.com—it’s a platform to run Weaviate in a container. Check it’s working:

docker --version

Expect “Docker version 20.10.24” or similar. This sets up your toolkit.

How It Works

  • python --version: Shows your Python version, like 3.11.7, confirming it’s ready for weaviate-client and other libraries.
  • pip --version: Verifies pip is installed, letting you pull packages from PyPI without trouble.
  • pip install ...: Fetches weaviate-client, openai, and python-dotenv, setting them up in your environment. weaviate-client links to the database, openai covers optional client-side embedding work, and python-dotenv keeps keys safe.
  • docker --version: Confirms Docker is ready, ensuring you can run Weaviate locally.

This gets your environment set—next, configure Weaviate.

Step 2: Configure Local Weaviate Instance with Authentication

Configure a local Weaviate instance with authentication to secure your database.

Setting Up Weaviate

Get an OpenAI API key from platform.openai.com (new accounts sometimes include trial credit; otherwise embedding calls are billed per token) and create a folder:

mkdir WeaviateMulti
cd WeaviateMulti

Create a .env file inside it and add (substitute your real keys):

OPENAI_API_KEY=sk-abc123xyz
WEAVIATE_API_KEY=your-weaviate-key  # Must match the key allowed in the Docker command below

Run Weaviate with Docker, enabling authentication:

docker run -d -p 8080:8080 \
  -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='false' \
  -e AUTHENTICATION_APIKEY_ENABLED='true' \
  -e AUTHENTICATION_APIKEY_ALLOWED_KEYS='your-weaviate-key' \
  -e AUTHENTICATION_APIKEY_USERS='user1' \
  -e ENABLE_MODULES='text2vec-openai' \
  -e DEFAULT_VECTORIZER_MODULE='text2vec-openai' \
  -e OPENAI_APIKEY='sk-abc123xyz' \
  semitechnologies/weaviate:latest

Create weaviate_setup.py:

import weaviate
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Connect to Weaviate with authentication
client = weaviate.Client(
    url="http://localhost:8080",
    auth_client_secret=weaviate.AuthApiKey(api_key=os.getenv("WEAVIATE_API_KEY"))
)

# Verify connection
print("Weaviate Instance Running:")
print(client.is_ready())

Run python weaviate_setup.py, and expect:

Weaviate Instance Running:
True
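One caveat: the container usually needs a few seconds after docker run before it accepts connections, so the script may print False or raise a connection error if you run it immediately. Below is a small retry sketch; the connect_with_retry helper is my own name, not part of the client library.

import os
import time
import weaviate
from dotenv import load_dotenv

load_dotenv()

def connect_with_retry(url, api_key, retries=15, delay=2):
    # Keep trying until Weaviate answers its readiness check, or give up
    for _ in range(retries):
        try:
            client = weaviate.Client(
                url=url,
                auth_client_secret=weaviate.AuthApiKey(api_key=api_key),
            )
            if client.is_ready():
                return client
        except Exception:
            pass  # Container still starting; try again shortly
        time.sleep(delay)
    raise RuntimeError("Weaviate never became ready; check docker ps and the container logs")

client = connect_with_retry("http://localhost:8080", os.getenv("WEAVIATE_API_KEY"))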

How It Works

  • .env file: Stores your OpenAI and Weaviate API keys, keeping them secure for loading into the script.
  • docker run ...: Starts Weaviate locally with parameters:
    • -d -p 8080:8080: Runs in detached mode, mapping port 8080 for access.
    • AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='false': Turns off open access, requiring a key.
    • AUTHENTICATION_APIKEY_ENABLED='true': Enables API key authentication.
    • AUTHENTICATION_APIKEY_ALLOWED_KEYS='your-weaviate-key': Sets an allowed key (e.g., “your-weaviate-key”).
    • AUTHENTICATION_APIKEY_USERS='user1': Assigns the key to a user.
    • ENABLE_MODULES='text2vec-openai': Loads the OpenAI vectorizer module at startup.
    • DEFAULT_VECTORIZER_MODULE='text2vec-openai': Makes that module the default for new classes.
    • OPENAI_APIKEY: Supplies your OpenAI key so the module can call the embeddings API.
  • client = weaviate.Client(...): Connects to Weaviate at “http://localhost:8080” with the API key for secure access.
  • client.is_ready(): Checks if Weaviate is running, returning True if all’s well.

This secures your local instance—next, define a schema.

Step 3: Define Multimodal Schema for Text and Code

Define a multimodal schema in Weaviate to store text and code with properties.

Coding the Schema

Update weaviate_setup.py:

import weaviate
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Connect to Weaviate
client = weaviate.Client(
    url="http://localhost:8080",
    auth_client_secret=weaviate.AuthApiKey(api_key=os.getenv("WEAVIATE_API_KEY"))
)

# Define multimodal schema
schema = {
    "classes": [{
        "class": "MultimodalData",
        "description": "Store text and code snippets with metadata",
        "properties": [
            {"name": "content", "dataType": ["text"], "description": "Text or code content"},
            {"name": "type", "dataType": ["string"], "description": "Type: text or code"},
            {"name": "source", "dataType": ["string"], "description": "Origin of the data"}
        ],
        "vectorizer": "text2vec-openai",
        "moduleConfig": {
            "text2vec-openai": {
                "model": "ada",
                "modelVersion": "002",
                "type": "text"
            }
        }
    }]
}

# Create schema if not exists
if not client.schema.contains(schema):
    client.schema.create(schema)
    print("Schema Created:")
else:
    print("Schema Already Exists:")

# Display schema
print(client.schema.get()["classes"][0]["class"])

Run python weaviate_setup.py, and expect:

Schema Created:
MultimodalData

How It Works

  • schema = {...}: Defines a “MultimodalData” class with:
    • class: Names it “MultimodalData,” a container for your data.
    • description: Notes its purpose—storing text and code with metadata.
    • properties: Lists fields:
      • content: Holds text or code as a text type. Advanced use: tune the property’s indexFilterable/indexSearchable settings to control how it’s indexed.
      • type: Tags it as “text” or “code” with string. Filtering on it uses the inverted index, which is on by default.
      • source: Tracks origin (e.g., “manual”) as string. Advanced use: switch the dataType to ["string[]"] to record multiple sources per object.
    • vectorizer="text2vec-openai": Uses OpenAI’s vectorizer to auto-generate embeddings.
    • moduleConfig: Configures the vectorizer:
      • model="ada": Picks ada for embeddings (768D). Advanced use: Use babbage for different granularity.
      • modelVersion="002": Sets version “002” for consistency. Advanced use: Test “001” for legacy compatibility.
      • type="text": Treats data as text for vectorization.
  • client.schema.create(schema): Adds the schema to Weaviate if it’s new, structuring your database.
  • client.schema.get(): Retrieves the schema, confirming “MultimodalData” is set.
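A related tip while you iterate: client.schema.create won’t modify a class that already exists, so edits to the schema dict silently have no effect on the server. During development, the simplest reset is to drop and recreate the class. A sketch follows; note that delete_class removes every stored object along with the definition, so reserve it for throwaway data (schema.exists is available in recent 3.x clients).

# Development only: wipes the class definition AND all of its objects
if client.schema.exists("MultimodalData"):
    client.schema.delete_class("MultimodalData")
client.schema.create(schema)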

This organizes your multimodal data—next, add a vectorizer.

Step 4: Add Vectorizer Module (e.g., text2vec-openai)

Add the text2vec-openai vectorizer module to turn text and code into embeddings automatically.

Configuring the Vectorizer

The vectorizer is already set in the Docker command and schema from Steps 2 and 3. Update weaviate_setup.py to verify:

import weaviate
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Connect to Weaviate
client = weaviate.Client(
    url="http://localhost:8080",
    auth_client_secret=weaviate.AuthApiKey(api_key=os.getenv("WEAVIATE_API_KEY"))
)

# Define schema (already created in Step 3)
schema = {
    "classes": [{
        "class": "MultimodalData",
        "description": "Store text and code snippets with metadata",
        "properties": [
            {"name": "content", "dataType": ["text"], "description": "Text or code content"},
            {"name": "type", "dataType": ["string"], "description": "Type: text or code"},
            {"name": "source", "dataType": ["string"], "description": "Origin of the data"}
        ],
        "vectorizer": "text2vec-openai",
        "moduleConfig": {
            "text2vec-openai": {
                "model": "ada",
                "modelVersion": "002",
                "type": "text"
            }
        }
    }]
}

# Create schema if not exists
if not client.schema.contains(schema):
    client.schema.create(schema)

# Verify vectorizer via the server's metadata endpoint
meta = client.get_meta()
print("Vectorizer Module Configured:")
print("text2vec-openai" if "text2vec-openai" in meta.get("modules", {}) else "text2vec-openai missing")

Run python weaviate_setup.py, and expect:

Vectorizer Module Configured:
text2vec-openai

How It Works

  • Docker command: Sets ENABLE_MODULES and DEFAULT_VECTORIZER_MODULE to 'text2vec-openai' and links your OpenAI key via OPENAI_APIKEY, enabling the module at startup.
  • schema: Configures “MultimodalData” to use text2vec-openai, with model="ada" (1536D embeddings), modelVersion="002" for consistency, and type="text" for processing.
  • client.get_meta(): Fetches server metadata, whose "modules" map lists every active module; finding the text2vec-openai key confirms it’s running.
  • Print: Shows the module name if it’s active, verifying the setup.
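The module is a convenience rather than a requirement: the v3 client also accepts a precomputed vector per object, which is useful if you generate embeddings in your own pipeline. A minimal sketch; the vector below is a fake placeholder, since a real one must match the class’s embedding width (1536 for ada-002).

# Supply your own embedding instead of letting text2vec-openai compute one
my_vector = [0.12, -0.03, 0.88]  # Placeholder only; a real vector needs all 1536 dimensions
client.data_object.create(
    {"content": "custom-embedded snippet", "type": "text", "source": "manual"},
    "MultimodalData",
    vector=my_vector,
)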

This enables automatic vectorization—next, test with embeddings.

Step 5: Test with Sample Text and Code Embeddings

Test your setup by adding and querying sample text and code embeddings in Weaviate.

Coding the Test

Update weaviate_setup.py:

import weaviate
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Connect to Weaviate
client = weaviate.Client(
    url="http://localhost:8080",
    auth_client_secret=weaviate.AuthApiKey(api_key=os.getenv("WEAVIATE_API_KEY"))
)

# Define schema (from Step 3)
schema = {
    "classes": [{
        "class": "MultimodalData",
        "description": "Store text and code snippets with metadata",
        "properties": [
            {"name": "content", "dataType": ["text"], "description": "Text or code content"},
            {"name": "type", "dataType": ["string"], "description": "Type: text or code"},
            {"name": "source", "dataType": ["string"], "description": "Origin of the data"}
        ],
        "vectorizer": "text2vec-openai",
        "moduleConfig": {
            "text2vec-openai": {
                "model": "ada",
                "modelVersion": "002",
                "type": "text"
            }
        }
    }]
}
if not client.schema.contains(schema):
    client.schema.create(schema)

# Add sample data
text_data = {"content": "Neural networks process data with layers.", "type": "text", "source": "manual"}
code_data = {"content": "def add(a, b): return a + b", "type": "code", "source": "script"}
client.data_object.create(text_data, "MultimodalData")
client.data_object.create(code_data, "MultimodalData")

# Query with sample text
query = "How do neural networks work?"
query_obj = client.query.get("MultimodalData", ["content", "type", "source"]).with_near_text({"concepts": [query]}).with_limit(1)
result = query_obj.do()

# Display result
print("Test Query Result:")
print(result["data"]["Get"]["MultimodalData"][0]["content"])
print(f"Type: {result['data']['Get']['MultimodalData'][0]['type']}")

Run python weaviate_setup.py, and expect:

Test Query Result:
Neural networks process data with layers.
Type: text
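One-at-a-time create calls are fine for two samples, but each is a separate HTTP request. For anything larger, the v3 client’s batcher groups inserts into fewer round-trips. A minimal sketch with a couple of extra made-up records:

# Batch import: far fewer round-trips than one create() per object
more_samples = [
    {"content": "Transformers rely on attention mechanisms.", "type": "text", "source": "manual"},
    {"content": "def mul(a, b): return a * b", "type": "code", "source": "script"},
]
client.batch.configure(batch_size=50)  # Flush to the server every 50 objects
with client.batch as batch:
    for obj in more_samples:
        batch.add_data_object(obj, "MultimodalData")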

How It Works

  • client.data_object.create(...): Adds text and code samples to “MultimodalData,” triggering text2vec-openai to generate 1536D embeddings automatically.
  • client.query.get(...): Queries Weaviate with:
    • "MultimodalData": Targets the class.
    • ["content", "type", "source"]: Retrieves these properties.
    • with_near_text({"concepts": [query]}): Searches for vectors near the query’s embedding, using cosine similarity.
    • with_limit(1): Returns the top match. Advanced use: Set to 3 for multiple related results.
  • result["data"]["Get"]["MultimodalData"][0]): Pulls the closest match, showing the stored text.

This confirms your multimodal setup works—you’re ready to roll!
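To see the type property earn its keep, combine vector search with a structured filter, such as “nearest match among code objects only.” A sketch using the v3 with_where clause; valueString matches the string dataType defined in the schema.

# Vector search restricted to objects whose type equals "code"
code_result = (
    client.query.get("MultimodalData", ["content", "type", "source"])
    .with_near_text({"concepts": ["add two numbers"]})
    .with_where({"path": ["type"], "operator": "Equal", "valueString": "code"})
    .with_limit(1)
    .do()
)
print(code_result["data"]["Get"]["MultimodalData"][0]["content"])  # Expect the add() snippet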

Next Steps: Scaling Your Weaviate Setup

Your Weaviate instance is configured and tested! Add more data types or pair with Code Generation with Codex. You’ve built a multimodal vector hub, so keep growing and experimenting!

FAQ: Common Questions About Configuring Weaviate Multimodal

1. Can I use other vectorizers?

Yes, text2vec-transformers or text2vec-huggingface work. Swap the module name into ENABLE_MODULES and DEFAULT_VECTORIZER_MODULE in the Docker command, plus any extra settings the module needs.
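For example, text2vec-transformers runs entirely on your own hardware but needs a companion inference container. A sketch of the Weaviate side of that swap, with the Step 2 auth flags omitted for brevity; the t2v-transformers hostname assumes you run the inference container on the same Docker network under that name.

docker run -d -p 8080:8080 \
  -e ENABLE_MODULES='text2vec-transformers' \
  -e DEFAULT_VECTORIZER_MODULE='text2vec-transformers' \
  -e TRANSFORMERS_INFERENCE_API='http://t2v-transformers:8080' \
  semitechnologies/weaviate:latest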

2. Why run locally with Docker?

It’s free and quick to start—cloud options scale better later.

3. What if authentication fails?

Check that the key in .env exactly matches AUTHENTICATION_APIKEY_ALLOWED_KEYS in the Docker command; the two must agree character for character. See the curl check below.
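To isolate the problem, hit Weaviate’s REST API directly with the key; a 401 or an anonymous-access error here means the key itself is the mismatch.

curl -H "Authorization: Bearer your-weaviate-key" http://localhost:8080/v1/meta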

4. How does text2vec-openai work?

It turns text into vectors via OpenAI’s models—see Weaviate Docs.

5. Can I store images too?

Yes, add an image property (dataType ["blob"]) and a compatible vectorizer like img2vec-neural or multi2vec-clip.

6. Why test with text and code?

It verifies multimodal support, ensuring versatility.

Your questions are covered—configure with ease!