Bagel's RAG - Vector Embedding
This tutorial demonstrates how to use the Bagel library to create and query vector embeddings with the bagel-text and bagel-multi-modal models. You can also walk through it step by step in Google Colab.
Prerequisites
Python 3.x
Bagel library (install via pip install bagelML)
API key from https://bakery.bagel.net/api-key
Client Setup
Import the necessary libraries:
Create a Bagel client instance:
Set the API key as an environment variable:
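The three setup steps above can be sketched roughly as follows. This is a minimal sketch, not the tutorial's original snippet: the environment variable name BAGEL_API_KEY, the helper names set_api_key and make_client, and the bagel.Client() constructor are assumptions about the bagelML package, so check them against the version you installed.

```python
import os
from getpass import getpass

# Assumption: the environment variable name is illustrative.
API_KEY_ENV = "BAGEL_API_KEY"


def set_api_key(api_key=None):
    """Store the API key in an environment variable and return it."""
    if api_key is None:
        # Prompt for the key without echoing it to the terminal.
        api_key = getpass("Enter your Bagel API key: ")
    if not api_key:
        raise ValueError("API key must not be empty")
    os.environ[API_KEY_ENV] = api_key
    return api_key


def make_client():
    """Create a Bagel client (assumes bagelML exposes bagel.Client)."""
    import bagel  # imported lazily so the sketch loads without bagelML

    return bagel.Client()
```

Keeping the key in an environment variable avoids hard-coding it in a notebook or script that might be shared.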
Creating a Vector Text Embedding Asset
Define the create_asset function for creating a Vector Text Embedding Asset:
title: The title or name of the dataset, e.g., "Apple".
dataset_type: The type of dataset being created: RAW, VECTOR, or MODEL. Assets are handled differently depending on the type.
tags: An optional list of tags that classify the asset for easier searching, e.g., ["AI", "DEMO", "TEST"].
category: The category the dataset belongs to, such as Cat1 or Cat2, to help with organization and retrieval.
details: Additional descriptive details or notes about the dataset.
user_id: The unique identifier of the user creating the asset, e.g., "345281182308180743453".
embedding_model: The embedding model to use; for text embeddings the default is bagel-text.
dimension: The embedding dimension; the default for text embeddings is 768.
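The parameters above can be assembled into a payload along these lines. This is a sketch under stated assumptions: the field names follow the parameter list in this tutorial, the helper name build_text_asset_payload is hypothetical, and the client.create_asset(payload, api_key) call signature is an assumption about the bagelML client rather than a confirmed API.

```python
def build_text_asset_payload(title, user_id, category="Cat1", details="",
                             tags=None):
    """Build the payload for a VECTOR asset that uses bagel-text."""
    return {
        "dataset_type": "VECTOR",
        "title": title,
        "category": category,
        "details": details,
        "tags": list(tags) if tags else [],
        "user_id": user_id,
        "embedding_model": "bagel-text",
        "dimension": 768,  # default dimension for bagel-text per this tutorial
    }


def create_asset(client, api_key, payload):
    """Send the payload to Bagel and return the raw response."""
    # Assumption: the client exposes create_asset(payload, api_key).
    return client.create_asset(payload, api_key)
```

Separating payload construction from the network call makes it easy to inspect the payload before anything is sent.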
Call the create_asset function to create a new vector text embedding asset:
Adding Text Embeddings
Add text embeddings to the asset:
source: The text to embed.
documents: The longer text that contains the source text you embedded.
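One way to package these two fields is sketched below. The layout is an assumption, not something this tutorial confirms: it places source under a metadatas entry, puts the long text under documents, and generates a random id per document. The helper names and the client.add_data_to_asset(asset_id, payload, api_key) signature are likewise assumptions to verify against the bagelML documentation.

```python
import uuid


def build_add_payload(source, document):
    """Build a payload that adds one document to a text embedding asset."""
    return {
        # Assumption: source travels as a metadata field.
        "metadatas": [{"source": source}],
        "documents": [document],
        # A random UUID keeps ids unique across calls.
        "ids": [str(uuid.uuid4())],
    }


def add_text(client, asset_id, api_key, payload):
    """Send the payload to the asset and return the raw response."""
    # Assumption: the client exposes add_data_to_asset(asset_id, payload, api_key).
    return client.add_data_to_asset(asset_id, payload, api_key)
```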
Querying from Vector Embedding
Query the vector embedding asset:
where: A filter that specifies conditions on metadata. For example, it can limit the search to certain categories, e.g., {"category": "Cat2"}.
where_document: Conditions related to the document itself, such as whether the document is published, e.g., {"is_published": True}.
n_results: The number of results to return, e.g., 2.
include: A list of fields to include in the result, such as metadatas, documents, or distances, for contextual information about each result.
query_texts: The search term(s) used to query the asset, e.g., ["google"].
padding: A Boolean flag indicating whether to pad the results to a fixed size, e.g., False. This can be useful for keeping result sets uniform.
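The query parameters above can be combined into a single payload, sketched below. The field names come from the list in this tutorial; the helper names, the choice to omit unset filters, and the client.query_asset(asset_id, payload, api_key) signature are assumptions rather than confirmed bagelML behavior.

```python
def build_query_payload(query_texts, n_results=2, where=None,
                        where_document=None, include=None, padding=False):
    """Build a query payload from the parameters described above."""
    if n_results < 1:
        raise ValueError("n_results must be at least 1")
    payload = {
        "n_results": n_results,
        "include": list(include) if include is not None
        else ["metadatas", "documents", "distances"],
        "query_texts": list(query_texts),
        "padding": padding,
    }
    # Only send filters the caller actually supplied.
    if where is not None:
        payload["where"] = where
    if where_document is not None:
        payload["where_document"] = where_document
    return payload


def query_asset(client, asset_id, api_key, payload):
    """Run the query and return the raw response."""
    # Assumption: the client exposes query_asset(asset_id, payload, api_key).
    return client.query_asset(asset_id, payload, api_key)
```

For example, build_query_payload(["google"], where={"category": "Cat2"}) asks for the two nearest results restricted to the Cat2 category.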
Creating a Vector Multi-Modal Embedding Asset
Define the create_asset function with the bagel-multi-modal embedding model:
title: The title or name of the dataset, e.g., "Apple".
dataset_type: The type of dataset being created: RAW, VECTOR, or MODEL. Assets are handled differently depending on the type.
tags: An optional list of tags that classify the asset for easier searching, e.g., ["AI", "DEMO", "TEST"].
category: The category the dataset belongs to, such as Cat1 or Cat2, to help with organization and retrieval.
details: Additional descriptive details or notes about the dataset.
user_id: The unique identifier of the user creating the asset, e.g., "345281182308180743453".
embedding_model: The embedding model to use; for multi-modal embeddings this is bagel-multi-modal.
dimension: The embedding dimension; the default for bagel-multi-modal is 1408.
You can find the created VECTOR asset on the Bakery. Log in and search for its title under My Datasets.
Call the create_asset function to create a new vector multi-modal embedding asset:
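Because the two models differ mainly in their default dimension (768 for bagel-text, 1408 for bagel-multi-modal), one hedged way to avoid mismatches is to derive the dimension from the model name. The helper below is a hypothetical illustration; its name and the payload field names are assumptions based on the parameter lists in this tutorial.

```python
# Default dimensions as stated in this tutorial.
DEFAULT_DIMENSIONS = {
    "bagel-text": 768,
    "bagel-multi-modal": 1408,
}


def build_vector_asset_payload(title, user_id, embedding_model,
                               category="Cat1", details="", tags=None):
    """Build a VECTOR asset payload with the dimension matching the model."""
    if embedding_model not in DEFAULT_DIMENSIONS:
        raise ValueError(f"unknown embedding model: {embedding_model!r}")
    return {
        "dataset_type": "VECTOR",
        "title": title,
        "category": category,
        "details": details,
        "tags": list(tags) if tags else [],
        "user_id": user_id,
        "embedding_model": embedding_model,
        "dimension": DEFAULT_DIMENSIONS[embedding_model],
    }
```

Calling build_vector_asset_payload("Apple", "345281182308180743453", "bagel-multi-modal") yields a payload whose dimension is 1408.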
Adding Image Embeddings
Add image embeddings to the asset:
file_path: The path to the image you want to embed.
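A thin wrapper around the upload might look like the sketch below. The wrapper name add_image is hypothetical, and the client.add_file(asset_id, file_path, api_key) call is an assumption about the bagelML client that should be verified before use. The path check simply fails early with a clear error if the image is missing.

```python
from pathlib import Path


def add_image(client, asset_id, file_path, api_key):
    """Upload one image file to a multi-modal embedding asset."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"image not found: {path}")
    # Assumption: the client exposes add_file(asset_id, file_path, api_key).
    return client.add_file(asset_id, str(path), api_key)
```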
Querying for Images
Query Search
Query the vector multi-modal embedding asset:
Converting Base64 to Image Object
Define the load_and_display_image_from_base64 function:
Convert the base64 image string to an image object:
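The original definition of this function is not shown here, so the following is a minimal sketch of what it could look like, assuming the query result contains a base64-encoded image string and that Pillow is installed for display. The helper decode_base64_image and its handling of an optional data-URI prefix are additions for illustration.

```python
import base64
import io


def decode_base64_image(base64_string):
    """Decode a base64 string (optionally a data URI) into raw image bytes."""
    if base64_string.startswith("data:") and "," in base64_string:
        # Strip a prefix such as "data:image/png;base64,".
        base64_string = base64_string.split(",", 1)[1]
    return base64.b64decode(base64_string)


def load_and_display_image_from_base64(base64_string):
    """Convert a base64 image string to an image object and display it."""
    from PIL import Image  # requires Pillow: pip install pillow

    image = Image.open(io.BytesIO(decode_base64_image(base64_string)))
    image.show()  # opens the image in the default viewer
    return image
```

In a notebook such as Colab, returning the image object lets the cell render it inline even if image.show() has no viewer available.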
This tutorial covers the basic steps for creating vector embedding assets, adding text and image embeddings, and querying the assets using the bagel-text and bagel-multi-modal models. Follow these steps to get started with Bagel RAG and explore the capabilities of vector embeddings.
Need extra help or just want to connect with other developers? Join our community on Discord for additional support 👾