Steve's thoughts and experiments

The brains behind the machine

The main interface to the models is a chat interface. When you present a question or instruction, how do the models "know" what you mean? How do they understand the relationships between words?

The answer is embeddings!

An embedding is a high-dimensional vector representation of words, phrases or concepts. Instead of memorising words, models map them into a numerical space where similar meanings are close together.

What are embeddings really?

Imagine you have a word like "dog". Computers don't understand words; to reduce complexity and work with them computationally, we convert words to numbers.

Previous attempts to do this used indexes, assigning each word an ID:

dog → 1
cat → 2
fish → 3

But simple IDs don't capture any context about a word. So when trying to model similarities (for example, "dog" is closer to "cat" than to "fish"), the ID model falls short. This is where embeddings come in. Instead of assigning meaningless numbers, embeddings represent words as coordinates in a high-dimensional space.

For example:

dog → [0.2, -0.5, 0.7, 0.9]
cat → [0.1, -0.6, 0.8, 0.95]
fish → [0.9, 0.2, -0.3, -0.7]

This captures relations mathematically: you can see that the vectors for dog and cat are closer to each other than either is to fish.
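
To make this concrete, here is a minimal sketch in plain Python using the toy vectors above (the numbers are made up for illustration) that measures closeness with cosine similarity:

import math

# Toy 4-dimensional embeddings from the example above (illustrative values only)
embeddings = {
    "dog":  [0.2, -0.5, 0.7, 0.9],
    "cat":  [0.1, -0.6, 0.8, 0.95],
    "fish": [0.9, 0.2, -0.3, -0.7],
}

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["dog"], embeddings["cat"]))   # ~0.99, very close
print(cosine_similarity(embeddings["dog"], embeddings["fish"]))  # ~-0.50, far apart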

Historically, embeddings have been statically generated. Think of semantic search engines: they would generate embeddings for a corpus of text and then use them to find similar documents. With the introduction of LLMs and methods pioneered by BERT (Bidirectional Encoder Representations from Transformers) and other models, embeddings can be generated dynamically. This means the model can build context from the surrounding words, helping it understand the relationships between words in context rather than in isolation.

For example — if we have two sentences:

  • Sentence 1: "The river bank is wet"
  • Sentence 2: "I deposited money in the bank"

We can see that bank has a different meaning in each sentence. By using dynamic embeddings the model can understand that bank in the first sentence is a noun meaning the edge of a river, and in the second sentence it is a noun meaning a financial institution.

Using static embeddings, both instances of "bank" would have the same vector. But with dynamic embeddings, the model assigns different vectors based on context, such as:

bank (river edge) → [0.2, -0.5, 0.7, 0.9]
bank (financial institution) → [0.9, 0.2, -0.3, -0.7]

This allows LLMs to distinguish meanings based on context, improving search results, text understanding, and chatbot responses.
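
As a rough sketch of how this looks in practice (assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; any contextual encoder would do), we can pull out the vector the model assigns to "bank" in each sentence and compare them:

import torch
from transformers import AutoModel, AutoTokenizer

# Assumes the Hugging Face transformers library and the bert-base-uncased checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_for(sentence, word):
    # Return the contextual vector the model produces for `word` inside `sentence`.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden_states = model(**inputs).last_hidden_state[0]  # one 768-dim vector per token
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden_states[tokens.index(word)]

river_bank = embedding_for("the river bank is wet", "bank")
money_bank = embedding_for("i deposited money in the bank", "bank")

# The same word gets a different vector in each context.
print(torch.cosine_similarity(river_bank, money_bank, dim=0).item())  # noticeably below 1.0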

A use case!

Let's take the following scenario: we want to build a support agent. The agent will provide level 1 support for a software product company.

We could simply integrate with an LLM like GPT-4 and be done with it. This would give us a natural-language chat interface and a fair amount of expert knowledge, but what if we wanted to build a more context-aware agent? We want it to be able to address customers knowing that they have previously raised tickets related to their issues.

We could use embeddings to help! We can use static embeddings to augment the LLM's dynamic embeddings in a process called retrieval-augmented generation (RAG).

1. Precompute & Store Embeddings

  • Process all historical support tickets (or any relevant documents).
  • Convert them into embeddings using an embedding model (e.g., OpenAI’s text-embedding-ada-002, BERT, or SBERT).
  • Store these embeddings in a vector database (FAISS, Pinecone, Weaviate, ChromaDB), as in the sketch below.
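
A rough sketch of this step, assuming the sentence-transformers library (SBERT) for embeddings and ChromaDB as the vector store, with made-up ticket data; any of the models and databases listed above could be swapped in:

import chromadb
from sentence_transformers import SentenceTransformer

# Hypothetical historical tickets; in practice these would come from your ticketing system.
tickets = [
    {"id": "TCK-101", "text": "Cannot log in after password reset"},
    {"id": "TCK-102", "text": "Export to CSV times out on large reports"},
    {"id": "TCK-103", "text": "Invoice PDF shows the wrong company address"},
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")         # SBERT embedding model
client = chromadb.Client()                                # in-memory vector database
collection = client.create_collection("support_tickets")

# Convert each ticket into an embedding and store it alongside the original text.
collection.add(
    ids=[t["id"] for t in tickets],
    documents=[t["text"] for t in tickets],
    embeddings=encoder.encode([t["text"] for t in tickets]).tolist(),
)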

2. Retrieve Relevant Context at Query Time

  • When a user asks a support question, convert it into an embedding.
  • Use cosine similarity (or another distance metric) to find the most relevant past tickets.
  • Retrieve those ticket descriptions and pass them as context to the LLM, as in the sketch below.
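
Continuing the sketch above (the encoder and collection come from step 1; the customer question is made up):

# Embed the customer's question with the same model used for the tickets.
question = "I reset my password but I still can't sign in"
query_embedding = encoder.encode([question]).tolist()

# Ask the vector store for the nearest past tickets by vector similarity.
results = collection.query(query_embeddings=query_embedding, n_results=2)
retrieved_tickets = results["documents"][0]

print(retrieved_tickets)  # most similar past tickets, e.g. the password-reset one first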

3. Generate an Informed Response

  • The LLM now has dynamic knowledge of past tickets.
  • It combines its internal understanding with the retrieved support history.
  • It generates a context-aware response instead of relying only on pre-trained data, as in the sketch below.
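
To round off the sketch, the retrieved tickets are stitched into the prompt sent to the chat model. This assumes the OpenAI Python client with GPT-4; any chat completion API would be wired up the same way:

from openai import OpenAI

llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Combine the customer's question with the retrieved ticket history.
context = "\n".join(f"- {ticket}" for ticket in retrieved_tickets)
response = llm.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a level 1 support agent for our software product. "
                                      "The customer's previous tickets:\n" + context},
        {"role": "user", "content": question},
    ],
)

print(response.choices[0].message.content)  # a reply grounded in the customer's past tickets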

This allows the agent to be more context aware and provide a better experience for the customer.

From this we can see that embeddings are a crucial part of an LLM's ability to understand and respond to user queries. They can be generated dynamically to be context aware and provide a better experience for the customer, or statically to provide a more general knowledge base. By using different techniques we can adjust and augment embeddings from open-source models to jump-start implementations. We don't have to train a model from scratch; we can use the power of the community to build a better agent for our specific use case.