
Scalars, Vectors, and Tensors... Oh My!
When working with Large Language Models (LLMs) like GPT, the core mathematical structures you're dealing with are scalars, vectors, and tensors. Just like Dorothy braving the forest, I'm going to follow the yellow brick road and break down these concepts so that I have a better understanding of what they mean and maybe... just maybe I'll be in Kansas again.
🦁 Scalars: The building blocks
A scalar is just a single value — but in AI, it's rarely "just a number."
In LLMs, scalars can represent:
- The probability a model assigns to a word: "The next token is 'cat' with a probability of 0.87."
- Temperature settings in an API call, controlling randomness: "Set temperature to 0.7."
Shape: ()
Scalars are crucial in AI because they can represent anything quantifiable — from probabilities to token scores.
Example:
Scalar: 0.87
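To make that shape concrete, here's a tiny sketch in PyTorch (my choice of library, not something the scalar concept depends on). The 0.87 and 0.7 are just the illustrative numbers from above:

import torch

# A scalar is a single value with an empty shape: ()
probability = torch.tensor(0.87)  # e.g. the model's probability for "cat"
temperature = torch.tensor(0.7)   # e.g. the temperature setting in an API call

print(probability.shape)  # torch.Size([]), i.e. shape ()
print(probability.item()) # back to a plain Python float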
🐯 Vectors: Words as coordinates
A vector is a list of scalars — like [1, 2, 3]. But in LLMs, vectors have a special job: they represent word embeddings.
When you send a prompt to an LLM, every token (word, punctuation, or subword piece) is converted into a high-dimensional vector.
Example:
The word "cat" might be transformed into a 768-dimensional vector (in BERT):
Vector: [0.1, 0.2, 0.3, ...]
These vectors act like coordinates in a multi-dimensional space, where similar words (like "dog") have vectors close to "cat".
- Words with similar meanings are near each other in this space.
- Unrelated words are far apart.
So when an LLM "understands" language, it's really just calculating distances between these word vectors.
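To make "calculating distances" concrete, here's a minimal PyTorch sketch using cosine similarity. The 4-dimensional vectors are made up purely for illustration and don't come from any real model:

import torch
import torch.nn.functional as F

# Toy 4-dimensional "embeddings" (real models use hundreds of dimensions)
cat = torch.tensor([0.8, 0.1, 0.9, 0.3])
dog = torch.tensor([0.7, 0.2, 0.8, 0.4])
car = torch.tensor([-0.5, 0.9, -0.7, 0.1])

# Cosine similarity: values near 1.0 mean the vectors point the same way
print(F.cosine_similarity(cat, dog, dim=0))  # high: similar meanings
print(F.cosine_similarity(cat, car, dim=0))  # lower: unrelated words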
🐻 Tensors: Stacking the pieces
A tensor is just a generalisation of vectors and matrices — a way to handle multi-dimensional data.
In LLMs, tensors show up everywhere:
A 1D tensor is a vector — like a word embedding:
[0.12, 0.85, ..., -0.54] # single word embedding
A 2D tensor stacks vectors for each token in a sentence — this is the "sequence of embeddings":
[
[0.12, 0.85, ..., -0.54], # "cat"
[0.34, 0.65, ..., 0.21], # "sits"
[0.87, -0.23, ..., 0.44] # "here"
]
(Shape: [sequence_length, embedding_dim] — e.g., 3 tokens by 768 dimensions.)
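As a quick sketch (again assuming PyTorch, with random stand-ins for real embedding values), stacking the token vectors produces exactly that shape:

import torch

embedding_dim = 768

# Made-up embeddings for the tokens "cat", "sits", "here"
cat = torch.randn(embedding_dim)
sits = torch.randn(embedding_dim)
here = torch.randn(embedding_dim)

# Stack them into a 2D tensor: one row per token
sentence = torch.stack([cat, sits, here])
print(sentence.shape)  # torch.Size([3, 768]): [sequence_length, embedding_dim]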
A 3D tensor appears when you process batches of sentences — useful for handling multiple API calls at once:
Shape: [batch_size, sequence_length, embedding_dim]
Example: processing 4 prompts at the same time,
each 10 tokens long, with 768-dimensional embeddings:
[
[
[0.12, 0.85, ..., -0.54], # "cat" (a 768-dimensional vector)
[0.34, 0.65, ..., 0.21], # "sits"
[0.87, -0.23, ..., 0.44], # "here"
... # remaining 7 tokens in this prompt
],
... # the other 3 prompts in the batch
]
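Here's the batched version as a sketch. The shape matches the example above (4 prompts, 10 tokens, 768 dimensions), but the values are random placeholders rather than real embeddings:

import torch

batch_size, sequence_length, embedding_dim = 4, 10, 768

# 4 prompts, each 10 tokens long, each token a 768-dimensional vector
batch = torch.randn(batch_size, sequence_length, embedding_dim)

print(batch.shape)        # torch.Size([4, 10, 768])
print(batch[0].shape)     # one prompt: torch.Size([10, 768])
print(batch[0, 0].shape)  # one token's embedding: torch.Size([768])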
The foundations... but what next?
LLMs do not generate their word vectors on the fly. Instead, they use a pre-trained embedding matrix: a component learned during model training that maps tokens to vectors.
- An embedding matrix is a large 2D tensor that's part of the model's parameters
- When you input a token, the model simply looks up the corresponding vector in this matrix
- These embeddings capture semantic relationships between words based on how they were used in the training data
Think of it like this:
- Vocabulary size: 10,000 tokens
- Embedding dimension: 768
The embedding matrix is a 10,000 x 768 tensor, with one row per token in the vocabulary. When processing text, the model converts each token ID to its corresponding vector through a simple lookup operation.
# Input tokens (representing "the cat sits")
# Split into tokens: ["the", "cat", "sit", "##s"]
# Token IDs: [1234, 4321, 6789, 9876]
# Embedding matrix: 10000 x 768
# Output vectors:
# - embedding_matrix[1234] = [0.12, 0.85, ..., -0.54]
# - embedding_matrix[4321] = [0.34, 0.65, ..., 0.21]
# - embedding_matrix[6789] = [0.87, -0.23, ..., 0.44]
# - embedding_matrix[9876] = [0.56, 0.78, ..., -0.32]
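Here's a minimal runnable sketch of that lookup using PyTorch's nn.Embedding. The weights are randomly initialised here rather than learned, and the token IDs are the illustrative ones from the comments above, so the output values won't match the ones shown:

import torch
import torch.nn as nn

# Embedding matrix: 10,000 tokens x 768 dimensions
# (randomly initialised here; in a real LLM these weights are learned during training)
embedding_matrix = nn.Embedding(num_embeddings=10_000, embedding_dim=768)

# Token IDs for ["the", "cat", "sit", "##s"] (illustrative, not a real tokeniser's output)
token_ids = torch.tensor([1234, 4321, 6789, 9876])

# The lookup: one 768-dimensional vector per token ID
vectors = embedding_matrix(token_ids)
print(vectors.shape)  # torch.Size([4, 768])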
While the embedding matrix is important, the true "intelligence" of an LLM comes from its transformer layers that process these embeddings through complex attention mechanisms and neural networks.
(Diagram: tokenisation → embedding → transformer layers.)
Conclusion
In this blog post, we've explored the fundamental mathematical structures of scalars, vectors, and tensors, and how they are used in LLMs.
- Scalars represent single values
- Vectors represent lists of scalars
- Tensors represent multi-dimensional arrays
We've also explored how these structures show up in tokenisation and embedding lookups when you interact with an LLM API. This is just an overview of the basics; there is a lot more to it, but it's a good place to start.