Steve's thoughts and experiments

Adversarial Attacks on RAG Systems: Poisoning the Knowledge Base

03 April 2025

As we previously went through, a common pattern when implementing models into systems is to use RAG (retrieval augmented generation) by using domain specific data with the GenAI models. But what happens if the data source is compromised or poisoned? In this post we'll explore RAG poisoning attacks, their real-world implications and mitigation strategies to secure your AI implementations.

Lab 4 Building a RAG pipeline

26 March 2025

In this lab, I'm going to move away from security for a little while and look at how we can augment our LLM using a technique called Retrieval Augmented Generation (RAG).

The idea is that we can use a pre-trained LLM to answer questions about a specific dataset. We can then augment the pre-trained model with additional information from a knowledge base. To do this we usually convert the data into a vector space and storing this information in a vector database, like ChromaDB. In this we'll cover:

Store text embeddings in ChromaDB (an open source vector database)
Retrieve relevant knowledge dynamically
Use a local model to generate answers

The brains behind the machine

19 March 2025

The main interface to the models is a chat interface. When you present a questions or insutrction, how do the models "know" what you mean? How does it understand the relationship between words?

The answer is embeddings!

An embedding is a high-dimensional vector representation of words, phrases or concepts. Instead of memorising words models map them into a numerical space where similar meanings are close together.

Lab 3 Prompt Injection

17 March 2025

Previosuly I've looked at the use of white hat attacks. These test the models as you're training them and should be considered as part of the development process. A prompt injection attack is a type of attack that is used to manipulate the output of a model and is usually an attack of a deployed model.

Large language models generate responses based on user inputs and hidden system instructions. A prompt injection attack exploits this by tricking the AI into ignoring its original constraints, leading to unsafe, unintended, or malicious outputs.

Lab 2 C&W Attack

11 March 2025

In my previous post we explored the use of FGSM a powerful yet simple attack method for generating adversarial examples for LLMs. What happens if we need something more subtle and sophisticated?

Enter the Carlini and Wagner (C&W) attack — a method that iteratively optimises the perturbation to the input to maximise the loss function which generates adversarial examples with minimal distortion.

Scalars, Vectors, and Tensors... Oh My!

10 March 2025

When working with Large Language Models (LLMs) like GPT, the core mathematical structures you're dealing with are scalars, vectors, and tensors. Just like Dorothy braving the forest, I'm going to follow the yellow brick road and break down these concepts so that I have a better understanding of what they mean and maybe... just maybe I'll be in Kansas again.

Lab 1 FGSM

08 March 2025

In my first lab I'm going to explore the use of FGSM (fast gradient signed method) to generate adversarial examples for a LLM.

What is FGSM?

FGSM is one of the most famours adversarial attack methods. It is designed to trick a neural network by adding small, carefully-crafted noise (a perturbation) to the input with the goal of having the model misclassify the input.

Breaking into AI security, my journey from DevOps to AI

07 March 2025

Hello! My name is Steve, and I'm a DevOps engineer with a passion for security. I've been in the industry for around 15 years now, and I've seen a lot of changes. I've seen the rise of cloud computing, the adoption of DevOps practices, and the increasing importance of security. I've always dabbled in security, but I never really thought of it as a career path.

If I'm honest I have had some fear and anxiety around the LLMs and AI. I've seen the headlines, I've seen the hype, and I've seen the potential. I've also seen the risks, and I'm not sure how to navigate them. One thing is clear is that they're here to stay, and I need to find a way to move from fear to excitement.

Blog Archive

What is FGSM?