Started in April 2021

AI for more efficient and effective Purple team exercises. We will use Reinforcement Learning to improve tactical sequential decision making. 


Large organisations perform so-called Red team and Purple team exercises to stress-test their cyber capabilities, train personnel, and eventually improve infrastructural resilience and incident response techniques & processes. However, these exercises are costly, time-consuming, and cannot provide coverage for the complete infrastructure. We believe AI should play a role in solving these challenges, and maybe even disrupt the industry for good!

Reinforcement Learning is a machine learning technique that could transform the way we look at cyber security. It has proven very promising for improving tactical sequential decision making. The goal of the PurpleAI project is to apply Reinforcement Learning to tactical sequential decision making during security control assessment exercises and, ultimately, during real incidents. With a Proof-of-Concept (PoC) we aim to prove the business value for both Red and Blue teamers.

Our use-case: privilege escalation
For the PoC we have selected a feasible use-case: privilege escalation on a Windows machine. For this use-case we have developed a Reinforcement Learning agent that learns how to escalate privileges based on a pre-defined set of actions (the action-space). The agent learns how particular sequences of actions affect the state of the target Windows system (the state-space). The agent receives a reward when it changes the state of the system such that it has escalated privileges on the target machine. By running simulations time and again, the agent learns an optimal policy to escalate privileges based on the available actions.
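The learning loop described above can be illustrated with a small sketch. This is not the PurpleAI implementation: the source does not say which algorithm the agent uses, so tabular Q-learning is assumed here purely for illustration. The state names, action names, reward values, and the `step`, `train`, and `greedy_policy` functions are all hypothetical, and the "environment" is an abstract three-state simulation that never touches a real system.

```python
import random

# Hypothetical, abstract state-space and action-space (illustration only).
STATES = ["user_shell", "enumerated", "admin"]
ACTIONS = ["enumerate_system", "use_misconfiguration", "do_nothing"]


def step(state, action):
    """Toy transition model: return (next_state, reward, done)."""
    if state == "user_shell" and action == "enumerate_system":
        return "enumerated", -1.0, False
    if state == "enumerated" and action == "use_misconfiguration":
        # Reward is given only when the simulated privileges are escalated.
        return "admin", 10.0, True
    # Every other action costs a step and leaves the state unchanged.
    return state, -1.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "user_shell"
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning update towards the bootstrapped target.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best known action for every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES if s != "admin"}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned policy is simply the shortest action sequence that reaches the rewarded state. Repeated simulation only starts to pay off once the action-space is too large for the best sequence to be obvious by inspection.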

The PoC is proof that our ideas can be turned into practice on a technical level. However, we strongly believe the true value lies in the framework that we have developed. Imagine an actively maintained action-space with more and more complex attack techniques. With this growing action-space the Reinforcement Learning agent will re-optimize its policy and learn how to perform multi-stage attacks on target infrastructures. The more actions there are in the agent’s arsenal, the more complex the sequential decision making becomes, and thus the more value Reinforcement Learning optimization will add.

We believe this technology will enable the Red team to cover more ground, and even allow for continual Purple teaming. Furthermore, with this technology we will be able to automatically report back actionable insights to the Blue team, enabling identification and remediation of potential attack vectors.

Project results

Explore phase
In the explore phase of this project we have focused on validating the desirability of this idea within the PCSI partner organizations. Furthermore, we have started a landscape assessment of the state-of-the-art technologies that could help us develop and implement a prototype. Blue team, Red team, and AI technologies were considered in this process.

At the end of the Explore phase we concluded that:

  • Desirability of the PurpleAI idea is confirmed by more than 10 experts at 3 different PCSI partner organizations.
  • PurpleAI has the potential to help the Red team increase their coverage, but it will especially help to relieve the Blue team, both directly and indirectly.
  • PurpleAI is an ambitious and long-term effort from a technical perspective. We will have to build on top of existing technologies, and potentially involve external organizations.

PoC phase
In the PoC phase we continued the work by focusing on proving that our ideas could be turned into a practical application. With the successful development of an agent that can autonomously learn how to escalate privileges on a Windows target machine, we can conclude that the idea is technically feasible.

Pilot phase
In the Pilot phase we will focus on determining how this technology can be leveraged by the Red team and Blue team. To keep the work feasible we will stick to the privilege escalation use-case. A first step is collecting requirements for a Minimum Viable Product (MVP). The first milestone is a functional design of the MVP. In this process we will also revisit the state-of-the-art analysis of existing market solutions and identify precisely which gap in the current market our MVP needs to fill.

The second milestone is the development and testing of the MVP within a realistic environment. Ideally we will do this within the real testing infrastructure of one of the partners. This phase will be closed off by assessing the user experiences of Red teamers and Blue teamers with respect to the MVP, and collecting points for improvement.

Exploit phase
We are pleased to announce that we have published our vision paper, "Human-competitive AI will disrupt the cyber security industry; prepare now!" It is available for download below.


This project is part of the trend

28 Opportunity and threat April 2025

Growing use of AI applications

30 Threat May 2026

Increase of malicious uses and abuses of AI

PCSI is a collaboration of
    ABN-AMRO Achmea ASML Belastingdienst ING TNO