Getting Started with RLMatrix

Introduction

When we write traditional programs, we tell the computer exactly what to do in every situation. For example, if we wanted to write a program that matches numbers, we might write:

if (input == pattern)
{
    return "Correct!";
}
else
{
    return "Try again!";
}

But what if we want our program to learn on its own? What if the rules are too complex to write out, or we don’t even know the rules ourselves? This is where reinforcement learning comes in.
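To make that idea concrete before bringing in RLMatrix, here is a minimal sketch in plain C# of the trial-and-reward loop that reinforcement learning is built on. Everything in it (the names, the 10% exploration rate, the running-average update) is an illustrative assumption, not RLMatrix code:

Trial-and-error sketch (not RLMatrix)
// A tiny trial-and-error loop: no hand-written rules, only reward feedback.
var rng = Random.Shared;
var value = new double[2, 2];   // value[observation, choice] = average reward so far
var visits = new int[2, 2];

for (int step = 0; step < 1000; step++)
{
    int observed = rng.Next(2);                       // what the program sees
    // Explore occasionally; otherwise pick the choice that has paid off best so far.
    int choice = rng.NextDouble() < 0.1
        ? rng.Next(2)
        : (value[observed, 0] >= value[observed, 1] ? 0 : 1);

    double reward = choice == observed ? 1.0 : -1.0;  // feedback instead of an if/else rule
    visits[observed, choice]++;
    value[observed, choice] += (reward - value[observed, choice]) / visits[observed, choice];
}

Instead of being told the rule, the loop discovers which choice pays off for each observation purely from the reward signal - which is exactly what RLMatrix will automate for us below.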

Setting Up Your Project

You can follow along or clone this GitHub repository. First, let’s get everything installed:

Installing RLMatrix via NuGet
dotnet add package RLMatrix
dotnet add package RLMatrix.Toolkit
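If you're starting from scratch rather than cloning the repository, any plain console project will do - create one first and run the commands above inside it. The project name below is just an example:

Creating a console project (optional)
dotnet new console -n PatternMatchingExample
cd PatternMatchingExample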

Your First Learning Environment

Let’s create something simple but meaningful - an environment where our AI will learn to match patterns. While this seems basic (and would be trivial to program directly), it introduces all the key concepts we need.

Here’s our complete environment:

PatternMatchingEnvironment.cs
using RLMatrix.Toolkit;

namespace PatternMatchingExample;

[RLMatrixEnvironment]
public partial class PatternMatchingEnvironment
{
    private int pattern = 0;
    private int aiChoice = 0;
    private bool roundFinished = false;

    // Simple counters for last 50 steps
    private int correct = 0;
    private int total = 0;

    // Simple accuracy calculation
    public float RecentAccuracy => total > 0 ? (float)correct / total * 100 : 0;

    // What the agent sees each step
    [RLMatrixObservation]
    public float SeePattern() => pattern;

    // The agent picks one of two discrete actions (0 or 1)
    [RLMatrixActionDiscrete(2)]
    public void MakeChoice(int choice)
    {
        aiChoice = choice;
        roundFinished = true;

        // Update counters
        total++;
        if (aiChoice == pattern) correct++;
    }

    // +1 for a match, -1 otherwise
    [RLMatrixReward]
    public float GiveReward() => aiChoice == pattern ? 1.0f : -1.0f;

    // Each round ends after a single choice
    [RLMatrixDone]
    public bool IsRoundOver() => roundFinished;

    // Prepare the next round with a fresh random pattern
    [RLMatrixReset]
    public void StartNewRound()
    {
        pattern = Random.Shared.Next(2);
        aiChoice = 0;
        roundFinished = false;
    }

    public void ResetStats()
    {
        correct = 0;
        total = 0;
    }
}

Training Your AI

Now comes the interesting part - teaching our AI to match patterns. We’ll use an algorithm called DQN (Deep Q-Network). Don’t worry too much about the name - it’s just one way of teaching AI to make decisions.

Here’s how we set up the training:

Program.cs
using RLMatrix.Agents.Common;
using RLMatrix;
using PatternMatchingExample;

Console.WriteLine("Starting pattern matching training...\n");

// Set up how our AI will learn
var learningSetup = new DQNAgentOptions(
    batchSize: 32,      // Learn from 32 experiences at once
    memorySize: 1000,   // Remember last 1000 attempts
    gamma: 0.99f,       // Care a lot about future rewards
    epsStart: 1f,       // Start by trying everything
    epsEnd: 0.05f,      // Eventually stick to what works
    epsDecay: 150f      // How fast to transition
);

// Create our environment
var environment = new PatternMatchingEnvironment().RLInit();
var env = new List<IEnvironmentAsync<float[]>> {
    environment,
    //new PatternMatchingEnvironment().RLInit() //you can add more than one to train in parallel
};

// Create our learning agent
var agent = new LocalDiscreteRolloutAgent<float[]>(learningSetup, env);

// Let it learn!
for (int i = 0; i < 1000; i++)
{
    await agent.Step();

    if ((i + 1) % 50 == 0)
    {
        Console.WriteLine($"Step {i + 1}/1000 - Last 50 steps accuracy: {environment.RecentAccuracy:F1}%");
        environment.ResetStats();

        Console.WriteLine("\nPress Enter to continue...");
        Console.ReadLine();
    }
}

Console.WriteLine("\nTraining complete!");
Console.ReadLine();

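A quick note on the exploration settings: epsStart, epsEnd, and epsDecay control how quickly the agent shifts from trying random actions to exploiting what it has learned. A common way to picture this (shown below purely as an illustration - RLMatrix's internal schedule may differ in its details) is an exponential decay of the exploration probability:

Exploration schedule (illustrative)
// Probability of taking a random action at a given training step,
// assuming an exponential decay from epsStart down to epsEnd.
float Epsilon(int step, float epsStart = 1f, float epsEnd = 0.05f, float epsDecay = 150f)
    => epsEnd + (epsStart - epsEnd) * MathF.Exp(-step / epsDecay);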
When you run this code, you’ll see the training progress displayed every 50 steps:

Training Progress
Starting pattern matching training...
Step 50/1000 - Last 50 steps accuracy: 48.0%
Press Enter to continue...
Step 100/1000 - Last 50 steps accuracy: 68.0%
Press Enter to continue...
Step 150/1000 - Last 50 steps accuracy: 86.0%
Press Enter to continue...
Step 200/1000 - Last 50 steps accuracy: 82.0%
Press Enter to continue...

Beyond Simple Matching

While our example is straightforward, the same principles apply to much more complex problems.
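As one illustration (a hypothetical sketch, not part of the example repository), a more involved environment can expose several observations, a larger action space, and multi-step episodes while using exactly the same attributes you've already seen:

TargetSeekingEnvironment.cs (hypothetical)
using RLMatrix.Toolkit;

namespace PatternMatchingExample;

// Hypothetical sketch: nudge a point toward a random target on a line.
[RLMatrixEnvironment]
public partial class TargetSeekingEnvironment
{
    private float position;
    private float target;
    private int stepsTaken;

    [RLMatrixObservation]
    public float SeePosition() => position;

    [RLMatrixObservation]
    public float SeeTarget() => target;

    [RLMatrixActionDiscrete(3)]
    public void Move(int action)
    {
        // 0 = move left, 1 = stay, 2 = move right
        position += (action - 1) * 0.1f;
        stepsTaken++;
    }

    [RLMatrixReward]
    public float DistanceReward() => -MathF.Abs(target - position);

    [RLMatrixDone]
    public bool IsEpisodeOver() => stepsTaken >= 50 || MathF.Abs(target - position) < 0.05f;

    [RLMatrixReset]
    public void StartNewEpisode()
    {
        position = 0f;
        target = Random.Shared.NextSingle() * 2f - 1f;
        stepsTaken = 0;
    }
}

The environment is richer - multiple observations, three possible actions, episodes that span many steps - yet the attributes and the overall training loop keep the same shape.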


Next Steps

Ready to go further? We have two main algorithms available:

  • DQN: What we just used; good for simple discrete choices and benefits from a large replay memory.
  • PPO: More advanced; handles continuous actions (like controlling speed or direction).