Alfonso Catrón

WordPress Engineer, Support & Tooling

The Devil is in the details: A gamified approach to training troubleshooting skills

One of the hardest things when training skills in technical support is finding ways to connect theory with practice. You can talk about how things work, read documentation, and walk through examples. But nothing compares to the moment when you are in front of a real site, with a real problem, trying to figure out what’s going on while a customer is (im)patiently waiting.

While coaching teammates, I realized that coaching can be boring and tedious, so I decided to try a different kind of training. I called it The Devils Saga.

It’s part game, part simulation. The goal is simple: troubleshoot a real problem together. But instead of reviewing steps or reading articles, we do it live, like a real support session.

How the Game Works

The teammate shares their screen.
The trainer (me, or someone else) plays the role of a customer.
We start with a vague problem, something a customer might actually say, and we let the teammate drive the investigation.

We use a website that has been set up with a real issue. The “devil,” in this case, is the bug or misbehavior hidden in the site.

The teammate has to ask questions, look around, check logs, deactivate plugins… whatever it takes. If they get stuck, we guide them. If they miss something, we take two steps back and try again. The idea is not to test them, and this is VERY IMPORTANT. This is not a trial: the goal is to help them build the instincts that make a good troubleshooter.

It’s also a lot of fun!

To give you an idea, here’s a real example from one of our training sessions:


😈 Case: Mandinga

In South American folklore, Mandinga is the devil who hides in plain sight. He appears charming, maybe even helpful, but once he’s gone, he leaves behind the smell of sulfur.

That’s exactly what this case was about. A site that looked fine on the surface… but something wasn’t right underneath.

It starts with an email from the “Customer”:

“My site cache is continuously preloading.
It’s a big site with over a thousand posts. Preload runs, but it never stops!”

This is a normal email describing a classic issue. We have the credentials for this website, so we can log in and investigate. After the teammate runs the first checks, everything seems fine at first glance: the cron jobs are running and pages are being cached. But then the cache preload starts again, and again, and again.

So we started digging.

The troubleshooting

When you’re doing technical troubleshooting, there are many different ways to reach the same result, and each teammate will follow a different path. It is very important that the trainer understands this, respects the process, and lets it flow. It is better to ask questions than to point directly at the cause of the problem. With a few small, well-timed interactions along the way, a good trainer can make all the difference.

For example, a troubleshooting process might look like this:

  1. First hypothesis (highly accurate): something is clearing the cache repeatedly, and that triggers preload to run again.
  2. Used a helper plugin to log what was happening behind the scenes.
  3. Searched the codebase using a plugin called String Locator to look for anything using save_post or rocket_clean_files.
  4. That led to a specific plugin, which had code that cleared the entire cache on every post update. Because the site had a lot of activity, this was happening constantly, causing preload to run forever.
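The search in step 3 can also be run outside the dashboard. Here is a minimal sketch, assuming you have file access to the site and that its plugins live under wp-content/plugins (the path is an assumption, and find_references is a hypothetical helper name, not part of String Locator or any plugin). It simply lists every PHP line that mentions save_post or rocket_clean_files:

```python
from pathlib import Path

# Strings to look for, taken from the case above.
NEEDLES = ("save_post", "rocket_clean_files")


def find_references(root, needles=NEEDLES):
    """Return (file, line number, line text) for each PHP line mentioning a needle."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    hits = []
    for path in sorted(root_path.rglob("*.php")):
        # errors="replace" keeps the scan going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(needle in line for needle in needles):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed location of the plugins folder; adjust it for the site at hand.
    for file, number, line in find_references("wp-content/plugins"):
        print(f"{file}:{number}: {line}")
```

A plain text match like this will also flag harmless uses of those names, so each hit still has to be read in context, exactly as in the live session.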

But for another teammate, the process might be completely different:

  1. First hypothesis (low accuracy): something is preventing the cache from being written.
  2. Starts the basic troubleshooting to discover what’s preventing the cache files from being created.
    • Trainer asks: what other things can make the preload run non-stop?
  3. Second hypothesis: something is clearing the cache.
  4. Recovers and reaches the same result.

This was one of those classic support cases where the root cause wasn’t what the customer described. They saw preload running nonstop, but the real problem was the cache being cleared too often, silently, in the background, by a hidden function in a third-party plugin. The magic happens during the troubleshooting steps, and above all in the moment of discovery, when the teammate finds the exact line of code with the problematic function.

So while the first process is perfect and the second is not, both are fine, because the real goal here is the troubleshooting process itself:

  • Hypothesis A
    • confirm or discard
  • Hypothesis B
    • confirm or discard
  • repeat …
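The loop above can be written down as a small sketch. This is only an illustration of the confirm-or-discard cycle, not a tool we use: the Hypothesis class, the check functions, and the evidence values are all hypothetical and made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hypothesis:
    description: str
    check: Callable[[dict], bool]  # returns True when the evidence confirms it


def troubleshoot(hypotheses, evidence) -> Optional[Hypothesis]:
    """Test each hypothesis in order; return the first confirmed one, else None."""
    for hypothesis in hypotheses:
        if hypothesis.check(evidence):
            return hypothesis  # confirmed: stop here
        # discarded: move on to the next hypothesis
    return None


# Made-up observations standing in for what the teammate finds on the site.
evidence = {"cache_files_written": True, "cache_clears_per_hour": 40}

hypotheses = [
    Hypothesis(
        "Something prevents cache files from being written",
        lambda e: not e["cache_files_written"],
    ),
    Hypothesis(
        "Something clears the cache repeatedly",
        # The threshold of 1 is arbitrary, chosen only for illustration.
        lambda e: e["cache_clears_per_hour"] > 1,
    ),
]

confirmed = troubleshoot(hypotheses, evidence)
print(confirmed.description if confirmed else "No hypothesis confirmed yet")
```

Starting from the less accurate hypothesis, as the second teammate did, costs one extra check but still ends at the same place, which is the point of having a process.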

The key is learning how to approach a problem systematically while connecting all the theory dots we have in our minds. It is important to remember that when we start working on a new problem, we don’t know what’s causing it. That sounds pretty obvious, but in practice it is not.

A consistent troubleshooting process becomes our map, and it is the skill to aim for.

What happens next? Incorporating new knowledge through team processes

We are still playing, so after the discovery there may be some formal steps to follow; most support teams apply internal processes at this point. It is key to apply those processes now, because we have to “consolidate” the fresh knowledge… we can’t miss this opportunity!

  • We have to write a reply for the customer explaining the issue and how to work around it. This is a good moment to practice communicating a technical problem to a non-technical user.
  • We might need to get in touch with the third-party plugin’s developers. How do we do that?
  • We also discuss which processes are needed to fully finalize the case internally:
    • Document the plugin conflict?
    • Update our known issues database so others can recognize this “devil” more easily in the future?
    • Ping the team in the appropriate channels?

Why does this kind of training work?

I think it works because it feels real.

We’re not just reviewing theory, we’re doing the work.
We’re asking real questions, making mistakes, running into dead ends, getting confused, and then, suddenly, finding it. That “aha” moment is what sticks.

It teaches you to slow down, observe, and listen to what the problem is really telling you. And that’s probably the most important skill in support.

It also works for the trainers, because it is systematic and predictable. You have a script to follow, and you get better at it every time you play, because you keep learning more about this devil and all its details. Each time the game is played, new things emerge. For example, we tried one of our internal helper plugins, which is supposed to prevent cache clearing, but it didn’t work because of the way this function was called.

And this unexpected idea, a mistake, led to another discovery and more knowledge, as good mistakes usually do.

