2025 has been a strange year for AI. At the start of the year most LLM agents appeared to be generating mostly slop. There were signs of the code getting better, but it looked like things were still a way off.
Now at the end of the year, it seems like AI capabilities have leapfrogged from “a nice demo” to “something that could replace my entire IDE”.
It’s interesting seeing models evolve at such a quick pace, and it’s been personally fun for me to try and see what else I can get the LLMs to build.
The next item on my list is a “GameBoy Emulator”.
First Steps
My initial thought process was that “an emulator is probably already in its dataset”. So given the right prompt it could probably output a finished emulator verbatim.
That wouldn’t be too interesting of an experiment.
One easy way to get an LLM to build something new, instead of just spewing out something it has in its training data (which many SOTA models already avoid), is to get it to do things step by step instead of all at once.
So the first step is to get it to generate a plan of action, a high level document of how it would build out the emulator.
So I fired up Claude Code and asked it to generate a plan for building out a gameboy emulator. The only requirements I gave it were that it would need to run on the web, and should ideally be in TypeScript (since there are likely fewer emulators in TypeScript).
Implementation plan (phase overview)
oneshadab/gbemu/spec/implementation-plan.md
The initial plan it presented looked quite good, but I felt like it was still not detailed enough.
The next step was to ask it to break down each phase into its own document, and tell Claude that I would be using a separate agent to do the actual implementation (I would not). This was to make sure it didn’t omit any details in the spec that would be necessary for a full implementation.
After “Mesmerizing…” for a good while it came up with a 9 phase plan to complete the implementation.
Full spec docs (all phases)
oneshadab/gbemu/tree/main/spec
Just to make sure I was not just “Vibe Coding”, I took the time to go through all of the docs, and had to make some minor edits myself.
As a rule, I don’t want to outsource my thinking to Claude. I want it to “write for me”, not “think for me”. And if it does something I don’t understand, I will first have Claude explain it to me, and once I’m confident I understand the approach, I will ask it to actually make the change. Sometimes when I find it made a mistake, I’ll ask it to fix the mistake, or just fix it myself.
So once I had made my edits and was confident that the high level plan made sense, it was time to go and start implementing things.
Coding at the speed of thought inference
The initial boilerplate was generated quite fast, but after that Claude seemed to get slower and slower. The first few phases took a few minutes, but the last and ninth one took over half an hour.
(Which I find amusing because that’s what happens to me as well. An AI agent doesn’t seem immune to code bloat either.)
After each phase I asked it to stop and let me review what it had written, to make sure it was staying on track. It looked like it had followed its own plan quite well, but it had started making small creative changes. At the time I didn’t realize it, but this would come back to bite it (and me) later.
So after about an hour of vibing (and almost running out of tokens), Claude helpfully let me know that it was done and everything should be “100% ready” now.
I decided to use the copy of Tetris I use to test my own emulators and booted it up immediately.
Stumbling through trouble
Unfortunately that “100% ready” turned out to mean “not ready at all”, because as soon as I booted the ROM, I was greeted with nothing.

As in nothing happened, no tetris, no error, just a blank screen.
To be honest, I didn’t really expect it to work, so I wasn’t too surprised. Most first runs don’t really work for me either, so I’m happy to see AI keeping that human aspect of things.
So I gave Claude the screenshot and asked it to figure out what could have gone wrong. After “Finagling” for a few minutes Claude came up with a guess at what the issue could be, and managed to actually fix it after 3 rounds of “I know, it’s probably X!”.
After that I still had to do some back and forth with Claude, each conversation going:
Claude: “It should be completely fixed now!” Me: “But the screen is still flickering”
Claude: “You’re absolutely right! I’ve fixed that. NOW everything should be fine” Me: “The Joypads still don’t work”
Claude: “You’re absolutely right! The joypad code was completely missing! Now it should definitely work” Me: “Nothing’s moving”
Claude: “You’re absolutely right! I forgot to implement the CPU…”
And so it went.
But eventually Claude got it to a working state and I was able to load and play Tetris!

Turning Red
Honestly, going from an empty codebase to Tetris working in an hour or so wasn’t too bad. I think Claude could probably one-shot it if I chose C/C++, since it probably has more to match against in its corpus. Having to use TypeScript, a language which I can’t imagine a sane person using for an emulator, worked as a nice handicap.
But now that Tetris was working, it was time to load up a few other games and see how well they’d do.

Dr Mario seemed to load well enough. So did Pacman. So it was time to start with something a bit more difficult, something I might actually play. It was time to try out Pokemon.
I loaded up Pokemon Red, and the initial title screen looked promising (although a bit strange).

But that soon fell apart as the menu finished loading.

Well, that doesn’t look good. But it looked like it was probably a memory mapping issue. So I sent Claude to investigate, hoping it would be a quick fix. But boy was I wrong.
After 20+ attempts to find the bug, Claude started blaming me, trying to tell me that I probably loaded the wrong ROM.
I eventually got annoyed, and asked Codex and Gemini to try to fix it, both of them failing as well (but thankfully not blaming me).
I was almost ready to throw in the towel when I realized that I had never told Claude how I would debug the problem. I just kept asking it to think really hard and figure it out on its own.
I realized I needed to give it the same debugging flow that I would use.
Enter the DMG
Luckily there are already a few ways to test whether a gameboy emulator implementation is incomplete. The one I like is the DMG ACID test (DMG, as in DMG-01, is the model number of the original Game Boy).
So I loaded it up in the emulator and gave it a run.

and it immediately showed a broken smiley.
A quick primer on the DMG ACID test: Much like the ACID web test, the ROM builds a smiley emoji with different gameboy rendering techniques (tiles, windows, etc). And when a particular part fails to render, it usually points to a particular rendering issue.
It’s a quick way to QA for a complete rendering implementation.
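For anyone unfamiliar with the “tiles” part: Game Boy graphics are built from 8x8 tiles, where each row of a tile is stored as two bytes (one bit plane each), giving a 2-bit shade per pixel. Below is a minimal sketch of that decoding step, not the code Claude generated. The function names `decodeTileRow` and `decodeTile` are made up for illustration.

```typescript
// Hypothetical helper names; this is an illustrative sketch, not code from the gbemu repo.

// Decode one 8-pixel row of a tile stored in the 2-bits-per-pixel format.
// `lo` holds the low bit of each pixel, `hi` the high bit; bit 7 is the leftmost pixel.
function decodeTileRow(lo: number, hi: number): number[] {
  const row: number[] = [];
  for (let bit = 7; bit >= 0; bit--) {
    const lowBit = (lo >> bit) & 1;
    const highBit = (hi >> bit) & 1;
    row.push((highBit << 1) | lowBit); // shade index in the range 0..3
  }
  return row;
}

// Decode a full 8x8 tile from its 16 bytes (2 bytes per row, low plane first).
function decodeTile(bytes: Uint8Array): number[][] {
  if (bytes.length !== 16) {
    throw new Error(`expected 16 bytes per tile, got ${bytes.length}`);
  }
  const rows: number[][] = [];
  for (let y = 0; y < 8; y++) {
    rows.push(decodeTileRow(bytes[y * 2], bytes[y * 2 + 1]));
  }
  return rows;
}

// Example: the byte pair 0x3C, 0x7E decodes to the shades 0 2 3 3 3 3 2 0.
console.log(decodeTileRow(0x3c, 0x7e).join(" "));
```

If a step like this is subtly wrong (say, the two bit planes are swapped), every tile on screen still renders but with the wrong shades, which is the kind of thing a visual test like this makes obvious.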
Now coming back to the broken image, it clearly pointed to something being wrong in the emulator implementation. But there’s no way for Claude to know what.
So I took a screenshot of the broken image and a list of what each part is rendered with from the DMG ACID repo, then set Claude to work.
DMG ACID repo
mattcurrie/dmg-acid2
After it spent a little while “Cooking”, it came back with a hypothesis on where the bug could be. So I asked it to explain to me what the bug could be, and once the explanation made sense I asked it to fix it.
After the fix was done, I reloaded the DMG image in the emulator and…it still had other issues.

Unfortunately I had to go back to Claude with broken smiley renderings a few more times, before it finally managed to render a working smiley

(I could have probably automated it, but it felt like that would be a project in and of itself)
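For the curious, here is a rough sketch of what the core of that automation might look like. I did not build this. It assumes the emulator can hand over a finished frame as one shade value (0 to 3) per pixel for the 160x144 screen, and that a known-good reference frame has been decoded into the same format. The names `diffFrames` and `FrameDiff` are hypothetical.

```typescript
// Hypothetical sketch: compare an emulator frame against a known-good reference frame.
// Assumes both frames hold one shade value (0..3) per pixel, row by row.
const SCREEN_WIDTH = 160;
const SCREEN_HEIGHT = 144;

interface FrameDiff {
  mismatchedPixels: number;
  firstRow: number; // first scanline with a mismatch, or -1 if the frames match
  lastRow: number;  // last scanline with a mismatch, or -1 if the frames match
}

function diffFrames(actual: Uint8Array, expected: Uint8Array): FrameDiff {
  const size = SCREEN_WIDTH * SCREEN_HEIGHT;
  if (actual.length !== size || expected.length !== size) {
    throw new Error(`frames must contain exactly ${size} pixels`);
  }
  let mismatchedPixels = 0;
  let firstRow = -1;
  let lastRow = -1;
  for (let i = 0; i < size; i++) {
    if (actual[i] !== expected[i]) {
      mismatchedPixels++;
      const row = Math.floor(i / SCREEN_WIDTH);
      if (firstRow === -1) firstRow = row;
      lastRow = row;
    }
  }
  return { mismatchedPixels, firstRow, lastRow };
}

// Usage: simulate a frame with two wrong pixels, on scanlines 10 and 20.
const expectedFrame = new Uint8Array(SCREEN_WIDTH * SCREEN_HEIGHT);
const actualFrame = new Uint8Array(expectedFrame);
actualFrame[10 * SCREEN_WIDTH + 5] = 3;
actualFrame[20 * SCREEN_WIDTH + 7] = 1;
const result = diffFrames(actualFrame, expectedFrame);
console.log(result);
```

The range of mismatched scanlines could then be handed to Claude in place of a screenshot, since each band of the smiley is drawn with a different rendering feature.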
So Claude had finally managed to get the DMG ACID test to pass. But that was just a test, now we needed to go back to the real thing.
Back in Action
I decided to load Pokemon Red one more time, hoping to see another bug and…

it worked!

I even managed to play it for a few minutes, without any hiccups. Claude managed to write a working emulator in a bit over an hour (with a bit of nudging), and it was working well.
Thoughts
I know an emulator probably isn’t the most interesting thing to generate (there are already so many on the internet), but it was a nice test.
I have a feeling it worked really well because a lot of the parts of the emulator were already in its corpus. But it still had to translate some things, because I doubt it had an exactly similar TypeScript Gameboy Emulator written already.
I think Claude and other LLM code assistants are here to stay. Even though they aren’t perfect, and they tend not to be able to do truly novel things, most of us aren’t really doing novel things either. In fact, using Claude to quickly write things we already know how to do frees us up to do novel things that we (or Claude) have never done before. To me, that seems like a much more interesting problem to solve.
And next, I’m off to write an emulator for the Dreamcast in Ruby (something Claude definitely does not have in its corpus).