In this article, I show how I built an AI bot that plays Tetris like a human. It’s not perfect, but it’s quite good.
To imitate human play, I used machine learning with a convolutional neural network (CNN).
The game is written in JavaScript using the Phaser 2 framework and the TensorFlow.js library. It runs directly in the browser. You can try it at the end of this article.
1. Video Trailer
Let’s start by watching this video trailer:
2. Getting Data
To train the network, I needed a high-quality dataset of board configurations, each described by an image with a corresponding label, where:
- the image is a snapshot of the board
- the label is an action that represents the final column placement and rotation of a played piece on that board
How did I get this data?
Well, there is a YouTube channel with a lot of videos of Tetris World Championship matches. The competitors there play at the top level and make very few errors. Besides, they aim to score the most points by clearing four rows at once as often as possible.
So these matches are a great source of high-quality Tetris data! You only have to scrape it somehow. And that was my original idea for getting the training data.
So I made the Tetris Data Scraper, a dedicated tool that collects data from videos of Tetris matches:
This tool processes each video frame and analyzes the differences between the pixel colors of the previous and current frames.
This way, it recognizes the current board configuration and the position of the currently played tetromino.
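The frame-differencing idea can be sketched roughly as follows. This is only an illustration, not the scraper's actual code: it assumes each frame has already been sampled into a 20×10 grid of brightness values, and the helper name `changedCells` and the threshold value are hypothetical.

```javascript
// Sketch only: frames are assumed to be pre-sampled into ROWS x COLS grids
// of brightness values (0..255), one value per board cell.
const ROWS = 20;
const COLS = 10;

// Hypothetical helper: list the cells whose brightness changed by more
// than `threshold` between two consecutive frames.
function changedCells(prevFrame, currFrame, threshold = 40) {
  const changes = [];
  for (let r = 0; r < ROWS; r++) {
    for (let c = 0; c < COLS; c++) {
      const diff = Math.abs(currFrame[r][c] - prevFrame[r][c]);
      if (diff > threshold) {
        // A cell that is now bright is treated as occupied.
        changes.push({ row: r, col: c, occupied: currFrame[r][c] > threshold });
      }
    }
  }
  return changes;
}

// Usage: one cell lights up in the bottom row.
const prevFrame = Array.from({ length: ROWS }, () => new Array(COLS).fill(0));
const currFrame = prevFrame.map(row => row.slice());
currFrame[19][4] = 200;
const changes = changedCells(prevFrame, currFrame);
console.log(changes); // [ { row: 19, col: 4, occupied: true } ]
```

A real video frame would first need cropping and sampling per cell; that step is left out here.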
Using this tool, I processed 15 Tetris World Championship matches, generating around 50,000 useful records.
3. Augmenting Data
Unfortunately, 50,000 data records were not enough to train the network.
I estimated that I needed at least 1,000,000 records.
So I made the Tetris Data Augmentation tool to artificially expand the dataset by creating modified versions of the original boards.
But before processing, I changed all boards by converting all gaps into occupied fields.
Why?
Well, while testing the model, I found it was more cost-effective to ignore all the gaps than to leave them on the boards.
The image below shows the concept:
Figure 1 shows an original 20×10 board scraped from the video.
Then in figure 2, we see there are two gaps on this board.
Finally, in figure 3, the gaps are converted into occupied fields.
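A minimal sketch of this gap-filling step is shown below. It assumes that a "gap" is an empty cell with at least one occupied cell above it in the same column; the article does not define the term precisely, so both this definition and the function name `fillGaps` are assumptions.

```javascript
// board: array of rows (top row first); each cell is 0 (empty) or 1 (occupied).
// Assumption: a gap is an empty cell lying below an occupied cell in its column.
function fillGaps(board) {
  const rows = board.length;
  const cols = board[0].length;
  const result = board.map(row => row.slice()); // do not mutate the input
  for (let c = 0; c < cols; c++) {
    let seenBlock = false;
    for (let r = 0; r < rows; r++) {
      if (result[r][c] === 1) {
        seenBlock = true;
      } else if (seenBlock) {
        result[r][c] = 1; // empty cell under a block: treat as occupied
      }
    }
  }
  return result;
}

// Usage on a small 4x3 board (the real boards are 20x10).
const sample = [
  [0, 0, 0],
  [1, 0, 0],
  [0, 1, 0],
  [1, 0, 1],
];
const filled = fillGaps(sample);
console.log(filled); // [ [ 0, 0, 0 ], [ 1, 0, 0 ], [ 1, 1, 0 ], [ 1, 1, 1 ] ]
```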
Now, let’s look at the concept of data augmentation. Using the board shown in figure 3, the tool generated these boards:
The upper boards are created by inserting new lines at the bottom.
The lower boards are created by removing lines one by one.
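The two augmentation operations could look roughly like this. It is a sketch under assumptions: the article does not say what the inserted lines contain, so the caller supplies the row to insert, and the function names are hypothetical.

```javascript
// board: array of rows (top row first); each cell is 0 (empty) or 1 (occupied).

// Push the stack up by `count` rows and append copies of `fillerRow` at the bottom.
// Returns null if occupied cells would be pushed off the top of the board.
function insertBottomLines(board, fillerRow, count) {
  const dropped = board.slice(0, count);
  if (dropped.some(row => row.some(cell => cell !== 0))) return null;
  const inserted = Array.from({ length: count }, () => fillerRow.slice());
  return board.slice(count).map(row => row.slice()).concat(inserted);
}

// Remove `count` rows from the bottom and add empty rows on top,
// so the board keeps its height.
function removeBottomLines(board, count) {
  const cols = board[0].length;
  const empty = Array.from({ length: count }, () => new Array(cols).fill(0));
  const kept = board.slice(0, board.length - count).map(row => row.slice());
  return empty.concat(kept);
}

// Usage on a small 4x3 board (the real boards are 20x10).
const base = [
  [0, 0, 0],
  [0, 0, 0],
  [1, 0, 0],
  [1, 1, 0],
];
const raised = insertBottomLines(base, [1, 1, 0], 1);
const lowered = removeBottomLines(base, 1);
console.log(raised);  // [ [ 0, 0, 0 ], [ 1, 0, 0 ], [ 1, 1, 0 ], [ 1, 1, 0 ] ]
console.log(lowered); // [ [ 0, 0, 0 ], [ 0, 0, 0 ], [ 0, 0, 0 ], [ 1, 0, 0 ] ]
```

Applying each function with increasing `count` values yields a family of boards from one scraped board.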
After generating 1,000,000 data records with this tool, I merged all of them into one final 180MB dataset file.
Now, I could train the convolutional neural network.
4. Building CNN Model
The image below shows the complete neural network architecture used in this game:
The model is sequential, meaning it consists of a linear stack of layers with no branching.
To compile the model, I used the following parameters:
- Optimizer: Adam with a learning rate of 0.0005
- Loss Function: Categorical Crossentropy
- Evaluation Metric: Accuracy
4.1. The Input
With a CNN, it is convenient to have a square input.
So the input to the model is a 20×20 black-and-white image representing a board configuration.
Since the images in the dataset are 20×10, they must be expanded by 5 blank columns on each side.
Here I found a smart way to use these extra columns.
Instead of keeping a piece at its initial position in the middle of the first row, it’s better to place each piece at its own specific location within the extra columns.
It seems the CNN recognizes the different pieces better when they are placed in separate locations.
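As a rough sketch, building such an input could look like the code below. The padding of 5 columns per side comes from the description above; the slot coordinates in `PIECE_SLOTS` are hypothetical (only the top-left placement of the I-piece is mentioned in this article), as are the function names.

```javascript
const PAD = 5; // blank columns added on each side: 20x10 -> 20x20

// Pad every row of a 20x10 board with PAD empty cells on the left and right.
function padBoard(board) {
  return board.map(row => [
    ...new Array(PAD).fill(0),
    ...row,
    ...new Array(PAD).fill(0),
  ]);
}

// Hypothetical slots (top-left corner of each piece) inside the extra columns.
const PIECE_SLOTS = {
  I: { row: 0, col: 0 },  // top-left corner
  O: { row: 5, col: 0 },  // assumed, for illustration only
  T: { row: 10, col: 0 }, // assumed, for illustration only
};

// Mark the cells of a piece (given as [rowOffset, colOffset] pairs) in the input.
function stampPiece(input, cells, slot) {
  for (const [dr, dc] of cells) {
    input[slot.row + dr][slot.col + dc] = 1;
  }
  return input;
}

// Usage: empty board with a horizontal I-piece in its slot.
const emptyBoard = Array.from({ length: 20 }, () => new Array(10).fill(0));
const I_CELLS = [[0, 0], [0, 1], [0, 2], [0, 3]];
const input = stampPiece(padBoard(emptyBoard), I_CELLS, PIECE_SLOTS.I);
console.log(input.length, input[0].length); // 20 20
```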
Since a picture is worth a thousand words, here are examples of inputs for the same board configuration, but with different pieces:
4.2. The Output
The output from the model is one of the 44 possible actions (labels).
And why are there 44 output actions?
Well, each action represents a combination of the final column position and rotation for the piece played on the current board configuration.
So let’s first look at the structure of each piece. Each piece is defined within a 4×4 grid of blocks. Besides, each piece has 4 rotations, as shown in the image below:
Now, let’s go back to the dimensions of the board. We know its original size is 20×10.
But to show a piece at the board edges, given that its body occupies a 4×4 grid, it is essential to extend the board with hidden rows and columns.
Therefore, we need to use an extended 25×14 board as shown in the examples below:
So to place any piece in its final position, we need 11 columns (numbered 0 to 10): a 4-column-wide piece grid fits into a 14-column board at 14 - 4 + 1 = 11 horizontal offsets.
Since each piece has 4 rotations (numbered 0 to 3), that gives a total of 44 actions (11 × 4).
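To make the label format concrete, here is a small sketch of how a (column, rotation) pair could be turned into one of the 44 classes. The index formula is chosen to match the decoding formulas given in section 6; the one-hot representation is an assumption (it is what categorical cross-entropy typically expects), and the function names are hypothetical.

```javascript
const NUM_COLUMNS = 11;  // final column positions 0..10
const NUM_ROTATIONS = 4; // rotations 0..3
const NUM_ACTIONS = NUM_COLUMNS * NUM_ROTATIONS; // 44

// Map a (column, rotation) pair to a single action index in 0..43.
function encodeAction(column, rotation) {
  if (column < 0 || column >= NUM_COLUMNS || rotation < 0 || rotation >= NUM_ROTATIONS) {
    throw new RangeError("column must be 0..10 and rotation 0..3");
  }
  return rotation * NUM_COLUMNS + column;
}

// Assumed label format: a one-hot vector of length 44.
function toOneHot(action) {
  const label = new Array(NUM_ACTIONS).fill(0);
  label[action] = 1;
  return label;
}

// Usage: column 5 with rotation 2.
const action = encodeAction(5, 2);
const label = toOneHot(action);
console.log(action, label.length); // 27 44
```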
5. Training CNN Model
To get good predictions from the CNN, we need to train it over many iterations using the previously prepared data.
During training, the neural network showed gradual progress but nothing spectacular until 40,000 iterations. Then it began to place pieces in the correct positions and clear rows continuously.
After 75,000 training iterations, the bot was already playing pretty well.
Yes, of course, it still makes some funny mistakes during gameplay, but it is also able to clear four rows at once and score more than 999,999 points.
It also knows how to get out of some tricky situations most of the time.
However, it is not unbeatable: sometimes it loses the game very quickly, but generally speaking, it can survive for a long time.
6. Playing Tetris with AI Bot
While playing the game, the AI bot uses a trained CNN to control pieces.
The image below shows the entire process:
The input to the CNN is a 20×20 black-and-white image of the current board configuration.
So at first, the bot captures a snapshot of the whole board (region of interest).
After that, the snapshot is downsampled and converted to a normalized 20×20 array.
That means all colorized (occupied) fields are mapped to value 1, and all black fields (voids) are mapped to value 0.
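The downsampling step can be sketched as follows. This is an assumption-laden illustration rather than the game's code: it assumes the snapshot is a flat RGBA byte array (the layout used by canvas `ImageData.data`) covering exactly the 20×10 playfield with square cells, and it samples only the center pixel of each cell. The function name and the brightness threshold are hypothetical. Padding the result to 20×20 is a separate step.

```javascript
// pixels: flat RGBA array (4 bytes per pixel), `width` pixels per image row.
// Returns a rows x cols grid with 1 for colored cells and 0 for black cells.
function snapshotToGrid(pixels, width, cellSize, rows = 20, cols = 10) {
  const half = Math.floor(cellSize / 2);
  const grid = [];
  for (let r = 0; r < rows; r++) {
    const row = [];
    for (let c = 0; c < cols; c++) {
      const x = c * cellSize + half; // center pixel of the cell
      const y = r * cellSize + half;
      const i = (y * width + x) * 4;
      const brightness = pixels[i] + pixels[i + 1] + pixels[i + 2];
      row.push(brightness > 30 ? 1 : 0); // near-black counts as empty
    }
    grid.push(row);
  }
  return grid;
}

// Usage: a synthetic all-black snapshot with one red cell at row 19, column 3.
const cellSize = 2;
const width = 10 * cellSize;
const height = 20 * cellSize;
const pixels = new Uint8ClampedArray(width * height * 4);
for (let y = 19 * cellSize; y < 20 * cellSize; y++) {
  for (let x = 3 * cellSize; x < 4 * cellSize; x++) {
    const i = (y * width + x) * 4;
    pixels[i] = 255;     // red channel
    pixels[i + 3] = 255; // alpha
  }
}
const grid = snapshotToGrid(pixels, width, cellSize);
console.log(grid[19]); // [ 0, 0, 0, 1, 0, 0, 0, 0, 0, 0 ]
```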
Here we have an exception: since we decided to ignore gaps in the training data, we also need to ignore them during gameplay. So all black fields that are gaps must be mapped to value 1 (in this example, there is one gap).
Likewise, we must also move the playing piece from its initial position to its specific location within the extra columns. In this example, the I-piece is moved from the center to the top-left corner.
After the CNN is fed this prepared 20×20 input, it produces an output prediction, which is one of the 44 possible actions.
To get the final column position and rotation of the piece from the output prediction, the bot uses these simple calculations:
rotation = floor(output / 11) = {0, 1, 2, 3}
column = (output % 11) = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
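In JavaScript, the division must be rounded down explicitly, since `/` produces a fractional result. The sketch below assumes the model returns 44 scores and that the predicted action is the index of the highest one; the helper names `argMax` and `decodeAction` are hypothetical.

```javascript
// Index of the largest value in an array of scores.
function argMax(values) {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Split an action index (0..43) into rotation (0..3) and column (0..10).
function decodeAction(output) {
  return {
    rotation: Math.floor(output / 11),
    column: output % 11,
  };
}

// Usage: made-up scores in which action 27 is the most likely one.
const scores = new Array(44).fill(0.01);
scores[27] = 0.57;
const move = decodeAction(argMax(scores));
console.log(move); // { rotation: 2, column: 5 }
```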
7. Live Demo
And here is the game!
Press the Load button to load a pre-trained model and enjoy the magic of artificial intelligence (although it’s not 100% perfect).
Please note that the Train button is disabled in the online version, and the dataset is not loaded. It’s because the size of the dataset file is too large to be loaded online (about 180 MB).
So you can’t train the model online. Instead, I already pre-trained it locally.
8. Conclusion
The goal of this project was to create an artificial intelligence (AI) that learns to play Tetris using a convolutional neural network (CNN).
It wasn’t an easy task, and the biggest challenge was how to generate a high-quality dataset to train the network for playing Tetris.
Furthermore, to teach the network as many skills as possible, I also needed a large amount of data.
Now I can say that I spent most of my time not on programming but on collecting and pre-processing data.
In the end, though, I was able to train the CNN with a pretty good dataset.
However, there is still no guarantee that the predictions from such a trained model will always be 100% accurate.
Due to the variability of the real world, the network doesn’t know how to handle all situations. It then makes stupid mistakes that ultimately lead to the loss of the game.
Anyhow, the predictions in this game are generally 90% accurate and quite acceptable.
To conclude, with a large and high-quality dataset, we can teach a convolutional neural network to play Tetris pretty well.
9. Source Code
The source code will be available on Github as soon as possible.
Currently, it is not clean and readable enough to be ready for download.
So stay tuned, and don’t forget to like, share and subscribe.
Hi Srdjan,
I saw this on Reddit a couple of days ago. You’ve put so much work into this! It looks fun. Any particular reason you didn’t use RL? It would be more generalizable to other problems.
Is RL much harder to implement?
It’s also interesting that you’re using JavaScript and Tensorflow.
I’m not sure what graphics library you’re using. Are you familiar with three.js?
I always thought someone should create a fun browser based, webGL based, graphics calculator that does neat things and is somewhat programmable.
Children these days could really use cool tools for learning math. And perhaps Tensorflow.js also fits in there somehow.
My brother is also interested in games and (simple) AI. I’ll be sure to tell him about you. Do you use regular TF too? Do you use other languages besides JS?
Thanks a lot. Cool stuff!
Thanks for your comment. Nice to hear from someone who is interested in this project.
A lot of people ask me the same question: why didn’t I use RL? The answer is, I wanted to try a slightly different approach by using a CNN and an already prepared high-quality dataset. As there was no such dataset, I had to generate it myself. Then I found this YouTube channel with Tetris matches and got the idea to scrape data from them.
Regarding the graphics library, I used the old Phaser 2 framework.
Currently, I’m not using regular TF. My focus is on exploring machine learning in the browser environment, so I’m using JavaScript and TensorFlow.js.
Otherwise, I have programmed in many languages over the years.
Hi Srdjan,
Thank you for the article, it’s well structured, easy to follow and really useful as I’m building an RL Tetris agent myself. I have a question – how did you decide on this particular neural network architecture? Could you possibly steer me towards more information about it? (I’m quite new in this field)
After a lot of experimentation and testing with different CNN architectures, I found that this one gives the best results. So I chose it.
Hi Srdjan,
Really great work. My son and I are trying to do the same, thinking of having one more output (hold piece), which requires one more input, too (next piece).
However, since we want to use a CNN, we are facing the same problem: where to get data. Since writing our own scraper would delay the project considerably, we were thinking of recording my son’s gameplay in the Tetris game he wrote himself. Still, data from these professionals would be much better in terms of data quality. Thus, we wanted to ask if it is possible to share your 180MB data with us? I know you had a hard time creating it, so it really would be much appreciated.
Again, thanks for the great descriptions of your project
Ilyas and Sebastian
First of all, thanks a lot for your comment.
Next, I think a good dataset is the heart of machine learning projects. And for now, unfortunately for you, I have decided not to share this dataset. At least not yet.
Here is why:
You are the first one who noticed how much time I invested to get it. I think 70-80% of this project was collecting and preparing data, and the remaining 20-30% was programming classic Tetris gameplay with a classic CNN implementation for image recognition.
Just to list some steps of collecting data in this way:
1. programming video data scraper: analyzing video frame by frame, image processing, recognizing played moves
2. processing 15 videos with this tool (about 30 minutes per video)
3. cleaning the data to discard all invalid or bad records
4. augmenting collected data to increase initial dataset from 50,000 to 1,000,000 records
On the other hand, many others asked me why I didn’t make a bot that learns by playing Tetris itself using RL. But I found it is not an easy task. Besides, I don’t know how to teach it to clear 4 rows at once like these professionals.
Also, I tried to get data as you suggest, by playing my own Tetris game, but I played awfully compared to these professionals. And after an hour of playing, I only got a few useful data records.
So I hope you understand me. I also hope you will manage to make your own Tetris bot.
Good luck!
Hi Srdjan!
First, you did an excellent study, with an excellent description of how the training was conducted and what it took.
I am new to programming and your project inspires me to learn about neural networks. I would like to repeat something similar, can I ask you to share the source?
Even if it is messy, it does not matter.
Thank you
Thanks for your comment.
I was thinking about sharing the code, but currently I’m not ready yet. To be honest, I was expecting much more interest in this project, so I lost motivation to go further with it including sharing the source code. Anyway, I hope to share the code one day.
Regarding the implementation of the neural network used in this project, it’s a classic convolutional neural network for recognizing images, but adapted for recognizing Tetris boards (which I scraped from the professional Tetris matches). So any code that implements CNN will be good for you to learn about it.
Hi, is there a tutorial to make this step by step? Anyway, I wonder how to apply it directly to a game on our PC, like an existing Tetris or Snake game?
Dude, this is awesome.
I would like to replicate it, I understand the concept, but implementation is a black box for me. Can you provide a technical tutorial from start to end, with the code and the data. I am also interested in how you acquired the data.