People sometimes ask how I got the idea for a browser extension that recognizes chess positions and how its algorithms work.
Here’s the story.
I started playing chess in late 2015, and from the beginning I was hooked on watching chess videos on YouTube. While watching, I often wondered “what if I play this move instead?”, but there was no easy way to find out what the engine thought about it. That is how the idea was born: build an algorithm that can find chessboards in any form on any website (YouTube, in this case), recognize the position on the board, and show what the engine thinks.
But I abandoned the idea, fortunately only temporarily, because of another career opportunity I had around that time. In the summer of 2018, I was free and searching for a new challenging project. The chessboard recognition task was a perfect fit: relatively easy to start with, but challenging to do accurately, and able to bring real value to people.
Given an image that most likely contains a 2D chessboard, the task is to recognize the chess position on this chessboard, if any.
For example, let’s say the image is the following:
and the expected result is the position on the board, i.e. what is on every square, which is later converted to the FEN format:
The above image is just a simple example; often the situation is more complicated. For example, there can be multiple chessboards, the board can contain artifacts such as arrows, highlights, or unusually styled pieces, or the board can be tilted or skewed, among other things.
There were some existing solutions, but none of them was as accurate as I wanted, so I decided to go for it.
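To make the target output concrete, here is a minimal sketch (not the extension's actual code) that turns a recognized 8x8 board into the piece-placement field of FEN. It assumes a hypothetical representation: a list of eight rows from rank 8 down to rank 1, each square holding a FEN letter (uppercase for White, lowercase for Black) or None for an empty square.

```python
def board_to_fen(board):
    """Convert an 8x8 board into the piece-placement field of FEN.

    Assumed (hypothetical) layout: board[0] is rank 8 and board[0][0] is a8.
    Squares hold FEN letters (uppercase = White, lowercase = Black) or None.
    """
    ranks = []
    for row in board:
        fen_rank = ""
        empty = 0  # length of the current run of empty squares
        for square in row:
            if square is None:
                empty += 1
            else:
                if empty:
                    fen_rank += str(empty)  # flush the run as a digit
                    empty = 0
                fen_rank += square
        if empty:
            fen_rank += str(empty)
        ranks.append(fen_rank)
    return "/".join(ranks)


start = [
    list("rnbqkbnr"),
    list("pppppppp"),
    *[[None] * 8 for _ in range(4)],
    list("PPPPPPPP"),
    list("RNBQKBNR"),
]
print(board_to_fen(start))  # rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR
```

A full FEN string has five more fields (side to move, castling rights, en passant square, halfmove and fullmove counters) that a single image cannot fully determine, so they would have to be assumed or supplied separately.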
The task naturally decomposed into two separate problems:
- Detect the position of the chessboard in the image
- Recognize the chess piece on each square of the detected chessboard
Both problems are interesting, but the real mystery was figuring out an algorithm for the first one, because the second one can be solved by a well-trained convolutional neural network once a proper dataset is built.
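Between the two problems sits a simple step: once the board is found, the image has to be cut into one crop per square for the classifier. The sketch below assumes the detected board has already been cropped and rectified into an axis-aligned NumPy image; `split_into_squares` is a hypothetical helper, not part of the author's code.

```python
import numpy as np


def split_into_squares(board_img, grid=8):
    """Cut an axis-aligned, tightly cropped board image into grid x grid tiles.

    Returns a list of rows; tiles[r][c] is the tile in image row r, column c
    (row 0 is the top edge of the image).
    """
    h, w = board_img.shape[:2]
    # Rounded integer edges, so the tiles cover the whole image even when
    # h or w is not divisible by grid (tiles may then differ by one pixel).
    ys = np.linspace(0, h, grid + 1).round().astype(int)
    xs = np.linspace(0, w, grid + 1).round().astype(int)
    return [
        [board_img[ys[r]:ys[r + 1], xs[c]:xs[c + 1]] for c in range(grid)]
        for r in range(grid)
    ]
```

Which image edge corresponds to rank 8 depends on the board's orientation (a board shown from Black's side is flipped), so mapping tiles to squares is a separate decision.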
1. Detecting chessboard position in the image
Most of the other existing methods for this task use line detection combined with merging the detected lines into a grid. There are also a few methods that rely on pattern matching, but none of them was production-ready. I didn’t find those methods accurate enough for my needs, and I also wanted to build something less restricted and more general. That’s how it all started.
My solution is a combination of pattern detection with techniques related to graph algorithms. It tries to build a “skeleton” of a chessboard and once it’s confident enough, it returns the position of the chessboard in the image. It is also flexible in the sense that it takes an expected size of the grid, usually 8x8, as an input parameter so it can detect checkerboard-like grids of any given size.
Here are some results of the final method applied to various images:
You can find more examples here: chessboards detection results
I really hope to find time to publish a paper about it, as it’s an accurate and efficient algorithm for finding chess grids in images and it uses new methods in this field.
2. Classification of individual squares of the chessboards
At first, it seems like a straightforward task, not much different from MNIST classification, but the use case is crucial here. To get a whole standard chessboard right, we have to classify all 64 squares correctly. If a classifier has an average per-square accuracy of 0.99, then, assuming errors on different squares are independent, the chance of getting the whole board right is only about one half, since 0.99^64 ≈ 0.53. So the goal is to build a classifier with a significantly better accuracy than 0.99; 0.999, for example, would be a good result.
The critical thing is to build a dataset that matches the distribution of the real data the algorithm will run on in production.
First, I split the task of classifying a square’s content into two separate tasks:
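The arithmetic above is easy to check, and inverting it shows how accurate each square needs to be. A small sketch, under the same simplifying assumption that errors on different squares are independent (in practice they are often correlated, e.g. an unusual piece theme can fool the classifier on many squares at once); both function names are hypothetical:

```python
def board_accuracy(square_accuracy, squares=64):
    """Probability that every square is classified correctly,
    assuming errors on different squares are independent."""
    return square_accuracy ** squares


def required_square_accuracy(target_board_accuracy, squares=64):
    """Per-square accuracy needed to reach a target whole-board accuracy,
    under the same independence assumption."""
    return target_board_accuracy ** (1 / squares)


for acc in (0.99, 0.999):
    print(f"per square {acc}: whole board {board_accuracy(acc):.3f}")
print(f"needed per square for 0.95 board: {required_square_accuracy(0.95):.4f}")
```

With these numbers, 0.999 per square already yields about 0.94 for the whole board, and a whole-board target of 0.95 requires roughly 0.9992 per square.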
- piece type classification, 6 classes + empty square
- color classification of a piece, either White or Black
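One way to recombine the two predictions into a single square label is sketched below. The label strings and the function name are hypothetical; the point is that the color prediction only matters when the piece-type classifier says the square is occupied.

```python
# Hypothetical mapping from piece-type labels to lowercase FEN letters.
FEN_LETTERS = {
    "pawn": "p", "knight": "n", "bishop": "b",
    "rook": "r", "queen": "q", "king": "k",
}


def square_label(piece_type, color):
    """Combine the piece-type and color predictions into one square label.

    Returns None for an empty square (the color prediction is ignored),
    otherwise a FEN letter: uppercase for White, lowercase for Black.
    """
    if piece_type == "empty":
        return None
    if color not in ("white", "black"):
        raise ValueError(f"unexpected color label: {color!r}")
    letter = FEN_LETTERS[piece_type]
    return letter.upper() if color == "white" else letter
```

Compared with a single 13-class classifier (6 piece types × 2 colors + empty), each network in this split has a smaller job, which is one plausible benefit of decoupling the task.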
After that, I had to collect a lot of data that closely mimics the expected distribution of future inputs. This part was critical. I started by automatically generating labeled data (several piece themes combined with several board themes). Then I trained the first version of a convolutional neural network with a fairly simple architecture. I used that network to extend the dataset with real, non-generated data: I gathered unlabeled data, labeled it with the network, and fixed mislabeled examples by hand. I followed that up by training a slightly deeper architecture on the much larger extended dataset. The final step was to find the specific kinds of inputs on which the network performed poorly and fix them with even more data corresponding to those mistakes.
Next, it was time to put it into a real application that people can use. The requirement was that users can run it on basically any website, so a browser extension was the obvious product. I hacked together a simple Chrome extension and posted a screencast on Reddit to see whether people liked it:
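The model-assisted labeling loop described above can be sketched as follows. Everything here is hypothetical scaffolding: `train` stands for fitting a network on (image, label) pairs and returns a callable model (it could switch to a deeper architecture on later rounds), and `review` stands for the manual check that either accepts the network's label or corrects it.

```python
def bootstrap(seed_data, unlabeled_batches, train, review):
    """Grow a labeled dataset with model-assisted labeling (hypothetical sketch).

    seed_data         -- list of (image, label) pairs, e.g. generated boards
    unlabeled_batches -- iterable of lists of real, unlabeled images
    train             -- callable: list of (image, label) -> model(image) -> label
    review            -- callable: (image, guess) -> accepted or corrected label
    """
    data = list(seed_data)
    model = train(data)  # first model, trained on generated data only
    corrections = 0
    for batch in unlabeled_batches:
        for image in batch:
            guess = model(image)          # automatic label from the network
            label = review(image, guess)  # a human keeps or fixes it
            if label != guess:
                corrections += 1
            data.append((image, label))
        model = train(data)  # retrain on the extended dataset
    return model, data, corrections
```

Tracking how many labels had to be corrected per batch is one simple signal for where the network still struggles, which is what the final, targeted data-collection step relies on.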
It got really positive feedback so I decided to build a real chessvision.ai Chrome extension and publish it. More about it in the next post…