How I learned to deploy ML models to production


Running the service required having a machine learning model deployed in a production environment so that it could make predictions in real time on a user's request.


A few days before launching for Chrome, I had almost everything covered, except one thing:

How to deploy my machine learning models for piece classification in a production environment?

I asked friends, then asked for recommendations on Reddit, but no one had a clear answer: most people either used ML only in non-production environments or deployed it at large companies with internal infrastructure for this purpose. Some recommended Amazon SageMaker, which is apparently good, but the minute I tried it I knew it just wasn't for me. For some reason I avoid AWS services whenever possible; I don't feel comfortable using them, and all the configuration is often overwhelming. I knew that choosing the deployment environment is an important decision, so I focused on that.

After a bit of research, I realized that I was already quite familiar with Google Cloud from other projects I was involved in, so I explored what it offers. It turned out that if you don't need a GPU to run your models, it's very easy and pleasant to deploy them as Flask applications on Google App Engine's flexible environment: you deploy a Dockerized application and specify its resources and configuration. The whole process is not overwhelming, and Google Cloud has a great dashboard and CLI, so it's easy to stay in control of everything.

One very important thing to remember is to set up billing alerts. Do it for all your apps deployed there, or you'll learn the hard way, and apparently that's not only my opinion.
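To make the setup concrete, here is a minimal sketch of what such a Flask prediction service can look like. The `classify_piece` function and its output are hypothetical stand-ins for the real classifier, which would be loaded once at startup from whatever ML framework the model uses:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify_piece(image_bytes: bytes) -> dict:
    # Placeholder for the real model (e.g. a network loaded at startup).
    # In a real service this would run inference on the image bytes;
    # the label and score here are purely illustrative.
    return {"label": "knight", "confidence": 0.97}


@app.route("/predict", methods=["POST"])
def predict():
    # The client POSTs the raw image bytes; the response is JSON.
    image = request.get_data()
    return jsonify(classify_piece(image))

# On App Engine's flexible environment, a production server such as
# gunicorn runs `app` inside the Docker container
# (e.g. `CMD gunicorn -b :8080 main:app` in the Dockerfile).
```

The Dockerized app is then deployed with the `gcloud` CLI, with resources (CPU, memory) declared in the service's `app.yaml`.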