Training your AI in the Matrix

It’s said that data is the precious oil of the twenty-first century, meaning only huge monopolies like Google and Facebook could ever dream of creating slick artificial intelligence. When it comes to user behavior, this is true enough; I’ve slogged through billions of user events to create some pretty sweet adaptive features. When you really count ’em up though, the vast majority of interesting AI can be created by putting it through its paces in The Matrix! In this article I’ll cover the philosophy at a high level and then delve into the details with a worked example. If you’re looking to solve similar problems, steal my code as you wish.

Can simulation solve my problem?

Welp, the first step in knowing whether you need a ton of real-world data is understanding AI at its highest level. Simply put, a problem where some well-defined input should lead to some well-defined output is pretty much enough! A good metaphor is that your machine learning algorithm is a black box “universal function approximator”, learning to map the inputs to the outputs by picking up the fundamental pattern that connects them. Next question: can you automate creating a diverse enough set of inputs with roughly correct corresponding outputs? If so, congratulations, you can build your own Agent Smith. To generate those pairs you’ll need a simulation, be it a game, a physics engine, or simply procedural rules for something as simple as checkers.
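To make the “inputs with roughly correct outputs” idea concrete, here is a minimal sketch. The toy projectile simulation, the sampling ranges, and the nearest-neighbor stand-in for a learner are all my own assumptions for illustration, not part of the room project described later.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2


def simulate(speed, angle_deg):
    """Toy 'Matrix': range of a projectile on flat ground, ignoring drag."""
    angle = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * angle) / G


def make_dataset(n, seed=0):
    """Sample n random inputs and label each one by running the simulation."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        speed = rng.uniform(5.0, 30.0)       # m/s, assumed plausible range
        angle_deg = rng.uniform(10.0, 80.0)  # degrees, assumed plausible range
        data.append(((speed, angle_deg), simulate(speed, angle_deg)))
    return data


def predict_nearest(train, query):
    """Stand-in learner: return the label of the closest training input."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label
```

Swap `simulate` for a game engine or physics engine and `predict_nearest` for a real model, and the shape of the pipeline stays the same: the simulation hands out labels for free.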

Has to make you wonder if you’re just sample input for a godly AI, eh? Step back a bit more and you’ll see that most subjects outside of 1984-like mass psychology can pretty much be simulated without massive data at hand.

Your synthetic AI in the real world

The huge benefit of training in The Matrix is that you can much more easily weed out bad models and algorithms since you have, in some sense, unlimited data to test against. Once you’ve settled on which features and models work best, the only consideration left is how closely your generated inputs match what reality will provide. Here, randomness is your best friend and enemy; too much can confuse or over-generalize your model, too little will lead to a narrow-minded AI only aware of its tiny Matrix world. Focus on randomizing only the parameters of your input that could actually vary in the real world, and keep them within the thresholds of reality.
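One way to keep the randomness honest is to write the allowed range of every varying parameter down in one place and sample only from that. The sketch below does this for a handheld camera; the parameter names and bounds are hypothetical values I picked for illustration and would need tuning against how real users actually hold a phone.

```python
import random
from dataclasses import dataclass


@dataclass
class CameraPose:
    height_m: float   # camera height above the floor
    yaw_deg: float    # left/right rotation
    pitch_deg: float  # up/down tilt
    roll_deg: float   # sideways tilt of the phone


# Hypothetical "thresholds of reality" for a person taking a photo of a wall.
BOUNDS = {
    "height_m": (1.2, 1.9),
    "yaw_deg": (-35.0, 35.0),
    "pitch_deg": (-15.0, 15.0),
    "roll_deg": (-5.0, 5.0),
}


def sample_pose(rng):
    """Draw one pose uniformly from within the bounds, and nowhere else."""
    values = {name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}
    return CameraPose(**values)
```

Widening a bound makes the model more general at the cost of harder training; anything not listed in `BOUNDS` simply never varies.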

Looking toward the future

With the plethora of realistic physics and graphics engines at our disposal now, it seems our future could be rife with bots that interact with the physical world accurately: incubated in the Matrix and graduating to the real world like real children, but without the awkward puberty part.

Example: Construct a 3D room from images

As a great test of the theory here, why not take a shot at deriving a 3D room from images? The problem really comes down to finding the corners of the main wall across 4 pictures and gluing the points together. Let’s focus on the well-defined problem of finding the 4 corners of the main wall in a picture.

Generating Training Data in the Matrix

With the Unity3D game engine at hand, it wasn’t hard to download a free, photo-realistic bedroom from the Unity store for starters. From there it’s a matter of positioning the camera randomly in the room, taking snapshots and calculating the 2D points of the wall from the camera’s perspective using the game engine. The snapshots become the inputs to the AI, and the corner locations of the wall the output we want the AI to learn. Get the scripts here.

Generated Snapshot with corresponding corner locations. The other 2 corners are given but off-screen so the AI is forced to generalize what a wall is based on geometry.
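For anyone without Unity handy, the labeling step can be sketched as a plain pinhole projection. This is not the script linked above; it is a simplified stand-in that assumes a camera rotating only about the vertical axis, with a made-up focal length and image size. In Unity itself, `Camera.WorldToScreenPoint` does this kind of world-to-screen mapping.

```python
import math


def project_point(point, cam_pos, yaw_deg, focal_px, width, height):
    """Project a 3D world point to pixel coordinates with a pinhole camera.

    The camera sits at cam_pos and looks along +z when yaw is 0; positive yaw
    turns it toward +x. Returns None for points behind the camera. Points
    outside the image are still returned, with off-screen coordinates.
    """
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]
    yaw = math.radians(yaw_deg)
    # Express the offset in the camera frame (right, up, forward).
    cx = math.cos(yaw) * dx - math.sin(yaw) * dz
    cy = dy
    cz = math.sin(yaw) * dx + math.cos(yaw) * dz
    if cz <= 0:
        return None
    u = width / 2 + focal_px * cx / cz   # pixel column
    v = height / 2 - focal_px * cy / cz  # pixel row (image y grows downward)
    return (u, v)


def label_wall(corners, cam_pos, yaw_deg, focal_px=500, width=640, height=480):
    """Return the 2D label for a snapshot: one projected point per wall corner."""
    return [project_point(c, cam_pos, yaw_deg, focal_px, width, height)
            for c in corners]
```

Note that corners landing outside the image keep their coordinates rather than being clipped, which matches the idea of handing the AI off-screen corners so it has to reason about the wall’s geometry.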

Out of laziness, the only randomness used here is moving and jittering the camera through the different angles a person might shoot from. A more production-ready script might also randomize the lighting and room size, drop random objects all over the place, and hang some Picassos on the wall since we know our real life users will be classy.

Designing the Neural Network

When it comes to designing a neural network for your problem, the best advice here is steal, steal, steal! Even the brilliant people that design these networks don’t know what the heck is going on; it’s largely just taking random stabs in the dark until something works! Don’t think you’re smarter than PhDs with lots of time on their hands. Steal their stuff and adapt it when possible, specifically in the world of neural nets.

What’s interesting is that for most problems you don’t even need to put thought into the model to use, since newer tools like AutoML will try a bunch for you and figure out something close to optimal with little work on your part. It might be the best-kept secret in machine learning. Computer vision is a bit trickier though, so there are really only a few great models to choose from.

Here we can grab the state-of-the-art RetinaNet, which is set up for drawing bounding boxes around objects and labeling them with the correct object class like “dog” or “cat”. For finding the main wall there’s only 1 “class”, so this part was pretty easy. But bounding boxes? No! We need quadrilaterals, since the main wall will almost always be sheared by perspective and bounding boxes won’t help much in a 3D reconstruction. With some elbow grease, it’s not terribly difficult to modify all the dependent functions (phew, more than anticipated!) that only worked on boring bounding boxes, and voilà, the hack job mods work flawlessly!
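To give a flavor of what “quadrilaterals instead of boxes” means for the regression targets, here is one plausible encoding. It is an assumption on my part for illustration, not the actual modification: a box head regresses 4 numbers per anchor, so a quadrilateral head needs 8, here taken as each corner’s offset from the matching anchor corner, scaled by the anchor size. The function names are hypothetical.

```python
def _anchor_corners(anchor):
    """Corners of an axis-aligned anchor (x1, y1, x2, y2), ordered TL, TR, BR, BL."""
    x1, y1, x2, y2 = anchor
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]


def encode_quad(anchor, quad):
    """Turn a quadrilateral into 8 regression targets relative to an anchor.

    quad is four (x, y) corners in the same TL, TR, BR, BL order.
    """
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    targets = []
    for (ax, ay), (qx, qy) in zip(_anchor_corners(anchor), quad):
        targets.extend([(qx - ax) / w, (qy - ay) / h])
    return targets


def decode_quad(anchor, targets):
    """Invert encode_quad: recover the four corners from 8 predicted offsets."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    quad = []
    for i, (ax, ay) in enumerate(_anchor_corners(anchor)):
        quad.append((ax + targets[2 * i] * w, ay + targets[2 * i + 1] * h))
    return quad
```

Everything downstream that assumed 4 values per detection (loss, target assignment, drawing, evaluation) then has to be taught about 8, which is where the elbow grease goes.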


Though I’ve only tried a handful of real-world images as a sanity check, all of the corner points of the walls were spot on! During training it was interesting to see the classification accuracy improve so quickly since there’s only 1 class of object, and even the 4 points of the quadrilateral converged only a bit slower than the vanilla bounding-box based RetinaNet. I’m really impressed with the results and have to admit this is the most fun I’ve had on a keyboard for a while now. Now go ponder the plethora of problems that have become solvable just in the past year thanks to tools like Unity and RetinaNet!

Centralized Reporting with ELK Stack

There are a lot of third party services out there that provide you with analytics on your app. This is just about always the best instant solution, but eventually you’re always limited by whatever features they do or don’t provide. A custom reporting system (or at least a collection of hacks) is always necessary after a certain point and usually gets pretty ugly. If you have a solid enough dev team, the best solution is to roll your own central reporting using the ELK (ElasticSearch/Logstash/Kibana) stack from the get-go!